Boxplots Diagnostic

Description

The Boxplots diagnostic computes and visualizes boxplots of spatial field means from climate model datasets, for one or multiple variables, over a specified time period. It consists of two classes: one analyzes a single model and can write the field means to NetCDF files, and the other produces the plots.
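
To make the term "spatial field mean" concrete, the following minimal sketch averages a regular latitude-longitude field with cos(latitude) weights. It is an illustration under stated assumptions, not the diagnostic's implementation: the function name weighted_field_mean is hypothetical, and the weighting that AQUA actually applies is not described on this page.

```python
import math


def weighted_field_mean(field, lats):
    """Return the cos(latitude)-weighted mean of a 2D field.

    field: list of rows, one row of grid-point values per latitude.
    lats:  latitude of each row in degrees (hypothetical regular grid).
    """
    weighted_sum = 0.0
    total_weight = 0.0
    for row, lat in zip(field, lats):
        # Grid cells cover less area towards the poles, so they weigh less.
        weight = math.cos(math.radians(lat))
        weighted_sum += weight * sum(row)
        total_weight += weight * len(row)
    return weighted_sum / total_weight


# Toy 3x2 field with a signal only on the equator row.
field = [[0.0, 0.0], [4.0, 4.0], [0.0, 0.0]]
print(weighted_field_mean(field, lats=[-60.0, 0.0, 60.0]))
```

With these weights the equator row counts twice as much as each 60 degree row, so the result is close to 2 rather than the unweighted 4/3.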

Classes

There is one class for the analysis and one for the plotting:

  • Boxplots: retrieves the data and prepares it for plotting (e.g., regridding, unit conversion). It also computes the field means, which are stored in the fldmeans attribute and, optionally, saved as NetCDF files.

  • PlotBoxplots: provides methods for plotting the boxplots of the field means computed by the Boxplots class.
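
As a reminder of what a boxplot summarizes, the sketch below computes a five-number summary from a series of field means using only the Python standard library. The helper box_stats is hypothetical and is not part of PlotBoxplots; plotting libraries may also draw whiskers differently (for example at a multiple of the interquartile range instead of the minimum and maximum).

```python
from statistics import quantiles


def box_stats(values):
    """Return min, quartiles and max of a sequence of field means."""
    # method="inclusive" interpolates linearly between the sorted data points.
    q1, q2, q3 = quantiles(values, n=4, method="inclusive")
    return {"min": min(values), "q1": q1, "median": q2, "q3": q3, "max": max(values)}


# Made-up monthly field means, for illustration only.
monthly_means = [1.0, 2.0, 3.0, 4.0, 5.0]
print(box_stats(monthly_means))
```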

File structure

  • The diagnostic is located in the aqua/diagnostics/boxplots directory, which contains both the source code and the command line interface (CLI) script.

  • A template configuration file is available at aqua/diagnostics/templates/diagnostics/config-boxplots.yaml

  • Notebooks are available in the notebooks/diagnostics/boxplots directory and contain examples of how to use the diagnostic.

Input variables and datasets

The diagnostic can be used with any dataset that contains spatial fields. Multimodel datasets can be analyzed, and the diagnostic can be configured to compare against multiple reference datasets.

Some of the variables that are typically used in this diagnostic are:

  • tnlwrf (top net longwave radiation flux)

  • tnswrf (top net shortwave radiation flux)

  • slhtf (surface latent heat flux)

  • ishf (instantaneous surface sensible heat flux)

The diagnostic is designed to work with data from the Low Resolution Archive (LRA), generated by the Data Reduction OPerator (DROP) of the AQUA project, which provides monthly data at a 1x1 degree resolution.

Note

All analyzed variables must share the same units to ensure meaningful comparisons; otherwise, the diagnostic will raise an error.
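
The kind of check described in the note can be sketched as follows. This is a hedged illustration rather than the diagnostic's own code: the helper check_common_units is hypothetical, and the use of ValueError is an assumption, since the exception type raised by the diagnostic is not stated here.

```python
def check_common_units(units_by_var):
    """Return the shared unit string, or raise if the variables disagree.

    units_by_var: mapping of variable name to its unit string.
    """
    units = set(units_by_var.values())
    if len(units) > 1:
        # Assumed exception type; the diagnostic may raise something else.
        raise ValueError(f"Variables do not share the same units: {units_by_var}")
    return units.pop() if units else None


print(check_common_units({"tnlwrf": "W m-2", "tnswrf": "W m-2"}))
```

Running such a check before plotting makes a unit mismatch fail early instead of producing a misleading figure.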

Basic usage

The basic usage of this diagnostic is explained with a working example in the notebook. The basic structure of the analysis is the following:

from aqua.diagnostics import Boxplots, PlotBoxplots

variables = ['-tnlwrf', 'tnswrf']

boxplots = Boxplots(model='IFS-NEMO', exp='historical-1990', source='lra-r100-monthly')
boxplots.run(var=variables)

boxplots_era5 = Boxplots(model='ERA5', exp='era5', source='monthly')
boxplots_era5.run(var=variables)

boxplots_ceres = Boxplots(model='CERES', exp='ebaf-toa41', source='monthly', regrid='r100')
boxplots_ceres.run(var=variables)

datasets = boxplots.fldmeans
datasets_ref = [boxplots_ceres.fldmeans, boxplots_era5.fldmeans]

plot = PlotBoxplots(diagnostic='radiation')
plot.plot_boxplots(data=datasets, data_ref=datasets_ref, var=variables)

Start/end dates and reference datasets can be customized. If not specified otherwise, plots will be saved in PNG and PDF format in the current working directory.
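
A minimal sketch of restricting the analysis period is given below. It relies only on the startdate, enddate and regrid arguments listed in the Detailed API section; the helper names analysis_kwargs and compute_fldmeans are hypothetical, and the date string format is an assumption that should be checked against your AQUA version.

```python
def analysis_kwargs(model, exp, source, startdate=None, enddate=None, regrid=None):
    """Collect Boxplots constructor arguments, leaving unset options out."""
    kwargs = {"model": model, "exp": exp, "source": source}
    optional = {"startdate": startdate, "enddate": enddate, "regrid": regrid}
    kwargs.update({name: value for name, value in optional.items() if value is not None})
    return kwargs


def compute_fldmeans(variables, **kwargs):
    """Run the analysis for one dataset and return its field means."""
    # Imported lazily so the helper above can be used without AQUA installed.
    from aqua.diagnostics import Boxplots

    boxplots = Boxplots(**kwargs)
    boxplots.run(var=variables)
    return boxplots.fldmeans


# The date strings are an assumed format, not taken from this page.
kwargs = analysis_kwargs("IFS-NEMO", "historical-1990", "lra-r100-monthly",
                         startdate="1990-01-01", enddate="1999-12-31")
print(kwargs)
```

On a system with AQUA and its catalogs available, compute_fldmeans(['-tnlwrf', 'tnswrf'], **kwargs) would then return the field means for that period. The output location and file formats can be set through the outputdir and save_format arguments of PlotBoxplots.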

CLI usage

The diagnostic can be run from the command line with the following commands:

cd $AQUA/aqua/diagnostics/boxplots
python cli_boxplots.py --config <path_to_config_file>

The CLI accepts the following optional arguments:

  • --config, -c: Path to the configuration file.

  • --nworkers, -n: Number of workers to use for parallel processing.

  • --cluster: Cluster to use for parallel processing. By default, a local cluster is used.

  • --loglevel, -l: Logging level. Default is WARNING.

  • --catalog: Catalog to use for the analysis. Can be defined in the config file.

  • --model: Model to analyse. Can be defined in the config file.

  • --exp: Experiment to analyse. Can be defined in the config file.

  • --source: Source to analyse. Can be defined in the config file.

  • --outputdir: Output directory for the plots.

  • --startdate: Start date for the analysis.

  • --enddate: End date for the analysis.
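
The arguments above could be declared with argparse roughly as follows. This is an illustrative sketch and not the actual content of cli_boxplots.py: the function name build_parser, the integer type of --nworkers and the help texts are assumptions, while the flag names and the WARNING default come from the list above.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented CLI flags (illustrative only)."""
    parser = argparse.ArgumentParser(description="Boxplots diagnostic CLI (sketch)")
    parser.add_argument("--config", "-c", type=str, help="Path to the configuration file")
    # Assumed to be an integer count of workers.
    parser.add_argument("--nworkers", "-n", type=int, help="Number of parallel workers")
    parser.add_argument("--cluster", type=str, help="Cluster to use for parallel processing")
    parser.add_argument("--loglevel", "-l", type=str, default="WARNING", help="Logging level")
    # Remaining flags are plain optional strings.
    for name in ("catalog", "model", "exp", "source", "outputdir", "startdate", "enddate"):
        parser.add_argument(f"--{name}", type=str, default=None)
    return parser


args = build_parser().parse_args(["-c", "config.yaml", "-n", "4", "--model", "IFS-NEMO"])
print(args.config, args.nworkers, args.model, args.loglevel)
```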

Configuration file structure

The configuration file is a YAML file that contains the details on the dataset to analyse or use as reference, the output directory and the diagnostic settings. Most of the settings are common to all the diagnostics (see Diagnostics configuration files). Here we describe only the specific settings for the boxplots diagnostic.

  • boxplots: a block (nested in the diagnostics block) containing options for the Boxplots diagnostic. Variable-specific parameters override the defaults.

    • run: enable/disable the diagnostic.

    • diagnostic_name: name of the diagnostic. The default is boxplots; it can be changed when the boxplots CLI is invoked within another recipe diagnostic, as is currently done for Radiation.

    • variables: list of variables to analyse.

diagnostics:
  boxplots:
    run: true
    diagnostic_name: 'radiation_toa'
    variables:
      - vars: ['-tnlwrf', 'tnswrf']
        add_mean_line: true
        anomalies: true
        ref_number: 0 # use ERA5 as reference for anomalies
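
How variable-specific parameters might override the defaults can be sketched as below. The sketch assumes per-variable entries shaped like the vars entry in the example above (a vars list plus optional plotting options), which is an assumption about the configuration layout. The helper resolve_plot_options is hypothetical; the default values are those of the plot_boxplots signature in the Detailed API section.

```python
# Defaults taken from the documented plot_boxplots signature.
DEFAULTS = {"anomalies": False, "add_mean_line": False, "ref_number": 0}


def resolve_plot_options(entry, defaults=DEFAULTS):
    """Return (variables, options), with entry values overriding the defaults."""
    options = dict(defaults)  # copy so the shared defaults are never modified
    options.update({key: value for key, value in entry.items() if key != "vars"})
    return entry["vars"], options


# Entry as it would look after loading the YAML block into Python objects.
entry = {"vars": ["-tnlwrf", "tnswrf"], "add_mean_line": True, "anomalies": True, "ref_number": 0}
variables, options = resolve_plot_options(entry)
print(variables, options)
```

Options that an entry omits keep their default value, so a short entry with only vars and anomalies would still plot without mean lines.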

Output

The diagnostic produces a single plot:

  • A boxplot showing the distribution of the field means for each variable across the specified models and reference datasets. If reference datasets are provided and the anomalies option is set to True, the boxplot will show anomalies with respect to the mean of the selected reference dataset. With the add_mean_line option set to True, dashed lines indicating the absolute mean values will be added to the boxplots. Plots are saved in both PDF and PNG format.
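
The anomaly option described above can be illustrated with a short sketch: the mean of the selected reference series is subtracted from each value of the series being plotted. The function compute_anomalies is hypothetical and works on plain lists. The diagnostic operates on its own datasets and may differ in detail, for example in how the mean is taken per variable.

```python
from statistics import fmean


def compute_anomalies(series, references, ref_number=0):
    """Subtract the mean of references[ref_number] from every value in series."""
    ref_mean = fmean(references[ref_number])
    return [value - ref_mean for value in series]


# Made-up field means in W m-2, for illustration only.
model = [239.0, 243.0]
references = [[240.0, 242.0], [238.0, 240.0]]
print(compute_anomalies(model, references, ref_number=0))
```

Changing ref_number selects a different entry of the reference list, which shifts all anomalies by the difference between the two reference means.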

Observations

This diagnostic can be applied to different variables and datasets, although it is currently used primarily for radiation analyses.

The default reference datasets are:

  • ERA5 reanalysis for atmospheric variables

  • CERES EBAF for radiation variables at top of atmosphere

Details are available on the CERES website.

Custom reference datasets can be configured in the configuration file.

Example plots

All plots can be reproduced using the notebooks in the notebooks directory on LUMI HPC.

../_images/radiation_boxplot.png

Box plot showing the globally averaged incoming and outgoing TOA radiation of IFS-NEMO historical-1990 with respect to ERA5 and CERES climatologies.

../_images/radiation_boxplot_anomalies.png

Box plot showing the anomalies of the globally averaged incoming and outgoing TOA radiation of IFS-NEMO historical-1990 with respect to the ERA5 climatology. The dashed lines indicate the absolute mean values.

Available demo notebooks

Notebooks are stored in notebooks/diagnostics/boxplots.

Authors and contributors

This diagnostic is maintained by Silvia Caprioli (@silviacaprioli, silvia.caprioli@polito.it). Contributions are welcome — please open an issue or a pull request. For questions or suggestions, contact the AQUA team or the maintainers.

Detailed API

This section provides a detailed reference for the Application Programming Interface (API) of the Boxplots diagnostic, produced from the diagnostic function docstrings.

class aqua.diagnostics.boxplots.Boxplots(catalog: str = None, model: str = None, exp: str = None, source: str = None, var: str | list[str] = None, startdate: str = None, enddate: str = None, regrid: str = None, diagnostic: str = 'boxplots', save_netcdf: bool = False, outputdir: str = './', loglevel: str = 'WARNING')

Bases: Diagnostic

Class for computing and plotting boxplots of field means from climate model datasets. This class retrieves data from specified datasets, computes field means for given variables, and optionally saves the results to NetCDF files.

Parameters:
  • catalog (str) – Catalog name.

  • model (str) – Model name.

  • exp (str) – Experiment name.

  • source (str) – Data source.

  • var (str or list of str, optional) – Variable(s) to retrieve. Defaults to None.

  • startdate (str, optional) – Start date for data retrieval. Defaults to None.

  • enddate (str, optional) – End date for data retrieval. Defaults to None.

  • regrid (str) – Target grid for regridding. If None, no regridding.

  • diagnostic (str) – Name of the diagnostic.

  • save_netcdf (bool, optional) – Whether to save results as NetCDF files. Defaults to False.

  • outputdir (str, optional) – Directory to save output files. Defaults to './'.

  • loglevel (str, optional) – Logging level. Defaults to 'WARNING'.

Initializer inherited from the Diagnostic base class. Diagnostic is a general purpose class that the diagnostic classes use to retrieve data from a single model and to save the data to a NetCDF file. It is not a working diagnostic class by itself.

Parameters:
  • model (str) – The model to be used.

  • exp (str) – The experiment to be used.

  • source (str) – The source to be used.

  • catalog (str) – The catalog to be used. If None, the catalog will be determined by the Reader.

  • regrid (str | None) – The target grid to be used for regridding. If None, no regridding will be done.

  • startdate (str | None) – The start date of the plot/analysis period. If None, all available data will be used.

  • enddate (str | None) – The end date of the plot/analysis period. If None, all available data will be used.

  • std_startdate (str | None) – The start date of the standard deviation period. If None, no std period is tracked at the Diagnostic level.

  • std_enddate (str | None) – The end date of the standard deviation period. If None, no std period is tracked at the Diagnostic level.

  • loglevel (str) – The log level to be used. Default is ‘WARNING’.

MINIMUM_MONTHS_REQUIRED = 2

run(var: str = None, save_netcdf: bool = False, units: str = None, reader_kwargs: dict = {}) → None

Retrieve and preprocess dataset, selecting pressure level and/or converting units if needed.

Parameters:
  • var (str or list of str, optional) – list of variables to retrieve. If None, uses self.var.

  • save_netcdf (bool, optional) – If True, saves output fldmeans as netcdf file. Defaults to False.

  • units (str or list of str, optional) – Target units (e.g., ‘mm/day’).

  • reader_kwargs (dict, optional) – Additional keyword arguments for the Reader.

Raises:
  • NoDataError – If variable not found in dataset.

  • KeyError – If the variable is missing from the data.

class aqua.diagnostics.boxplots.PlotBoxplots(diagnostic='boxplots', save_format=['png', 'pdf', 'svg'], dpi=300, outputdir='./', loglevel='WARNING')

Bases: object

Initialize the PlotBoxplots class.

Parameters:
  • diagnostic (str) – Name of the diagnostic.

  • save_format (str, list) – Format(s) to save the figure in (e.g. ‘png’, ‘pdf’, ‘svg’).

  • dpi (int) – Resolution of saved figures.

  • outputdir (str) – Output directory for saved plots.

  • loglevel (str) – Logging level.

plot_boxplots(data, data_ref=None, var=None, anomalies=False, add_mean_line=False, ref_number=0, title=None)

Plot boxplots for specified variables in the dataset.

Parameters:
  • data (xarray.Dataset or list of xarray.Dataset) – Input dataset(s) containing the fldmeans of the variables to plot.

  • data_ref (xarray.Dataset or list of xarray.Dataset, optional) – Reference dataset(s) for comparison.

  • var (str or list of str) – Variable name(s) to plot. If None, uses all variables in the dataset.

  • anomalies (bool) – Whether to plot anomalies instead of absolute values.

  • add_mean_line (bool) – Whether to add dashed lines for means.

  • ref_number (int) – Position of reference dataset in data_ref list to use when plotting anomalies.

  • title (str, optional) – Title for the plot. If None, a default title will be generated.