State of the art diagnostics

AQUA provides a collection of built-in diagnostics to analyse climate model outputs. The state-of-the-art family groups the diagnostics that can be used for simulation monitoring and that take low-resolution data as input (1 degree in both latitude and longitude, monthly frequency). Most of these diagnostics can be compared with observations to produce evaluation metrics; they aim to assess the model against observational datasets and, in selected cases, pre-existing climate simulations.

List of diagnostics

The list includes diagnostics whose goal is to monitor and diagnose possible model drifts, imbalances and biases.

Currently implemented diagnostics are:

Diagnostics configuration files

Each diagnostic has a corresponding YAML configuration file that specifies the options and parameters for the diagnostic. These configuration files are located in the config/diagnostics/<diagnostic-name> directory of the AQUA package and copied to the AQUA_CONFIG folder during installation (by default $HOME/.aqua/).

Each diagnostic has its own configuration file, with a block devoted to the settings of that diagnostic. The general settings shared by all diagnostics follow the common structure described here. Refer to the individual diagnostic documentation for the specific settings, and see Configuration file for an example of a diagnostic-specific block.

  • datasets: a list of models to analyse (each defined by the catalog, model, exp and source arguments). If the diagnostic can handle multiple datasets, all the models in the list are processed; otherwise only the first one is used. For simplicity, the default in the repository should refer to only one model.

datasets:
  - catalog: climatedt-phase1
    model: IFS-NEMO
    exp: historical-1990
    source: lra-r100-monthly
    regrid: null
    reader_kwargs: null # it can be a dictionary with reader kwargs

  - catalog: climatedt-phase1
    model: ICON
    exp: historical-1990
    source: lra-r100-monthly
    regrid: null
    reader_kwargs: null # it can be a dictionary with reader kwargs
  • references: a list of reference datasets to use for the analysis. Some diagnostics may not work with multiple references; when that is the case, it should be stated in the documentation and in the configuration file.

references:
  - catalog: obs
    model: ERA5
    exp: era5
    source: monthly
    regrid: null
    reader_kwargs: null # it can be a dictionary with reader kwargs
  • output: a block describing the details of the output. It contains:

    • outputdir: the output directory for the plots.

    • rebuild: a boolean that enables the rebuilding of the plots.

    • save_format: a list (or a single string) selecting the image formats in which the plots are saved. Default is SAVE_FORMAT (['png', 'pdf', 'svg']).

    • dpi: the resolution of the plots.

    • create_catalog_entry: a boolean that enables the creation of a catalog entry.

output:
  outputdir: "/path/to/output"
  rebuild: true
  save_format: ['png', 'svg'] # default is SAVE_FORMAT (['png', 'pdf', 'svg'])
  dpi: 300
  create_catalog_entry: true

Note

Not all diagnostics support the create_catalog_entry keyword yet.
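The two rules stated above (only the first dataset is used when a diagnostic cannot handle several, and save_format may be a string or a list) can be illustrated with a minimal sketch. It works on a plain dictionary mirroring the YAML examples after parsing. The helper names select_datasets and normalise_save_format and the multi_dataset flag are hypothetical and are not part of the AQUA API; the default formats are taken from the comment in the example output block.

```python
from typing import Any

# Assumed default, copied from the comment in the example output block.
DEFAULT_SAVE_FORMAT = ["png", "pdf", "svg"]


def select_datasets(config: dict[str, Any], multi_dataset: bool) -> list[dict[str, Any]]:
    """Return the datasets a diagnostic would process (hypothetical helper)."""
    datasets = config.get("datasets") or []
    if not datasets:
        raise ValueError("the configuration must list at least one dataset")
    # Diagnostics that cannot handle several datasets keep only the first one.
    return list(datasets) if multi_dataset else [datasets[0]]


def normalise_save_format(output: dict[str, Any]) -> list[str]:
    """Accept a single string or a list and always return a list."""
    value = output.get("save_format")
    if value is None:
        return list(DEFAULT_SAVE_FORMAT)
    if isinstance(value, str):
        return [value]
    return list(value)


# Dictionary equivalent of the YAML examples, as a YAML parser would return it.
config = {
    "datasets": [
        {"catalog": "climatedt-phase1", "model": "IFS-NEMO",
         "exp": "historical-1990", "source": "lra-r100-monthly"},
        {"catalog": "climatedt-phase1", "model": "ICON",
         "exp": "historical-1990", "source": "lra-r100-monthly"},
    ],
    "output": {"outputdir": "/path/to/output", "save_format": "png", "dpi": 300},
}

single = select_datasets(config, multi_dataset=False)
formats = normalise_save_format(config["output"])
```

With this input, single holds only the IFS-NEMO entry and formats is a one-element list, which is how a single-dataset diagnostic would be expected to read the block.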

Diagnostics CLI arguments

The following command line arguments are available for all the diagnostics:

  • --config, -c: Path to the configuration file.

  • --nworkers, -n: Number of workers to use for parallel processing.

  • --cluster: Cluster to use for parallel processing. By default a local cluster is used.

  • --loglevel, -l: Logging level. Default is WARNING.

  • --catalog: Catalog to use for the analysis. It can be defined in the config file.

  • --model: Model to analyse. It can be defined in the config file.

  • --exp: Experiment to analyse. It can be defined in the config file.

  • --source: Source to analyse. It can be defined in the config file.

  • --outputdir: Output directory for the plots.

If a diagnostic has extra arguments, these will be described in the individual diagnostic documentation.
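As an illustration of the arguments listed above, the following sketch rebuilds them with the standard argparse module. It is not the parser shipped with AQUA: build_parser and apply_overrides are hypothetical names, and the rule that a value given on the command line replaces the one in the configuration file is an assumption made for this example.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser exposing the common arguments (illustrative only)."""
    parser = argparse.ArgumentParser(description="Hypothetical diagnostic CLI")
    parser.add_argument("--config", "-c", type=str,
                        help="Path to the configuration file")
    parser.add_argument("--nworkers", "-n", type=int,
                        help="Number of workers for parallel processing")
    parser.add_argument("--cluster", type=str,
                        help="Cluster to use; a local cluster is used if omitted")
    parser.add_argument("--loglevel", "-l", type=str, default="WARNING",
                        help="Logging level")
    for name in ("catalog", "model", "exp", "source"):
        parser.add_argument(f"--{name}", type=str,
                            help=f"{name} to analyse; may also be set in the config file")
    parser.add_argument("--outputdir", type=str,
                        help="Output directory for the plots")
    return parser


def apply_overrides(dataset: dict, args: argparse.Namespace) -> dict:
    """Return a copy of dataset where CLI values, when given, win (assumed precedence)."""
    merged = dict(dataset)
    for key in ("catalog", "model", "exp", "source"):
        value = getattr(args, key)
        if value is not None:
            merged[key] = value
    return merged


# Simulate: <diagnostic-cli> -c config.yaml --model ICON -n 4
args = build_parser().parse_args(["-c", "config.yaml", "--model", "ICON", "-n", "4"])
dataset = apply_overrides(
    {"catalog": "climatedt-phase1", "model": "IFS-NEMO",
     "exp": "historical-1990", "source": "lra-r100-monthly"},
    args,
)
```

Here only model is given on the command line, so the other three keys keep the values from the configuration dictionary.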

Running the monitoring diagnostics

Each state-of-the-art diagnostic is implemented as a Python class and can be run independently. All the diagnostics have a command line interface that can be used to run them, and a YAML configuration file is provided to set their options.

In addition to the individual command line interfaces, AQUA provides an entry point that runs all the diagnostics with a single command, using a shared Dask cluster, a shared output directory and parallelization. The entry point is called aqua analysis; the details can be found in AQUA analysis wrapper.

Warning

The analysis should preferably be performed on Low Resolution Archive (LRA) data, that is, data aggregated to a resolution of 1 degree in both latitude and longitude and to a monthly frequency. An option to regrid the data on the fly is available, but memory usage may increase considerably, in which case it may be preferable to run the diagnostics individually.

Minimum Data Requirements

In order to obtain meaningful results, the diagnostics require a minimum amount of data. Here you can find the minimum requirements for each diagnostic.

Diagnostic                 Minimum Data Required
-------------------------  -------------------------------------------------
Global Biases              12 months (1 year)
ecmean                     12 months (1 year)
Timeseries                 2 months
Seasonal Cycles            2 months
Gregory Plot               2 months
Lat-Lon Profiles           12 months (1 year)
Histogram                  12 months (1 year)
Ocean Stratification       12 months (1 year)
Ocean Trends               12 months (1 year)
Ocean Drift (Hovmoller)    2 months
Radiation                  2 months
Seaice                     1 month (12 months when computing seasonal cycle)
Teleconnections            24 months (2 years)

Note

All diagnostics enforce the minimum data requirement at retrieval time via a NotEnoughDataError. If the available data falls below the threshold, the diagnostic will not run and the error will be reported in the log. Some diagnostics (e.g. Seasonal Cycles) may produce less meaningful results at their minimum threshold: the value reflects the technical lower bound, not the recommended input size.

Note

If you are a developer, you can enforce the minimum data requirements by using the months_required argument in the retrieve and _retrieve methods available in the diagnostic core. The conventional way is to define a class-level constant MINIMUM_MONTHS_REQUIRED and pass it to the retrieve call.
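The convention described in the note above can be sketched with a self-contained toy class. Nothing here is imported from AQUA: NotEnoughDataError is a local stand-in for the error named above, ToyTimeseries and check_months are hypothetical, and the real retrieve method in the diagnostic core has a different signature. Only the pattern is meant to carry over: a class-level constant passed as months_required at retrieval time.

```python
class NotEnoughDataError(Exception):
    """Local stand-in for the error named above; not the AQUA class."""


def check_months(available_months: int, months_required: int) -> None:
    """Raise if fewer months are available than the diagnostic needs."""
    if available_months < months_required:
        raise NotEnoughDataError(
            f"{available_months} month(s) available, "
            f"at least {months_required} required"
        )


class ToyTimeseries:
    """Hypothetical diagnostic following the class-level constant convention."""

    MINIMUM_MONTHS_REQUIRED = 2  # value listed for Timeseries in the table

    def retrieve(self, time_axis: list[str]) -> list[str]:
        # The constant is passed as months_required when data are retrieved.
        check_months(len(time_axis), months_required=self.MINIMUM_MONTHS_REQUIRED)
        return time_axis


diag = ToyTimeseries()
ok = diag.retrieve(["1990-01", "1990-02"])  # meets the threshold

try:
    diag.retrieve(["1990-01"])  # one month only, below the threshold
    message = ""
except NotEnoughDataError as err:
    message = str(err)
```

Keeping the threshold as a class attribute lets a subclass change the requirement without touching the retrieval logic.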