State of the art diagnostics
AQUA provides a collection of built-in diagnostics to analyze climate model outputs. The family of diagnostics named state-of-the-art lists diagnostics which can be used for the simulation monitoring and make use of low resolution data as input (1 degree in both latitude and longitude and monthly frequency). Most of these diagnostics can be compared with observations to produce metrics of evaluation and aim at providing an assessment of the model against observational datasets and, in some selected occasions, pre-existing climate simulations.
List of diagnostics
This list includes such diagnostics whose goal is to monitor and diagnose possible model drifts, imbalances and biases.
Currently implemented diagnostics are:
Diagnostics configuration files
Each diagnostic has a corresponding YAML configuration file that specifies the options and parameters for the diagnostic.
These configuration files are located in the config/diagnostics/<diagnostic-name> directory of the AQUA package and copied
to the AQUA_CONFIG folder during installation (by default $HOME/.aqua/).
Each diagnostic has its own configuration file, with a block devoted to the individual diagnostic settings. However, general settings common to all the diagnostics have a common structure here described. Please refer to the individual diagnostic documentation for the specific settings. See Configuration file for an example of diagnostic specific block.
datasets: a list of models to analyse (defined by the catalog, model, exp, source arguments). If the diagnostic can handle multiple datasets, all the models in the list will be processed, otherwise only the first one will be used. For simplicity the default in the repository should refer to only one model.
datasets:
- catalog: climatedt-phase1
model: IFS-NEMO
exp: historical-1990
source: lra-r100-monthly
regrid: null
reader_kwargs: null # it can be a dictionary with reader kwargs
- catalog: climatedt-phase1
model: ICON
exp: historical-1990
source: lra-r100-monthly
regrid: null
reader_kwargs: null # it can be a dictionary with reader kwargs
references: a list of reference datasets to use for the analysis. Some diagnostics may not work with multiple references, it is better to specify it in the documentation and in the configuration file.
references:
- catalog: obs
model: ERA5
exp: era5
source: monthly
regrid: null
reader_kwargs: null # it can be a dictionary with reader kwargs
output: a block describing the details of the output. It contains:outputdir: the output directory for the plots.rebuild: a boolean that enables the rebuilding of the plots.save_format: a list (or single string) that selects the image formats to save plots. Default is SAVE_FORMAT.dpi: the resolution of the plots.create_catalog_entry: a boolean that enables the creation of a catalog entry.
output:
outputdir: "/path/to/output"
rebuild: true
save_format: ['png', 'svg'] # default is SAVE_FORMAT (['png', 'pdf', 'svg'])
dpi: 300
create_catalog_entry: true
Note
Not all the diagnostics support yet the create_catalog_entry keyword.
Diagnostics CLI arguments
The following command line arguments are available for all the diagnostics:
--config,-c: Path to the configuration file.--nworkers,-n: Number of workers to use for parallel processing.--cluster: Cluster to use for parallel processing. By default a local cluster is used.--loglevel,-l: Logging level. Default isWARNING.--catalog: Catalog to use for the analysis. It can be defined in the config file.--model: Model to analyse. It can be defined in the config file.--exp: Experiment to analyse. It can be defined in the config file.--source: Source to analyse. It can be defined in the config file.--outputdir: Output directory for the plots.
If a diagnostic has extra arguments, these will be described in the individual diagnostic documentation.
Running the monitoring diagnostics
Each state-of-the-art diagnostic is implemented as a Python class and can be run independently. All the diagnostic have a command line interface that can be used to run them. A YAML configuration file is provided to set the options for the diagnostics.
Together with the individual diagnostics command line interfaces, AQUA provides an entry point to run all the diagnostics in a single command, with a shared Dask cluster, shared output directory and with parallelization. The entry point is called aqua analysis and all the details can be found in AQUA analysis wrapper.
Warning
The analysis has to be performed preferrably on Low Resolution Archive (LRA) data, meaning that data should be aggregated to a resolution of 1 degree in both latitude and longitude and to a monthly frequency. It is available the option to regrid the data on the fly, but the memory usage may be highly increased and it may be preferrable to run the diagnostics individually.
Minimum Data Requirements
In order to obtain meaningful results, the diagnostics require a minimum amount of data. Here you can find the minimum requirements for each diagnostic.
Diagnostic |
Minimum Data Required |
|---|---|
Global Biases |
12 months (1 year) |
ecmean |
12 months (1 year) |
Timeseries |
2 months |
Seasonal Cycles |
2 months |
Gregory Plot |
2 months |
Lat-Lon Profiles |
12 months (1 year) |
Histogram |
12 months (1 year) |
Ocean Stratification |
12 months (1 year) |
Ocean Trends |
12 months (1 year) |
Ocean Drift (Hovmoller) |
2 months |
Radiation |
2 months |
Seaice |
1 month (12 months when computing seasonal cycle) |
Teleconnections |
24 months (2 years) |
Note
All diagnostics enforce the minimum data requirement at retrieval time via a NotEnoughDataError.
If the available data falls below the threshold, the diagnostic will not run and the error will be
reported in the log. Some diagnostics (e.g. Seasonal Cycles) may produce less meaningful results
at their minimum threshold — the value reflects the technical lower bound, not the recommended input size.
Note
If you are a developer you can enforce the minimum data requirements by using the months_required argument in the retrieve and _retrieve methods
available in the diagnostic core. The conventional way is to define a class-level constant MINIMUM_MONTHS_REQUIRED
and pass it to the retrieve call.