Histogram Diagnostic
Description
The Histogram diagnostic is a set of tools for computing and visualizing histograms or probability density functions (PDFs) of climate variables. It supports comparative analysis between a target dataset (typically a climate model) and a reference dataset, commonly an observational or reanalysis product such as ERA5.
Histogram provides tools to plot:
Raw histograms (counts per bin)
Normalized PDFs (probability density functions)
Multi-model comparisons with reference data overlay
Classes
There is one class for the analysis and one for the plotting:
Histogram: retrieves the data and computes histograms or PDFs over specified regions. It handles latitudinal weighting, bin configuration, and regional selection. Results are saved as class attributes and as NetCDF files.
PlotHistogram: provides methods for plotting histograms and PDFs. It generates plots with optional logarithmic scales, smoothing, and customizable axis limits.
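The latitudinal weighting mentioned above can be illustrated with a minimal NumPy sketch. It assumes cos(latitude) weights on a regular latitude-longitude grid; the helper name weighted_histogram and its arguments are hypothetical and are not taken from the Histogram class itself.

```python
import numpy as np

def weighted_histogram(values, lats_deg, bins=100, value_range=None, density=True):
    """Histogram of a (lat, lon) field using cos(latitude) weights (illustrative helper)."""
    values = np.asarray(values, dtype=float)
    # One weight per latitude row, broadcast across all longitudes.
    weights = np.cos(np.deg2rad(np.asarray(lats_deg, dtype=float)))
    weights_2d = np.broadcast_to(weights[:, np.newaxis], values.shape)
    # Drop missing values before binning.
    valid = np.isfinite(values)
    return np.histogram(values[valid], bins=bins, range=value_range,
                        weights=weights_2d[valid], density=density)

lats = np.array([-60.0, 0.0, 60.0])
field = np.array([[1.0, 1.0],   # row at 60S
                  [5.0, 5.0],   # row at the equator
                  [9.0, 9.0]])  # row at 60N
counts, edges = weighted_histogram(field, lats, bins=3,
                                   value_range=(0.0, 12.0), density=False)
```

Each point at 60 degrees contributes about half as much as an equatorial point, which reflects the smaller area of high-latitude cells on a regular grid.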
Note
The diagnostic computes histograms over the entire temporal period specified (no seasonal decomposition).
File structure
The diagnostic is located in the aqua/diagnostics/histogram/ directory, which contains both the source code and the command line interface (CLI) script.
A template configuration file is available at aqua/diagnostics/templates/diagnostics/config-histogram.yaml
Region definitions are available in aqua/diagnostics/config/tools/histogram/definitions/regions.yaml
Notebooks are available in the notebooks/diagnostics/histogram/ directory and contain examples of how to use the diagnostic.
Input variables and datasets
The diagnostic works with climate variables on regular latitude-longitude grids. Variables typically used with this diagnostic include:
2t (2 metre temperature)
tprate (total precipitation rate)
sst (sea surface temperature)
It also supports derived variables using EvaluateFormula syntax (e.g., 2t - 273.15 for temperature in °C).
Basic usage
The basic usage of this diagnostic is explained with a working example in the notebook. The basic structure of the analysis is the following:
from aqua.diagnostics import Histogram, PlotHistogram

# Histogram object for the model data
hist_dataset = Histogram(
    catalog='climatedt-phase1',
    model='ICON',
    exp='historical-1990',
    source='lra-r100-monthly',
    startdate='1990-01-01',
    enddate='1999-12-31',
    bins=100,
    weighted=True,
    loglevel='INFO'
)

# Histogram object for the reference data (ERA5)
hist_obs = Histogram(
    catalog='obs',
    model='ERA5',
    exp='era5',
    source='monthly',
    startdate='1990-01-01',
    enddate='1999-12-31',
    bins=100,
    weighted=True,
    loglevel='INFO'
)

# Retrieve the data and compute the PDFs (density=True)
hist_dataset.run(var='tprate', units='mm/day', density=True)
hist_obs.run(var='tprate', units='mm/day', density=True)

# Plot the model PDF against the reference
plot = PlotHistogram(
    data=[hist_dataset.histogram_data],
    ref_data=hist_obs.histogram_data,
    loglevel='INFO'
)
plot.run(ylogscale=True, xlogscale=False, smooth=False)
Note
Start/end dates and reference dataset can be customized. If not specified otherwise, plots will be saved in PNG and PDF format in the current working directory.
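To make the effect of density=True concrete, the following sketch compares raw counts with a normalized PDF using plain NumPy on synthetic data. It illustrates the normalization only and makes no claim about how the Histogram class computes it internally; the gamma-distributed sample is an arbitrary stand-in for a precipitation-like variable.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
# Synthetic, positively skewed sample (illustrative only).
sample = rng.gamma(shape=2.0, scale=3.0, size=10_000)

# Raw histogram: number of samples falling in each bin.
counts, edges = np.histogram(sample, bins=100, density=False)
# PDF: counts divided by (total number of samples * bin width).
pdf, _ = np.histogram(sample, bins=100, density=True)

widths = np.diff(edges)
total_count = int(counts.sum())
integral = float(np.sum(pdf * widths))
```

The counts sum to the sample size, while the PDF integrates to 1 over the binned range. This makes PDFs comparable between datasets with different numbers of grid points or time steps.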
CLI usage
The diagnostic can be run from the command line interface (CLI) with the following commands:
cd $AQUA/aqua/diagnostics/histogram
python cli_histogram.py --config <path_to_config_file>
Additionally, the CLI can be run with the following optional arguments:
--config, -c: Path to the configuration file.
--nworkers, -n: Number of workers to use for parallel processing.
--cluster: Cluster to use for parallel processing. By default a local cluster is used.
--loglevel, -l: Logging level. Default is WARNING.
--catalog: Catalog to use for the analysis. Can be defined in the config file.
--model: Model to analyse. Can be defined in the config file.
--exp: Experiment to analyse. Can be defined in the config file.
--source: Source to analyse. Can be defined in the config file.
--outputdir: Output directory for the plots.
--startdate: Start date for the analysis.
--enddate: End date for the analysis.
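For orientation, the arguments listed above could be declared with argparse roughly as follows. This is an illustrative sketch, not the parser used in cli_histogram.py: the build_parser function, the help strings, and the argument types are assumptions. Only the flag names and the WARNING default come from the list above.

```python
import argparse

def build_parser():
    """Build a parser mirroring the documented flags (hypothetical helper)."""
    parser = argparse.ArgumentParser(description="Histogram diagnostic CLI (sketch)")
    parser.add_argument("--config", "-c", type=str, help="Path to the configuration file")
    parser.add_argument("--nworkers", "-n", type=int, help="Number of parallel workers")
    parser.add_argument("--cluster", type=str, help="Cluster for parallel processing")
    parser.add_argument("--loglevel", "-l", type=str, default="WARNING", help="Logging level")
    # Remaining options are plain strings that default to None when not given.
    for name in ("catalog", "model", "exp", "source", "outputdir", "startdate", "enddate"):
        parser.add_argument(f"--{name}", type=str, default=None)
    return parser

args = build_parser().parse_args(
    ["-c", "config-histogram.yaml", "-n", "4", "--model", "ICON"]
)
```

Options that are not passed stay at None, so a caller can tell an explicit command-line value apart from one that should be read from the configuration file.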
Configuration file structure
The configuration file is a YAML file that contains the details on the dataset to analyse or use as reference, the output directory and the diagnostic settings. Most of the settings are common to all the diagnostics (see Diagnostics configuration files). Here we describe only the specific settings for the histogram diagnostic.
histogram: a block (nested in the diagnostics block) containing options for the Histogram diagnostic. Variable-specific parameters override the defaults.
run: enable/disable the diagnostic.
diagnostic_name: name of the diagnostic. histogram by default.
variables: list of variables to analyse with their regions.
formulae: list of formulae to compute new variables from existing ones.
bins: number of bins for histogram computation.
range: range for histogram bins as [min, max], or null for auto.
weighted: use latitudinal weights to account for grid cell area.
density: if true, compute probability density function (PDF) instead of counts.
box_brd: apply box boundaries for region selection.
xlogscale/ylogscale: use logarithmic scale for x/y axes in plots.
smooth: apply smoothing to histogram.
smooth_window: window size for smoothing.
histogram:
  run: true
  diagnostic_name: 'histogram'
  bins: 100
  range: null
  weighted: true
  density: true
  box_brd: true
  xlogscale: false
  ylogscale: true
  smooth: false
  smooth_window: 5
  variables:
    - name: '2t'
      regions: [null, 'tropics']
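The rule that variable-specific parameters override the block defaults can be sketched with plain dictionaries. The variable entry below, with its own bins and range keys, is hypothetical, as are the resolve_settings helper and the reading of a null region as "no regional selection". The CLI may resolve settings differently.

```python
# Block-level defaults, using keys documented for the histogram block.
defaults = {
    "bins": 100,
    "range": None,
    "weighted": True,
    "density": True,
    "box_brd": True,
}

# Hypothetical variable entry carrying its own bins and range.
variable = {
    "name": "tprate",
    "regions": [None, "tropics"],
    "bins": 50,
    "range": [0.0, 50.0],
}

def resolve_settings(defaults, variable):
    """Return the defaults updated with any matching keys set on the variable."""
    settings = dict(defaults)  # copy, so the shared defaults stay untouched
    settings.update({k: v for k, v in variable.items() if k in defaults})
    return settings

settings = resolve_settings(defaults, variable)
# One job per (variable, region) pair; None stands for the null region.
jobs = [(variable["name"], region) for region in variable.get("regions", [None])]
```

Copying the defaults before updating keeps one variable's overrides from leaking into the next variable in the list.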
Output
The diagnostic produces the following outputs:
Histogram/PDF line plots
Multi-model comparisons with reference data
Optional smoothing and custom axis limits
Plots are saved in both PDF and PNG format. Data outputs are saved as NetCDF files.
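The smoothing algorithm behind the smooth and smooth_window options is not specified here. One plausible reading is a centred moving average over smooth_window bins, sketched below with NumPy. The moving_average function and its edge padding are assumptions made for illustration, not the PlotHistogram implementation.

```python
import numpy as np

def moving_average(values, window=5):
    """Centred moving average that preserves the input length (illustrative helper)."""
    values = np.asarray(values, dtype=float)
    if window < 2:
        return values.copy()
    kernel = np.ones(window) / window
    # Repeat the edge values so the ends are not pulled towards zero.
    pad_left = window // 2
    pad_right = window - 1 - pad_left
    padded = np.pad(values, (pad_left, pad_right), mode="edge")
    return np.convolve(padded, kernel, mode="valid")

spike = np.array([0.0, 0.0, 0.0, 0.0, 5.0, 0.0, 0.0, 0.0, 0.0])
smoothed = moving_average(spike, window=5)
```

A larger window suppresses more bin-to-bin noise but also flattens narrow peaks and the tails of the distribution, which matters when the y-axis is logarithmic.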
Observations
The default reference dataset is ERA5 reanalysis, provided by ECMWF.
Other common reference datasets include MSWEP (Multi-Source Weighted-Ensemble Precipitation) and BERKELEY-EARTH (Berkeley Earth Surface Temperature).
Custom reference datasets can be configured in the configuration file.
Available demo notebooks
Notebooks are stored in notebooks/diagnostics/histogram:
Detailed API
This section provides a detailed reference for the Application Programming Interface (API) of the histogram diagnostic, generated from the function docstrings.
- class aqua.diagnostics.histogram.Histogram(model: str, exp: str, source: str, catalog: str = None, regrid: str = None, startdate: str = None, enddate: str = None, region: str = None, lon_limits: list = None, lat_limits: list = None, regions_file_path: str = None, bins: int = 100, range: tuple = None, weighted: bool = True, diagnostic_name: str = 'histogram', loglevel: str = 'WARNING')
Bases: Diagnostic
Class to compute histograms and probability density functions (PDFs) of a variable over a specified region. Retrieves data from catalog, computes histograms/PDFs for the entire period, and saves results to NetCDF files.
Initialize the Histogram diagnostic class.
- Parameters:
model (str) – Model to be used for data retrieval.
exp (str) – Experiment to be used for data retrieval.
source (str) – Source to be used for data retrieval.
catalog (str, optional) – Catalog for data retrieval.
regrid (str, optional) – Regridding method.
startdate (str, optional) – Start date of data to retrieve.
enddate (str, optional) – End date of data to retrieve.
region (str, optional) – Region for data retrieval.
lon_limits (list, optional) – Longitude limits of region.
lat_limits (list, optional) – Latitude limits of region.
regions_file_path (str, optional) – Path to regions file.
bins (int, optional) – Number of bins for histogram. Default 100.
range (tuple, optional) – Range for histogram bins (min, max).
weighted (bool, optional) – Use latitudinal weights. Default True.
diagnostic_name (str, optional) – Name of diagnostic. Default ‘histogram’.
loglevel (str, optional) – Log level.
- MINIMUM_MONTHS_REQUIRED = 12
- compute_histogram(box_brd: bool = True, density: bool = True)
Compute histogram of the data for the entire period.
- Parameters:
box_brd (bool) – Include box boundaries in area selection.
density (bool) – If True, returns PDF normalized to integrate to 1.
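One way to read the box_brd option is as a switch between inclusive and strict comparisons when selecting grid points inside a latitude-longitude box. The sketch below illustrates that interpretation only. The select_box helper and the box limits are hypothetical, longitude wrap-around is not handled, and the actual area selection in AQUA may differ.

```python
import numpy as np

def select_box(lats, lons, lat_limits, lon_limits, box_brd=True):
    """Boolean masks for points inside a box, with or without its boundaries."""
    lats = np.asarray(lats, dtype=float)
    lons = np.asarray(lons, dtype=float)
    if box_brd:
        # Points lying exactly on the box edges are kept.
        lat_mask = (lats >= lat_limits[0]) & (lats <= lat_limits[1])
        lon_mask = (lons >= lon_limits[0]) & (lons <= lon_limits[1])
    else:
        # Points lying exactly on the box edges are dropped.
        lat_mask = (lats > lat_limits[0]) & (lats < lat_limits[1])
        lon_mask = (lons > lon_limits[0]) & (lons < lon_limits[1])
    return lat_mask, lon_mask

lats = [-30.0, -15.0, 0.0, 15.0, 30.0]
lons = [0.0, 90.0, 180.0, 270.0]
with_brd, _ = select_box(lats, lons, (-15.0, 15.0), (0.0, 360.0), box_brd=True)
without_brd, _ = select_box(lats, lons, (-15.0, 15.0), (0.0, 360.0), box_brd=False)
```

The choice matters mainly on coarse grids, where grid points can fall exactly on the region limits.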
- retrieve(var: str, formula: bool = False, long_name: str = None, units: str = None, standard_name: str = None, reader_kwargs: dict = {})
Retrieve data for the specified variable using the parent Diagnostic class.
- Parameters:
var (str) – Variable to retrieve.
formula (bool) – Whether to use formula for variable.
long_name (str) – Long name of variable.
units (str) – Units of variable.
standard_name (str) – Standard name of variable.
reader_kwargs (dict) – Additional Reader kwargs.
- run(var: str, formula: bool = False, long_name: str = None, units: str = None, standard_name: str = None, box_brd: bool = True, density: bool = True, outputdir: str = './', rebuild: bool = True, reader_kwargs: dict = {})
Run all steps for histogram computation.
- Parameters:
var (str) – Variable to retrieve and compute.
formula (bool) – Use formula for variable.
long_name (str) – Long name of variable.
units (str) – Units of variable.
standard_name (str) – Standard name of variable.
box_brd (bool) – Include box boundaries.
density (bool) – Return PDF (normalized) instead of counts.
outputdir (str) – Output directory.
rebuild (bool) – Rebuild existing files.
reader_kwargs (dict) – Additional Reader kwargs.
- save_netcdf(outputdir: str = './', rebuild: bool = True)
Save histogram data to netcdf file.
- Parameters:
outputdir (str) – Output directory.
rebuild (bool) – Rebuild if file exists.
- class aqua.diagnostics.histogram.PlotHistogram(data=None, ref_data=None, diagnostic_name='histogram', density=True, loglevel: str = 'WARNING')
Bases: object
Class for plotting Histogram diagnostics. Provides methods to plot histogram/PDF data with customizable labels, titles, and styling options.
Initialize the PlotHistogram class.
- Parameters:
data – List of histogram DataArrays to plot, or single DataArray.
ref_data – Reference histogram DataArray.
diagnostic_name (str) – Name of the diagnostic. Default is ‘histogram’.
density (bool) – Whether data represents PDF (True) or counts (False).
loglevel (str) – Logging level. Default is ‘WARNING’.
- get_data_info()
Extract metadata from data arrays.
- plot(data_labels=None, ref_label=None, title=None, style=None, xlogscale=False, ylogscale=True, xmax=None, xmin=None, ymax=None, ymin=None, smooth=False, smooth_window=5, labelsize=None)
Plot histogram data.
- Parameters:
data_labels (list, optional) – Labels for the data.
ref_label (str, optional) – Label for the reference data.
title (str, optional) – Title for the plot.
style (str, optional) – Plotting style.
xlogscale (bool) – Use log scale for x-axis.
ylogscale (bool) – Use log scale for y-axis.
xmax (float, optional) – Maximum x value.
xmin (float, optional) – Minimum x value.
ymax (float, optional) – Maximum y value.
ymin (float, optional) – Minimum y value.
smooth (bool) – Apply smoothing to data.
smooth_window (int) – Window size for smoothing.
labelsize (int, optional) – Font size for labels.
- Returns:
Matplotlib figure and axes objects.
- Return type:
tuple
- run(outputdir='./', rebuild=True, dpi=300, style=None, format: str | list = ['png', 'pdf', 'svg'], xlogscale=False, ylogscale=True, xmax=None, xmin=None, ymax=None, ymin=None, smooth=False, smooth_window=5, labelsize=None, show=False)
Run the complete plotting workflow.
- Parameters:
outputdir (str) – Output directory to save the plot.
rebuild (bool) – If True, rebuild the plot even if it already exists.
dpi (int) – Dots per inch for the plot.
style (str) – Plotting style.
format (str or list) – Format(s) to save the figure. Default is SAVE_FORMAT.
xlogscale (bool) – Use log scale for x-axis.
ylogscale (bool) – Use log scale for y-axis.
xmax (float, optional) – Maximum x value.
xmin (float, optional) – Minimum x value.
ymax (float, optional) – Maximum y value.
ymin (float, optional) – Minimum y value.
smooth (bool) – Apply smoothing to data.
smooth_window (int) – Window size for smoothing.
show (bool) – If True, display the plot interactively.
- save_plot(fig, description: str = None, rebuild: bool = True, outputdir: str = './', dpi: int = 300, format: str | list = ['png', 'pdf', 'svg'])
Save the plot to a file.
- Parameters:
fig (matplotlib.figure.Figure) – Figure object.
description (str) – Description of the plot.
rebuild (bool) – If True, rebuild the plot even if it already exists.
outputdir (str) – Output directory to save the plot.
dpi (int) – Dots per inch for the plot.
format (str or list) – Format(s) to save the figure. Default is SAVE_FORMAT.
- set_data_labels()
Set the data labels for the plot.
- set_description()
Set the description for the plot.
- set_ref_label()
Set the reference label for the plot.
- set_title()
Set the title for the plot.