Automatic Standardized File Naming
Class Overview
The OutputSaver class is designed to manage output file naming conventions for scientific data.
It supports generating filenames for various file types (e.g., NetCDF, PDF, PNG) with metadata integration to enhance data management and traceability.
The class ensures consistent and descriptive filenames, facilitating better data organization and reproducibility.
It also allows for the generation of a catalog entry, to store the resulting NetCDF files in the AQUA catalog.
Attributes
diagnostic (str): Name of the diagnostic.
catalog (str or list): Catalog name (e.g., lumi-phase2).
model (str or list): Model name (e.g., IFS-NEMO).
exp (str or list): Experiment name (e.g., historical).
realization (str or list): Realization name (default is r1).
catalog_ref (str or list, optional): Reference catalog name.
model_ref (str or list, optional): Reference model name.
exp_ref (str or list, optional): Reference experiment name.
outdir (str, optional): Output directory where files will be saved. Defaults to the current directory.
loglevel (str, optional): Logging level for the class’s logger. Defaults to WARNING.
Note
The OutputSaver class automatically adds the current date and time to the metadata of every file it saves.
This ensures each file has a timestamp indicating when it was generated.
The version of the AQUA package is also included in the metadata for traceability.
Note
When storing NetCDF files, the OutputSaver should be initialized with the information for a single model.
When storing plots, it should be initialized with all the models and references involved.
Example Usage
Initializing the OutputSaver Class
The following example demonstrates how to initialize the OutputSaver class:
from aqua.diagnostics.core import OutputSaver
# Initialize with a main dataset and a reference dataset
outputsaver = OutputSaver(diagnostic='dummy',
catalog='climatedt-phase1', model='IFS-NEMO', exp='historical-1990',
catalog_ref='obs', model_ref='ERA5', exp_ref='era5',
outdir='.', loglevel='DEBUG')
Generating a Filename
This example shows how to generate a filename for the ‘mean’ diagnostic product using the previously initialized OutputSaver instance.
filename = outputsaver.generate_name(diagnostic_product='mean')
# Output: 'dummy.mean.climatedt-phase1.IFS-NEMO.historical-1990.obs.ERA5.era5'
Note
The generated filename includes the diagnostic name, diagnostic product, catalog, model, and experiment.
If the reference dataset is specified in the OutputSaver constructor, it will also be included in the filename.
Alternatively, the catalog-model-experiment triplets for the main and reference datasets
can be specified directly in the generate_name method.
Generating a Filename with Extra Keys
The user can also specify extra parameters that will be added to the filename, such as variable, region, period, pressure level, etc.
Extra keys are not mandatory, but if specified, they will be appended to the filename.
They are entirely flexible and can include any relevant information the user wishes to capture.
extra_keys = {'variable': '2t', 'region': 'global', 'period': '1990-2000'}
filename = outputsaver.generate_name(diagnostic_product='mean',
extra_keys=extra_keys)
# Output: 'dummy.mean.climatedt-phase1.IFS-NEMO.historical-1990.obs.ERA5.era5.2t.global.1990-2000'
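The naming pattern visible in the outputs above can be reproduced in a few lines of plain Python. This is only an illustrative sketch inferred from the example outputs, not AQUA's implementation: the helper name sketch_name is hypothetical, and it assumes that the parts are joined with dots in the order shown and that the values of extra_keys are appended in insertion order.

```python
def sketch_name(diagnostic, diagnostic_product, main, ref=None, extra_keys=None):
    """Join filename parts with dots, mimicking the example outputs above.

    Hypothetical helper for illustration only; it is not part of AQUA.
    """
    # Diagnostic name, product, then the main catalog-model-experiment triplet
    parts = [diagnostic, diagnostic_product, *main]
    # The reference triplet is added only when one is given
    if ref:
        parts.extend(ref)
    # Extra keys contribute their values, in insertion order (assumption)
    if extra_keys:
        parts.extend(str(value) for value in extra_keys.values())
    return '.'.join(parts)


name = sketch_name(
    'dummy', 'mean',
    main=('climatedt-phase1', 'IFS-NEMO', 'historical-1990'),
    ref=('obs', 'ERA5', 'era5'),
    extra_keys={'variable': '2t', 'region': 'global', 'period': '1990-2000'},
)
print(name)
```

Because only the values of extra_keys end up in the name under this assumption, the dictionary keys serve to document what each part means rather than to change the filename.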
Saving a NetCDF File with Metadata
Here is an example of saving a NetCDF file with metadata. The metadata includes the title, author, and description of the file.
import xarray as xr
# Example dataset
dataset = xr.Dataset()
# Define metadata for the NetCDF file
metadata = {
'title': 'Testing the saving of NetCDF files',
'author': 'OutputSaver',
'description': 'Demonstrating netCDF Metadata Addition'
}
outputsaver.save_netcdf(dataset, 'test', extra_keys=extra_keys, metadata=metadata)
Note
If the history metadata field is provided, the OutputSaver class will append
the current message to the existing history.
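The appending behaviour described in this note can be pictured with a small standalone sketch. It is not AQUA's code: the helper append_history is hypothetical, the timestamp format and the newline separator are assumptions, and the earlier history entry is made up for the example.

```python
from datetime import datetime, timezone


def append_history(metadata, message):
    """Return a copy of `metadata` with `message` appended to 'history'.

    Hypothetical helper; the timestamp format and the newline separator
    are assumptions, not AQUA's actual behaviour.
    """
    stamp = datetime.now(timezone.utc).strftime('%Y-%m-%d %H:%M:%S')
    entry = f'{stamp} {message}'
    updated = dict(metadata)  # leave the caller's dictionary untouched
    previous = updated.get('history')
    # Keep any existing history and add the new entry on its own line
    updated['history'] = f'{previous}\n{entry}' if previous else entry
    return updated


# 'regridded to r100' is an invented earlier entry, used only for illustration
meta = {'title': 'Testing the saving of NetCDF files',
        'history': 'regridded to r100'}
meta = append_history(meta, 'Saved by OutputSaver')
print(meta['history'])
```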
Saving a Plot with Metadata
This example demonstrates saving multiple plot formats (e.g. PNG/PDF/SVG) with metadata. The metadata includes the title, author, subject, and keywords of the file.
import matplotlib.pyplot as plt
# Create a sample figure
fig, ax = plt.subplots()
ax.plot([0, 1], [0, 1])
# Define metadata for the PDF file
metadata = {
'/Title': 'Sample PDF',
'/Author': 'OutputSaver',
'/Subject': 'Demonstrating PDF Metadata Addition',
'/Keywords': 'PDF, OutputSaver, Metadata'
}
# Save the PDF and PNG with metadata
outputsaver.save_figure(fig, 'test', extra_keys=extra_keys, metadata=metadata,
extension=['png', 'pdf'], # optional; default is SAVE_FORMAT (['png', 'pdf', 'svg'])
dpi=300)
Note
We suggest using the metadata field /Caption to store the plot description.
This is currently used by the AQUA dashboard to generate plot descriptions.
Opening a PDF File and Displaying Metadata
To open a PDF file and display its metadata:
from aqua.core.util import open_image
open_image("/path/to/my/file/dummy.mean.climatedt-phase1.IFS-NEMO.historical-1990.obs.ERA5.era5.pdf")
Generating a Filename for Multimodel or Multireference Comparisons
In some diagnostics, multimodel or multireference comparisons may be required.
In this case, the user can specify a list of catalog-model-experiment triplets for the main and/or the reference dataset.
To avoid overly long filenames, the keyword multimodel or multiref will be used to indicate that the dataset is a list.
Complete information about the datasets is preserved in the output file’s metadata.
outputsaver = OutputSaver(diagnostic='dummy',
catalog=['climatedt-phase1', 'climatedt-phase1'],
model=['IFS-NEMO', 'ICON'],
exp=['historical-1990', 'historical-1990'],
catalog_ref='obs', model_ref='ERA5', exp_ref='era5',
outdir='.', loglevel='DEBUG')
filename = outputsaver.generate_name(diagnostic_product='test')
# Output: 'dummy.test.multimodel.obs.ERA5.era5'
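The shortening rule can be sketched in plain Python as follows. This is an illustration inferred from the example output, not AQUA's implementation: the helper collapse_triplet is hypothetical, and treating any list-valued entry (even a one-element list) as a multi-dataset case is an assumption.

```python
def collapse_triplet(catalog, model, exp, keyword):
    """Return the filename parts for one dataset.

    If any of the three entries is a list, the whole triplet is replaced
    by `keyword`. Hypothetical helper; it is not part of AQUA.
    """
    if any(isinstance(item, list) for item in (catalog, model, exp)):
        return [keyword]
    return [catalog, model, exp]


# Main dataset given as lists, so it collapses to 'multimodel'
main = collapse_triplet(['climatedt-phase1', 'climatedt-phase1'],
                        ['IFS-NEMO', 'ICON'],
                        ['historical-1990', 'historical-1990'],
                        keyword='multimodel')
# Single reference dataset, so the triplet is kept as is
ref = collapse_triplet('obs', 'ERA5', 'era5', keyword='multiref')

filename = '.'.join(['dummy', 'test', *main, *ref])
print(filename)
```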
Creating a Catalog Entry
The save_netcdf method can also create a catalog entry for the saved NetCDF file.
This entry is created in the same experiment file as the input dataset.
To enable catalog entry creation, pass the following arguments to the method:
create_catalog_entry (bool, optional): Set to True to create a catalog entry for the saved NetCDF file.
dict_catalog_entry (dict, optional): A dictionary containing the catalog entry information. The dictionary can specify a jinjalist (by default ['freq', 'stat', 'region', 'realization']) and a wildcardlist (by default ['var']). For each matching key in extra_keys, the code creates an intake parameter if the key is in the first list and a wildcard if it is in the second. The resulting catalog entry can access the relevant NetCDF files when used with the Reader.
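To make the role of the two lists concrete, the sketch below sorts a set of extra_keys into intake parameters and wildcards using the default lists quoted above. It is a standalone illustration, not AQUA's code: the helper split_extra_keys is hypothetical, the dictionary keys 'jinjalist' and 'wildcardlist' are taken from the description above, and ignoring keys that appear in neither list is an assumption.

```python
DEFAULT_JINJALIST = ['freq', 'stat', 'region', 'realization']
DEFAULT_WILDCARDLIST = ['var']


def split_extra_keys(extra_keys, dict_catalog_entry=None):
    """Sort extra_keys into intake parameters and wildcards.

    Hypothetical helper; keys found in neither list are ignored here,
    which is an assumption about the real behaviour.
    """
    config = dict_catalog_entry or {}
    jinjalist = config.get('jinjalist', DEFAULT_JINJALIST)
    wildcardlist = config.get('wildcardlist', DEFAULT_WILDCARDLIST)
    # Keys in the jinja list become parameters with their current value
    parameters = {k: v for k, v in extra_keys.items() if k in jinjalist}
    # Keys in the wildcard list are matched with a wildcard instead
    wildcards = [k for k in extra_keys if k in wildcardlist]
    return parameters, wildcards


params, wildcards = split_extra_keys(
    {'var': '2t', 'region': 'global', 'period': '1990-2000'}
)
print(params)
print(wildcards)
```

With the defaults, region would become an intake parameter and var a wildcard, while period matches neither list.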