Installation

This section provides a step-by-step guide to installing the Python package aqua-diagnostics. AQUA-diagnostics is developed and tested with Python 3.12 and supports Python 3.9 or later, with the exception of 3.13.
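
Before creating an environment, it can help to check whether a given interpreter falls in the supported range. The following is a minimal sketch: the function name aqua_python_supported is hypothetical and not part of AQUA-diagnostics, and it only encodes the rule stated above (3.9 or later, excluding 3.13).

```shell
# Hypothetical helper: succeeds if a "MAJOR.MINOR" Python version is in the
# range stated above (3.9 or later, excluding 3.13).
aqua_python_supported() {
    major=${1%%.*}
    rest=${1#*.}
    minor=${rest%%.*}
    [ "$major" -eq 3 ] || return 1
    [ "$minor" -ge 9 ] || return 1
    [ "$minor" -ne 13 ] || return 1
    return 0
}

# Example: check the interpreter found on PATH, if any.
if command -v python3 >/dev/null 2>&1; then
    version=$(python3 -c 'import sys; print("%d.%d" % sys.version_info[:2])')
    if aqua_python_supported "$version"; then
        echo "Python $version is in the supported range"
    else
        echo "Python $version is outside the supported range"
    fi
fi
```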

AQUA-diagnostics extends the AQUA-core package (https://github.com/DestinE-Climate-DT/AQUA), which provides the core functionalities required for running diagnostics. When you install AQUA-diagnostics, AQUA-core will be automatically installed as a dependency, giving you access to both packages.

Conda/Mamba installation with pip

Prerequisites

  • Miniforge: Miniforge is a minimal installer that provides the conda and mamba package managers preconfigured for the conda-forge channel, and it is the recommended tool for the installation process.

Installation with Miniforge

AQUA-diagnostics is available on the Python Package Index (PyPI) and can be installed with pip. However, some dependencies are not available on PyPI and must be installed separately. The recommended approach is to install these dependencies with the Mamba/Conda package manager, and then use pip to install AQUA-diagnostics itself. This can be achieved with:

mamba create -n aquarium -c conda-forge python=3.12 cdo eccodes=2.41.0 esmpy
mamba activate aquarium
pip install aqua-diagnostics[core]

The same environment is available in the AQUA-diagnostics GitHub repository in the environment-pypi.yml file.
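
After activating the environment, a quick sanity check is to confirm that the tools installed from conda-forge are on PATH. This is a minimal sketch: missing_tools is a hypothetical helper, and the check assumes that the cdo package provides the cdo executable and that ecCodes provides codes_info.

```shell
# Hypothetical helper: print the names of commands that are not on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# cdo and codes_info come from the cdo and eccodes conda-forge packages.
missing=$(missing_tools cdo codes_info pip)
if [ -n "$missing" ]; then
    echo "Not found on PATH:" $missing
else
    echo "All expected tools are available"
fi
```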

Note

If you need to access data written in a local FDB database (not polytope), you need to install the FDB5 library. The FDB5 library is not available in the conda-forge repository, so you need to install it manually. If you are working on a supported HPC, see the HPC Installation section for more information.

Extra dependencies

Some extra dependencies are defined in the pyproject.toml file in the repository. These are necessary to compile the documentation, to run the tests or to run the notebooks. You can install them with the following commands:

pip install aqua-diagnostics[docs]
pip install aqua-diagnostics[notebooks]
pip install aqua-diagnostics[tests]

Or to install all the extra dependencies:

pip install aqua-diagnostics[all]
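
pip also accepts several extras in a single requirement, separated by commas. In shells such as zsh, the square brackets need to be quoted so that they are not treated as a glob pattern. The sketch below builds such a requirement string; aqua_requirement is a hypothetical helper, and the command is printed rather than executed.

```shell
# Hypothetical helper: build a pip requirement string from a list of extras,
# e.g. "docs tests" -> aqua-diagnostics[docs,tests].
aqua_requirement() {
    extras=""
    for extra in "$@"; do
        extras="${extras:+$extras,}$extra"
    done
    if [ -n "$extras" ]; then
        echo "aqua-diagnostics[$extras]"
    else
        echo "aqua-diagnostics"
    fi
}

# Print the command rather than running it; quoting the requirement keeps
# shells such as zsh from treating the brackets as a glob pattern.
echo "pip install \"$(aqua_requirement docs tests)\""
```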

Conda/Mamba installation with environment file

It is also possible to use the Mamba/Conda package manager for the whole installation process. AQUA-diagnostics is not yet available on the conda-forge repository, so the installation requires an environment file that contains all the required dependencies.

Installation with Miniforge

First, clone the AQUA-diagnostics repository from GitHub:

git clone git@github.com:DestinE-Climate-DT/AQUA-diagnostics.git

Then, navigate to the AQUA-diagnostics directory:

cd AQUA-diagnostics

Create a new environment with Mamba. An environment file is provided in the repository, so you can create the environment with the following command:

mamba env create -f environment.yml

This will create a new environment called aqua-diagnostics with all the required dependencies.

Finally, activate the environment:

mamba activate aqua-diagnostics

At this point, you should have successfully installed the AQUA-diagnostics package and its dependencies in the newly created aqua-diagnostics environment.
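
To confirm that the expected environment is active in a script or a new shell, one option is to inspect CONDA_DEFAULT_ENV, which conda-style activation sets to the name of the active environment. The helper name env_is_active below is hypothetical; this is a sketch, not part of the AQUA-diagnostics tooling.

```shell
# Hypothetical helper: succeed if the named conda/mamba environment is active.
# Relies on CONDA_DEFAULT_ENV, which conda-style activation sets.
env_is_active() {
    [ "${CONDA_DEFAULT_ENV:-}" = "$1" ]
}

if env_is_active aqua-diagnostics; then
    echo "aqua-diagnostics environment is active"
else
    echo "aqua-diagnostics environment is not active"
fi
```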

Note

By default, the environment file installs the cloned version of AQUA-diagnostics in editable mode with pip install -e .[all].

Note

If you need to install AQUA-core in editable mode for development purposes, you must clone the AQUA-core repository separately and install it in editable mode before installing AQUA-diagnostics.

HPC Installation

Installation on LUMI HPC

LUMI is currently the main HPC of the DestinE-Climate-DT project, and it is the main platform for the development of AQUA. The Lustre filesystem does not support the use of conda environments, so another approach has been developed to install on LUMI, based on container-wrapper.

First, clone both the AQUA (aqua-core) and AQUA-diagnostics repositories from GitHub:

git clone git@github.com:DestinE-Climate-DT/AQUA.git
git clone git@github.com:DestinE-Climate-DT/AQUA-diagnostics.git

For simpler installation, it is recommended to define the $AQUA and $AQUA_DIAGNOSTICS environment variables that point to the respective directories:

export AQUA=/path/to/AQUA
export AQUA_DIAGNOSTICS=/path/to/AQUA-diagnostics

Note

The installation script uses both environment variables. If they are not set, it defaults to $HOME/AQUA and $HOME/AQUA-diagnostics, but setting them explicitly is recommended.
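
The fallback behaviour described in this note can be pictured with shell parameter expansion. This is only an illustration of the idea under that assumption, not the actual code of the installation script.

```shell
# Sketch of the fallback described above: keep the value if it is already
# set, otherwise use the default location under $HOME.
AQUA="${AQUA:-$HOME/AQUA}"
AQUA_DIAGNOSTICS="${AQUA_DIAGNOSTICS:-$HOME/AQUA-diagnostics}"
export AQUA AQUA_DIAGNOSTICS

echo "AQUA=$AQUA"
echo "AQUA_DIAGNOSTICS=$AQUA_DIAGNOSTICS"
```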

Then, navigate to the cli/lumi-install directory inside the AQUA-diagnostics directory:

cd $AQUA_DIAGNOSTICS/cli/lumi-install

Run the installation script:

bash lumi_install.sh

This installs the AQUA environment into a container, and then sets up the correct modules via a load_aqua_diagnostics.sh script that is generated during the installation and can be sourced from .bash_profile.

What load_aqua_diagnostics.sh does

The generated load_aqua_diagnostics.sh script configures your shell environment by:

  1. Setting up FDB5 access: Adds the FDB5 binary path and libraries to PATH and LD_LIBRARY_PATH, enabling access to data stored in FDB databases.

  2. Configuring GSV paths: Sets environment variables required by the GSV package:

    • GSV_WEIGHTS_PATH: Path to GSV neural network weights

    • GSV_TEST_FILES: Path to GSV test files

    • GRID_DEFINITION_PATH: Path to grid definitions for unstructured grids

  3. Defining AQUA paths: Sets AQUA_DIAGNOSTICS_PATH and AQUA_CORE_PATH variables pointing to the respective environment binaries.

  4. Adding AQUA-diagnostics to PATH: Prepends the AQUA-diagnostics binary directory to your PATH, making the aqua command and Python environment available.
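
The PATH and LD_LIBRARY_PATH changes in steps 1 and 4 amount to prepending directories to colon-separated lists. The following is a minimal sketch of that operation, not the generated script itself: prepend_dir is a hypothetical helper and the directory shown is a placeholder.

```shell
# Hypothetical helper: prepend a directory to a colon-separated list,
# skipping it if it is already present and avoiding a trailing colon
# when the list is empty.
prepend_dir() {
    dir=$1
    list=$2
    case ":$list:" in
        *":$dir:"*) echo "$list" ;;
        *) echo "$dir${list:+:$list}" ;;
    esac
}

# Placeholder directory, not the path used by the real script.
demo_path=$(prepend_dir /path/to/aqua-diagnostics/bin "/usr/bin:/bin")
echo "$demo_path"
```

Skipping directories that are already present keeps PATH from growing each time the script is sourced again in the same session.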

Why sourcing is required

Environment variables and shell functions are only available in the current shell session. When you log out and start a new session, all these settings are lost, so the script has to be sourced again.

The script will ask the user if they wish to add source ~/load_aqua_diagnostics.sh to .bash_profile at the end of the installation. If added to .bash_profile, the script is automatically sourced every time you start a new login shell, so AQUA-diagnostics will be ready to use immediately.

If you choose not to add it to .bash_profile, you will need to manually source the script at the beginning of each session:

source ~/load_aqua_diagnostics.sh
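
If you later decide to add the source line to .bash_profile by hand, it is worth making sure it is added only once. The sketch below shows one way to do this; add_line_once is a hypothetical helper, and it is demonstrated on a temporary file rather than on the real ~/.bash_profile.

```shell
# Hypothetical helper: append a line to a file only if that exact line
# is not already present.
add_line_once() {
    line=$1
    file=$2
    touch "$file"
    grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Demonstrated on a temporary file rather than the real ~/.bash_profile.
profile=$(mktemp)
add_line_once 'source ~/load_aqua_diagnostics.sh' "$profile"
add_line_once 'source ~/load_aqua_diagnostics.sh' "$profile"
cat "$profile"
rm -f "$profile"
```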

Note

Both AQUA (aqua-core) and AQUA-diagnostics are installed in editable mode. This means you can modify the source code in both repositories and changes will be reflected immediately without reinstallation.

Switching between AQUA environments

The installation script creates two helper functions to manage AQUA environments:

  • switch_aqua [diagnostics|core]: Switch between AQUA-diagnostics and AQUA-core environments. Use switch_aqua -v diagnostics or switch_aqua -v core for verbose output showing path changes.

  • which_aqua: Check which AQUA environment is currently active (returns aqua-diagnostics, aqua-core, or none).

By default, the AQUA-diagnostics environment is loaded when you source load_aqua_diagnostics.sh.

Note

Comment out or delete calls to older scripts, such as source load_aqua.sh, in your .bash_profile file to avoid possible conflicts.

Note

The installation script is designed to be run on the LUMI cluster, and it may require some adjustments to be run on other systems that use the container-wrapper tool. Please refer to the documentation of the container-wrapper tool for more information.

Warning

Despite its name, this installation script does not install the AQUA package in the traditional sense, nor in a pure container. It wraps the conda installation in a container, which makes it possible to load LUMI modules and to run the AQUA code from the command line or in batch jobs. Different LUMI module loading or setups may lead to different results, but this is the most flexible way to develop AQUA (core and/or diagnostics) on LUMI.

Note

If you encounter any issues with the installation script, please refer to the Troubleshooting and FAQ section.

Installation on Levante HPC at DKRZ

You can follow the installation process described in the previous section (see Conda/Mamba installation with environment file). To use FDB access, you need to load the FDB5 binary library (libfdb5.so). At the moment, no dedicated module appears to be available on Levante, so you can either compile your own copy and make it available (download the source code from https://github.com/ecmwf/fdb), or use our precompiled version by setting:

export LD_LIBRARY_PATH=/work/bb1153/b382075/aqua/local/lib:$LD_LIBRARY_PATH

in .bash_profile and in .bashrc in your home directory.

The GSV package will also require, in order to correctly decode the unstructured grid, an environment variable to be set:

export GRID_DEFINITION_PATH=/work/bb1153/b382321/grid_definitions

This path is the one where the grid definitions are stored, and it is necessary for the GSV package to work correctly. Also in this case, you can set the environment variable in your .bash_profile and in .bashrc in your home directory.
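
To check that the FDB5 library is actually reachable after setting LD_LIBRARY_PATH, one option is to look for libfdb5.so in each directory of the list. This is a minimal sketch; find_in_path_list is a hypothetical helper and not part of AQUA.

```shell
# Hypothetical helper: print the first directory in a colon-separated list
# that contains the given file; fail if none does. The body runs in a
# subshell so that the IFS and globbing changes do not leak out.
find_in_path_list() (
    name=$1
    set -f      # no glob expansion while splitting the list
    IFS=:
    for dir in $2; do
        if [ -n "$dir" ] && [ -e "$dir/$name" ]; then
            echo "$dir"
            return 0
        fi
    done
    return 1
)

if found=$(find_in_path_list libfdb5.so "${LD_LIBRARY_PATH:-}"); then
    echo "libfdb5.so found in $found"
else
    echo "libfdb5.so not found on LD_LIBRARY_PATH"
fi
```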

Installation on MareNostrum 5 (MN5) HPC at BSC

To enable internet-dependent operations like git, pip or conda on MN5, you can configure an SSH tunnel and set up proxy environment variables.

Note

We recommend using a machine with a stable connection, such as Levante or LUMI, for these configurations, as connections to MN5 from personal computers may be unstable.

Add a RemoteForward directive with a valid port number under the MN5 section of your ~/.ssh/config file. Use the following configuration, replacing <port_number> with a unique port number to avoid conflicts (on most systems the valid range for ports is from 1024 to 49151 for user-level applications).

Host mn5
    RemoteForward <port_number>
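
When choosing the port, a small check can catch values outside the range mentioned above. This is a sketch with a hypothetical helper name, valid_port; it only checks the numeric range, not whether the port is already in use.

```shell
# Hypothetical helper: succeed if the argument is an integer in the
# 1024-49151 range mentioned above.
valid_port() {
    case $1 in
        ''|*[!0-9]*) return 1 ;;
    esac
    [ "$1" -ge 1024 ] && [ "$1" -le 49151 ]
}

for port in 80 1080 49151 60000; do
    if valid_port "$port"; then
        echo "$port: in range"
    else
        echo "$port: out of range"
    fi
done
```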

After logging into MN5, export the following proxy environment variables to direct traffic through the SSH tunnel. Replace <port_number> with the same port number used in your SSH configuration:

export https_proxy=socks5://localhost:<port_number>
export http_proxy=socks5://localhost:<port_number>

You can add these exports to your .bash_profile and .bashrc files for persistence.

You can check if the forwarding is running by using the following command with your chosen port number:

netstat -tlnp | grep <port_number>

Next, create your GitHub SSH key as usual, and then update your ~/.ssh/config file with the following configuration:

Host github.com
    Hostname ssh.github.com
    Port 443
    User git
    IdentityFile ~/.ssh/id_github
    ProxyCommand nc -x localhost:<port_number> %h %p

To verify the configuration, try testing the SSH connection with:

ssh -T git@github.com

Once verified, you can successfully use git clone and other Git commands with SSH.

To install AQUA, see Conda/Mamba installation with environment file.

Warning

The wget command does not work properly in this setup. Use curl as an alternative for downloading files.

To use the FDB5 binary library on MN5, set the following environment variable:

export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/gpfs/projects/ehpc01/sbeyer/models/DE_CY48R1.0_climateDT_tco399_aerosol_runoff/build/lib"

Installation on ECMWF HPC2020

HPC2020 is moving to a more container-based approach, so the suggested installation process uses a technology similar to the one used on LUMI. Using conda or mamba directly on the Lustre filesystems ($PERM and $HPCPERM) is not recommended and has been verified to lead to severe performance issues.

The recommended approach is to use the tykky module developed by CSC, and available on HPC2020, which provides the same container wrapper technology used for an install on LUMI. This process is also described in the relevant HPC2020 documentation pages.

In principle, you could follow the instructions in the ECMWF docs on how to create a tykky environment. However, a small bug in one of the AQUA dependencies requires a slightly more complex procedure, so, as for LUMI, a convenience installation script has been created.

First, clone the AQUA-diagnostics repository from GitHub as described in the previous section.

The installation process uses considerable resources which may exceed the capacity of the login node. For this reason, it is recommended to start an interactive session asking for adequate resources:

ecinteractive -c 8 -m 20 -s 30

This requests a session with 8 CPUs, 20 GB of RAM and 30 GB of temporary local disk storage. These resources are required only for the installation, not necessarily for using AQUA.

Note

If this is the first time that you run ecinteractive, you should first set up your ssh keys by running the command ssh-key-setup.

It is recommended to define an $AQUA_DIAGNOSTICS environment variable that points to the AQUA-diagnostics directory (by default, the script assumes that AQUA-diagnostics is located in the current directory):

export AQUA_DIAGNOSTICS=/path/to/AQUA-diagnostics

Then run the installation script:

cd $AQUA_DIAGNOSTICS/cli/hpc2020-install
./hpc2020-install.sh

The script installs by default the AQUA tykky environment in the directory $HPCPERM/tykky/aqua.

At the end of the installation, the script will ask the user if they wish to add the AQUA environment permanently to their $PATH in the .bash_profile file. Please note that adding AQUA to your PATH means the aqua environment will be used for all activities on HPC2020, so this is not recommended.

Instead, the recommended way to use AQUA is by loading the environment with a conda-like syntax:

module load tykky
tykky activate aqua

You can later also use tykky deactivate to deactivate the environment.

Note

This installs aqua-core as a package from pip and aqua-diagnostics in editable mode. If you are a developer you can also install using the hpc2020_install_dev.sh script, which will install both in editable mode, creating the tykky environment aqua-dev.

If you plan to use Visual Studio Code, you can add a kernel pointing to the containerized AQUA by also running the following command:

$HPCPERM/tykky/aqua/bin/python3 -m ipykernel install --user --name=<my_containerised_env_name>

Installation and use of the AQUA container

To use AQUA-diagnostics in complex workflows or in a production environment, it is recommended to use the AQUA-diagnostics container. The AQUA container is a Docker container that contains the AQUA-diagnostics package and all its dependencies.

Please refer to the Container section for more information on how to deploy and how to use the AQUA container.

Note

If you’re working on LUMI, Levante or MN5 HPCs, a compact script is available to load the AQUA container, mounting the necessary folders and creating the necessary environment variables. Please refer to the Container section.