Hydrological Model Validator

This project provides a set of tools for evaluating the performance of Bio-Geo-Hydrological simulations and analyzing their outputs.

The focus of this repository is on post-processing, offering utilities to:

Clean and pre-process relevant dataframes
Interpolate observed datasets to handle missing values and apply proper masking
Compare observed and simulated outputs for validation and performance assessment
Analyse the simulation outputs for further insights

Features

Data cleaning and transformation
Missing data interpolation and masking
Automated validation metrics and visual comparison
Optional PDF report generation
Modular structure for customizable analysis workflows

Project Structure

The project is organized around two main components:

Quick-Use Toolkit
A high-level interface that allows users to input datasets and automatically generate:
- Validation plots
- Summary dataframes
- A PDF report (optional)
Modular Subcomponents
A collection of standalone functions and classes for users who prefer to build custom analysis pipelines or integrate specific components into other projects. These are grouped into three submodules, which can be used on their own or combined:
- Processing/: Functions for reading, cleaning, transforming, and analyzing input datasets.
- Plotting/: Tools for generating a variety of plots from the processed results, including time series, scatter plots, and performance metrics.
- Report/: Utilities for creating structured PDF reports, incorporating plots, summary statistics, and metadata.

Note: This repository is part of a Physics of the Earth System thesis and may be expanded in the future to include additional variables and more advanced analytical features.

Model/Simulation Evaluation

The current evaluation approach is based on a direct comparison between simulated and observed datasets over the same time window. The results are presented through a variety of plots and statistical performance metrics.
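As a minimal sketch of this kind of comparison (not the repository's own implementation), the snippet below pairs observed and simulated values on their shared dates, skips missing observations, and computes the mean bias and RMSE. The helper name `paired_stats` and the sample values are hypothetical, and only the Python standard library is used.

```python
import math


def paired_stats(obs, sim):
    """Return (bias, rmse) over the dates present in both series.

    obs, sim: dicts mapping ISO date strings to floats (None = missing).
    Hypothetical helper, for illustration only.
    """
    # Keep only the dates found in both series that carry valid values
    common = sorted(
        d for d in obs.keys() & sim.keys()
        if obs[d] is not None and sim[d] is not None
    )
    if not common:
        raise ValueError("no overlapping valid dates")
    diffs = [sim[d] - obs[d] for d in common]
    bias = sum(diffs) / len(diffs)
    rmse = math.sqrt(sum(x * x for x in diffs) / len(diffs))
    return bias, rmse


# Made-up values: one observation is missing, one simulated date has no match
obs = {"2000-01-01": 10.0, "2000-01-02": None, "2000-01-03": 12.0}
sim = {"2000-01-01": 11.0, "2000-01-02": 11.5,
       "2000-01-03": 11.0, "2000-01-04": 9.0}
bias, rmse = paired_stats(obs, sim)
```

Only the two dates with valid values in both series contribute to the statistics here; the unmatched and missing entries are ignored rather than filled.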

Analytical Tools

The following visualization and statistical tools are used to evaluate model performance:

Time Series & Scatter Plots
- General time series plots for visual inspection
- Seasonal scatter plots for intra-annual trends
Distribution Plots
- Box-and-whisker plots
- Violin plots
Multivariate Performance Plots
- Target diagrams
- Taylor diagrams
Efficiency Metrics
A wide set of statistical coefficients is implemented to evaluate model accuracy:
- Coefficient of Determination (R²)
  - Standard
  - Weighted
- Index of Agreement (d)
  - Standard
  - Modified
  - Relative
- Nash–Sutcliffe Efficiency (NSE)
  - Standard
  - Logarithmic
  - Modified
  - Relative
Error Decomposition
- Time-series and Spectral Analysis
  - Compared against cloud coverage patterns
- Spatial Performance Mapping
  - Annual and monthly resolution maps showing model performance across geographic regions
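To make the efficiency metrics listed above more concrete, the following sketch implements the standard textbook forms of R² (as the squared Pearson correlation), the index of agreement d, and the Nash–Sutcliffe Efficiency. The function names are hypothetical and do not reflect the repository's API; the weighted, modified, relative, and logarithmic variants are not covered.

```python
def _mean(values):
    return sum(values) / len(values)


def r_squared(obs, sim):
    """Squared Pearson correlation between observed and simulated values."""
    mo, ms = _mean(obs), _mean(sim)
    cov = sum((o - mo) * (s - ms) for o, s in zip(obs, sim))
    var_o = sum((o - mo) ** 2 for o in obs)
    var_s = sum((s - ms) ** 2 for s in sim)
    return cov ** 2 / (var_o * var_s)


def index_of_agreement(obs, sim):
    """Standard index of agreement d (1 = perfect agreement)."""
    mo = _mean(obs)
    num = sum((o - s) ** 2 for o, s in zip(obs, sim))
    den = sum((abs(s - mo) + abs(o - mo)) ** 2 for o, s in zip(obs, sim))
    return 1.0 - num / den


def nse(obs, sim):
    """Standard Nash-Sutcliffe Efficiency (1 = perfect, 0 = no better than the observed mean)."""
    mo = _mean(obs)
    num = sum((o - s) ** 2 for o, s in zip(obs, sim))
    den = sum((o - mo) ** 2 for o in obs)
    return 1.0 - num / den


# Made-up data: the simulation is the observation shifted by a constant +1
obs = [1.0, 2.0, 3.0, 4.0]
sim = [2.0, 3.0, 4.0, 5.0]
scores = {
    "r2": r_squared(obs, sim),
    "d": index_of_agreement(obs, sim),
    "nse": nse(obs, sim),
}
```

In this example the constant offset leaves R² at 1 while d drops to 0.84 and NSE to 0.2, which illustrates why several coefficients are usually read together. The sketch assumes the inputs are not constant; a series with zero variance would cause a division by zero.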

Expansion of the Analysis: Bottom (Benthic) σ-Layer

As the first direction for expanded analysis, this repository introduces tools focused on the extraction and study of the bottom σ-layer of the simulation grid. This layer is particularly relevant for investigating the formation of deep water masses and the distribution of bio-geochemical variables near the seabed.

Once the model has been validated using the core evaluation tools, users can apply these modules to explore processes such as:

Stratification and mixing at depth
Tracer evolution in deep layers (e.g., nutrients, oxygen, carbon compounds)
Temporal variability in bottom water properties

For implementation details and example workflows, refer to the test cases provided in the Test_cases/ directory.
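As a rough illustration of the extraction step, and not the code used in the repository: in terrain-following (σ) coordinates every water column has the same number of levels, so the bottom layer can be taken as the last index along the σ axis, with land points masked out. The sketch assumes the array is ordered [σ][y][x] with the last σ index at the seabed and a mask where 1 marks sea and 0 marks land; the function name `bottom_layer` and the values are hypothetical.

```python
def bottom_layer(field, mask):
    """Return the deepest sigma level of `field`, with land points set to None.

    field: nested list indexed as [sigma][y][x] (assumed ordering,
           last sigma index = seabed).
    mask:  nested list indexed as [y][x], 1 = sea, 0 = land (assumed convention).
    """
    deepest = field[-1]
    return [
        [value if is_sea else None for value, is_sea in zip(row, mask_row)]
        for row, mask_row in zip(deepest, mask)
    ]


# Made-up temperature field with two sigma levels on a 2x2 grid
field = [
    [[15.0, 15.2], [14.8, 0.0]],  # surface level
    [[13.1, 13.4], [12.9, 0.0]],  # bottom level
]
mask = [[1, 1], [1, 0]]
bottom = bottom_layer(field, mask)
```

With real model output the same idea would typically be applied to NetCDF arrays; the actual workflow is shown in the Test_cases/ directory.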

Installation Guide

This project can be installed using conda (recommended) or pip on all major operating systems. Below you’ll also find guidance for optional tools such as MATLAB and CDO (Linux only), which some of the functions and routines rely on.

Python Environment Setup

Python version supported :

Conda (Recommended)

All Systems

# Create a new conda environment
conda create -n hydroval python=3.10

# Activate the environment
conda activate hydroval

# Install the package and dependencies in editable mode
pip install -e .

Pip Only (Without Conda)

All Systems

# Optionally create and activate a virtual environment (recommended)
python -m venv env
source env/bin/activate      # Windows: env\Scripts\activate

# Install the package and dependencies in editable mode
pip install -e .

Alternative Pip Options

--user (No admin rights)

pip install --user -e .

-e (Editable/development mode)

pip install -e .

Use -e when actively developing or modifying the source code.

MATLAB (Optional but needed for the interpolator script)

MATLAB Setup (All Systems)

Description

Some test cases or post-analysis steps may require MATLAB. Make sure it's installed and available via your system's PATH.

🔗 Official MATLAB Installation Guide

To correctly run the interpolator, the toolboxes m_map, mexcdf, and nctoolbox need to be accessible by the script. Please make sure that their paths are reachable by your MATLAB installation. For a guide on how to add paths in MATLAB, please refer to MATLAB Add Folder to Path Documentation.

The interpolator is written in MATLAB so that the process is NOAA compliant: it relies on the same tools NOAA uses, which allows future integration of this repository with other NOAA tools.

m_map toolbox: https://www.eoas.ubc.ca/~rich/map.html
mexcdf toolbox: https://www.mathworks.com/matlabcentral/fileexchange/26310-netcdf-interface-for-matlab-mexcdf
nctoolbox: https://github.com/nctoolbox/nctoolbox

CDO - Climate Data Operators (Linux Only)

CDO Setup (Linux Only)

⚠️ CDO is supported only on Linux-based systems.

# Ubuntu/Debian
sudo apt install cdo

# Or use conda
conda install -c conda-forge cdo

🔗 Official CDO Installation Guide

Helpful Links

Official Documentation

Usage Guide: `GenerateReport` CLI

The GenerateReport command-line interface (CLI) allows users to generate evaluation reports from observed and simulated Bio-Geo-Hydrological datasets.

Basic Command

GenerateReport [input_folder_or_dict] [OPTIONS]

Positional Argument

usage: GenerateReport [-h] [--output-dir path] [--check] [--no-pdf] [--verbose] [--open-report]
                      [--variable var_name] [--unit unit_str] [--no-banner] [--info] [--version]
                      [input]

Generate a comprehensive evaluation report from observed and simulated Bio-Geo-Hydrological datasets.

positional arguments:
  input                 Path to the input data directory or a dictionary of file paths.
                        You can pass:
                          - a folder containing: obs_spatial, sim_spatial, obs_ts, sim_ts, and mask
                          - or a stringified dictionary (JSON or Python format) mapping these keys:
                            {
                              "obs_spatial": "obs_spatial.nc",
                              "sim_spatial": "sim_spatial.nc",
                              "obs_ts": "obs_timeseries.csv",
                              "sim_ts": "sim_timeseries.csv",
                              "mask": "mask.nc"
                            }

options:
  -h, --help            Show this help message and exit
  --output-dir path     Destination folder for report and plots (default: ./REPORT)
  --check               Validate input files and structure only, no report generation
  --no-pdf              Skip PDF generation, only output plots and dataframes
  --verbose             Enable detailed logging
  --open-report         Automatically open the PDF report if generated
  --variable var_name   Name of the target variable (e.g. "Chlorophyll-a")
  --unit unit_str       Unit of the variable (e.g. "mg/L", "m3/s"), LaTeX-ready
  --no-banner           Suppress ASCII banner (useful for batch jobs)
  --info                Show program description and exit
  --version             Show version and exit
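The help text states that the input can be a stringified dictionary in JSON or Python format. How the CLI parses it internally is not documented here; one plausible approach, sketched below under that caveat, is to try JSON first and fall back to a safe Python literal parser. The function name `parse_input_mapping` is hypothetical, and treating all five keys as mandatory is an assumption.

```python
import ast
import json

# Keys listed in the help text; requiring all of them is an assumption
REQUIRED_KEYS = {"obs_spatial", "sim_spatial", "obs_ts", "sim_ts", "mask"}


def parse_input_mapping(text):
    """Parse a JSON-style or Python-style dictionary string of file paths."""
    try:
        mapping = json.loads(text)
    except json.JSONDecodeError:
        # literal_eval only accepts Python literals, so no code is executed;
        # it raises ValueError or SyntaxError on malformed input
        mapping = ast.literal_eval(text)
    if not isinstance(mapping, dict):
        raise ValueError("input must be a dictionary of file paths")
    missing = REQUIRED_KEYS - mapping.keys()
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    return mapping


json_style = (
    '{"obs_spatial": "obs.nc", "sim_spatial": "sim.nc", '
    '"obs_ts": "obs.csv", "sim_ts": "sim.csv", "mask": "mask.nc"}'
)
python_style = (
    "{'obs_spatial': 'obs.nc', 'sim_spatial': 'sim.nc', "
    "'obs_ts': 'obs.csv', 'sim_ts': 'sim.csv', 'mask': 'mask.nc'}"
)
paths = parse_input_mapping(json_style)
```

Both spellings produce the same dictionary, so a wrapper script can accept either without asking the user which format was used.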

Examples

Minimal Run (Interactive)

GenerateReport ./data

With Output Directory & No PDF

GenerateReport ./data --output-dir ./results --no-pdf

Using a JSON-Style Dictionary

GenerateReport "{ \"obs_spatial\": \"obs.nc\", \"sim_spatial\": \"sim.nc\", \"obs_ts\": \"obs.csv\", \"sim_ts\": \"sim.csv\", \"mask\": \"mask.nc\" }"
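Escaping the inner quotes by hand is error-prone. When the command is launched from a script, a safer pattern is to build the dictionary in Python and let the standard library handle serialization and quoting. The sketch below only constructs the argument list and a printable command line; the file names are the placeholder values from the example above, and the quoting produced by `shlex` targets POSIX shells.

```python
import json
import shlex

# Placeholder file names taken from the example above
paths = {
    "obs_spatial": "obs.nc",
    "sim_spatial": "sim.nc",
    "obs_ts": "obs.csv",
    "sim_ts": "sim.csv",
    "mask": "mask.nc",
}

# json.dumps produces the JSON-style dictionary string expected as `input`
argv = ["GenerateReport", json.dumps(paths), "--output-dir", "./results", "--no-pdf"]

# shlex.join (Python 3.8+) quotes each argument for copy-pasting into a POSIX shell
command_line = shlex.join(argv)
print(command_line)

# To launch the command directly, pass the list without shell quoting, e.g.:
#   subprocess.run(argv, check=True)
```

Passing the list form to `subprocess.run` avoids shell quoting entirely, which also sidesteps the differences between POSIX shells and the Windows command prompt.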

Quiet Batch Run (No Banner, Auto Open Report)

GenerateReport ./data --no-banner --open-report

For example usage of the individual functions available in the repository (other than the report-generation ones), and more generally for importing and using them in your own scripts, please refer to the test cases in the Test_cases/ folder and their respective TEST_CASES_README file.

Test Cases and Pytests

This repository includes a suite of example routines and automated tests to ensure the correct functionality of its components. All tests are located in the Test_cases/ directory.

Test Case Scripts

These are step-by-step, verbose scripts that demonstrate how to apply the tools for data cleaning, analysis, and reporting. They are ideal for understanding the intended usage.

Available Test Cases

Data_cleaner_setupper.py
Demonstrates how to clean and prepare datasets for analysis.
Includes the MATLAB script Interpolator_v2.m to perform bilinear interpolation on observed datasets.
SST_data_analyzer.py & CHL_data_analyzer.py
Practical examples of analysis workflows using Sea Surface Temperature and Chlorophyll-a datasets.
These are simplified, didactic illustrations of what the Report_generator submodule automates.
Benthic_layer.py
Focuses on extracting and analyzing bottom σ-layers, emphasizing dense water formation and bio-geochemical tracers near the seabed.
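For readers unfamiliar with the bilinear interpolation mentioned for Data_cleaner_setupper.py, the sketch below shows the textbook formula on a single grid cell. It is not a port of Interpolator_v2.m; the function name `bilinear` and the corner values are hypothetical.

```python
def bilinear(q00, q10, q01, q11, tx, ty):
    """Interpolate inside one grid cell.

    q00, q10: values at the two corners along the lower y edge (x0, x1).
    q01, q11: values at the two corners along the upper y edge (x0, x1).
    tx, ty:   fractional position inside the cell, each in [0, 1].
    """
    # Interpolate along x on both edges, then along y between the two results
    lower_edge = q00 * (1 - tx) + q10 * tx
    upper_edge = q01 * (1 - tx) + q11 * tx
    return lower_edge * (1 - ty) + upper_edge * ty


# Made-up corner values; the cell centre is the average of the four corners
center = bilinear(10.0, 12.0, 14.0, 16.0, 0.5, 0.5)
```

At the cell corners the function returns the corner values themselves, and at the centre it returns their average (13.0 for the values above).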

Pytests and Code Quality

Automated testing ensures reliability and stability of the modules, using:

pytest
flake8 (for linting and style enforcement)

You can run them via:

pytest
flake8 src/

These tools verify logic correctness, class behavior, and code style compliance.

Code Quality Reports

This project is continuously monitored with external quality and coverage tools:

Codacy, Codebeat, Codecov, Documentation

