
AlessandroGozzoli/Hydrological-Model-Validator


Hydrological Model Validator

This project provides a set of tools for evaluating the performance of Bio-Geo-Hydrological simulations and analyzing their outputs.

The focus of this repository is on post-processing, offering utilities to:

  • Clean and pre-process relevant dataframes
  • Interpolate observed datasets to handle missing values and apply proper masking
  • Compare observed and simulated outputs for validation and performance assessment
  • Analyse the simulation outputs for further insights
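The gap-filling and masking steps can be pictured with a minimal sketch. It assumes observations are held in NumPy arrays with NaN marking missing values, and a boolean sea mask that is True over water. The function names `fill_gaps` and `apply_mask` are hypothetical, and this is not the repository's own implementation.

```python
import numpy as np


def fill_gaps(series):
    """Linearly interpolate NaN gaps in a 1-D time series.

    Values before the first or after the last valid sample are held
    constant, because np.interp clamps to the end points.
    """
    values = np.asarray(series, dtype=float)
    known = ~np.isnan(values)
    if known.all() or not known.any():
        return values.copy()  # nothing to fill, or nothing to fill from
    idx = np.arange(values.size)
    filled = values.copy()
    filled[~known] = np.interp(idx[~known], idx[known], values[known])
    return filled


def apply_mask(field, sea_mask):
    """Set every cell outside the sea mask (False) to NaN."""
    return np.where(sea_mask, np.asarray(field, dtype=float), np.nan)
```

Filling is done before masking so that land cells, which should stay excluded, are never treated as gaps to interpolate.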

Features

  • Data cleaning and transformation
  • Missing data interpolation and masking
  • Automated validation metrics and visual comparison
  • Optional PDF report generation
  • Modular structure for customizable analysis workflows

Project Structure

The project is organized around two main components:

  1. Quick-Use Toolkit
    A high-level interface that allows users to input datasets and automatically generate:

    • Validation plots
    • Summary dataframes
    • A PDF report (optional)
  2. Modular Subcomponents
    A collection of standalone functions and classes for users who prefer to build custom analysis pipelines or integrate specific components into other projects. They are grouped into three submodules, which can be used on their own or combined:

    • Processing/: Functions for reading, cleaning, transforming, and analyzing input datasets.
    • Plotting/: Tools for generating a variety of plots from the processed results, including time series, scatter plots, and performance metrics.
    • Report/: Utilities for creating structured PDF reports, incorporating plots, summary statistics, and metadata.

Note: This repository is part of a Physics of the Earth System thesis and may be expanded in the future to include additional variables and more advanced analytical features.


Model/Simulation Evaluation

The current evaluation approach is based on a direct comparison between simulated and observed datasets over the same time window. The results are presented through a variety of plots and statistical performance metrics.
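As a sketch of what comparing over the same time window involves, the snippet below aligns two pandas Series on their overlapping period, drops time steps where either value is missing, and computes bias, RMSE, and Pearson correlation. The function names and the Series-based input are assumptions for illustration, not the repository's API.

```python
import numpy as np
import pandas as pd


def align_common_window(obs, sim):
    """Restrict two time-indexed Series to their overlapping period and
    keep only time steps where both values are present."""
    start = max(obs.index.min(), sim.index.min())
    end = min(obs.index.max(), sim.index.max())
    paired = pd.concat({"obs": obs.loc[start:end], "sim": sim.loc[start:end]}, axis=1)
    return paired.dropna()


def basic_scores(paired):
    """Bias (sim - obs), root-mean-square error, and Pearson correlation."""
    diff = paired["sim"] - paired["obs"]
    return {
        "bias": float(diff.mean()),
        "rmse": float(np.sqrt((diff ** 2).mean())),
        "r": float(paired["obs"].corr(paired["sim"])),
    }
```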

Analytical Tools

The following visualization and statistical tools are used to evaluate model performance:

  • Time Series & Scatter Plots

    • General time series plots for visual inspection
    • Seasonal scatter plots for intra-annual trends
  • Distribution Plots

    • Box-and-whisker plots
    • Violin plots
  • Multivariate Performance Plots

    • Target diagrams
    • Taylor diagrams
  • Efficiency Metrics: a wide set of statistical coefficients is implemented to evaluate model accuracy:

    • Coefficient of Determination (R²)

      • Standard
      • Weighted
    • Index of Agreement (d)

      • Standard
      • Modified
      • Relative
    • Nash–Sutcliffe Efficiency (NSE)

      • Standard
      • Logarithmic
      • Modified
      • Relative
  • Error Decomposition

    • Time-series and Spectral Analysis
      • Compared against cloud coverage patterns
    • Spatial Performance Mapping
      • Annual and monthly resolution maps showing model performance across geographic regions
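The efficiency coefficients listed above can be written compactly with NumPy. This sketch follows the definitions in Krause et al. (2005), with O for observations, P for simulations, and j the exponent of the modified forms (j = 2 gives the standard form, j = 1 the usual modified form). Function names are hypothetical, and the repository's own implementations may handle edge cases (zeros, non-positive values for the logarithmic NSE) differently.

```python
import numpy as np


def _prep(obs, sim):
    """Convert to float arrays and drop pairs where either value is NaN."""
    o = np.asarray(obs, dtype=float)
    p = np.asarray(sim, dtype=float)
    keep = ~(np.isnan(o) | np.isnan(p))
    return o[keep], p[keep]


def r_squared(obs, sim):
    o, p = _prep(obs, sim)
    return float(np.corrcoef(o, p)[0, 1] ** 2)


def weighted_r_squared(obs, sim):
    """R² weighted by the slope b of the regression of P on O."""
    o, p = _prep(obs, sim)
    b = np.polyfit(o, p, 1)[0]
    r2 = np.corrcoef(o, p)[0, 1] ** 2
    return float(abs(b) * r2 if b <= 1 else r2 / abs(b))


def nse(obs, sim, j=2):
    """Nash-Sutcliffe efficiency; j=2 standard, j=1 modified."""
    o, p = _prep(obs, sim)
    return float(1 - np.sum(np.abs(o - p) ** j) / np.sum(np.abs(o - o.mean()) ** j))


def log_nse(obs, sim):
    """NSE on log-transformed values; requires strictly positive data."""
    o, p = _prep(obs, sim)
    return nse(np.log(o), np.log(p))


def relative_nse(obs, sim):
    o, p = _prep(obs, sim)
    om = o.mean()
    return float(1 - np.sum(((o - p) / o) ** 2) / np.sum(((o - om) / om) ** 2))


def index_of_agreement(obs, sim, j=2):
    """Willmott's d; j=2 standard, j=1 modified."""
    o, p = _prep(obs, sim)
    om = o.mean()
    denom = np.sum((np.abs(p - om) + np.abs(o - om)) ** j)
    return float(1 - np.sum(np.abs(o - p) ** j) / denom)


def relative_index_of_agreement(obs, sim):
    o, p = _prep(obs, sim)
    om = o.mean()
    denom = np.sum(((np.abs(p - om) + np.abs(o - om)) / om) ** 2)
    return float(1 - np.sum(((o - p) / o) ** 2) / denom)
```

For O = [1, 2, 3, 4] and P = [2, 3, 4, 5] (a constant offset of 1), R² is 1 while NSE is 0.2 and d is 0.84, which illustrates why correlation alone is not enough to judge a simulation.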

Expansion of the Analysis: Bottom (Benthic) σ-Layer

As a first direction for expanding the analysis, this repository introduces tools focused on the extraction and study of the bottom σ-layer of the simulation grid. This layer is particularly relevant for investigating the formation of deep water masses and the distribution of bio-geochemical variables near the seabed.

Once the model has been validated using the core evaluation tools, users can apply these modules to explore processes such as:

  • Stratification and mixing at depth
  • Tracer evolution in deep layers (e.g., nutrients, oxygen, carbon compounds)
  • Temporal variability in bottom water properties
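A minimal sketch of bottom-layer extraction, assuming a 3-D field shaped (depth, y, x) and a boolean wet-cell mask of the same shape, with water cells contiguous from the surface down. In a purely terrain-following σ grid the bottom layer is simply the last level of every column; the mask-based lookup below also covers grids where the deepest wet level varies. `extract_bottom_layer` is a hypothetical name, not the function used in Benthic_layer.py.

```python
import numpy as np


def extract_bottom_layer(field, wet_mask):
    """Return the value at the deepest wet level of every water column.

    field:    array shaped (depth, y, x)
    wet_mask: boolean array of the same shape, True for water cells
    Columns with no wet level (land) are returned as NaN.
    """
    wet = np.asarray(wet_mask, dtype=bool)
    nz = wet.shape[0]
    # argmax on the depth-reversed mask finds the first wet cell counted
    # from the bottom; convert it back to an index counted from the surface
    bottom_idx = nz - 1 - np.argmax(wet[::-1], axis=0)
    values = np.take_along_axis(
        np.asarray(field, dtype=float), bottom_idx[np.newaxis], axis=0
    )[0]
    return np.where(wet.any(axis=0), values, np.nan)
```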

For implementation details and example workflows, refer to the test cases provided in the Test_cases/ directory.


Installation Guide

This project can be installed with conda (recommended) or pip on all major operating systems. Guidance is also provided below for optional tools, MATLAB and CDO (Linux only), on which some of the functions and routines rely.


Python Environment Setup

Supported Python version: see the "Python version" badge on the repository page.

Conda (Recommended)

All Systems

# Create a new conda environment
conda create -n hydroval python=3.10

# Activate the environment
conda activate hydroval

# Install the package and dependencies in editable mode
pip install -e .
Pip Only (Without Conda)

All Systems

# Optionally create and activate a virtual environment (recommended)
python -m venv env
source env/bin/activate      # Windows: env\Scripts\activate

# Install the package and dependencies in editable mode
pip install -e .
Alternative Pip Options

--user (No admin rights)

pip install --user -e .

-e (Editable/development mode)

pip install -e .

Use -e when actively developing or modifying the source code.


MATLAB (optional, but required for the interpolator script)

MATLAB Setup (All Systems)

Description

Some test cases or post-analysis steps may require MATLAB. Make sure it's installed and available via your system's PATH.

🔗 Official MATLAB Installation Guide

To run the interpolator correctly, the m_map, mexcdf, and nctoolbox toolboxes must be accessible to the script, so make sure their folders are on your MATLAB search path. For instructions on adding folders to the path, refer to the MATLAB Add Folder to Path documentation.

A MATLAB interpolator is used so that the process relies on the same tools as NOAA and remains NOAA-compliant, which allows future integration of this repository with other NOAA tools.


CDO - Climate Data Operators (Linux Only)

CDO Setup (Linux Only)

โš ๏ธ CDO is supported only on Linux-based systems.

# Ubuntu/Debian
sudo apt install cdo

# Or use conda
conda install -c conda-forge cdo

🔗 Official CDO Installation Guide


Helpful Links

Official Documentation

Usage Guide: GenerateReport CLI

The GenerateReport command-line interface (CLI) allows users to generate evaluation reports from observed and simulated Bio-Geo-Hydrological datasets.

Basic Command

GenerateReport [input_folder_or_dict] [OPTIONS]

Positional Argument

usage: GenerateReport [-h] [--output-dir path] [--check] [--no-pdf] [--verbose] [--open-report]
                      [--variable var_name] [--unit unit_str] [--no-banner] [--info] [--version]
                      [input]

Generate a comprehensive evaluation report from observed and simulated Bio-Geo-Hydrological datasets.

positional arguments:
  input                 Path to the input data directory or a dictionary of file paths.
                        You can pass:
                          - a folder containing: obs_spatial, sim_spatial, obs_ts, sim_ts, and mask
                          - or a stringified dictionary (JSON or Python format) mapping these keys:
                            {
                              "obs_spatial": "obs_spatial.nc",
                              "sim_spatial": "sim_spatial.nc",
                              "obs_ts": "obs_timeseries.csv",
                              "sim_ts": "sim_timeseries.csv",
                              "mask": "mask.nc"
                            }

options:
  -h, --help            Show this help message and exit
  --output-dir path     Destination folder for report and plots (default: ./REPORT)
  --check               Validate input files and structure only, no report generation
  --no-pdf              Skip PDF generation, only output plots and dataframes
  --verbose             Enable detailed logging
  --open-report         Automatically open the PDF report if generated
  --variable var_name   Name of the target variable (e.g. "Chlorophyll-a")
  --unit unit_str       Unit of the variable (e.g. "mg/L", "m3/s"), LaTeX-ready
  --no-banner           Suppress ASCII banner (useful for batch jobs)
  --info                Show program description and exit
  --version             Show version and exit
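The help text above accepts either a folder or a stringified dictionary in JSON or Python syntax. The following is a minimal sketch of how such an argument could be interpreted; `parse_input` and `REQUIRED_KEYS` are hypothetical names, and this is not the CLI's actual parsing code.

```python
import ast
import json
from pathlib import Path

# Keys listed in the GenerateReport help text
REQUIRED_KEYS = {"obs_spatial", "sim_spatial", "obs_ts", "sim_ts", "mask"}


def parse_input(arg):
    """Interpret the positional input as a stringified dict or a folder path."""
    text = arg.strip()
    if text.startswith("{"):
        try:
            mapping = json.loads(text)
        except json.JSONDecodeError:
            # Fall back to a Python-style literal, e.g. single-quoted strings
            mapping = ast.literal_eval(text)
        if not isinstance(mapping, dict):
            raise ValueError("input dictionary must be a mapping")
        missing = REQUIRED_KEYS - set(mapping)
        if missing:
            raise ValueError(f"missing keys: {sorted(missing)}")
        return {key: Path(value) for key, value in mapping.items()}
    return Path(text)
```

`ast.literal_eval` is used for the fallback because it only evaluates literals, so an arbitrary expression passed on the command line is never executed.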

Examples

Minimal Run (Interactive)
GenerateReport ./data
With Output Directory & No PDF
GenerateReport ./data --output-dir ./results --no-pdf
Using a JSON-Style Dictionary
GenerateReport "{ \"obs_spatial\": \"obs.nc\", \"sim_spatial\": \"sim.nc\", \"obs_ts\": \"obs.csv\", \"sim_ts\": \"sim.csv\", \"mask\": \"mask.nc\" }"
Quiet Batch Run (No Banner, Auto Open Report)
GenerateReport ./data --no-banner --open-report
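When the CLI is called from a Python script, passing the arguments as a list avoids the shell escaping needed in the JSON example. This sketch uses only the flags listed in the help output; `build_report_command` and `run_report` are hypothetical helpers, and it assumes `GenerateReport` is on the PATH after installation.

```python
import json
import subprocess


def build_report_command(paths, output_dir="./REPORT", pdf=True, variable=None, unit=None):
    """Assemble a GenerateReport argument list from a dict of input paths."""
    cmd = ["GenerateReport", json.dumps(paths), "--output-dir", output_dir, "--no-banner"]
    if not pdf:
        cmd.append("--no-pdf")
    if variable is not None:
        cmd += ["--variable", variable]
    if unit is not None:
        cmd += ["--unit", unit]
    return cmd


def run_report(paths, **kwargs):
    # check=True raises CalledProcessError if GenerateReport exits with an error
    return subprocess.run(build_report_command(paths, **kwargs), check=True)
```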

For examples of how to use the individual functions available in the repository (excluding the report-generation ones), and more generally how to import and use them in your own scripts, refer to the test cases in the Test_cases/ folder and the accompanying TEST_CASES_README file.


Test Cases and Pytests

This repository includes a suite of example routines and automated tests to ensure the correct functionality of its components. All tests are located in the Test_cases/ directory.


Test Case Scripts

These are step-by-step, verbose scripts that demonstrate how to apply the tools for data cleaning, analysis, and reporting. They are ideal for understanding the intended usage.

Available Test Cases

  • Data_cleaner_setupper.py
    Demonstrates how to clean and prepare datasets for analysis.
    Includes the MATLAB script Interpolator_v2.m to perform bilinear interpolation on observed datasets.

  • SST_data_analyzer.py & CHL_data_analyzer.py
    Practical examples of analysis workflows using Sea Surface Temperature and Chlorophyll-a datasets.
    These are simplified, didactic illustrations of what the Report_generator submodule automates.

  • Benthic_layer.py
    Focuses on extracting and analyzing bottom σ-layers, emphasizing dense water formation and bio-geochemical tracers near the seabed.
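Interpolator_v2.m performs the bilinear interpolation in MATLAB. For readers unfamiliar with the scheme, here is a minimal NumPy sketch of bilinear interpolation on a regular grid with strictly increasing coordinates. It illustrates the technique only and is not a port of the MATLAB script; `bilinear` is a hypothetical name. NaN values at any corner of a query point's cell propagate to the result.

```python
import numpy as np


def bilinear(x_grid, y_grid, values, xq, yq):
    """Bilinearly interpolate values[y, x] on a regular grid at points (xq, yq).

    x_grid, y_grid must be strictly increasing 1-D coordinates.
    Points outside the grid return NaN.
    """
    x_grid = np.asarray(x_grid, dtype=float)
    y_grid = np.asarray(y_grid, dtype=float)
    values = np.asarray(values, dtype=float)
    xq = np.atleast_1d(np.asarray(xq, dtype=float))
    yq = np.atleast_1d(np.asarray(yq, dtype=float))
    # Index of the lower-left node of the cell containing each query point
    i = np.clip(np.searchsorted(x_grid, xq, side="right") - 1, 0, x_grid.size - 2)
    j = np.clip(np.searchsorted(y_grid, yq, side="right") - 1, 0, y_grid.size - 2)
    # Fractional position of the point inside its cell (0 to 1)
    tx = (xq - x_grid[i]) / (x_grid[i + 1] - x_grid[i])
    ty = (yq - y_grid[j]) / (y_grid[j + 1] - y_grid[j])
    result = ((1 - tx) * (1 - ty) * values[j, i]
              + tx * (1 - ty) * values[j, i + 1]
              + (1 - tx) * ty * values[j + 1, i]
              + tx * ty * values[j + 1, i + 1])
    outside = (xq < x_grid[0]) | (xq > x_grid[-1]) | (yq < y_grid[0]) | (yq > y_grid[-1])
    return np.where(outside, np.nan, result)
```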


Pytests and Code Quality

Automated tests (pytest) and style checks (flake8) help ensure the reliability and stability of the modules. You can run them via:

pytest
flake8 src/

These tools verify logic correctness, class behavior, and code style compliance.
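As an illustration of the kind of test pytest discovers, here is a hypothetical test module. The inline `nse` function stands in for the repository's own metric functions, whose names and signatures are not shown here; the tests use plain `assert`, which pytest reports with detailed failure messages.

```python
# test_metrics_example.py: a hypothetical module; by default pytest
# collects files named test_*.py and functions whose names start with test_
import numpy as np


def nse(obs, sim):
    """Standard Nash-Sutcliffe efficiency, defined inline for this example."""
    obs = np.asarray(obs, dtype=float)
    sim = np.asarray(sim, dtype=float)
    return 1 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2)


def test_perfect_simulation_scores_one():
    obs = [1.0, 2.0, 3.0, 4.0]
    assert abs(nse(obs, obs) - 1.0) < 1e-12


def test_mean_prediction_scores_zero():
    # Predicting the observed mean everywhere gives NSE = 0 by definition
    obs = np.array([1.0, 2.0, 3.0, 4.0])
    assert abs(nse(obs, np.full_like(obs, obs.mean()))) < 1e-12
```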


Code Quality Reports

This project is continuously monitored with external quality and coverage tools:

Codacy (code quality), codebeat (code quality), Codecov (test coverage), Documentation Status


About

Tools for the analysis and validation of Bio-Geo-Hydrological simulations and other climatological data
