v5.3.0: Standard names pipeline, DD lifecycle, and CI hardening by Simon-McIntosh · Pull Request #35 · iterorganization/imas-codex

Simon-McIntosh · 2026-04-10T14:55:06Z

v5.3.0 Release

37 commits | 83 files changed | +9,385 / -2,113 lines

Standard Names Pipeline (14 commits)

Complete implementation of the sn CLI subcommand group for generating, validating, and managing IMAS standard names:

Schema: Extended StandardName with rich fields (physics_domain, grammar decomposition, imas_paths, confidence) and fixed persistence bugs
Mint pipeline (sn mint): Renamed from sn build for clarity. LLM-powered generation of standard names from DD paths or facility signals via compose/validate workers with extended SNCandidate fields
Publish (sn publish): Lossless export to YAML catalog with rich fields preserved for human review
Import (sn import): Catalog feedback loop with version tracking and --check mode for dry-run validation. Catalog is authoritative on import
Lifecycle commands: sn reset and sn clear for managing standard name lifecycle states
MCP tools: search_standard_names, fetch_standard_names, list_standard_names exposed via the MCP server with benchmark quality tiers

Standard Names Benchmarking (5 commits)

Evaluation framework for measuring LLM standard name generation quality:

Reference dataset expanded to 52 entries across 8 IDSs with calibration dataset
Enhanced reviewer with cache reporting and model selection runbook
Prompt parity between benchmark and mint pipeline ensured

DD Lifecycle & MCP Tool Improvements (5 commits)

Lifecycle inheritance: lifecycle_status resolved at build time from parent IDS lifecycle, backfilled into graph. NULL treated as active in search/list filters
Lifecycle surfaced: check_dd_paths, get_dd_version_context, and get_dd_changelog now report lifecycle status
MCP tool rename: Consistent dd/ids naming convention across all tools
PhysicsDomain: Imported from imas-standard-names PyPI package instead of local definition
dd-only gating: get_graph_schema hidden in dd-only mode

Tests (2 commits)

Full end-to-end TestE2ERoundTrip for standard name lifecycle (mint→publish→edit→import)
Embedding coverage and coalesce safety integration tests for graph data quality

CI/CD & Infrastructure (7 commits)

GitHub Actions updated to Node.js 24 compatible versions
setup-uv pinned to immutable v8.0.0 tag
Empty env: blocks removed (broke CI workflows)
Dead IMAS_DD_VERSION removed from container build
ACR image name hardcoded for fork RC builds to reach Azure
Smoke test health check timeout increased
Release CLI: relaxed clean worktree check for RC releases

Configuration & Documentation (6 commits)

MCP config migrated to project root with .vscode symlink
AGENTS.md updated with standard names CLI, lifecycle, MCP tools, and fork/main workflow
Feature plans and implementation order documented

…ence bugs

Phase 1:
- Add 12 rich fields to StandardName schema (documentation, kind, tags, links, ids_paths, validity_domain, constraints, subject, component, coordinate, position, process)
- Add StandardNameKind enum (scalar/vector/metadata)
- Fix StandardNameReviewStatus: rename candidate→drafted, add published
- Fix MEASURES→HAS_STANDARD_NAME in schema doc and signals.py query

Phase 4:
- Fix coalesce bug in write_standard_names: all fields use coalesce(new, existing) to prevent data loss on re-runs
- Write all rich fields to graph
- Create CANONICAL_UNITS relationship per schema range convention
- Wire embedding generation in persist_worker

Tests:
- Add tests/sn/test_graph_ops.py (12 tests) covering coalesce, DD/signal/unit relationships, and query filtering
- Add tests/sn/conftest.py with shared fixtures
- Update test_publish.py for candidate→drafted rename
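The coalesce(new, existing) rule can be pictured in plain Python. This is a minimal sketch rather than the project's Cypher: `coalesce` and `merge_on_rewrite` are hypothetical helpers, and the field values are invented for illustration.

```python
def coalesce(*values):
    """Return the first value that is not None, like Cypher's coalesce()."""
    for value in values:
        if value is not None:
            return value
    return None


def merge_on_rewrite(existing: dict, new: dict) -> dict:
    """Prefer the new value, but keep the existing one when the new value is None."""
    keys = existing.keys() | new.keys()
    return {key: coalesce(new.get(key), existing.get(key)) for key in keys}


# Illustrative data only: a re-run that did not regenerate the description.
existing = {"description": "Electron temperature", "tags": ["core"]}
new = {"description": None, "tags": ["core", "profiles"], "kind": "scalar"}
merged = merge_on_rewrite(existing, new)
```

With this rule a re-run that produces no value for a field leaves the stored value in place instead of erasing it.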

…or schema consistency

Fork CI builds used github.repository for the ACR image path, producing simon-mcintosh/imas-codex instead of iterorganization/imas-codex. Azure watches the upstream path only, so fork RCs were invisible. Split into IMAGE_NAME (per-fork, for GHCR) and ACR_IMAGE_NAME (hardcoded upstream path, for ACR). GHCR stays per-fork since each fork has its own container registry namespace.
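The split between the two names can be sketched in Python as follows. The actual change lives in workflow configuration, so `image_names` is a hypothetical helper, and lowercasing the repository name is an assumption based on the lowercase path quoted in the commit message.

```python
# Hardcoded upstream path, as described in the commit message.
UPSTREAM_REPOSITORY = "iterorganization/imas-codex"


def image_names(github_repository: str) -> dict[str, str]:
    """Return the per-fork GHCR image path and the fixed ACR image path."""
    return {
        # Per-fork: follows whichever repository is running the workflow.
        "IMAGE_NAME": github_repository.lower(),
        # Fixed: always the upstream path that Azure watches.
        "ACR_IMAGE_NAME": UPSTREAM_REPOSITORY,
    }


names = image_names("Simon-McIntosh/imas-codex")
```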

Replace codegen'd PhysicsDomain enum (22 values from LinkML schema) with import from imas-standard-names PyPI package (32 values, StrEnum).

Deleted: physics_domains.yaml, gen_physics_domains.py, domains.yaml
Removed: codegen steps from build_models.py and hatch_build_hooks.py
Added: imas_codex/core/physics_domain.py as tracked re-export file

BREAKING CHANGE: PhysicsDomain enum now has 32 values (was 22). New values: core_plasma_physics, fast_particles, runaway_electrons, waves, fueling, plasma_initiation, spectroscopy, neutronics, gyrokinetics, plasma_measurement_diagnostics.

The IMAS_DD_VERSION env var and build-arg were passed to docker build but the Dockerfile never declared a matching ARG — it was dead code. The container gets its DD version from the graph data loaded from GHCR, not from a build argument. The DD version single source of truth is pyproject.toml under [tool.imas-codex.data-dictionary].version, read by get_dd_version() at runtime. The test workflow correctly uses IMAS_DD_VERSION as an env var override for multi-version matrix testing.

Replace multi-branch workflow with fork-based main-only workflow. All development happens on the fork's main branch, with no feature branches. Document ACR deployment path and Azure test URL. Add rule against pushing the same tag to both origin and upstream.

Includes:
- fix: hardcode ACR image name so fork RC builds reach Azure
- fix: remove dead IMAS_DD_VERSION from container build workflow
- docs: update AGENTS.md with fork/main workflow
- refactor: import PhysicsDomain from imas-standard-names

# Conflicts:
#	pyproject.toml
#	uv.lock

actions/checkout v4→v6, actions/upload-artifact v4→v7, actions/cache v4→v5, astral-sh/setup-uv v5→v8, codecov/codecov-action v4→v6, softprops/action-gh-release v1→v2, actions/attest-build-provenance v1→v4. Remove FORCE_JAVASCRIPT_ACTIONS_TO_NODE24 env var — no longer needed with native Node 24 actions.

…date workers

Phase 2: Add description, documentation, unit, kind, tags, links, ids_paths, validity_domain, constraints to SNCandidate model. Update compose_worker to pass all fields through to state.composed.

Phase 3: Enhance compose_system prompt with rich output format, documentation template, tags vocabulary, kind rules, links guidance.

Phase 5: Add soft validation checks for description length, doc length, unit validity, kind enum, tags vocabulary, links references.

Import reviewed YAML catalog entries back into the graph as accepted StandardName nodes. Derives grammar fields via name parsing, maps catalog fields to graph schema, preserves graph-only fields. Catalog-owned fields (description, documentation, kind, tags, etc.) use direct SET for authoritative overwrite. Graph-only fields (embedding, model, generated_at) are preserved via coalesce. Includes dry-run mode, tag filtering, and comprehensive tests.
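The two write modes on import can be sketched with plain dictionaries. This is an illustration under assumptions, not the project's Cypher: `apply_import` is a hypothetical helper, the field tuples list only the examples named in the commit message, and the `review_status` property name is assumed.

```python
# Example fields only; the real lists are longer ("etc." in the commit message).
CATALOG_OWNED = ("description", "documentation", "kind", "tags")
GRAPH_ONLY = ("embedding", "model", "generated_at")


def apply_import(node: dict, entry: dict) -> dict:
    """Overwrite catalog-owned fields and leave graph-only fields untouched."""
    updated = dict(node)
    for field in CATALOG_OWNED:
        # Direct SET: the catalog wins, even when it clears a value.
        updated[field] = entry.get(field)
    for field in GRAPH_ONLY:
        # Preserved: the catalog never carries these fields.
        updated[field] = node.get(field)
    updated["review_status"] = "accepted"  # assumed property name
    return updated


node = {"description": "old text", "tags": ["draft"], "embedding": [0.1, 0.2], "model": "m1"}
entry = {"description": "Reviewed text", "kind": "scalar", "tags": ["core_profiles"]}
imported = apply_import(node, entry)
```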

Phase 2 of catalog feedback import:
- Add catalog_commit_sha and imported_at to StandardName schema
- Resolve git HEAD SHA of catalog repo at import time
- Store catalog_commit_sha on each imported node
- Add check_catalog() for catalog-vs-graph sync comparison
- Add --check flag to sn import-catalog CLI command
- Report only-in-catalog, only-in-graph, and diverged entries
- 21 new tests covering SHA resolution, version tracking, idempotency, check mode, and field normalization (49 total)
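The three report categories amount to a set comparison keyed by standard name. A minimal sketch, assuming both sides are available as name-to-fields mappings; `compare_catalog` is a hypothetical helper and does not reproduce check_catalog() itself.

```python
def compare_catalog(catalog: dict[str, dict], graph: dict[str, dict]) -> dict[str, list[str]]:
    """Classify names as only-in-catalog, only-in-graph, or diverged."""
    shared = catalog.keys() & graph.keys()
    return {
        "only_in_catalog": sorted(catalog.keys() - graph.keys()),
        "only_in_graph": sorted(graph.keys() - catalog.keys()),
        # Present on both sides but with differing field values.
        "diverged": sorted(name for name in shared if catalog[name] != graph[name]),
    }


# Illustrative entries only.
catalog = {
    "electron_temperature": {"kind": "scalar"},
    "plasma_current": {"kind": "scalar"},
}
graph = {
    "electron_temperature": {"kind": "vector"},
    "ion_temperature": {"kind": "scalar"},
}
report = compare_catalog(catalog, graph)
```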

Rename CLI command for symmetry with sn publish:
- @sn.command("import-catalog") → @sn.command("import")
- sn_import_catalog() → sn_import()
- Update help text examples and schema description
- Add TestPublishImportRoundTrip with 3 round-trip tests

- Add rich fields to SNPublishEntry: documentation, links, ids_paths, constraints, validity_domain; change kind default from 'physical' to 'scalar'
- Rewrite generate_yaml_entry() to include all catalog fields; empty optional fields omitted from output; links serialized as [{name: ...}]
- Update generate_catalog_files() to group entries into tag-based subdirectories (primary tag / name.yaml); untagged -> unscoped/
- Update graph_records_to_entries() to carry rich fields through from graph records without loss
- Update get_validated_standard_names() with review_status filter, CANONICAL_UNITS traversal, and full rich-field RETURN clause
- Add update_review_status() to graph_ops.py for batch status updates
- Call update_review_status() in CLI sn publish after YAML generation
- Update check_catalog_duplicates() to use rglob for subdirectory scan
- Update tests: fix kind references, add directory structure tests, rich-field round-trip tests, subdirectory duplicate detection (46 tests)
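The directory layout for published entries can be sketched as a small path helper. It assumes the primary tag is the first tag in the list, which the commit message does not state; `catalog_path` is a hypothetical name.

```python
from pathlib import PurePosixPath


def catalog_path(name: str, tags: list[str]) -> PurePosixPath:
    """Place an entry under its primary tag, or under unscoped/ when untagged."""
    directory = tags[0] if tags else "unscoped"  # assumption: primary tag = first tag
    return PurePosixPath(directory) / f"{name}.yaml"


tagged = catalog_path("electron_temperature", ["core_profiles", "kinetics"])
untagged = catalog_path("plasma_current", [])
```

Because entries now sit one level below the catalog root, a duplicate scan has to recurse, which matches the switch to rglob in the same commit.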

- Add sn_tools.py with search/fetch/list standard name tools following the search_tools.py pattern (hybrid vector+keyword search)
- Register search_standard_names, fetch_standard_names, list_standard_names in server.py (available in both dd_only and full mode)
- Add benchmark_labels.yaml with quality tier anchors (outstanding/good/adequate/poor)
- Add load_quality_labels() and score_with_reviewer() to benchmark.py
- Add --reviewer-model CLI option to sn benchmark command
- Extend ModelResult with quality_scores, quality_distribution, avg_quality_score, avg_doc_length, avg_fields_populated fields
- Update render_comparison_table() to show quality distribution table when reviewer used
- Add tests: test_sn_tools.py (38 tests) and TestQualityLabels/TestReviewerModelCLI in test_benchmark.py (9 tests)

Adds tests/sn/test_integration.py with two test classes:

TestEmbeddingCoverage:
- test_write_preserves_existing_embedding: asserts write_standard_names Cypher never sets sn.embedding from a batch param
- test_import_preserves_existing_embedding: asserts _write_catalog_entries Cypher uses coalesce(sn.embedding, null) and coalesce(sn.embedded_at, null)
- test_embedding_field_not_in_write_batch: asserts no 'embedding' key appears in the gc.query batch dict

TestCoalesceSafety:
- test_build_does_not_erase_imported_data: verifies coalesce(b.field, sn.field) for all 15 optional fields in the MERGE SET clause
- test_build_with_none_fields_preserves_graph: verifies absent fields appear as None in batch (required for coalesce evaluation)
- test_created_at_preserved_on_rewrite: asserts coalesce(sn.created_at, datetime()) pattern preserves original creation timestamp
- test_import_then_build_preserves_catalog_fields: end-to-end mock walk through import then build, confirming coalesce semantics in both

- get_dd_overview → get_dd_catalog (remove query/include_unit_stats params)
- analyze_dd_structure + get_ids_structure → get_ids_summary (trimmed output)
- get_dd_path_context → find_related_dd_paths
- Remove export_imas_ids/export_imas_domain from MCP registration
- Remove facade delegation tests
- Update all backend methods, REPL functions, formatters, and tests
- Rename format_search_imas_report → format_search_dd_report
- Include host.py migration cleanup

…import)

GraphClient.query() signature is (cypher, **params) not (cypher, dict). All 4 calls were passing a dict as a positional argument, causing 'takes 2 positional arguments but 3 were given' at runtime.
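The failure mode is easy to reproduce with a stand-in class that has the same signature. The `GraphClient` below is a stub for illustration, not the real client, and the Cypher text and `id` property are assumptions.

```python
class GraphClient:
    """Stub with the (cypher, **params) signature from the commit message."""

    def query(self, cypher, **params):
        return cypher, params


gc = GraphClient()
cypher = "MATCH (sn:StandardName {id: $name}) RETURN sn"

# Buggy call: the dict arrives as a second positional argument.
try:
    gc.query(cypher, {"name": "electron_temperature"})
    error = None
except TypeError as exc:
    error = str(exc)

# Fixed calls: pass parameters as keywords, or unpack an existing dict.
result = gc.query(cypher, name="electron_temperature")
result_unpacked = gc.query(cypher, **{"name": "electron_temperature"})
```

The count in the error message includes `self`, which is why one dict passed after the query string is reported as a third positional argument.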

- Add Standard Names section with CLI commands table (build, publish, import, status, benchmark)
- Document StandardName lifecycle (drafted → published → accepted)
- Document write semantics: build (coalesce) vs import (authoritative)
- Document MCP tools (search, fetch, list standard names)
- Document StandardName schema and key relationships
- Update plans/README.md: mark features 11-14 as Done

lifecycle_status is NULL on 98.5% of IMASNode nodes (19,734 of 20,037). Per schema: NULL means 'inherits IDS-level lifecycle' which defaults to active. The filter 'p.lifecycle_status = active' matched nothing. Fix: lifecycle_filter=active now matches NULL OR active. Also wires physics_domain/lifecycle_filter into search_dd_paths backend (was accepted by MCP tool but silently ignored).
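The filter semantics introduced by this fix can be sketched as a predicate. `matches_lifecycle` is a hypothetical helper written in Python; the project applies the equivalent condition in its graph queries.

```python
from typing import Optional


def matches_lifecycle(status: Optional[str], lifecycle_filter: Optional[str]) -> bool:
    """Return True when a node's lifecycle_status passes the requested filter."""
    if lifecycle_filter is None:
        return True  # no filter requested
    if lifecycle_filter == "active":
        # NULL means "inherits the IDS-level lifecycle", which defaults to active.
        return status is None or status == "active"
    return status == lifecycle_filter


# Illustrative statuses: two of the three pass an "active" filter.
statuses = [None, "active", "alpha"]
active_hits = [s for s in statuses if matches_lifecycle(s, "active")]
```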

… graph

Resolve lifecycle_status for all IMASNode data fields by inheriting from the parent IDS when not explicitly set in the DD XML. Previously 98.5% of fields had NULL lifecycle_status, requiring runtime NULL-handling that incorrectly treated 12,264 alpha-inherited nodes as active.

Build pipeline: _batch_create_path_nodes now accepts ids_info and resolves inheritance post-version-diff to avoid false IMASNodeChange records.

Graph migration: backfilled 19,734 NULL nodes via batched Cypher (7,470 active, 12,264 alpha, 277 obsolescent, 26 alpha-override).

Reverted broken NULL-or-active workaround in list_dd_paths and search_dd_paths post-filter; both are now simple equality checks.

Formatter enhancements:
- Catalog: [alpha] tag on non-active IDS entries
- List: [alpha]/[obsolescent] suffix on non-active paths
- IDS summary: lifecycle distribution of child paths
- list_dd_paths query now returns lifecycle_status in path_details

Updated test_field_lifecycle_status to assert all data nodes have explicit lifecycle_status and valid values include active. Updated imas_dd.yaml schema description to reflect build-time resolution.
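Build-time resolution reduces to a three-step fallback: explicit value, then parent IDS value, then active. A minimal sketch under that reading; `resolve_lifecycle` is a hypothetical helper and the sample records are invented.

```python
from typing import Optional


def resolve_lifecycle(node_status: Optional[str], ids_status: Optional[str]) -> str:
    """Resolve a field's lifecycle_status once, so queries can use plain equality."""
    if node_status is not None:
        return node_status  # explicit value in the DD XML wins
    if ids_status is not None:
        return ids_status  # otherwise inherit from the parent IDS
    return "active"  # IDS-level default


# (explicit field status, parent IDS status), illustrative values only.
samples = [(None, "alpha"), (None, "active"), ("obsolescent", "active"), (None, None)]
resolved = [resolve_lifecycle(node, ids) for node, ids in samples]
```

Resolving once at build time is what lets the runtime filters drop their NULL handling: a field under an alpha IDS is stored as alpha instead of being read as active.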

…aph_schema on dd-only

- Add lifecycle_status to check_dd_paths query, result model, and formatter
- Add lifecycle_status to get_dd_version_context per-path query and formatter
- Add lifecycle_status column to get_dd_changelog query and formatter table
- Gate get_graph_schema behind dd-only guard (REPL companion, not needed without REPL)

Rename the CLI command, pipeline function, docstrings, and all plan/documentation references from 'build' to 'mint'. The term better reflects the nature of standard name generation.

RC releases now warn on dirty worktrees instead of failing, since parallel agents frequently modify files concurrently. Final releases still require a clean worktree.
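The relaxed check can be sketched as a single guard function. `check_worktree` and its messages are hypothetical; the release CLI's real function names and wording are not shown on this page.

```python
import warnings


def check_worktree(is_dirty: bool, is_rc: bool) -> None:
    """Warn on a dirty worktree for RC releases; refuse for final releases."""
    if not is_dirty:
        return
    if is_rc:
        warnings.warn("worktree is dirty; continuing because this is an RC release")
        return
    raise RuntimeError("final releases require a clean worktree")


# RC release with uncommitted changes: a warning is recorded, nothing is raised.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    check_worktree(is_dirty=True, is_rc=True)
```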

Consolidate plans 16 (benchmark parity), 17 (lifecycle management), and 18 (calibration) into a single fleet-ready plan with 4 phases and 6 agent dispatches. Incorporates rubber-duck critique: relationship-first deletion, DD-only scope, split calibration from reference expansion, cache smoke test in Phase 1.

Add reset_standard_names() and clear_standard_names() with relationship-first deletion safety model. Add sn reset, sn clear CLI commands and --reset-to option on sn mint.

Replace build_grammar_context() with build_compose_context() for rich grammar context. Add system/user message split for prompt caching. Preserve cluster_context through extraction.

Empty 'env:' with no values causes GitHub Actions to reject the workflow YAML before any jobs start (0 jobs, immediate failure).

Add entries for core_transport (heat/particle flux for electron and ion), mhd_linear (growth_rate, mhd_frequency), nbi (power, energy of NBI), and edge_profiles (electron temperature and density at edge region). Expand core_profiles from 8 to 12 entries: parallel electric field, bootstrap and ohmic current density, ion toroidal velocity. Expand magnetics from 2 to 6 entries: flux_loop poloidal flux, rogowski_coil plasma current, total plasma current, diamagnetic flux. Expand summary from 2 to 4 entries: toroidal beta, energy confinement time. Add 2 more equilibrium entries: profiles_1d psi, magnetic_axis vertical position. Fix physically incorrect rogowski_coil entry on magnetic_axis/r: replaced geometric_base/object combo with major_radius at MAGNETIC_AXIS position.

Create benchmark_calibration.yaml with 15 entries across 4 quality tiers. Replace inline reviewer rubric with Jinja2 template (sn/review_benchmark). Add 5-dimensional scoring: grammar, semantic, docs, convention, completeness. Retire benchmark_labels.yaml in favor of calibration dataset.

Add LLMResult class to llm.py for backward-compatible cache token exposure — supports 3-tuple unpacking while carrying cache_read_tokens and cache_creation_tokens from provider prompt caching. Add extract_cache_tokens() public function mirroring _log_cache_metrics extraction logic but returning values instead of logging. Add cache_read_tokens and cache_creation_tokens to ModelResult dataclass. Accumulate cache tokens in _run_model() using getattr fallback for mock compatibility. Display Cache % column in benchmark comparison table. Create model selection runbook with CLI commands, cost guidance, approved model list, decision criteria table, and cache optimization tips.
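Backward-compatible 3-tuple unpacking can be achieved by making the result object iterable over its original three values. The sketch below shows the pattern only: `LLMResultSketch` is hypothetical, and the three legacy fields (content, cost, tokens) are assumed, since the commit message does not name them.

```python
from dataclasses import dataclass


@dataclass
class LLMResultSketch:
    # Assumed legacy fields that older callers unpack as a 3-tuple.
    content: str
    cost: float
    tokens: int
    # Extra fields named in the commit message.
    cache_read_tokens: int = 0
    cache_creation_tokens: int = 0

    def __iter__(self):
        """Yield only the legacy fields so `a, b, c = result` keeps working."""
        yield self.content
        yield self.cost
        yield self.tokens


result = LLMResultSketch("ok", 0.002, 120, cache_read_tokens=80)

# Existing call sites keep unpacking three values.
content, cost, tokens = result

# Newer call sites read cache counters, with a fallback for mocks lacking them.
cache_read = getattr(result, "cache_read_tokens", 0)
```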

Update AGENTS.md with sn mint (renamed from sn build), sn reset, sn clear, --reset-to flag, benchmark cache reporting, and 5-dimensional scoring. Add SN module paths to project-dev skill, LLM proxy note to service-ops skill, and SN key files table to engineer agent. Delete superseded plans 16, 17, 18. Mark Plan 19 complete.

Simon-McIntosh added 30 commits April 10, 2026 06:30

refactor(sn): rename ids_paths to imas_paths and add physics_domain f…

a2b28a3

…or schema consistency

docs: plan to import PhysicsDomain from imas-standard-names

952f9ec

chore: increase smoke test health check timeout margin

840d54c

chore: migrate MCP config to project root with .vscode symlink

a102951

test: add TestE2ERoundTrip for full SN lifecycle (build→publish→edit→…

6b0c3cb

…import)

fix: use keyword args for GraphClient.query() in sn_tools

a2a89ec

GraphClient.query() signature is (cypher, **params) not (cypher, dict). All 4 calls were passing a dict as a positional argument, causing 'takes 2 positional arguments but 3 were given' at runtime.

fix: pin setup-uv action to v8.0.0 (immutable tag)

b2a2b2b

docs: add standard name feature plans

8fdd57e

docs: update standard name implementation order

d1fe720

feat: rename sn build to sn mint

312ce76

Rename the CLI command, pipeline function, docstrings, and all plan/documentation references from 'build' to 'mint'. The term better reflects the nature of standard name generation.

fix: relax clean worktree check for RC releases

256e519

RC releases now warn on dirty worktrees instead of failing, since parallel agents frequently modify files concurrently. Final releases still require a clean worktree.

Simon-McIntosh added 8 commits April 10, 2026 16:01

feat: add sn reset and sn clear lifecycle commands

5d03485

Add reset_standard_names() and clear_standard_names() with relationship-first deletion safety model. Add sn reset, sn clear CLI commands and --reset-to option on sn mint.

feat: fix benchmark prompt parity with mint pipeline

4d2eedd

Replace build_grammar_context() with build_compose_context() for rich grammar context. Add system/user message split for prompt caching. Preserve cluster_context through extraction.

fix: remove empty env blocks breaking CI workflows

8499610

Empty 'env:' with no values causes GitHub Actions to reject the workflow YAML before any jobs start (0 jobs, immediate failure).

Simon-McIntosh merged commit 5b48611 into iterorganization:main Apr 10, 2026
1 of 5 checks passed

Simon-McIntosh temporarily deployed to github-pages April 10, 2026 14:55 — with GitHub Pages Inactive

Simon-McIntosh merged 38 commits into iterorganization:main from Simon-McIntosh:main
