Skip to content

v5.3.0: Standard names pipeline, DD lifecycle, and CI hardening#35

Merged
Simon-McIntosh merged 38 commits intoiterorganization:mainfrom
Simon-McIntosh:main
Apr 10, 2026
Merged

v5.3.0: Standard names pipeline, DD lifecycle, and CI hardening#35
Simon-McIntosh merged 38 commits intoiterorganization:mainfrom
Simon-McIntosh:main

Conversation

@Simon-McIntosh
Copy link
Copy Markdown
Collaborator

v5.3.0 Release

37 commits | 83 files changed | +9,385 / -2,113 lines

Standard Names Pipeline (14 commits)

Complete implementation of the sn CLI subcommand group for generating, validating, and managing IMAS standard names:

  • Schema: Extended StandardName with rich fields (physics_domain, grammar decomposition, imas_paths, confidence) and fixed persistence bugs
  • Mint pipeline (sn mint): Renamed from sn build for clarity. LLM-powered generation of standard names from DD paths or facility signals via compose/validate workers with extended SNCandidate fields
  • Publish (sn publish): Lossless export to YAML catalog with rich fields preserved for human review
  • Import (sn import): Catalog feedback loop with version tracking and --check mode for dry-run validation. Catalog is authoritative on import
  • Lifecycle commands: sn reset and sn clear for managing standard name lifecycle states
  • MCP tools: search_standard_names, fetch_standard_names, list_standard_names exposed via the MCP server with benchmark quality tiers

Standard Names Benchmarking (5 commits)

Evaluation framework for measuring LLM standard name generation quality:

  • Reference dataset expanded to 52 entries across 8 IDSs with calibration dataset
  • Enhanced reviewer with cache reporting and model selection runbook
  • Prompt parity between benchmark and mint pipeline ensured

DD Lifecycle & MCP Tool Improvements (5 commits)

  • Lifecycle inheritance: lifecycle_status resolved at build time from parent IDS lifecycle, backfilled into graph. NULL treated as active in search/list filters
  • Lifecycle surfaced: check_dd_paths, get_dd_version_context, and get_dd_changelog now report lifecycle status
  • MCP tool rename: Consistent dd/ids naming convention across all tools
  • PhysicsDomain: Imported from imas-standard-names PyPI package instead of local definition
  • dd-only gating: get_graph_schema hidden in dd-only mode

Tests (2 commits)

  • Full end-to-end TestE2ERoundTrip for standard name lifecycle (mint→publish→edit→import)
  • Embedding coverage and coalesce safety integration tests for graph data quality

CI/CD & Infrastructure (7 commits)

  • GitHub Actions updated to Node.js 24 compatible versions
  • setup-uv pinned to immutable v8.0.0 tag
  • Empty env: blocks removed (broke CI workflows)
  • Dead IMAS_DD_VERSION removed from container build
  • ACR image name hardcoded for fork RC builds to reach Azure
  • Smoke test health check timeout increased
  • Release CLI: relaxed clean worktree check for RC releases

Configuration & Documentation (6 commits)

  • MCP config migrated to project root with .vscode symlink
  • AGENTS.md updated with standard names CLI, lifecycle, MCP tools, and fork/main workflow
  • Feature plans and implementation order documented

…ence bugs

Phase 1:
- Add 12 rich fields to StandardName schema (documentation, kind, tags,
  links, ids_paths, validity_domain, constraints, subject, component,
  coordinate, position, process)
- Add StandardNameKind enum (scalar/vector/metadata)
- Fix StandardNameReviewStatus: rename candidate→drafted, add published
- Fix MEASURES→HAS_STANDARD_NAME in schema doc and signals.py query

Phase 4:
- Fix coalesce bug in write_standard_names — all fields use
  coalesce(new, existing) to prevent data loss on re-runs
- Write all rich fields to graph
- Create CANONICAL_UNITS relationship per schema range convention
- Wire embedding generation in persist_worker

Tests:
- Add tests/sn/test_graph_ops.py (12 tests) covering coalesce,
  DD/signal/unit relationships, and query filtering
- Add tests/sn/conftest.py with shared fixtures
- Update test_publish.py for candidate→drafted rename
Fork CI builds used github.repository for ACR image path, producing
simon-mcintosh/imas-codex instead of iterorganization/imas-codex.
Azure watches the upstream path only, so fork RCs were invisible.

Split into IMAGE_NAME (per-fork, for GHCR) and ACR_IMAGE_NAME
(hardcoded upstream path, for ACR). GHCR stays per-fork since
each fork has its own container registry namespace.
Replace codegen'd PhysicsDomain enum (22 values from LinkML schema)
with import from imas-standard-names PyPI package (32 values, StrEnum).

Deleted: physics_domains.yaml, gen_physics_domains.py, domains.yaml
Removed: codegen steps from build_models.py and hatch_build_hooks.py
Added: imas_codex/core/physics_domain.py as tracked re-export file

BREAKING CHANGE: PhysicsDomain enum now has 32 values (was 22).
New values: core_plasma_physics, fast_particles, runaway_electrons,
waves, fueling, plasma_initiation, spectroscopy, neutronics,
gyrokinetics, plasma_measurement_diagnostics.
The IMAS_DD_VERSION env var and build-arg were passed to docker build
but the Dockerfile never declared a matching ARG — it was dead code.
The container gets its DD version from the graph data loaded from GHCR,
not from a build argument.

The DD version single source of truth is pyproject.toml under
[tool.imas-codex.data-dictionary].version, read by get_dd_version()
at runtime. The test workflow correctly uses IMAS_DD_VERSION as an
env var override for multi-version matrix testing.
Replace multi-branch workflow with fork-based main-only workflow.
All development happens on fork's main branch — no feature branches.
Document ACR deployment path and Azure test URL. Add rule against
pushing same tag to both origin and upstream.
Includes:
- fix: hardcode ACR image name so fork RC builds reach Azure
- fix: remove dead IMAS_DD_VERSION from container build workflow
- docs: update AGENTS.md with fork/main workflow
- refactor: import PhysicsDomain from imas-standard-names

# Conflicts:
#	pyproject.toml
#	uv.lock
actions/checkout v4→v6, actions/upload-artifact v4→v7,
actions/cache v4→v5, astral-sh/setup-uv v5→v8,
codecov/codecov-action v4→v6, softprops/action-gh-release v1→v2,
actions/attest-build-provenance v1→v4.

Remove FORCE_JAVASCRIPT_ACTIONS_TO_NODE24 env var — no longer needed
with native Node 24 actions.
…date workers

Phase 2: Add description, documentation, unit, kind, tags, links,
ids_paths, validity_domain, constraints to SNCandidate model.
Update compose_worker to pass all fields through to state.composed.

Phase 3: Enhance compose_system prompt with rich output format,
documentation template, tags vocabulary, kind rules, links guidance.

Phase 5: Add soft validation checks for description length, doc
length, unit validity, kind enum, tags vocabulary, links references.
Import reviewed YAML catalog entries back into the graph as accepted
StandardName nodes. Derives grammar fields via name parsing, maps
catalog fields to graph schema, preserves graph-only fields.

Catalog-owned fields (description, documentation, kind, tags, etc.)
use direct SET for authoritative overwrite. Graph-only fields
(embedding, model, generated_at) are preserved via coalesce.

Includes dry-run mode, tag filtering, and comprehensive tests.
Phase 2 of catalog feedback import:
- Add catalog_commit_sha and imported_at to StandardName schema
- Resolve git HEAD SHA of catalog repo at import time
- Store catalog_commit_sha on each imported node
- Add check_catalog() for catalog-vs-graph sync comparison
- Add --check flag to sn import-catalog CLI command
- Report only-in-catalog, only-in-graph, and diverged entries
- 21 new tests covering SHA resolution, version tracking,
  idempotency, check mode, and field normalization (49 total)
Rename CLI command for symmetry with sn publish:
- @sn.command("import-catalog") → @sn.command("import")
- sn_import_catalog() → sn_import()
- Update help text examples and schema description
- Add TestPublishImportRoundTrip with 3 round-trip tests
- Add rich fields to SNPublishEntry: documentation, links, ids_paths,
  constraints, validity_domain; change kind default from 'physical' to 'scalar'
- Rewrite generate_yaml_entry() to include all catalog fields; empty
  optional fields omitted from output; links serialized as [{name: ...}]
- Update generate_catalog_files() to group entries into tag-based
  subdirectories (primary tag / name.yaml); untagged -> unscoped/
- Update graph_records_to_entries() to carry rich fields through from
  graph records without loss
- Update get_validated_standard_names() with review_status filter,
  CANONICAL_UNITS traversal, and full rich-field RETURN clause
- Add update_review_status() to graph_ops.py for batch status updates
- Call update_review_status() in CLI sn publish after YAML generation
- Update check_catalog_duplicates() to use rglob for subdirectory scan
- Update tests: fix kind references, add directory structure tests,
  rich-field round-trip tests, subdirectory duplicate detection (46 tests)
- Add sn_tools.py with search/fetch/list standard name tools
  following the search_tools.py pattern (hybrid vector+keyword search)
- Register search_standard_names, fetch_standard_names, list_standard_names
  in server.py (available in both dd_only and full mode)
- Add benchmark_labels.yaml with quality tier anchors (outstanding/good/adequate/poor)
- Add load_quality_labels() and score_with_reviewer() to benchmark.py
- Add --reviewer-model CLI option to sn benchmark command
- Extend ModelResult with quality_scores, quality_distribution, avg_quality_score,
  avg_doc_length, avg_fields_populated fields
- Update render_comparison_table() to show quality distribution table when reviewer used
- Add tests: test_sn_tools.py (38 tests) and TestQualityLabels/TestReviewerModelCLI
  in test_benchmark.py (9 tests)
Adds tests/sn/test_integration.py with two test classes:

TestEmbeddingCoverage:
- test_write_preserves_existing_embedding: asserts write_standard_names
  Cypher never sets sn.embedding from a batch param
- test_import_preserves_existing_embedding: asserts _write_catalog_entries
  Cypher uses coalesce(sn.embedding, null) and coalesce(sn.embedded_at, null)
- test_embedding_field_not_in_write_batch: asserts no 'embedding' key
  appears in the gc.query batch dict

TestCoalesceSafety:
- test_build_does_not_erase_imported_data: verifies coalesce(b.field,
  sn.field) for all 15 optional fields in the MERGE SET clause
- test_build_with_none_fields_preserves_graph: verifies absent fields
  appear as None in batch (required for coalesce evaluation)
- test_created_at_preserved_on_rewrite: asserts coalesce(sn.created_at,
  datetime()) pattern preserves original creation timestamp
- test_import_then_build_preserves_catalog_fields: end-to-end mock walk
  through import then build, confirming coalesce semantics in both
- get_dd_overview → get_dd_catalog (remove query/include_unit_stats params)
- analyze_dd_structure + get_ids_structure → get_ids_summary (trimmed output)
- get_dd_path_context → find_related_dd_paths
- Remove export_imas_ids/export_imas_domain from MCP registration
- Remove facade delegation tests
- Update all backend methods, REPL functions, formatters, and tests
- Rename format_search_imas_report → format_search_dd_report
- Include host.py migration cleanup
GraphClient.query() signature is (cypher, **params) not (cypher, dict).
All 4 calls were passing a dict as a positional argument, causing
'takes 2 positional arguments but 3 were given' at runtime.
- Add Standard Names section with CLI commands table (build, publish,
  import, status, benchmark)
- Document StandardName lifecycle (drafted → published → accepted)
- Document write semantics: build (coalesce) vs import (authoritative)
- Document MCP tools (search, fetch, list standard names)
- Document StandardName schema and key relationships
- Update plans/README.md: mark features 11-14 as Done
lifecycle_status is NULL on 98.5% of IMASNode nodes (19,734 of 20,037).
Per schema: NULL means 'inherits IDS-level lifecycle' which defaults to
active. The filter 'p.lifecycle_status = active' matched nothing.

Fix: lifecycle_filter=active now matches NULL OR active.

Also wires physics_domain/lifecycle_filter into search_dd_paths backend
(was accepted by MCP tool but silently ignored).
… graph

Resolve lifecycle_status for all IMASNode data fields by inheriting
from the parent IDS when not explicitly set in the DD XML. Previously
98.5% of fields had NULL lifecycle_status, requiring runtime
NULL-handling that incorrectly treated 12,264 alpha-inherited nodes
as active.

Build pipeline: _batch_create_path_nodes now accepts ids_info and
resolves inheritance post-version-diff to avoid false IMASNodeChange
records.

Graph migration: backfilled 19,734 NULL nodes via batched Cypher
(7,470 active, 12,264 alpha, 277 obsolescent, 26 alpha-override).

Reverted broken NULL-or-active workaround in list_dd_paths and
search_dd_paths post-filter — now simple equality checks.

Formatter enhancements:
- Catalog: [alpha] tag on non-active IDS entries
- List: [alpha]/[obsolescent] suffix on non-active paths
- IDS summary: lifecycle distribution of child paths
- list_dd_paths query now returns lifecycle_status in path_details

Updated test_field_lifecycle_status to assert all data nodes have
explicit lifecycle_status and valid values include active.

Updated imas_dd.yaml schema description to reflect build-time
resolution.
…aph_schema on dd-only

- Add lifecycle_status to check_dd_paths query, result model, and formatter
- Add lifecycle_status to get_dd_version_context per-path query and formatter
- Add lifecycle_status column to get_dd_changelog query and formatter table
- Gate get_graph_schema behind dd-only guard (REPL companion, not needed without REPL)
Rename the CLI command, pipeline function, docstrings, and all
plan/documentation references from 'build' to 'mint'. The term
better reflects the nature of standard name generation.
RC releases now warn on dirty worktrees instead of failing,
since parallel agents frequently modify files concurrently.
Final releases still require a clean worktree.
Consolidate plans 16 (benchmark parity), 17 (lifecycle management),
and 18 (calibration) into a single fleet-ready plan with 4 phases
and 6 agent dispatches. Incorporates rubber-duck critique:
relationship-first deletion, DD-only scope, split calibration
from reference expansion, cache smoke test in Phase 1.
Add reset_standard_names() and clear_standard_names() with
relationship-first deletion safety model. Add sn reset, sn clear
CLI commands and --reset-to option on sn mint.
Replace build_grammar_context() with build_compose_context() for
rich grammar context. Add system/user message split for prompt
caching. Preserve cluster_context through extraction.
Empty 'env:' with no values causes GitHub Actions to reject
the workflow YAML before any jobs start (0 jobs, immediate failure).
Add entries for core_transport (heat/particle flux for electron and ion),
mhd_linear (growth_rate, mhd_frequency), nbi (power, energy of NBI),
and edge_profiles (electron temperature and density at edge region).

Expand core_profiles from 8 to 12 entries: parallel electric field,
bootstrap and ohmic current density, ion toroidal velocity.

Expand magnetics from 2 to 6 entries: flux_loop poloidal flux,
rogowski_coil plasma current, total plasma current, diamagnetic flux.

Expand summary from 2 to 4 entries: toroidal beta, energy confinement time.

Add 2 more equilibrium entries: profiles_1d psi, magnetic_axis vertical position.

Fix physically incorrect rogowski_coil entry on magnetic_axis/r: replaced
geometric_base/object combo with major_radius at MAGNETIC_AXIS position.
Create benchmark_calibration.yaml with 15 entries across 4 quality tiers.
Replace inline reviewer rubric with Jinja2 template (sn/review_benchmark).
Add 5-dimensional scoring: grammar, semantic, docs, convention, completeness.
Retire benchmark_labels.yaml in favor of calibration dataset.
Add LLMResult class to llm.py for backward-compatible cache token
exposure — supports 3-tuple unpacking while carrying cache_read_tokens
and cache_creation_tokens from provider prompt caching.

Add extract_cache_tokens() public function mirroring _log_cache_metrics
extraction logic but returning values instead of logging.

Add cache_read_tokens and cache_creation_tokens to ModelResult dataclass.
Accumulate cache tokens in _run_model() using getattr fallback for mock
compatibility. Display Cache % column in benchmark comparison table.

Create model selection runbook with CLI commands, cost guidance, approved
model list, decision criteria table, and cache optimization tips.
Update AGENTS.md with sn mint (renamed from sn build), sn reset, sn clear,
--reset-to flag, benchmark cache reporting, and 5-dimensional scoring.
Add SN module paths to project-dev skill, LLM proxy note to service-ops
skill, and SN key files table to engineer agent.
Delete superseded plans 16, 17, 18. Mark Plan 19 complete.
@Simon-McIntosh Simon-McIntosh merged commit 5b48611 into iterorganization:main Apr 10, 2026
1 of 5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant