v5.3.0: Standard names pipeline, DD lifecycle, and CI hardening #35
Merged
Simon-McIntosh merged 38 commits into iterorganization:main on Apr 10, 2026
…ence bugs
Phase 1:
- Add 12 rich fields to StandardName schema (documentation, kind, tags, links, ids_paths, validity_domain, constraints, subject, component, coordinate, position, process)
- Add StandardNameKind enum (scalar/vector/metadata)
- Fix StandardNameReviewStatus: rename candidate→drafted, add published
- Fix MEASURES→HAS_STANDARD_NAME in schema doc and signals.py query
Phase 4:
- Fix coalesce bug in write_standard_names: all fields use coalesce(new, existing) to prevent data loss on re-runs
- Write all rich fields to graph
- Create CANONICAL_UNITS relationship per schema range convention
- Wire embedding generation in persist_worker
Tests:
- Add tests/sn/test_graph_ops.py (12 tests) covering coalesce, DD/signal/unit relationships, and query filtering
- Add tests/sn/conftest.py with shared fixtures
- Update test_publish.py for candidate→drafted rename
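The coalesce(new, existing) rule from Phase 4 can be illustrated in plain Python. This is only a sketch of the semantics, not the project's Cypher: coalesce and merge_on_rerun are hypothetical helpers, and the field values are made up for the example.

```python
def coalesce(*values):
    """Return the first argument that is not None, like Cypher's coalesce()."""
    for value in values:
        if value is not None:
            return value
    return None


def merge_on_rerun(existing: dict, new: dict) -> dict:
    """Merge a fresh batch row into a stored node without erasing data.

    Each field takes the new value when one is present and falls back to
    the stored value when the new batch carries None (hypothetical helper).
    """
    fields = set(existing) | set(new)
    return {f: coalesce(new.get(f), existing.get(f)) for f in fields}


# Illustrative values only.
stored = {"description": "Electron temperature", "kind": "scalar", "tags": ["core"]}
rerun = {"description": None, "kind": "scalar", "tags": None}

merged = merge_on_rerun(stored, rerun)
print(merged["description"])  # the stored description survives the re-run
```

A re-run that produces no value for a field therefore leaves the stored value alone, while a re-run that does produce a value replaces it.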
…or schema consistency
Fork CI builds used github.repository for ACR image path, producing simon-mcintosh/imas-codex instead of iterorganization/imas-codex. Azure watches the upstream path only, so fork RCs were invisible. Split into IMAGE_NAME (per-fork, for GHCR) and ACR_IMAGE_NAME (hardcoded upstream path, for ACR). GHCR stays per-fork since each fork has its own container registry namespace.
Replace codegen'd PhysicsDomain enum (22 values from LinkML schema) with import from imas-standard-names PyPI package (32 values, StrEnum).
Deleted: physics_domains.yaml, gen_physics_domains.py, domains.yaml
Removed: codegen steps from build_models.py and hatch_build_hooks.py
Added: imas_codex/core/physics_domain.py as tracked re-export file
BREAKING CHANGE: PhysicsDomain enum now has 32 values (was 22). New values: core_plasma_physics, fast_particles, runaway_electrons, waves, fueling, plasma_initiation, spectroscopy, neutronics, gyrokinetics, plasma_measurement_diagnostics.
The IMAS_DD_VERSION env var and build-arg were passed to docker build but the Dockerfile never declared a matching ARG — it was dead code. The container gets its DD version from the graph data loaded from GHCR, not from a build argument. The DD version single source of truth is pyproject.toml under [tool.imas-codex.data-dictionary].version, read by get_dd_version() at runtime. The test workflow correctly uses IMAS_DD_VERSION as an env var override for multi-version matrix testing.
Replace multi-branch workflow with fork-based main-only workflow. All development happens on fork's main branch — no feature branches. Document ACR deployment path and Azure test URL. Add rule against pushing same tag to both origin and upstream.
Includes:
- fix: hardcode ACR image name so fork RC builds reach Azure
- fix: remove dead IMAS_DD_VERSION from container build workflow
- docs: update AGENTS.md with fork/main workflow
- refactor: import PhysicsDomain from imas-standard-names
# Conflicts:
# pyproject.toml
# uv.lock
actions/checkout v4→v6, actions/upload-artifact v4→v7, actions/cache v4→v5, astral-sh/setup-uv v5→v8, codecov/codecov-action v4→v6, softprops/action-gh-release v1→v2, actions/attest-build-provenance v1→v4. Remove FORCE_JAVASCRIPT_ACTIONS_TO_NODE24 env var — no longer needed with native Node 24 actions.
…date workers
Phase 2: Add description, documentation, unit, kind, tags, links, ids_paths, validity_domain, constraints to SNCandidate model. Update compose_worker to pass all fields through to state.composed.
Phase 3: Enhance compose_system prompt with rich output format, documentation template, tags vocabulary, kind rules, links guidance.
Phase 5: Add soft validation checks for description length, doc length, unit validity, kind enum, tags vocabulary, links references.
Import reviewed YAML catalog entries back into the graph as accepted StandardName nodes. Derives grammar fields via name parsing, maps catalog fields to graph schema, preserves graph-only fields. Catalog-owned fields (description, documentation, kind, tags, etc.) use direct SET for authoritative overwrite. Graph-only fields (embedding, model, generated_at) are preserved via coalesce. Includes dry-run mode, tag filtering, and comprehensive tests.
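The split between catalog-owned and graph-only fields can be sketched as follows. The field tuples are taken from the examples named in the commit message; apply_catalog_entry is a hypothetical helper, not the project's import code, and the sample values are invented.

```python
# Field groups as named in the commit message (both lists are partial).
CATALOG_OWNED = ("description", "documentation", "kind", "tags")
GRAPH_ONLY = ("embedding", "model", "generated_at")


def apply_catalog_entry(node: dict, entry: dict) -> dict:
    """Apply a reviewed catalog entry to a stored node (hypothetical helper)."""
    updated = dict(node)
    for field in CATALOG_OWNED:
        # Direct SET: the catalog wins, even when it clears a value.
        updated[field] = entry.get(field)
    for field in GRAPH_ONLY:
        # Preserved: the catalog never carries these, so keep what is stored.
        updated[field] = node.get(field)
    updated["review_status"] = "accepted"
    return updated


node = {
    "description": "LLM draft",
    "tags": ["draft"],
    "embedding": [0.1, 0.2],
    "model": "some-model",
    "review_status": "published",
}
entry = {"description": "Reviewed text", "kind": "scalar", "tags": ["core_profiles"]}

imported = apply_catalog_entry(node, entry)
```

The design choice is that human review is the final word on descriptive fields, while derived data such as embeddings survives an import untouched.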
Phase 2 of catalog feedback import:
- Add catalog_commit_sha and imported_at to StandardName schema
- Resolve git HEAD SHA of catalog repo at import time
- Store catalog_commit_sha on each imported node
- Add check_catalog() for catalog-vs-graph sync comparison
- Add --check flag to sn import-catalog CLI command
- Report only-in-catalog, only-in-graph, and diverged entries
- 21 new tests covering SHA resolution, version tracking, idempotency, check mode, and field normalization (49 total)
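The three report buckets can be computed with simple set operations. The sketch below assumes both sides are available as name-to-fields dictionaries; compare_catalog_to_graph and the compared field list are hypothetical and stand in for whatever check_catalog() does internally.

```python
def compare_catalog_to_graph(catalog: dict, graph: dict,
                             fields=("description", "kind", "tags")) -> dict:
    """Classify standard names by where they exist and whether they agree.

    Hypothetical helper: both arguments map a standard name to its fields.
    """
    catalog_names, graph_names = set(catalog), set(graph)
    diverged = sorted(
        name
        for name in catalog_names & graph_names
        if any(catalog[name].get(f) != graph[name].get(f) for f in fields)
    )
    return {
        "only_in_catalog": sorted(catalog_names - graph_names),
        "only_in_graph": sorted(graph_names - catalog_names),
        "diverged": diverged,
    }


# Illustrative data only.
catalog = {
    "electron_temperature": {"description": "Reviewed", "kind": "scalar"},
    "plasma_current": {"description": "Total current", "kind": "scalar"},
}
graph = {
    "electron_temperature": {"description": "Draft", "kind": "scalar"},
    "ion_temperature": {"description": "Ion T", "kind": "scalar"},
}

report = compare_catalog_to_graph(catalog, graph)
```

A check mode built this way only reads from both sides, so it can run repeatedly without changing the graph.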
Rename CLI command for symmetry with sn publish:
- @sn.command("import-catalog") → @sn.command("import")
- sn_import_catalog() → sn_import()
- Update help text examples and schema description
- Add TestPublishImportRoundTrip with 3 round-trip tests
- Add rich fields to SNPublishEntry: documentation, links, ids_paths,
constraints, validity_domain; change kind default from 'physical' to 'scalar'
- Rewrite generate_yaml_entry() to include all catalog fields; empty
optional fields omitted from output; links serialized as [{name: ...}]
- Update generate_catalog_files() to group entries into tag-based
subdirectories (primary tag / name.yaml); untagged -> unscoped/
- Update graph_records_to_entries() to carry rich fields through from
graph records without loss
- Update get_validated_standard_names() with review_status filter,
CANONICAL_UNITS traversal, and full rich-field RETURN clause
- Add update_review_status() to graph_ops.py for batch status updates
- Call update_review_status() in CLI sn publish after YAML generation
- Update check_catalog_duplicates() to use rglob for subdirectory scan
- Update tests: fix kind references, add directory structure tests,
rich-field round-trip tests, subdirectory duplicate detection (46 tests)
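The tag-based directory layout described above can be sketched in a few lines. This assumes the primary tag is the first entry in the tag list, which the commit message does not state explicitly; catalog_path is a hypothetical helper, not generate_catalog_files() itself.

```python
from pathlib import PurePosixPath
from typing import Optional


def catalog_path(name: str, tags: Optional[list]) -> PurePosixPath:
    """Return the relative YAML path for a standard name.

    Assumption: the first tag is the primary tag. Untagged entries go to
    the unscoped/ directory, as described in the commit message.
    """
    primary = tags[0] if tags else "unscoped"
    return PurePosixPath(primary) / f"{name}.yaml"


tagged = catalog_path("electron_temperature", ["core_profiles", "kinetic"])
untagged = catalog_path("some_quantity", [])
```

Because entries now live one level down, a duplicate check has to scan recursively, which matches the switch to rglob in check_catalog_duplicates().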
- Add sn_tools.py with search/fetch/list standard name tools following the search_tools.py pattern (hybrid vector+keyword search)
- Register search_standard_names, fetch_standard_names, list_standard_names in server.py (available in both dd_only and full mode)
- Add benchmark_labels.yaml with quality tier anchors (outstanding/good/adequate/poor)
- Add load_quality_labels() and score_with_reviewer() to benchmark.py
- Add --reviewer-model CLI option to sn benchmark command
- Extend ModelResult with quality_scores, quality_distribution, avg_quality_score, avg_doc_length, avg_fields_populated fields
- Update render_comparison_table() to show quality distribution table when reviewer used
- Add tests: test_sn_tools.py (38 tests) and TestQualityLabels/TestReviewerModelCLI in test_benchmark.py (9 tests)
Adds tests/sn/test_integration.py with two test classes:
TestEmbeddingCoverage:
- test_write_preserves_existing_embedding: asserts write_standard_names Cypher never sets sn.embedding from a batch param
- test_import_preserves_existing_embedding: asserts _write_catalog_entries Cypher uses coalesce(sn.embedding, null) and coalesce(sn.embedded_at, null)
- test_embedding_field_not_in_write_batch: asserts no 'embedding' key appears in the gc.query batch dict
TestCoalesceSafety:
- test_build_does_not_erase_imported_data: verifies coalesce(b.field, sn.field) for all 15 optional fields in the MERGE SET clause
- test_build_with_none_fields_preserves_graph: verifies absent fields appear as None in batch (required for coalesce evaluation)
- test_created_at_preserved_on_rewrite: asserts coalesce(sn.created_at, datetime()) pattern preserves original creation timestamp
- test_import_then_build_preserves_catalog_fields: end-to-end mock walk through import then build, confirming coalesce semantics in both
- get_dd_overview → get_dd_catalog (remove query/include_unit_stats params)
- analyze_dd_structure + get_ids_structure → get_ids_summary (trimmed output)
- get_dd_path_context → find_related_dd_paths
- Remove export_imas_ids/export_imas_domain from MCP registration
- Remove facade delegation tests
- Update all backend methods, REPL functions, formatters, and tests
- Rename format_search_imas_report → format_search_dd_report
- Include host.py migration cleanup
GraphClient.query() signature is (cypher, **params) not (cypher, dict). All 4 calls were passing a dict as a positional argument, causing 'takes 2 positional arguments but 3 were given' at runtime.
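The failure is easy to reproduce with a stub that has the same signature. The class below is a stand-in for illustration only; the Cypher text and the id property are assumptions, not the project's actual query.

```python
class GraphClient:
    """Stub with the signature described above; it does not talk to a database."""

    def query(self, cypher: str, **params):
        # Echo the inputs back so the binding of keyword params is visible.
        return cypher, params


client = GraphClient()
params = {"name": "electron_temperature"}
cypher = "MATCH (sn:StandardName {id: $name}) RETURN sn"

# Wrong: the dict becomes a third positional argument.
try:
    client.query(cypher, params)
except TypeError as exc:
    error = str(exc)

# Right: unpack the dict so each entry is passed as a keyword parameter.
_, bound = client.query(cypher, **params)
```

Unpacking with ** keeps existing call sites that build a params dict working with only a two-character change.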
- Add Standard Names section with CLI commands table (build, publish, import, status, benchmark)
- Document StandardName lifecycle (drafted → published → accepted)
- Document write semantics: build (coalesce) vs import (authoritative)
- Document MCP tools (search, fetch, list standard names)
- Document StandardName schema and key relationships
- Update plans/README.md: mark features 11-14 as Done
lifecycle_status is NULL on 98.5% of IMASNode nodes (19,734 of 20,037). Per schema: NULL means 'inherits IDS-level lifecycle' which defaults to active. The filter 'p.lifecycle_status = active' matched nothing. Fix: lifecycle_filter=active now matches NULL OR active. Also wires physics_domain/lifecycle_filter into search_dd_paths backend (was accepted by MCP tool but silently ignored).
… graph
Resolve lifecycle_status for all IMASNode data fields by inheriting from the parent IDS when not explicitly set in the DD XML. Previously 98.5% of fields had NULL lifecycle_status, requiring runtime NULL-handling that incorrectly treated 12,264 alpha-inherited nodes as active.
Build pipeline: _batch_create_path_nodes now accepts ids_info and resolves inheritance post-version-diff to avoid false IMASNodeChange records.
Graph migration: backfilled 19,734 NULL nodes via batched Cypher (7,470 active, 12,264 alpha, 277 obsolescent, 26 alpha-override).
Reverted broken NULL-or-active workaround in list_dd_paths and search_dd_paths post-filter; now simple equality checks.
Formatter enhancements:
- Catalog: [alpha] tag on non-active IDS entries
- List: [alpha]/[obsolescent] suffix on non-active paths
- IDS summary: lifecycle distribution of child paths
- list_dd_paths query now returns lifecycle_status in path_details
Updated test_field_lifecycle_status to assert all data nodes have explicit lifecycle_status and valid values include active.
Updated imas_dd.yaml schema description to reflect build-time resolution.
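Build-time inheritance of this kind can be sketched as below. The sketch assumes that a path id starts with its IDS name followed by a slash and that an IDS with no recorded lifecycle counts as active; resolve_lifecycle is a hypothetical helper, not _batch_create_path_nodes.

```python
def resolve_lifecycle(paths: list, ids_lifecycle: dict) -> list:
    """Fill in missing lifecycle_status values from the parent IDS.

    Assumptions: each path dict has an "id" like "<ids_name>/<rest>", and
    an IDS missing from ids_lifecycle defaults to "active".
    """
    resolved = []
    for path in paths:
        status = path.get("lifecycle_status")
        if status is None:
            ids_name = path["id"].split("/", 1)[0]
            status = ids_lifecycle.get(ids_name, "active")
        resolved.append({**path, "lifecycle_status": status})
    return resolved


# Illustrative inputs only.
ids_lifecycle = {"core_profiles": "active", "gyrokinetics_local": "alpha"}
paths = [
    {"id": "core_profiles/time", "lifecycle_status": None},
    {"id": "gyrokinetics_local/model", "lifecycle_status": None},
    {"id": "core_profiles/old_field", "lifecycle_status": "obsolescent"},
]

resolved = resolve_lifecycle(paths, ids_lifecycle)
statuses = [p["lifecycle_status"] for p in resolved]
```

Once every node carries an explicit value, a filter such as lifecycle_status == "active" is a plain equality check and alpha-inherited paths are no longer mistaken for active ones.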
…aph_schema on dd-only
- Add lifecycle_status to check_dd_paths query, result model, and formatter
- Add lifecycle_status to get_dd_version_context per-path query and formatter
- Add lifecycle_status column to get_dd_changelog query and formatter table
- Gate get_graph_schema behind dd-only guard (REPL companion, not needed without REPL)
Rename the CLI command, pipeline function, docstrings, and all plan/documentation references from 'build' to 'mint'. The term better reflects the nature of standard name generation.
RC releases now warn on dirty worktrees instead of failing, since parallel agents frequently modify files concurrently. Final releases still require a clean worktree.
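The warn-versus-fail rule can be expressed as a small guard. This is a sketch only: check_worktree is a hypothetical function, and how the release script detects RC tags or dirty files is not shown in this PR text.

```python
import warnings


def check_worktree(dirty_files: list, is_rc: bool) -> None:
    """Warn on a dirty worktree for RC releases, fail for final releases.

    Hypothetical helper: dirty_files would come from a git status call.
    """
    if not dirty_files:
        return
    message = f"worktree has {len(dirty_files)} uncommitted change(s)"
    if is_rc:
        warnings.warn(message)
    else:
        raise RuntimeError(message)


# An RC release proceeds with a warning.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    check_worktree(["plans/README.md"], is_rc=True)

# A final release with the same worktree is rejected.
try:
    check_worktree(["plans/README.md"], is_rc=False)
    final_blocked = False
except RuntimeError:
    final_blocked = True
```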
Consolidate plans 16 (benchmark parity), 17 (lifecycle management), and 18 (calibration) into a single fleet-ready plan with 4 phases and 6 agent dispatches. Incorporates rubber-duck critique: relationship-first deletion, DD-only scope, split calibration from reference expansion, cache smoke test in Phase 1.
Add reset_standard_names() and clear_standard_names() with relationship-first deletion safety model. Add sn reset, sn clear CLI commands and --reset-to option on sn mint.
Replace build_grammar_context() with build_compose_context() for rich grammar context. Add system/user message split for prompt caching. Preserve cluster_context through extraction.
Empty 'env:' with no values causes GitHub Actions to reject the workflow YAML before any jobs start (0 jobs, immediate failure).
Add entries for core_transport (heat/particle flux for electron and ion), mhd_linear (growth_rate, mhd_frequency), nbi (power, energy of NBI), and edge_profiles (electron temperature and density at edge region). Expand core_profiles from 8 to 12 entries: parallel electric field, bootstrap and ohmic current density, ion toroidal velocity. Expand magnetics from 2 to 6 entries: flux_loop poloidal flux, rogowski_coil plasma current, total plasma current, diamagnetic flux. Expand summary from 2 to 4 entries: toroidal beta, energy confinement time. Add 2 more equilibrium entries: profiles_1d psi, magnetic_axis vertical position. Fix physically incorrect rogowski_coil entry on magnetic_axis/r: replaced geometric_base/object combo with major_radius at MAGNETIC_AXIS position.
Create benchmark_calibration.yaml with 15 entries across 4 quality tiers. Replace inline reviewer rubric with Jinja2 template (sn/review_benchmark). Add 5-dimensional scoring: grammar, semantic, docs, convention, completeness. Retire benchmark_labels.yaml in favor of calibration dataset.
Add LLMResult class to llm.py for backward-compatible cache token exposure — supports 3-tuple unpacking while carrying cache_read_tokens and cache_creation_tokens from provider prompt caching. Add extract_cache_tokens() public function mirroring _log_cache_metrics extraction logic but returning values instead of logging. Add cache_read_tokens and cache_creation_tokens to ModelResult dataclass. Accumulate cache tokens in _run_model() using getattr fallback for mock compatibility. Display Cache % column in benchmark comparison table. Create model selection runbook with CLI commands, cost guidance, approved model list, decision criteria table, and cache optimization tips.
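One way to keep 3-tuple unpacking working while carrying extra attributes is to make the result object iterable. The sketch below illustrates that idea only; the three positional names (parsed, cost, tokens) are assumptions, since the PR text does not say what the original tuple contained.

```python
class LLMResult:
    """Result object that still unpacks like the old 3-tuple.

    The positional fields are hypothetical names; the cache token
    attributes are the ones named in the commit message.
    """

    def __init__(self, parsed, cost, tokens,
                 cache_read_tokens=0, cache_creation_tokens=0):
        self.parsed = parsed
        self.cost = cost
        self.tokens = tokens
        self.cache_read_tokens = cache_read_tokens
        self.cache_creation_tokens = cache_creation_tokens

    def __iter__(self):
        # Yield exactly three items so `a, b, c = result` keeps working.
        yield self.parsed
        yield self.cost
        yield self.tokens


result = LLMResult({"name": "electron_temperature"}, 0.002, 1200,
                   cache_read_tokens=800)

# Old call sites: plain tuple unpacking.
parsed, cost, tokens = result

# New call sites: read cache metrics, tolerating mocks without the attribute.
cache_read = getattr(result, "cache_read_tokens", 0)
```

The getattr fallback mirrors the approach mentioned for _run_model(), where mocked results may be plain tuples without cache attributes.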
Update AGENTS.md with sn mint (renamed from sn build), sn reset, sn clear, --reset-to flag, benchmark cache reporting, and 5-dimensional scoring. Add SN module paths to project-dev skill, LLM proxy note to service-ops skill, and SN key files table to engineer agent. Delete superseded plans 16, 17, 18. Mark Plan 19 complete.
v5.3.0 Release
37 commits | 83 files changed | +9,385 / -2,113 lines
Standard Names Pipeline (14 commits)
Complete implementation of the sn CLI subcommand group for generating, validating, and managing IMAS standard names:
- StandardName with rich fields (physics_domain, grammar decomposition, imas_paths, confidence) and fixed persistence bugs
- sn mint: Renamed from sn build for clarity. LLM-powered generation of standard names from DD paths or facility signals via compose/validate workers with extended SNCandidate fields
- sn publish: Lossless export to YAML catalog with rich fields preserved for human review
- sn import: Catalog feedback loop with version tracking and --check mode for dry-run validation. Catalog is authoritative on import
- sn reset and sn clear for managing standard name lifecycle states
- search_standard_names, fetch_standard_names, list_standard_names exposed via the MCP server with benchmark quality tiers
Standard Names Benchmarking (5 commits)
Evaluation framework for measuring LLM standard name generation quality:
DD Lifecycle & MCP Tool Improvements (5 commits)
- lifecycle_status resolved at build time from parent IDS lifecycle, backfilled into graph. NULL treated as active in search/list filters
- check_dd_paths, get_dd_version_context, and get_dd_changelog now report lifecycle status
- dd/ids naming convention across all tools
- imas-standard-names PyPI package instead of local definition
- get_graph_schema hidden in dd-only mode
Tests (2 commits)
- TestE2ERoundTrip for standard name lifecycle (mint→publish→edit→import)
CI/CD & Infrastructure (7 commits)
- setup-uv pinned to immutable v8.0.0 tag
- env: blocks removed (broke CI workflows)
- IMAS_DD_VERSION removed from container build
Configuration & Documentation (6 commits)
- .vscode symlink