feat(grammar): harvest 10 novel physical_base tokens from imas-codex rotations #16
…ase tests
Upgrades the release state-machine CLI to mirror imas-codex's release command shape:
- --version: explicit version override, bypasses bump computation
- --skip-git: skip git tag creation and push (useful for testing)
- Dirty worktree policy: RC releases warn only, final releases abort
- _check_clean_tree gains strict parameter for RC vs final semantics
Adds comprehensive test suite (36 tests) covering:
- Version parsing, formatting, and bump logic
- State machine transitions: stable→RC (patch/minor/major), RC→RC, RC→final, direct release, RC abandon with bump
- Rejection cases: final-from-stable, stable-no-bump, duplicate tag
- CLI integration: dry-run, skip-git, explicit version, status display, message required, remote defaults/overrides, dirty worktree policy, end-to-end tag creation
…n 40)
Migrate catalog storage from one-file-per-name nested in physics-domain directories to one-file-per-physics-domain YAML sequences.
- Loader rejects legacy nested per-file layout with CatalogMigrationError (permissive mode preserves single-dict compatibility for tooling).
- ArgumentRef Pydantic model and error_variants field on entry schema.
- Topological ordering includes arguments[].name edges so derived entries follow their operands within a domain file.
- Integrity tables track per-entry hash (blake2b of canonical entry YAML) so additions, deletions, and modifications are still detectable when multiple entries share a file.
- Migrate in-repo example fixtures to new per-domain layout.
- Update test helpers and round-trip tests for semantic equivalence.
Part of plan 40 implementation.
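The per-entry integrity hash mentioned in this commit could look roughly like the sketch below. It is not the repository's code: the names `entry_hash` and `diff_entries`, the use of sorted-key JSON as a stand-in for canonical entry YAML, and the 16-byte digest size are all assumptions chosen so the example runs with the standard library only.

```python
import hashlib
import json


def entry_hash(entry: dict) -> str:
    """Return a blake2b hex digest of a canonical serialization of one entry.

    Assumption: sorted-key JSON stands in for the canonical entry YAML.
    """
    canonical = json.dumps(
        entry, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return hashlib.blake2b(canonical.encode("utf-8"), digest_size=16).hexdigest()


def diff_entries(old: dict, new: dict) -> dict:
    """Compare two {entry name: hash} tables for a single domain file."""
    return {
        "added": sorted(new.keys() - old.keys()),
        "deleted": sorted(old.keys() - new.keys()),
        "modified": sorted(
            name for name in old.keys() & new.keys() if old[name] != new[name]
        ),
    }
```

Because each entry is hashed separately, a change to one entry in a shared domain file shows up as a single modified name rather than as a change to the whole file.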
…ades (plan 41)
- graph/local_graph.py: DiGraph builder over per-domain YAML with 5 edge types (HAS_ARGUMENT, HAS_ERROR, HAS_PREDECESSOR, HAS_SUCCESSOR, REFERENCES); stub nodes for forward refs; ordering-parent/child closure helpers for ancestors/descendants traversal.
- tools/graph.py: 4 FastMCP tools: get_standard_name_neighbours, get_standard_name_ancestors, get_standard_name_descendants, shortest_standard_name_path. Registered as optional read-only tools (gated on networkx availability).
- rendering/catalog.py: Mermaid hierarchy blocks, resolved links, cocos_transformation_type emission, per-entry sibling nav (Arguments/Wrapped by/Error variants/Deprecates/Superseded by).
- mkdocs.yml: mermaid2 plugin.
- pyproject.toml: [graph-local] networkx extra; mkdocs-mermaid2-plugin in docs group.
- AGENTS.md: local-graph module + MCP tools section with edge-convention table and HAS_ERROR direction note.
- tests: 27 new tests covering graph build, traversal, MCP tools, and renderer output; readonly-server allowlist extended with 4 new tools.
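For readers unfamiliar with the traversal helpers, here is a rough standard-library sketch of reachability and shortest-path queries over typed edges. It deliberately avoids networkx, so it is not the module's implementation; the function names, the (source, edge_type, target) triple shape, and the edge direction used in the example are all assumptions.

```python
from collections import deque


def build_graph(edges):
    """Build forward and reverse adjacency maps from (source, edge_type, target) triples."""
    forward, reverse = {}, {}
    for src, _edge_type, dst in edges:
        forward.setdefault(src, set()).add(dst)
        reverse.setdefault(dst, set()).add(src)
        # Register both endpoints in both maps so targets that are only
        # referenced (forward refs) still exist as stub nodes.
        forward.setdefault(dst, set())
        reverse.setdefault(src, set())
    return forward, reverse


def reachable(adjacency, start):
    """Return every node reachable from `start`, excluding `start` itself."""
    seen, queue = set(), deque([start])
    while queue:
        for nxt in adjacency.get(queue.popleft(), ()):
            if nxt not in seen and nxt != start:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def shortest_path(adjacency, start, goal):
    """Breadth-first shortest path as a list of nodes, or None if unreachable."""
    parents, queue = {start: None}, deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for nxt in adjacency.get(node, ()):
            if nxt not in parents:
                parents[nxt] = node
                queue.append(nxt)
    return None
```

Calling `reachable` on the forward map gives descendants under the assumed edge direction, and calling it on the reverse map gives ancestors.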
Additions harvested from cross-domain standard-name cycling:
Processes:
- e_cross_b_drift
- heat_viscosity
- ohmic_induction
Subjects (particle classification + polarization):
- trapped, passing, counter_passing, co_current, counter_current
- inertial, sonic
- left_hand_circularly_polarized, right_hand_circularly_polarized
- Add .github/PULL_REQUEST_TEMPLATE.md with required evidence section for vocabulary token PRs (N >= 3 distinct DD paths)
- Add Vocabulary Token Policy section to CONTRIBUTING.md with N >= 3 evidence gate, deprecation rules, and structural exceptions
- Add docs/vocab-retrospective-rc21-rc26.md auditing all 15 tokens added between rc21 and rc26 (all pass N >= 3, verdict: keep all)
Add 5 tokens from the imas-codex electromagnetic_wave_diagnostics tier-a pilot that passed the N>=3 evidence gate (32 VocabGap nodes harvested, 5 eligible, 27 deferred pending Tier B coverage).
Additions:
- physical_bases.yml: diagnostic_latency (N=4), sweep_duration (N=3), x1_width (N=3)
- geometry_carriers.yml: x1_coordinate (N=3), x2_coordinate (N=3)
Deferred tokens (N<3) are documented in docs/vocab-retrospective.md.
Grammar model_types.py regenerated via build-grammar.
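As an illustration of the N>=3 evidence gate, the split into eligible and deferred tokens might be computed along these lines. This is a sketch under assumptions: the input shape (pairs of token and DD path) and the names `apply_evidence_gate` and `EVIDENCE_THRESHOLD` are hypothetical, and counting each distinct path once is one reading of "N >= 3 distinct DD paths".

```python
from collections import defaultdict

EVIDENCE_THRESHOLD = 3  # N >= 3 distinct DD paths, per the vocabulary token policy


def apply_evidence_gate(gaps):
    """Split candidate tokens into eligible and deferred by distinct DD path count.

    `gaps` is an iterable of (token, dd_path) pairs; this input shape is hypothetical.
    """
    paths_by_token = defaultdict(set)
    for token, dd_path in gaps:
        # A set means the same path reported twice counts only once.
        paths_by_token[token].add(dd_path)

    eligible, deferred = {}, {}
    for token, paths in paths_by_token.items():
        target = eligible if len(paths) >= EVIDENCE_THRESHOLD else deferred
        target[token] = len(paths)
    return eligible, deferred
```

Deferred tokens keep their counts, so they can be revisited once additional coverage raises N.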
feat: vocab-evidence-gate + rc21-rc26 retrospective
…losure
Add 7 physical_base tokens to close vocabulary gaps identified in W22B review score analysis:
Class 2, physics compound nouns:
- bootstrap_current_density: j_bootstrap (core_profiles, N=5 IDSs)
- rotation_frequency: rotation_frequency_tor (core_profiles, N=8+)
- mach_number: mach_number_parallel (langmuir_probes, N=3)
- resistivity: wall/*/resistivity (wall, cryostat, N=6)
Class 2, viscosity current density compounds:
- heat_viscosity_current_density: j_heat_viscosity (edge/plasma_profiles, N=4)
- parallel_viscosity_current_density: j_parallel_viscosity (N=4)
- perpendicular_viscosity_current_density: j_perpendicular_viscosity (N=4)
All tokens meet the N>=3 evidence gate. Sonic rotation frequency is not added as a separate base because subject=sonic + base=rotation_frequency correctly composes to sonic_rotation_frequency via existing grammar.
feat(vocab): W23A evidence-gated physical bases for grammar gap closure
Compose-level NC-32 patch in imas-codex (47ed76eb) prevented new _on_ggd compositions, but the vocab attach pipeline kept resurfacing pre-existing registry entries with this suffix. W25D + W26B evidence: 13 attach-through names scoring 0.5-0.6 dragged MHD domain mean to 0.645 (YELLOW boundary).
Standard names should be coordinate-system agnostic. The DD ggd/* path subtree encodes coordinate metadata at the schema level, not the physics name level. This PR retires the _on_ggd suffix family by:
- Deleting 4 physical bases with canonical non-GGD twins:
  energy_radial_diffusivity_on_ggd (twin: energy_radial_diffusivity)
  momentum_diffusivity_on_ggd (twin: momentum_diffusivity)
  momentum_radial_diffusivity_on_ggd (twin: momentum_radial_diffusivity)
  particle_radial_diffusivity_on_ggd (twin: particle_radial_diffusivity)
- Deleting on_ggd unary postfix operator from operators.yml
- Regenerating model_types.py and constants.py (ON_GGD enum removed)
- Updating test expectations (2 parametrized entries removed)
All 1106 tests pass. Zero test count delta (removed 2 parametrized entries, no new tests needed since canonical twins remain).
Cross-references:
- imas-codex commit 47ed76eb (NC-32 compose-level prohibition)
- W26B verdict report (this PR motivation)
- W25D MHD domain rotation evidence
…codex W30 rotations
Adds the following physical_base tokens, all proposed independently by the
imas-codex auto-VocabGap detection mechanism during edge_plasma_physics
rotation (W30B). Each is a clean, well-established physics term that filled
an evident gap in the registry.
- anomalous_current_density (vector): current density from anomalous/turbulent
transport; used by 3 names in edge_plasma_physics
- covariant_metric_tensor (tensor): lower-index metric g_ij, counterpart to
existing contravariant_metric_tensor; used by 1 name in magnetohydrodynamics
- diamagnetic_energy (scalar): plasma stored energy from diamagnetic measurement;
used by 1 name in transport
- distribution_function (scalar): kinetic distribution function f(x,v) in
phase space; used by 1 name in edge_plasma_physics
- eigenmode_frequency (scalar): oscillation frequency of a plasma eigenmode;
used by 1 name in gyrokinetics
- eigenmode_growth_rate (scalar): linear growth rate of an unstable eigenmode;
used by 1 name in gyrokinetics
- ionization_potential (scalar): ionization energy of an atomic species;
used by 1 name in edge_plasma_physics
- logarithmic_density_gradient (scalar): d ln(n)/dx, standard gyrokinetic
drive parameter; used by 1 name in gyrokinetics
- logarithmic_temperature_gradient (scalar): d ln(T)/dx, standard gyrokinetic
drive parameter; used by 1 name in gyrokinetics
- pressure_gradient (scalar): spatial derivative of pressure, distinct from
pressure_gradient_alpha_parameter; used by 1 name in gyrokinetics/MHD
Source: imas-codex W29 commit aa16a350 added auto-VocabGap detection;
W30B rotation surfaced these proposals via parse-the-name post-processing.
Verification:
- Vetted against existing physical_bases.yml — no duplicates
- All 1079 ISN tests pass (unchanged count)
- Reviewer-suggested usage in tracked imas-codex StandardName nodes
I guess this is not the final version, but here are a few comments on the "Tokens added" table anyway.
What does "kind" mean here? I thought the Standard Names didn't contain information about the number of dimensions of a quantity.
The "Notes" are already quite informative, but I guess they are not the final definition of the Standard Names.
Thanks Frederic, these comments are useful. What you are seeing here relates to the development of our SN vocab (the base names in particular). The generation pipeline is coming on nicely and I should have a set of prototype names with all of their metadata shortly. You raise a good point regarding word order, and I have grappled with this same issue myself. I have decided to go with the prefix version for now. What you see above are examples of base names onto which other grammar elements are appended to construct our full names. It will make more sense when you see actual catalog examples. The ordering issue is already addressed: because we store these names in a graph, we can display them relative to their connections, for example with siblings shown adjacent and parents close by. We do not need to rely on alphabetical sorting. The names will all include links, so navigation between them should be simple.
Summary
Harvest of 10 novel physical_base tokens from imas-codex W30 rotation evidence.
Context
imas-codex W29 (commits 5f741fc4, aa16a350) made physical_base truly open
in the LLM compose prompt, with auto-VocabGap detection running post-LLM. W30B
(edge_plasma_physics rotation) surfaced these candidates organically: the LLM
proposed them without prompting because they fit the underlying physics.
Methodology
Mining: Queried all 830 composed StandardName nodes in the imas-codex
Neo4j graph, parsed each with imas_standard_names.grammar.parse_standard_name(),
and collected physical_base tokens not present in the registered vocabulary.
Raw yield: 526 novel base tokens detected (445 genuine + 77 of_ parser
artifacts + 13 parse failures).
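The mining step can be pictured with the following sketch. It makes no claim about the real parser: `hypothetical_parse` is a toy stand-in for imas_standard_names.grammar.parse_standard_name() (whose return type is not assumed here), and `mine_novel_bases` is an illustrative name, not a function from either repository.

```python
from collections import Counter


def hypothetical_parse(name: str) -> str:
    """Toy stand-in parser: treat the text after the last '_of_' as the base.

    This is NOT how imas_standard_names parses names; it only makes the
    sketch runnable without the real grammar package.
    """
    if not name:
        raise ValueError("empty standard name")
    return name.rsplit("_of_", 1)[-1]


def mine_novel_bases(names, registered, parse=hypothetical_parse):
    """Return (counts of unregistered base tokens, names that failed to parse)."""
    novel, failures = Counter(), []
    for name in names:
        try:
            base = parse(name)
        except ValueError:
            # Keep failures separate so they can be reported in the raw yield.
            failures.append(name)
            continue
        if base not in registered:
            novel[base] += 1
    return novel, failures
```

Counting occurrences per base, rather than only collecting a set, keeps the usage evidence that the later vetting step relies on.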
Vetting: Each candidate classified as STRONG (clean physics term, fills
clear gap), BORDERLINE (needs editorial discussion), or NOISE (LLM artifact,
synonym of existing token, or grammar misparse). Vetting criteria:
operator+base decomposition)?
Selection: 10 STRONG candidates included in this PR. ~10 BORDERLINE
candidates documented below for editorial review.
Tokens Added (10)
- anomalous_current_density
- covariant_metric_tensor (cf. contravariant_metric_tensor)
- diamagnetic_energy
- distribution_function (cf. distribution)
- eigenmode_frequency
- eigenmode_growth_rate
- ionization_potential
- logarithmic_density_gradient
- logarithmic_temperature_gradient
- pressure_gradient (cf. pressure_gradient_alpha_parameter)
Borderline Candidates (NOT included in this PR)
- particle_source_density: particle_number_density_source
- momentum_source_density: momentum_source (per-volume implied in transport context)
- energy_source_density: energy_source / volumetric_energy_source
- conducted_power: power
- convected_power: power
- wall_temperature: wall + generic temperature
- charge_state: dominant_charge_state and minimum_charge_state exist; bare form may be too generic
- plasma_internal_inductance: internal_inductance exists; may be redundant with subject+base
- electromagnetic_force_density: force
- passive_conductor_resistivity: resistivity
Noise Tokens (excluded)
- of_* prefix artifacts (77 tokens): parser residue from operator decomposition
- electric_field_amplitude / electric_field_phase: should be operator+base constructions
- peak_power_flux: should decompose as maximum operator + power_flux_density
Test Results
Cross-repo References
aa16a350