Skip to content

Tighten canonical-repo match in queryTargetsDependingOnModules#338

Open
rdark wants to merge 2 commits intoTinder:masterfrom
rdark:fix/query-modules-canonical-match
Open

Tighten canonical-repo match in queryTargetsDependingOnModules#338
rdark wants to merge 2 commits intoTinder:masterfrom
rdark:fix/query-modules-canonical-match

Conversation

@rdark
Copy link
Copy Markdown

@rdark rdark commented Apr 28, 2026

Partial fix for #335

On MODULE.bazel-touching commits the impacted-targets path fans out one bazel query rdeps(//..., @@<repo>//...) subprocess per canonical repo whose label contains the changed module's name as a substring. A module named "cpp" pulled in every abseil-cpp+//... label; in workspaces with many bzlmod module extensions the fan-out reached thousands of serial subprocesses per get-impacted-targets call.

Replace the contains scan with a two-tier predicate that matches on the parsed canonical repo name:

  1. Tier A: consult bazel mod dump_repo_mapping "" (already implemented in BazelQueryService.discoverRepoMapping; made public for reuse). If the mapping resolves module.apparentName to canonical C, any repo equal to C or starting with "C+"/"C~" belongs to that module.
  2. Tier B: prefix-match module.name + "+" / module.name + "~" for transitive modules that root's repo mapping does not include and when discoverRepoMapping fails. Extension-created forms (name++ext+repo, name~~ext~repo) are subsumed by these prefixes.

Canonical repos to query are collected across all changed modules before running rdeps, which avoids duplicate queries when findChangedModules reports a version bump as both an add and a remove (two Module objects sharing the same canonical).

Also fix a pre-existing bug in the same function: the trailing computeSimpleImpactedTargets(emptyMap(), allTargets) call returned allTargets.keys, silently unioning every target back into the rdeps result. Thread from through the call chain so the union uses the real hash diff. Without this, the rdeps filtering added here would be cost-only — the impacted set would still equal the full workspace.

Tests: new CalculateImpactedTargetsInteractorModuleQueryTest covers canonical-plus, canonical-tilde, extension-created-via-mapping, extension-created-via-name-fallback, the "cpp must not match abseil-cpp" regression, transitive-only-via-name-fallback, ghost-module (Tier C skip), multiple-disjoint-modules, discoverRepoMapping failure, and the executeWithDistances module-query path. Output assertions guard against regression to the emptyMap() union bug.

On MODULE.bazel-touching commits the impacted-targets path fans out one
`bazel query rdeps(//..., @@<repo>//...)` subprocess per canonical repo
whose label contains the changed module's name as a substring. A module
named "cpp" pulled in every `abseil-cpp+//...` label; in workspaces with
many bzlmod module extensions the fan-out reached thousands of serial
subprocesses per `get-impacted-targets` call.

Replace the `contains` scan with a two-tier predicate that matches on the
parsed canonical repo name:

1. Tier A: consult `bazel mod dump_repo_mapping ""` (already implemented
   in BazelQueryService.discoverRepoMapping; made public for reuse). If
   the mapping resolves `module.apparentName` to canonical C, any repo
   equal to C or starting with "C+"/"C~" belongs to that module.
2. Tier B: prefix-match `module.name + "+"` / `module.name + "~"` for
   transitive modules that root's repo mapping does not include and when
   discoverRepoMapping fails. Extension-created forms (`name++ext+repo`,
   `name~~ext~repo`) are subsumed by these prefixes.

Canonical repos to query are collected across all changed modules before
running rdeps, which avoids duplicate queries when findChangedModules
reports a version bump as both an add and a remove (two Module objects
sharing the same canonical).

Also fix a pre-existing bug in the same function: the trailing
`computeSimpleImpactedTargets(emptyMap(), allTargets)` call returned
`allTargets.keys`, silently unioning every target back into the rdeps
result. Thread `from` through the call chain so the union uses the real
hash diff. Without this, the rdeps filtering added here would be
cost-only — the impacted set would still equal the full workspace.

Tests: new CalculateImpactedTargetsInteractorModuleQueryTest covers
canonical-plus, canonical-tilde, extension-created-via-mapping,
extension-created-via-name-fallback, the "cpp must not match abseil-cpp"
regression, transitive-only-via-name-fallback, ghost-module (Tier C skip),
multiple-disjoint-modules, discoverRepoMapping failure, and the
executeWithDistances module-query path. Output assertions guard against
regression to the `emptyMap()` union bug.
@rdark rdark force-pushed the fix/query-modules-canonical-match branch from 2ceb1c8 to a61697d Compare April 28, 2026 00:31
The per-repo rdeps query-failure fallback filtered out every `@@` label
on the assumption that the hash set was mostly workspace-local `//...`
targets. That assumption does not hold on bzlmod-only workspaces where
`generate-hashes` emits almost entirely `@@canonical` labels plus a
small number of `//external:<apparent>` bzlmod-synthetic bridges, with
zero workspace-local targets.

On such workspaces, a single failed `rdeps(//..., @@<repo>//...)` query
(for example, one caused by an unrelated loading error elsewhere in the
dep graph) left the filter with only the `//external:*` bridges, which
the downstream `excludeExternalTargets=true` default (Tinder#334) then strips
entirely — producing silently-empty impacted output on what should be a
"conservatively rebuild everything" signal.

Replace the unconditional `!startsWith("@@")` filter with a shape-aware
fallback:

  - Compute a "buildable workspace" set that excludes BOTH `@@canonical`
    transitives AND `//external:<apparent>` bridges.
  - If that set is non-empty (mixed WORKSPACE + bzlmod, or WORKSPACE-only
    shapes), emit it — preserves pre-existing granularity, avoids leaking
    tens of thousands of `@@` transitives into the impacted set on a
    single flaky `bazel query`.
  - If it is empty (bzlmod-only shape), fall through to `allTargets.keys`
    so the downstream filter has something to keep — rebuild-everything
    signal rather than a directly-buildable label list.

Adds a mixed-shape regression test asserting that `@@` transitives and
`//external:*` bridges do NOT leak into the fallback when buildable
workspace labels are present, complementing the existing bzlmod-only
test. An unconditional `allTargets.keys` fallback fails the new test;
the pre-commit `!startsWith("@@")` filter fails the existing bzlmod-only
one. Only the shape-aware fallback passes both.
@rdark rdark force-pushed the fix/query-modules-canonical-match branch from e031d63 to 1f737d4 Compare April 28, 2026 10:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant