Skip to content

feat(toolkit-docs-generator): majority-version coherence filter#907

Draft
jottakka wants to merge 5 commits intomainfrom
fix/majority-version-coherence-filter
Draft

feat(toolkit-docs-generator): majority-version coherence filter#907
jottakka wants to merge 5 commits intomainfrom
fix/majority-version-coherence-filter

Conversation

@jottakka
Copy link
Copy Markdown
Contributor

@jottakka jottakka commented Apr 9, 2026

Summary

  • Adds filterToolsByMajorityVersion() utility that computes the version shared by the most tools in a toolkit and drops tools at other versions
  • Applied in both fetchAllToolkitsData() and fetchToolkitData() (when no explicit version is passed) in CombinedToolkitDataSource
  • Fixes stale tools from older toolkit releases (e.g. Github notification tools at @2.0.1 alongside 35+ tools at @3.1.3) surviving through the docs pipeline indefinitely

Problem

The Engine /v1/tool_metadata endpoint returns tools at mixed versions for the same toolkit. The docs generator trusts this as-is, grouping tools by toolkit ID with no version filtering. The --skip-unchanged flow compounds this — it correctly sees no change run after run (because the API keeps returning the same stale tools), preserving old output indefinitely.

Solution

After grouping tools by toolkit ID, compute the majority version (the version shared by the most tools), then keep only tools at that version. This drops stale tools before they reach the diff or merger. Explicit version parameters bypass the filter.

Test plan

  • Unit tests for getMajorityVersion and filterToolsByMajorityVersion (11 tests)
  • Integration tests for CombinedToolkitDataSource version filtering (3 new tests)
  • Scenario test: mixed-version tools → filter → clean diff (3 tests)
  • Full test suite passes (499 tests, 0 type errors)

🤖 Generated with Claude Code


Note

Medium Risk
Changes which tools are included in generated docs/diffs by automatically dropping older-version tools when Engine returns mixed versions, which can cause large output churn if current data has inconsistencies.

Overview
Adds a highest-version coherence filter so each toolkit keeps only tools at the numerically highest @version when no explicit version is requested, dropping stale tools from older releases in both fetchToolkitData() and fetchAllToolkitsData().

Introduces utils/version-coherence.ts (plus exports) and moves extractVersion into utils/fp.ts while re-exporting it from data-merger.ts for compatibility. Updates the merger to preserve a previous toolkit summary when the tool signature changes but summary generation is disabled (no LLM), and adds unit/scenario tests covering mixed-version filtering and diff/summary behavior.

Reviewed by Cursor Bugbot for commit 417a51f. Bugbot is set up for automated code reviews on this repo. Configure here.

Engine API returns tools at mixed versions for the same toolkit (e.g.
Github tools at @3.1.3 alongside stale notification tools at @2.0.1).
These stale tools survive through the pipeline because the data source
has no version filtering, and --skip-unchanged preserves them
indefinitely since the API response stays consistent.

Add filterToolsByMajorityVersion() that computes the version shared by
the most tools in a toolkit and drops tools at other versions. Applied
in both fetchAllToolkitsData() and fetchToolkitData() (when no explicit
version is passed), so stale tools are removed before reaching the diff
or merger layers.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@vercel
Copy link
Copy Markdown

vercel bot commented Apr 9, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
docs Ready Ready Preview, Comment Apr 9, 2026 8:34pm

Request Review

- Move extractVersion to utils/fp.ts as the single source of truth;
  data-merger.ts re-exports it for backward compatibility
- Replace lexicographic version tie-break with numeric semver comparison
  so multi-digit components (e.g. 9.0.0 vs 10.0.0) sort correctly
- Remove redundant array spread in fetchAllToolkitsData filter
- Add test for multi-digit version tie-break edge case

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@teallarson
Copy link
Copy Markdown
Contributor

Should it be "most" or "greatest"? What if we drop 80 bad tools and keep 10 good ones?

arcade-mcp's normalize_version() allows semver with pre-release tags
(1.2.3-alpha.1) and build metadata (1.2.3+build.456). The previous
compareVersions used .split(".").map(Number) which produced NaN for
these suffixes. Now strips pre-release and build metadata before
parsing numeric MAJOR.MINOR.PATCH for comparison.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The majority-version approach was wrong: if a new release has fewer
tools (e.g. consolidation), the filter would keep the OLD version
with more tools and drop the new one. The correct logic is to always
keep tools at the highest (newest) version — stale tools are always
at older versions.

Renamed: getMajorityVersion → getHighestVersion,
filterToolsByMajorityVersion → filterToolsByHighestVersion.

Added test: "keeps newer version even when it has fewer tools".

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
}

return best || null;
};
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Version comparison and filtering use inconsistent matching

Low Severity

The getHighestVersion function, using compareVersions, ignores pre-release tags, causing it to pick the first encountered version in a tie. If a pre-release version is selected as 'highest' due to input order, the subsequent exact string filter in filterToolsByHighestVersion will incorrectly drop all stable tools with the same numeric core, violating semver precedence.

Fix in Cursor Fix in Web

Reviewed by Cursor Bugbot for commit ea5ab6b. Configure here.

When the version filter removes stale tools, the tool list changes and
the summary signature no longer matches. Without this fix, if no LLM
generator is configured (--skip-summary), the summary is silently
dropped. Now falls back to the previous summary instead of losing it.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown

@cursor cursor bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

There are 2 total unresolved issues (including 1 from previous review).

Fix All in Cursor

❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

Reviewed by Cursor Bugbot for commit 417a51f. Configure here.

};
// Re-export extractVersion so existing consumers (e.g. toolkit-diff.ts)
// that import it from this module continue to work.
export { extractVersion } from "../utils/fp.js";
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ambiguous star-export silently drops extractVersion from package API

Medium Severity

The new extractVersion definition in utils/fp.ts is picked up by utils/index.ts via export * from "./fp.js". The backward-compat re-export added in data-merger.ts (export { extractVersion } from "../utils/fp.js") is picked up by merger/index.ts via export * from "./data-merger.js". Since src/index.ts does export * from both merger/index.js and utils/index.js, TypeScript treats extractVersion as an ambiguous binding and silently drops it from the package's public exports — a regression from the previous state where it was only exported via merger.

Additional Locations (1)
Fix in Cursor Fix in Web

Reviewed by Cursor Bugbot for commit 417a51f. Configure here.

@EricGustin
Copy link
Copy Markdown
Member

wouldn't we want to continue documenting the deprecated tools (but with a deprecated banner) since the engine is still hosting them?

@jottakka jottakka marked this pull request as draft April 9, 2026 21:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants