Skip to content

Remove unused SWE-Bench and SWT-Bench build agent-type flags#696

Merged
all-hands-bot merged 1 commit intomainfrom
simon/remove-unused-swebench-agent-type
Apr 23, 2026
Merged

Remove unused SWE-Bench and SWT-Bench build agent-type flags#696
all-hands-bot merged 1 commit intomainfrom
simon/remove-unused-swebench-agent-type

Conversation

@simonrosenberg
Copy link
Copy Markdown
Collaborator

@simonrosenberg simonrosenberg commented Apr 23, 2026

Summary

  • remove the unused --agent-type argument from benchmarks/swebench/build_images.py
  • stop forwarding agent-type into the SWE-Bench and SWT-Bench image build commands
  • remove the now-unused agent-type workflow inputs from the SWE-Bench and SWT-Bench build workflows
  • keep inference-side agent_type handling untouched where ACP behavior is still real
  • add regression coverage for the SWE-Bench build CLI behavior

Testing

  • python3 YAML parse of .github/workflows/build-swebench-images.yml
  • python3 YAML parse of .github/workflows/build-swtbench-images.yml
  • PYTHONPATH=/Users/simonrosenberg/worktrees/benchmarks-remove-unused-agent-type /Users/simonrosenberg/repositories/benchmarks/.venv/bin/pytest /Users/simonrosenberg/worktrees/benchmarks-remove-unused-agent-type/tests/test_phased_build.py

Related to #695 and paired with OpenHands/evaluation#526.

Copy link
Copy Markdown
Collaborator

@all-hands-bot all-hands-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Acceptable - Clean removal of dead code with appropriate compatibility handling.

This PR correctly removes an unused parameter that was parsed but never referenced in the actual build logic. The workflow compatibility layer is pragmatic. See inline comments for minor notes.

)
build_builder_image.assert_called_once_with(push=False, force_build=False)
build_all_base_images.assert_called_once()
assemble_all_agent_images.assert_called_once()
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Suggestion: This test uses 5 mocks and only verifies function calls, which doesn't catch regressions in actual behavior. However, this is consistent with the existing test patterns in this file (58 total @patch decorators) and is pragmatic given that running real Docker builds in tests would be impractical.

The test does verify that main() completes successfully without the removed parameter, which is the key regression we want to catch.

default: ''
agent-type:
description: 'Agent type: default (skip ACP), acp-claude, acp-codex (keep ACP)'
description: 'Deprecated compatibility input. Ignored for SWE-Bench image builds.'
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟢 Acceptable: Good backward compatibility strategy. Keeping the workflow input with a clear deprecation message ensures external callers won't break, while making it clear this parameter is no longer used.

This is pragmatic engineering - the CLI change is breaking (intentionally), but the workflow API stays compatible.

@simonrosenberg simonrosenberg force-pushed the simon/remove-unused-swebench-agent-type branch 2 times, most recently from 4569362 to e05f73c Compare April 23, 2026 17:30
@simonrosenberg simonrosenberg changed the title Remove unused SWE-Bench build agent-type flag Remove unused SWE-Bench and SWT-Bench build agent-type flags Apr 23, 2026
@all-hands-bot all-hands-bot reopened this Apr 23, 2026
@all-hands-bot all-hands-bot enabled auto-merge (squash) April 23, 2026 18:40
@all-hands-bot all-hands-bot merged commit 14e8c9e into main Apr 23, 2026
5 checks passed
@all-hands-bot all-hands-bot deleted the simon/remove-unused-swebench-agent-type branch April 23, 2026 18:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants