Skip to content

Add nightly GHCR retention workflow for eval-agent-server#689

Merged
all-hands-bot merged 2 commits intomainfrom
add-ghcr-retention-workflow
Apr 23, 2026
Merged

Add nightly GHCR retention workflow for eval-agent-server#689
all-hands-bot merged 2 commits intomainfrom
add-ghcr-retention-workflow

Conversation

@VascoSch92
Copy link
Copy Markdown
Collaborator

@VascoSch92 VascoSch92 commented Apr 22, 2026

Summary

Adds .github/workflows/ghcr-retention.yml — a nightly GC job that deletes old versions of ghcr.io/openhands/eval-agent-server. Refs #684.

What it does

  • Nightly at 03:17 UTC, plus workflow_dispatch with dry-run (default true), cut-off (default 2mo), and keep-n-most-recent (default 100) inputs. Scheduled runs perform real deletions; manual runs dry-run by default.
  • Deletes versions older than cut-off except the 100 most-recently-created and any tag matching v1.0.0*.
  • Uses snok/container-retention-policy@v3.0.0 — handles multi-arch manifest lists so deleting a parent index doesn't leave dangling children.
  • Scoped to eval-agent-server only.

Prerequisites

  1. Add secret GHCR_CLEANUP_PAT — classic PAT with delete:packages + read:packages (or fine-grained with Packages: read & write on the org). GITHUB_TOKEN cannot delete org package versions.
  2. Confirm benchmarks has Write role at https://github.com/orgs/OpenHands/packages/container/eval-agent-server/settings (the existing 8 build workflows already push successfully, so this is likely already set).

Rollout

  1. Merge.
  2. Trigger manually with dry-run: true → read the log, sanity-check the delete count (expect tens of thousands).
  3. Trigger manually with dry-run: false for the first real purge, or wait for the next nightly. First run may take hours due to GHCR secondary rate limits (timeout-minutes: 720).

Known gap

Retention does not cross-reference eval metadata on GCS (gs://openhands-evaluation-results/metadata/*.jsonl), so an in-flight eval against a >2mo SDK sha could lose its image mid-run. Low probability given the 100-version buffer; proper fix (pre-delete jsonl filter) is a follow-up PR. cc @juanmichelini @simonrosenberg.

Test plan

  • Add GHCR_CLEANUP_PAT secret
  • Confirm package-repo link has Write role
  • Manual dispatch with dry-run: true — inspect the deletion list
  • Manual dispatch with dry-run: false for the first real purge
  • Verify post-run tag count via ghcr.io/v2/openhands/eval-agent-server/tags/list (expect ~130k → a few thousand)
  • Confirm a subsequent build-swebench-images.yml run still succeeds

Refs #684. Deletes package versions older than a configurable cut-off
(default 2mo), keeping a safety buffer of 100 most-recent versions and
preserving legacy v1.0.0* tags. Uses snok/container-retention-policy.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@juanmichelini juanmichelini marked this pull request as ready for review April 22, 2026 18:51
Copy link
Copy Markdown
Collaborator

@juanmichelini juanmichelini left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, this is a GitHub action, approving so it can be tested

Copy link
Copy Markdown
Collaborator

@all-hands-bot all-hands-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Acceptable - Pragmatic solution with solid safety mechanisms. A few clarifications suggested but the core design is sound.

Comment thread .github/workflows/ghcr-retention.yml Outdated
Comment thread .github/workflows/ghcr-retention.yml
Comment thread .github/workflows/ghcr-retention.yml
Comment thread .github/workflows/ghcr-retention.yml
@all-hands-bot
Copy link
Copy Markdown
Collaborator

[RISK ASSESSMENT]

[Overall PR] ⚠️ Risk Assessment: 🟡 MEDIUM

This PR introduces a nightly maintenance workflow that automatically deletes old container images from GHCR. Risk factors:

Elevated Risk Factors:

  • Security sensitivity: Requires a high-privilege PAT with delete:packages + read:packages permissions
  • Blast radius: Deletion mistakes could affect in-flight evaluations (though mitigated by 100-version buffer and tag preservation)
  • Infrastructure dependencies: Relies on GHCR_CLEANUP_PAT secret being configured and the third-party snok/container-retention-policy@v3.0.0 action
  • Known gap: Does not cross-reference eval metadata on GCS, so evaluations running against >2mo SDK commits could lose their image mid-run (acknowledged in PR description, planned for follow-up)

Mitigating Factors:

  • Dry-run by default: Manual dispatches default to dry-run mode for safe testing
  • Safety buffer: Always keeps 100 most-recent versions regardless of age
  • Tag preservation: Protects v1.0.0* tags from deletion
  • Scoped: Limited to a single package (eval-agent-server), not org-wide
  • Long timeout: 12-hour timeout accounts for potentially long first run
  • Concurrency control: Prevents multiple cleanup runs from overlapping

Recommendation: This is acceptable for autonomous merge with the following caveats:

  1. Ensure GHCR_CLEANUP_PAT is configured before merge (per PR prerequisite checklist)
  2. Perform initial dry-run test via manual dispatch to validate deletion count
  3. Monitor first real run closely (may take hours due to GHCR rate limits)
  4. Consider implementing the GCS metadata cross-reference check in a follow-up PR to close the known gap

VERDICT:
Worth merging - Pragmatic infrastructure maintenance with appropriate safety mechanisms for the 133k+ tag problem.

KEY INSIGHT:
This workflow demonstrates good engineering taste by choosing aggressive cleanup (2-month cutoff, delete tagged and untagged) while adding multiple safety nets (version buffer, tag preservation, dry-run default) rather than implementing a complex, theoretically perfect solution that cross-references all metadata sources.


This review was created by an AI agent (OpenHands) on behalf of the repository maintainers.

@simonrosenberg
Copy link
Copy Markdown
Collaborator

LGTM, this is a GitHub action, approving so it can be tested

we can always test it by overriding another pre-existing github action and then renaming it before merging btw!

Pin snok/container-retention-policy to the v3.0.0 commit SHA so a future
tag retarget can't hand a PAT with delete:packages to arbitrary code.
Drop the job-level packages: write block — it's dead since the action
authenticates via GHCR_CLEANUP_PAT, not GITHUB_TOKEN.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@all-hands-bot all-hands-bot merged commit 62d06a4 into main Apr 23, 2026
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants