fix #675: Modify commit0 initial message to prevent cheating #676
juanmichelini merged 2 commits into main from
Modify the default.j2 prompt template to indicate that implementations should be created from scratch without copying or looking at similar code online. This prevents agents from using git clone, pip install, npm install, or other methods to obtain the target package from external sources. Fixes #675
@OpenHands Look at the CI. Why are the tests failing? Don't modify the code, just report your findings here.
I'm on it! VascoSch92 can track my progress at all-hands.dev
CI Failure Analysis
The CI is failing due to an unrelated test failure in
Root Cause
The test assert args.select is None
However, the actual value is:
This indicates that a recent change to the
Relevance to This PR
This failure is NOT caused by the changes in this PR (which only modifies
Summary
This analysis was created by an AI assistant (OpenHands) on behalf of the user.
Summary
I analyzed the CI failure for PR #676 as requested and reported my findings without making any code changes.
Answer to the Question: Why are the tests failing?
The CI is failing due to an unrelated test failure in
Root Cause: The test expects
Key Finding: This failure is NOT caused by this PR's changes (which only modify
CI Status
Actions Taken
@VascoSch92 Previous (22098671747) | 7/16 (babel, chardet, cookiecutter, imapclient, jinja, parsel, voluptuous) Score change: 2 instances from 9/16 to 7/16. Rerunning with new prompt.
@VascoSch92 the new prompt got 0 cheating, and the score is not too bad either: 8/16 vs. the cheating one that had 9/16.
all-hands-bot left a comment
🟢 Good taste - Clean, direct solution that addresses issue #675.
[RISK ASSESSMENT]
- [Overall PR]
⚠️ Risk Assessment: 🟢 LOW
This change only adds instructional text to a prompt template without modifying any code logic. It's backward compatible and cannot break existing functionality. The worst-case scenario is that agents ignore the instructions, which is the current state.
VERDICT:
✅ Worth merging: Clear anti-cheating instructions that improve benchmark integrity.
KEY INSIGHT:
Simple, pragmatic solution—adds explicit instructions without over-engineering the problem.
Summary
Modify the commit0 initial prompt template (benchmarks/commit0/prompts/default.j2) to indicate that implementations should be created from scratch without copying or looking at similar code online.
Change
Added an anti-cheating instruction to the default.j2 template:
This prevents agents from using git clone, pip install, npm install, or other methods to obtain the target package from external sources when tasked to implement something.
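A prompt instruction can only discourage this behavior, so it may help to check agent trajectories for it afterwards. The following is a minimal sketch of such a check, not part of this PR: `find_suspicious_commands` and `FETCH_PREFIXES` are hypothetical names, the prefix list is taken only from the examples mentioned above, and the input is assumed to be a plain list of shell command strings extracted from a run.

```python
import shlex

# Hypothetical list of fetch-style command prefixes, based on the examples
# named in this PR description (git clone, pip install, npm install).
FETCH_PREFIXES = (("git", "clone"), ("pip", "install"), ("npm", "install"))


def find_suspicious_commands(commands, package):
    """Return commands that look like they fetch `package` from an external source.

    This is a heuristic: a command is flagged when a fetch-style prefix is
    followed by any argument that contains the package name.
    """
    flagged = []
    needle = package.lower()
    for command in commands:
        try:
            tokens = shlex.split(command)
        except ValueError:
            # Unbalanced quotes and similar: fall back to whitespace splitting.
            tokens = command.split()
        lowered = [t.lower() for t in tokens]
        for prefix in FETCH_PREFIXES:
            n = len(prefix)
            # Search for the prefix anywhere, e.g. after "sudo" or "&&".
            for i in range(len(lowered) - n + 1):
                window = tuple(lowered[i:i + n])
                if window == prefix and any(needle in t for t in lowered[i + n:]):
                    flagged.append(command)
                    break
            else:
                continue  # this prefix did not match, try the next one
            break  # command already flagged, stop checking prefixes
    return flagged


# Example usage with made-up commands for the "babel" instance.
example_commands = [
    "pip install babel",
    "pip install -e .",
    "python -m pytest tests/",
]
suspicious = find_suspicious_commands(example_commands, "babel")
```

Under these assumptions, only the first example command is flagged. An editable install of the local checkout (`pip install -e .`) is left alone because the package name does not appear in its arguments. A substring match like this can produce false positives for packages with short names, so flagged commands would still need manual review.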
Fixes
Fixes #675