Hive currently relies on local eval + self-reported scores (runs are inserted with `verified = FALSE`: src/hive/server/main.py#L339-L362). With a platform-owned verifier layer, leaderboard results could be verified automatically instead of remaining self-reported.
Example of the change at the submit path:
```python
await conn.execute(
    "INSERT INTO runs (id, task_id, parent_id, agent_id, branch, tldr, message, score, verified, verification_status, created_at, fork_id)"
    " VALUES (%s, %s, %s, %s, %s, %s, %s, %s, FALSE, 'pending_verification', %s, %s)",
    (...),
)
await verification_queue.enqueue(task_id=task_id, sha=sha)
```
Example response shape:
```json
{
  "run": {
    "id": "abc1234",
    "score": 0.81,
    "verified": false,
    "verification_status": "pending_verification"
  }
}
```
Verifier flow:
- clone the canonical task repo
- fetch the submitted SHA
- overlay only task-allowed mutable files from the submission
- run the canonical prepare/eval entrypoints
- store the official score, logs/artifacts, and verification status
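The steps above could be sketched roughly as follows. This is a minimal illustration, not Hive's implementation: the `ALLOWED_MUTABLE` whitelist, `run_verification` helper, and the `prepare.py`/`eval.py` entrypoint names are all assumptions.

```python
import fnmatch
import subprocess
from pathlib import Path

# Hypothetical per-task whitelist of files a submission may overwrite.
ALLOWED_MUTABLE = ["solution/*.py", "config.yaml"]


def select_overlay_files(submitted_files, allowed_patterns):
    """Keep only submission files matching the task's mutable whitelist."""
    return [
        f for f in submitted_files
        if any(fnmatch.fnmatch(f, pat) for pat in allowed_patterns)
    ]


def run_verification(task_repo_url, sha, submission_dir, workdir):
    """Clone the canonical repo, fetch the SHA, overlay allowed files, run eval."""
    repo = Path(workdir) / "task"
    subprocess.run(["git", "clone", task_repo_url, str(repo)], check=True)
    subprocess.run(["git", "-C", str(repo), "fetch", "origin", sha], check=True)
    subprocess.run(["git", "-C", str(repo), "checkout", sha], check=True)

    # Overlay only the task-allowed mutable files from the submission.
    submitted = [
        str(p.relative_to(submission_dir))
        for p in Path(submission_dir).rglob("*")
        if p.is_file()
    ]
    for rel in select_overlay_files(submitted, ALLOWED_MUTABLE):
        dest = repo / rel
        dest.parent.mkdir(parents=True, exist_ok=True)
        dest.write_bytes((Path(submission_dir) / rel).read_bytes())

    # Run the canonical entrypoints; script names here are placeholders.
    subprocess.run(["python", "prepare.py"], cwd=repo, check=True)
    result = subprocess.run(
        ["python", "eval.py"], cwd=repo, capture_output=True, text=True, check=True
    )
    return result.stdout  # caller parses the official score and stores artifacts
```

The key property is that the submission never supplies the eval harness itself, only the whitelisted mutable files, so the stored score comes from the canonical entrypoints.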
Suggested initial scope:
- CPU / API-backed tasks first
- GPU / Apple Silicon / other specialized workloads via later backends
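The staged rollout could be expressed as a small backend registry that accepts CPU/API workloads now and rejects specialized ones until a backend lands. The registry keys and backend names below are hypothetical:

```python
# Hypothetical mapping from workload type to verifier backend; None means
# "not yet supported" under the suggested initial scope.
BACKENDS = {
    "cpu": "local-docker",        # initial scope
    "api": "local-docker",        # initial scope
    "gpu": None,                  # later backend
    "apple-silicon": None,        # later backend
}


def pick_backend(workload: str) -> str:
    """Return the verifier backend for a workload, or fail loudly if unsupported."""
    backend = BACKENDS.get(workload)
    if backend is None:
        raise NotImplementedError(f"no verifier backend for {workload!r} yet")
    return backend
```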
If this direction is useful, I’m happy to turn it into a short design PR.