Enable server-side verified eval with pluggable backends #30

@mu-hashmi

Hive currently relies on local eval and self-reported scores.

With a platform-owned verifier layer, leaderboard results can be verified automatically instead of remaining self-reported.

Example of the change at the submit path:

# Keep the self-reported score, but mark the run unverified until the verifier has run.
await conn.execute(
    "INSERT INTO runs (id, task_id, parent_id, agent_id, branch, tldr, message, score, verified, verification_status, created_at, fork_id)"
    " VALUES (%s, %s, %s, %s, %s, %s, %s, %s, FALSE, 'pending_verification', %s, %s)",
    (...),
)

# Hand the submitted commit to the platform-owned verifier.
await verification_queue.enqueue(task_id=task_id, sha=sha)

Example response shape:

{
  "run": {
    "id": "abc1234",
    "score": 0.81,
    "verified": false,
    "verification_status": "pending_verification"
  }
}
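
A client could use the two new fields to decide how to present a score. The following is only a sketch: `score_label` is a hypothetical helper, and the wording of the labels is an assumption, not part of the proposed API.

```python
import json


def score_label(run: dict) -> str:
    """Describe how much trust a client should place in a run's score."""
    if run.get("verified"):
        return f"{run['score']:.2f} (verified)"
    # Any non-verified state keeps the score flagged as self-reported.
    status = run.get("verification_status", "unknown")
    return f"{run['score']:.2f} (self-reported, {status})"


# The example response from above, parsed as a client would receive it.
payload = json.loads("""
{
  "run": {
    "id": "abc1234",
    "score": 0.81,
    "verified": false,
    "verification_status": "pending_verification"
  }
}
""")
print(score_label(payload["run"]))  # 0.81 (self-reported, pending_verification)
```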

Verifier flow:

  • clone the canonical task repo
  • fetch the submitted SHA
  • overlay only task-allowed mutable files from the submission
  • run the canonical prepare/eval entrypoints
  • store the official score, logs/artifacts, and verification status
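
The overlay step is the part that keeps a submission from replacing the eval code, so here is a rough sketch of it. It assumes each task declares a list of mutable paths relative to the repo root; `overlay_allowed_files` and its parameters are hypothetical names, and the clone/fetch steps are left out.

```python
import shutil
from pathlib import Path


def overlay_allowed_files(canonical: Path, submission: Path, allowed: list[str]) -> list[str]:
    """Copy only allow-listed files from the submission onto the canonical checkout."""
    applied = []
    canonical_root = canonical.resolve()
    for rel in allowed:
        src = submission / rel
        dst = (canonical / rel).resolve()
        # Refuse allow-list entries that would escape the checkout (e.g. "../eval.py").
        if not dst.is_relative_to(canonical_root):
            raise ValueError(f"path escapes checkout: {rel}")
        if not src.is_file():
            continue  # the submission does not contain this file
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copyfile(src, dst)
        applied.append(rel)
    return applied
```

Everything outside the allow-list, including the prepare/eval entrypoints, stays at the canonical version, so the official score comes from code the platform controls.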

Suggested initial scope:

  • CPU / API-backed tasks first
  • GPU / Apple Silicon / other specialized workloads via later backends
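
One way the pluggable part might look, as a sketch only: each backend implements a small interface and is registered under a workload kind. `VerifierBackend`, `VerificationResult`, `register_backend`, and `select_backend` are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass
class VerificationResult:
    score: float
    logs: str


class VerifierBackend(Protocol):
    """Anything that can re-run a submitted SHA and return the official score."""

    def verify(self, task_id: str, sha: str) -> VerificationResult: ...


# Registry keyed by workload kind, e.g. "cpu" now and "gpu" later.
BACKENDS: dict[str, VerifierBackend] = {}


def register_backend(kind: str, backend: VerifierBackend) -> None:
    BACKENDS[kind] = backend


def select_backend(kind: str) -> VerifierBackend:
    """Return the backend for a workload kind, or fail if none is registered yet."""
    try:
        return BACKENDS[kind]
    except KeyError:
        raise LookupError(f"no verifier backend registered for {kind!r}") from None
```

With this shape, a task whose workload kind has no backend yet can simply stay at `pending_verification` (or remain self-reported) until one is added.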

If this direction is useful, I’m happy to turn it into a short design PR.
