Strategic Test-Time Compute (TTC)

This repository contains the official code for the paper: "Test-Time Compute Games".

We model the LLM-as-a-service market as a game where providers compete by strategically selecting test-time compute levels (e.g., Best-of-N, Majority Voting) to maximize profit. We also introduce a Reverse Second-Price Auction mechanism that aligns provider incentives with social welfare.
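To make the two compute strategies concrete, here is a minimal illustrative sketch (our own toy example, not code from this repository) of how Best-of-N and majority voting aggregate N sampled completions into a single answer:

```python
from collections import Counter

def majority_vote(answers):
    """Majority voting: return the most frequent answer among N samples."""
    return Counter(answers).most_common(1)[0][0]

def best_of_n(answers, scores):
    """Best-of-N: return the answer with the highest verifier/reward score."""
    return max(zip(answers, scores), key=lambda pair: pair[1])[0]

# A provider choosing a larger N spends more compute per query,
# which is exactly the strategic trade-off the game formalizes.
samples = ["42", "41", "42", "42", "17"]
print(majority_vote(samples))                      # most common answer
print(best_of_n(["42", "41"], [0.9, 0.3]))         # highest-scored answer
```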

Repository Structure

The repository is organized as follows:

.
├── configs/            # Experiment configuration files (YAML) for Llama, Qwen, etc.
├── notebooks/          # Jupyter notebooks for recreating paper figures and analysis
├── scripts/            # Helper scripts for loading results and SLURM job submission
├── strategic_ttc/      # Main package source code
│   ├── benchmarks/     # Benchmark datasets (GSM8K, AIME, GPQA)
│   ├── core/           # Core logic: Game dynamics, Auction mechanism, Generation
│   ├── models/         # HuggingFace model wrappers
│   └── verifiers/      # Answer extraction and verification logic
└── pyproject.toml      # Project metadata and build configuration

Getting Started

Prerequisites

  • Python 3.11+ (Tested on Python 3.11.2)
  • [Optional] CUDA-enabled GPU for running inference locally.

Installation

  1. Clone the repository:

    git clone git@github.com:Networks-Learning/strategic-ttc.git
    cd strategic-ttc
  2. Create a virtual environment:

    python -m venv venv
    source venv/bin/activate
  3. Install dependencies:

    pip install -r requirements.txt
    # OR install in editable mode if you plan to modify code:
    pip install -e .

Reproducing Results

  1. Run Inference (Data Generation)

    To generate the raw performance data (Accuracy vs. Tokens) for different models and compute budgets, use the CLI with a configuration file:

    python -m strategic_ttc.cli --config configs/Llama-3-8B--temp-0.6--samples-128--max-512.yaml

    Note: The raw run data is hosted on Hugging Face Datasets. To run the analysis notebooks, you must download the data into the final_runs/ folder.

    # Clone the data directly into the expected folder
    git clone https://huggingface.co/datasets/Human-Centric-Machine-Learning/strategic-ttc-data final_runs
  2. Analyze Game Dynamics

    Use the provided notebooks to simulate the market game, compute Nash Equilibria, and compare with the Auction mechanism.

    • GSM8K Analysis: notebooks/GSM8K-demo.ipynb
    • AIME Analysis: notebooks/AIME-demo.ipynb
    • GPQA Analysis: notebooks/GPQA-demo.ipynb

    These notebooks will generate the plots and tables found in the paper, saving them to the figures/ directory.
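For readers unfamiliar with the game-theoretic setup, the following self-contained sketch shows what "computing a Nash equilibrium over compute levels" means in miniature. All numbers (accuracy curve, price, cost) are hypothetical; the paper's actual utilities and dynamics live in strategic_ttc/core/, not here.

```python
import itertools

# Toy 2-provider game: each provider picks a compute level N (samples per query).
# Payoff = share of the market won via accuracy, minus a per-sample compute cost.
# The accuracy curve, price, and cost below are illustrative stand-ins.
LEVELS = [1, 4, 16, 64]

def accuracy(n):
    """Stylized diminishing returns: more samples, better accuracy."""
    return 1 - 0.5 / n

def profit(my_n, other_n, price=1.0, cost=0.01):
    share = accuracy(my_n) / (accuracy(my_n) + accuracy(other_n))
    return price * share - cost * my_n

def pure_nash_equilibria():
    """Enumerate strategy profiles where both providers play best responses."""
    eqs = []
    for a, b in itertools.product(LEVELS, repeat=2):
        best_a = max(LEVELS, key=lambda n: profit(n, b))
        best_b = max(LEVELS, key=lambda n: profit(n, a))
        if a == best_a and b == best_b:
            eqs.append((a, b))
    return eqs

print(pure_nash_equilibria())
```

With these toy parameters, both providers settle on an intermediate compute level: spending more on samples buys accuracy, but past a point the compute cost outweighs the gain in market share.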

Data Management

The following directories are excluded from version control (via .gitignore) to keep the repo lightweight. You must create them locally.

  • datasets/: Stores downloaded benchmark datasets.
  • final_runs/: Stores raw inference logs and generation outputs.
  • figures/: Stores the output plots generated by the notebooks.
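The three directories above can be created in one step from the repository root:

```shell
# Create the git-ignored working directories expected by the code and notebooks
mkdir -p datasets final_runs figures
```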

Configuration Files

The configs/ directory contains standard setups for all models used in the paper. File names follow the convention {Benchmark}-{Model}-{Size}--temp-{T}--samples-{N}--max-{Tokens}.yaml.
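If you need to work with many configs programmatically, the naming convention can be parsed with a small regex. This is an illustrative helper of our own (the field names and the example filename are constructed from the stated convention, not taken from the codebase):

```python
import re

# Parses {Benchmark}-{Model}-{Size}--temp-{T}--samples-{N}--max-{Tokens}.yaml
PATTERN = re.compile(
    r"(?P<benchmark>[^-]+)-(?P<model>.+)-(?P<size>[^-]+)"
    r"--temp-(?P<temp>[\d.]+)--samples-(?P<samples>\d+)"
    r"--max-(?P<max_tokens>\d+)\.yaml"
)

name = "GSM8K-Llama-3-8B--temp-0.6--samples-128--max-512.yaml"
fields = PATTERN.fullmatch(name).groupdict()
print(fields)
# e.g. fields["samples"] == "128", fields["temp"] == "0.6"
```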

License

This project is licensed under the MIT License - see the LICENSE file for details.

Citation

If you find this code useful, please cite our paper:

@misc{velasco2026testtimecomputegames,
      title={Test-Time Compute Games}, 
      author={Ander Artola Velasco and Dimitrios Rontogiannis and Stratis Tsirtsis and Manuel Gomez-Rodriguez},
      year={2026},
      eprint={2601.21839},
      archivePrefix={arXiv},
      primaryClass={cs.CY},
      url={https://arxiv.org/abs/2601.21839}, 
}
