This repository contains the official code for the paper: "Test-Time Compute Games".
We model the LLM-as-a-service market as a game where providers compete by strategically selecting test-time compute levels (e.g., Best-of-N, Majority Voting) to maximize profit. We also introduce a Reverse Second-Price Auction mechanism that aligns provider incentives with social welfare.
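To build intuition for the compute strategies named above, here is a toy simulation (a sketch for illustration only, not the paper's implementation or the package's API): under majority voting, accuracy improves as the number of sampled answers grows, which is exactly the accuracy-versus-token-cost trade-off that providers play with.

```python
import random
from collections import Counter

def majority_vote(answers):
    """Return the most common answer among the sampled answers."""
    return Counter(answers).most_common(1)[0][0]

def simulate_majority_voting(p_correct, n_samples, n_trials=2000, seed=0):
    """Estimate the accuracy of majority voting over n_samples i.i.d. answers.

    Each sample is 'correct' with probability p_correct; otherwise it is one
    of several distinct wrong answers, so wrong votes rarely agree.
    """
    rng = random.Random(seed)
    wins = 0
    for _ in range(n_trials):
        answers = [
            "correct" if rng.random() < p_correct else f"wrong-{rng.randrange(10)}"
            for _ in range(n_samples)
        ]
        if majority_vote(answers) == "correct":
            wins += 1
    return wins / n_trials

# More samples (higher test-time compute) raise accuracy, at a higher token cost.
for n in (1, 5, 25):
    print(n, simulate_majority_voting(0.4, n))
```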
The repository is organized as follows:
```
.
├── configs/          # Experiment configuration files (YAML) for Llama, Qwen, etc.
├── notebooks/        # Jupyter notebooks for recreating paper figures and analysis
├── scripts/          # Helper scripts for loading results and SLURM job submission
├── strategic_ttc/    # Main package source code
│   ├── benchmarks/   # Benchmark datasets (GSM8K, AIME, GPQA)
│   ├── core/         # Core logic: game dynamics, auction mechanism, generation
│   ├── models/       # Hugging Face model wrappers
│   └── verifiers/    # Answer extraction and verification logic
└── pyproject.toml    # Project metadata and build configuration
```
- Python 3.11+ (Tested on Python 3.11.2)
- [Optional] CUDA-enabled GPU for running inference locally.
1. Clone the repository:

   ```
   git clone git@github.com:Networks-Learning/strategic-ttc.git
   cd strategic-ttc
   ```

2. Create a virtual environment:

   ```
   python -m venv venv
   source venv/bin/activate
   ```

3. Install dependencies:

   ```
   pip install -r requirements.txt
   # OR install in editable mode if you plan to modify the code:
   pip install -e .
   ```
**Run Inference (Data Generation)**

To generate the raw performance data (accuracy vs. tokens) for different models and compute budgets, run the CLI with a configuration file:

```
python -m strategic_ttc.cli --config configs/Llama-3-8B--temp-0.6--samples-128--max-512.yaml
```
Note: The raw run data is hosted on Hugging Face Datasets. To run the analysis notebooks, you must download the data into the `final_runs/` folder:

```
# Clone the data directly into the expected folder
git clone https://huggingface.co/datasets/Human-Centric-Machine-Learning/strategic-ttc-data final_runs
```
**Analyze Game Dynamics**

Use the provided notebooks to simulate the market game, compute Nash equilibria, and compare them with the auction mechanism:

- GSM8K analysis: `notebooks/GSM8K-demo.ipynb`
- AIME analysis: `notebooks/AIME-demo.ipynb`
- GPQA analysis: `notebooks/GPQA-demo.ipynb`

These notebooks generate the plots and tables found in the paper, saving them to the `figures/` directory.
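As a self-contained toy of the kind of equilibrium analysis the notebooks perform (the payoff numbers below are invented for illustration, not taken from the paper's data), one can brute-force the pure-strategy Nash equilibria of a two-provider game over compute levels:

```python
import itertools

# Hypothetical profit matrix: payoffs[(a, b)] = (profit of provider 1, profit of
# provider 2) when provider 1 plays compute level a and provider 2 plays level b.
# Levels: 0 = low compute, 1 = high compute. Numbers are illustrative only.
payoffs = {
    (0, 0): (3, 3),   # both save on compute
    (0, 1): (1, 4),   # provider 2 wins users by answering better
    (1, 0): (4, 1),
    (1, 1): (2, 2),   # both pay for extra compute
}
levels = (0, 1)

def pure_nash_equilibria(payoffs, levels):
    """Return profiles where no provider gains by unilaterally deviating."""
    equilibria = []
    for a, b in itertools.product(levels, repeat=2):
        u1, u2 = payoffs[(a, b)]
        best1 = all(payoffs[(a2, b)][0] <= u1 for a2 in levels)
        best2 = all(payoffs[(a, b2)][1] <= u2 for b2 in levels)
        if best1 and best2:
            equilibria.append((a, b))
    return equilibria

print(pure_nash_equilibria(payoffs, levels))  # prisoner's-dilemma-style: [(1, 1)]
```

With these made-up payoffs, the unique equilibrium (both providers choosing high compute) yields lower profits than the cooperative outcome, illustrating the kind of incentive misalignment the auction mechanism is designed to address.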
The following directories are excluded from version control (via .gitignore) to keep the repo lightweight. You must create them locally.
datasets/: Stores downloaded benchmark datasets.final_runs/: Stores raw inference logs and generation outputs.figures/: Stores the output plots generated by the notebooks.
The `configs/` directory contains standard setups for all models used in the paper. Config files follow the naming convention `{Benchmark}-{Model}-{Size}--temp-{T}--samples-{N}--max-{Tokens}.yaml`.
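As an illustration of that convention, a small parser like the one below (hypothetical, not part of the package) can recover the settings from a config filename; since model names themselves contain hyphens (e.g. `Llama-3-8B`), the leading part is kept as a single string:

```python
import re

def parse_config_name(filename):
    """Parse '{...}--temp-{T}--samples-{N}--max-{Tokens}.yaml' into a dict."""
    pattern = (
        r"^(?P<prefix>.+)--temp-(?P<temp>[\d.]+)"
        r"--samples-(?P<samples>\d+)--max-(?P<max_tokens>\d+)\.yaml$"
    )
    match = re.match(pattern, filename)
    if match is None:
        raise ValueError(f"unrecognized config name: {filename}")
    return {
        "model": match["prefix"],
        "temperature": float(match["temp"]),
        "samples": int(match["samples"]),
        "max_tokens": int(match["max_tokens"]),
    }

print(parse_config_name("Llama-3-8B--temp-0.6--samples-128--max-512.yaml"))
# {'model': 'Llama-3-8B', 'temperature': 0.6, 'samples': 128, 'max_tokens': 512}
```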
This project is licensed under the MIT License - see the LICENSE file for details.
If you find this code useful, please cite our paper:
```bibtex
@misc{velasco2026testtimecomputegames,
      title={Test-Time Compute Games},
      author={Ander Artola Velasco and Dimitrios Rontogiannis and Stratis Tsirtsis and Manuel Gomez-Rodriguez},
      year={2026},
      eprint={2601.21839},
      archivePrefix={arXiv},
      primaryClass={cs.CY},
      url={https://arxiv.org/abs/2601.21839},
}
```