Community benchmark database for running LLMs on Apple Silicon Macs
Updated Apr 9, 2026 - Shell
Claude Code skill that pits Claude, ChatGPT, and Gemini against each other, then lets them cross-judge each other blind
🧠 Benchmarks Claude Haiku 4.5 and MiniMax M2.1 on agentic tasks, highlighting each model's strengths in design thinking and operational skill across multi-turn workflows.
Benchmark Ollama models on your own prompts, on your own hardware.
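Tools like this typically drive Ollama's local HTTP API (`POST /api/generate` on port 11434) and derive tokens-per-second from the `eval_count` and `eval_duration` fields in the response. A minimal sketch, assuming a locally running Ollama server; the model name `llama3` and the helper names are illustrative, not taken from any specific repo:

```python
import json
import urllib.request

def tokens_per_second(eval_count: int, eval_duration_ns: int) -> float:
    """Convert Ollama's response counters to a throughput figure.

    Ollama reports eval_duration in nanoseconds.
    """
    return eval_count / (eval_duration_ns / 1e9)

def benchmark_prompt(model: str, prompt: str,
                     host: str = "http://localhost:11434") -> float:
    """Run one prompt through a local Ollama server and return tokens/sec."""
    body = json.dumps({"model": model, "prompt": prompt,
                       "stream": False}).encode()
    req = urllib.request.Request(f"{host}/api/generate", data=body,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        data = json.load(resp)
    return tokens_per_second(data["eval_count"], data["eval_duration"])

# Sanity check on the pure helper (no server needed):
# 100 tokens generated in 2 s of eval time -> 50 tok/s
print(tokens_per_second(100, 2_000_000_000))
```

With a server running and a model pulled, `benchmark_prompt("llama3", "Why is the sky blue?")` would return a throughput figure measured on your own hardware.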
Reproducible benchmark framework for testing hypotheses about AI coding agents
Systematic benchmark comparing Claude Haiku 4.5 vs MiniMax M2.1 on agentic coding tasks. Includes full audit trails, LLM-as-judge evaluation, and path divergence analysis.