AI Stats
Benchmark Coverage Map
See which models have been evaluated on which benchmarks. Missing data is shown in gray. Help us improve coverage by contributing benchmark results!
Benchmark Coverage
54 benchmarks, 174 models
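The page does not say how each coverage figure is derived. A minimal sketch, assuming coverage is simply the number of models with a result divided by the 174 models tracked; the `Coverage` class and `rank_by_coverage` helper are hypothetical names used for illustration, and the site's own rounding may differ by a point from the exact fraction:

```python
from dataclasses import dataclass

TOTAL_MODELS = 174  # total taken from the page header


@dataclass(frozen=True)
class Coverage:
    """One benchmark's coverage: how many tracked models have a result."""
    benchmark: str
    evaluated: int
    total: int = TOTAL_MODELS

    @property
    def fraction(self) -> float:
        # Assumed definition: evaluated models over all tracked models.
        return self.evaluated / self.total if self.total else 0.0

    @property
    def missing(self) -> int:
        # Models that still need a result for this benchmark.
        return self.total - self.evaluated


def rank_by_coverage(rows):
    """Sort by coverage, highest first; ties are broken alphabetically."""
    return sorted(rows, key=lambda r: (-r.fraction, r.benchmark))


# Three counts copied from the listing on this page.
rows = [Coverage("AIME 2024", 39), Coverage("GPQA", 97), Coverage("Balrog-AI", 0)]
for row in rank_by_coverage(rows):
    print(f"{row.benchmark}: {row.fraction:.1%} "
          f"({row.evaluated}/{row.total}), {row.missing} models missing")
```

The `missing` count is the number of models that would still need a submitted result to close the gap for that benchmark.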
GPQA: 55% coverage (97/174)
AIME 2024: 22% coverage (39/174)
SimpleBench: 18% coverage (32/174)
AIME 2025: 15% coverage (26/174)
Aider-Polyglot: 14% coverage (25/174)
ARC-AGI-1: 12% coverage (22/174)
ARC-AGI-2: 12% coverage (21/174)
AidanBench: 8% coverage (14/174)
EQ-Bench 3: 8% coverage (15/174)
LiveBench: 8% coverage (14/174)
Humanity's Last Exam: 7% coverage (13/174)
MMMU: 7% coverage (12/174)
LisanBench: 6% coverage (10/174)
MMLU: 6% coverage (10/174)
SimpleQA: 6% coverage (10/174)
SWE-Bench: 6% coverage (11/174)
MMLU-Pro: 3% coverage (5/174)
Codeforces: 2% coverage (3/174)
HumanEval: 2% coverage (4/174)
LMArena Text: 2% coverage (4/174)
MATH: 2% coverage (3/174)
BigCodeBench: 1% coverage (1/174)
GSM8K: 1% coverage (2/174)
SWE-Lancer: 1% coverage (1/174)
TAU-Bench: 1% coverage (1/174)
Balrog-AI: 0% coverage (0/174)
Confabulations: 0% coverage (0/174)
Creative Story Writing: 0% coverage (0/174)
Dubesor LLM: 0% coverage (0/174)
Elimination Game: 0% coverage (0/174)
Fiction-Live Bench: 0% coverage (0/174)
Galileo Agent: 0% coverage (0/174)
HealthBench: 0% coverage (0/174)
IQ Bench: 0% coverage (0/174)
LiveCodeBench V5: 0% coverage (0/174)
LiveCodeBench V6: 0% coverage (0/174)
LMArena WebDev: 0% coverage (0/174)
MathArena: 0% coverage (0/174)
MC-Bench: 0% coverage (0/174)
METR: 0% coverage (0/174)
Misguided Attention: 0% coverage (0/174)
MLE-Bench: 0% coverage (0/174)
NYT Connections: 0% coverage (0/174)
PaperBench: 0% coverage (0/174)
PHYBench: 0% coverage (0/174)
SEAL MultiChallenge: 0% coverage (0/174)
SmolAgents LLM: 0% coverage (0/174)
Snake-Bench: 0% coverage (0/174)
SOLO-Bench: 0% coverage (0/174)
Symflower Coding: 0% coverage (0/174)
Thematic Generalisation: 0% coverage (0/174)
WebDev Arena: 0% coverage (0/174)
WeirdML: 0% coverage (0/174)
XLANG Agent: 0% coverage (0/174)