Benchmarks
GPQA Diamond
90 models
LMArena Text
67 models
Confabulations
36 models
NYT Connections
36 models
AIME 2025
35 models
ARC-AGI-1
34 models
ARC-AGI-2
33 models
Aider-Polyglot
30 models
AIME 2024
30 models
SimpleBench
30 models
Thematic Generalisation
28 models
EQ-Bench 3
25 models
Humanity's Last Exam
22 models
Elimination Game
21 models
SWE-Bench
21 models
LMArena WebDev
20 models
MMMU
15 models
AidanBench
14 models
LiveBench
14 models
MMLU
12 models
Ai2 SciArena
11 models
LisanBench
10 models
SimpleQA
10 models
HMMT 2025
9 models
MathArena Apex
9 models
MMLU-Pro
9 models
BrowseComp Long Context 128k
8 models
Graphwalks bfs <128k
8 models
Graphwalks parents <128k
8 models
OpenAI-MRCR: 2 needle 128k
8 models
VideoMME
8 models
FActScore hallucination rate
7 models
LongFact-Concepts hallucination rate
7 models
LongFact-Objects hallucination rate
7 models
MMMLU
7 models
MMMU Pro
7 models
Tau 2 Retail
7 models
Tau 2 Telecom
7 models
BrowseComp Long Context 256k
6 models
Codeforces
6 models
IFEval
6 models
OpenAI-MRCR: 2 needle 256k
6 models
Tau 2 Airline
6 models
Vending Bench 2
6 models
HumanEval
5 models
MMLU Pro
5 models
Tau Bench (Airline)
5 models
Tau Bench (Retail)
5 models
CharXiv-Reasoning
4 models
Creative Story Writing
4 models
FACTS
4 models
Frontier Math
4 models
MATH
4 models
Video MMMU
4 models
COLLIE
3 models
ERQA
3 models
HealthBench
3 models
HealthBench Hard
3 models
Scale MCP Atlas
3 models
SWE Bench Pro
3 models
Terminal Bench
3 models
Terminal Bench 2.0
3 models
BFCL Overall FC V4
2 models
BrowseComp
2 models
FACTS Benchmark Suite
2 models
GSM8K
2 models
HealthBench Consensus
2 models
IF-Bench
2 models
LiveCodeBench
2 models
LiveCodeBench Coding
2 models
LiveCodeBench V5
2 models
LiveCodeBench V6
2 models
LongCodeBench 1M
2 models
MMLU Redux
2 models
OCRBench V2
2 models
OpenAI MRCR 8 Needle 128k
2 models
OpenAI MRCR 8 Needle 1m
2 models
QVHighlights
2 models
RealKIE
2 models
ScreenSpot
2 models
SWE-Lancer
2 models
USAMO 2025
2 models
AI2D
1 model
APEX-Agents
1 model
BigCodeBench
1 model
ChartQA
1 model
DocVQA
1 model
GDPval-AA
1 model
GPQA
1 model
IMO Answer Bench
1 model
LiveCodeBench Pro
1 model
MathVista
1 model
OSWorld
1 model
SciCode
1 model
SWE Bench Multilingual
1 model
ACEBench
0 models
AMC
0 models
Arena Hard
0 models
AutoLogi
0 models
Balrog-AI
0 models
C-Eval
0 models
CNMO 2024
0 models
CSimpleQA
0 models
Dubesor LLM
0 models
EvalPlus
0 models
Fiction-Live Bench
0 models
Galileo Agent
0 models
Global PICA
0 models
IQ Bench
0 models
MATH 500
0 models
MathArena
0 models
MC-Bench
0 models
METR
0 models
Misguided Attention
0 models
MLE-Bench
0 models
MMLU Multilingual
0 models
MMLU Redux 2.0
0 models
MMMT Bench
0 models
Multi‑Programming Language Evaluation
0 models
OmniDocBench 1.5
0 models
Online Judgement Benchmark
0 models
PaperBench
0 models
PHYBench
0 models
PolyMath-en
0 models
ScreenSpot-Pro
0 models
SEAL MultiChallenge
0 models
SmolAgents LLM
0 models
Snake-Bench
0 models
SOLO-Bench
0 models
SuperGPQA
0 models
SWE Bench Live
0 models
Symflower Coding
0 models
TAU-Bench
0 models
Tau2 Bench
0 models
TriviaQA
0 models
WeirdML
0 models
Wildbench
0 models
XLANG Agent
0 models
ZebraLogic
0 models