AI Stats
AIME 2025
Total Models: 40
Average Score: 76.84%
Score Range: 23.30% - 100.00%
Max Score Achievable: 1 (100%)
[Chart: Top 10 Model Performance (top 10 of 40 models)]
Models Using This Benchmark (40)
Google (8 models)
  Gemini 2.5 Pro Experimental (google): 96.70%
  Gemini 2.5 Pro Preview (google): 88.00%
  Gemini 2.5 Pro Preview (google): 83.00%
  Gemini 2.5 Flash Preview (google): 78.00%
  Gemini 2.5 Flash Preview (google): 72.00%
  Gemini 2.5 Flash Lite Preview (google): 63.10%
  Gemini 2.0 Flash (google): 27.50%
  Gemini Diffusion (google): 23.30%
OpenAI (8 models)
  GPT-5 Pro (openai): 100.00%
  GPT-5 (openai): 99.60%
  gpt-oss-20b (openai): 98.70%
  o3 (openai): 98.40%
  gpt-oss-120b (openai): 97.90%
  o4 Mini (openai): 92.70%
  GPT-5 mini (openai): 91.10%
  GPT-5 nano (openai): 85.20%
xAI (6 models)
  Grok 4 Heavy (x-ai): 100.00%
  Grok 4 (x-ai): 98.80%
  Grok 3 Beta (x-ai): 93.30%
  Grok 3 Mini Beta (x-ai): 90.80%
  Grok 3 Mini (x-ai): 83.00%
  Grok 3 (x-ai): 57.50%
Qwen (5 models)
  Qwen3 235B A22B Thinking 2507 (qwen): 92.30%
  Qwen3 235B A22B (qwen): 81.50%
  Qwen3 32B (qwen): 72.90%
  Qwen3 30B A3B (qwen): 70.90%
  Qwen3 235B A22B Instruct 2507 (qwen): 70.30%
Nvidia (3 models)
  Llama 3.1 Nemotron Ultra 253B v1 (nvidia): 72.50%
  Llama 3.3 Nemotron Super 49B v1 (nvidia): 58.40%
  Llama 3.1 Nemotron Nano 8B v1 (nvidia): 47.10%
LG (2 models)
  EXAONE 4.0 32B (lg): 85.30%
  EXAONE 4.0 1.2B (lg): 45.20%
Microsoft (2 models)
  Phi 4 Reasoning Plus (microsoft): 78.00%
  Phi 4 Reasoning (microsoft): 62.90%
Mistral (2 models)
  Magistral Medium (mistral): 64.90%
  Magistral Small (mistral): 62.80%
Anthropic (1 model)
  Claude Opus 4.1 (anthropic): 78.00%
DeepSeek (1 model)
  R1 (deepseek): 85.50%
MiniMax (1 model)
  MiniMax M1 (minimax): 76.90%
Moonshot (1 model)
  Kimi K2 Instruct (moonshotai): 49.50%