AI Stats
GPQA Diamond
Total Models: 109
Average Score: 59.21
Score Range: 19.20 - 89.40
Max Score Achievable: 1
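The summary statistics above (model count, average score, score range) can be derived directly from the per-model scores listed below. A minimal sketch, using a small illustrative subset of the scores rather than the full 109-model dataset:

```python
# Illustrative subset of GPQA Diamond scores (percent); not the full dataset.
scores = [89.40, 87.70, 86.40, 59.21, 30.80, 19.20]

total_models = len(scores)                    # "Total Models"
average = sum(scores) / total_models          # "Average Score"
score_range = (min(scores), max(scores))      # "Score Range"

print(total_models)
print(round(average, 2))
print(score_range)
```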
[Chart: Top 10 Model Performance (top 10 of 109)]

Models Using This Benchmark (109)
OpenAI (23 models)

| Model | Provider | Score |
| --- | --- | --- |
| GPT 5-Pro | openai | 89.40% |
| o3 Preview | openai | 87.70% |
| GPT-5 | openai | 87.30% |
| o3 Pro | openai | 84.00% |
| o3 | openai | 83.30% |
| GPT-5 mini | openai | 82.30% |
| o4 Mini | openai | 81.40% |
| gpt-oss-120b | openai | 80.90% |
| o3-mini | openai | 79.70% |
| o1 pro | openai | 79.00% |
| o1 | openai | 78.00% |
| gpt-oss-20b | openai | 74.20% |
| o1 preview | openai | 73.30% |
| GPT-4.5 | openai | 71.40% |
| GPT-5 nano | openai | 71.20% |
| GPT-4.1 | openai | 66.30% |
| GPT-4.1 Mini | openai | 65.00% |
| o1 mini | openai | 60.00% |
| GPT-4.1 Nano | openai | 50.30% |
| GPT-4o | openai | 46.00% |
| GPT-4o-mini | openai | 40.20% |
| GPT-4 | openai | 35.70% |
| GPT-3.5 Turbo | openai | 30.80% |
Google (14 models)

| Model | Provider | Score |
| --- | --- | --- |
| Gemini 2.5 Pro Preview | google | 86.40% |
| Gemini 2.5 Pro Experimental | google | 84.00% |
| Gemini 2.5 Pro Preview | google | 83.00% |
| Gemini 2.5 Flash Preview | google | 82.80% |
| Gemini 2.0 Flash | google | 78.30% |
| Gemini 2.5 Flash Preview | google | 78.30% |
| Gemini 2.5 Flash Lite Preview | google | 66.70% |
| Gemma 3 27B | google | 42.40% |
| Gemma 3 12B | google | 40.90% |
| Gemini Diffusion | google | 40.40% |
| Gemini 1.0 Ultra | google | 35.70% |
| Gemma 3 4B | google | 30.80% |
| Gemini 1.0 Pro | google | 27.90% |
| Gemma 3 1B | google | 19.20% |
Qwen (14 models)

| Model | Provider | Score |
| --- | --- | --- |
| Qwen3 235B A22B Thinking 2507 | qwen | 81.10% |
| Qwen3 235B A22B Instruct 2507 | qwen | 77.50% |
| Qwen3 30B A3B | qwen | 65.80% |
| QwQ-32B | qwen | 65.20% |
| QwQ-32B-Preview | qwen | 65.20% |
| Qwen2.5 32B Instruct | qwen | 49.50% |
| Qwen2.5 72B Instruct | qwen | 49.00% |
| Qwen3 235B A22B | qwen | 47.50% |
| Qwen2.5 VL 32B Instruct | qwen | 46.00% |
| Qwen2.5 14B Instruct | qwen | 45.50% |
| Qwen2 72B Instruct | qwen | 42.40% |
| Qwen2.5 7B Instruct | qwen | 36.40% |
| Qwen2.5-Omni-7B | qwen | 30.80% |
| Qwen2 7B Instruct | qwen | 25.30% |
Anthropic (10 models)

| Model | Provider | Score |
| --- | --- | --- |
| Claude 3.7 Sonnet | anthropic | 84.80% |
| Claude Sonnet 4 | anthropic | 83.80% |
| Claude Opus 4 | anthropic | 83.30% |
| Claude Opus 4.1 | anthropic | 80.90% |
| Claude 3.5 Sonnet | anthropic | 67.20% |
| Claude 3.5 Sonnet | anthropic | 65.00% |
| Claude 3 Opus | anthropic | 50.40% |
| Claude 3.5 Haiku | anthropic | 41.60% |
| Claude 3 Sonnet | anthropic | 40.40% |
| Claude 3 Haiku | anthropic | 33.30% |
Meta (9 models)

| Model | Provider | Score |
| --- | --- | --- |
| Llama 4 Maverick | meta | 69.80% |
| Llama 4 Scout | meta | 57.20% |
| Llama 3.1 405B Instruct | meta | 50.70% |
| Llama 3.3 70B Instruct | meta | 50.50% |
| Llama 3.2 90B Instruct | meta | 46.70% |
| Llama 3.1 70B Instruct | meta | 41.70% |
| Llama 3.2 11B Instruct | meta | 32.80% |
| Llama 3.2 3B Instruct | meta | 32.80% |
| Llama 3.1 8B Instruct | meta | 30.40% |
xAI (9 models)

| Model | Provider | Score |
| --- | --- | --- |
| Grok 4 Heavy | x-ai | 88.90% |
| Grok 4 | x-ai | 87.50% |
| Grok 3 Beta | x-ai | 84.60% |
| Grok 3 Mini Beta | x-ai | 84.00% |
| Grok 3 Mini | x-ai | 80.30% |
| Grok 3 | x-ai | 79.10% |
| Grok 2 | x-ai | 56.00% |
| Grok 2 Mini | x-ai | 51.00% |
| Grok 1.5 | x-ai | 35.90% |
Microsoft (7 models)

| Model | Provider | Score |
| --- | --- | --- |
| Phi 4 Reasoning Plus | microsoft | 68.90% |
| Phi 4 Reasoning | microsoft | 65.80% |
| Phi 4 | microsoft | 56.10% |
| Phi 4 Mini Reasoning | microsoft | 52.00% |
| Phi-3.5-MoE-instruct | microsoft | 36.80% |
| Phi-3.5-mini-instruct | microsoft | 30.40% |
| Phi 4 Mini | microsoft | 25.20% |
AI21 (4 models)

| Model | Provider | Score |
| --- | --- | --- |
| Jamba Large 1.6 | ai21 | 38.70% |
| Jamba Large 1.5 | ai21 | 36.90% |
| Jamba Mini 1.5 | ai21 | 32.30% |
| Jamba Mini 1.6 | ai21 | 30.00% |
Amazon (4 models)

| Model | Provider | Score |
| --- | --- | --- |
| Nova Premier | amazon | 57.10% |
| Nova Pro 1.0 | amazon | 46.90% |
| Nova Lite 1.0 | amazon | 42.00% |
| Nova Micro 1.0 | amazon | 40.00% |
DeepSeek (4 models)

| Model | Provider | Score |
| --- | --- | --- |
| R1 | deepseek | 81.00% |
| R1 | deepseek | 71.50% |
| DeepSeek-V3 0324 | deepseek | 68.40% |
| DeepSeek-V3 | deepseek | 59.10% |
Mistral (3 models)

| Model | Provider | Score |
| --- | --- | --- |
| Magistral Medium | mistral | 70.80% |
| Magistral Small | mistral | 68.20% |
| Mistral Small 3.2 | mistral | 46.13% |
Nvidia (3 models)

| Model | Provider | Score |
| --- | --- | --- |
| Llama 3.1 Nemotron Ultra 253B v1 | nvidia | 76.00% |
| Llama-3.3 Nemotron Super 49B v1 | nvidia | 66.70% |
| Llama 3.1 Nemotron Nano 8B V1 | nvidia | 54.10% |
LG (2 models)

| Model | Provider | Score |
| --- | --- | --- |
| EXAONE 4.0 32B | lg | 75.40% |
| EXAONE 4.0 1.2B | lg | 52.00% |
Moonshot (2 models)

| Model | Provider | Score |
| --- | --- | --- |
| Kimi K2 Instruct | moonshotai | 75.10% |
| Kimi K2 Base | moonshotai | 48.10% |
MiniMax (1 model)

| Model | Provider | Score |
| --- | --- | --- |
| MiniMax M1 | minimax | 70.00% |
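The grouped listing above is just (model, provider, score) rows bucketed by provider. A minimal sketch of how such rows could be aggregated to find each provider's best-scoring model, using a small sample of rows from the leaderboard (not the full 109):

```python
# Sample of (model, provider slug, GPQA Diamond score) rows from the listing.
rows = [
    ("GPT 5-Pro", "openai", 89.40),
    ("GPT-4o", "openai", 46.00),
    ("Grok 4 Heavy", "x-ai", 88.90),
    ("Grok 2", "x-ai", 56.00),
    ("Claude 3.7 Sonnet", "anthropic", 84.80),
]

# Keep the highest-scoring model seen so far for each provider.
best = {}
for model, provider, score in rows:
    if score > best.get(provider, (None, -1.0))[1]:
        best[provider] = (model, score)

for provider, (model, score) in sorted(best.items()):
    print(f"{provider}: {model} ({score:.2f}%)")
```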