AI Stats
GPQA
About this Benchmark
Unfortunately there isn't a description for this benchmark yet.
Model Performance
Chart: GPQA scores, top 20 of 98 models
Models Using This Benchmark (98)
OpenAI (23 models)
o3 Preview: 87.7%
o3: 83.3%
o4 Mini: 81.4%
o4 Mini High: 81.4%
o3 Mini High: 79.7%
o1-pro: 79.0%
o1: 78.0%
o3-mini: 77.2%
o3 Mini Medium: 76.8%
o1-preview: 73.3%
GPT-4.5 (Preview): 71.4%
o3 Mini Low: 70.6%
GPT-4.5: 69.5%
GPT-4.1: 66.3%
GPT-4.1 Mini: 65.0%
o1-mini: 60.0%
GPT-4.1 Nano: 50.3%
GPT-4 Turbo: 48.0%
GPT-4o: 46.0%
GPT-4o-mini: 40.2%
GPT-4: 35.7%
GPT-3.5 Turbo: 30.8%
GPT-2: 1.12%
Google (13 models)
Qwen (12 models)
Anthropic (10 models)
Meta (9 models)
Microsoft (7 models)
xAI (7 models)
Amazon (4 models)
DeepSeek (4 models)
Mistral (4 models)
Nvidia (3 models)
ai21 (2 models)