AI Stats
Humanity's Last Exam
Total Models: 23
Average Score: 18.16
Score Range: 4.70 - 44.40
Max Score Achievable: 1
Top 10 Model Performance (chart: top 10 of 23 models)
Models Using This Benchmark (23)

OpenAI (8 models)
GPT 5-Pro (openai): 42.00%
GPT-5 (openai): 35.20%
o3 (openai): 20.30%
gpt-oss-120b (openai): 19.00%
o4 Mini (openai): 17.70%
gpt-oss-20b (openai): 17.30%
GPT-5 mini (openai): 16.70%
GPT-5 nano (openai): 8.70%

Google (7 models)
Gemini 2.5 Pro Preview (google): 21.60%
Gemini 2.5 Pro Experimental (google): 18.80%
Gemini 2.5 Pro Preview (google): 17.80%
Gemini 2.5 Flash Preview (google): 12.10%
Gemini 2.5 Flash Preview (google): 11.00%
Gemini 2.5 Flash Lite Preview (google): 6.90%
Gemini 2.0 Flash (google): 5.10%

Mistral (2 models)
Magistral Medium (mistral): 9.00%
Magistral Small (mistral): 6.40%

xAI (2 models)
Grok 4 Heavy (x-ai): 44.40%
Grok 4 (x-ai): 38.60%

DeepSeek (1 model)
R1 (deepseek): 17.70%

MiniMax (1 model)
MiniMax M1 (minimax): 8.40%

Moonshot (1 model)
Kimi K2 Instruct (moonshotai): 4.70%

Qwen (1 model)
Qwen3 235B A22B Thinking 2507 (qwen): 18.20%
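
The summary figures at the top of the page (total models, average score, score range) and the top-10 cut can be reproduced from the per-model scores listed above. The sketch below assumes the average is a plain unweighted mean of the listed percentages; the page does not say how it is computed, but that assumption matches the displayed 18.16. The names `SCORES`, `summarize`, and `top_n` are illustrative and not part of the site.

```python
from statistics import mean

# (provider id, model name, score in percent), transcribed from the listing above.
# Repeated model names are separate entries on the page and are kept as such.
SCORES = [
    ("openai", "GPT 5-Pro", 42.00),
    ("openai", "GPT-5", 35.20),
    ("openai", "o3", 20.30),
    ("openai", "gpt-oss-120b", 19.00),
    ("openai", "o4 Mini", 17.70),
    ("openai", "gpt-oss-20b", 17.30),
    ("openai", "GPT-5 mini", 16.70),
    ("openai", "GPT-5 nano", 8.70),
    ("google", "Gemini 2.5 Pro Preview", 21.60),
    ("google", "Gemini 2.5 Pro Experimental", 18.80),
    ("google", "Gemini 2.5 Pro Preview", 17.80),
    ("google", "Gemini 2.5 Flash Preview", 12.10),
    ("google", "Gemini 2.5 Flash Preview", 11.00),
    ("google", "Gemini 2.5 Flash Lite Preview", 6.90),
    ("google", "Gemini 2.0 Flash", 5.10),
    ("mistral", "Magistral Medium", 9.00),
    ("mistral", "Magistral Small", 6.40),
    ("x-ai", "Grok 4 Heavy", 44.40),
    ("x-ai", "Grok 4", 38.60),
    ("deepseek", "R1", 17.70),
    ("minimax", "MiniMax M1", 8.40),
    ("moonshotai", "Kimi K2 Instruct", 4.70),
    ("qwen", "Qwen3 235B A22B Thinking 2507", 18.20),
]


def summarize(scores):
    """Return the count, unweighted mean (2 decimals), and min/max of the scores."""
    values = [score for _, _, score in scores]
    return {
        "total_models": len(values),
        "average": round(mean(values), 2),  # assumption: simple arithmetic mean
        "min": min(values),
        "max": max(values),
    }


def top_n(scores, n=10):
    """Return the n highest-scoring entries, best first."""
    return sorted(scores, key=lambda row: row[2], reverse=True)[:n]


if __name__ == "__main__":
    print(summarize(SCORES))
    for provider, model, score in top_n(SCORES):
        print(f"{model} ({provider}): {score:.2f}%")
```

Under these assumptions the summary comes out to 23 models, an average of 18.16, and a range of 4.70 to 44.40, which agrees with the figures shown at the top of the page.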