AI Stats
Confabulations
Total Models: 44
Average Score: 19.16
Score Range: 10.34 to 40.26
Max Score Achievable: 1
Models Using This Benchmark (44; lower is better)
OpenAI (14 models)
GPT-5: 10.34
o1: 11.74
o1 preview: 13.04
GPT-5 mini: 13.28
GPT-4.5: 13.64
o3 Pro: 14.22
o3: 14.38
GPT-4o: 15.34
gpt-oss-120b: 15.65
o4 Mini: 15.79
GPT-4o: 17.21
o3-mini: 18.43
o1 mini: 18.55
GPT-4o-mini: 37.21
Anthropic (8 models)
Claude Sonnet 4: 14.85
Claude Opus 4: 17.06
Claude Opus 4.1: 18.51
Claude 3.7 Sonnet: 19.76
Claude 3.5 Sonnet: 19.94
Claude 3 Opus: 22.70
Claude 3 Haiku: 34.21
Claude 3.5 Haiku: 36.74
Google (5 models)
Gemini 2.5 Pro Preview: 10.62
Gemini 2.5 Pro Experimental: 10.80
Gemini 2.5 Pro Preview: 12.38
Gemma 2 27B: 27.24
Gemma 3 27B: 40.26
Qwen (4 models)
Qwen3 30B A3B: 12.28
Qwen3 235B A22B: 15.41
Qwen3 235B A22B Thinking 2507: 16.77
Qwen2.5 72B Instruct: 19.09
xAI (4 models)
Grok 4: 12.41
Grok 3 Mini Beta: 14.04
Grok 3 Beta: 14.19
Grok 2: 20.14
Meta (3 models)
Llama 3.1 405B (base): 17.62
Llama 4 Maverick: 22.58
Llama 3.3 70B Instruct: 22.81
DeepSeek (2 models)
R1: 14.56
DeepSeek-V3 0324: 26.15
Amazon (1 model)
Nova Pro 1.0: 30.05
Microsoft (1 model)
Phi 4: 29.43
Mistral (1 model)
Mistral Large 2: 21.40
Moonshot (1 model)
Kimi K2 Instruct: 20.38