AI Stats
Thematic Generalisation
Total Models: 36
Average Score: 1.99
Score Range: 1.70 - 2.60
Max Score Achievable: 1
Top 10 Model Performance (chart: top 10 of 36 models; lower is better)
Models Using This Benchmark (36; lower is better)
OpenAI (9 models)
o1 (openai): 1.80
o3 Pro (openai): 1.82
o4 Mini (openai): 1.82
o3 (openai): 1.83
o3-mini (openai): 1.85
GPT-4.5 (openai): 1.93
o1 mini (openai): 1.95
GPT-4o (openai): 1.96
GPT-4o-mini (openai): 2.30
Anthropic (5 models)
Claude Opus 4 (anthropic): 1.70
Claude 3.7 Sonnet (anthropic): 1.88
Claude Sonnet 4 (anthropic): 1.89
Claude 3.5 Sonnet (anthropic): 1.93
Claude 3.5 Haiku (anthropic): 2.25
Google (5 models)
Gemini 2.5 Pro Experimental (google): 1.74
Gemini 2.5 Pro Preview (google): 1.75
Gemini 2.5 Pro Preview (google): 1.79
Gemma 3 27B (google): 2.21
Gemma 2 27B (google): 2.60
xAI (4 models)
Grok 4 (x-ai): 1.88
Grok 3 Mini Beta (x-ai): 1.90
Grok 3 Beta (x-ai): 2.07
Grok 2 (x-ai): 2.21
DeepSeek (3 models)
R1 (deepseek): 1.74
DeepSeek-V3 0324 (deepseek): 1.95
DeepSeek-V3 (deepseek): 2.03
Meta (3 models)
Llama 4 Maverick (meta): 2.04
Llama 3.1 405B (base) (meta): 2.08
Llama 3.3 70B Instruct (meta): 2.12
Qwen (3 models)
Qwen3 235B A22B (qwen): 1.90
Qwen3 30B A3B (qwen): 2.09
Qwen2.5 72B Instruct (qwen): 2.21
Amazon (1 model)
Nova Pro 1.0 (amazon): 2.11
Microsoft (1 model)
Phi 4 (microsoft): 2.10
Mistral (1 model)
Mistral Large 2 (mistral): 2.11
Moonshot (1 model)
Kimi K2 Instruct (moonshotai): 1.94