Grok 3 Beta
xAI
Highlights
Top benchmark results for x-ai/grok-3-beta-2025-02-19.
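| Benchmark | Score | Rank |
|---|---|---|
| AidanBench | 2347 | #5 |
| Aider-Polyglot | 0.53 | #19 |
| AIME 2024 | 0.96 | #4 |
| AIME 2025 | 0.93 | #7 |
| Confabulations | 14.19 | #13 |
| Elimination Game | 6.20 | #6 |
| EQ-Bench 3 | 1066 | #22 |
| GPQA Diamond | 0.85 | #9 |
| LisanBench | 231 | #5 |
| LiveBench | 0.62 | #11 |
| LMArena Text | 1406 | #9 |
| MMLU-Pro | 0.80 | #7 |
| MMMU | 0.78 | #7 |
| NYT Connections | 0.20 | #21 |
| SimpleQA | 0.44 | #5 |
| Thematic Generalisation | 2.07 | #17 |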
Benchmark table
Detailed scores across tracked benchmarks.
| Benchmark | Category | Top Score | Evaluation Notes | Self-Reported | Source |
|---|---|---|---|---|---|
| AidanBench | - | 2347 | Reasoning | No | Source |
| Aider-Polyglot | code | 0.53 | - | No | Source |
| AIME 2024 | math | 0.96 | Reasoning | Yes | Source |
| AIME 2025 | math | 0.93 | Think, Cons@64 | Yes | Source |
| Confabulations | - | 14.19 | No Reasoning | No | Source |
| Elimination Game | - | 6.20 | No Reasoning | No | Source |
| EQ-Bench 3 | - | 1066 | - | No | Source |
| GPQA Diamond | general-knowledge | 0.85 | Think, Cons@64 | Yes | Source |
| LisanBench | - | 231 | - | No | Source |
| LiveBench | - | 0.62 | High Reasoning Effort | No | Source |
| LMArena Text | - | 1406 | Early Grok-3 | No | Source |
| MMLU-Pro | - | 0.80 | - | Yes | Source |
| MMMU | - | 0.78 | Think, Cons@64 | Yes | Source |
| NYT Connections | - | 0.20 | No Reasoning | No | Source |
| SimpleQA | - | 0.44 | - | Yes | Source |
| Thematic Generalisation | - | 2.07 | No Reasoning | No | Source |
Benchmark comparisons
AIME 2024
Score: 0.96. Rank: #4 of 40 models.
[Chart: AIME 2024 comparison showing the 11 models ranked closest to this model, out of 40 total.]