Grok 4
xAI
Highlights
Top benchmark results for x-ai/grok-4-2025-07-10.
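| Benchmark | Score | Rank |
|---|---|---|
| Aider-Polyglot | 0.80 | #5 |
| AIME 2025 | 0.99 | #3 |
| ARC-AGI-1 | 0.67 | #5 |
| ARC-AGI-2 | 0.16 | #4 |
| Confabulations | 12.41 | #7 |
| Creative Story Writing | 7.69 | #6 |
| Elimination Game | 6.00 | #8 |
| EQ-Bench 3 | 1193 | #14 |
| GPQA Diamond | 0.88 | #4 |
| HMMT 2025 | 0.94 | #2 |
| Humanity's Last Exam | 0.39 | #4 |
| MathArena Apex | 0.02 | #3 |
| NYT Connections | 0.92 | #1 |
| Thematic Generalisation | 1.88 | #9 |
| USAMO 2025 | 0.38 | #2 |
| Vending Bench 2 | 1999 | #4 |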
Benchmark table
Detailed scores across tracked benchmarks.
| Benchmark | Category | Top Score | Info | Self Reported | Source |
|---|---|---|---|---|---|
| Aider-Polyglot | code | 0.80 | Diff | No | Source |
| AIME 2025 | math | 0.99 | - | Yes | Source |
| ARC-AGI-1 | - | 0.67 | Thinking | No | Source |
| ARC-AGI-2 | - | 0.16 | Thinking | No | Source |
| Confabulations | - | 12.41 | - | No | Source |
| Creative Story Writing | - | 7.69 | - | No | Source |
| Elimination Game | - | 6.00 | - | No | Source |
| EQ-Bench 3 | - | 1193 | - | No | Source |
| GPQA Diamond | general-knowledge | 0.88 | No Tools | Yes | Source |
| HMMT 2025 | - | 0.94 | With Python | Yes | Source |
| Humanity's Last Exam | - | 0.39 | Tool Use | Yes | Source |
| MathArena Apex | - | 0.02 | - | No | Source |
| NYT Connections | - | 0.92 | - | No | Source |
| Thematic Generalisation | - | 1.88 | - | No | Source |
| USAMO 2025 | - | 0.38 | - | Yes | Source |
| Vending Bench 2 | - | 1999 | - | No | Source |
Benchmark comparisons
Aider-Polyglot
Score 0.80, rank #5 of 34 models.