Claude Sonnet 4
Anthropic
Highlights
Top benchmark results for anthropic/claude-sonnet-4-2025-05-21.
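| Benchmark | Score | Rank |
|---|---|---|
| Ai2 SciArena | 1045 | #6 |
| Aider-Polyglot | 0.61 | #13 |
| ARC-AGI-1 | 0.40 | #12 |
| ARC-AGI-2 | 0.06 | #7 |
| Confabulations | 14.85 | #17 |
| Elimination Game | 5.68 | #9 |
| EQ-Bench 3 | 1261 | #10 |
| GPQA Diamond | 0.84 | #11 |
| LisanBench | 99 | #9 |
| LiveBench | 0.72 | #6 |
| LMArena Text | 1394 | #14 |
| LMArena WebDev | 1382 | #4 |
| NYT Connections | 0.41 | #13 |
| SimpleBench | 0.46 | #8 |
| Thematic Generalisation | 1.89 | #10 |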
Benchmark table
Detailed scores across tracked benchmarks.
| Benchmark | Category | Top Score | Info | Self Reported | Source |
|---|---|---|---|---|---|
| Ai2 SciArena | - | 1045 | - | No | Source |
| Aider-Polyglot | code | 0.61 | 32k Thinking | No | Source |
| ARC-AGI-1 | - | 0.40 | 16k Thinking | No | Source |
| ARC-AGI-2 | - | 0.06 | 16k Thinking | No | Source |
| Confabulations | - | 14.85 | No Reasoning | No | Source |
| Elimination Game | - | 5.68 | No Reasoning | No | Source |
| EQ-Bench 3 | - | 1261 | - | No | Source |
| GPQA Diamond | general-knowledge | 0.84 | - | Yes | - |
| LisanBench | - | 99 | - | No | Source |
| LiveBench | - | 0.72 | 64k Thinking | No | Source |
| LMArena Text | - | 1394 | - | No | Source |
| LMArena WebDev | - | 1382 | 16th June 2025 | No | Source |
| NYT Connections | - | 0.41 | 16k Thinking | No | Source |
| SimpleBench | - | 0.46 | Thinking | No | Source |
| Thematic Generalisation | - | 1.89 | No Reasoning | No | Source |
Benchmark comparisons
ARC-AGI-1: score 0.40, rank #12 of 31 models. The comparison chart shows the 11 models ranked closest to this one.