Claude Opus 4
Anthropic
Highlights
Top benchmark results for anthropic/claude-opus-4-2025-05-21.
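- Ai2 SciArena: 1080 (rank #2)
- Aider-Polyglot: 0.72 (rank #7)
- ARC-AGI-1: 0.36 (rank #14)
- ARC-AGI-2: 0.09 (rank #7)
- Confabulations: 17.06 (rank #23)
- Elimination Game: 6.01 (rank #7)
- EQ-Bench 3: 1296 (rank #5)
- GPQA Diamond: 0.83 (rank #14)
- LiveBench: 0.73 (rank #4)
- LMArena Text: 1419 (rank #5)
- LMArena WebDev: 1406 (rank #3)
- NYT Connections: 0.53 (rank #10)
- SimpleBench: 0.59 (rank #3)
- Thematic Generalisation: 1.70 (rank #1)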
Benchmark table
| Benchmark | Category | Top Score | Notes | Self-reported | Source |
|---|---|---|---|---|---|
| Ai2 SciArena | - | 1080 | - | No | Source |
| Aider-Polyglot | code | 0.72 | 32k Thinking | No | Source |
| ARC-AGI-1 | - | 0.36 | 16k Thinking | No | Source |
| ARC-AGI-2 | - | 0.09 | 16k Thinking | No | Source |
| Confabulations | - | 17.06 | No Reasoning | No | Source |
| Elimination Game | - | 6.01 | No Reasoning | No | Source |
| EQ-Bench 3 | - | 1296 | - | No | Source |
| GPQA Diamond | general-knowledge | 0.83 | - | Yes | - |
| LiveBench | - | 0.73 | 32k Thinking | No | Source |
| LMArena Text | - | 1419 | - | No | Source |
| LMArena WebDev | - | 1406 | 16th June 2025 | No | Source |
| NYT Connections | - | 0.53 | 16k Thinking | No | Source |
| SimpleBench | - | 0.59 | - | No | Source |
| Thematic Generalisation | - | 1.70 | No Reasoning | No | Source |
Benchmark comparisons
EQ-Bench 3: score 1296, rank #5 of 31 models.