Claude Opus 4
Anthropic
Highlights
Top benchmark results for anthropic/claude-opus-4-2025-05-21.
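- Ai2 SciArena: 1080 (rank #2)
- Aider-Polyglot: 0.72 (rank #7)
- ARC-AGI-1: 0.36 (rank #13)
- ARC-AGI-2: 0.09 (rank #6)
- Confabulations: 17.06 (rank #23)
- Elimination Game: 6.01 (rank #7)
- EQ-Bench 3: 1296 (rank #6)
- GPQA Diamond: 0.83 (rank #12)
- LiveBench: 0.73 (rank #5)
- LMArena Text: 1419 (rank #5)
- LMArena WebDev: 1406 (rank #3)
- NYT Connections: 0.53 (rank #10)
- SimpleBench: 0.59 (rank #4)
- Thematic Generalisation: 1.70 (rank #1)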
Benchmark table
Detailed scores across tracked benchmarks.
| Benchmark | Category | Top Score | Info | Self Reported | Source |
|---|---|---|---|---|---|
| Ai2 SciArena | - | 1080 | - | No | Source |
| Aider-Polyglot | code | 0.72 | 32k Thinking | No | Source |
| ARC-AGI-1 | - | 0.36 | 16k Thinking | No | Source |
| ARC-AGI-2 | - | 0.09 | 16k Thinking | No | Source |
| Confabulations | - | 17.06 | No Reasoning | No | Source |
| Elimination Game | - | 6.01 | No Reasoning | No | Source |
| EQ-Bench 3 | - | 1296 | - | No | Source |
| GPQA Diamond | general-knowledge | 0.83 | - | Yes | - |
| LiveBench | - | 0.73 | 32k Thinking | No | Source |
| LMArena Text | - | 1419 | - | No | Source |
| LMArena WebDev | - | 1406 | 16th June 2025 | No | Source |
| NYT Connections | - | 0.53 | 16k Thinking | No | Source |
| SimpleBench | - | 0.59 | - | No | Source |
| Thematic Generalisation | - | 1.70 | No Reasoning | No | Source |
Benchmark comparisons
Confabulations
Benchmark: 15.92
Rank: #23 of 43 models
Lower is better: lower scores indicate stronger performance.