Claude Opus 4.5
Anthropic
Highlights
Top benchmark results for anthropic/claude-opus-4-5-2025-11-24.
- AIME 2025: 1.00 (rank #1)
- ARC-AGI-1: 0.80 (rank #2)
- ARC-AGI-2: 0.38 (rank #2)
- GPQA Diamond: 0.87 (rank #6)
- Humanity's Last Exam: 0.43 (rank #3)
- MMMLU: 0.91 (rank #2)
- MMMU: 0.81 (rank #4)
- OSWorld: 0.66 (rank #1)
- SWE Bench Multilingual: 0.76 (rank #1)
- SWE Bench Pro: 0.52 (rank #1)
- SWE-Bench: 0.81 (rank #1)
- Tau 2 Airline: 0.88 (rank #1)
- Tau 2 Retail: 0.89 (rank #1)
- Tau 2 Telecom: 0.98 (rank #1)
- Terminal Bench 2.0: 0.59 (rank #1)
- Vending Bench 2: 4967 (rank #2)
Benchmark table
Detailed scores across tracked benchmarks.
| Benchmark | Category | Score | Info | Self Reported |
|---|---|---|---|---|
| AIME 2025 | math | 1.00 | Avg@5, 64k Thinking, With Tools | Yes |
| ARC-AGI-1 | - | 0.80 | 64k Thinking | No |
| ARC-AGI-2 | - | 0.38 | 64k Thinking | No |
| GPQA Diamond | general-knowledge | 0.87 | Avg@5, 64k Thinking | Yes |
| Humanity's Last Exam | - | 0.43 | With Search | Yes |
| MMMLU | - | 0.91 | Avg@5, 64k Thinking | Yes |
| MMMU | - | 0.81 | Avg@5, 64k Thinking | Yes |
| OSWorld | - | 0.66 | Pass@1; Avg@5, 64k Thinking | Yes |
| SWE Bench Multilingual | code | 0.76 | Avg@5 | Yes |
| SWE Bench Pro | - | 0.52 | Avg@5 | Yes |
| SWE-Bench | code | 0.81 | Avg@5, 64k Thinking | Yes |
| Tau 2 Airline | - | 0.88 | Corrected | Yes |
| Tau 2 Retail | - | 0.89 | Avg@5, 64k Thinking | Yes |
| Tau 2 Telecom | - | 0.98 | Avg@5, 64k Thinking | Yes |
| Terminal Bench 2.0 | - | 0.59 | Avg@5, 128k Thinking | Yes |
| Vending Bench 2 | - | 4967 | 8k Thinking | No |
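The Info column uses reporting notations such as Avg@5 and Pass@1. A minimal sketch of how such metrics are commonly computed (assumed conventions for illustration; the exact evaluation harness behind these numbers is not described here):

```python
# Hypothetical helpers illustrating common benchmark-reporting conventions.
# These are assumptions about what "Avg@5" and "Pass@1" usually mean,
# not the actual evaluation code behind the table above.

def avg_at_k(run_scores: list[float]) -> float:
    """Avg@k: mean benchmark score over k independent runs (Avg@5 -> k = 5)."""
    return sum(run_scores) / len(run_scores)

def pass_at_1(task_outcomes: list[bool]) -> float:
    """Pass@1: fraction of tasks solved on the model's first attempt."""
    return sum(task_outcomes) / len(task_outcomes)

# Example: five hypothetical runs of one benchmark, and one run over four tasks.
runs = [0.79, 0.81, 0.80, 0.82, 0.83]
tasks = [True, False, True, True]
print(round(avg_at_k(runs), 2))   # mean over 5 runs
print(pass_at_1(tasks))           # first-attempt solve rate
```

Averaging over several runs smooths out sampling variance, which is why many of the self-reported scores above are given as Avg@5 rather than a single run.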
Benchmark comparisons
On SWE-Bench, this model scores 0.81, ranking #1 of the 21 models tracked for that benchmark.