DeepSeek R1 (2025-05-28)
DeepSeek
Highlights
Top benchmark results for deepseek/deepseek-r1-2025-05-28.
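| Benchmark | Score | Rank |
|---|---|---|
| Ai2 SciArena | 1062 | #4 |
| AIME 2024 | 0.91 | #8 |
| AIME 2025 | 0.85 | #13 |
| ARC-AGI-1 | 0.21 | #19 |
| ARC-AGI-2 | 0.01 | #15 |
| Confabulations | 14.56 | #16 |
| Elimination Game | 6.35 | #2 |
| GPQA Diamond | 0.81 | #18 |
| Humanity's Last Exam | 0.18 | #11 |
| LiveBench | 0.69 | #8 |
| LMArena Text | 1411 | #7 |
| LMArena WebDev | 1409 | #2 |
| NYT Connections | 0.50 | #11 |
| SimpleBench | 0.41 | #11 |
| Thematic Generalisation | 1.74 | #2 |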
Benchmark table
Detailed scores across tracked benchmarks.
| Benchmark | Category | Top Score | Info | Self Reported | Source |
|---|---|---|---|---|---|
| Ai2 SciArena | - | 1062 | - | No | Source |
| AIME 2024 | math | 0.91 | - | Yes | Source |
| AIME 2025 | math | 0.85 | - | Yes | Source |
| ARC-AGI-1 | - | 0.21 | - | No | Source |
| ARC-AGI-2 | - | 0.01 | - | No | Source |
| Confabulations | - | 14.56 | - | No | Source |
| Elimination Game | - | 6.35 | - | No | Source |
| GPQA Diamond | general-knowledge | 0.81 | - | Yes | Source |
| Humanity's Last Exam | - | 0.18 | - | Yes | Source |
| LiveBench | - | 0.69 | - | No | Source |
| LMArena Text | - | 1411 | - | No | Source |
| LMArena WebDev | - | 1409 | 16th June 2025 | No | Source |
| NYT Connections | - | 0.50 | - | No | Source |
| SimpleBench | - | 0.41 | - | No | Source |
| Thematic Generalisation | - | 1.74 | - | No | Source |
Benchmark comparisons
ARC-AGI-2: score 0.01, ranked #15 of 29 models.