o4 Mini
OpenAI
Highlights
Top benchmark results for openai/o4-mini-2025-04-16.
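| Benchmark | Score | Rank |
|---|---|---|
| Ai2 SciArena | 1054 | #5 |
| Aider-Polyglot | 0.72 | #7 |
| AIME 2024 | 0.93 | #5 |
| AIME 2025 | 0.93 | #8 |
| ARC-AGI-1 | 0.42 | #10 |
| ARC-AGI-2 | 0.02 | #13 |
| Confabulations | 15.79 | #21 |
| Elimination Game | 4.40 | #20 |
| EQ-Bench 3 | 1291 | #7 |
| GPQA Diamond | 0.81 | #16 |
| Graphwalks bfs <128k | 0.62 | #5 |
| Graphwalks parents <128k | 0.51 | #6 |
| Humanity's Last Exam | 0.18 | #11 |
| LisanBench | 337 | #3 |
| LiveBench | 0.72 | #7 |
| LMArena Text | 1396 | #12 |
| LMArena WebDev | 1102 | #17 |
| NYT Connections | 0.75 | #5 |
| SimpleBench | 0.39 | #13 |
| Thematic Generalisation | 1.82 | #6 |
| VideoMME | 0.80 | #3 |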
Benchmark table
Detailed scores across tracked benchmarks.
| Benchmark | Category | Top Score | Info | Self Reported | Source |
|---|---|---|---|---|---|
| Ai2 SciArena | - | 1054 | - | No | Source |
| Aider-Polyglot | code | 0.72 | High Reasoning Effort | No | Source |
| AIME 2024 | math | 0.93 | - | Yes | Source |
| AIME 2025 | math | 0.93 | - | Yes | Source |
| ARC-AGI-1 | - | 0.42 | Medium Reasoning Effort | No | Source |
| ARC-AGI-2 | - | 0.02 | Medium Reasoning Effort | No | Source |
| BrowseComp Long Context 128k | - | 0.80 | High Reasoning Effort | Yes | Source |
| Confabulations | - | 15.79 | High Reasoning Effort | No | Source |
| Elimination Game | - | 4.40 | High Reasoning Effort | No | Source |
| EQ-Bench 3 | - | 1291 | - | No | Source |
| FActScore hallucination rate | hallucinations | 0.39 | High Reasoning Effort | Yes | Source |
| GPQA Diamond | general-knowledge | 0.81 | - | Yes | Source |
| Graphwalks bfs <128k | - | 0.62 | High Reasoning Effort | Yes | Source |
| Graphwalks parents <128k | - | 0.51 | High Reasoning Effort | Yes | Source |
| Humanity's Last Exam | - | 0.18 | - | Yes | Source |
| LisanBench | - | 337 | High Reasoning Effort | No | Source |
| LiveBench | - | 0.72 | High Reasoning Effort | No | Source |
| LMArena Text | - | 1396 | - | No | Source |
| LMArena WebDev | - | 1102 | 16th June 2025 | No | Source |
| LongFact-Concepts hallucination rate | hallucinations | 0.03 | High Reasoning Effort | Yes | Source |
| LongFact-Objects hallucination rate | hallucinations | 0.09 | High Reasoning Effort | Yes | Source |
| NYT Connections | - | 0.75 | High Reasoning Effort | No | Source |
| OpenAI-MRCR: 2 needle 128k | - | 0.56 | High Reasoning Effort | Yes | Source |
| SimpleBench | - | 0.39 | High Reasoning Effort | No | Source |
| Thematic Generalisation | - | 1.82 | High Reasoning Effort | No | Source |
| VideoMME | - | 0.80 | High Reasoning Effort | Yes | Source |
Benchmark comparisons
Graphwalks bfs <128k
Score: 0.62. Rank: #5 of 8 models.