GPT-5 mini
OpenAI
Highlights
Top benchmark results for openai/gpt-5-mini-2025-08-07.
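| Score | Rank |
|---|---|
| 0.72 | #8 |
| 0.91 | #10 |
| 0.54 | #8 |
| 0.04 | #9 |
| 0.76 | #3 |
| 0.98 | #2 |
| 13.28 | #10 |
| 8.31 | #4 |
| 0.63 | #2 |
| 0.22 | #2 |
| 0.82 | #15 |
| 0.73 | #3 |
| 0.64 | #3 |
| 0.88 | #3 |
| 0.17 | #13 |
| 0.01 | #5 |
| 0.62 | #2 |
| 0.82 | #3 |
| 0.74 | #3 |
| 0.71 | #6 |
| 0.60 | #3 |
| 0.78 | #3 |
| 0.74 | #3 |
| 0.82 | #3 |
| 0.79 | #5 |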
Benchmark table
Detailed scores across tracked benchmarks.
| Benchmark | Category | Top Score | Evaluation Settings | Self-Reported | Source |
|---|---|---|---|---|---|
| Aider-Polyglot | code | 0.72 | High Reasoning Effort, Diff Method | Yes | Source |
| AIME 2025 | math | 0.91 | High Reasoning Effort, No Tools | Yes | Source |
| ARC-AGI-1 | - | 0.54 | High Reasoning Effort | No | Source |
| ARC-AGI-2 | - | 0.04 | High Reasoning Effort | No | Source |
| BrowseComp Long Context 128k | - | 0.89 | High Reasoning Effort | Yes | Source |
| BrowseComp Long Context 256k | - | 0.86 | High Reasoning Effort | Yes | Source |
| CharXiv-Reasoning | - | 0.76 | High Reasoning Effort | Yes | Source |
| COLLIE | - | 0.98 | High Reasoning Effort | Yes | Source |
| Confabulations | - | 13.28 | - | No | Source |
| Creative Story Writing | - | 8.31 | - | No | Source |
| ERQA | - | 0.63 | High Reasoning Effort | Yes | Source |
| FActScore hallucination rate | hallucinations | 0.04 | High Reasoning Effort | Yes | Source |
| Frontier Math | math | 0.22 | With Thinking, With Python, Pass @ 1 | Yes | Source |
| GPQA Diamond | general-knowledge | 0.82 | High Reasoning Effort, No Tools | Yes | Source |
| Graphwalks bfs <128k | - | 0.73 | High Reasoning Effort | Yes | Source |
| Graphwalks parents <128k | - | 0.64 | High Reasoning Effort | Yes | Source |
| HMMT 2025 | - | 0.88 | High Reasoning Effort, No Tools | Yes | Source |
| Humanity's Last Exam | - | 0.17 | High Reasoning Effort, No Tools | Yes | Source |
| LongFact-Concepts hallucination rate | hallucinations | 0.01 | High Reasoning Effort | Yes | Source |
| LongFact-Objects hallucination rate | hallucinations | 0.01 | High Reasoning Effort | Yes | Source |
| MathArena Apex | - | 0.01 | High Reasoning Effort | No | Source |
| MMLU Pro | - | 0.62 | High Reasoning Effort | Yes | Source |
| MMMU | - | 0.82 | High Reasoning Effort | Yes | Source |
| MMMU Pro | - | 0.74 | High Reasoning Effort | Yes | Source |
| OpenAI-MRCR: 2 needle 128k | - | 0.84 | High Reasoning Effort | Yes | Source |
| OpenAI-MRCR: 2 needle 256k | - | 0.59 | High Reasoning Effort | Yes | Source |
| SWE-Bench | code | 0.71 | High Reasoning Effort, No Tools | Yes | Source |
| Tau 2 Airline | - | 0.60 | High Reasoning Effort | Yes | Source |
| Tau 2 Retail | - | 0.78 | High Reasoning Effort | Yes | Source |
| Tau 2 Telecom | - | 0.74 | High Reasoning Effort | Yes | Source |
| Video MMMU | - | 0.82 | High Reasoning Effort | Yes | Source |
| VideoMME | - | 0.79 | High Reasoning Effort | Yes | Source |
Benchmark comparisons
OpenAI-MRCR: 2 needle 128k: score 0.84, rank #2 of 8 models.