GPT 5
OpenAI
Highlights
Top benchmark results for openai/gpt-5-2025-08-07.
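| Benchmark | Score | Rank |
|---|---|---|
| Aider-Polyglot | 0.88 | #1 |
| AIME 2025 | 1.00 | #3 |
| ARC-AGI-1 | 0.66 | #7 |
| ARC-AGI-2 | 0.10 | #6 |
| BrowseComp | 0.55 | #1 |
| CharXiv-Reasoning | 0.81 | #1 |
| COLLIE | 0.99 | #1 |
| Confabulations | 10.34 | #1 |
| Creative Story Writing | 8.60 | #1 |
| ERQA | 0.66 | #1 |
| Frontier Math | 0.26 | #2 |
| GPQA Diamond | 0.87 | #6 |
| Graphwalks bfs <128k | 0.78 | #1 |
| Graphwalks parents <128k | 0.73 | #1 |
| HealthBench | 0.67 | #1 |
| HealthBench Hard | 0.46 | #1 |
| HMMT 2025 | 0.97 | #3 |
| Humanity's Last Exam | 0.35 | #6 |
| MathArena Apex | 0.02 | #2 |
| MMLU Pro | 0.70 | #3 |
| MMMU | 0.84 | #1 |
| MMMU Pro | 0.78 | #1 |
| SimpleBench | 0.57 | #4 |
| SWE-Bench | 0.75 | #3 |
| Tau 2 Airline | 0.63 | #4 |
| Tau 2 Retail | 0.81 | #3 |
| Tau 2 Telecom | 0.97 | #2 |
| Video MMMU | 0.85 | #1 |
| VideoMME | 0.87 | #1 |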
Benchmark table
| Benchmark | Category | Top Score | Info | Self Reported | Source |
|---|---|---|---|---|---|
| Aider-Polyglot | code | 0.88 | With Thinking, Pass @ 1, Diff Method | Yes | Source |
| AIME 2025 | math | 1.00 | Thinking, With Python, Pass @ 1 | Yes | Source |
| ARC-AGI-1 | - | 0.66 | High Reasoning Effort | No | Source |
| ARC-AGI-2 | - | 0.10 | High Reasoning Effort | No | Source |
| BrowseComp | agents | 0.55 | With Thinking, Pass @ 1 | Yes | Source |
| BrowseComp Long Context 128k | - | 0.90 | High Reasoning Effort | Yes | Source |
| BrowseComp Long Context 256k | - | 0.89 | High Reasoning Effort | Yes | Source |
| CharXiv-Reasoning | - | 0.81 | With Thinking, Pass @ 1 | Yes | Source |
| COLLIE | - | 0.99 | With Thinking, Pass @ 1 | Yes | Source |
| Confabulations | - | 10.34 | Medium Reasoning Effort | No | Source |
| Creative Story Writing | - | 8.60 | Medium Reasoning Effort | No | Source |
| ERQA | - | 0.66 | With Thinking, Pass @ 1 | Yes | Source |
| FActScore hallucination rate | hallucinations | 0.03 | High Reasoning Effort | Yes | Source |
| Frontier Math | math | 0.26 | With Thinking, With Python, Pass @ 1 | Yes | Source |
| GPQA Diamond | general-knowledge | 0.87 | Pass @ 1 | Yes | Source |
| Graphwalks bfs <128k | - | 0.78 | High Reasoning Effort | Yes | Source |
| Graphwalks parents <128k | - | 0.73 | High Reasoning Effort | Yes | Source |
| HealthBench | health | 0.67 | With Thinking | Yes | Source |
| HealthBench Hard | health | 0.46 | With Thinking | Yes | Source |
| HMMT 2025 | - | 0.97 | Pass @ 1 | Yes | Source |
| Humanity's Last Exam | - | 0.35 | Pass @ 1 | Yes | Source |
| LongFact-Concepts hallucination rate | hallucinations | 0.01 | High Reasoning Effort | Yes | Source |
| LongFact-Objects hallucination rate | hallucinations | 0.01 | High Reasoning Effort | Yes | Source |
| MathArena Apex | - | 0.02 | High Reasoning Effort + Agent | No | Source |
| MMLU Pro | - | 0.70 | With Thinking, Pass @ 1 | Yes | Source |
| MMMU | - | 0.84 | With Thinking, Pass @ 1 | Yes | Source |
| MMMU Pro | - | 0.78 | With Thinking, Pass @ 1 | Yes | Source |
| OpenAI-MRCR: 2 needle 128k | - | 0.95 | High Reasoning Effort | Yes | Source |
| OpenAI-MRCR: 2 needle 256k | - | 0.87 | High Reasoning Effort | Yes | Source |
| SimpleBench | - | 0.57 | High Reasoning Effort | No | Source |
| SWE-Bench | code | 0.75 | With Thinking, Pass @ 1 | Yes | Source |
| Tau 2 Airline | - | 0.63 | With Thinking, Pass @ 1 | Yes | Source |
| Tau 2 Retail | - | 0.81 | With Thinking, Pass @ 1 | Yes | Source |
| Tau 2 Telecom | - | 0.97 | With Thinking, Pass @ 1 | Yes | Source |
| Video MMMU | - | 0.85 | With Thinking, Pass @ 1 | Yes | Source |
| VideoMME | - | 0.87 | High Reasoning Effort | Yes | Source |
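The Self Reported column separates vendor-reported scores from independently measured ones, which matters when comparing results. As a minimal sketch, the snippet below parses a few rows copied from the table above and splits them on that column. The helper names `parse_markdown_table` and `split_by_self_reported` are hypothetical, chosen for illustration, and the parser assumes no cell contains a pipe character.

```python
# A few rows copied verbatim from the benchmark table above.
TABLE = """\
| Benchmark | Category | Top Score | Info | Self Reported | Source |
|---|---|---|---|---|---|
| Aider-Polyglot | code | 0.88 | With Thinking, Pass @ 1, Diff Method | Yes | Source |
| ARC-AGI-1 | - | 0.66 | High Reasoning Effort | No | Source |
| ARC-AGI-2 | - | 0.10 | High Reasoning Effort | No | Source |
| SimpleBench | - | 0.57 | High Reasoning Effort | No | Source |
| SWE-Bench | code | 0.75 | With Thinking, Pass @ 1 | Yes | Source |
"""


def parse_markdown_table(text):
    """Parse a simple pipe-delimited markdown table into a list of dicts.

    Hypothetical helper: assumes a header row, a separator row, and
    no pipe characters inside cells.
    """
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    header = [cell.strip() for cell in lines[0].strip("|").split("|")]
    rows = []
    for ln in lines[2:]:  # skip the header and the |---| separator
        cells = [cell.strip() for cell in ln.strip("|").split("|")]
        rows.append(dict(zip(header, cells)))
    return rows


def split_by_self_reported(rows):
    """Return (self_reported, independent) lists of benchmark names."""
    reported = [r["Benchmark"] for r in rows if r["Self Reported"] == "Yes"]
    independent = [r["Benchmark"] for r in rows if r["Self Reported"] == "No"]
    return reported, independent


rows = parse_markdown_table(TABLE)
reported, independent = split_by_self_reported(rows)
print("Self reported:", reported)
print("Independent:", independent)
```

Scores stay as strings here because the table mixes scales (most values are fractions, while Confabulations and Creative Story Writing are not), so any numeric comparison across benchmarks would need per-benchmark handling.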
Benchmark comparisons
BrowseComp: score 0.55, rank #1 of 1 model compared.