GPT-4.1
OpenAI
Highlights
Top benchmark results for openai/gpt-4-1-2025-04-14.
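| Benchmark | Score | Rank |
|---|---|---|
| Ai2 SciArena | 1037 | #7 |
| AidanBench | 1379 | #11 |
| Aider-Polyglot | 0.52 | #19 |
| AIME 2024 | 0.48 | #28 |
| ARC-AGI-1 | 0.06 | #26 |
| ARC-AGI-2 | 0.00 | #19 |
| EQ-Bench 3 | 1235 | #11 |
| GPQA Diamond | 0.66 | #41 |
| Graphwalks bfs <128k | 0.62 | #6 |
| Graphwalks parents <128k | 0.58 | #5 |
| LiveBench | 0.56 | #12 |
| LMArena Text | 1410 | #8 |
| LMArena WebDev | 1257 | #7 |
| NYT Connections | 0.24 | #20 |
| SimpleBench | 0.27 | #21 |
| SWE-Bench | 0.55 | #15 |
| SWE-Lancer | 0.35 | #2 |
| VideoMME | 0.79 | #4 |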
Benchmark table
| Benchmark | Category | Top Score | Info | Self Reported | Source |
|---|---|---|---|---|---|
| Ai2 SciArena | - | 1037 | - | No | Source |
| AidanBench | - | 1379 | - | No | Source |
| Aider-Polyglot | code | 0.52 | - | No | Source |
| AIME 2024 | math | 0.48 | - | Yes | Source |
| ARC-AGI-1 | - | 0.06 | - | No | Source |
| ARC-AGI-2 | - | 0.00 | - | No | Source |
| BrowseComp Long Context 128k | - | 0.86 | - | Yes | Source |
| BrowseComp Long Context 256k | - | 0.76 | - | Yes | Source |
| EQ-Bench 3 | - | 1235 | - | No | Source |
| FActScore hallucination rate | hallucinations | 0.07 | - | Yes | Source |
| GPQA Diamond | general-knowledge | 0.66 | - | Yes | Source |
| Graphwalks bfs <128k | - | 0.62 | - | Yes | Source |
| Graphwalks parents <128k | - | 0.58 | - | Yes | Source |
| LiveBench | - | 0.56 | - | No | Source |
| LMArena Text | - | 1410 | - | No | Source |
| LMArena WebDev | - | 1257 | 16th June 2025 | No | Source |
| LongFact-Concepts hallucination rate | hallucinations | 0.01 | - | Yes | Source |
| LongFact-Objects hallucination rate | hallucinations | 0.01 | - | Yes | Source |
| NYT Connections | - | 0.24 | - | No | Source |
| OpenAI-MRCR: 2 needle 128k | - | 0.57 | - | Yes | Source |
| OpenAI-MRCR: 2 needle 256k | - | 0.56 | - | Yes | Source |
| SimpleBench | - | 0.27 | - | No | Source |
| SWE-Bench | code | 0.55 | - | Yes | Source |
| SWE-Lancer | code | 0.35 | IC-Diamond | Yes | Source |
| VideoMME | - | 0.79 | - | Yes | Source |
Benchmark comparisons
GPQA Diamond
Score: 0.66. Rank: #41 of 102 models.