Individual benchmark scores plotted by date.
| Model | Reported | Top Score | Info | Self Reported | Source | |
|---|---|---|---|---|---|---|
| GPT 5 | 07 Aug 2025 | 88% | With Thinking, Pass @ 1, Diff Method | Yes | Source | |
| o3 Pro | 10 Jun 2025 | 84.90% | High Reasoning Effort | No | Source | |
| Gemini 2.5 Pro Preview TTS (2025-05-20) | 20 May 2025 | 82.20% | inferred family alias from gemini-2.5-pro-preview-06-05 (score=0.4243; benches=13) | Yes | Source | |
| Gemini 2.5 Pro Preview (2025-06-05) | 05 Jun 2025 | 82.20% | Diff-Fenced | No | Source | |
| o3 | 16 Apr 2025 | 81.30% | High Reasoning Effort | No | Source | |
| Seed 2.0 Pro | 14 Feb 2026 | 80% | Seed2 official benchmark table \| Aider Polyglot | Yes | Source | |
| Grok 4 | 10 Jul 2025 | 79.60% | Diff | No | Source | |
| Gemini 2.5 Computer Use Preview | 07 Oct 2025 | 76.50% | inferred family alias from gemini-2.5-pro (score=0.3960; benches=16) | Yes | Source | |
| Gemini 2.5 Pro Preview (2025-05-06) | 06 May 2025 | 76.50% | Whole | Yes | Source | |
| Gemini 2.5 Pro Experimental (2025-03-25) | 25 Mar 2025 | 76.50% | inferred alias from gemini-2.5-pro | Yes | Source | |
| Gemini 2.5 Pro Preview TTS (2025-12-10) | 10 Dec 2025 | 76.50% | inferred modality/version alias from gemini-2.5-pro | Yes | Source | |
| Gemini Embedding 2 Preview | 10 Mar 2026 | 76.50% | manual fallback alias from gemini-2.5-pro | Yes | Source | |
| Seed 2.0 Lite | 14 Feb 2026 | 76% | Seed2 official benchmark table \| Aider Polyglot | Yes | Source | |
| DeepSeek V3.2 Exp | 29 Sept 2025 | 74.50% | - | Yes | Source | |
| DeepSeek OCR 2 | - | 74.50% | inferred family alias from deepseek-v3.2-exp (score=0.3809; benches=14) | Yes | Source | |
| o4 Mini | 16 Apr 2025 | 72% | High Reasoning Effort | Yes | Source | |
| Claude Opus 4 | 21 May 2025 | 72% | 32k Thinking | No | Source | |
| GPT 5 Mini | 07 Aug 2025 | 71.60% | High Reasoning Effort, Diff Method | Yes | Source | |
| o4 mini Deep Research | 26 Jun 2025 | 68.90% | inferred modality/version alias from o4-mini | Yes | Source | |
| DeepSeek V3.1 Terminus | 22 Sept 2025 | 68.40% | inferred alias from deepseek-v3.1 | Yes | Source | |
| DeepSeek V3.1 | 21 Aug 2025 | 68.40% | Non-thinking: 68.4%, Thinking: 76.3% | Yes | Source | |
| o3 mini | 30 Jan 2025 | 66.70% | - | Yes | Source | |
| Claude 3.7 Sonnet | 24 Feb 2025 | 64.90% | 32k Thinking | No | Source | |
| Gemini 2.5 Flash Image Preview (Nano Banana) | 25 Aug 2025 | 61.90% | inferred modality/version alias from gemini-2.5-flash | Yes | Source | |
| Gemini 2.5 Flash Image (Nano Banana) | 02 Oct 2025 | 61.90% | inferred modality/version alias from gemini-2.5-flash | Yes | Source | |
| Gemini 2.5 Flash Preview (2025-09-25) | 25 Sept 2025 | 61.90% | inferred alias from gemini-2.5-flash | Yes | Source | |
| Gemini 2.5 Flash Native Audio Preview (2025-09-23) | - | 61.90% | inferred modality/version alias from gemini-2.5-flash | Yes | Source | |
| Gemini 2.5 Flash Preview TTS (2025-12-10) | 10 Dec 2025 | 61.90% | inferred modality/version alias from gemini-2.5-flash | Yes | Source | |
| Gemini 2.5 Flash Preview TTS (2025-05-20) | 20 May 2025 | 61.90% | inferred modality/version alias from gemini-2.5-flash | Yes | Source | |
| Gemini 2.5 Flash Exp Native Audio Thinking Dialog | - | 61.90% | inferred modality/version alias from gemini-2.5-flash | Yes | Source | |
| Gemini 2.5 Flash Preview Native Audio Dialog | - | 61.90% | inferred modality/version alias from gemini-2.5-flash | Yes | Source | |
| Gemini Live 2.5 Flash Preview | 09 Apr 2025 | 61.90% | inferred high-confidence family alias from gemini-2.5-flash (score=0.5083; benches=14) | Yes | Source | |
| Gemini 2.5 Flash Preview (2025-05-20) | 20 May 2025 | 61.90% | Whole | Yes | Source | |
| Qwen 3 Coder 480B A35B Instruct | - | 61.80% | - | Yes | Source | |
| o1 | 17 Dec 2024 | 61.70% | - | No | Source | |
| Claude Sonnet 4 | 21 May 2025 | 61.30% | 32k Thinking | No | Source | |
| Kimi K2 (2025-09-05) | 05 Sept 2025 | 60% | Acc | Yes | Source | |
| Qwen 3 235B A22B Instruct 2507 | - | 57.30% | - | Yes | Source | |
| DeepSeek R1 (2025-01-20) | 20 Jan 2025 | 56.90% | - | No | Source | |
| DeepSeek V3 (2025-03-24) | 25 Mar 2025 | 55.10% | - | No | Source | |
| Grok 3 Beta | 19 Feb 2025 | 53.30% | - | No | Source | |
| GPT 4.1 | 14 Apr 2025 | 52.40% | - | No | Source | |
| Claude 3.5 Sonnet (2024-10-22) | 22 Oct 2024 | 51.60% | - | No | Source | |
| Gemini 2.5 Flash Preview (2025-04-17) | 17 Apr 2025 | 51.10% | Thinking, Whole | Yes | Source | |
| Qwen 3 Next 80B A3B Instruct | - | 49.80% | - | Yes | Source | |
| DeepSeek OCR | 20 Oct 2025 | 49.60% | inferred family alias from deepseek-v3 (score=0.3000; benches=20) | Yes | Source | |
| DeepSeek V4 | - | 49.60% | inferred high-confidence family alias from deepseek-v3 (score=0.5818; benches=20) | Yes | Source | |
| DeepSeek V2 (2024-06-28) | 28 Jun 2024 | 49.60% | inferred family alias from deepseek-v3 (score=0.4159; benches=20) | Yes | Source | |
| GPT 5 Nano | 07 Aug 2025 | 48.40% | High Reasoning Effort, Diff Method | Yes | Source | |
| Magistral Medium 1.2 | 17 Sept 2025 | 47.10% | inferred version-family alias from magistral-medium | Yes | Source | |
| Magistral Medium 1.0 | 10 Jun 2025 | 47.10% | - | Yes | Source | |
| Magistral Medium 1.1 | 24 Jul 2025 | 47.10% | inferred version-family alias from magistral-medium | Yes | Source | |
| GPT 4.5 | 27 Feb 2025 | 44.90% | - | No | Source | |
| GPT OSS 120b | 05 Aug 2025 | 44.40% | High Reasoning Effort | Yes | Source | |
| GPT OSS 20b | 05 Aug 2025 | 34.20% | High Reasoning Effort | Yes | Source | |
| o1 mini | 12 Sept 2024 | 32.90% | - | No | Source | |
| GPT 4.1 Mini | 14 Apr 2025 | 32.40% | - | No | Source | |
| GPT 4o Search Preview | 11 Mar 2025 | 30.70% | inferred modality/version alias from gpt-4o-2024-08-06 | Yes | Source | |
| GPT 4o Transcribe Diarize | 15 Oct 2025 | 30.70% | inferred modality/version alias from gpt-4o-2024-08-06 | Yes | Source | |
| GPT 4o Transcribe | 20 Mar 2025 | 30.70% | inferred modality/version alias from gpt-4o-2024-08-06 | Yes | Source | |
| GPT 4o (2024-08-06) | 06 Aug 2024 | 30.70% | - | Yes | Source | |
| GPT 4o Audio (2024-10-01) | 01 Oct 2024 | 30.70% | inferred modality/version alias from gpt-4o-2024-08-06 | Yes | Source | |
| GPT 4o Audio (2025-06-03) | 03 Jun 2025 | 30.70% | inferred modality/version alias from gpt-4o-2024-08-06 | Yes | Source | |
| GPT 4o Realtime Preview (2024-10-01) | 01 Oct 2024 | 30.70% | inferred modality/version alias from gpt-4o-2024-08-06 | Yes | Source | |
| GPT 4o Realtime Preview (2025-06-03) | 03 Jun 2025 | 30.70% | inferred modality/version alias from gpt-4o-2024-08-06 | Yes | Source | |
| GPT 4o Audio (2024-12-17) | 17 Dec 2024 | 30.70% | inferred modality/version alias from gpt-4o-2024-08-06 | Yes | Source | |
| Claude 3.5 Haiku | 04 Nov 2024 | 28% | - | No | Source | |
| Gemini 2.5 Flash Lite Preview (2025-06-17) | 17 Jun 2025 | 27.10% | Thinking | Yes | Source | |
| Gemini 2.5 Flash Lite Preview (2025-09-25) | 25 Sept 2025 | 26.70% | inferred alias from gemini-2.5-flash-lite | Yes | Source | |
| Gemini 2.0 Flash | 05 Feb 2025 | 22.20% | Whole | Yes | Source | |
| GPT 4.1 Nano | 14 Apr 2025 | 8.90% | - | No | Source | |