Individual benchmark scores plotted by date.
| Model | Reported | Top Score | Info | Self Reported | Source |
|---|---|---|---|---|---|
| Claude Sonnet 4 | 21 May 2025 | 90% | 1k Thinking | No | Source | |
| Claude 3.7 Sonnet | 24 Feb 2025 | 90% | 8k Thinking | No | Source | |
| GPT 5 Nano | 07 Aug 2025 | 90% | Medium Reasoning Effort | No | Source | |
| o1 pro | 19 Mar 2025 | 90% | Low Reasoning Effort | No | Source | |
| GPT 5.2 Pro | 11 Dec 2025 | 85.70% | Medium Reasoning Effort | Yes | - | |
| GPT 5 Mini | 07 Aug 2025 | 80% | Low Reasoning Effort | No | Source | |
| GPT 5.2 | 11 Dec 2025 | 80% | No Reasoning | Yes | - | |
| GPT 4.5 | 27 Feb 2025 | 80% | - | No | Source | |
| o1 | 17 Dec 2024 | 80% | Low Reasoning Effort | No | Source | |
| o1 mini | 12 Sep 2024 | 80% | - | No | Source | |
| Gemini 3.1 Pro Preview | 19 Feb 2026 | 77.10% | - | Yes | - | |
| Claude Opus 4.7 | 16 Apr 2026 | 75.83% | - | Yes | Source | |
| GPT 5.4 | 05 Mar 2026 | 73.30% | - | Yes | Source | |
| GPT 5 Search API | 14 Oct 2025 | 73.30% | inferred family alias from gpt-5.4 (score=0.3050; benches=19) | Yes | Source | |
| GPT 5 Pro | 07 Aug 2025 | 73.30% | inferred family alias from gpt-5.4 (score=0.4083; benches=19) | Yes | Source | |
| Claude Opus 4.6 | 05 Feb 2026 | 69.17% | ARC Prize Foundation private dataset; 120k thinking tokens; high effort | Yes | Source | |
| Claude Sonnet 4.6 | 17 Feb 2026 | 60.42% | ARC Prize Foundation private dataset; 120k thinking tokens; high effort | Yes | Source | |
| GPT 5.2 Chat | 11 Dec 2025 | 52.90% | inferred alias from gpt-5.2-2025-12-11 | Yes | Source | |
| Gemini 3 Pro Preview | 18 Nov 2025 | 45.10% | Deep Think, With Tools | Yes | Source | |
| Muse Spark | 08 Apr 2026 | 42.50% | Public set | Yes | Source | |
| GPT 5.1 | 12 Nov 2025 | 40% | No Reasoning | No | Source | |
| Grok 3 Mini | 18 Apr 2025 | 40% | Low Reasoning Effort | No | Source | |
| GPT 4.1 | 14 Apr 2025 | 40% | - | No | Source | |
| Claude Opus 4.5 | 24 Nov 2025 | 37.60% | 64k Thinking | No | Source | |
| Seed 2.0 Pro | 14 Feb 2026 | 37.50% | Seed2 official benchmark table; ARC-AGI-2 | Yes | Source | |
| Gemini 3 Flash Preview | 17 Dec 2025 | 33.60% | - | Yes | Source | |
| Gemini 3 Pro Image Preview (Nano Banana Pro) | 20 Nov 2025 | 31.10% | inferred modality/version alias from gemini-3-pro-preview | Yes | Source | |
| Grok 4 | 10 Jul 2025 | 16.20% | Thinking | Yes | Source | |
| Seed 2.0 Lite | 14 Feb 2026 | 14.80% | Seed2 official benchmark table; ARC-AGI-2 | Yes | Source | |
| Kimi K2.5 | 27 Jan 2026 | 11.80% | - | No | - | |
| GPT 5 | 07 Aug 2025 | 9.90% | High Reasoning Effort | No | Source | |
| Claude Opus 4 | 21 May 2025 | 8.60% | 16k Thinking | No | Source | |
| Gemini 2.5 Computer Use Preview | 07 Oct 2025 | 4.90% | inferred family alias from gemini-2.5-pro (score=0.3960; benches=16) | No | Source | |
| o3 Pro | 10 Jun 2025 | 4.90% | High Reasoning Effort | No | Source | |
| Gemini 2.5 Pro Preview TTS (2025-12-10) | 10 Dec 2025 | 4.90% | inferred modality/version alias from gemini-2.5-pro | No | Source | |
| Gemini 2.5 Pro Experimental (2025-03-25) | 25 Mar 2025 | 4.90% | inferred alias from gemini-2.5-pro | No | Source | |
| Gemini Embedding 2 Preview | 10 Mar 2026 | 4.90% | manual fallback alias from gemini-2.5-pro | No | Source | |
| GLM 5 | 11 Feb 2026 | 4.90% | - | No | - | |
| Gemini 2.5 Pro Preview (2025-06-05) | 05 Jun 2025 | 4.90% | 32k Thinking | No | Source | |
| DeepSeek V3.2 | 01 Dec 2025 | 4% | - | No | - | |
| o3 Preview | 20 Dec 2024 | 4% | Preview Model & Low Reasoning Effort | No | Source | |
| o3 mini | 30 Jan 2025 | 3% | High Reasoning Effort | No | Source | |
| o3 | 16 Apr 2025 | 3% | Medium Reasoning Effort | No | Source | |
| o4 Mini | 16 Apr 2025 | 2.40% | Medium Reasoning Effort | No | Source | |
| Seed 2.0 Mini | 14 Feb 2026 | 2.30% | Seed2 official benchmark table; ARC-AGI-2 | Yes | Source | |
| Codex Mini | 16 May 2025 | 1.30% | - | No | Source | |
| DeepSeek R1 (2025-01-20) | 20 Jan 2025 | 1.30% | - | No | - | |
| Qwen 3 235B A22B | - | 1.30% | - | No | - | |
| DeepSeek R1 (2025-05-28) | 28 May 2025 | 1.10% | - | No | - | |
| Magistral Small 1.0 | 10 Jun 2025 | 0% | - | No | Source | |
| Magistral Medium 1.0 | 10 Jun 2025 | 0% | Thinking | No | Source | |
| GPT 4.1 Nano | 14 Apr 2025 | 0% | - | No | Source | |
| GPT 4.1 Mini | 14 Apr 2025 | 0% | - | No | Source |