Individual benchmark scores plotted by date.
| Model | Reported | Top Score | Info | Self Reported | Source |
|---|---|---|---|---|---|
| Gemini 3 Pro Preview | 18 Nov 2025 | 74.80% | - | No | Source |
| Gemini 2.5 Pro Preview (2025-06-05) | 05 Jun 2025 | 62.40% | - | No | Source |
| Claude Opus 4.1 | 05 Aug 2025 | 60% | - | No | Source |
| Claude Opus 4 | 21 May 2025 | 58.80% | - | No | Source |
| GPT 5 | 07 Aug 2025 | 56.70% | High Reasoning Effort | No | Source |
| o3 | 16 Apr 2025 | 53.10% | High Reasoning Effort | No | Source |
| Claude 3.7 Sonnet | 24 Feb 2025 | 46.40% | Thinking | No | Source |
| Claude Sonnet 4 | 21 May 2025 | 45.50% | Thinking | No | Source |
| o1 preview | 12 Sept 2024 | 41.70% | - | No | Source |
| Claude 3.5 Sonnet (2024-10-22) | 22 Oct 2024 | 41.40% | - | No | Source |
| Deepseek R1 (2025-05-28) | 28 May 2025 | 40.80% | - | No | Source |
| o1 | 17 Dec 2024 | 40.10% | - | No | Source |
| o4 Mini | 16 Apr 2025 | 38.70% | High Reasoning Effort | No | Source |
| Grok 3 | 18 Apr 2025 | 36.10% | - | No | Source |
| GPT 4.5 | 27 Feb 2025 | 34.50% | - | No | Source |
| Qwen 3 235B A22B | - | 31% | - | No | Source |
| Deepseek R1 (2025-01-20) | 20 Jan 2025 | 30.90% | - | No | Source |
| Gemini 2.0 Flash | 05 Feb 2025 | 30.70% | Thinking | No | Source |
| Claude 3.5 Sonnet (2024-06-20) | 21 Jun 2024 | 27.50% | - | No | Source |
| DeepSeek V3 (2025-03-24) | 25 Mar 2025 | 27.20% | - | No | Source |
| Gemini 1.5 Pro 002 | 24 Sept 2024 | 27.10% | - | No | Source |
| GPT 4.1 | 14 Apr 2025 | 27% | - | No | Source |
| GPT 4 Turbo (2023-03-14) | 14 Mar 2023 | 25.10% | - | No | Source |
| Claude 3 Opus | 04 Mar 2024 | 23.50% | - | No | Source |
| Llama 3.1 405B Instruct | 23 Jul 2024 | 23% | - | No | Source |
| o3 mini | 30 Jan 2025 | 22.80% | High Reasoning Effort | No | Source |
| Grok 2 | 13 Aug 2024 | 22.70% | - | No | Source |
| Mistral Large 2.0 | 24 Jul 2024 | 22.50% | - | No | Source |
| Llama 3.3 70B Instruct | 06 Dec 2024 | 19.90% | - | No | Source |
| DeepSeek V3 (2024-12-26) | 26 Dec 2024 | 18.90% | - | No | Source |
| o1 mini | 12 Sept 2024 | 18.10% | - | No | Source |
| GPT 4o (2024-11-20) | 20 Nov 2024 | 17.80% | - | No | Source |
| Command R+ (2024-08-30) | 30 Aug 2024 | 17.40% | - | No | Source |
| GPT 4o Mini (2024-07-18) | 18 Jul 2024 | 10.70% | - | No | Source |
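
The table is ordered by score, while the caption describes the scores as plotted by date. As a minimal sketch of that reordering, the snippet below parses a handful of rows copied from the table and sorts them by reported date. The names `ROWS`, `parse_reported`, `parse_score`, and `scores_by_date` are hypothetical helpers introduced for illustration only. It assumes dates follow the `DD Mon YYYY` pattern seen in the table and that a `-` means no date was reported.

```python
from datetime import date, datetime
from typing import Optional

# A few rows copied from the table: (model, reported date, top score).
ROWS = [
    ("Gemini 3 Pro Preview", "18 Nov 2025", "74.80%"),
    ("Claude Opus 4.1", "05 Aug 2025", "60%"),
    ("o3", "16 Apr 2025", "53.10%"),
    ("o1 preview", "12 Sept 2024", "41.70%"),
    ("Qwen 3 235B A22B", "-", "31%"),
]


def parse_reported(text: str) -> Optional[date]:
    """Return the reported date, or None when the table shows '-'."""
    if text.strip() == "-":
        return None
    # The table abbreviates September as 'Sept'; %b expects 'Sep'.
    # Note: %b is locale-dependent; this assumes English month names.
    normalized = text.replace("Sept", "Sep")
    return datetime.strptime(normalized, "%d %b %Y").date()


def parse_score(text: str) -> float:
    """Convert a percentage string such as '74.80%' to a float."""
    return float(text.rstrip("%"))


def scores_by_date(rows):
    """Return (date, score, model) tuples in chronological order.

    Rows without a reported date are skipped, since they cannot be
    placed on a date axis.
    """
    points = []
    for model, reported, score in rows:
        day = parse_reported(reported)
        if day is None:
            continue
        points.append((day, parse_score(score), model))
    return sorted(points)


if __name__ == "__main__":
    for day, score, model in scores_by_date(ROWS):
        print(day.isoformat(), f"{score:.2f}%", model)
```

Skipping undated rows is one possible choice; a plot could equally list them in a separate legend entry.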