Individual benchmark scores plotted by date.
| Model | Reported | Top Score | Info | Self Reported | Source | |
|---|---|---|---|---|---|---|
| Claude Mythos Preview | 07 Apr 2026 | 64.70% | With tools | Yes | Source | |
| GPT 5.4 Pro | 05 Mar 2026 | 58.70% | With tools | Yes | Source | |
| Muse Spark Contemplating | 08 Apr 2026 | 58.40% | With tools | Yes | Source | |
| Claude Opus 4.6 | 05 Feb 2026 | 53.10% | - | Yes | Source | |
| GLM 5.1 | - | 52.30% | With tools | Yes | Source | |
| GPT 5.4 | 05 Mar 2026 | 52.10% | With tools | Yes | Source | |
| Gemini 3.1 Pro Preview | 19 Feb 2026 | 51.40% | Search + Code | Yes | - | |
| Kimi K2 Thinking | 06 Nov 2025 | 51% | inferred alias from kimi-k2-thinking-0905 | Yes | Source | |
| Grok 4 Heavy | 10 Jul 2025 | 50.70% | - | Yes | Source | |
| Muse Spark | 08 Apr 2026 | 50.40% | With tools | Yes | Source | |
| Kimi K2.5 | 27 Jan 2026 | 50.20% | - | Yes | Source | |
| GPT 5.2 Pro | 11 Dec 2025 | 50% | With Search + Python | Yes | Source | |
| Claude Sonnet 4.6 | 17 Feb 2026 | 49% | With tools | Yes | Source | |
| Qwen 3.5 Flash | 23 Feb 2026 | 48.50% | inferred family alias from qwen3.5-27b (score=0.4147; benches=81) | Yes | Source | |
| Qwen 3.5 27B | 24 Feb 2026 | 48.50% | - | Yes | Source | |
| Qwen 3.5 397B A17B | 16 Feb 2026 | 48.30% | With tools | Yes | Source | |
| Qwen 3.5 122B A10B | 24 Feb 2026 | 47.50% | - | Yes | Source | |
| Qwen 3.5 35B A3B | 24 Feb 2026 | 47.40% | - | Yes | Source | |
| Claude Opus 4.7 | 16 Apr 2026 | 46.90% | No tools | Yes | Source | |
| Gemini 3 Pro Image Preview (Nano Banana Pro) | 20 Nov 2025 | 45.80% | inferred modality/version alias from gemini-3-pro-preview | Yes | Source | |
| Gemini 3 Pro Preview | 18 Nov 2025 | 45.80% | With Search & Code Execution | Yes | Source | |
| GPT 5.2 | 11 Dec 2025 | 45.50% | With Search + Python | Yes | Source | |
| Gemini 3 Flash Preview | 17 Dec 2025 | 43.50% | - | Yes | Source | |
| Claude Opus 4.5 | 24 Nov 2025 | 43.20% | With Search | Yes | Source | |
| GLM 4.7 | 22 Dec 2025 | 42.80% | - | Yes | Source | |
| GPT 5.4 Mini | 17 Mar 2026 | 41.50% | With tools | Yes | Source | |
| Grok 4 | 10 Jul 2025 | 40% | - | Yes | Source | |
| GPT 5 Search API | 14 Oct 2025 | 39.80% | inferred family alias from gpt-5.4 (score=0.3050; benches=19) | Yes | Source | |
| GPT 5 Pro | 07 Aug 2025 | 39.80% | inferred family alias from gpt-5.4 (score=0.4083; benches=19) | Yes | Source | |
| Ernie 5.0 | 22 Jan 2026 | 39% | - | Yes | Source | |
| Ernie 5.0 Preview 1203 | - | 39% | inferred version-family alias from ernie-5.0 | Yes | Source | |
| Ernie 5.0 0110 | - | 39% | inferred version-family alias from ernie-5.0 | Yes | Source | |
| Ernie 5.0 Preview 1220 | - | 39% | inferred version-family alias from ernie-5.0 | Yes | Source | |
| GPT 5.4 Nano | 17 Mar 2026 | 37.70% | With tools | Yes | Source | |
| GPT 5 | 07 Aug 2025 | 35.20% | Pass @ 1 | Yes | Source | |
| GPT 5.2 Chat | 11 Dec 2025 | 34.50% | inferred alias from gpt-5.2-2025-12-11 | Yes | Source | |
| DeepSeek V3.2 Speciale | 01 Dec 2025 | 30.60% | - | Yes | Source | |
| Qwen 3.6 Plus | 01 Apr 2026 | 28.80% | - | Yes | Source | |
| Gemma 4 31B | 02 Apr 2026 | 26.50% | With search | Yes | Source | |
| Nemotron 3 Super | 11 Mar 2026 | 22.82% | - | Yes | Source | |
| MiMo V2 Flash | 16 Dec 2025 | 22.10% | - | Yes | Source | |
| MiniMax M2.1 | 23 Dec 2025 | 22% | - | Yes | Source | |
| Gemini 2.5 Pro Preview TTS (2025-05-20) | 20 May 2025 | 21.60% | inferred family alias from gemini-2.5-pro-preview-06-05 (score=0.4243; benches=13) | Yes | Source | |
| Gemini 2.5 Pro Preview (2025-06-05) | 05 Jun 2025 | 21.60% | No Tools | Yes | Source | |
| o3 | 16 Apr 2025 | 20.30% | - | Yes | Source | |
| DeepSeek OCR 2 | - | 19.80% | inferred family alias from deepseek-v3.2-exp (score=0.3809; benches=14) | Yes | Source | |
| DeepSeek V3.2 Exp | 29 Sept 2025 | 19.80% | Text-only subset where applicable | Yes | Source | |
| GPT OSS 120b | 05 Aug 2025 | 19% | High Reasoning Effort, With Tools | Yes | Source | |
| Qwen 3 235B A22B Thinking 2507 | - | 18.20% | Text Only | Yes | Source | |
| Gemini 2.5 Pro Preview (2025-05-06) | 06 May 2025 | 17.80% | No Tools | Yes | Source | |
| Gemini 2.5 Pro Preview TTS (2025-12-10) | 10 Dec 2025 | 17.80% | inferred modality/version alias from gemini-2.5-pro | Yes | Source | |
| Gemini 2.5 Computer Use Preview | 07 Oct 2025 | 17.80% | inferred family alias from gemini-2.5-pro (score=0.3960; benches=16) | Yes | Source | |
| Gemini 2.5 Pro Experimental (2025-03-25) | 25 Mar 2025 | 17.80% | inferred alias from gemini-2.5-pro | Yes | Source | |
| Gemini Embedding 2 Preview | 10 Mar 2026 | 17.80% | manual fallback alias from gemini-2.5-pro | Yes | Source | |
| o4 Mini | 16 Apr 2025 | 17.70% | - | Yes | Source | |
| DeepSeek R1 (2025-05-28) | 28 May 2025 | 17.70% | - | Yes | Source | |
| GPT OSS 20b | 05 Aug 2025 | 17.30% | High Reasoning Effort, With Tools | Yes | Source | |
| Gemma 4 26B A4B | 02 Apr 2026 | 17.20% | With search | Yes | Source | |
| GLM 4.6 | 30 Sept 2025 | 17.20% | - | Yes | Source | |
| GPT 5 Mini | 07 Aug 2025 | 16.70% | High Reasoning Effort, No Tools | Yes | Source | |
| Veo 3.1 Lite Preview | 31 Mar 2026 | 16% | manual fallback alias from gemini-3.1-flash-lite-preview | Yes | Source | |
| Gemini 3.1 Flash Image Preview (Nano Banana 2) | 26 Feb 2026 | 16% | inferred modality/version alias from gemini-3.1-flash-lite-preview | Yes | Source | |
| Gemini 3.1 Flash Lite Preview | 03 Mar 2026 | 16% | - | Yes | Source | |
| DeepSeek V3.1 Terminus | 22 Sept 2025 | 15.90% | inferred alias from deepseek-v3.1 | Yes | Source | |
| DeepSeek V3.1 | 21 Aug 2025 | 15.90% | Thinking mode only, text-only subset | Yes | Source | |
| Nemotron Nano 3 30B A3B | 15 Dec 2025 | 15.50% | - | Yes | Source | |
| GPT OSS Safeguard 120b | 29 Oct 2025 | 14.90% | inferred high-confidence family alias from gpt-oss-120b (score=0.5102; benches=7) | Yes | Source | |
| o4 mini Deep Research | 26 Jun 2025 | 14.70% | inferred modality/version alias from o4-mini | Yes | Source | |
| GLM 4.5 | 28 Jul 2025 | 14.40% | - | Yes | Source | |
| GLM 4.7 Flash | 19 Jan 2026 | 14.40% | - | Yes | Source | |
| Qwen 3 VL 235B A22B Thinking | - | 13.60% | - | Yes | - | |
| MiniMax M2 | 27 Oct 2025 | 12.50% | - | Yes | - | |
| MiniMax M2 Her | 24 Jan 2026 | 12.50% | inferred modality/version alias from minimax-m2 | Yes | - | |
| Gemini 2.5 Flash Preview (2025-04-17) | 17 Apr 2025 | 12.10% | Thinking, No Tools | Yes | Source | |
| Gemini 2.5 Flash Image (Nano Banana) | 02 Oct 2025 | 11% | inferred modality/version alias from gemini-2.5-flash | Yes | Source | |
| Gemini 2.5 Flash Preview Native Audio Dialog | - | 11% | inferred modality/version alias from gemini-2.5-flash | Yes | Source | |
| Gemini 2.5 Flash Preview (2025-05-20) | 20 May 2025 | 11% | No Tools | Yes | Source | |
| Gemini 2.5 Flash Image Preview (Nano Banana) | 25 Aug 2025 | 11% | inferred modality/version alias from gemini-2.5-flash | Yes | Source | |
| Gemini 2.5 Flash Preview TTS (2025-12-10) | 10 Dec 2025 | 11% | inferred modality/version alias from gemini-2.5-flash | Yes | Source | |
| Gemini 2.5 Flash Preview (2025-09-25) | 25 Sept 2025 | 11% | inferred alias from gemini-2.5-flash | Yes | Source | |
| Gemini 2.5 Flash Native Audio Preview (2025-09-23) | - | 11% | inferred modality/version alias from gemini-2.5-flash | Yes | Source | |
| Gemini 2.5 Flash Preview TTS (2025-05-20) | 20 May 2025 | 11% | inferred modality/version alias from gemini-2.5-flash | Yes | Source | |
| Gemini 2.5 Flash Exp Native Audio Thinking Dialog | - | 11% | inferred modality/version alias from gemini-2.5-flash | Yes | Source | |
| Gemini Live 2.5 Flash Preview | 09 Apr 2025 | 11% | inferred high-confidence family alias from gemini-2.5-flash (score=0.5083; benches=14) | Yes | Source | |
| GPT OSS Safeguard 20b | 29 Oct 2025 | 10.90% | inferred high-confidence family alias from gpt-oss-20b (score=0.5137; benches=7) | Yes | Source | |
| GLM 4.5 Air | 28 Jul 2025 | 10.60% | - | Yes | Source | |
| Magistral Medium 1.2 | 17 Sept 2025 | 9% | inferred version-family alias from magistral-medium | Yes | Source | |
| Magistral Medium 1.1 | 24 Jul 2025 | 9% | inferred version-family alias from magistral-medium | Yes | Source | |
| Magistral Medium 1.0 | 10 Jun 2025 | 9% | Text Only Subset | Yes | Source | |
| GPT 5 Nano | 07 Aug 2025 | 8.70% | High Reasoning Effort, No Tools | Yes | Source | |
| MiniMax M1 80K | 16 Jun 2025 | 8.40% | - | Yes | - | |
| MiniMax M1 40K | 16 Jun 2025 | 7.20% | - | Yes | - | |
| Gemini 2.5 Flash Lite Preview (2025-06-17) | 17 Jun 2025 | 6.90% | Thinking, No Tools | Yes | Source | |
| Magistral Small 1.0 | 10 Jun 2025 | 6.40% | Text Only Subset | Yes | Source | |
| GPT 4o Transcribe Diarize | 15 Oct 2025 | 5.30% | inferred modality/version alias from gpt-4o-2024-08-06 | Yes | Source | |
| GPT 4o Audio (2025-06-03) | 03 Jun 2025 | 5.30% | inferred modality/version alias from gpt-4o-2024-08-06 | Yes | Source | |
| GPT 4o Realtime Preview (2025-06-03) | 03 Jun 2025 | 5.30% | inferred modality/version alias from gpt-4o-2024-08-06 | Yes | Source | |
| GPT 4o Audio (2024-12-17) | 17 Dec 2024 | 5.30% | inferred modality/version alias from gpt-4o-2024-08-06 | Yes | Source | |
| GPT 4o Transcribe | 20 Mar 2025 | 5.30% | inferred modality/version alias from gpt-4o-2024-08-06 | Yes | Source | |
| GPT 4o Audio (2024-10-01) | 01 Oct 2024 | 5.30% | inferred modality/version alias from gpt-4o-2024-08-06 | Yes | Source | |
| GPT 4o (2024-08-06) | 06 Aug 2024 | 5.30% | - | Yes | Source | |
| GPT 4o Search Preview | 11 Mar 2025 | 5.30% | inferred modality/version alias from gpt-4o-2024-08-06 | Yes | Source | |
| GPT 4o Realtime Preview (2024-10-01) | 01 Oct 2024 | 5.30% | inferred modality/version alias from gpt-4o-2024-08-06 | Yes | Source | |
| Gemini 2.0 Flash | 05 Feb 2025 | 5.10% | No Tools | Yes | Source | |
| Gemini 2.5 Flash Lite Preview (2025-09-25) | 25 Sept 2025 | 5.10% | inferred alias from gemini-2.5-flash-lite | Yes | Source | |
| Kimi K2 (2025-09-05) | 05 Sept 2025 | 4.70% | Text Only | Yes | Source | |