Individual benchmark scores plotted by date.

| Model | Reported | Top Score | Info | Self Reported | Source | |
|---|---|---|---|---|---|---|
| Claude Mythos Preview | 07 Apr 2026 | 93.90% | Verified | Yes | Source | |
| Claude Opus 4.7 | 16 Apr 2026 | 87.60% | Verified | Yes | Source | |
| Claude Opus 4.5 | 24 Nov 2025 | 80.90% | Avg@5, 64k Thinking | Yes | Source | |
| Claude Opus 4.6 | 05 Feb 2026 | 80.80% | Verified | Yes | Source | |
| Gemini 3.1 Pro Preview | 19 Feb 2026 | 80.60% | Single Attempt | Yes | - | |
| MiniMax M2.5 | 12 Feb 2026 | 80.20% | - | Yes | Source | |
| GPT 5.2 | 11 Dec 2025 | 80% | - | Yes | Source | |
| GPT 5.2 Chat | 11 Dec 2025 | 80% | inferred alias from gpt-5.2-2025-12-11 | Yes | Source | |
| Claude Sonnet 4.6 | 17 Feb 2026 | 79.60% | Verified | Yes | Source | |
| Qwen 3.6 Plus | 01 Apr 2026 | 78.80% | - | Yes | Source | |
| MiMo V2 TTS | 18 Mar 2026 | 78% | inferred modality/version alias from mimo-v2-pro | Yes | Source | |
| MiMo V2 Pro | 18 Mar 2026 | 78% | - | Yes | Source | |
| Gemini 3 Flash Preview | 17 Dec 2025 | 78% | - | Yes | Source | |
| GLM 5 | 11 Feb 2026 | 77.80% | - | Yes | Source | |
| Muse Spark | 08 Apr 2026 | 77.40% | Verified | Yes | Source | |
| Kimi K2.5 | 27 Jan 2026 | 76.80% | - | Yes | Source | |
| Seed 2.0 Pro | 14 Feb 2026 | 76.50% | - | Yes | Source | |
| Qwen 3.5 397B A17B | 16 Feb 2026 | 76.40% | - | Yes | Source | |
| GPT 5.1 Chat | 13 Nov 2025 | 76.30% | inferred alias from gpt-5.1-2025-11-13 | Yes | Source | |
| Gemini 3 Pro Preview | 18 Nov 2025 | 76.20% | Single Attempt | Yes | Source | |
| Gemini 3 Pro Image Preview (Nano Banana Pro) | 20 Nov 2025 | 76.20% | inferred modality/version alias from gemini-3-pro-preview | Yes | Source | |
| GPT 5 | 07 Aug 2025 | 74.90% | With Thinking, Pass @ 1 | Yes | Source | |
| MiMo V2 Omni | 18 Mar 2026 | 74.80% | - | Yes | Source | |
| Claude Opus 4.1 | 05 Aug 2025 | 74.50% | - | Yes | Source | |
| GPT 5 Codex | 15 Sept 2025 | 74.50% | inferred alias from gpt-5-codex-2025-09-15 | Yes | Source | |
| Step 3.5 Flash | - | 74.40% | - | Yes | Source | |
| GLM 4.7 | 22 Dec 2025 | 73.80% | - | Yes | Source | |
| GPT 5.1 Codex | 13 Nov 2025 | 73.70% | - | Yes | Source | |
| GPT Audio 1.5 | 23 Feb 2026 | 73.70% | manual fallback alias from gpt-5.1-codex | Yes | Source | |
| GPT Realtime 1.5 | 23 Feb 2026 | 73.70% | manual fallback alias from gpt-5.1-codex | Yes | Source | |
| Seed 2.0 Lite | 14 Feb 2026 | 73.50% | Seed2 official benchmark table \| SWE Bench Verified | Yes | Source | |
| MiMo V2 Flash | 16 Dec 2025 | 73.40% | - | Yes | Source | |
| Claude Haiku 4.5 | 15 Oct 2025 | 73.30% | inferred alias from claude-haiku-4-5-20251001 | Yes | Source | |
| DeepSeek V3.2 Speciale | 01 Dec 2025 | 73.10% | - | Yes | Source | |
| Qwen 3.5 27B | 24 Feb 2026 | 72.40% | - | Yes | Source | |
| Qwen 3.5 Flash | 23 Feb 2026 | 72.40% | inferred family alias from qwen3.5-27b (score=0.4147; benches=81) | Yes | Source | |
| Qwen 3.5 122B A10B | 24 Feb 2026 | 72% | - | Yes | Source | |
| Kimi K2 (2025-09-05) | 05 Sept 2025 | 71.60% | Multiple Attempts (Acc) | Yes | Source | |
| Kimi K2 Thinking | 06 Nov 2025 | 71.30% | inferred alias from kimi-k2-thinking-0905 | Yes | Source | |
| GPT 5 Mini | 07 Aug 2025 | 71% | High Reasoning Effort, No Tools | Yes | Source | |
| Grok Code Fast 1 | 28 Aug 2025 | 70.80% | - | Yes | Source | |
| Nova 2 Pro | 02 Dec 2025 | 70% | - | Yes | Source | |
| Qwen 3 Coder 480B A35B Instruct | - | 69.60% | OpenHands Scaffold, 500 Turns | Yes | Source | |
| Qwen 3 14B | - | 69.60% | inferred family alias from qwen3-max (score=0.3333; benches=6) | Yes | Source | |
| Qwen 3 TTS (2025-11-27) | - | 69.60% | inferred family alias from qwen3-max (score=0.3833; benches=6) | Yes | Source | |
| Qwen 3 Max Thinking | 26 Jan 2026 | 69.60% | inferred alias from qwen3-max | Yes | Source | |
| MiniMax M2 | 27 Oct 2025 | 69.40% | - | Yes | - | |
| MiniMax M2 Her | 24 Jan 2026 | 69.40% | inferred modality/version alias from minimax-m2 | Yes | - | |
| Qwen 3.5 35B A3B | 24 Feb 2026 | 69.20% | - | Yes | Source | |
| o4 Mini | 16 Apr 2025 | 68.10% | - | Yes | Source | |
| o4 mini Deep Research | 26 Jun 2025 | 68.10% | inferred modality/version alias from o4-mini | Yes | Source | |
| GLM 4.6 | 30 Sept 2025 | 68% | - | Yes | Source | |
| DeepSeek V3.2 Exp | 29 Sept 2025 | 67.80% | - | Yes | Source | |
| DeepSeek OCR 2 | - | 67.80% | inferred family alias from deepseek-v3.2-exp (score=0.3809; benches=14) | Yes | Source | |
| Gemini 2.5 Pro Preview (2025-06-05) | 05 Jun 2025 | 67.20% | Multiple Attempts | Yes | Source | |
| Gemini 2.5 Pro Preview TTS (2025-05-20) | 20 May 2025 | 67.20% | inferred family alias from gemini-2.5-pro-preview-06-05 (score=0.4243; benches=13) | Yes | Source | |
| MiniMax M2.1 | 23 Dec 2025 | 67% | - | Yes | Source | |
| DeepSeek V3.1 Terminus | 22 Sept 2025 | 66% | inferred alias from deepseek-v3.1 | Yes | Source | |
| DeepSeek V3.1 | 21 Aug 2025 | 66% | Evaluated with internal code agent framework | Yes | Source | |
| Nova 2 Lite | 02 Dec 2025 | 64.50% | - | Yes | Source | |
| GLM 4.5 | 28 Jul 2025 | 64.20% | - | Yes | Source | |
| Gemini 2.5 Computer Use Preview | 07 Oct 2025 | 63.20% | inferred family alias from gemini-2.5-pro (score=0.3960; benches=16) | Yes | Source | |
| Trinity Large Thinking | 01 Apr 2026 | 63.20% | Hugging Face model card benchmark table (arcee-ai/Trinity-Large-Thinking) | Yes | Source | |
| Gemini 2.5 Pro Preview TTS (2025-12-10) | 10 Dec 2025 | 63.20% | inferred modality/version alias from gemini-2.5-pro | Yes | Source | |
| Gemini Embedding 2 Preview | 10 Mar 2026 | 63.20% | manual fallback alias from gemini-2.5-pro | Yes | Source | |
| Gemini 2.5 Pro Experimental (2025-03-25) | 25 Mar 2025 | 63.20% | inferred alias from gemini-2.5-pro | Yes | Source | |
| Gemini 2.5 Pro Preview (2025-05-06) | 06 May 2025 | 63.20% | - | Yes | Source | |
| GPT OSS 120b | 05 Aug 2025 | 62.40% | High Reasoning Effort | Yes | Source | |
| Devstral Medium 1.0 | 10 Jul 2025 | 61.60% | - | Yes | Source | |
| GPT OSS 20b | 05 Aug 2025 | 60.70% | High Reasoning Effort | Yes | Source | |
| Gemini 2.5 Flash Preview Native Audio Dialog | - | 60.40% | inferred modality/version alias from gemini-2.5-flash | Yes | Source | |
| Gemini 2.5 Flash Preview TTS (2025-12-10) | 10 Dec 2025 | 60.40% | inferred modality/version alias from gemini-2.5-flash | Yes | Source | |
| Gemini 2.5 Flash Preview (2025-09-25) | 25 Sept 2025 | 60.40% | inferred alias from gemini-2.5-flash | Yes | Source | |
| Longcat Flash Cat | - | 60.40% | inferred high-confidence family alias from longcat-flash-chat (score=0.4667; benches=16) | Yes | Source | |
| Gemini 2.5 Flash Preview (2025-05-20) | 20 May 2025 | 60.40% | - | Yes | Source | |
| Gemini 2.5 Flash Exp Native Audio Thinking Dialog | - | 60.40% | inferred modality/version alias from gemini-2.5-flash | Yes | Source | |
| Gemini Live 2.5 Flash Preview | 09 Apr 2025 | 60.40% | inferred high-confidence family alias from gemini-2.5-flash (score=0.5083; benches=14) | Yes | Source | |
| Gemini 2.5 Flash Image Preview (Nano Banana) | 25 Aug 2025 | 60.40% | inferred modality/version alias from gemini-2.5-flash | Yes | Source | |
| Gemini 2.5 Flash Preview TTS (2025-05-20) | 20 May 2025 | 60.40% | inferred modality/version alias from gemini-2.5-flash | Yes | Source | |
| Gemini 2.5 Flash Image (Nano Banana) | 02 Oct 2025 | 60.40% | inferred modality/version alias from gemini-2.5-flash | Yes | Source | |
| Gemini 2.5 Flash Native Audio Preview (2025-09-23) | - | 60.40% | inferred modality/version alias from gemini-2.5-flash | Yes | Source | |
| GLM 4.7 Flash | 19 Jan 2026 | 59.20% | - | Yes | Source | |
| GLM 4.5 Air | 28 Jul 2025 | 57.60% | - | Yes | Source | |
| MiniMax M1 80K | 16 Jun 2025 | 56% | - | Yes | - | |
| MiniMax M1 40K | 16 Jun 2025 | 55.60% | - | Yes | - | |
| GPT 5 Nano | 07 Aug 2025 | 54.70% | High Reasoning Effort, No Tools | Yes | Source | |
| GPT 4.1 | 14 Apr 2025 | 54.60% | - | Yes | Source | |
| Nemotron 3 Super | 11 Mar 2026 | 53.73% | - | Yes | Source | |
| Devstral Small 2.0 | 09 Dec 2025 | 53.60% | inferred version-family alias from devstral-small-2507 | Yes | Source | |
| Devstral Small 1.1 | 10 Jul 2025 | 53.60% | OpenHands Scaffold | Yes | Source | |
| Devstral Small 1.0 | 21 May 2025 | 53.60% | inferred version-family alias from devstral-small-2507 | Yes | Source | |
| o3 mini | 30 Jan 2025 | 49.30% | - | Yes | Source | |
| Gemini 2.5 Flash Lite Preview (2025-06-17) | 17 Jun 2025 | 44.90% | Thinking, Multiple Attempts | Yes | Source | |
| DeepSeek V2 (2024-06-28) | 28 Jun 2024 | 42% | inferred family alias from deepseek-v3 (score=0.4159; benches=20) | Yes | Source | |
| DeepSeek OCR | 20 Oct 2025 | 42% | inferred family alias from deepseek-v3 (score=0.3000; benches=20) | Yes | Source | |
| DeepSeek V4 | - | 42% | inferred high-confidence family alias from deepseek-v3 (score=0.5818; benches=20) | Yes | Source | |
| o1 preview | 12 Sept 2024 | 41.30% | - | Yes | Source | |
| Nemotron Nano 3 30B A3B | 15 Dec 2025 | 38.80% | - | Yes | Source | |
| GPT 4.5 | 27 Feb 2025 | 38% | - | Yes | Source | |
| GPT 4o Transcribe Diarize | 15 Oct 2025 | 33.20% | inferred modality/version alias from gpt-4o-2024-08-06 | Yes | Source | |
| GPT 4o Audio (2025-06-03) | 03 Jun 2025 | 33.20% | inferred modality/version alias from gpt-4o-2024-08-06 | Yes | Source | |
| GPT 4o Realtime Preview (2025-06-03) | 03 Jun 2025 | 33.20% | inferred modality/version alias from gpt-4o-2024-08-06 | Yes | Source | |
| GPT 4o Transcribe | 20 Mar 2025 | 33.20% | inferred modality/version alias from gpt-4o-2024-08-06 | Yes | Source | |
| GPT 4o Audio (2024-10-01) | 01 Oct 2024 | 33.20% | inferred modality/version alias from gpt-4o-2024-08-06 | Yes | Source | |
| GPT 4o Audio (2024-12-17) | 17 Dec 2024 | 33.20% | inferred modality/version alias from gpt-4o-2024-08-06 | Yes | Source | |
| GPT 4o Search Preview | 11 Mar 2025 | 33.20% | inferred modality/version alias from gpt-4o-2024-08-06 | Yes | Source | |
| GPT 4o (2024-08-06) | 06 Aug 2024 | 33.20% | - | Yes | Source | |
| GPT 4o Realtime Preview (2024-10-01) | 01 Oct 2024 | 33.20% | inferred modality/version alias from gpt-4o-2024-08-06 | Yes | Source | |
| Gemini 2.5 Flash Lite Preview (2025-09-25) | 25 Sept 2025 | 31.60% | inferred alias from gemini-2.5-flash-lite | Yes | Source | |
| Gemini Diffusion | 20 May 2025 | 22.90% | Pass@1 | Yes | Source | |
| DeepSeek V2.5 (2024-12-10) | 10 Dec 2024 | 16.80% | inferred alias from deepseek-v2.5 | Yes | Source | |
| DeepSeek V2.5 (2024-09-05) | 05 Sept 2024 | 16.80% | inferred alias from deepseek-v2.5 | Yes | Source | |
| GPT 4o Mini (2024-07-18) | 18 Jul 2024 | 9% | - | Yes | Source | |
| GPT 4o Mini TTS (2025-03-20) | 20 Mar 2025 | 8.70% | inferred modality/version alias from gpt-4o-mini-2024-07-18 | Yes | Source | |
| GPT 4o Mini Search Preview | 11 Mar 2025 | 8.70% | inferred modality/version alias from gpt-4o-mini-2024-07-18 | Yes | Source | |
| GPT 4o Mini Audio Preview | 17 Dec 2024 | 8.70% | inferred modality/version alias from gpt-4o-mini-2024-07-18 | Yes | Source | |
| GPT 4o Mini Transcribe (2025-03-20) | 20 Mar 2025 | 8.70% | inferred modality/version alias from gpt-4o-mini-2024-07-18 | Yes | Source | |
| GPT 4o Mini Realtime Preview | 17 Dec 2024 | 8.70% | inferred modality/version alias from gpt-4o-mini-2024-07-18 | Yes | Source | |
| GPT 4o Mini Transcribe (2025-12-15) | 15 Dec 2025 | 8.70% | inferred modality/version alias from gpt-4o-mini-2024-07-18 | Yes | Source | |
| GPT 4o Mini TTS (2025-12-15) | 15 Dec 2025 | 8.70% | inferred modality/version alias from gpt-4o-mini-2024-07-18 | Yes | Source |