Individual benchmark scores plotted by date.
| Model | Reported | Top Score | Info | Self Reported | Source |
|---|---|---|---|---|---|
| Gemini 3.1 Pro Preview | 19 Feb 2026 | 92.60% | No Tools | Yes | - | |
| Gemini 3 Flash Preview | 17 Dec 2025 | 91.80% | - | Yes | Source | |
| Gemini 3 Pro Preview | 18 Nov 2025 | 91.80% | - | Yes | Source | |
| Gemini 3 Pro Image Preview (Nano Banana Pro) | 20 Nov 2025 | 91.80% | inferred modality/version alias from gemini-3-pro-preview | Yes | Source | |
| Claude Opus 4.7 | 16 Apr 2026 | 91.50% | - | Yes | Source | |
| Claude Opus 4.6 | 05 Feb 2026 | 91.10% | - | Yes | Source | |
| Claude Opus 4.5 | 24 Nov 2025 | 90.77% | Avg@5, 64k Thinking | Yes | Source | |
| GPT 5.2 | 11 Dec 2025 | 89.60% | - | Yes | Source | |
| GPT 5.2 Chat | 11 Dec 2025 | 89.60% | inferred alias from gpt-5.2-2025-12-11 | Yes | Source | |
| Claude Opus 4.1 | 05 Aug 2025 | 89.50% | - | Yes | Source | |
| Qwen 3.6 Plus | 01 Apr 2026 | 89.50% | - | Yes | Source | |
| Claude Sonnet 4.6 | 17 Feb 2026 | 89.30% | - | Yes | Source | |
| Gemini 3.1 Flash Lite Preview | 03 Mar 2026 | 88.90% | - | Yes | Source | |
| Gemini 3.1 Flash Image Preview (Nano Banana 2) | 26 Feb 2026 | 88.90% | inferred modality/version alias from gemini-3.1-flash-lite-preview | Yes | Source | |
| Veo 3.1 Lite Preview | 31 Mar 2026 | 88.90% | manual fallback alias from gemini-3.1-flash-lite-preview | Yes | Source | |
| Qwen 3.5 397B A17B | 16 Feb 2026 | 88.50% | - | Yes | Source | |
| Gemma 4 31B | 02 Apr 2026 | 88.40% | - | Yes | Source | |
| Seed 2.0 Pro | 14 Feb 2026 | 88.10% | Seed2 official benchmark table \| MMMLU | Yes | Source | |
| Seed 2.0 Lite | 14 Feb 2026 | 87.70% | Seed2 official benchmark table \| MMMLU | Yes | Source | |
| Qwen 3.5 122B A10B | 24 Feb 2026 | 86.70% | - | Yes | Source | |
| Qwen 3 235B A22B | - | 86.70% | - | Yes | Source | |
| Gemma 4 26B A4B | 02 Apr 2026 | 86.30% | - | Yes | Source | |
| Qwen 3.5 27B | 24 Feb 2026 | 85.90% | - | Yes | Source | |
| Qwen 3.5 Flash | 23 Feb 2026 | 85.90% | inferred family alias from qwen3.5-27b (score=0.4147; benches=81) | Yes | Source | |
| K EXAONE | 31 Dec 2025 | 85.70% | inferred modality/version alias from k-exaone-236b-a23b | Yes | Source | |
| Mistral Large 1.0 | 26 Feb 2024 | 85.50% | inferred family alias from mistral-large-latest (score=0.3650; benches=5) | Yes | Source | |
| Mistral Large 3.0 | 02 Dec 2025 | 85.50% | 8-Lang Average | Yes | Source | |
| Qwen 3.5 35B A3B | 24 Feb 2026 | 85.20% | - | Yes | Source | |
| GPT 4.5 | 27 Feb 2025 | 85.10% | - | Yes | Source | |
| Claude Haiku 4.5 | 15 Oct 2025 | 83% | inferred alias from claude-haiku-4-5-20251001 | Yes | Source | |
| Seed 2.0 Mini | 14 Feb 2026 | 81.60% | Seed2 official benchmark table \| MMMLU | Yes | Source | |
| GPT 4o Search Preview | 11 Mar 2025 | 81.40% | inferred modality/version alias from gpt-4o-2024-08-06 | Yes | Source | |
| GPT 4o Realtime Preview (2024-10-01) | 01 Oct 2024 | 81.40% | inferred modality/version alias from gpt-4o-2024-08-06 | Yes | Source | |
| GPT 4o Transcribe Diarize | 15 Oct 2025 | 81.40% | inferred modality/version alias from gpt-4o-2024-08-06 | Yes | Source | |
| GPT 4o Transcribe | 20 Mar 2025 | 81.40% | inferred modality/version alias from gpt-4o-2024-08-06 | Yes | Source | |
| GPT 4o Audio (2024-12-17) | 17 Dec 2024 | 81.40% | inferred modality/version alias from gpt-4o-2024-08-06 | Yes | Source | |
| GPT 4o Realtime Preview (2025-06-03) | 03 Jun 2025 | 81.40% | inferred modality/version alias from gpt-4o-2024-08-06 | Yes | Source | |
| GPT 4o (2024-08-06) | 06 Aug 2024 | 81.40% | - | Yes | Source | |
| GPT 4o Audio (2025-06-03) | 03 Jun 2025 | 81.40% | inferred modality/version alias from gpt-4o-2024-08-06 | Yes | Source | |
| GPT 4o Audio (2024-10-01) | 01 Oct 2024 | 81.40% | inferred modality/version alias from gpt-4o-2024-08-06 | Yes | Source | |
| GPT OSS 120b | 05 Aug 2025 | 81.30% | High Reasoning Effort, Average | Yes | Source | |
| Qwen 3.5 9B | 02 Mar 2026 | 81.20% | - | Yes | Source | |
| Qwen 3.5 4B | 02 Mar 2026 | 76.10% | - | Yes | Source | |
| GPT OSS 20b | 05 Aug 2025 | 75.70% | High Reasoning Effort, Average | Yes | Source | |
| Phi 3.5 MoE instruct | 23 Aug 2024 | 69.90% | - | Yes | Source | |
| Qwen 3.5 2B | 02 Mar 2026 | 63.10% | - | Yes | Source | |
| Phi 3 Mini 128K Instruct | - | 55.40% | inferred family alias from phi-3.5-mini-instruct (score=0.3533; benches=31) | Yes | Source | |
| Phi 3.5 mini instruct | 23 Aug 2024 | 55.40% | - | Yes | Source | |
| Qwen 3.5 0.8B | 02 Mar 2026 | 44.30% | - | Yes | Source |