Individual benchmark scores plotted by date.
| Model | Reported | Top Score | Info | Self Reported | Source |
|---|---|---|---|---|---|
| GLM 4.5 | 28 Jul 2025 | 98.20% | - | Yes | Source | |
| GLM 4.5 Air | 28 Jul 2025 | 98.10% | - | Yes | Source | |
| Nvidia Nemotron Nano 12B V2 | - | 97.80% | inferred high-confidence family alias from nvidia-nemotron-nano-9b-v2 (score=0.4889; benches=6) | Yes | Source | |
| Nvidia Nemotron Nano 9B V2 | - | 97.80% | - | Yes | Source | |
| Kimi K2 (2025-09-05) | 05 Sept 2025 | 97.40% | Acc | Yes | Source | |
| Llama 3.1 Nemotron Ultra 253B v1 | 07 Apr 2025 | 97% | - | Yes | Source | |
| MiniMax M1 80K | 16 Jun 2025 | 96.80% | - | Yes | - | |
| Llama 3.3 Nemotron Super 49B v1 | 18 Mar 2025 | 96.60% | - | Yes | Source | |
| Llama 3.3 Nemotron Super 49B V1.5 | - | 96.60% | inferred version-family alias from llama-3.3-nemotron-super-49b-v1 | Yes | Source | |
| Longcat Flash Cat | - | 96.40% | inferred high-confidence family alias from longcat-flash-chat (score=0.4667; benches=16) | Yes | Source | |
| Kimi K1.5 | 20 Jan 2025 | 96.20% | - | Yes | Source | |
| MiniMax M1 40K | 16 Jun 2025 | 96% | - | Yes | - | |
| Llama 3.1 Nemotron Nano 4B V1.1 | - | 95.40% | inferred high-confidence family alias from llama-3.1-nemotron-nano-8b-v1 (score=0.5523; benches=7) | Yes | Source | |
| Llama 3.1 Nemotron Nano 8B V1 | 18 Mar 2025 | 95.40% | - | Yes | Source | |
| Phi 4 Mini Flash Reasoning | - | 94.60% | inferred modality/version alias from phi-4-mini-reasoning | Yes | Source | |
| Phi 4 Mini Reasoning | 30 Apr 2025 | 94.60% | - | Yes | Source | |
| QwQ 32B Preview | - | 90.60% | - | Yes | Source | |
| QwQ 32B | - | 90.60% | - | Yes | Source | |
| DeepSeek V4 | - | 90.20% | inferred high-confidence family alias from deepseek-v3 (score=0.5818; benches=20) | Yes | Source | |
| DeepSeek OCR | 20 Oct 2025 | 90.20% | inferred family alias from deepseek-v3 (score=0.3000; benches=20) | Yes | Source | |
| DeepSeek V2 (2024-06-28) | 28 Jun 2024 | 90.20% | inferred family alias from deepseek-v3 (score=0.4159; benches=20) | Yes | Source | |
| o1 mini | 12 Sept 2024 | 90% | - | Yes | Source | |
| Granite 3.3 2B Instruct | 16 Apr 2025 | 69.02% | inferred family alias from granite-3.3-8b-instruct (score=0.3627; benches=14) | Yes | Source | |
| Granite 3.1 8B Instruct | - | 69.02% | inferred high-confidence family alias from granite-3.3-8b-instruct (score=0.4911; benches=14) | Yes | Source | |
| Granite 3.2 8B Instruct | - | 69.02% | inferred high-confidence family alias from granite-3.3-8b-instruct (score=0.4911; benches=14) | Yes | Source | |
| Granite Guardian 3.1 8B | - | 69.02% | inferred family alias from granite-3.3-8b-instruct (score=0.4062; benches=14) | Yes | Source | |
| Granite 3.2 8B Instruct Preview | - | 69.02% | inferred high-confidence family alias from granite-3.3-8b-instruct (score=0.4687; benches=14) | Yes | Source | |
| Granite 3.3 8B Instruct | 16 Apr 2025 | 69.02% | - | Yes | Source | |
| Granite Guardian 3.3 8B | - | 69.02% | inferred high-confidence family alias from granite-3.3-8b-instruct (score=0.5071; benches=14) | Yes | Source | |
| Granite 3.0 8B Instruct | - | 69.02% | inferred high-confidence family alias from granite-3.3-8b-instruct (score=0.4911; benches=14) | Yes | Source | |
| Granite Guardian 3.0 8B | - | 69.02% | inferred family alias from granite-3.3-8b-instruct (score=0.4062; benches=14) | Yes | Source | |
| Granite Speech 3.2 8B | - | 69.02% | inferred family alias from granite-3.3-8b-instruct (score=0.4062; benches=14) | Yes | Source | |
| Granite Speech 3.3 8B | - | 69.02% | inferred high-confidence family alias from granite-3.3-8b-instruct (score=0.5071; benches=14) | Yes | Source |