Individual benchmark scores plotted by date.
| Model | Reported | Top Score | Info | Self Reported | Source |
|---|---|---|---|---|---|
| Step 3.5 Flash | - | 85.40% | - | Yes | Source |
| DeepSeek V3.2 Speciale | 01 Dec 2025 | 84.50% | - | Yes | Source |
| Qwen 3.6 Plus | 01 Apr 2026 | 83.80% | - | Yes | Source |
| GLM 5.1 | - | 83.80% | - | Yes | Source |
| GLM 4.7 | 22 Dec 2025 | 82% | - | Yes | Source |
| Kimi K2.5 | 27 Jan 2026 | 81.80% | - | Yes | Source |
| Qwen 3.5 397B A17B | 16 Feb 2026 | 80.90% | - | Yes | Source |
| Kimi K2 Thinking | 06 Nov 2025 | 78.60% | inferred alias from kimi-k2-thinking-0905 | Yes | Source |