Individual benchmark scores plotted by date.
| Organisation | Model | Reported | Top Score | Benchmark | Info | Self Reported | Source |
|---|---|---|---|---|---|---|---|
| Alibaba | Qwen 3.5 Plus | 16 Feb 2026 | 81.60 | DeepPlanning v1.1 | Alibaba/Qwen-3.5-Plus (w/o thinking) | Yes | Source |
| Alibaba | Qwen 3 Max Thinking | 26 Jan 2026 | 62.80 | DeepPlanning v1.1 | Alibaba/Qwen3-Max (w/ thinking) | Yes | Source |
| xAI | Grok 4 Fast Non Reasoning | 20 Sept 2025 | 29.60 | DeepPlanning v1.1 | xAI/Grok-4.1-fast (non-reasoning) | Yes | Source |