Individual benchmark scores plotted by date.
| Model | Date Reported | Top Score | Info | Self Reported | Source |
|---|---|---|---|---|---|
| GPT 5 | 07 Aug 2025 | 8.60 | Medium Reasoning Effort | No | Source |
| Kimi K2 (2025-09-05) | 05 Sep 2025 | 8.56 | - | No | Source |
| Claude Opus 4.1 | 05 Aug 2025 | 8.47 | No Reasoning | No | Source |
| GPT 5 Mini | 07 Aug 2025 | 8.31 | - | No | Source |
| Qwen 3 235B A22B Thinking 2507 | - | 8.24 | - | No | Source |
| Grok 4 | 10 Jul 2025 | 7.69 | - | No | Source |
| GLM 4.5 | 28 Jul 2025 | 7.34 | - | No | Source |