Individual benchmark scores plotted by date.
| Model | Reported | Top Score | Info | Self Reported | Source |
|---|---|---|---|---|---|
| Claude Opus 4.5 | 24 Nov 2025 | 87.80% | Corrected | Yes | Source |
| Nova 2 Pro | 02 Dec 2025 | 77.70% | - | Yes | Source |
| Nova 2 Lite | 02 Dec 2025 | 64.80% | - | Yes | Source |
| GPT 5 | 07 Aug 2025 | 62.60% | With Thinking, Pass @ 1 | Yes | Source |
| GPT 5 Mini | 07 Aug 2025 | 60% | High Reasoning Effort | Yes | Source |
| Qwen 3 235B A22B Thinking 2507 | - | 58% | - | Yes | Source |
| Kimi K2 (2025-09-05) | 05 Sep 2025 | 56.50% | Avg@4 | Yes | Source |
| GPT 5 Nano | 07 Aug 2025 | 41% | High Reasoning Effort | Yes | Source |