Individual benchmark scores plotted by date.
| Model | Date Reported | Top Score | Info | Self Reported | Source |
|---|---|---|---|---|---|
| DeepSeek V3.2 | 01 Dec 2025 | 52% | Reasoning | No | Source |
| Gemini 2.5 Pro Preview (2025-06-05) | 05 Jun 2025 | 52% | - | No | Source |
| GLM 4.6 | 30 Sep 2025 | 52% | - | No | Source |
| Gemini 3 Pro Preview | 18 Nov 2025 | 23.40% | - | No | Source |
| Gemini 3 Pro Image Preview (Nano Banana Pro) | 20 Nov 2025 | 23.40% | inferred modality/version alias from gemini-3-pro-preview | Yes | Source |
| Qwen 3 235B A22B Thinking 2507 | - | 5.21% | - | No | Source |
| Grok 4 Fast Reasoning | 20 Sep 2025 | 5.21% | - | No | Source |
| Grok 4 | 10 Jul 2025 | 2.08% | - | No | Source |
| GPT 5 | 07 Aug 2025 | 2.08% | High Reasoning Effort + Agent | No | Source |
| Claude Sonnet 4.5 | 29 Sep 2025 | 1.56% | - | No | Source |
| GLM 4.5 | 28 Jul 2025 | 1.04% | - | No | Source |
| GPT 5 Mini | 07 Aug 2025 | 1.04% | High Reasoning Effort | No | Source |
| GPT OSS 120b | 05 Aug 2025 | 1.04% | High Reasoning Effort | No | Source |
| GPT 5.1 | 12 Nov 2025 | 1.04% | High Reasoning Effort | No | Source |