Individual benchmark scores plotted by date.
| Model | Reported | Top Score | Info | Self Reported | Source |
|---|---|---|---|---|---|
| GPT 5.5 | 23 Apr 2026 | 3.50 | xHigh reasoning; win prob 0.92; 95% CI 3.4..3.6 | No | Source |
| GPT 5.4 | 05 Mar 2026 | 3.40 | xHigh reasoning; win prob 0.91; 95% CI 3.3..3.6 | No | Source |
| Claude Opus 4.7 | 16 Apr 2026 | 3.00 | High reasoning; win prob 0.87; 95% CI 2.8..3.1 | No | Source |
| Claude Sonnet 4.6 | 17 Feb 2026 | 2.80 | 16K thinking; win prob 0.86; 95% CI 2.7..3.0 | No | Source |
| Claude Opus 4.6 | 05 Feb 2026 | 2.20 | 16K thinking; win prob 0.80; 95% CI 2.0..2.5 | No | Source |
| Claude Opus 4.8 | 28 May 2026 | 1.70 | xHigh reasoning; win prob 0.75; 95% CI 1.6..1.9 | No | Source |
| GPT 5.2 | 11 Dec 2025 | 1.50 | Medium reasoning; win prob 0.72; 95% CI 1.3..1.7 | No | Source |
| Kimi K2.6 | 20 Apr 2026 | 1.20 | win prob 0.68; 95% CI 1.1..1.4 | No | Source |
| Mistral Medium 3.1 | 12 Aug 2025 | 0.80 | win prob 0.63; 95% CI 0.6..1.0 | No | Source |
| DeepSeek V4 Pro | 24 Apr 2026 | 0.60 | win prob 0.60; 95% CI 0.5..0.8 | No | Source |
| MiMo V2.5 Pro | 22 Apr 2026 | 0.50 | win prob 0.58; 95% CI 0.3..0.6 | No | Source |
| Qwen3 Max Preview | - | 0.40 | win prob 0.57; 95% CI 0.2..0.6 | No | Source |
| Qwen 3.6 Max Preview | - | 0.10 | win prob 0.52; 95% CI -0.1..0.2 | No | Source |
| GLM 5.1 | 07 Apr 2026 | -0.20 | win prob 0.48; 95% CI -0.4..0.0 | No | Source |
| Kimi K2.5 | 27 Jan 2026 | -0.20 | win prob 0.47; 95% CI -0.4..-0.1 | No | Source |
| MiMo V2 Pro | 18 Mar 2026 | -0.40 | win prob 0.45; 95% CI -0.7..-0.2 | No | Source |
| Gemini 3.5 Flash | 19 May 2026 | -1.20 | win prob 0.33; 95% CI -1.3..-1.1 | No | Source |
| Seed 2.0 Pro | 14 Feb 2026 | -1.40 | win prob 0.30; 95% CI -1.6..-1.2 | No | Source |
| Gemini 3.1 Pro Preview | 19 Feb 2026 | -1.50 | win prob 0.29; 95% CI -1.7..-1.3 | No | Source |
| Mistral Large 3.0 | 02 Dec 2025 | -1.60 | win prob 0.28; 95% CI -1.8..-1.3 | No | Source |
| Gemma 4 31B | 02 Apr 2026 | -1.70 | Reasoning; win prob 0.26; 95% CI -1.9..-1.5 | No | Source |
| Qwen 3.7 Max | 21 May 2026 | -1.80 | win prob 0.25; 95% CI -2.0..-1.6 | No | Source |
| Mistral Medium 3.5 | 29 Apr 2026 | -1.90 | win prob 0.23; 95% CI -2.1..-1.7 | No | Source |
| DeepSeek V3.2 | 01 Dec 2025 | -2.10 | win prob 0.21; 95% CI -2.4..-1.8 | No | Source |
| Qwen 3.6 Plus | 01 Apr 2026 | -2.20 | win prob 0.20; 95% CI -2.5..-1.7 | No | Source |
| MiniMax M2.7 | 18 Mar 2026 | -3.10 | win prob 0.11; 95% CI -3.3..-2.9 | No | Source |
| GPT OSS 120b | 05 Aug 2025 | -3.20 | win prob 0.10; 95% CI -3.5..-2.9 | No | Source |
| Grok 4.3 | 30 Apr 2026 | -3.60 | win prob 0.07; 95% CI -3.8..-3.4 | No | Source |