Individual benchmark scores plotted by date.
| Model | Reported | Top Score | Info | Self Reported | Source |
|---|---|---|---|---|---|
| Qwen3 235B A22B Thinking 2507 | 25 Jul 2025 | 0.84 | - | Yes | Source |
| Grok 3 | 18 Apr 2025 | 0.83 | - | Yes | Source |
| Qwen3 235B A22B Instruct 2507 | 21 Jul 2025 | 0.83 | - | Yes | Source |
| Grok 3 Mini | 18 Apr 2025 | 0.83 | High Reasoning Effort | Yes | Source |
| EXAONE 4.0 32B | 15 Jul 2025 | 0.82 | Reasoning | Yes | Source |
| Kimi K2 Instruct | 11 Jul 2025 | 0.81 | EM | Yes | Source |
| Grok 3 Beta | 19 Feb 2025 | 0.80 | - | Yes | Source |
| Grok 3 Mini Beta | 19 Feb 2025 | 0.79 | - | Yes | Source |
| Kimi K2 Base | 11 Jul 2025 | 0.69 | EM | Yes | Source |
| Mistral Small 3.2 | 20 Jun 2025 | 0.69 | 5 Shot CoT | Yes | Source |
| EXAONE 4.0 1.2B | 15 Jul 2025 | 0.59 | Reasoning | Yes | Source |