Individual benchmark scores plotted by date.
| Model | Reported | Top Score | Info | Self Reported | Source |
|---|---|---|---|---|---|
| GPT OSS 120b | 05 Aug 2025 | 0.90 | High Reasoning Effort | Yes | Source |
| Kimi K2 Instruct | 11 Jul 2025 | 0.90 | EM | Yes | Source |
| Gemini 2.5 Pro Preview (2025-06-05) | 05 Jun 2025 | 0.89 | - | Yes | Source |
| Gemini 2.5 Pro Preview (2025-05-06) | 06 May 2025 | 0.89 | Lite | Yes | Source |
| Gemini 2.0 Flash | 05 Feb 2025 | 0.88 | Lite | Yes | Source |
| Gemini 2.5 Flash Preview (2025-05-20) | 20 May 2025 | 0.88 | Lite | Yes | Source |
| Gemini 2.5 Flash Preview (2025-04-17) | 17 Apr 2025 | 0.88 | Thinking, Lite | Yes | Source |
| Kimi K2 Base | 11 Jul 2025 | 0.88 | EM | Yes | Source |
| GPT OSS 20b | 05 Aug 2025 | 0.85 | High Reasoning Effort | Yes | Source |
| Gemini 1.0 Ultra | 06 Dec 2023 | 0.84 | - | No | Source |
| Mistral Small 3.2 | 20 Jun 2025 | 0.81 | - | Yes | Source |
| Grok 1 | 03 Nov 2023 | 0.73 | 5 Shot | Yes | Source |
| Gemini Diffusion | 20 May 2025 | 0.69 | Pass@1 | Yes | Source |
| Grok 0 | - | 0.66 | 5 Shot | Yes | Source |