Individual benchmark scores plotted by date.
| Model | Reported | Top Score | Info | Self Reported | Source |
|---|---|---|---|---|---|
| Phi 4 Reasoning | 30 Apr 2025 | 92.90% | - | Yes | Source | |
| Phi 4 Reasoning Plus | 30 Apr 2025 | 92.30% | - | Yes | Source | |
| Granite 3.2 8B Instruct | - | 86.09% | inferred high-confidence family alias from granite-3.3-8b-instruct (score=0.4911; benches=14) | Yes | Source | |
| Granite 3.0 8B Instruct | - | 86.09% | inferred high-confidence family alias from granite-3.3-8b-instruct (score=0.4911; benches=14) | Yes | Source | |
| Granite 3.3 2B Instruct | 16 Apr 2025 | 86.09% | inferred family alias from granite-3.3-8b-instruct (score=0.3627; benches=14) | Yes | Source | |
| Granite 3.3 8B Instruct | 16 Apr 2025 | 86.09% | - | Yes | Source | |
| Granite Guardian 3.0 8B | - | 86.09% | inferred family alias from granite-3.3-8b-instruct (score=0.4062; benches=14) | Yes | Source | |
| Granite 3.1 8B Instruct | - | 86.09% | inferred high-confidence family alias from granite-3.3-8b-instruct (score=0.4911; benches=14) | Yes | Source | |
| Granite Guardian 3.3 8B | - | 86.09% | inferred high-confidence family alias from granite-3.3-8b-instruct (score=0.5071; benches=14) | Yes | Source | |
| Granite Speech 3.3 8B | - | 86.09% | inferred high-confidence family alias from granite-3.3-8b-instruct (score=0.5071; benches=14) | Yes | Source | |
| Granite Speech 3.2 8B | - | 86.09% | inferred family alias from granite-3.3-8b-instruct (score=0.4062; benches=14) | Yes | Source | |
| Granite 3.2 8B Instruct Preview | - | 86.09% | inferred high-confidence family alias from granite-3.3-8b-instruct (score=0.4687; benches=14) | Yes | Source | |
| Granite Guardian 3.1 8B | - | 86.09% | inferred family alias from granite-3.3-8b-instruct (score=0.4062; benches=14) | Yes | Source | |
| Granite 4.1 30B | 29 Apr 2026 | 85.37% | pass@1 | Yes | Source | |
| Phi 1 | - | 82.80% | inferred family alias from phi-4 (score=0.3100; benches=13) | Yes | Source | |
| Phi 2 | - | 82.80% | inferred family alias from phi-4 (score=0.3100; benches=13) | Yes | Source | |
| Phi 4 | 12 Dec 2024 | 82.80% | - | Yes | Source | |
| Granite 4.1 8B | 29 Apr 2026 | 79.88% | pass@1 | Yes | Source | |
| Granite 4.0 Tiny Preview | 02 May 2025 | 78.30% | - | Yes | Source | |
| Granite 4.0 Tiny | 02 Oct 2025 | 78.30% | inferred alias from granite-4.0-tiny-preview | Yes | Source | |
| Granite 4.0 Micro | 02 Oct 2025 | 78.30% | inferred high-confidence family alias from granite-4.0-tiny-preview (score=0.4700; benches=12) | Yes | Source | |
| Granite 4.0 Small | 02 Oct 2025 | 78.30% | inferred high-confidence family alias from granite-4.0-tiny-preview (score=0.4700; benches=12) | Yes | Source | |
| Granite 4.1 3B | 29 Apr 2026 | 76.83% | pass@1 | Yes | Source | |
| Qwen 14B | - | 51.20% | inferred family alias from qwen-2.5-14b-instruct (score=0.3060; benches=16) | Yes | Source | |
| Ernie 4.5 21B A3B | - | 25% | inferred version-family alias from ernie-4.5 | Yes | Source | |
| Ernie 4.5 VL 28B A3B | - | 25% | inferred version-family alias from ernie-4.5 | Yes | Source | |
| Ernie 4.5 VL 424B A47B | - | 25% | inferred version-family alias from ernie-4.5 | Yes | Source | |
| Ernie 4.5 300B A47B | - | 25% | inferred version-family alias from ernie-4.5 | Yes | Source | |
| Ernie 4.5 Turbo | - | 25% | inferred version-family alias from ernie-4.5 | Yes | Source | |
| Ernie 4.5 21B A3B Thinking | - | 25% | inferred version-family alias from ernie-4.5 | Yes | Source |