Individual benchmark scores plotted by date.
| Model | Reported | Top Score | Info | Self Reported | Source |
|---|---|---|---|---|---|
| Llama 3.3 Nemotron Super 49B v1 | 18 Mar 2025 | 0.91 | LLM Stats (ZeroEval) | Yes | Source |
| Llama 3.3 Nemotron Super 49B V1.5 | - | 0.91 | LLM Stats (ZeroEval); inferred version-family alias from llama-3.3-nemotron-super-49b-v1 | Yes | Source |
| Qwen 72B | - | 0.88 | LLM Stats (ZeroEval); inferred family alias from qwen-2.5-72b-instruct (score=0.3060; benches=14) | Yes | Source |
| Llama 3.1 Nemotron Nano 8B V1 | 18 Mar 2025 | 0.85 | LLM Stats (ZeroEval) | Yes | Source |
| Llama 3.1 Nemotron Nano 4B V1.1 | - | 0.85 | LLM Stats (ZeroEval); inferred high-confidence family alias from llama-3.1-nemotron-nano-8b-v1 (score=0.5523; benches=7) | Yes | Source |
| Qwen 2.5 VL 3B Instruct | - | 0.84 | LLM Stats (ZeroEval); inferred high-confidence family alias from qwen2.5-vl-32b (score=0.4914; benches=28) | Yes | Source |
| Qwen 2.5 Coder 32B Instruct | - | 0.84 | LLM Stats (ZeroEval); inferred high-confidence family alias from qwen2.5-vl-32b (score=0.4641; benches=28) | Yes | Source |
| Qwen 2.5 VL 32B Instruct | - | 0.84 | LLM Stats (ZeroEval) | Yes | Source |
| Qwen 14B | - | 0.82 | LLM Stats (ZeroEval); inferred family alias from qwen-2.5-14b-instruct (score=0.3060; benches=16) | Yes | Source |
| Qwen 3 235B A22B | - | 0.81 | LLM Stats (ZeroEval) | Yes | Source |
| Phi 3.5 MoE instruct | 23 Aug 2024 | 0.81 | LLM Stats (ZeroEval) | Yes | Source |
| Qwen 2 Math 72B | - | 0.80 | LLM Stats (ZeroEval); inferred high-confidence family alias from qwen2-72b-instruct (score=0.4667; benches=17) | Yes | Source |
| Qwen 2 Math RM 72B | - | 0.80 | LLM Stats (ZeroEval); inferred family alias from qwen2-72b-instruct (score=0.3917; benches=17) | Yes | Source |
| Qwen 7B | - | 0.79 | LLM Stats (ZeroEval); inferred family alias from qwen-2.5-7b-instruct (score=0.3083; benches=14) | Yes | Source |
| Codestral (2025-01-13) | 13 Jan 2025 | 0.78 | LLM Stats (ZeroEval); inferred modality/version alias from codestral-22b | Yes | Source |
| Codestral (2024-05-29) | 29 May 2024 | 0.78 | LLM Stats (ZeroEval); inferred modality/version alias from codestral-22b | Yes | Source |
| Codestral (2025-07-30) | 30 Jul 2025 | 0.78 | LLM Stats (ZeroEval); inferred version-family alias from codestral-22b | Yes | Source |
| Llama 4 Maverick | 05 Apr 2025 | 0.78 | LLM Stats (ZeroEval) | Yes | Source |
| Gemini Diffusion | 20 May 2025 | 0.76 | LLM Stats (ZeroEval) | Yes | Source |
| Qwen 2.5 Omni 7B | - | 0.73 | LLM Stats (ZeroEval) | Yes | Source |
| Qwen 2.5 Coder 7B | - | 0.73 | LLM Stats (ZeroEval); inferred high-confidence family alias from qwen2.5-omni-7b (score=0.4700; benches=45) | Yes | Source |
| Qwen 2.5 Coder 3B | - | 0.73 | LLM Stats (ZeroEval); inferred family alias from qwen2.5-omni-7b (score=0.3000; benches=45) | Yes | Source |
| Qwen 2.5 Math 7B | - | 0.73 | LLM Stats (ZeroEval); inferred high-confidence family alias from qwen2.5-omni-7b (score=0.4767; benches=45) | Yes | Source |
| Qwen 2.5 Math 7B PRM800K | - | 0.73 | LLM Stats (ZeroEval); inferred family alias from qwen2.5-omni-7b (score=0.3696; benches=45) | Yes | Source |
| Qwen 2.5 Math PRM 7B | - | 0.73 | LLM Stats (ZeroEval); inferred family alias from qwen2.5-omni-7b (score=0.4092; benches=45) | Yes | Source |
| Qwen 2.5 Omni 3B | - | 0.73 | LLM Stats (ZeroEval); inferred high-confidence family alias from qwen2.5-omni-7b (score=0.4933; benches=45) | Yes | Source |
| Phi 3 Mini 128K Instruct | - | 0.70 | LLM Stats (ZeroEval); inferred family alias from phi-3.5-mini-instruct (score=0.3533; benches=31) | Yes | Source |
| Phi 3.5 mini instruct | 23 Aug 2024 | 0.70 | LLM Stats (ZeroEval) | Yes | Source |
| Llama 4 Scout | 05 Apr 2025 | 0.68 | LLM Stats (ZeroEval) | Yes | Source |
| Qwen 2 Audio 7B | - | 0.67 | LLM Stats (ZeroEval); inferred modality/version alias from qwen2-7b-instruct | Yes | Source |
| Qwen 2 Math 7B | - | 0.67 | LLM Stats (ZeroEval); inferred high-confidence family alias from qwen2-7b-instruct (score=0.4706; benches=14) | Yes | Source |