Individual benchmark scores plotted by date.
| Model | Reported | Top Score | Evaluator | Info | Self Reported | Source |
|---|---|---|---|---|---|---|
| Llama 2 70B Chat | 20 Jun 2023 | 77.30 | LLM Stats (ZeroEval) | inferred family alias from llama-3.3-70b-instruct (score=0.3129; benches=9) | Yes | Source |
| Llama 3.1 Nemotron Ultra 253B v1 | 07 Apr 2025 | 74.10 | LLM Stats (ZeroEval) | | Yes | Source |
| Llama 3.3 Nemotron Super 49B v1 | 18 Mar 2025 | 73.70 | LLM Stats (ZeroEval) | | Yes | Source |
| Llama 3.3 Nemotron Super 49B V1.5 | - | 73.70 | LLM Stats (ZeroEval) | inferred version-family alias from llama-3.3-nemotron-super-49b-v1 | Yes | Source |
| Llama 3.1 Nemotron Nano 8B V1 | 18 Mar 2025 | 63.60 | LLM Stats (ZeroEval) | | Yes | Source |
| Llama 3.1 Nemotron Nano 4B V1.1 | - | 63.60 | LLM Stats (ZeroEval) | inferred high-confidence family alias from llama-3.1-nemotron-nano-8b-v1 (score=0.5523; benches=7) | Yes | Source |