Arena Hard

Arena Hard - Benchmark Leaderboard & Model Performance | AI Stats

Models Using This Benchmark

Organisation	Model	Date Reported	Top Score	Info	Self Reported	Source
Qwen	Qwen 3 235B A22B	-	95.60%	-	Yes	Source
Qwen	Qwen 3 32B	-	93.80%	-	Yes	Source
Qwen	Qwen 3 30B A3B	-	91%	-	Yes	Source
Qwen	Qwen 3 30B A3B Thinking 2507	-	91%	inferred version-family alias from qwen3-30b-a3b	Yes	Source
Qwen	Qwen 3 Coder 30B A3B Instruct	-	91%	inferred high-confidence family alias from qwen3-30b-a3b (score=0.5007; benches=8)	Yes	Source
Qwen	Qwen 3 Omni 30B A3B Instruct	-	91%	inferred high-confidence family alias from qwen3-30b-a3b (score=0.4819; benches=8)	Yes	Source
Qwen	Qwen 3 Omni 30B A3B Captioner	-	91%	inferred family alias from qwen3-30b-a3b (score=0.4129; benches=8)	Yes	Source
Qwen	Qwen 3 Omni 30B A3B Thinking	-	91%	inferred high-confidence family alias from qwen3-30b-a3b (score=0.4819; benches=8)	Yes	Source
Qwen	Qwen 3 30B A3B Instruct 2507	-	91%	inferred version-family alias from qwen3-30b-a3b	Yes	Source
Nvidia	Llama 3.3 Nemotron Super 49B V1.5	-	88.30%	inferred version-family alias from llama-3.3-nemotron-super-49b-v1	Yes	Source
Nvidia	Llama 3.3 Nemotron Super 49B v1	18 Mar 2025	88.30%	-	Yes	Source
Qwen	Qwen 72B	-	81.20%	inferred family alias from qwen-2.5-72b-instruct (score=0.3060; benches=14)	Yes	Source
Microsoft	Phi 4 Reasoning Plus	30 Apr 2025	79%	-	Yes	Source
DeepSeek	DeepSeek V2.5 (2024-12-10)	10 Dec 2024	76.20%	inferred alias from deepseek-v2.5	Yes	Source
DeepSeek	DeepSeek V2.5 (2024-09-05)	05 Sept 2024	76.20%	inferred alias from deepseek-v2.5	Yes	Source
Microsoft	Phi 1	-	75.40%	inferred family alias from phi-4 (score=0.3100; benches=13)	Yes	Source
Microsoft	Phi 2	-	75.40%	inferred family alias from phi-4 (score=0.3100; benches=13)	Yes	Source
Microsoft	Phi 4	12 Dec 2024	75.40%	-	Yes	Source
Microsoft	Phi 4 Reasoning	30 Apr 2025	73.30%	-	Yes	Source
IBM	Granite 4.1 30B	29 Apr 2026	71.02%	-	Yes	Source
IBM	Granite 4.1 8B	29 Apr 2026	68.98%	-	Yes	Source
Mistral	Mistral Small 1.0	26 Feb 2024	58.30%	inferred family alias from mistral-small-latest (score=0.3650; benches=9)	Yes	Source
Mistral	Mistral Small 4	16 Mar 2026	58.30%	-	Yes	Source
Mistral	Mistral Small 2.0	17 Sept 2024	58.30%	inferred family alias from mistral-small-latest (score=0.3650; benches=9)	Yes	Source
Mistral	Mistral Small Creative	16 Dec 2025	58.30%	inferred family alias from mistral-small-latest (score=0.4273; benches=9)	Yes	Source
IBM	Granite 3.3 8B Instruct	16 Apr 2025	57.56%	-	Yes	Source
IBM	Granite Speech 3.3 8B	-	57.56%	inferred high-confidence family alias from granite-3.3-8b-instruct (score=0.5071; benches=14)	Yes	Source
IBM	Granite 3.2 8B Instruct	-	57.56%	inferred high-confidence family alias from granite-3.3-8b-instruct (score=0.4911; benches=14)	Yes	Source
IBM	Granite 3.1 8B Instruct	-	57.56%	inferred high-confidence family alias from granite-3.3-8b-instruct (score=0.4911; benches=14)	Yes	Source
IBM	Granite 3.3 2B Instruct	16 Apr 2025	57.56%	inferred family alias from granite-3.3-8b-instruct (score=0.3627; benches=14)	Yes	Source
IBM	Granite Guardian 3.1 8B	-	57.56%	inferred family alias from granite-3.3-8b-instruct (score=0.4062; benches=14)	Yes	Source
IBM	Granite Speech 3.2 8B	-	57.56%	inferred family alias from granite-3.3-8b-instruct (score=0.4062; benches=14)	Yes	Source
IBM	Granite 3.2 8B Instruct Preview	-	57.56%	inferred high-confidence family alias from granite-3.3-8b-instruct (score=0.4687; benches=14)	Yes	Source
IBM	Granite 3.0 8B Instruct	-	57.56%	inferred high-confidence family alias from granite-3.3-8b-instruct (score=0.4911; benches=14)	Yes	Source
IBM	Granite Guardian 3.0 8B	-	57.56%	inferred family alias from granite-3.3-8b-instruct (score=0.4062; benches=14)	Yes	Source
IBM	Granite Guardian 3.3 8B	-	57.56%	inferred high-confidence family alias from granite-3.3-8b-instruct (score=0.5071; benches=14)	Yes	Source
Mistral	Ministral 3.0 14B	02 Dec 2025	55.10%	-	Yes	Source
Qwen	Qwen 7B	-	52%	inferred family alias from qwen-2.5-7b-instruct (score=0.3083; benches=14)	Yes	Source
Mistral	Ministral 3.0 8B	02 Dec 2025	50.90%	-	Yes	Source
Microsoft	Phi 3.5 MoE instruct	23 Aug 2024	37.90%	-	Yes	Source
IBM	Granite 4.1 3B	29 Apr 2026	37.80%	-	Yes	Source
Microsoft	Phi 3.5 mini instruct	23 Aug 2024	37%	-	Yes	Source
Microsoft	Phi 3 Mini 128K Instruct	-	37%	inferred family alias from phi-3.5-mini-instruct (score=0.3533; benches=31)	Yes	Source
Microsoft	Phi 4 Mini	01 Feb 2025	32.80%	-	Yes	Source
Mistral	Ministral 3.0 3B	02 Dec 2025	30.50%	-	Yes	Source
IBM	Granite 4.0 Tiny Preview	02 May 2025	26.70%	-	Yes	Source
IBM	Granite 4.0 Small	02 Oct 2025	26.70%	inferred high-confidence family alias from granite-4.0-tiny-preview (score=0.4700; benches=12)	Yes	Source
IBM	Granite 4.0 Micro	02 Oct 2025	26.70%	inferred high-confidence family alias from granite-4.0-tiny-preview (score=0.4700; benches=12)	Yes	Source
IBM	Granite 4.0 Tiny	02 Oct 2025	26.70%	inferred alias from granite-4.0-tiny-preview	Yes	Source
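
Many rows in the table carry a score copied from a sibling model (the Info column starts with "inferred"), so summary figures such as the average, the score range, and the leading model depend on whether those rows are counted. The following is a minimal sketch, assuming the tab-separated column order shown in the table; the names `COLUMNS`, `parse_rows`, and `summarise` are hypothetical, and the page does not state how it computes its own summary figures. Only four rows are copied here for brevity.

```python
import csv
import io
from statistics import mean

# Four rows copied verbatim from the table (tab-separated, same column order).
ROWS = (
    "Qwen\tQwen 3 235B A22B\t-\t95.60%\t-\tYes\tSource\n"
    "Qwen\tQwen 3 30B A3B Thinking 2507\t-\t91%\t"
    "inferred version-family alias from qwen3-30b-a3b\tYes\tSource\n"
    "Microsoft\tPhi 4\t12 Dec 2024\t75.40%\t-\tYes\tSource\n"
    "IBM\tGranite 4.0 Tiny Preview\t02 May 2025\t26.70%\t-\tYes\tSource\n"
)

# Hypothetical field names for the seven table columns.
COLUMNS = ["organisation", "model", "reported", "top_score",
           "info", "self_reported", "source"]


def parse_rows(text):
    """Parse tab-separated rows into dicts with a numeric score and an inferred flag."""
    records = []
    for fields in csv.reader(io.StringIO(text), delimiter="\t"):
        if not fields:
            continue  # skip blank lines
        record = dict(zip(COLUMNS, fields))
        record["score"] = float(record["top_score"].rstrip("%"))
        # Assumption: a row is an alias when its Info text starts with "inferred".
        record["inferred"] = record["info"].startswith("inferred")
        records.append(record)
    return records


def summarise(records, include_inferred=False):
    """Return count, average, range, and leader for the selected rows."""
    kept = [r for r in records if include_inferred or not r["inferred"]]
    scores = [r["score"] for r in kept]
    leader = max(kept, key=lambda r: r["score"])
    return {
        "count": len(kept),
        "average": round(mean(scores), 2),
        "lowest": min(scores),
        "highest": max(scores),
        "leader": leader["model"],
    }


records = parse_rows(ROWS)
summary = summarise(records)
print(summary)
```

Calling `summarise(records, include_inferred=True)` counts the alias rows as well, which shifts the average towards whichever families have the most aliases.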
