IFBench

Leading Model

81.20% - Grok 4.20

IFBench - Benchmark Leaderboard & Model Performance | AI Stats

Models Using This Benchmark

Organisation	Model	Date Reported	Top Score	Info	Self-Reported	Source
SpaceXAI	Grok 4.20	17 Feb 2026	81.20%	Artificial Analysis structured model metrics	No	Source
SpaceXAI	Grok 4.3	30 Apr 2026	81%	-	No	Source
Qwen	Qwen 3.7 Max	21 May 2026	79.10%	-	Yes	Source
Qwen	Qwen 3.5 Flash	23 Feb 2026	76.50%	inferred family alias from qwen3.5-27b (score=0.4147; benches=81)	Yes	Source
Qwen	Qwen 3.5 397B A17B	16 Feb 2026	76.50%	-	Yes	Source
Qwen	Qwen 3.5 27B	24 Feb 2026	76.50%	-	Yes	Source
Qwen	Qwen 3.5 122B A10B	24 Feb 2026	76.10%	-	Yes	Source
Qwen	Qwen 3.6 Plus	01 Apr 2026	74.20%	-	Yes	Source
z.AI	GLM 5 Turbo	15 Mar 2026	73.20%	Artificial Analysis structured model metrics	No	Source
Nvidia	Nemotron 3 Super	11 Mar 2026	72.56%	-	Yes	Source
Upstage	Solar Pro 3 (2026-01-26)	26 Jan 2026	71.20%	Artificial Analysis structured model metrics	No	Source
Inception	Mercury 2	24 Feb 2026	71%	-	Yes	Source
Qwen	Qwen 3.5 35B A3B	24 Feb 2026	70.20%	-	Yes	Source
MiniMax	MiniMax M2.1	23 Dec 2025	70%	Instruction following benchmark	Yes	Source
LG	K EXAONE	31 Dec 2025	67.30%	inferred modality/version alias from k-exaone-236b-a23b	Yes	Source
Qwen	Qwen 3.5 9B	02 Mar 2026	64.50%	-	Yes	Source
Qwen	Qwen 3.5 4B	02 Mar 2026	59.20%	-	Yes	Source
Arcee AI	Trinity Large Thinking	01 Apr 2026	52.30%	Hugging Face model card benchmark table (arcee-ai/Trinity-Large-Thinking)	Yes	Source
Mistral	Mistral Small 1.0	26 Feb 2024	48%	inferred family alias from mistral-small-latest (score=0.3650; benches=9)	Yes	Source
Mistral	Mistral Small Creative	16 Dec 2025	48%	inferred family alias from mistral-small-latest (score=0.4273; benches=9)	Yes	Source
Mistral	Mistral Small 2.0	17 Sep 2024	48%	inferred family alias from mistral-small-latest (score=0.3650; benches=9)	Yes	Source
Mistral	Mistral Small 4	16 Mar 2026	48%	-	Yes	Source
Qwen	Qwen 3.5 2B	02 Mar 2026	41.30%	-	Yes	Source
Qwen	Qwen 3.5 0.8B	02 Mar 2026	21%	-	Yes	Source
