TruthfulQA

Leading Model

77.50% - Phi 3.5 MoE instruct

TruthfulQA - Benchmark Leaderboard & Model Performance | AI Stats

Models Using This Benchmark

Organisation	Model	Reported	Top Score	Info	Self Reported	Source
Microsoft	Phi 3.5 MoE instruct	23 Aug 2024	77.50%	-	Yes	Source
IBM	Granite 3.0 8B Instruct	-	66.86%	inferred high-confidence family alias from granite-3.3-8b-instruct (score=0.4911; benches=14)	Yes	Source
IBM	Granite Guardian 3.0 8B	-	66.86%	inferred family alias from granite-3.3-8b-instruct (score=0.4062; benches=14)	Yes	Source
IBM	Granite Guardian 3.3 8B	-	66.86%	inferred high-confidence family alias from granite-3.3-8b-instruct (score=0.5071; benches=14)	Yes	Source
IBM	Granite 3.3 2B Instruct	16 Apr 2025	66.86%	inferred family alias from granite-3.3-8b-instruct (score=0.3627; benches=14)	Yes	Source
IBM	Granite Guardian 3.1 8B	-	66.86%	inferred family alias from granite-3.3-8b-instruct (score=0.4062; benches=14)	Yes	Source
IBM	Granite 3.2 8B Instruct Preview	-	66.86%	inferred high-confidence family alias from granite-3.3-8b-instruct (score=0.4687; benches=14)	Yes	Source
IBM	Granite 3.3 8B Instruct	16 Apr 2025	66.86%	-	Yes	Source
IBM	Granite Speech 3.2 8B	-	66.86%	inferred family alias from granite-3.3-8b-instruct (score=0.4062; benches=14)	Yes	Source
IBM	Granite 3.1 8B Instruct	-	66.86%	inferred high-confidence family alias from granite-3.3-8b-instruct (score=0.4911; benches=14)	Yes	Source
IBM	Granite Speech 3.3 8B	-	66.86%	inferred high-confidence family alias from granite-3.3-8b-instruct (score=0.5071; benches=14)	Yes	Source
IBM	Granite 3.2 8B Instruct	-	66.86%	inferred high-confidence family alias from granite-3.3-8b-instruct (score=0.4911; benches=14)	Yes	Source
Microsoft	Phi 4 Mini	01 Feb 2025	66.40%	-	Yes	Source
Microsoft	Phi 3 Mini 128K Instruct	-	64%	inferred family alias from phi-3.5-mini-instruct (score=0.3533; benches=31)	Yes	Source
Microsoft	Phi 3.5 mini instruct	23 Aug 2024	64%	-	Yes	Source
Nvidia	Llama 3.1 Nemotron 70B Instruct	01 Oct 2024	58.63%	-	Yes	Source
Qwen	Qwen 14B	-	58.40%	inferred family alias from qwen-2.5-14b-instruct (score=0.3060; benches=16)	Yes	Source
IBM	Granite 4.0 Tiny Preview	02 May 2025	58.10%	-	Yes	Source
IBM	Granite 4.0 Micro	02 Oct 2025	58.10%	inferred high-confidence family alias from granite-4.0-tiny-preview (score=0.4700; benches=12)	Yes	Source
IBM	Granite 4.0 Small	02 Oct 2025	58.10%	inferred high-confidence family alias from granite-4.0-tiny-preview (score=0.4700; benches=12)	Yes	Source
IBM	Granite 4.0 Tiny	02 Oct 2025	58.10%	inferred alias from granite-4.0-tiny-preview	Yes	Source
Qwen	Qwen 2 Math RM 72B	-	54.80%	inferred family alias from qwen2-72b-instruct (score=0.3917; benches=17)	Yes	Source
Qwen	Qwen 2 Math 72B	-	54.80%	inferred high-confidence family alias from qwen2-72b-instruct (score=0.4667; benches=17)	Yes	Source
Mistral	Mistral Nemo 12B	18 Jul 2024	50.30%	inferred family alias from mistral-nemo-instruct-2407 (score=0.3250; benches=8)	Yes	Source
