BIG-Bench Hard

Leading Model

89.20% - Gemini 1.5 Pro Exp (2024-08-01)


Models Using This Benchmark

Organisation	Model	Reported	Top Score	Info	Self Reported	Source
Google	Gemini 1.5 Pro Exp (2024-08-01)	01 Aug 2024	89.20%	inferred alias from gemini-1.5-pro	Yes	Source
Google	Gemini Robotics ER 1.5 Preview	25 Sep 2025	89.20%	inferred family alias from gemini-1.5-pro (score=0.3717; benches=23)	Yes	Source
Google	Gemini 1.5 Pro Exp (2024-08-27)	27 Aug 2024	89.20%	inferred alias from gemini-1.5-pro	Yes	Source
Google	Gemini 1.5 Pro 001	23 May 2024	89.20%	inferred alias from gemini-1.5-pro	Yes	Source
Google	LearnLM 1.5 Pro Experimental	19 Nov 2024	89.20%	inferred family alias from gemini-1.5-pro (score=0.3700; benches=23)	Yes	Source
Google	Gemini 1.5 Flash Preview	14 May 2024	85.50%	inferred alias from gemini-1.5-flash	Yes	Source
Google	Gemini 1.5 Flash 001	23 May 2024	85.50%	inferred alias from gemini-1.5-flash	Yes	Source
Microsoft	Phi 3.5 MoE instruct	23 Aug 2024	79.10%	-	Yes	Source
Microsoft	Phi 4 Mini	01 Feb 2025	70.40%	-	Yes	Source
IBM	Granite Guardian 3.1 8B	-	69.13%	inferred family alias from granite-3.3-8b-instruct (score=0.4062; benches=14)	Yes	Source
IBM	Granite Speech 3.2 8B	-	69.13%	inferred family alias from granite-3.3-8b-instruct (score=0.4062; benches=14)	Yes	Source
IBM	Granite Guardian 3.3 8B	-	69.13%	inferred high-confidence family alias from granite-3.3-8b-instruct (score=0.5071; benches=14)	Yes	Source
IBM	Granite 3.0 8B Instruct	-	69.13%	inferred high-confidence family alias from granite-3.3-8b-instruct (score=0.4911; benches=14)	Yes	Source
IBM	Granite Speech 3.3 8B	-	69.13%	inferred high-confidence family alias from granite-3.3-8b-instruct (score=0.5071; benches=14)	Yes	Source
IBM	Granite 3.3 8B Instruct	16 Apr 2025	69.13%	-	Yes	Source
IBM	Granite Guardian 3.0 8B	-	69.13%	inferred family alias from granite-3.3-8b-instruct (score=0.4062; benches=14)	Yes	Source
IBM	Granite 3.2 8B Instruct Preview	-	69.13%	inferred high-confidence family alias from granite-3.3-8b-instruct (score=0.4687; benches=14)	Yes	Source
IBM	Granite 3.2 8B Instruct	-	69.13%	inferred high-confidence family alias from granite-3.3-8b-instruct (score=0.4911; benches=14)	Yes	Source
IBM	Granite 3.3 2B Instruct	16 Apr 2025	69.13%	inferred family alias from granite-3.3-8b-instruct (score=0.3627; benches=14)	Yes	Source
IBM	Granite 3.1 8B Instruct	-	69.13%	inferred high-confidence family alias from granite-3.3-8b-instruct (score=0.4911; benches=14)	Yes	Source
Microsoft	Phi 3.5 mini instruct	23 Aug 2024	69.00%	-	Yes	Source
Microsoft	Phi 3 Mini 128K Instruct	-	69.00%	inferred family alias from phi-3.5-mini-instruct (score=0.3533; benches=31)	Yes	Source
IBM	Granite 4.0 Tiny	02 Oct 2025	55.70%	inferred alias from granite-4.0-tiny-preview	Yes	Source
IBM	Granite 4.0 Small	02 Oct 2025	55.70%	inferred high-confidence family alias from granite-4.0-tiny-preview (score=0.4700; benches=12)	Yes	Source
IBM	Granite 4.0 Micro	02 Oct 2025	55.70%	inferred high-confidence family alias from granite-4.0-tiny-preview (score=0.4700; benches=12)	Yes	Source
IBM	Granite 4.0 Tiny Preview	02 May 2025	55.70%	-	Yes	Source
Google	Gemma 3n E4B	25 Jun 2025	52.90%	-	Yes	Source
Google	Gemma 3n E2B	25 Jun 2025	44.30%	-	Yes	Source
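
Many rows in the table above carry a score inherited from another model (the Info column reads "inferred alias" or "inferred family alias"), so summary figures depend on whether those rows are counted. The sketch below shows one way to compute an average, a score range, and a leading model from rows in this layout, with and without the inferred entries. It is illustrative only: the function names `parse_rows` and `summarize` are hypothetical, only five rows are copied from the table, and the Self Reported and Source columns are left out for brevity.

```python
import csv
import io

# Five rows copied from the table above: Organisation, Model, Reported, Top Score, Info.
# The Self Reported and Source columns are omitted here (an assumption made for brevity).
ROWS = (
    "Google\tGemini 1.5 Pro Exp (2024-08-01)\t01 Aug 2024\t89.20%\tinferred alias from gemini-1.5-pro\n"
    "Microsoft\tPhi 3.5 MoE instruct\t23 Aug 2024\t79.10%\t-\n"
    "Microsoft\tPhi 4 Mini\t01 Feb 2025\t70.40%\t-\n"
    "IBM\tGranite 3.3 8B Instruct\t16 Apr 2025\t69.13%\t-\n"
    "Google\tGemma 3n E2B\t25 Jun 2025\t44.30%\t-\n"
)


def parse_rows(text):
    """Parse tab-separated leaderboard rows into dictionaries."""
    rows = []
    for fields in csv.reader(io.StringIO(text), delimiter="\t"):
        if not fields:
            continue  # skip blank lines
        org, model, reported, score, info = fields
        rows.append({
            "org": org,
            "model": model,
            "reported": reported,
            "score": float(score.rstrip("%")),      # "89.20%" -> 89.2
            "inferred": info.startswith("inferred"),  # score inherited from an alias
        })
    return rows


def summarize(rows):
    """Return the average score, (min, max) range, and top-scoring model."""
    scores = [r["score"] for r in rows]
    leader = max(rows, key=lambda r: r["score"])
    return {
        "average": sum(scores) / len(scores),
        "range": (min(scores), max(scores)),
        "leader": leader["model"],
    }


if __name__ == "__main__":
    all_rows = parse_rows(ROWS)
    direct_rows = [r for r in all_rows if not r["inferred"]]
    print("all rows:   ", summarize(all_rows))
    print("direct only:", summarize(direct_rows))
```

Filtering on the Info column before summarizing keeps a single reported score from being counted once per alias, which matters here because several families share one value across many rows.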
