OfficeQA

Top score range: 54.10% - 86.30%

OfficeQA - Benchmark Leaderboard & Model Performance | AI Stats

Organisation	Model	Reported	Top Score	Info	Self Reported
Anthropic	Claude Opus 4.7	16 Apr 2026	86.30%	-	Yes
Anthropic	Claude Sonnet 5	30 Jun 2026	73.30%	Exact-match accuracy on Anthropic internal agentic harness; mean of five trials	Yes
OpenAI	GPT 5.4	05 Mar 2026	68.10%	-	Yes
OpenAI	GPT 5.5	23 Apr 2026	54.10%	Pro	Yes