Qwen3 235B A22B Thinking 2507 Benchmarks - Performance Metrics & Comparisons | AI Stats

Qwen3 235B A22B Thinking 2507

Qwen

Highlights

Top benchmark results for qwen/qwen3-235b-a22b-thinking-2507-2025-07-25.

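Benchmark	Score	Rank
AIME 2025	0.92	#10
Confabulations	16.77	#22
Creative Story Writing	8.24	#4
GPQA Diamond	0.81	#19
HMMT 2025	0.84	#6
Humanity's Last Exam	0.18	#11
IFEval	0.88	#2
LiveBench	0.78	#1
LiveCodeBench V6	0.74	#1
MathArena Apex	0.05	#1
MMLU Redux	0.94	#1
MMLU-Pro	0.84	#1
Online Judgement Benchmark	0.33	#1
SuperGPQA	0.65	#1
Tau 2 Airline	0.58	#6
Tau 2 Retail	0.72	#6
Tau 2 Telecom	0.46	#6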

Benchmark table

Benchmark	Category	Top Score	Info	Self Reported	Source
AIME 2025	math	0.92	-	Yes	Source
Confabulations	-	16.77	-	No	Source
Creative Story Writing	-	8.24	-	No	Source
GPQA Diamond	general-knowledge	0.81	-	Yes	Source
HMMT 2025	-	0.84	-	Yes	Source
Humanity's Last Exam	-	0.18	Text Only	Yes	Source
IFEval	-	0.88	-	Yes	Source
LiveBench	-	0.78	2024-11-25	Yes	Source
LiveCodeBench V6	-	0.74	-	Yes	Source
MathArena Apex	-	0.05	-	No	Source
MMLU Redux	-	0.94	-	Yes	Source
MMLU-Pro	-	0.84	-	Yes	Source
Online Judgement Benchmark	-	0.33	-	Yes	Source
SuperGPQA	-	0.65	-	Yes	Source
Tau 2 Airline	-	0.58	-	Yes	Source
Tau 2 Retail	-	0.72	-	Yes	Source
Tau 2 Telecom	-	0.46	-	Yes	Source
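
As a rough illustration of how the table above could be processed, the following Python sketch parses a few of its rows as tab-separated text and separates self-reported results from the rest. The row subset, the function names (`load_rows`, `split_by_reporting`), and the dictionary keys are assumptions made for this example; they are not part of AI Stats. Note that the scores are not all on one scale (16.77 and 8.24 sit alongside values between 0 and 1), so the sketch does not compare scores across benchmarks.

```python
import csv
import io

# A few rows copied from the benchmark table above (tab-separated).
# The subset is arbitrary and chosen only to keep the example short.
TABLE = (
    "Benchmark\tCategory\tTop Score\tInfo\tSelf Reported\tSource\n"
    "AIME 2025\tmath\t0.92\t-\tYes\tSource\n"
    "Confabulations\t-\t16.77\t-\tNo\tSource\n"
    "Creative Story Writing\t-\t8.24\t-\tNo\tSource\n"
    "MathArena Apex\t-\t0.05\t-\tNo\tSource\n"
    "MMLU Redux\t-\t0.94\t-\tYes\tSource\n"
)


def load_rows(text):
    """Parse the tab-separated table into a list of plain dictionaries."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    rows = []
    for row in reader:
        rows.append({
            "benchmark": row["Benchmark"],
            "score": float(row["Top Score"]),
            "self_reported": row["Self Reported"] == "Yes",
        })
    return rows


def split_by_reporting(rows):
    """Return (self-reported names, not-self-reported names), in table order."""
    self_reported = [r["benchmark"] for r in rows if r["self_reported"]]
    not_self_reported = [r["benchmark"] for r in rows if not r["self_reported"]]
    return self_reported, not_self_reported


rows = load_rows(TABLE)
self_reported, not_self_reported = split_by_reporting(rows)
print("Self-reported:", self_reported)
print("Not self-reported:", not_self_reported)
```

Keeping the two groups apart can be useful when reading the table, since the "Self Reported" column is the only indication the page gives of where a score comes from.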

Benchmark comparisons

AIME 2025: score 0.92, rank #10 out of 40 models.
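
To put a rank such as #10 out of 40 in context, it can be expressed as a fraction of the field. The sketch below is a minimal illustration; the helper names `rank_fraction` and `models_ranked_below` are hypothetical, and it assumes ranks start at 1 with no ties, which the page does not state.

```python
def rank_fraction(rank, total):
    """Fraction of the field at or above this rank (1 is the best rank)."""
    if not 1 <= rank <= total:
        raise ValueError("rank must be between 1 and total")
    return rank / total


def models_ranked_below(rank, total):
    """Number of models with a worse rank, assuming no ties."""
    if not 1 <= rank <= total:
        raise ValueError("rank must be between 1 and total")
    return total - rank


# AIME 2025 figures from this page: rank 10 among 40 models.
fraction = rank_fraction(10, 40)
below = models_ranked_below(10, 40)
print(f"Within the top {fraction:.0%} of the field, ahead of {below} models")
```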