Individual benchmark scores plotted by date.
| Model | Reported | Top Score | Info | Self Reported | Source | |
|---|---|---|---|---|---|---|
| DeepSeek OCR 2 | - | 97.10% | inferred family alias from deepseek-v3.2-exp (score=0.3809; benches=14) | Yes | Source | |
| DeepSeek V3.2 Exp | 29 Sept 2025 | 97.10% | - | Yes | Source | |
| DeepSeek V3.1 | 21 Aug 2025 | 93.40% | Search agent evaluation | Yes | Source | |
| DeepSeek V3.1 Terminus | 22 Sept 2025 | 93.40% | inferred alias from deepseek-v3.1 | Yes | Source | |
| Ernie 5.0 0110 | - | 75% | inferred version-family alias from ernie-5.0 | Yes | Source | |
| Ernie 5.0 Preview 1220 | - | 75% | inferred version-family alias from ernie-5.0 | Yes | Source | |
| Ernie 5.0 | 22 Jan 2026 | 75% | - | Yes | Source | |
| Ernie 5.0 Preview 1203 | - | 75% | inferred version-family alias from ernie-5.0 | Yes | Source | |
| Gemini 3 Pro Image Preview (Nano Banana Pro) | 20 Nov 2025 | 72.10% | inferred modality/version alias from gemini-3-pro-preview | Yes | Source | |
| Gemini 3 Pro Preview | 18 Nov 2025 | 72.10% | - | Yes | Source | |
| Gemini 3 Flash Preview | 17 Dec 2025 | 68.70% | - | Yes | Source | |
| GPT 4.5 | 27 Feb 2025 | 62.50% | - | Yes | Source | |
| Qwen 3 VL 32B Thinking | - | 55.40% | - | Yes | - | |
| Qwen 3 235B A22B Instruct 2507 | - | 54.30% | - | Yes | Source | |
| Gemini 2.5 Pro Preview (2025-06-05) | 05 Jun 2025 | 54% | - | Yes | Source | |
| Gemini 2.5 Pro Preview TTS (2025-05-20) | 20 May 2025 | 54% | inferred family alias from gemini-2.5-pro-preview-06-05 (score=0.4243; benches=13) | Yes | Source | |
| Qwen 3 VL 235B A22B Instruct | - | 51.90% | - | Yes | - | |
| Gemini 2.5 Computer Use Preview | 07 Oct 2025 | 50.80% | inferred family alias from gemini-2.5-pro (score=0.3960; benches=16) | Yes | Source | |
| Gemini 2.5 Pro Preview TTS (2025-12-10) | 10 Dec 2025 | 50.80% | inferred modality/version alias from gemini-2.5-pro | Yes | Source | |
| Gemini Embedding 2 Preview | 10 Mar 2026 | 50.80% | manual fallback alias from gemini-2.5-pro | Yes | Source | |
| Gemini 2.5 Pro Experimental (2025-03-25) | 25 Mar 2025 | 50.80% | inferred alias from gemini-2.5-pro | Yes | Source | |
| Gemini 2.5 Pro Preview (2025-05-06) | 06 May 2025 | 50.80% | - | Yes | Source | |
| Qwen 3 Guard Gen 8B | - | 49.60% | inferred family alias from qwen3-vl-8b-thinking (score=0.3400; benches=50) | Yes | - | |
| Qwen 3 Reranker 8B | - | 49.60% | inferred family alias from qwen3-vl-8b-thinking (score=0.3850; benches=50) | Yes | - | |
| Qwen 3 8B | - | 49.60% | inferred high-confidence family alias from qwen3-vl-8b-thinking (score=0.4600; benches=50) | Yes | - | |
| Qwen 3 Embedding 8B | - | 49.60% | inferred family alias from qwen3-vl-8b-thinking (score=0.3850; benches=50) | Yes | - | |
| Qwen 3 Guard Stream 8B | - | 49.60% | inferred family alias from qwen3-vl-8b-thinking (score=0.3371; benches=50) | Yes | - | |
| Qwen 3 VL Reranker 8B | - | 49.60% | inferred high-confidence family alias from qwen3-vl-8b-thinking (score=0.5275; benches=50) | Yes | - | |
| Qwen 3 VL 8B Thinking | - | 49.60% | - | Yes | - | |
| Qwen 3 VL Embedding 8B | - | 49.60% | inferred high-confidence family alias from qwen3-vl-8b-thinking (score=0.5232; benches=50) | Yes | - | |
| Qwen 3 VL 4B Instruct | - | 48% | - | Yes | - | |
| Qwen 3 VL 235B A22B Thinking | - | 44.40% | - | Yes | - | |
| Grok 3 Beta | 19 Feb 2025 | 43.60% | - | Yes | Source | |
| Veo 3.1 Lite Preview | 31 Mar 2026 | 43.30% | manual fallback alias from gemini-3.1-flash-lite-preview | Yes | Source | |
| Gemini 3.1 Flash Lite Preview | 03 Mar 2026 | 43.30% | - | Yes | Source | |
| Gemini 3.1 Flash Image Preview (Nano Banana 2) | 26 Feb 2026 | 43.30% | inferred modality/version alias from gemini-3.1-flash-lite-preview | Yes | Source | |
| o1 preview | 12 Sept 2024 | 42.40% | - | Yes | Source | |
| GPT 4o Transcribe | 20 Mar 2025 | 38.20% | inferred modality/version alias from gpt-4o-2024-08-06 | Yes | Source | |
| GPT 4o Audio (2025-06-03) | 03 Jun 2025 | 38.20% | inferred modality/version alias from gpt-4o-2024-08-06 | Yes | Source | |
| GPT 4o Audio (2024-10-01) | 01 Oct 2024 | 38.20% | inferred modality/version alias from gpt-4o-2024-08-06 | Yes | Source | |
| GPT 4o (2024-08-06) | 06 Aug 2024 | 38.20% | - | Yes | Source | |
| GPT 4o Search Preview | 11 Mar 2025 | 38.20% | inferred modality/version alias from gpt-4o-2024-08-06 | Yes | Source | |
| GPT 4o Audio (2024-12-17) | 17 Dec 2024 | 38.20% | inferred modality/version alias from gpt-4o-2024-08-06 | Yes | Source | |
| GPT 4o Realtime Preview (2024-10-01) | 01 Oct 2024 | 38.20% | inferred modality/version alias from gpt-4o-2024-08-06 | Yes | Source | |
| GPT 4o Transcribe Diarize | 15 Oct 2025 | 38.20% | inferred modality/version alias from gpt-4o-2024-08-06 | Yes | Source | |
| GPT 4o Realtime Preview (2025-06-03) | 03 Jun 2025 | 38.20% | inferred modality/version alias from gpt-4o-2024-08-06 | Yes | Source | |
| Kimi K2 (2025-07-11) | 11 Jul 2025 | 35.30% | Correct | Yes | Source | |
| Kimi K2 (2025-09-05) | 05 Sept 2025 | 31% | Correct | Yes | Source | |
| Gemini 2.0 Flash | 05 Feb 2025 | 29.90% | - | Yes | Source | |
| Gemini 2.5 Flash Preview (2025-04-17) | 17 Apr 2025 | 29.70% | Thinking | Yes | Source | |
| Qwen 3 VL 30B A3B Instruct | - | 27% | - | Yes | - | |
| Gemini 2.5 Flash Preview (2025-05-20) | 20 May 2025 | 26.90% | - | Yes | Source | |
| Gemini Live 2.5 Flash Preview | 09 Apr 2025 | 26.90% | inferred high-confidence family alias from gemini-2.5-flash (score=0.5083; benches=14) | Yes | Source | |
| Gemini 2.5 Flash Exp Native Audio Thinking Dialog | - | 26.90% | inferred modality/version alias from gemini-2.5-flash | Yes | Source | |
| Gemini 2.5 Flash Image Preview (Nano Banana) | 25 Aug 2025 | 26.90% | inferred modality/version alias from gemini-2.5-flash | Yes | Source | |
| Gemini 2.5 Flash Preview (2025-09-25) | 25 Sept 2025 | 26.90% | inferred alias from gemini-2.5-flash | Yes | Source | |
| Gemini 2.5 Flash Preview TTS (2025-05-20) | 20 May 2025 | 26.90% | inferred modality/version alias from gemini-2.5-flash | Yes | Source | |
| Gemini 2.5 Flash Image (Nano Banana) | 02 Oct 2025 | 26.90% | inferred modality/version alias from gemini-2.5-flash | Yes | Source | |
| Gemini 2.5 Flash Native Audio Preview (2025-09-23) | - | 26.90% | inferred modality/version alias from gemini-2.5-flash | Yes | Source | |
| Gemini 2.5 Flash Preview Native Audio Dialog | - | 26.90% | inferred modality/version alias from gemini-2.5-flash | Yes | Source | |
| Gemini 2.5 Flash Preview TTS (2025-12-10) | 10 Dec 2025 | 26.90% | inferred modality/version alias from gemini-2.5-flash | Yes | Source | |
| DeepSeek OCR | 20 Oct 2025 | 24.90% | inferred family alias from deepseek-v3 (score=0.3000; benches=20) | Yes | Source | |
| DeepSeek V4 | - | 24.90% | inferred high-confidence family alias from deepseek-v3 (score=0.5818; benches=20) | Yes | Source | |
| DeepSeek V2 (2024-06-28) | 28 Jun 2024 | 24.90% | inferred family alias from deepseek-v3 (score=0.4159; benches=20) | Yes | Source | |
| Qwen 3 VL 30B A3B Thinking | - | 23.90% | - | Yes | - | |
| Mistral Large 3.0 | 02 Dec 2025 | 23.80% | Exact Match | Yes | Source | |
| Mistral Large 1.0 | 26 Feb 2024 | 23.80% | inferred family alias from mistral-large-latest (score=0.3650; benches=5) | Yes | Source | |
| Gemini 2.0 Flash Lite | 05 Feb 2025 | 21.70% | - | Yes | Source | |
| Grok 3 Mini Beta | 19 Feb 2025 | 21.70% | - | Yes | Source | |
| MiniMax M1 80K | 16 Jun 2025 | 18.50% | - | Yes | - | |
| MiniMax M1 40K | 16 Jun 2025 | 17.90% | - | Yes | - | |
| o3 mini | 30 Jan 2025 | 15% | - | Yes | Source | |
| Gemini 2.5 Flash Lite Preview (2025-06-17) | 17 Jun 2025 | 13% | Thinking | Yes | Source | |
| Mistral Small 3.2 | 20 Jun 2025 | 12.10% | TotalAcc | Yes | Source | |
| Gemini 2.5 Flash Lite Preview (2025-09-25) | 25 Sept 2025 | 10.70% | inferred alias from gemini-2.5-flash-lite | Yes | Source | |
| Phi 2 | - | 3% | inferred family alias from phi-4 (score=0.3100; benches=13) | Yes | Source | |
| Phi 1 | - | 3% | inferred family alias from phi-4 (score=0.3100; benches=13) | Yes | Source | |
| Phi 4 | 12 Dec 2024 | 3% | - | Yes | Source | |
| Ernie 4.5 21B A3B Thinking | - | 1.80% | inferred version-family alias from ernie-4.5 | Yes | Source | |
| Ernie 4.5 Turbo | - | 1.80% | inferred version-family alias from ernie-4.5 | Yes | Source | |
| Ernie 4.5 21B A3B | - | 1.80% | inferred version-family alias from ernie-4.5 | Yes | Source | |
| Ernie 4.5 VL 424B A47B | - | 1.80% | inferred version-family alias from ernie-4.5 | Yes | Source | |
| Ernie 4.5 300B A47B | - | 1.80% | inferred version-family alias from ernie-4.5 | Yes | Source | |
| Ernie 4.5 VL 28B A3B | - | 1.80% | inferred version-family alias from ernie-4.5 | Yes | Source |
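
Many rows above carry a score inherited from another model (the Info cell reads "inferred ... alias from ..."), so it can help to separate those from directly reported results before comparing models. The following is a minimal sketch, assuming rows keep the pipe-delimited layout shown in the table. The names `parse_row`, `ALIAS_RE`, and `SIM_RE` are hypothetical, and reading the `score=` value as an alias-match score is an assumption, since the table does not define it.

```python
import re

# Patterns taken from the Info cells in the table; helper names are illustrative.
ALIAS_RE = re.compile(r"alias from (?P<source>[\w.\-]+)")
SIM_RE = re.compile(r"score=(?P<score>[\d.]+); benches=(?P<benches>\d+)")


def parse_row(line):
    """Split one table row into a dict and flag alias-inherited scores."""
    cells = [c.strip() for c in line.strip().strip("|").split("|")]
    model, reported, top_score, info = cells[:4]
    alias = ALIAS_RE.search(info)
    sim = SIM_RE.search(info)
    return {
        "model": model,
        # "-" in the Reported column means no date is listed.
        "reported": None if reported == "-" else reported,
        "top_score": float(top_score.rstrip("%")),
        # Slug of the model the score was copied from, if any.
        "alias_of": alias.group("source") if alias else None,
        # Assumed to be an alias-match score; the table does not say.
        "match_score": float(sim.group("score")) if sim else None,
        "benches": int(sim.group("benches")) if sim else None,
    }


rows = [
    "| DeepSeek OCR 2 | - | 97.10% | inferred family alias from "
    "deepseek-v3.2-exp (score=0.3809; benches=14) | Yes | Source | |",
    "| GPT 4.5 | 27 Feb 2025 | 62.50% | - | Yes | Source | |",
]
parsed = [parse_row(r) for r in rows]
direct_only = [p["model"] for p in parsed if p["alias_of"] is None]
```

Filtering on `alias_of is None` keeps only rows whose score is listed for that exact model, which avoids counting the same underlying result several times.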