Individual benchmark scores plotted by date.
| Model | Reported | Top Score | Info | Self Reported | Source |
|---|---|---|---|---|---|
| Qwen 3 235B A22B Instruct 2507 | - | 84.30% | - | Yes | Source |
| Qwen 3 VL 235B A22B Instruct | - | 83.40% | - | Yes | - |
| Kimi K2 (2025-07-11) | 11 Jul 2025 | 77.60% | Correct | Yes | Source |
| DeepSeek OCR | 20 Oct 2025 | 64.80% | inferred family alias from deepseek-v3 (score=0.3000; benches=20) | Yes | Source |
| DeepSeek V2 (2024-06-28) | 28 Jun 2024 | 64.80% | inferred family alias from deepseek-v3 (score=0.4159; benches=20) | Yes | Source |
| DeepSeek V4 | - | 64.80% | inferred high-confidence family alias from deepseek-v3 (score=0.5818; benches=20) | Yes | Source |