Individual benchmark scores plotted by date.
| Model | Reported | Top Score | Info | Self Reported | Source |
|---|---|---|---|---|---|
| Trinity Large Thinking | 01 Apr 2026 | 88.00 | Hugging Face model card benchmark table (arcee-ai/Trinity-Large-Thinking) | Yes | Source |
| GPT 5.1 Chat | 13 Nov 2025 | 67.00 | LLM Stats (ZeroEval); inferred alias from gpt-5.1-2025-11-13 | Yes | Source |
| Claude Haiku 4.5 | 15 Oct 2025 | 63.60 | inferred alias from claude-haiku-4-5-20251001 | Yes | Source |
| Qwen 3 Next 80B A3B Thinking | - | 60.50 | LLM Stats (ZeroEval) | Yes | Source |
| Qwen 3 235B A22B Thinking 2507 | - | 58.00 | LLM Stats (ZeroEval) | Yes | Source |
| Longcat Flash Cat | - | 58.00 | inferred high-confidence family alias from longcat-flash-chat (score=0.4667; benches=16) | Yes | Source |
| Nemotron 3 Super | 11 Mar 2026 | 56.25 | LLM Stats (ZeroEval) | Yes | Source |
| Mercury 2 | 24 Feb 2026 | 53.00 | - | Yes | Source |
| Nemotron Nano 3 30B A3B | 15 Dec 2025 | 48.00 | LLM Stats (ZeroEval) | Yes | Source |
| GPT 4o (2024-08-06) | 06 Aug 2024 | 45.50 | LLM Stats (ZeroEval) | Yes | Source |
| GPT 4o Audio (2025-06-03) | 03 Jun 2025 | 45.50 | LLM Stats (ZeroEval); inferred modality/version alias from gpt-4o-2024-08-06 | Yes | Source |
| GPT 4o Audio (2024-10-01) | 01 Oct 2024 | 45.50 | LLM Stats (ZeroEval); inferred modality/version alias from gpt-4o-2024-08-06 | Yes | Source |
| GPT 4o Audio (2024-12-17) | 17 Dec 2024 | 45.50 | LLM Stats (ZeroEval); inferred modality/version alias from gpt-4o-2024-08-06 | Yes | Source |
| GPT 4o Realtime Preview (2025-06-03) | 03 Jun 2025 | 45.50 | LLM Stats (ZeroEval); inferred modality/version alias from gpt-4o-2024-08-06 | Yes | Source |
| GPT 4o Realtime Preview (2024-10-01) | 01 Oct 2024 | 45.50 | LLM Stats (ZeroEval); inferred modality/version alias from gpt-4o-2024-08-06 | Yes | Source |
| GPT 4o Search Preview | 11 Mar 2025 | 45.50 | LLM Stats (ZeroEval); inferred modality/version alias from gpt-4o-2024-08-06 | Yes | Source |
| GPT 4o Transcribe | 20 Mar 2025 | 45.50 | LLM Stats (ZeroEval); inferred modality/version alias from gpt-4o-2024-08-06 | Yes | Source |
| GPT 4o Transcribe Diarize | 15 Oct 2025 | 45.50 | LLM Stats (ZeroEval); inferred modality/version alias from gpt-4o-2024-08-06 | Yes | Source |
| Qwen 3 Next 80B A3B Instruct | - | 45.50 | LLM Stats (ZeroEval) | Yes | Source |