Individual benchmark scores plotted by date.
| Organisation | Model | Reported | Top Score | Info | Self Reported | Source |
|---|---|---|---|---|---|---|
| | Claude Mythos Preview | 07 Apr 2026 | 83.10 | Vulnerability reproduction | Yes | Source |
| | GPT 5.3 Codex | 05 Feb 2026 | 77.60 | Cybersecurity Capture The Flag Challenges; xhigh reasoning | Yes | Source |
| | GLM 5.1 | - | 68.70 | - | Yes | Source |
| | Claude Opus 4.6 | 05 Feb 2026 | 66.60 | LLM Stats (ZeroEval) | Yes | Source |
| | Kimi K2.5 | 27 Jan 2026 | 41.30 | LLM Stats (ZeroEval) | Yes | Source |