Individual benchmark scores plotted by date.
| Model | Reported | Top Score | Info | Self Reported | Source |
|---|---|---|---|---|---|
| GPT 5.5 | 23 Apr 2026 | 70% | mini-swe-agent; effort xhigh; 70% +/-4%; avg cost $6.61; avg time 21m; out tok 47k | No | Source |
| Claude Opus 4.8 | 28 May 2026 | 58% | mini-swe-agent; effort max; 58% +/-5%; avg cost $12.58; avg time 43m; out tok 136k | No | Source |
| GPT 5.4 | 05 Mar 2026 | 56% | mini-swe-agent; effort xhigh; 56% +/-5%; avg cost $4.38; avg time 27m; out tok 71k | No | Source |
| Claude Opus 4.7 | 16 Apr 2026 | 54% | mini-swe-agent; effort max; 54% +/-5%; avg cost $18.19; avg time 39m; out tok 103k | No | Source |
| Claude Sonnet 4.6 | 17 Feb 2026 | 32% | mini-swe-agent; effort high; 32% +/-4%; avg cost $5.52; avg time 42m; out tok 76k | No | Source |
| Claude Opus 4.6 | 05 Feb 2026 | 28% | mini-swe-agent; effort max; 28% +/-4%; avg cost $5.39; avg time 30m; out tok 44k | No | Source |
| Gemini 3.5 Flash | 19 May 2026 | 28% | mini-swe-agent; effort medium; 28% +/-4%; avg cost $7.42; avg time 17m; out tok 189k | No | Source |
| Kimi K2.6 | 20 Apr 2026 | 24% | mini-swe-agent; 24% +/-4%; avg cost $3.16; avg time 56m; out tok 84k | No | Source |
| GPT 5.4 Mini | 17 Mar 2026 | 24% | mini-swe-agent; effort xhigh; 24% +/-4%; avg cost $2.08; avg time 33m; out tok 135k | No | Source |
| MiMo V2.5 Pro | 22 Apr 2026 | 19% | mini-swe-agent; 19% +/-4%; avg cost $1.99; avg time 28m; out tok 49k | No | Source |
| GLM 5.1 | 07 Apr 2026 | 18% | mini-swe-agent; 18% +/-4%; avg cost $7.46; avg time 35m; out tok 49k | No | Source |
| Grok Build 0.1 | 15 May 2026 | 13% | mini-swe-agent; 13% +/-3%; avg cost $6.60; avg time 44m; out tok 52k | No | Source |
| Gemini 3.1 Pro Preview | 19 Feb 2026 | 10% | DeepSWE label: gemini-3.1-pro; mini-swe-agent; 10% +/-3%; avg cost $1.84; avg time 36m; out tok 53k | No | Source |
| DeepSeek V4 Pro | 24 Apr 2026 | 8% | mini-swe-agent; 8% +/-2%; avg cost $4.22; avg time 37m; out tok 50k | No | Source |
| Gemini 3 Flash Preview | 17 Dec 2025 | 5% | DeepSWE label: gemini-3-flash; mini-swe-agent; 5% +/-2%; avg cost $1.53; avg time 39m; out tok 233k | No | Source |
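
The Info column packs several values into one semicolon-separated string, which makes rows hard to compare directly. The sketch below shows one possible way to split such a string into named fields and to check whether two reported score ranges overlap. The field names (`agent`, `effort`, `score`, `margin`, and so on) and both helper functions are hypothetical, introduced only for illustration. It also assumes that "+/-" denotes a symmetric margin around the score; the table does not say what kind of interval this is.

```python
import re

# Info strings copied verbatim from three rows of the table.
ROWS = {
    "GPT 5.5": "mini-swe-agent; effort xhigh; 70% +/-4%; avg cost $6.61; avg time 21m; out tok 47k",
    "Claude Opus 4.8": "mini-swe-agent; effort max; 58% +/-5%; avg cost $12.58; avg time 43m; out tok 136k",
    "GPT 5.4": "mini-swe-agent; effort xhigh; 56% +/-5%; avg cost $4.38; avg time 27m; out tok 71k",
}


def parse_info(info):
    """Split one Info cell into named fields (field names are illustrative)."""
    fields = {}
    for part in (p.strip() for p in info.split(";")):
        if m := re.fullmatch(r"(\d+)% \+/-(\d+)%", part):
            fields["score"], fields["margin"] = int(m[1]), int(m[2])
        elif part.startswith("effort "):
            fields["effort"] = part[len("effort "):]
        elif part.startswith("DeepSWE label: "):
            fields["label"] = part[len("DeepSWE label: "):]
        elif m := re.fullmatch(r"avg cost \$(\d+(?:\.\d+)?)", part):
            fields["avg_cost_usd"] = float(m[1])
        elif m := re.fullmatch(r"avg time (\d+)m", part):
            fields["avg_time_min"] = int(m[1])
        elif m := re.fullmatch(r"out tok (\d+)k", part):
            fields["out_tokens_k"] = int(m[1])
        else:
            # Assumption: any remaining part names the agent scaffold.
            fields["agent"] = part
    return fields


def ranges_overlap(a, b):
    """True if score +/- margin of row a intersects that of row b."""
    lo_a, hi_a = a["score"] - a["margin"], a["score"] + a["margin"]
    lo_b, hi_b = b["score"] - b["margin"], b["score"] + b["margin"]
    return lo_a <= hi_b and lo_b <= hi_a


parsed = {model: parse_info(info) for model, info in ROWS.items()}
```

Under these assumptions, `ranges_overlap(parsed["Claude Opus 4.8"], parsed["GPT 5.4"])` is true (53 to 63 against 51 to 61), so the two-point gap between those rows sits inside the reported margins, while the range for GPT 5.5 (66 to 74) does not overlap with either.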