Capabilities, modalities, and lifecycle fields pulled from the model database.
Comparative results across benchmarks shared by the selected models.
| Benchmark | GPT 5 |
|---|---|
| BrowseComp | 54.9% |
| Graphwalks parents <128k | 73.3% |
| MMLU Pro | 54.9% |
| HealthBench | 54.3% |
| HealthBench Hard | 25.5% |
| Humanity's Last Exam | 6.3% |
| LongFact-Concepts hallucination rate | 1.0% |
| ERQA | 42.0% |
| FActScore hallucination rate | 2.8% |
| Tau 2 Airline | 55.0% |
| HMMT 2025 | 93.3% |
| Confabulations | 1034.0% |
| BrowseComp Long Context 128k | 90.0% |
| Creative Story Writing | 860.0% |
| Frontier Math | 13.5% |
| Tau 2 Telecom | 38.6% |
| VideoMME | 86.7% |
| Video MMMU | 61.6% |
| ARC-AGI-1 | 6.0% |
| ARC-AGI-2 | 0.0% |
| CharXiv-Reasoning | 57.8% |
| COLLIE | 70.5% |
| BrowseComp Long Context 256k | 88.8% |
| OpenAI-MRCR: 2 needle 128k | 95.2% |
| OpenAI-MRCR: 2 needle 256k | 86.8% |
| GPQA Diamond | 77.8% |
| Graphwalks bfs <128k | 78.3% |
| MMMU | 74.4% |
| MMMU Pro | 62.7% |
| SWE-Bench | 52.8% |
| Tau 2 Retail | 72.8% |
| MathArena Apex | 1.0% |
| SimpleBench | 56.7% |
| Aider-Polyglot | 26.7% |
| AIME 2025 | 61.9% |
| LongFact-Objects hallucination rate | 1.2% |
Observed provider pricing per million tokens.
All unique meters observed across the selected models.
| Meter | GPT 5 |
|---|---|
| Input Text Tokens | $1.25 |
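
As a rough illustration of how a per-million-token price turns into a request cost, the sketch below uses the $1.25 input-text rate from the table above. The function name `estimate_input_cost` and the token counts are hypothetical, chosen only for this example. It covers input text tokens only, since no other meter is listed here.

```python
from decimal import Decimal

# Observed input-text price from the table above, in USD per million tokens.
INPUT_PRICE_PER_MILLION = Decimal("1.25")


def estimate_input_cost(input_tokens: int,
                        price_per_million: Decimal = INPUT_PRICE_PER_MILLION) -> Decimal:
    """Return the estimated USD cost for a given number of input text tokens.

    Hypothetical helper: it scales the per-million price linearly and
    ignores any other meters (output tokens, caching, and so on).
    """
    if input_tokens < 0:
        raise ValueError("input_tokens must be non-negative")
    return price_per_million * Decimal(input_tokens) / Decimal(1_000_000)


if __name__ == "__main__":
    # Example token counts are illustrative, not taken from the source.
    for tokens in (1_000, 200_000, 1_000_000):
        print(f"{tokens:>9} input tokens: ${estimate_input_cost(tokens)}")
```

`Decimal` is used instead of floats so that small per-request amounts do not pick up binary rounding noise when they are summed across many calls.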
Providers that expose each model based on observed pricing data.
Maximum input and output token capacity.
Usage and distribution terms.
Model release chronology.
Most recent training data date (when available).
A deeper field-by-field view (including benchmarks, pricing, and links).