Capabilities, modalities, and lifecycle fields pulled from the model database.
Comparative results across benchmarks shared by the selected models.
| Benchmark | GPT OSS 120b |
|---|---|
| GPQA Diamond | 67.1% |
| HealthBench | 53.0% |
| HealthBench Consensus | 89.9% |
| AIME 2024 | 56.3% |
| Tau Bench (Airline) | 42.6% |
| Codeforces (Elo) | 1595 |
| Confabulations | 15.65% |
| Humanity's Last Exam | 5.2% |
| MMLU | 85.9% |
| AIME 2025 | 50.4% |
| MMMLU | 74.1% |
| SWE-Bench | 47.9% |
| Tau Bench (Retail) | 49.4% |
| HealthBench Hard | 22.8% |
| Aider-Polyglot | 24.0% |
| EQ-Bench 3 (Elo) | 1152.1 |
| MathArena Apex | 1.0% |
Observed provider pricing per million tokens.
All unique meters observed across the selected models.
| Meter | GPT OSS 120b |
|---|---|
| Input Text Tokens | $0.05 |
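To make the per-million-token rates concrete, here is a minimal sketch of how a request's cost works out under the listed prices ($0.05 per 1M input tokens, $0.25 per 1M output tokens). The token counts in the example are hypothetical, not taken from the database.

```python
# Per-token rates derived from the listed per-million-token prices.
INPUT_RATE = 0.05 / 1_000_000   # USD per input token
OUTPUT_RATE = 0.25 / 1_000_000  # USD per output token

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of a single request at the listed rates."""
    return input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE

# Hypothetical example: a 2,000-token prompt with a 500-token completion.
print(f"${request_cost(2_000, 500):.6f}")  # $0.000225
```

Output tokens dominate the bill here: at a 5x higher rate, a completion a quarter the length of the prompt still costs more than the prompt itself.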
Providers that expose each model, based on observed pricing data.
Maximum input and output token capacity.
Usage and distribution terms.
Model release chronology.
Most recent training data date (when available).
A deeper field-by-field view (including benchmarks, pricing, and links).
| Field | Value |
|---|---|
| General Information | |
| Context Window | Input: 131,072 Output: 131,072 |
| Modalities | In: Text Out: Text |
| Reasoning | - |
| Web access | - |
| Parameters | 20.9B |
| Training Tokens | - |
| License | Apache 2.0 |
| Knowledge Cutoff | Jun 2024 |
| Status | Available |
| Release | Aug 2025 |
| Announced | Aug 2025 |
| Deprecation | - |
| Retirement | - |
| Links | |
| Operational Metrics | |
| Cost per 1M Tokens | Input: $0.05 Output: $0.25 |
| Latency | - |
| Throughput | - |
| Benchmarks | |
| AIME 2024 | 56.3% |
| AIME 2025 | 50.4% |
| Aider-Polyglot | 24.0% |
| Codeforces (Elo) | 1595 |
| Confabulations | 15.65% |
| EQ-Bench 3 (Elo) | 1152.1 |
| GPQA Diamond | 67.1% |
| HealthBench | 53.0% |
| HealthBench Consensus | 89.9% |
| HealthBench Hard | 22.8% |
| Humanity's Last Exam | 5.2% |
| MMLU | 85.9% |
| MMMLU | 74.1% |
| MathArena Apex | 1.0% |
| SWE-Bench | 47.9% |
| Tau Bench (Airline) | 42.6% |
| Tau Bench (Retail) | 49.4% |