How AI Stats measures latency and throughput

What we are measuring

Latency represents the elapsed request time recorded by the gateway for a routed completion, subject to the instrumentation available for that request type.

Throughput represents the output rate observed for successful requests, typically normalized to tokens per second for text-generation style workloads.
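As a rough illustration of the normalization described above, per-request throughput can be sketched as output tokens divided by elapsed seconds. The function name, its parameters, and the choice of elapsed interval are assumptions made for this example, not the gateway's actual schema or formula.

```python
from typing import Optional


def tokens_per_second(output_tokens: int, elapsed_ms: float) -> Optional[float]:
    """Return output tokens per second, or None when the rate is undefined.

    Hypothetical helper: both parameter names are illustrative.
    """
    # A rate is only meaningful for a positive token count and a positive duration.
    if output_tokens <= 0 or elapsed_ms <= 0:
        return None
    return output_tokens / (elapsed_ms / 1000.0)
```

For instance, a request that produced 500 output tokens in 4,000 ms would come out at 125 tokens per second under this sketch.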

Aggregation windows

Public performance views use rolling windows such as the last 24 hours for detailed performance and longer periods for leaderboard and trend summaries.
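A rolling window of this kind can be sketched as a simple timestamp check. The helper name and the half-open boundary convention are assumptions for illustration; the 24-hour default mirrors the example window mentioned above.

```python
from datetime import datetime, timedelta, timezone


def in_rolling_window(
    recorded_at: datetime,
    now: datetime,
    window: timedelta = timedelta(hours=24),
) -> bool:
    """Return True if recorded_at falls within the window ending at now.

    Hypothetical helper: the interval is (now - window, now].
    """
    return now - window < recorded_at <= now


# Illustrative timestamps only.
now = datetime(2024, 1, 2, 12, 0, tzinfo=timezone.utc)
recent = now - timedelta(hours=3)
stale = now - timedelta(hours=30)
```

Here `in_rolling_window(recent, now)` holds while `in_rolling_window(stale, now)` does not, so only the recent record would feed the 24-hour view.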

Median values are preferred over averages for public displays because a small number of slow outliers can distort means and produce misleading rankings.
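The effect of outliers is easy to see with a small sample. The latency values below are made up for illustration and are not AI Stats data.

```python
from statistics import mean, median

# Five typical requests and one slow outlier (illustrative values, in milliseconds).
latencies_ms = [420, 450, 470, 480, 510, 9000]

mean_ms = mean(latencies_ms)      # pulled far upward by the single 9000 ms request
median_ms = median(latencies_ms)  # midpoint of the two middle values, 470 and 480

print(f"mean:   {mean_ms:.0f} ms")
print(f"median: {median_ms:.0f} ms")
```

The median stays at 475 ms, close to what most requests experienced, while the mean rises to roughly 1,888 ms because of one slow request.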

Filtering and eligibility

AI Stats excludes obviously invalid records, unknown identifiers, and rows without enough usable request volume to support a meaningful comparison.

Performance pages and leaderboards rank a row only if its throughput or latency value is finite and positive, and only once the relevant volume thresholds are met.
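The eligibility rule above can be sketched as a small predicate. The threshold value and all names here are hypothetical; the actual thresholds used by AI Stats are not stated in this document.

```python
import math
from typing import Optional

# Hypothetical minimum request volume, chosen only for this example.
MIN_REQUESTS = 50


def is_rankable(
    value: Optional[float],
    request_count: int,
    min_requests: int = MIN_REQUESTS,
) -> bool:
    """Return True if a metric value is finite, positive, and backed by enough volume."""
    if value is None:
        return False
    # math.isfinite rejects NaN and both infinities.
    return math.isfinite(value) and value > 0 and request_count >= min_requests
```

Under this sketch, a row with a throughput of 125.0 over 200 requests would be ranked, while a row with a NaN value, a zero value, or only 3 requests would be left out.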

Why values can move

Latency and throughput are operational measurements, not intrinsic constants of a model. They can move because of provider routing, regional load, model updates, queueing behavior, prompt length, output length, and transport conditions.

A model may therefore rank differently across providers, across time windows, or between gateway measurements and the provider's own direct benchmarks.

Caveats

Public performance charts are designed for directional comparison. They are not a substitute for your own workload-specific benchmarking under your own prompt mix, concurrency, and latency budget.

If a page has insufficient current data, AI Stats may show an empty state and treat that route as a weak search candidate until enough public volume exists.

