Google: Gemma 4 31B

Gemma 4 31B is an instruction-tuned Mixture-of-Experts (MoE) model from Google DeepMind. It supports turning still images into animated clips as well as prompt-driven video workflows.

Providers

API providers, route pricing, availability, and recent reliability signals.


Provider	Input (per 1M tokens)	Output (per 1M tokens)	Cache reads (per 1M tokens)	Quantization	Context length	Max output	Parameter metadata
Google AI Studio	$0	$0	--	--	--	--	Listed
CrofAI	$0.1	$0.3	$0.02	Q4_0	--	--	Not published
SiliconFlow	$0.13	$0.4	--	FP8	--	--	Not published
AkashML	$0.14	$0.4	--	--	--	--	Not published
GMICloud	$0.14	$0.4	--	FP8	--	--	Not published
NovitaAI	$0.14	$0.4	--	BF16	--	--	Listed
Venice	$0.17	$0.5	--	BF16	--	--	Not published
Together	$0.2	$0.5	--	FP8	--	--	Not published
Weights & Biases	$0.3	$1.25	--	--	--	--	Not published
Cerebras	$0.99	$1.49	--	--	--	--	Not published
DigitalOcean	$0.18	$0.5	--	--	--	--	Not published
Friendli	$0.14	$0.4	--	--	--	--	Not published
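Because every route is priced per 1M tokens, the cost of a single request follows from its token counts. The sketch below is only an illustration: the prices are copied from the provider list above, while the function name `estimateCostUsd`, the token counts, and the rule that cached tokens fall back to the input rate on routes without a cache-read price are assumptions, not documented gateway behavior.

```javascript
// Prices in USD per 1M tokens, copied from the provider list above.
const routes = {
  CrofAI: { input: 0.1, output: 0.3, cacheRead: 0.02 },
  SiliconFlow: { input: 0.13, output: 0.4 },
  Cerebras: { input: 0.99, output: 1.49 },
};

// Estimate the cost of one request in USD (hypothetical helper).
// cachedTokens is the part of the prompt billed at the cache-read rate.
// Assumption: routes without a cache-read price bill those tokens at the input rate.
function estimateCostUsd(price, { inputTokens, outputTokens, cachedTokens = 0 }) {
  const cacheRate = price.cacheRead ?? price.input;
  const freshTokens = inputTokens - cachedTokens;
  const total =
    freshTokens * price.input +
    cachedTokens * cacheRate +
    outputTokens * price.output;
  return total / 1_000_000;
}

// Hypothetical request: 20,000 prompt tokens (5,000 of them cached), 1,000 output tokens.
const usage = { inputTokens: 20_000, outputTokens: 1_000, cachedTokens: 5_000 };

for (const [name, price] of Object.entries(routes)) {
  console.log(name, estimateCostUsd(price, usage).toFixed(6));
}
```

Actual billing may differ from this estimate, for example if a provider rounds token counts or applies minimum charges.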


Google AI Studio	--	--	100.0%	237
AkashML	$0.14	$0.4	--	0
Cerebras	$0.99	$1.49	--	0
CrofAI	$0.1	$0.3	--	0
DigitalOcean	$0.18	$0.5	--	0
Friendli	$0.14	$0.4	--	0
GMICloud	$0.14	$0.4	--	0
NovitaAI	$0.14	$0.4	--	0
SiliconFlow	$0.13	$0.4	--	0
Together	$0.2	$0.5	--	0
Venice	$0.17	$0.5	--	0
Weights & Biases	$0.3	$1.25	--	0

import AIStats from '@ai-stats/sdk';

// Read the API key from the environment rather than hard-coding it.
const client = new AIStats({
  apiKey: process.env.AI_STATS_API_KEY,
});

const response = await client.generateResponse({
  model: "google/gemma-4-31b",
  input: "Give me one fun fact about cURL.",
});

// Take the first output_text part; fall back to logging the raw response.
const outputText = response.output
  ?.flatMap((item) => item.content ?? [])
  .find((item) => item.type === "output_text")
  ?.text;

console.log(outputText ?? response);
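The text-extraction step in the quickstart can be isolated into a small helper and exercised without a network call. This is a minimal sketch: the helper name `extractOutputText` and the mock object are hypothetical, and the response shape (`output[].content[].type` and `.text`) is inferred from the snippet itself, not from SDK documentation.

```javascript
// Return the first output_text string from a response, or undefined if there is none.
// The response shape is inferred from the quickstart snippet (assumption).
function extractOutputText(response) {
  return response.output
    ?.flatMap((item) => item.content ?? [])
    .find((part) => part.type === "output_text")
    ?.text;
}

// Hypothetical response object, used only to exercise the helper.
const mockResponse = {
  output: [
    { type: "reasoning" }, // an item without content is skipped
    { type: "message", content: [{ type: "output_text", text: "Hello" }] },
  ],
};

console.log(extractOutputText(mockResponse)); // "Hello"
console.log(extractOutputText({}));           // undefined
```

Returning `undefined` instead of throwing lets the caller keep the `outputText ?? response` fallback used in the quickstart.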

Parameter	Description
`temperature`	Controls how random token selection can be.
`top_p`	Applies nucleus sampling by limiting candidates to a probability mass threshold.
`top_k`	Restricts sampling to the top-k candidate tokens on providers that expose it.
`max_tokens`	Caps output length on endpoints and providers that use the max_tokens field name.
`frequency_penalty`	Discourages repeated tokens in proportion to how often they already appeared.
`presence_penalty`	Penalizes tokens that have already appeared at least once, which encourages new wording or topics.
`seed`	Requests deterministic sampling when the upstream provider supports seeded generation.
`stop`	Defines one or more sequences that terminate generation early.
`logprobs`	Requests token-level probability data in the response.
`structured_outputs`	Capability signal for reliable schema-constrained output workflows.
`reasoning`	Provider-specific reasoning configuration for reasoning-capable APIs.
`logit_bias`	Adjusts token selection bias directly when a provider exposes logit control.
`top_logprobs`	Limits how many alternative token probabilities are returned per position.
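To make the sampling parameters above concrete, the sketch below applies `temperature`, `top_k`, `top_p`, and the two penalties to a toy list of logits. It follows the commonly described formulation of these controls; the function names are hypothetical, and individual providers may implement or combine the steps differently.

```javascript
// Softmax over logits divided by the temperature (lower temperature = more peaked).
function softmax(logits, temperature = 1) {
  const scaled = logits.map((l) => l / temperature);
  const max = Math.max(...scaled); // subtract the max for numerical stability
  const exps = scaled.map((l) => Math.exp(l - max));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / sum);
}

// Subtract repetition penalties; counts[i] is how often token i has already appeared.
// frequencyPenalty scales with the count, presencePenalty applies once if count > 0.
function applyPenalties(logits, counts, { frequencyPenalty = 0, presencePenalty = 0 } = {}) {
  return logits.map(
    (l, i) => l - counts[i] * frequencyPenalty - (counts[i] > 0 ? presencePenalty : 0)
  );
}

// Return the token ids that survive top-k and top-p filtering, most likely first.
function candidateTokens(logits, { temperature = 1, topK = Infinity, topP = 1 } = {}) {
  const ranked = softmax(logits, temperature)
    .map((p, id) => ({ id, p }))
    .sort((a, b) => b.p - a.p)
    .slice(0, topK); // top_k: keep only the k most likely tokens
  const kept = [];
  let mass = 0;
  for (const token of ranked) {
    kept.push(token.id);
    mass += token.p;
    if (mass >= topP) break; // top_p: smallest prefix whose mass reaches the threshold
  }
  return kept;
}

const logits = [2, 1, 0, -1]; // toy logits for a four-token vocabulary
console.log(candidateTokens(logits, { topP: 0.8 }));
console.log(candidateTokens(logits, { temperature: 0.5, topP: 0.8 }));
console.log(candidateTokens(
  applyPenalties(logits, [3, 0, 1, 0], { frequencyPenalty: 0.5, presencePenalty: 1 }),
  { topK: 1 }
));
```

A real sampler would then draw one token at random from the surviving candidates after renormalizing their probabilities; that step is omitted here so the example stays deterministic.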
