Nemotron 3 Super Pricing, Benchmarks, Latency & Providers

Performance

Core latency and throughput trends from recent traffic.

No gateway telemetry yet

This model hasn't processed any gateway traffic in the selected window. Live charts will appear as soon as requests arrive.

Quickstart

Start calling this model with endpoint-specific examples.

Step 1

Get an API key

Create an API key in Settings → Keys and store it as AI_STATS_API_KEY.

Keep it server-side, never commit it, and rotate it immediately if exposed.
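One way to apply the server-side rule is to read the key from the environment once at startup and fail fast when it is missing, so an empty key never reaches the client. The sketch below is a minimal illustration: `requireApiKey` and the `Env` type are hypothetical names, not part of `@ai-stats/sdk`, and the environment is passed in as a plain object so the helper stays easy to test.

```typescript
// Hypothetical helper (not part of @ai-stats/sdk): read the key from an
// environment map and fail fast when it is missing or blank.
type Env = Record<string, string | undefined>;

function requireApiKey(env: Env, name: string = "AI_STATS_API_KEY"): string {
  const value = env[name]?.trim();
  if (!value) {
    throw new Error(`${name} is not set; create a key and export it before starting the server.`);
  }
  return value;
}

// Usage on a Node.js server (assumption: process.env is available):
//   const apiKey = requireApiKey(process.env);
```

Calling the helper during startup, rather than on the first request, surfaces a missing key before any traffic is served.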

Step 2

Send the request

Choose a supported endpoint, pick a main language, then select the example style you want to copy.


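import AIStats from '@ai-stats/sdk';

// Read the key from the environment; never hard-code it.
const client = new AIStats({
  apiKey: process.env.AI_STATS_API_KEY,
});

// Send a single request to the responses route.
const response = await client.generateResponse({
  model: "nvidia/nemotron-3-super-120b-a12b",
  input: "Give me one fun fact about cURL.",
  service_tier: "standard",
});

// Pull the first output_text part out of the response, if there is one.
const outputText = response.output
  ?.flatMap((item) => item.content ?? [])
  .find((item) => item.type === "output_text")
  ?.text;

// Fall back to logging the whole response when no text part is found.
console.log(outputText ?? response);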
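
The extraction chain in the example returns only the first `output_text` part. If a response carries several text parts, a small helper can join them instead. This is a sketch under assumptions: the `ResponseLike` types are inferred from the fields the example reads (`output`, `content`, `type`, `text`) and are not the SDK's published types, and `collectOutputText` is a hypothetical name.

```typescript
// Shapes inferred from the quickstart example; not the SDK's official types.
interface ContentPart {
  type: string;
  text?: string;
}
interface OutputItem {
  content?: ContentPart[];
}
interface ResponseLike {
  output?: OutputItem[];
}

// Hypothetical helper: concatenate every output_text part in order.
function collectOutputText(response: ResponseLike): string {
  const parts: string[] = [];
  for (const item of response.output ?? []) {
    for (const part of item.content ?? []) {
      if (part.type === "output_text" && typeof part.text === "string") {
        parts.push(part.text);
      }
    }
  }
  return parts.join("");
}
```

An empty string signals that no text part was present, at which point logging the raw response, as the example does, is a reasonable fallback.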


Parameters

Aggregated across active providers for the responses route.

Routing will select a compatible provider when a parameter narrows availability, so this list stays model-facing instead of provider-facing.
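
To make the narrowing behaviour concrete, the sketch below filters a list of providers down to those that support every parameter in a request. It is an illustration of the idea only, not the gateway's actual routing logic: the provider names and their supported-parameter lists are made up, and `compatibleProviders` is a hypothetical function.

```typescript
// Illustrative only: provider names and supported-parameter lists are made up.
interface Provider {
  name: string;
  supported: string[];
}

// Keep the providers that support every parameter present in the request.
function compatibleProviders(providers: Provider[], requestParams: string[]): Provider[] {
  return providers.filter((provider) =>
    requestParams.every((param) => provider.supported.indexOf(param) !== -1)
  );
}

const providers: Provider[] = [
  { name: "provider-a", supported: ["temperature", "top_p", "stop"] },
  { name: "provider-b", supported: ["temperature", "top_p", "top_k", "stop"] },
];

// Requesting top_k narrows the candidates to the one provider that exposes it.
const candidates = compatibleProviders(providers, ["temperature", "top_k"]);
console.log(candidates.map((provider) => provider.name));
```

In this toy setup a request that sets only `temperature` could go to either provider, while adding `top_k` leaves a single candidate.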


Parameter	Description
`temperature`	Controls how random token selection can be.
`top_p`	Applies nucleus sampling by limiting candidates to a probability mass threshold.
`top_k`	Restricts sampling to the top-k candidate tokens on providers that expose it.
`stop`	Defines one or more sequences that terminate generation early.
`tool_choice`	Controls which tool, if any, the model should call.
`tools`	Defines callable tools or functions the model can invoke.
`response_format`	Requests plain text, JSON, or schema-constrained output formats.
`structured_outputs`	Capability signal for reliable schema-constrained output workflows.
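
As a rough illustration of how the sampling parameters in the table might be combined, the sketch below builds a request body for the quickstart call. The parameter names come from the table; the value types (numbers for `temperature`, `top_p`, and `top_k`, a string or string array for `stop`) and their placement at the top level of the request are assumptions based on common conventions, and `buildRequest` is a hypothetical helper.

```typescript
// Parameter names come from the table; value types are assumptions and may
// differ on this gateway.
interface SamplingOptions {
  temperature?: number;
  top_p?: number;
  top_k?: number;
  stop?: string | string[];
}

interface ResponsesRequest extends SamplingOptions {
  model: string;
  input: string;
}

// Hypothetical helper: merge optional sampling settings into a request body.
function buildRequest(input: string, options: SamplingOptions = {}): ResponsesRequest {
  return {
    model: "nvidia/nemotron-3-super-120b-a12b",
    input,
    ...options,
  };
}

const request = buildRequest("Give me one fun fact about cURL.", {
  temperature: 0.2,
  top_p: 0.9,
  stop: ["\n\n"],
});

console.log(JSON.stringify(request, null, 2));
```

If the SDK accepts these fields as assumed, the resulting object could be passed to `client.generateResponse(request)`. Leaving `top_k` unset keeps the request compatible with providers that do not expose it.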