Now live — DeepSeek R1-0528 reasoning · UltraFast LPU inference available in all plans Explore → ×

All 33 models.
One API.

Groq UltraFast (7) on LPU silicon, plus the full Nebius Token Factory catalog (26) on H100 — reasoning, vision, embeddings, and open-weight chat — all through a single OpenAI-compatible interface.

33
Models in catalog

1,800
Max TPS

262K
Max context






Catalog

White documentation cards for every model. UltraFast models on LPU silicon carry the orange accent. Groq UltraFast (7) + Nebius Token Factory catalog (26), all with 20% standard markup on the base price.

33 models · sorted by category, then release date

Groq · OpenAI

UltraFast

GPT OSS 20B 128k

OpenAI’s open-weight 20B MoE (3.6B active, 32 experts) on Groq LPU silicon. 1,800 TPS — MXFP4 quantized, adjustable CoT effort, native tool use.

TPS
1,800
Context
128K
Input
$0.09/M
Output
$0.36/M
Live · 99.99% SLA
Try it

Groq · OpenAI

UltraFast

GPT OSS Safeguard 20B

Safety-tuned sibling of GPT OSS 20B on Groq LPU. 1,800 TPS — content classification, jailbreak detection, and policy enforcement at sub-second latency.

TPS
1,800
Context
128K
Input
$0.09/M
Output
$0.36/M
Live · 99.99% SLA
Try it

Groq · OpenAI

UltraFast

GPT OSS 120B 128k

OpenAI’s flagship 120B open-weight MoE (5.1B active, 128 experts) on Groq LPU. 1,000 TPS — near-frontier reasoning with adjustable CoT effort, Apache 2.0.

TPS
1,000
Context
128K
Input
$0.18/M
Output
$0.72/M
Live · 99.99% SLA
Try it

Groq · Meta

UltraFast

Llama 4 Scout (17Bx16E) 128k

Meta’s MoE Llama 4 Scout (17B×16E experts) on Groq LPU. 1,200 TPS — 128K context, tuned for low-latency multi-turn chat and native function calling.

TPS
1,200
Context
128K
Input
$0.13/M
Output
$0.41/M
Live · 99.99% SLA
Try it

Groq · Alibaba

UltraFast

Qwen3 32B 131k

Alibaba’s Qwen3 32B dense (hybrid thinking mode) on Groq LPU. 1,300 TPS — 131K context, 100+ languages, strong tool calling.

TPS
1,300
Context
131K
Input
$0.35/M
Output
$0.71/M
Live · 99.99% SLA
Try it

Groq · Meta

UltraFast

Llama 3.3 70B Versatile 128k

Meta’s 70B production workhorse on Groq LPU. 1,100 TPS — high-throughput multi-turn chat and tool-calling agents at sub-second latency.

TPS
1,100
Context
128K
Input
$0.71/M
Output
$0.95/M
Live · 99.99% SLA
Try it

Groq · Meta

UltraFast

Llama 3.1 8B Instant 128k

Meta’s smallest 8B on Groq LPU. 1,800 TPS — ultra-cheap inference for high-volume chat, classification, and routing workloads.

TPS
1,800
Context
128K
Input
$0.06/M
Output
$0.10/M
Live · 99.99% SLA
Try it

NVIDIA · Nebius

REASONING

Nemotron-3-Ultra-550B-a55b

A 550B hybrid MoE (55B active) from NVIDIA on Nebius. Vision-capable, optimized for demanding multi-agent AI and complex reasoning.

TPS
59
Context
128K
Input
$1.20/M
Output
$3.60/M
Live · 99.9% SLA
Try it

NVIDIA · Nebius

VISION

Cosmos3-Super-Reasoner

NVIDIA’s 35B vision-reasoning model on Nebius — optimized for complex video/image understanding, agentic AI tasks, and high-throughput inference.

TPS
30
Context
128K
Input
$0.12/M
Output
$0.36/M
Live · 99.9% SLA
Try it

OpenBMB · Nebius

VISION

openbmb/MiniCPM-V-4.5

OpenBMB MiniCPM-V 4.5 — compact multimodal model for image, multi-image, high-FPS/single-video, OCR/PDf understanding with strong multilingual coverage.

TPS
49.5
Context
32K
Input
$0.07/M
Output
$0.13/M
Live · 99.9% SLA
Try it

Moonshot AI · Nebius

REASONING

Kimi-K2.6

Kimi K2.6 is an open-source, native multimodal agentic model built through continual pretraining on approximately 15 trillion mixed vision-text tokens.

TPS
60
Context
256K
Input
$1.14/M
Output
$4.80/M
Live · 99.9% SLA
Try it

DeepSeek · Nebius

REASONING

DeepSeek-V4-Pro

DeepSeek-V4 is designed for advanced reasoning, coding, and long-horizon agent workflows, with strong performance across the Nebius H100 cluster.

TPS
24
Context
128K
Input
$2.10/M
Output
$4.20/M
Live · 99.9% SLA
Try it

NVIDIA · Nebius

OPEN

Nemotron-3-Nano-Omni

The most open, efficient, and accurate error-modal reasoning model for agentic AI — compact 30B MoE, runs at the fastest TPS in the Nemotron lineup.

TPS
90
Context
256K
Input
$0.07/M
Output
$0.29/M
Live · 99.9% SLA
Try it

Z.ai · Nebius

REASONING

GLM-5.1

Zhipu AI’s latest flagship multimodal model with strong bilingual (Chinese–English) reasoning, long-context understanding, advanced agentic tool use.

TPS
25
Context
200K
Input
$1.68/M
Output
$5.28/M
Live · 99.9% SLA
Try it

MiniMax · Nebius

OPEN

MiniMax-M2.5

Open-source agentic coding model built for polyglot development and precise refactoring, using retrieval-thinking tools to ground outputs.

TPS
36.8
Context
192K
Input
$0.36/M
Output
$1.44/M
Live · 99.9% SLA
Try it

NVIDIA · Nebius

OPEN

Nemotron-3-Super-120b-a12b

Nemotron 3 Super is a 120B hybrid MoE model optimized for efficient multi-agent AI and complex reasoning tasks on the Nebius H100 cluster.

TPS
127
Context
128K
Input
$0.36/M
Output
$1.08/M
Live · 99.9% SLA
Try it

Alibaba · Nebius

REASONING

Qwen3.5-397B-A17B

Multimodal MoE featuring a Hybrid Mixture-of-Experts architecture, designed for state-of-the-art performance across chat, retrieval, reasoning, and tool use.

TPS
80
Context
262K
Input
$0.72/M
Output
$4.32/M
Live · 99.9% SLA
Try it

Z.ai · Nebius

REASONING

GLM-5

Zhipu AI’s latest flagship multimodal model with strong bilingual (Chinese–English) reasoning, long-context understanding, advanced agentic tool use.

TPS
47
Context
200K
Input
$1.20/M
Output
$3.84/M
Live · 99.9% SLA
Try it

DeepSeek · Nebius

OPEN

DeepSeek-V3.2

A model designed to harmonize high compute efficiency with strong reasoning and agentic tool-use performance, served on Nebius H100.

TPS
71
Context
128K
Input
$0.36/M
Output
$0.54/M
Live · 99.9% SLA
Try it

NousResearch · Nebius

REASONING

Hermes-4-405B

Hybrid reasoning model trained on verified CoT traces for strong math, coding, and step-by-step reliability. Frontier-tier open-weight.

TPS
20
Context
128K
Input
$1.20/M
Output
$3.60/M
Live · 99.9% SLA
Try it

NousResearch · Nebius

REASONING

Hermes-4-70B

Compact version of Hermes-4 delivering high-quality reasoning, coding, and tool use with lower inference cost than its 405B sibling.

TPS
20
Context
128K
Input
$0.16/M
Output
$0.48/M
Live · 99.9% SLA
Try it

OpenAI · Nebius

OPEN

gpt-oss-120b

Open-weight agentic model with configurable reasoning, full CoT visibility, strong tool use, and free-deployment support on Nebius.

TPS
40
Context
128K
Input
$0.18/M
Output
$0.72/M
Live · 99.9% SLA
Try it

Prime Intellect · Nebius

REASONING

INTELLECT-3

A 100B-plus parameter Mixture-of-Experts model from Prime Intellect, fine-tuned with large-scale RL to deliver top-tier math, code, science, and reasoning.

TPS
35
Context
128K
Input
$0.24/M
Output
$1.20/M
Live · 99.9% SLA
Try it

Alibaba · Nebius

OPEN

Qwen3-235B-A22B-Instruct-2507

Balanced Qwen3 flagship tuned for strong general reasoning, chat quality, and tool use at mid-size active cost.

TPS
27
Context
262K
Input
$0.24/M
Output
$0.72/M
Live · 99.9% SLA
Try it

Alibaba · Nebius

OPEN

Qwen3-30B-A3B-Instruct-2507

Versatile 30B instruct model optimized for high-quality chat, reasoning, and coding at low cost.

TPS
70
Context
32K
Input
$0.12/M
Output
$0.36/M
Live · 99.9% SLA
Try it

Alibaba · Nebius

EMBED

Qwen3-Embedding-8B

Qwen embedding model optimized for high-precision dense retrieval with multilingual coverage (100+ languages).

Dim
4096
Lang
100+
Input
$0.01/M
Output
Live · 99.9% SLA
Try it

Alibaba · Nebius

REASONING

Qwen3-Next-80B-A3B-Thinking

Qwen’s “thinking-optimized” 80B model designed for sustained multi-step reasoning, structured deliberation, and high-precision multi-domain reasoning.

TPS
85
Context
262K
Input
$0.18/M
Output
$1.44/M
Live · 99.9% SLA
Try it

Alibaba · Nebius

OPEN

Qwen3-32B

General model offering strong multilingual reasoning, coding, and long-context performance at mid scale.

TPS
23
Context
32K
Input
$0.12/M
Output
$0.36/M
Live · 99.9% SLA
Try it

Google · Nebius

OPEN

Gemma-3-27b-it

Google’s mid-size model optimized for high-quality instruction following, coding, and multilingual performance.

TPS
20
Context
32K
Input
$0.12/M
Output
$0.36/M
Live · 99.9% SLA
Try it

NVIDIA · Nebius

REASONING

Llama-3.1-Nemotron-Ultra-253B-v1

NVIDIA-tuned Llama variant built for high-efficiency reasoning, safety, and enterprise-grade performance.

TPS
25
Context
128K
Input
$0.72/M
Output
$2.16/M
Live · 99.9% SLA
Try it

NVIDIA · Nebius

OPEN

Nemotron-3-Nano-30B-A3B

Compact MoE model optimized for efficient reasoning, chat, and coding with strong multilingual support and long-context RAG/agent workflows.

TPS
60
Context
256K
Input
$0.07/M
Output
$0.29/M
Live · 99.9% SLA
Try it

Alibaba · Nebius

VISION

Qwen2.5-VL-72B-Instruct

High-end multimodal model delivering strong vision-language reasoning with long-context support.

TPS
20
Context
128K
Input
$0.30/M
Output
$0.90/M
Live · 99.9% SLA
Try it

Meta · Nebius

OPEN

Llama-3.3-70B-Instruct

Refined Llama instruct model with strong reasoning, chat quality, and broad benchmark performance.

TPS
25
Context
128K
Input
$0.16/M
Output
$0.48/M
Live · 99.9% SLA
Try it

/ Compare

Flagship models, side by side.

The numbers that matter: TPS, context, and price. Find the right model for your workload.

Model Type Platform Context Max TPS Input / M Output / M
GPT OSS 20B 128k
UltraFast · MoE Groq · OpenAI 128K 1,800 $0.09 $0.36
GPT OSS Safeguard 20B
UltraFast · MoE Groq · OpenAI 128K 1,800 $0.09 $0.36
GPT OSS 120B 128k
UltraFast · MoE Groq · OpenAI 128K 1,000 $0.18 $0.72
Llama 4 Scout (17Bx16E) 128k
UltraFast · MoE Groq · Meta 128K 1,200 $0.13 $0.41
Qwen3 32B 131k
UltraFast Groq · Alibaba 131K 1,300 $0.35 $0.71
Llama 3.3 70B Versatile 128k
UltraFast Groq · Meta 128K 1,100 $0.71 $0.95
Llama 3.1 8B Instant 128k
UltraFast Groq · Meta 128K 1,800 $0.06 $0.10
Nemotron-3-Ultra-550B-a55b
Reasoning · MoE 550B (55B active) NVIDIA · Nebius 128K 59 $1.20 $3.60
Cosmos3-Super-Reasoner
Reasoning · Vision NVIDIA · Nebius 128K 30 $0.12 $0.36
openbmb/MiniCPM-V-4.5
Reasoning · Vision OpenBMB · Nebius 32K 49.5 $0.07 $0.13
Kimi-K2.6
Reasoning · multimodal agentic Moonshot AI · Nebius 256K 60 $1.14 $4.80
DeepSeek-V4-Pro
Reasoning · long-horizon agent DeepSeek · Nebius 128K 24 $2.10 $4.20
GLM-5.1
Reasoning · multimodal Z.ai · Nebius 200K 25 $1.68 $5.28
Qwen3.5-397B-A17B
Reasoning · MoE 397B (17B active) Alibaba · Nebius 262K 80 $0.72 $4.32
GLM-5
Reasoning · multimodal Z.ai · Nebius 200K 47 $1.20 $3.84
Hermes-4-405B
Reasoning · verified-CoT NousResearch · Nebius 128K 20 $1.20 $3.60
Hermes-4-70B
Reasoning NousResearch · Nebius 128K 20 $0.16 $0.48
INTELLECT-3
Reasoning · MoE 100B+ RL-tuned Prime Intellect · Nebius 128K 35 $0.24 $1.20
Qwen3-Next-80B-A3B-Thinking
Reasoning · thinking-tuned MoE Alibaba · Nebius 262K 85 $0.18 $1.44
Llama-3.1-Nemotron-Ultra-253B-v1
Reasoning · enterprise NVIDIA · Nebius 128K 25 $0.72 $2.16
Nemotron-3-Nano-Omni
Open · error-modal reasoning NVIDIA · Nebius 256K 90 $0.07 $0.29
MiniMax-M2.5
Open · agentic coding MiniMax · Nebius 192K 36.8 $0.36 $1.44
Nemotron-3-Super-120b-a12b
Open · MoE 120B (12B active) NVIDIA · Nebius 128K 127 $0.36 $1.08
DeepSeek-V3.2
Open · efficient reasoning DeepSeek · Nebius 128K 71 $0.36 $0.54
gpt-oss-120b
Open · agentic MoE OpenAI · Nebius 128K 40 $0.18 $0.72
Qwen3-235B-A22B-Instruct-2507
Open · MoE 235B (22B active) Alibaba · Nebius 262K 27 $0.24 $0.72
Qwen3-30B-A3B-Instruct-2507
Open · MoE 30B (3B active) Alibaba · Nebius 32K 70 $0.12 $0.36
Qwen3-32B
Open · general 32B Alibaba · Nebius 32K 23 $0.12 $0.36
Gemma-3-27b-it
Open · mid-size instruct Google · Nebius 32K 20 $0.12 $0.36
Nemotron-3-Nano-30B-A3B
Open · MoE 30B (3B active) NVIDIA · Nebius 256K 60 $0.07 $0.29
Llama-3.3-70B-Instruct
Open · 70B instruct Meta · Nebius 128K 25 $0.16 $0.48
Qwen2.5-VL-72B-Instruct
Vision · 72B multimodal Alibaba · Nebius 128K 20 $0.30 $0.90
Qwen3-Embedding-8B
Embeddings · 4096-dim Alibaba · Nebius 32K $0.01