UltraFast
GPT OSS 20B 128k
OpenAI’s open-weight 20B MoE (3.6B active, 32 experts) on Groq LPU silicon. 1,800 TPS — MXFP4 quantized, adjustable CoT effort, native tool use.
- TPS
- 1,800
- Context
- 128K
- Input
- $0.09/M
- Output
- $0.36/M
Groq UltraFast (7) on LPU silicon, plus the full Nebius Token Factory catalog (26) on H100 — reasoning, vision, embeddings, and open-weight chat — all through a single OpenAI-compatible interface.
White documentation cards for every model. UltraFast models on LPU silicon carry the orange accent. Groq UltraFast (7) + Nebius Token Factory catalog (26), all with 20% standard markup on the base price.
UltraFast
OpenAI’s open-weight 20B MoE (3.6B active, 32 experts) on Groq LPU silicon. 1,800 TPS — MXFP4 quantized, adjustable CoT effort, native tool use.
UltraFast
Safety-tuned sibling of GPT OSS 20B on Groq LPU. 1,800 TPS — content classification, jailbreak detection, and policy enforcement at sub-second latency.
UltraFast
OpenAI’s flagship 120B open-weight MoE (5.1B active, 128 experts) on Groq LPU. 1,000 TPS — near-frontier reasoning with adjustable CoT effort, Apache 2.0.
UltraFast
Meta’s MoE Llama 4 Scout (17B×16E experts) on Groq LPU. 1,200 TPS — 128K context, tuned for low-latency multi-turn chat and native function calling.
UltraFast
Alibaba’s Qwen3 32B dense (hybrid thinking mode) on Groq LPU. 1,300 TPS — 131K context, 100+ languages, strong tool calling.
UltraFast
Meta’s 70B production workhorse on Groq LPU. 1,100 TPS — high-throughput multi-turn chat and tool-calling agents at sub-second latency.
UltraFast
Meta’s smallest 8B on Groq LPU. 1,800 TPS — ultra-cheap inference for high-volume chat, classification, and routing workloads.
REASONING
A 550B hybrid MoE (55B active) from NVIDIA on Nebius. Vision-capable, optimized for demanding multi-agent AI and complex reasoning.
VISION
NVIDIA’s 35B vision-reasoning model on Nebius — optimized for complex video/image understanding, agentic AI tasks, and high-throughput inference.
VISION
OpenBMB MiniCPM-V 4.5 — compact multimodal model for image, multi-image, high-FPS/single-video, OCR/PDf understanding with strong multilingual coverage.
REASONING
Kimi K2.6 is an open-source, native multimodal agentic model built through continual pretraining on approximately 15 trillion mixed vision-text tokens.
REASONING
DeepSeek-V4 is designed for advanced reasoning, coding, and long-horizon agent workflows, with strong performance across the Nebius H100 cluster.
OPEN
The most open, efficient, and accurate error-modal reasoning model for agentic AI — compact 30B MoE, runs at the fastest TPS in the Nemotron lineup.
REASONING
Zhipu AI’s latest flagship multimodal model with strong bilingual (Chinese–English) reasoning, long-context understanding, advanced agentic tool use.
OPEN
Open-source agentic coding model built for polyglot development and precise refactoring, using retrieval-thinking tools to ground outputs.
OPEN
Nemotron 3 Super is a 120B hybrid MoE model optimized for efficient multi-agent AI and complex reasoning tasks on the Nebius H100 cluster.
REASONING
Multimodal MoE featuring a Hybrid Mixture-of-Experts architecture, designed for state-of-the-art performance across chat, retrieval, reasoning, and tool use.
REASONING
Zhipu AI’s latest flagship multimodal model with strong bilingual (Chinese–English) reasoning, long-context understanding, advanced agentic tool use.
OPEN
A model designed to harmonize high compute efficiency with strong reasoning and agentic tool-use performance, served on Nebius H100.
REASONING
Hybrid reasoning model trained on verified CoT traces for strong math, coding, and step-by-step reliability. Frontier-tier open-weight.
REASONING
Compact version of Hermes-4 delivering high-quality reasoning, coding, and tool use with lower inference cost than its 405B sibling.
OPEN
Open-weight agentic model with configurable reasoning, full CoT visibility, strong tool use, and free-deployment support on Nebius.
REASONING
A 100B-plus parameter Mixture-of-Experts model from Prime Intellect, fine-tuned with large-scale RL to deliver top-tier math, code, science, and reasoning.
OPEN
Balanced Qwen3 flagship tuned for strong general reasoning, chat quality, and tool use at mid-size active cost.
OPEN
Versatile 30B instruct model optimized for high-quality chat, reasoning, and coding at low cost.
EMBED
Qwen embedding model optimized for high-precision dense retrieval with multilingual coverage (100+ languages).
REASONING
Qwen’s “thinking-optimized” 80B model designed for sustained multi-step reasoning, structured deliberation, and high-precision multi-domain reasoning.
OPEN
General model offering strong multilingual reasoning, coding, and long-context performance at mid scale.
OPEN
Google’s mid-size model optimized for high-quality instruction following, coding, and multilingual performance.
REASONING
NVIDIA-tuned Llama variant built for high-efficiency reasoning, safety, and enterprise-grade performance.
OPEN
Compact MoE model optimized for efficient reasoning, chat, and coding with strong multilingual support and long-context RAG/agent workflows.
VISION
High-end multimodal model delivering strong vision-language reasoning with long-context support.
OPEN
Refined Llama instruct model with strong reasoning, chat quality, and broad benchmark performance.
The numbers that matter: TPS, context, and price. Find the right model for your workload.
| Model | Type | Platform | Context | Max TPS | Input / M | Output / M |
|---|---|---|---|---|---|---|
|
GPT OSS 20B 128k
|
UltraFast · MoE | Groq · OpenAI | 128K | 1,800 | $0.09 | $0.36 |
|
GPT OSS Safeguard 20B
|
UltraFast · MoE | Groq · OpenAI | 128K | 1,800 | $0.09 | $0.36 |
|
GPT OSS 120B 128k
|
UltraFast · MoE | Groq · OpenAI | 128K | 1,000 | $0.18 | $0.72 |
|
Llama 4 Scout (17Bx16E) 128k
|
UltraFast · MoE | Groq · Meta | 128K | 1,200 | $0.13 | $0.41 |
|
Qwen3 32B 131k
|
UltraFast | Groq · Alibaba | 131K | 1,300 | $0.35 | $0.71 |
|
Llama 3.3 70B Versatile 128k
|
UltraFast | Groq · Meta | 128K | 1,100 | $0.71 | $0.95 |
|
Llama 3.1 8B Instant 128k
|
UltraFast | Groq · Meta | 128K | 1,800 | $0.06 | $0.10 |
|
Nemotron-3-Ultra-550B-a55b
|
Reasoning · MoE 550B (55B active) | NVIDIA · Nebius | 128K | 59 | $1.20 | $3.60 |
|
Cosmos3-Super-Reasoner
|
Reasoning · Vision | NVIDIA · Nebius | 128K | 30 | $0.12 | $0.36 |
|
openbmb/MiniCPM-V-4.5
|
Reasoning · Vision | OpenBMB · Nebius | 32K | 49.5 | $0.07 | $0.13 |
|
Kimi-K2.6
|
Reasoning · multimodal agentic | Moonshot AI · Nebius | 256K | 60 | $1.14 | $4.80 |
|
DeepSeek-V4-Pro
|
Reasoning · long-horizon agent | DeepSeek · Nebius | 128K | 24 | $2.10 | $4.20 |
|
GLM-5.1
|
Reasoning · multimodal | Z.ai · Nebius | 200K | 25 | $1.68 | $5.28 |
|
Qwen3.5-397B-A17B
|
Reasoning · MoE 397B (17B active) | Alibaba · Nebius | 262K | 80 | $0.72 | $4.32 |
|
GLM-5
|
Reasoning · multimodal | Z.ai · Nebius | 200K | 47 | $1.20 | $3.84 |
|
Hermes-4-405B
|
Reasoning · verified-CoT | NousResearch · Nebius | 128K | 20 | $1.20 | $3.60 |
|
Hermes-4-70B
|
Reasoning | NousResearch · Nebius | 128K | 20 | $0.16 | $0.48 |
|
INTELLECT-3
|
Reasoning · MoE 100B+ RL-tuned | Prime Intellect · Nebius | 128K | 35 | $0.24 | $1.20 |
|
Qwen3-Next-80B-A3B-Thinking
|
Reasoning · thinking-tuned MoE | Alibaba · Nebius | 262K | 85 | $0.18 | $1.44 |
|
Llama-3.1-Nemotron-Ultra-253B-v1
|
Reasoning · enterprise | NVIDIA · Nebius | 128K | 25 | $0.72 | $2.16 |
|
Nemotron-3-Nano-Omni
|
Open · error-modal reasoning | NVIDIA · Nebius | 256K | 90 | $0.07 | $0.29 |
|
MiniMax-M2.5
|
Open · agentic coding | MiniMax · Nebius | 192K | 36.8 | $0.36 | $1.44 |
|
Nemotron-3-Super-120b-a12b
|
Open · MoE 120B (12B active) | NVIDIA · Nebius | 128K | 127 | $0.36 | $1.08 |
|
DeepSeek-V3.2
|
Open · efficient reasoning | DeepSeek · Nebius | 128K | 71 | $0.36 | $0.54 |
|
gpt-oss-120b
|
Open · agentic MoE | OpenAI · Nebius | 128K | 40 | $0.18 | $0.72 |
|
Qwen3-235B-A22B-Instruct-2507
|
Open · MoE 235B (22B active) | Alibaba · Nebius | 262K | 27 | $0.24 | $0.72 |
|
Qwen3-30B-A3B-Instruct-2507
|
Open · MoE 30B (3B active) | Alibaba · Nebius | 32K | 70 | $0.12 | $0.36 |
|
Qwen3-32B
|
Open · general 32B | Alibaba · Nebius | 32K | 23 | $0.12 | $0.36 |
|
Gemma-3-27b-it
|
Open · mid-size instruct | Google · Nebius | 32K | 20 | $0.12 | $0.36 |
|
Nemotron-3-Nano-30B-A3B
|
Open · MoE 30B (3B active) | NVIDIA · Nebius | 256K | 60 | $0.07 | $0.29 |
|
Llama-3.3-70B-Instruct
|
Open · 70B instruct | Meta · Nebius | 128K | 25 | $0.16 | $0.48 |
|
Qwen2.5-VL-72B-Instruct
|
Vision · 72B multimodal | Alibaba · Nebius | 128K | 20 | $0.30 | $0.90 |
|
Qwen3-Embedding-8B
|
Embeddings · 4096-dim | Alibaba · Nebius | 32K | — | $0.01 | — |