Intelligence yield
Same price. More intelligence per dollar.
You don’t choose between cost and capability anymore. The same budget that buys a handful of Opus calls elsewhere funds an entire production workload on Token Factory — with routing, batching, caching, and an SLA behind it.
Per-query routing
Hard reasoning goes to Opus. Lookups, summaries, and structured extraction go to Grok 3 UltraFast. You stop overpaying for tasks that don’t need a frontier model, and stop shortchanging the ones that do.
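As a rough illustration of the routing idea, here is a minimal sketch that maps a task kind to a model. The model identifiers, the task labels, and the `pick_model` helper are all assumptions made for the example; the real router and catalog names may differ.

```python
# Hypothetical model identifiers; check the catalog for the real names.
FRONTIER_MODEL = "opus"
FAST_MODEL = "grok-3-ultrafast"

# Task kinds that the copy above describes as not needing a frontier model.
LIGHT_TASKS = {"lookup", "summary", "extraction"}


def pick_model(task_kind: str) -> str:
    """Return the fast model for light tasks and the frontier model otherwise."""
    return FAST_MODEL if task_kind.lower() in LIGHT_TASKS else FRONTIER_MODEL


if __name__ == "__main__":
    for kind in ("summary", "reasoning"):
        print(kind, "->", pick_model(kind))
```

Defaulting unknown task kinds to the frontier model is a deliberate choice in this sketch: a misrouted hard task costs more in quality than a misrouted easy task costs in dollars.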
One key, 60+ models
Skip the per-provider billing dashboards, the per-provider key rotation, the per-provider rate limits. One API key, one invoice, one usage chart for every model in the catalog.
Both SDKs, no migration
The same models are reachable through /openai/v1 and /anthropic/v1. Keep your existing openai and anthropic SDK code — just point it at our base URL.
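Pointing existing SDK code at a different base URL might look like the sketch below. The host `https://api.example.com`, the `TOKEN_FACTORY_BASE` environment variable, and the helper names are placeholders, not documented values. One detail worth verifying against your SDK version: the `openai` client expects a base URL that already ends in `/v1`, while the `anthropic` client normally adds the `/v1/...` path itself, so this sketch passes it the prefix without `/v1`.

```python
import os

# Placeholder host (assumption); substitute the real endpoint from your dashboard.
BASE = os.environ.get("TOKEN_FACTORY_BASE", "https://api.example.com")


def openai_base_url(base: str = BASE) -> str:
    """Base URL for the openai SDK, which appends paths like /chat/completions."""
    return base.rstrip("/") + "/openai/v1"


def anthropic_base_url(base: str = BASE) -> str:
    """Base URL for the anthropic SDK, which appends /v1/messages on its own."""
    return base.rstrip("/") + "/anthropic"


def make_clients(api_key: str):
    """Build both SDK clients with one key. Requires `pip install openai anthropic`."""
    # Imported lazily so the URL helpers above work without the SDKs installed.
    from anthropic import Anthropic
    from openai import OpenAI

    return (
        OpenAI(api_key=api_key, base_url=openai_base_url()),
        Anthropic(api_key=api_key, base_url=anthropic_base_url()),
    )
```

The rest of your code (`client.chat.completions.create(...)`, `client.messages.create(...)`) would stay as it is; only the client construction changes.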
NPX auto-config
One command writes your .zshrc, your .claude/CLAUDE.md, your .codex/config.toml — and matches the model list to your plan. Idempotent, safe to re-run.
99.9% uptime SLA
Multi-region failover, automatic retries, request hedging. If a frontier provider has a regional incident, your traffic gets routed around it — you don’t see the outage.
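The failover described above happens on the platform side, so you don't need to implement it. Purely to show what request hedging means, here is a small client-side sketch: start a primary call, and if it hasn't answered within a short delay, start a backup and take whichever finishes first. The `hedged_call` name and the delay value are illustrative assumptions, not the platform's implementation.

```python
import concurrent.futures as cf
import time


def hedged_call(primary, backup, hedge_after_s: float = 0.05):
    """Run `primary`; if it is still pending after `hedge_after_s`, also run
    `backup` and return the result of whichever call completes first."""
    pool = cf.ThreadPoolExecutor(max_workers=2)
    try:
        futures = [pool.submit(primary)]
        done, _ = cf.wait(futures, timeout=hedge_after_s)
        if not done:
            # Primary is slow: fire the hedge and race the two calls.
            futures.append(pool.submit(backup))
            done, _ = cf.wait(futures, return_when=cf.FIRST_COMPLETED)
        # .result() re-raises if the winning call failed; a fuller version
        # would fall back to the other future instead.
        return next(iter(done)).result()
    finally:
        # Don't block on the losing call; its thread finishes in the background.
        pool.shutdown(wait=False)


def slow_region():
    time.sleep(0.3)  # simulates a region having a bad moment
    return "slow"


def fast_region():
    return "fast"


if __name__ == "__main__":
    print(hedged_call(slow_region, fast_region, hedge_after_s=0.01))
```

Hedging trades a small amount of duplicate work for a shorter tail, which is why the delay is usually set near the normal latency of the primary call.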
Sub-100ms tail latency
UltraFast LPU inference for chat traffic keeps p99 latency under 100ms, even when the underlying provider’s tail would otherwise be 4–6× that.
Open by design
OpenAI- and Anthropic-compatible endpoints, transparent model catalogs, and an open-source NPX setup flow. If you ever outgrow us, your code and your data ship with you.
No markup, no surprise bills
You see the per-token cost of every model on every request. Switch models by changing one line. No black-box markups, no surprise overage invoices at the end of the month.
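With per-token prices visible, the cost of a request is simple arithmetic. The sketch below assumes prices quoted in USD per million tokens, split into input and output rates; the figures in the usage example are invented for illustration and are not actual catalog prices.

```python
def request_cost_usd(
    input_tokens: int,
    output_tokens: int,
    usd_per_m_input: float,
    usd_per_m_output: float,
) -> float:
    """Cost of one request, given prices in USD per million tokens."""
    total = input_tokens * usd_per_m_input + output_tokens * usd_per_m_output
    return total / 1_000_000


if __name__ == "__main__":
    # Made-up prices: $3 per million input tokens, $15 per million output tokens.
    cost = request_cost_usd(1_000, 500, usd_per_m_input=3.0, usd_per_m_output=15.0)
    print(f"${cost:.4f}")
```

Running the same token counts against two models' prices gives a direct comparison before you change the model name in your request.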