Mainnet Models

Input: $5.00/M | Output: $25.00/M | Context: 1M

Claude Opus 4.8

anthropic

Most capable Claude for complex reasoning and agentic coding. 1M context, 128K output, adaptive thinking

Gemini 3.1 Pro

Input: $2.00/M | Output: $12.00/M | Context: 1M

Latest Gemini with improved thinking, token efficiency, and agentic capabilities. Optimized for software engineering (requires new SDK)

Gemini 3 Flash Preview

Input: $0.50/M | Output: $3.00/M | Context: 1M

Frontier-class performance with Pro-level intelligence at Flash speed and pricing. Includes thinking mode (requires new SDK)

Gemini 3.5 Flash

Input: $0.50/M | Output: $3.00/M | Context: 1M

Latest-generation Flash with built-in thinking mode — frontier-class quality at Flash speed and pricing

Gemini 2.5 Pro

Input: $1.25/M | Output: $10.00/M | Context: 1M

State-of-the-art for reasoning, coding, and mathematics

Gemini 2.5 Flash

Input: $0.30/M | Output: $2.50/M | Context: 1M

Fast and efficient Gemini model with vision support

Gemini 3.1 Flash Lite

Input: $0.25/M | Output: $1.50/M | Context: 1M

Ultra-fast and lightweight Gemini 3.1 model with thinking mode for high-throughput tasks

Gemini 2.5 Flash Lite

Input: $0.10/M | Output: $0.40/M | Context: 1M

Most economical Gemini model: ultra-fast and lightweight (requires new SDK)

Input: $0.43/M | Output: $0.87/M | Context: 1M

DeepSeek V4 Pro

deepseek

DeepSeek V4 flagship — 1.6T MoE / 49B active, 1M context. Strongest open-weight reasoner. Thinking mode default.

Input: $0.20/M | Output: $0.40/M | Context: 1M

DeepSeek V4 Flash Chat

deepseek

Paid V4 Flash in non-thinking mode (1.6T-class quality at $0.20 in / $0.40 out). Same model as the free nvidia/deepseek-v4-flash but on a paid endpoint with higher reliability and 5MB request bodies.

Input: $0.20/M | Output: $0.40/M | Context: 1M

DeepSeek V4 Flash Reasoner

deepseek

Paid V4 Flash in thinking mode for reasoning tasks. Same upstream as deepseek/deepseek-chat but with thinking enabled by default.

Input: $0.95/M | Output: $4.00/M | Context: 262K

Kimi K2.7

moonshot

Moonshot's flagship multimodal reasoning model: 256K context, image and video input, returns reasoning_content. Served via the OpenRouter credit pool (slug moonshotai/kimi-k2.7-code), failing over to direct Moonshot.

GLM-5.2

Input: $1.40/M | Output: $4.40/M | Context: 1M

Z.AI's newest flagship — 1M-token context, top open-source on long-horizon coding. Verified live on Z.AI.

GLM-5.1

Input: $1.40/M | Output: $4.40/M | Context: 200K

Z.AI flagship — #1 open source on SWE-Bench Pro, 8-hour autonomous execution. 200K context

GLM-5

Input: $0.60/M | Output: $1.92/M | Context: 200K

Z.AI's foundation model with 200K context. Strong reasoning and agentic capabilities

GLM-5 Turbo

Input: $1.20/M | Output: $4.00/M | Context: 200K

Optimized GLM-5 variant with faster inference

Input: $1.50/M | Output: $4.00/M | Context: 1M

Grok 4.3

xai

xAI's Grok 4.3 reasoning model. 1M context, vision-capable, tuned for agentic workflows and instruction-following.

Input: $1.50/M | Output: $3.00/M | Context: 256K

Grok Build 0.1

xai

xAI's fast agentic coding model, trained for interactive software-engineering workflows. 256K context, text + image input.

Input: $0.30/M | Output: $1.20/M | Context: 205K

MiniMax M2.7

minimax

MiniMax's flagship reasoning model with recursive self-improvement. Great value for complex tasks (~60 tps)

Input: $0.30/M | Output: $1.20/M | Context: 1M

MiniMax M3

minimax

MiniMax's M3 flagship — 1M context, strong reasoning + coding. Served via OpenRouter.

Nemotron 3 Nano Omni (Free)

Input: Free | Output: Free | Context: 256K

NVIDIA's multimodal reasoning model Nemotron 3 Nano Omni, hosted free by NVIDIA. 31B / 3.2B active MoE. Accepts text, images, video, and audio. ChartQA 90.3, DocVQA 95.6, MMMU 70.8. One of the vision-capable free models in our catalog (Nemotron Nano 12B v2 VL also accepts images)

Mistral Large 3 675B (Free)

Mistral's flagship 675B model hosted free by NVIDIA. Largest Mistral model ever released

Llama 4 Maverick (Free)

Meta's Llama 4 Maverick MoE (17B x 128 experts) hosted free by NVIDIA

Qwen3-Next 80B Instruct (Free)

Input: Free | Output: Free | Context: 262K

Qwen3-Next 80B (3B active MoE) hosted free by NVIDIA. 262K context, strong reasoning + coding, fast.

Qwen3.5 122B (Free)

Qwen3.5 122B MoE (10B active) hosted free by NVIDIA. Balanced reasoning + coding, 131K context.

Mistral Nemotron (Free)

Mistral × NVIDIA Nemotron instruction model hosted free by NVIDIA. Fast (~0.2s), strong instruction following.

StepFun Step 3.7 Flash (Free)

StepFun Step 3.7 Flash hosted free by NVIDIA. Fast lightweight reasoning, 131K context.

ByteDance Seed-OSS 36B (Free)

ByteDance Seed-OSS 36B instruct hosted free by NVIDIA. Strong open-source coder, 131K context.

Nemotron Nano 9B v2 (Free)

NVIDIA Nemotron Nano 9B v2 hosted free by NVIDIA. Compact + fast (~0.7s), good for high-volume light tasks.

Nemotron Nano 12B v2 VL (Free)

NVIDIA Nemotron Nano 12B v2 Vision-Language hosted free by NVIDIA. Accepts images; compact + fast.
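
The per-million-token prices listed above convert to a per-request cost as (input tokens × input price + output tokens × output price) / 1,000,000. A minimal Python sketch of that arithmetic follows, using three of the Gemini prices from this listing. The `PRICES` table, its display-name keys, and the `estimate_cost` helper are hypothetical names chosen for illustration and are not part of any API; the token counts in the usage note are made-up examples.

```python
from decimal import Decimal

# USD per one million tokens as (input, output), copied from the listing above.
# Keys are display names used only for this illustration, not API model IDs.
PRICES = {
    "Gemini 2.5 Pro": (Decimal("1.25"), Decimal("10.00")),
    "Gemini 2.5 Flash": (Decimal("0.30"), Decimal("2.50")),
    "Gemini 2.5 Flash Lite": (Decimal("0.10"), Decimal("0.40")),
}

TOKENS_PER_PRICE_UNIT = Decimal(1_000_000)


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> Decimal:
    """Return the estimated USD cost of one request (hypothetical helper)."""
    input_price, output_price = PRICES[model]
    total = input_price * input_tokens + output_price * output_tokens
    return total / TOKENS_PER_PRICE_UNIT


if __name__ == "__main__":
    # Compare the same example workload across the three models.
    for name in PRICES:
        print(f"{name}: ${estimate_cost(name, 200_000, 10_000)}")
```

With 200,000 input tokens and 10,000 output tokens, this gives $0.35 for Gemini 2.5 Pro, $0.085 for Gemini 2.5 Flash, and $0.024 for Gemini 2.5 Flash Lite. `Decimal` is used instead of floats so that small per-request amounts do not pick up binary rounding error.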