39 models
GPT-5.5
openai · Newest OpenAI flagship and the first fully retrained base model since GPT-4.5. 1M context, 128K output, native agent and computer-use support
GPT-5.4
openai · Most capable and efficient frontier model with 1M context, native computer use, and thinking mode
GPT-5.4 Pro
openai · Premium GPT-5.4 with maximum compute for the hardest problems
GPT-5.3
openai · High intelligence at medium speed. Multimodal (vision), with function calling and structured outputs
GPT-5.2
openai · Frontier model with 400K context and adaptive reasoning
GPT-5.4 Mini
openai · Strongest mini model for coding, computer use, and subagents, bringing GPT-5.4 capabilities to a smaller model
GPT-5 Mini
openai · Cost-optimized reasoning and chat
GPT-5.4 Nano
openai · Fastest and most affordable GPT-5.4 model for high-throughput tasks
GPT-5.2 Pro
openai · Uses more compute than GPT-5.2 for consistently better answers
GPT-5.3 Codex
openai · Industry-leading agentic coding model. 400K context, with reasoning, tool use, and complex execution
o1
openai · Advanced reasoning model for complex tasks
o1-mini
openai · Fast reasoning model optimized for STEM
o3
openai · Latest reasoning model with improved performance
o3-mini
openai · Efficient reasoning model for STEM tasks
Claude Haiku 4.5
anthropic · Fastest and most efficient Claude, with near-frontier intelligence
Claude Sonnet 4.6
anthropic · Best balance of intelligence, speed, and cost
Claude Opus 4.5
anthropic · Earlier Anthropic flagship with enhanced reasoning and creativity
Claude Opus 4.7
anthropic · Most capable Claude for complex reasoning and agentic coding. 1M context, 128K output, adaptive thinking
Gemini 3.1 Pro
google · Latest Gemini with improved thinking, token efficiency, and agentic capabilities. Optimized for software engineering (requires new SDK)
Gemini 3 Pro Preview
google · Flagship frontier model for high-precision multimodal reasoning
Gemini 3 Flash Preview
google · Frontier-class performance with Pro-level intelligence at Flash speed and pricing. Includes thinking mode (requires new SDK)
Gemini 2.5 Pro
google · State-of-the-art for reasoning, coding, and mathematics
Gemini 2.5 Flash
google · Fast and efficient Gemini model with vision support
Gemini 3.1 Flash Lite
google · Ultra-fast, lightweight Gemini 3.1 model with thinking mode for high-throughput tasks
Gemini 2.5 Flash Lite
google · Most economical Gemini model: ultra-fast and lightweight (requires new SDK)
DeepSeek V4 Pro
deepseek · 75% off until 2026-05-31 · DeepSeek V4 flagship: 1.6T-parameter MoE with 49B active, 1M context. Strongest open-weight reasoner. Thinking mode is on by default.
DeepSeek V4 Flash Chat
deepseek · Paid V4 Flash in non-thinking mode (1.6T-class quality at $0.20 in / $0.40 out). Same model as the free nvidia/deepseek-v4-flash, but on a paid endpoint with higher reliability and request bodies of up to 5 MB.
DeepSeek V4 Flash Reasoner
deepseek · Paid V4 Flash in thinking mode for reasoning tasks. Same upstream model as deepseek/deepseek-chat, but with thinking enabled by default.
Kimi K2.6
moonshot · Moonshot's flagship multimodal reasoning model. 256K context, vision and text input; responses include reasoning_content. Upstream pricing: $0.95 in / $4.00 out per 1M tokens.
GLM-5.1
zai · Limited promotion · Z.AI's latest flagship: #1 open-source model on SWE-Bench Pro, capable of 8-hour autonomous execution. 200K context
GLM-5
zai · Limited promotion · Z.AI's foundation model with 200K context, plus strong reasoning and agentic capabilities
GLM-5 Turbo
zai · Limited promotion · Optimized GLM-5 variant with faster inference
MiniMax M2.7
minimax · MiniMax's flagship reasoning model with recursive self-improvement. Great value for complex tasks (~60 tokens/s)
DeepSeek V4 Flash (Free)
nvidia · Free · DeepSeek V4 Flash, hosted free by NVIDIA. 284B-parameter MoE with 13B active, 1M context, ~5x faster than V4 Pro. Best for chat, summarization, and light reasoning. Weaker factual recall, so choose V4 Pro for fact-heavy agentic loops
Nemotron 3 Nano Omni (Free)
nvidia · Free · NVIDIA's multimodal reasoning model Nemotron 3 Nano Omni, hosted free by NVIDIA. 31B-parameter MoE with 3.2B active. Accepts text, images, video, and audio. ChartQA 90.3, DocVQA 95.6, MMMU 70.8. The only vision-capable free model in our catalog
Qwen3 Coder 480B (Free)
nvidia · Free · Qwen's 480B MoE coding model (35B active), hosted by NVIDIA. Optimized for code generation
Llama 4 Maverick (Free)
nvidia · Free · Meta's Llama 4 Maverick MoE (17B active parameters, 128 experts), hosted free by NVIDIA
Qwen3-Next 80B Thinking (Free)
nvidia · Free · Qwen3-Next 80B MoE (3B active parameters) with thinking mode. Fastest top-tier reasoning on the free tier: 116 tok/s on our benchmark
Mistral Small 4 119B (Free)
nvidia · Free · Mistral Small 4 (119B), hosted free by NVIDIA. 114 tok/s, the fastest free chat model we ship
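The Kimi K2.6 entry lists upstream pricing of $0.95 in and $4.00 out per 1M tokens. The minimal Python sketch below turns those rates into a per-request cost estimate. It assumes that cost scales linearly with token count and that you already have the input and output token counts, for example from the endpoint's usage report. The catalog does not say how reasoning_content tokens are billed, so the sketch simply uses whatever output count you pass in. The function and constant names are hypothetical. The result reflects upstream pricing, not necessarily what this catalog charges.

```python
# Hypothetical helper: estimate the upstream USD cost of one request,
# using the Kimi K2.6 rates listed in the catalog (USD per 1M tokens).
KIMI_K26_INPUT_USD_PER_M = 0.95
KIMI_K26_OUTPUT_USD_PER_M = 4.00


def estimate_cost_usd(
    input_tokens: int,
    output_tokens: int,
    in_rate: float = KIMI_K26_INPUT_USD_PER_M,
    out_rate: float = KIMI_K26_OUTPUT_USD_PER_M,
) -> float:
    """Return the estimated cost in USD, assuming linear per-token pricing."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    # Rates are quoted per 1M tokens, so divide the weighted sum by 1,000,000.
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000


if __name__ == "__main__":
    # Example: a 200K-token prompt with an 8K-token completion.
    print(round(estimate_cost_usd(200_000, 8_000), 4))  # → 0.222
```

To estimate costs for another entry, pass different `in_rate` and `out_rate` values. For example, you could use the $0.20 / $0.40 listed for DeepSeek V4 Flash Chat, assuming those figures are also per 1M tokens.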