Dated log of every meaningful change to BlockRun. Newest first. Subscribe via the GitHub org or follow @BlockRunAI for daily ship notes.
Free tier revamp — self-healing health gate + refreshed model lineup
New self-healing health gate: a runtime circuit breaker now routes free requests around any free model whose upstream has died (410 end-of-life, 404 pulled, or a hung deployment), and auto-recovers it when the upstream comes back. Replaces the hand-maintained redirect list — dead free models no longer cause 60-second hangs.
Refreshed the free catalog with seven newly verified models: Qwen3-Next 80B Instruct (262K context), Qwen3.5 122B, Mistral Nemotron, StepFun Step 3.7 Flash, ByteDance Seed-OSS 36B, and two compact Nemotron Nano models (one vision-capable). Every visible free model is now live-verified.
Retired free models that NVIDIA pulled upstream (DeepSeek V4 Flash/Pro, Qwen3 Coder 480B, Devstral 2) are hidden and auto-rerouted to a healthy free model, so existing calls still return 200.
Free landing pages refreshed to point at live models: /free-qwen and /free-mistral now showcase healthy models, and /free-deepseek became a general free-reasoning page led by Qwen3-Next 80B.
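A minimal sketch of how a health gate like the one in this entry could behave. The class name, the cooldown value, and the half-open retry rule are assumptions for illustration; only the trigger conditions (410, 404, or a hung upstream) come from the notes above.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

# Statuses named in the entry above as signs of a dead upstream.
DEAD_STATUSES = {404, 410}
# Hypothetical cooldown; the changelog does not state one.
COOLDOWN_SECONDS = 300.0


@dataclass
class HealthGate:
    """Tracks which free models are currently considered dead (illustrative)."""
    clock: Callable[[], float] = time.monotonic
    _opened_at: Dict[str, float] = field(default_factory=dict)

    def record(self, model: str, status: Optional[int]) -> None:
        """Record an upstream result; status=None stands for a hung call."""
        if status is None or status in DEAD_STATUSES:
            self._opened_at[model] = self.clock()
        elif 200 <= status < 300:
            # Upstream answered again: close the breaker so the model recovers.
            self._opened_at.pop(model, None)

    def is_healthy(self, model: str) -> bool:
        opened = self._opened_at.get(model)
        if opened is None:
            return True
        # After the cooldown, let a trial request through (half-open state).
        return self.clock() - opened >= COOLDOWN_SECONDS

    def route(self, requested: str, candidates: List[str]) -> str:
        """Return the requested model if healthy, else the first healthy candidate."""
        if self.is_healthy(requested):
            return requested
        for model in candidates:
            if self.is_healthy(model):
                return model
        raise RuntimeError("no healthy free model available")
```

In this sketch a model that returns 2xx again is restored immediately, which mirrors the auto-recovery described above without any hand-maintained list.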
moonshot/kimi-k2.7 — Moonshot flagship upgraded
Added moonshot/kimi-k2.7: a 256K-context multimodal reasoning model that accepts image and video input and returns reasoning_content in responses. $0.95 in / $4.00 out per 1M tokens.
kimi-k2.6 marked hidden but kept routable so existing integrations don't break; new traffic auto-prefers k2.7 via fallbackModel chaining.
New BlockRun Voice: ElevenLabs text-to-speech at /v1/audio/speech. Flash v2.5 (~75ms latency, for real-time voice agents) and Turbo v2.5 at $0.05/1k characters; Multilingual v2 and Eleven v3 (maximum expressiveness) at $0.10/1k characters. Pay per call in USDC via x402 — no ElevenLabs subscription.
Price is billed per input character and quoted up front in the 402, then recomputed from the request body on the paid call so it can't be under-paid. Synchronous — returns a hosted MP3 URL. Settlement fires only after successful synthesis; a failed upstream call is never charged.
Also added sound effects at /v1/audio/sound-effects ($0.0525/generation, up to 22s) and a free voice-discovery endpoint /v1/audio/voices.
Listed on the marketplace under the new Voice & Speech category with a dedicated /marketplace/elevenlabs page.
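The per-character billing described in this entry can be sketched as follows. The dictionary keys are hypothetical labels rather than confirmed API model IDs, and counting characters with Python's len() is an assumption about how input length is measured; the rates themselves are taken from the notes above.

```python
from decimal import Decimal

# USD per 1,000 input characters, from the entry above.
# Keys are hypothetical labels, not confirmed model identifiers.
RATE_PER_1K = {
    "flash-v2.5": Decimal("0.05"),
    "turbo-v2.5": Decimal("0.05"),
    "multilingual-v2": Decimal("0.10"),
    "eleven-v3": Decimal("0.10"),
}


def speech_price_usd(model: str, text: str) -> Decimal:
    """Price quoted for synthesizing `text` with `model`."""
    return RATE_PER_1K[model] * len(text) / 1000


def payment_covers_request(model: str, text: str, paid_usd: Decimal) -> bool:
    """Recompute the price from the request body so it cannot be under-paid."""
    return paid_usd >= speech_price_usd(model, text)
```

Recomputing from the body on the paid call means a quote obtained for a short text cannot be reused to pay for a longer one.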
OpenAI Sora 2 added to video generation (via Azure AI Foundry)
Added azure/sora-2 to the video catalog — OpenAI's Sora 2, served through Azure AI Foundry. Realistic text-to-video at 720p (portrait or landscape) with synchronized audio, in 4, 8, or 12-second clips.
Pricing: flat $0.10/sec + 5% margin — a 4s clip with audio runs ~$0.42, undercutting our Seedance 2.0 tiers per clip. Pay-per-second in USDC via x402, no OpenAI account or Azure subscription required.
Routes through the existing async /v1/videos/generations submit→poll pipeline under the new 'azure' provider prefix; settlement fires on the first completed poll, so a failed or never-polled job is never charged. Reuses BlockRun's existing Azure OpenAI resource.
Sora's content-download endpoint is authenticated, so the GCS mirror step now replays the api-key header when backing up the finished MP4.
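As a worked check of the pricing in this entry, the per-clip cost can be computed from the stated $0.10/sec rate and 5% margin. The function name and the rounding to whole cents are assumptions; the 8s and 12s figures are derived from the same formula, not quoted from the notes.

```python
from decimal import Decimal

BASE_PER_SECOND = Decimal("0.10")  # flat rate from the entry above
MARGIN = Decimal("0.05")           # 5% margin from the entry above
ALLOWED_DURATIONS = (4, 8, 12)     # clip lengths listed for azure/sora-2


def sora_clip_price_usd(seconds: int) -> Decimal:
    """Price of one clip, rounded to cents (rounding rule is an assumption)."""
    if seconds not in ALLOWED_DURATIONS:
        raise ValueError(f"unsupported clip length: {seconds}s")
    return (BASE_PER_SECOND * seconds * (1 + MARGIN)).quantize(Decimal("0.01"))


for s in ALLOWED_DURATIONS:
    print(s, sora_clip_price_usd(s))
# prints:
# 4 0.42
# 8 0.84
# 12 1.26
```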
DeepSeek V4 Pro added to paid catalog — 75% launch promo
Added deepseek/deepseek-v4-pro to the paid catalog against api.deepseek.com — 1.6T MoE / 49B active, 1M context, 65K max output. Launch-promo pricing $0.50 in / $1.00 out per 1M tokens through 2026-05-31 (75% off list); reverts to $2.00 / $4.00 after.
V4 Flash is NOT exposed as a separate paid SKU — the free nvidia/deepseek-v4-flash already covers that need, and a paid duplicate would just confuse callers. Customers who need paid-tier V4 Flash (for production reliability or 5MB request bodies) reach it via the legacy deepseek/deepseek-chat / deepseek/deepseek-reasoner aliases, which DeepSeek upstream serves as V4 Flash non-thinking / thinking modes.
Backward compat: deepseek/deepseek-chat and deepseek/deepseek-reasoner keep working — relabeled to 'V4 Flash Chat' and 'V4 Flash Reasoner' to reflect what's actually served upstream, context bumped from 128K to 1M, price dropped to $0.20 in / $0.40 out (down from $0.28 / $0.42). Existing integrations need no changes.
Routing: bare deepseek-v4-pro resolves to the new paid SKU. Bare deepseek-v4-flash continues to resolve to the free nvidia/deepseek-v4-flash. The nvidia/deepseek-v4-pro → nvidia/deepseek-v4-flash redirect stays in place since NVIDIA's free V4 Pro deployment is still hung.
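The launch-promo pricing for deepseek/deepseek-v4-pro can be sketched as a date-dependent rate lookup. Function names are hypothetical, and treating 2026-05-31 as inclusive with no time-zone handling is an assumption; the rates and the end date come from this entry.

```python
from datetime import date
from decimal import Decimal

PROMO_END = date(2026, 5, 31)                        # last promo day, per the entry
PROMO_RATES = (Decimal("0.50"), Decimal("1.00"))     # USD in / out per 1M tokens
LIST_RATES = (Decimal("2.00"), Decimal("4.00"))      # USD in / out per 1M tokens
TOKENS_PER_UNIT = 1_000_000


def v4_pro_rates(on: date):
    """Return (input, output) rates in effect on the given day."""
    return PROMO_RATES if on <= PROMO_END else LIST_RATES


def request_cost_usd(on: date, input_tokens: int, output_tokens: int) -> Decimal:
    """Cost of one request at the rates in effect on `on`."""
    rate_in, rate_out = v4_pro_rates(on)
    return (rate_in * input_tokens + rate_out * output_tokens) / TOKENS_PER_UNIT


def promo_discount() -> Decimal:
    """Fraction taken off the list input price during the promo."""
    return 1 - PROMO_RATES[0] / LIST_RATES[0]
```

Both the input and output promo rates are one quarter of list, which is where the 75% figure comes from.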
NVIDIA upstream sweep — three more models auto-redirected, GPT-OSS re-enabled
Direct probe of NVIDIA NIM revealed three more models with broken upstreams: nvidia/nemotron-ultra-253b returns HTTP 404 (NVIDIA retired the checkpoint), nvidia/deepseek-v3.2 hangs the connection (60s, zero bytes, the same failure mode as V4 Pro), and nvidia/glm-4.7 also hangs the connection.
All three are now auto-redirected via MODEL_REDIRECTS to working free alternatives: nemotron-ultra-253b → qwen3-next-80b-thinking; deepseek-v3.2 → deepseek-v4-flash; glm-4.7 → qwen3-coder-480b.
Targets are spread across qwen3-coder, qwen3-next-thinking, and v4-flash to avoid funneling all the load onto qwen3-next-thinking (which is already getting 429 capacity throttles from NVIDIA at peak hours).
Free-tier upstreams are unstable, so the catalog entries stay available: true (with hidden: true). When NVIDIA redeploys a model, removing its redirect line is the only step needed to re-enable it.
nvidia/gpt-oss-120b and nvidia/gpt-oss-20b are re-enabled (they had been available: false since 2026-04-28). The NVIDIA upstream is healthy on direct probe. The privacy concern that drove the original retirement (NVIDIA's free tier may use prompts for service improvement) is still addressed by hidden: true, so the public /v1/models browser does not list them, but legacy ClawRouter callers using the full ID now get a 200 instead of a 400.
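A rough sketch of how a redirect table like MODEL_REDIRECTS might be resolved. The table name and the three source-to-target pairs come from this entry; the "nvidia/" prefix on the targets, the resolve_model helper, and the cycle guard are assumptions for illustration.

```python
from typing import Dict

# Pairs from the entry above; the "nvidia/" prefix on targets is assumed.
MODEL_REDIRECTS: Dict[str, str] = {
    "nvidia/nemotron-ultra-253b": "nvidia/qwen3-next-80b-thinking",
    "nvidia/deepseek-v3.2": "nvidia/deepseek-v4-flash",
    "nvidia/glm-4.7": "nvidia/qwen3-coder-480b",
}


def resolve_model(model_id: str, redirects: Dict[str, str] = MODEL_REDIRECTS) -> str:
    """Follow redirects until reaching a model with no redirect entry."""
    seen = set()
    while model_id in redirects:
        if model_id in seen:
            # Guard against a misconfigured loop such as a -> b -> a.
            raise ValueError(f"redirect cycle at {model_id}")
        seen.add(model_id)
        model_id = redirects[model_id]
    return model_id
```

With this shape, re-enabling a model is a one-line change: deleting its key makes resolve_model return the original ID again.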
DeepSeek V4 Pro delisted — auto-redirected to V4 Flash
nvidia/deepseek-v4-pro is delisted from the public catalog and the model picker. Direct NVIDIA NIM probe (2026-04-30) confirms the upstream deployment is hung — V4 Pro is published in NVIDIA's catalog but every request hangs the connection indefinitely (zero bytes received in 300s). V4 Flash works fine on the same NIM endpoint, so it's an NVIDIA-side V4-Pro-specific issue.
Behavior change: calls to nvidia/deepseek-v4-pro (and the bare deepseek-v4-pro alias) now redirect deterministically to nvidia/deepseek-v4-flash via MODEL_REDIRECTS (same V4 family, 1M context, free, healthy). This replaces the previous fallback-cascade behavior, which landed non-deterministically on V4 Flash, qwen3-next-80b-thinking, or zai/glm-5.1.
nvidia/deepseek-v3.2 fallback retargeted from V4 Pro to V4 Flash so V3.2 callers don't inherit the cascade either.
V4 Flash + Nemotron Omni (added 2026-04-29) remain healthy and free.
We'll re-list V4 Pro when a smoke test (single non-streaming request) returns inside 30s.
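The re-listing criterion above can be expressed as a small check. This is only a sketch: the probe callable is a hypothetical stand-in for a single non-streaming request, it is expected to enforce the timeout itself, and treating HTTP 200 as the success condition is an assumption.

```python
import time
from typing import Callable

RELIST_TIMEOUT_S = 30.0  # threshold stated in the entry above


def passes_smoke_test(
    probe: Callable[[float], int],
    timeout_s: float = RELIST_TIMEOUT_S,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Return True if one probe request succeeds within the time limit.

    `probe(timeout_s)` is a hypothetical callable that sends one
    non-streaming request and returns the HTTP status, raising on a hang.
    """
    start = clock()
    try:
        status = probe(timeout_s)
    except Exception:
        # A hung or failed request counts as a failed smoke test.
        return False
    return status == 200 and (clock() - start) <= timeout_s
```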
Free-tier catalog refresh — DeepSeek V4 Flash + first vision-capable free model
Added nvidia/deepseek-v4-pro: 1.6T MoE / 49B active, 1M context. Top open reasoning (MMLU-Pro 87.5, GPQA 90.1, SWE-bench 80.6, LiveCodeBench 93.5). Note (2026-04-30): held back from public availability; see the "DeepSeek V4 Pro delisted" entry above.
Added nvidia/deepseek-v4-flash: 284B / 13B active MoE, 1M context. ~5x faster than V4 Pro for chat/summarization. Caveat: weaker factual recall (SimpleQA 34% vs Pro's 58%) — pick V4 Pro for fact-heavy agent loops once it's re-enabled.
Added nvidia/nemotron-3-nano-omni-30b-a3b-reasoning: first vision-capable model in our free tier. ChartQA 90.3, DocVQA 95.6, MMMU 70.8. Accepts text, images, video (≤2 min), audio (≤1 hr). 256K context.
Skipped after benchmark review: qwen3-next-80b-a3b-instruct (loses reasoning on 18/20 benchmarks vs -thinking variant); qwen3.5-122b-a10b (redundant once Omni covers vision); nemotron-nano-3-30b-a3b (strictly worse than our 49B Super on text reasoning); mistral-medium-3.5-128b (released today, no benchmarks published yet).
Image generation API switched to a hybrid sync/async flow: fast models keep returning {data:[…]} inline, while slow models (gpt-image-2, grok-imagine-image-pro) return HTTP 202 with {id, poll_url}. This stops Cloudflare 524 timeouts on long generations.
@blockrun/llm SDK 1.12.0 ships transparent polling for the new async image flow — public API unchanged.
MCP error classifier now uses `instanceof PaymentError` instead of substring match on the error message (no more false 'fund your wallet' messages on 524s).
Privacy policy + terms rewritten with explicit 'we do not share your data' stance.
NVIDIA gpt-oss-120b/20b free models pulled from public catalog (their free tier may train on prompts).
Homepage: TrustStrip + FAQ added, hero copy fixed (dropped 'Talk to us' from product list, retired YOPO eyebrow), Franklin section rewritten to disambiguate 'wallet', partners bar shows brand names alongside icons, 'FREE' pill added to models section.
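For the hybrid image flow noted in this entry, a client-side polling loop might look like the sketch below. It is not the @blockrun/llm SDK's implementation: the submit and poll callables, the polling interval, and the assumption that a finished poll returns HTTP 200 with a data field are all illustrative.

```python
import time
from typing import Any, Callable, Dict, List, Tuple

Response = Tuple[int, Dict[str, Any]]  # (HTTP status, parsed JSON body)


def generate_image(
    submit: Callable[[], Response],
    poll: Callable[[str], Response],
    interval_s: float = 2.0,   # hypothetical polling interval
    max_polls: int = 60,       # hypothetical upper bound on polls
    sleep: Callable[[float], None] = time.sleep,
) -> List[Any]:
    """Return image data whether the server answers inline or asynchronously."""
    status, body = submit()
    if status == 200:
        # Fast models: the result arrives inline as {data: [...]}.
        return body["data"]
    if status != 202:
        raise RuntimeError(f"unexpected status {status}")
    # Slow models: the server hands back {id, poll_url} to poll.
    poll_url = body["poll_url"]
    for _ in range(max_polls):
        sleep(interval_s)
        status, body = poll(poll_url)
        # Assumed completion shape: 200 with the same {data: [...]} payload.
        if status == 200 and "data" in body:
            return body["data"]
    raise TimeoutError("image job did not finish in time")
```

Hiding the loop behind one function is what lets a caller keep the same interface for both fast and slow models.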