BlockRun

Changelog

Dated log of every meaningful change to BlockRun. Newest first. Subscribe via the GitHub org or follow @BlockRunAI for daily ship notes.

DeepSeek V4 Pro added to paid catalog — 75% launch promo

  • Added deepseek/deepseek-v4-pro to the paid catalog against api.deepseek.com — 1.6T MoE / 49B active, 1M context, 65K max output. Launch-promo pricing $0.50 in / $1.00 out per 1M tokens through 2026-05-31 (75% off list); reverts to $2.00 / $4.00 after.
  • V4 Flash is NOT exposed as a separate paid SKU — the free nvidia/deepseek-v4-flash already covers that need, and a paid duplicate would just confuse callers. Customers who need paid-tier V4 Flash (for production reliability or 5MB request bodies) reach it via the legacy deepseek/deepseek-chat / deepseek/deepseek-reasoner aliases, which DeepSeek upstream serves as V4 Flash non-thinking / thinking modes.
  • Backward compat: deepseek/deepseek-chat and deepseek/deepseek-reasoner keep working — relabeled to 'V4 Flash Chat' and 'V4 Flash Reasoner' to reflect what's actually served upstream, context bumped from 128K to 1M, price dropped to $0.20 in / $0.40 out (down from $0.28 / $0.42). Existing integrations need no changes.
  • Routing: bare deepseek-v4-pro resolves to the new paid SKU. Bare deepseek-v4-flash continues to resolve to the free nvidia/deepseek-v4-flash. The nvidia/deepseek-v4-pro → nvidia/deepseek-v4-flash redirect stays in place since NVIDIA's free V4 Pro deployment is still hung.
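
The promo arithmetic above can be sketched as follows. The rates and the 2026-05-31 cutoff come from this entry; the helper name estimateCostUsd and the choice of UTC for the cutoff are assumptions made for illustration, not part of any BlockRun SDK.

```typescript
// Hypothetical helper: estimates the cost of a deepseek/deepseek-v4-pro call
// from the per-1M-token rates quoted in this entry.
interface Rates {
  inputPerM: number;  // USD per 1M input tokens
  outputPerM: number; // USD per 1M output tokens
}

const PROMO: Rates = { inputPerM: 0.5, outputPerM: 1.0 };
const LIST: Rates = { inputPerM: 2.0, outputPerM: 4.0 };

// Assumption: "through 2026-05-31" means until the end of that day in UTC.
const PROMO_END_EXCLUSIVE = Date.UTC(2026, 5, 1); // month index 5 = June

function ratesAt(when: Date): Rates {
  return when.getTime() < PROMO_END_EXCLUSIVE ? PROMO : LIST;
}

function estimateCostUsd(inputTokens: number, outputTokens: number, when: Date): number {
  const r = ratesAt(when);
  return (inputTokens * r.inputPerM + outputTokens * r.outputPerM) / 1_000_000;
}
```

Under these assumptions, a request with 200K input tokens and 50K output tokens costs $0.15 during the promo and $0.60 at list price, which matches the 75% discount.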

NVIDIA upstream sweep — three more models auto-redirected, GPT-OSS re-enabled

  • Direct probe of NVIDIA NIM revealed three more models with broken upstream: nvidia/nemotron-ultra-253b returns HTTP 404 (NVIDIA retired the checkpoint), nvidia/deepseek-v3.2 hangs the connection (60s, zero bytes — same fail mode as V4 Pro), and nvidia/glm-4.7 also hangs the connection.
  • All three are now auto-redirected via MODEL_REDIRECTS to working free alternatives: nemotron-ultra-253b → qwen3-next-80b-thinking; deepseek-v3.2 → deepseek-v4-flash; glm-4.7 → qwen3-coder-480b.
  • Targets are spread across qwen3-coder, qwen3-next-thinking, and v4-flash to avoid funneling all the load onto qwen3-next-thinking (which is already getting 429 capacity throttles from NVIDIA at peak hours).
  • Free-tier upstreams are unstable, so catalog entries stay available: true (hidden: true). Once NVIDIA re-deploys a model, removing its redirect line is the only step needed to re-enable it.
  • nvidia/gpt-oss-120b and nvidia/gpt-oss-20b re-enabled (were available: false since 2026-04-28). NVIDIA upstream is healthy on direct probe. The privacy concern that drove the original retirement (NVIDIA's free tier may use prompts for service improvement) is still addressed by hidden: true, so the public /v1/models browser doesn't list them, but legacy ClawRouter callers using the full ID now get a 200 instead of a 400.
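
A minimal sketch of how a redirect table like MODEL_REDIRECTS might behave. The name MODEL_REDIRECTS and the three source IDs come from this entry; the Map shape, the single-hop lookup, the resolveModel helper, and the nvidia/ prefix on the two qwen3 targets (given only in short form above) are assumptions.

```typescript
// Assumed shape: one entry per broken upstream, mapping it to a working free model.
const MODEL_REDIRECTS = new Map<string, string>([
  ["nvidia/nemotron-ultra-253b", "nvidia/qwen3-next-80b-thinking"],
  ["nvidia/deepseek-v3.2", "nvidia/deepseek-v4-flash"],
  ["nvidia/glm-4.7", "nvidia/qwen3-coder-480b"],
]);

// Hypothetical resolver: a single-hop lookup. IDs without an entry pass through unchanged.
function resolveModel(
  id: string,
  redirects: ReadonlyMap<string, string> = MODEL_REDIRECTS,
): string {
  return redirects.get(id) ?? id;
}
```

With this shape, deleting one entry from the map is enough to send traffic back to the original model, which is the property the entry relies on.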

DeepSeek V4 Pro delisted — auto-redirected to V4 Flash

  • nvidia/deepseek-v4-pro is delisted from the public catalog and the model picker. Direct NVIDIA NIM probe (2026-04-30) confirms the upstream deployment is hung — V4 Pro is published in NVIDIA's catalog but every request hangs the connection indefinitely (zero bytes received in 300s). V4 Flash works fine on the same NIM endpoint, so it's an NVIDIA-side V4-Pro-specific issue.
  • Behavior change: calls to nvidia/deepseek-v4-pro (and the bare deepseek-v4-pro alias) now redirect deterministically to nvidia/deepseek-v4-flash via MODEL_REDIRECTS (same V4 family, 1M context, free, healthy). This replaces the previous fallback-cascade behavior, which non-deterministically landed on V4 Flash, qwen3-next-80b-thinking, or zai/glm-5.1.
  • nvidia/deepseek-v3.2 fallback retargeted from V4 Pro to V4 Flash so V3.2 callers don't inherit the cascade either.
  • V4 Flash + Nemotron Omni (added 2026-04-29) remain healthy and free.
  • We'll re-list V4 Pro when a smoke test (single non-streaming request) returns inside 30s.
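
The re-listing criterion above (one non-streaming request that returns within 30s) could be checked with a sketch like this. The Probe type and passesSmokeTest are hypothetical names; the actual request to NVIDIA NIM is left to the caller because its details are not given here.

```typescript
// Hypothetical probe: sends one non-streaming request and resolves when the
// full response has arrived. The real transport is not shown in this changelog.
type Probe = () => Promise<unknown>;

// Resolves true if the probe completes within limitMs.
// Resolves false if the probe hangs past the limit or rejects.
async function passesSmokeTest(probe: Probe, limitMs = 30_000): Promise<boolean> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<boolean>((resolve) => {
    timer = setTimeout(() => resolve(false), limitMs);
  });
  const attempt = probe().then(
    () => true,
    () => false,
  );
  try {
    return await Promise.race([attempt, timeout]);
  } finally {
    // Clear the timer so a fast probe does not keep the process alive.
    if (timer !== undefined) clearTimeout(timer);
  }
}
```

A hung connection like the one described in this entry never resolves the probe, so the timeout branch wins the race and the check reports false.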

Free-tier catalog refresh — DeepSeek V4 Flash + first vision-capable free model

  • Added nvidia/deepseek-v4-pro: 1.6T MoE / 49B active, 1M context. Top open reasoning (MMLU-Pro 87.5, GPQA 90.1, SWE-bench 80.6, LiveCodeBench 93.5). Note (2026-04-30): held back from public availability; see the 'DeepSeek V4 Pro delisted' entry above.
  • Added nvidia/deepseek-v4-flash: 284B / 13B active MoE, 1M context. ~5x faster than V4 Pro for chat/summarization. Caveat: weaker factual recall (SimpleQA 34% vs Pro's 58%) — pick V4 Pro for fact-heavy agent loops once it's re-enabled.
  • Added nvidia/nemotron-3-nano-omni-30b-a3b-reasoning: first vision-capable model in our free tier. ChartQA 90.3, DocVQA 95.6, MMMU 70.8. Accepts text, images, video (≤2 min), audio (≤1 hr). 256K context.
  • Skipped after benchmark review: qwen3-next-80b-a3b-instruct (loses the reasoning comparison on 18/20 benchmarks vs the -thinking variant); qwen3.5-122b-a10b (redundant once Omni covers vision); nemotron-nano-3-30b-a3b (strictly worse than our 49B Super on text reasoning); mistral-medium-3.5-128b (released today, no benchmarks published yet).
  • Total free-tier model count now 15 (was 12).

Async image flow + free-tier rate limit removed + GEO content sweep

  • Image generation API switched to hybrid sync/async: fast models keep returning {data:[…]} inline; slow models (gpt-image-2, grok-imagine-image-pro) return HTTP 202 with {id, poll_url}. This stops Cloudflare 524 timeouts on long generations.
  • @blockrun/llm SDK 1.12.0 ships transparent polling for the new async image flow — public API unchanged.
  • MCP error classifier now uses `instanceof PaymentError` instead of substring match on the error message (no more false 'fund your wallet' messages on 524s).
  • Privacy policy + terms rewritten with explicit 'we do not share your data' stance.
  • NVIDIA gpt-oss-120b/20b free models pulled from public catalog (their free tier may train on prompts).
  • Homepage: TrustStrip + FAQ added, hero copy fixed (dropped 'Talk to us' from product list, retired YOPO eyebrow), Franklin section rewritten to disambiguate 'wallet', partners bar shows brand names alongside icons, 'FREE' pill added to models section.
  • GEO content: /what-is-x402, /what-is-pay-per-call-ai, /glossary, /vs-openrouter, /vs-portkey, /vs-helicone, /changelog (this page). JSON-LD added to /about, /enterprise, /products, /marketplace, /get-started.
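
The MCP error-classifier change in this entry can be illustrated with a small sketch. The PaymentError class below is a stand-in for the SDK's class, and the substring being matched ("payment") is an assumption, since the changelog does not quote the original check.

```typescript
// Stand-in for the SDK's PaymentError; the real class definition is not shown here.
class PaymentError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "PaymentError";
  }
}

type ErrorKind = "payment" | "other";

// Old approach (assumed form): guess the error kind from the message text.
function classifyBySubstring(err: unknown): ErrorKind {
  const msg = err instanceof Error ? err.message : String(err);
  return msg.toLowerCase().includes("payment") ? "payment" : "other";
}

// New approach: decide by the error's type, so message wording cannot mislead.
function classifyByType(err: unknown): ErrorKind {
  return err instanceof PaymentError ? "payment" : "other";
}
```

For example, a plain Error whose message reads "HTTP 524 while settling payment" is labeled "payment" by the substring version and "other" by the type check, which is the kind of false 'fund your wallet' message the change removes.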

Free-tier rescue when paid payment fails

  • Wallets with insufficient USDC are now downgraded to the free fallback model instead of receiving a hard 402.
  • Z.AI GLM-5.1 added as tertiary free fallback (200K context, zero upstream cost via partnership).
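
A minimal sketch of the rescue decision described above. The function name, the micro-USDC units, the health callback, and the choice to return 402 only when no free fallback is usable are assumptions. The full fallback order is not listed in this entry, so the chain is passed in by the caller.

```typescript
type Routed =
  | { status: 200; model: string; tier: "paid" | "free" }
  | { status: 402 };

// Hypothetical router: serve the requested paid model when the wallet can cover it,
// otherwise fall back to the first healthy free model instead of a hard 402.
function routeWithRescue(
  requested: string,
  costMicroUsdc: number,
  balanceMicroUsdc: number,
  freeFallbacks: readonly string[],
  isHealthy: (model: string) => boolean,
): Routed {
  if (balanceMicroUsdc >= costMicroUsdc) {
    return { status: 200, model: requested, tier: "paid" };
  }
  const rescue = freeFallbacks.find(isHealthy);
  return rescue === undefined
    ? { status: 402 }
    : { status: 200, model: rescue, tier: "free" };
}
```

Passing the chain as data keeps the order easy to change, for instance when a new tertiary fallback such as zai/glm-5.1 is appended.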

openai/gpt-5.5 — flagship released

  • Added openai/gpt-5.5 to the model catalog as featured. Replaces gpt-5.4 in the homepage table.
  • Awesome-blockrun submodule bumped with gpt-5.5 sweep.

Image edit timeout bumped + free fallback updated

  • image2image (edit) timeout raised to 180s for gpt-image-2 at >=1536px.
  • Default free-fallback model updated to nvidia/qwen3-next-80b-a3b-thinking (116 tok/s with thinking mode).

moonshot/kimi-k2.6 — Moonshot flagship added

  • Added moonshot/kimi-k2.6: 256K context multi-modal reasoning model with vision and returned reasoning_content. $0.95 in / $4.00 out per 1M.
  • kimi-k2.5 marked hidden but kept routable so existing integrations don't break; new traffic auto-prefers k2.6 via fallbackModel chaining.
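
The "hidden but routable" behavior can be sketched as two separate checks over the catalog. The available and hidden flags appear elsewhere in this changelog; the CatalogEntry shape, the function names, and the moonshot/ prefix on kimi-k2.5 are assumptions.

```typescript
interface CatalogEntry {
  id: string;
  available: boolean; // false: requests for this ID are rejected
  hidden: boolean;    // true: omitted from public listings
}

// Public listing: only entries that are both available and not hidden.
function listPublicModels(catalog: readonly CatalogEntry[]): string[] {
  return catalog.filter((e) => e.available && !e.hidden).map((e) => e.id);
}

// Routing: hidden entries still resolve as long as they are available.
function isRoutable(catalog: readonly CatalogEntry[], id: string): boolean {
  return catalog.some((e) => e.id === id && e.available);
}

const exampleCatalog: CatalogEntry[] = [
  { id: "moonshot/kimi-k2.6", available: true, hidden: false },
  { id: "moonshot/kimi-k2.5", available: true, hidden: true },
];
```

Keeping listing and routing as separate predicates is what lets an older model drop out of the picker without breaking callers that pin its full ID.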

Async video generation

  • Video generation switched to async submit + polled settlement. Removed the 85s upper bound on video duration that the sync flow imposed.
  • Same x-payment header binds caller to job ID across the POST→GET cycle.
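
The submit-then-poll cycle can be sketched as below. Only the x-payment header and the POST then GET pattern come from this entry; the Transport interface, the job status fields, and the polling limits are hypothetical, and the transport is injected so that no endpoint paths are assumed.

```typescript
type JobStatus =
  | { status: "pending" }
  | { status: "done"; url: string }
  | { status: "failed" };

// Hypothetical transport: submit stands for the POST, poll for the GET.
interface Transport {
  submit(headers: Record<string, string>): Promise<{ id: string }>;
  poll(id: string, headers: Record<string, string>): Promise<JobStatus>;
}

async function generateVideo(
  transport: Transport,
  xPayment: string,
  intervalMs = 2_000,
  maxPolls = 150,
): Promise<string> {
  // The same header goes on every request so the job stays bound to the caller.
  const headers = { "x-payment": xPayment };
  const { id } = await transport.submit(headers);
  for (let i = 0; i < maxPolls; i++) {
    const job = await transport.poll(id, headers);
    if (job.status === "done") return job.url;
    if (job.status === "failed") throw new Error(`job ${id} failed`);
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error(`job ${id} still pending after ${maxPolls} polls`);
}
```

Because each poll is a short request, total generation time is bounded by the polling budget rather than by a single connection's timeout.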

On-chain revenue reconciliation

  • Public /metrics page now reconciles cumulative revenue against on-chain settlement transactions on Base.
  • Cumulative wallet count and call count no longer shrink; rolling-window aggregation could previously make them decrease.

Multimodal made visible on homepage

  • ChatGPT Images 2.0 (gpt-image-2), Seedance video, and MiniMax music surfaced in the homepage models table.
  • Pricing page gained first-class Image/Video/Music filters with their own SEO metadata.