Free · DeepSeek V4

Free DeepSeek V4 API.
No key. No subscription.

DeepSeek V4 Flash (284B / 13B active, 1M context, ~5× faster than V4 Pro) and V4 Pro (1.6T MoE / 49B active, 1M context). Frontier reasoning. No key.

Quickstart · 10 seconds

Try it now.

No API key. No wallet. No signup. Paste this into any terminal and the response comes back from DeepSeek, hosted free on NVIDIA and routed through BlockRun.

shell
curl https://blockrun.ai/api/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "nvidia/deepseek-v4-flash",
    "messages": [{"role": "user", "content": "Solve: 7 friends share 23 cookies equally; how many are left over?"}]
  }'
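For reference, the sample prompt is easy to check by hand; a one-liner confirms the answer the model should reach (plain Python, nothing BlockRun-specific):

```python
# 23 cookies split among 7 friends: 3 each, 2 left over
each, left_over = divmod(23, 7)
print(f"{each} each, {left_over} left over")  # 3 each, 2 left over
```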
Spotlight

DeepSeek V4 Flash (Free) · nvidia/deepseek-v4-flash
Context: 1M · Price: free · Best for: reasoning · coding

DeepSeek V4 Pro (Free) · nvidia/deepseek-v4-pro
Context: 1M · Price: free · Best for: reasoning · coding
Six ways to call it

Six ways to use DeepSeek for free.

BlockRun is the access layer. Pick the surface that matches how you build — terminal, notebook, IDE, agent runtime — and the same free models work everywhere.

  1. Franklin

    DeepSeek V4 + Franklin = a strong terminal reasoning agent, free

    shell
    # Install Franklin
    curl -fsSL https://franklin.run/install | sh
    
    # Run with this model
    franklin chat --model nvidia/deepseek-v4-flash "Summarize the README"
  2. Python SDK

    pip install blockrun-llm — or any OpenAI-compatible client

    python
    # Works with the OpenAI SDK — no key required for free models
    from openai import OpenAI
    
    client = OpenAI(
        base_url="https://blockrun.ai/api/v1",
        api_key="not-needed-for-free-models",
    )
    
    response = client.chat.completions.create(
        model="nvidia/deepseek-v4-flash",
        messages=[{"role": "user", "content": "Solve: 7 friends share 23 cookies equally; how many are left over?"}],
    )
    print(response.choices[0].message.content)
  3. cURL

    no key, no wallet, paste in any terminal

    shell
    curl https://blockrun.ai/api/v1/chat/completions \
      -H "Content-Type: application/json" \
      -d '{
        "model": "nvidia/deepseek-v4-flash",
        "messages": [{"role": "user", "content": "Solve: 7 friends share 23 cookies equally; how many are left over?"}]
      }'
  4. ClawRouter

    smart router for OpenClaw / Claude Code — auto-picks free models when possible

    shell
    # Install once
    npm install -g @blockrun/clawrouter
    
    # Then point any OpenAI-compatible client at the local proxy.
    # ClawRouter routes to nvidia/deepseek-v4-flash (or the cheapest capable model)
    # without changing your code.
  5. Claude Code MCP

    8 tools for Claude Code, Cursor & ChatGPT — call any free model from inside your editor

    shell
    # Add the BlockRun MCP server (Claude Code, Cursor, or ChatGPT desktop)
    claude mcp add blockrun --transport http https://mcp.blockrun.ai/mcp
    
    # Then call from inside the editor:
    #   blockrun_chat(model="nvidia/deepseek-v4-flash", messages=[{role:"user", content:"…"}])
  6. TypeScript SDK

    npm install @blockrun/llm — or any OpenAI-compatible client

    typescript
    // Works with the OpenAI SDK — no key required for free models
    import OpenAI from "openai";
    
    const client = new OpenAI({
      baseURL: "https://blockrun.ai/api/v1",
      apiKey: "not-needed-for-free-models",
    });
    
    const r = await client.chat.completions.create({
      model: "nvidia/deepseek-v4-flash",
      messages: [{ role: "user", content: "Solve: 7 friends share 23 cookies equally; how many are left over?" }],
    });
    console.log(r.choices[0].message.content);
Trust / Defaults

We don't share
your data.

Your prompt goes to the AI provider you picked. Nothing else, nowhere else. No training, no retention beyond the request, no profile linking.

We don't share your data
No training, no retention beyond the request. Your prompt is forwarded only to the AI provider you select.
No accounts, no KYC
Wallet in, prompt out. Pseudonymous by default — no email, no phone number, no identity documents.
Open-source SDKs, MIT
Read the code, audit the wire format, run it yourself. @blockrun/llm and blockrun-llm on npm and PyPI.
When free isn't enough

Want Claude, GPT-5,
or Gemini too?

No subscription. No monthly minimum. Pay per call in USDC via x402: same endpoint, same SDK, same model IDs. Connect a wallet, top up $5, and call any frontier model. No credit card.

FAQ

Everything you might
be wondering.

What's the difference between V4 Flash and V4 Pro?
V4 Flash is 284B / 13B active — about 5× faster, great for chat and summarization (MMLU-Pro 86.2). V4 Pro is the full 1.6T MoE / 49B active — slower but stronger on factual recall (SimpleQA 58 vs 34) and the hardest reasoning. Both expose 1M context.
Both are free?
Yes. Hosted free on NVIDIA's build.nvidia.com tier and passed straight through BlockRun. No signup, no wallet.
Can I use the full 1M context for free?
Yes, but the request body is capped at 128 KB on the free tier. For multi-megabyte context dumps, switch to a paid model with a wallet (5 MB cap).
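One way to stay under that cap is to measure the serialized body before you send it. A minimal sketch, assuming a plain JSON payload; `fits_free_tier` is an illustrative helper, not part of any BlockRun SDK:

```python
import json

FREE_TIER_CAP_BYTES = 128 * 1024  # 128 KB request-body cap on the free tier

def fits_free_tier(model: str, messages: list) -> bool:
    # Serialize exactly what would go over the wire and measure it in bytes.
    body = json.dumps({"model": model, "messages": messages})
    return len(body.encode("utf-8")) <= FREE_TIER_CAP_BYTES

# ~200 KB of context: over the free cap, under the paid 5 MB cap
msgs = [{"role": "user", "content": "x" * 200_000}]
print(fits_free_tier("nvidia/deepseek-v4-flash", msgs))  # False
```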
Is DeepSeek V4 good for code?
Yes. V4 Pro hits LiveCodeBench 93.5 and SWE-bench 80.6, competitive with Claude Sonnet on code. V4 Flash is the faster option for routine code chat.
What if NVIDIA's upstream is down?
BlockRun automatically falls back across the free tier (V4 Pro → V4 Flash → other healthy free models), so your call still returns a 200.
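That fallback order can be sketched as a simple loop. This is an illustration of what BlockRun does server-side, not something you need in your own code; `call_model` is a stand-in for any client function:

```python
FALLBACK_ORDER = [
    "nvidia/deepseek-v4-pro",
    "nvidia/deepseek-v4-flash",
    # ...then other healthy free models
]

def chat_with_fallback(call_model, messages):
    # Try each free model in order; return the first successful response.
    last_error = None
    for model in FALLBACK_ORDER:
        try:
            return call_model(model, messages)
        except Exception as exc:  # upstream down, rate limited, etc.
            last_error = exc
    raise RuntimeError("no healthy free model available") from last_error
```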