Free Llama 4 API.
No key. No subscription.
Meta's Llama 4 Maverick (17B × 128 experts MoE), 131K context. No key, no wallet, no subscription. Just call it.
Try it now.
No API key. No wallet. No signup. Paste this into any terminal and the response streams back from Llama 4 Maverick, hosted free on NVIDIA and routed through BlockRun.
curl https://blockrun.ai/api/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "nvidia/llama-4-maverick",
"messages": [{"role": "user", "content": "Write a haiku about open-source models"}]
}'

- Context: 131K
- Price: free
- Best for: reasoning · coding
6 ways to use Llama free.
BlockRun is the access layer. Pick the surface that matches how you build — terminal, notebook, IDE, agent runtime — and the same free models work everywhere.
- python

# Works with the OpenAI SDK — no key required for free models
from openai import OpenAI

client = OpenAI(
    base_url="https://blockrun.ai/api/v1",
    api_key="not-needed-for-free-models",
)
response = client.chat.completions.create(
    model="nvidia/llama-4-maverick",
    messages=[{"role": "user", "content": "Write a haiku about open-source models"}],
)
print(response.choices[0].message.content)

- shell

# Install Franklin
curl -fsSL https://franklin.run/install | sh

# Run with this model
franklin chat --model nvidia/llama-4-maverick "Summarize the README"

- shell

curl https://blockrun.ai/api/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "nvidia/llama-4-maverick",
    "messages": [{"role": "user", "content": "Write a haiku about open-source models"}]
  }'

- 04
ClawRouter
smart router for OpenClaw / Claude Code — auto-picks free models when possible
Learn more →

shell

# Install once
npm install -g @blockrun/clawrouter

# Then point any OpenAI-compatible client at the local proxy.
# ClawRouter routes to nvidia/llama-4-maverick (or the cheapest capable model)
# without changing your code.

- 05
Claude Code MCP
8 tools for Claude Code, Cursor & ChatGPT — call any free model from inside your editor
Learn more →

shell

# Add the BlockRun MCP server (Claude Code, Cursor, or ChatGPT desktop)
claude mcp add blockrun --transport http https://mcp.blockrun.ai/mcp

# Then call from inside the editor:
# blockrun_chat(model="nvidia/llama-4-maverick", messages=[{role:"user", content:"…"}])

- typescript

// Works with the OpenAI SDK — no key required for free models
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://blockrun.ai/api/v1",
  apiKey: "not-needed-for-free-models",
});

const r = await client.chat.completions.create({
  model: "nvidia/llama-4-maverick",
  messages: [{ role: "user", content: "Write a haiku about open-source models" }],
});

console.log(r.choices[0].message.content);
We don't share your data.
Your prompt goes to the AI provider you picked. Nothing else, nowhere else. No training, no retention beyond the request, no profile linking.
- No training, no retention beyond the request. Your prompt is forwarded only to the AI provider you select.
- Wallet in, prompt out. Pseudonymous by default — no email, no phone number, no identity documents.
- Open source. Read the code, audit the wire format, run it yourself: @blockrun/llm on npm, blockrun-llm on PyPI.
Want Claude, GPT-5, or Gemini too?
No subscription. No monthly minimum. Pay per call in USDC via x402 — same endpoint, same SDK, same model IDs. Connect a wallet, top up $5, and call any frontier model. No credit card required.
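"Same endpoint, same SDK, same model IDs" means only the model field changes between a free and a paid call — a minimal sketch, where the paid model ID is illustrative (payment itself is handled by the x402/wallet layer, not the request body):

```python
def chat_body(model: str, prompt: str) -> dict:
    # The chat-completions request body is identical for free and paid models.
    return {"model": model, "messages": [{"role": "user", "content": prompt}]}

free = chat_body("nvidia/llama-4-maverick", "Write a haiku")
paid = chat_body("anthropic/claude-sonnet-4", "Write a haiku")  # illustrative paid model ID

# Only the model field differs; everything else is the same wire format.
diff = {k for k in free if free[k] != paid[k]}
print(diff)
# → {'model'}
```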
Everything you might be wondering.
- Is the Llama 4 API really free?
- Yes. Meta's Llama 4 Maverick is hosted free on NVIDIA's build.nvidia.com tier and exposed through BlockRun without payment, signup, or wallet. We pass NVIDIA's quota straight through.
- Do I need an API key?
- No. The endpoint accepts unauthenticated requests when the model is free. For paid models (Claude, GPT-5, Gemini), you connect a wallet and pay per call via x402.
- What's the catch?
- NVIDIA's free tier may use prompts for service improvement — don't send PII or proprietary data. For private inference, route to paid models or run Llama yourself.
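Since free-tier prompts may be used for service improvement, it can help to scrub obvious PII client-side before sending. A minimal sketch — the regexes catch common email and phone patterns only, and are no substitute for keeping sensitive data out of free-tier requests:

```python
import re

# Redact obvious emails and phone numbers before a prompt leaves your machine.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
PHONE = re.compile(r"\+?\d[\d\s().-]{7,}\d")

def scrub(prompt: str) -> str:
    prompt = EMAIL.sub("[EMAIL]", prompt)
    prompt = PHONE.sub("[PHONE]", prompt)
    return prompt

print(scrub("Contact jane.doe@example.com or +1 (555) 123-4567 for access."))
# → Contact [EMAIL] or [PHONE] for access.
```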
- What's the request size limit?
- 128 KB per request body for free models (vs 5 MB for paid). Plenty for most chat workloads but not for long context dumps.
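The 128 KB cap can be checked client-side before sending — a sketch that measures the serialized request body the same way it goes over the wire (the limit value comes from this FAQ):

```python
import json

FREE_TIER_LIMIT = 128 * 1024  # bytes per request body for free models

def body_size(model: str, messages: list) -> int:
    # Size of the JSON request body as serialized for the wire.
    return len(json.dumps({"model": model, "messages": messages}).encode("utf-8"))

msgs = [{"role": "user", "content": "Write a haiku about open-source models"}]
size = body_size("nvidia/llama-4-maverick", msgs)
assert size < FREE_TIER_LIMIT  # small chat payloads fit easily

# A long context dump, by contrast, blows past the cap:
big = [{"role": "user", "content": "x" * 200_000}]
assert body_size("nvidia/llama-4-maverick", big) > FREE_TIER_LIMIT
```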
- Can I use the OpenAI SDK?
- Yes. BlockRun is OpenAI-compatible — set base_url to https://blockrun.ai/api/v1 and any OpenAI client just works.