Free GPT-OSS API.
No key. No subscription.
OpenAI's GPT-OSS — its first open-weights models since GPT-2. 120B and 20B variants, 128K context. Hosted free on NVIDIA, called through BlockRun.
Try it now.
No API key. No wallet. No signup. Paste this into any terminal — the response streams back from GPT-OSS hosted free on NVIDIA, routed through BlockRun.
```shell
curl https://blockrun.ai/api/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "nvidia/gpt-oss-120b",
    "messages": [{"role": "user", "content": "Explain prompt caching in two sentences"}]
  }'
```

| Model | Context | Price | Best for |
| --- | --- | --- | --- |
| gpt-oss-120b | 128K | free | reasoning · coding |
| gpt-oss-20b | 128K | free | coding |
6 ways to use GPT-OSS free.
BlockRun is the access layer. Pick the surface that matches how you build — terminal, notebook, IDE, agent runtime — and the same free models work everywhere.
- 01 · shell

```shell
curl https://blockrun.ai/api/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "nvidia/gpt-oss-120b",
    "messages": [{"role": "user", "content": "Explain prompt caching in two sentences"}]
  }'
```

- 02 · python

```python
# Works with the OpenAI SDK — no key required for free models
from openai import OpenAI

client = OpenAI(
    base_url="https://blockrun.ai/api/v1",
    api_key="not-needed-for-free-models",
)
response = client.chat.completions.create(
    model="nvidia/gpt-oss-120b",
    messages=[{"role": "user", "content": "Explain prompt caching in two sentences"}],
)
print(response.choices[0].message.content)
```

- 03 · ClawRouter
smart router for OpenClaw / Claude Code — auto-picks free models when possible
Learn more →

```shell
# Install once
npm install -g @blockrun/clawrouter

# Then point any OpenAI-compatible client at the local proxy.
# ClawRouter routes to nvidia/gpt-oss-120b (or the cheapest capable model)
# without changing your code.
```

- 04 · typescript

```typescript
// Works with the OpenAI SDK — no key required for free models
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://blockrun.ai/api/v1",
  apiKey: "not-needed-for-free-models",
});

const r = await client.chat.completions.create({
  model: "nvidia/gpt-oss-20b",
  messages: [{ role: "user", content: "Explain prompt caching in two sentences" }],
});
console.log(r.choices[0].message.content);
```

- 05 · Claude Code MCP
8 tools for Claude Code, Cursor & ChatGPT — call any free model from inside your editor
Learn more →

```shell
# Add the BlockRun MCP server (Claude Code, Cursor, or ChatGPT desktop)
claude mcp add blockrun --transport http https://mcp.blockrun.ai/mcp

# Then call from inside the editor:
# blockrun_chat(model="nvidia/gpt-oss-120b", messages=[{role:"user", content:"…"}])
```

- 06 · Franklin
the AI agent with a wallet — free OSS models for routine tasks, paid models on demand
Learn more →

```shell
# Install Franklin
curl -fsSL https://franklin.run/install | sh

# Run with this model
franklin chat --model nvidia/gpt-oss-120b "Summarize the README"
```
We don't share
your data.
Your prompt goes to the AI provider you picked. Nothing else, nowhere else. No training, no retention beyond the request, no profile linking.
- No training, no retention beyond the request. Your prompt is forwarded only to the AI provider you select.
- Wallet in, prompt out. Pseudonymous by default — no email, no phone number, no identity documents.
- Read the code, audit the wire format, run it yourself. @blockrun/llm and blockrun-llm on npm and PyPI.
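The wire format is the same OpenAI-style chat-completions JSON used throughout this page. A minimal, stdlib-only way to inspect exactly what would leave your machine — this sketch only builds and prints the request; nothing is sent unless you call `urlopen` yourself:

```python
import json
import urllib.request

# The exact request body this page's curl example sends.
body = {
    "model": "nvidia/gpt-oss-120b",
    "messages": [{"role": "user", "content": "Explain prompt caching in two sentences"}],
}

req = urllib.request.Request(
    "https://blockrun.ai/api/v1/chat/completions",
    data=json.dumps(body).encode(),
    headers={"Content-Type": "application/json"},
)

# Print what would go over the wire; call urllib.request.urlopen(req) to send.
print(req.full_url)
print(req.headers)
print(req.data.decode())
```

Nothing beyond this JSON is attached to the request — no account ID, no tracking fields.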
Want Claude, GPT-5,
or Gemini too?
No subscription. No monthly minimum. Pay per call in USDC via x402 — same endpoint, same SDK, same model IDs. Connect a wallet, top up $5, and call any frontier model. No credit card.
Everything you might
be wondering.
- Is this a real OpenAI model?
- Yes — GPT-OSS is OpenAI's open-weights release. The model files are public; NVIDIA hosts inference free on build.nvidia.com, and BlockRun routes calls to it without auth.
- How does it compare to GPT-5?
- GPT-OSS 120B is closer to GPT-3.5 / GPT-4 class than the current frontier — but it's free, fast, and ships under a permissive license. For frontier quality, use GPT-5 on the paid tier.
- Why is it not in the public catalog?
- Hidden from /v1/models for privacy reasons (NVIDIA's free tier may use prompts for service improvement). Still callable by ID for legacy / direct integrations.
- 120B vs 20B — which should I use?
- 120B for general reasoning and chat; 20B when you want lower latency or a smaller footprint. Both have the full 128K context window, and both are free.
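Because both variants share the same endpoint and request shape, switching between them is a one-string change. A minimal sketch — the model IDs come from this page, while the routing helper and its name are illustrative:

```python
import json

# Free GPT-OSS variants listed on this page; both share the 128K context window.
FREE_MODELS = {
    "reasoning": "nvidia/gpt-oss-120b",   # general reasoning and chat
    "low_latency": "nvidia/gpt-oss-20b",  # smaller, faster variant
}

def chat_payload(prompt: str, *, low_latency: bool = False) -> str:
    """Build an OpenAI-style chat-completions body for blockrun.ai/api/v1."""
    model = FREE_MODELS["low_latency" if low_latency else "reasoning"]
    return json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    })

print(chat_payload("Explain prompt caching in two sentences", low_latency=True))
```

POST the resulting body to `https://blockrun.ai/api/v1/chat/completions` with `Content-Type: application/json`, exactly as in the curl example above.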