Responses API

`POST /v1/responses` is a drop-in replacement for `api.openai.com/v1/responses`, the native protocol for Codex CLI and for agents that need tools and reasoning together on GPT-5.x (a combination OpenAI rejects on `/v1/chat/completions`). The body shape, response bytes, and SSE event stream are the same, apart from the stateless restrictions listed below; you pay per request in USDC over x402 instead of authenticating with an API key.

If your client speaks Chat Completions, use Chat Completions instead — both endpoints serve the same models at the same prices.

Endpoint

POST https://blockrun.ai/api/v1/responses

Request

Headers

Header	Required	Description
`Content-Type`	Yes	Must be `application/json`
`PAYMENT-SIGNATURE`	Conditional	Base64-encoded x402 v2 payment payload; required on the retry that follows a 402 response

Body Parameters

Parameter	Type	Required	Description
`model`	string	Yes	OpenAI model ID (e.g., `gpt-5.5`, `openai/gpt-5.4-pro`, `gpt-5.3-codex`)
`input`	string | array	Yes	A prompt string, or an array of Responses input items (messages, `function_call_output`, reasoning replays, …)
`instructions`	string	No	System/developer instructions
`max_output_tokens`	integer	No	Maximum tokens to generate (bounds the x402 quote)
`stream`	boolean	No	Stream native Responses SSE events (default: `false`)
`tools`	array	No	Responses tool definitions — works together with `reasoning` on all GPT-5.x models
`tool_choice`	string | object	No	Tool selection strategy
`reasoning`	object	No	Reasoning config, e.g. `{"effort": "high"}`
`include`	array	No	Extra output fields, e.g. `["reasoning.encrypted_content"]` for reasoning continuity across turns
`text`	object	No	Output format options (`json_object`, `json_schema`, verbosity)
`temperature` / `top_p`	number	No	Sampling parameters (legacy models only; reasoning models ignore or reject them upstream)

Stateless gateway — what is different from OpenAI

BlockRun's upstream calls run under one org key shared by all payers, so server-side state is disabled. `store: false` is enforced on every request, and the following parameters are rejected with a 400:

Parameter	Why
`store: true`	Responses are never retained upstream
`previous_response_id`	No stored responses to reference — resend full context in `input`
`conversation`	No server-side conversation state
`prompt`	No stored prompt templates
`background: true`	Background jobs complete after the HTTP exchange has ended, leaving no open request to bill against
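A client can catch these cases before paying for a request that will fail. The following is a minimal sketch of such a preflight check, not part of any BlockRun SDK: the function name `stateless_violations` is hypothetical, and treating a `null` value as "not set" is an assumption. The gateway's own 400 response remains authoritative.

```python
from typing import Any

# Parameters the table above lists as rejected.
# Flag-style parameters are only a problem when set to True;
# the others are a problem whenever they carry a non-null value (assumption).
FLAG_PARAMS = ("store", "background")
REFERENCE_PARAMS = ("previous_response_id", "conversation", "prompt")


def stateless_violations(body: dict[str, Any]) -> list[str]:
    """Return the names of parameters the stateless gateway would reject."""
    problems = []
    for key in FLAG_PARAMS:
        if body.get(key) is True:
            problems.append(key)
    for key in REFERENCE_PARAMS:
        if body.get(key) is not None:
            problems.append(key)
    return problems
```

An empty list means the body uses none of the stateful parameters; `store: false` passes because it matches what the gateway enforces anyway.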

This stateless mode is the same pattern Codex CLI uses by default (`store: false` plus a full-context resend). For reasoning continuity across turns, request `include: ["reasoning.encrypted_content"]` and replay the returned reasoning items in the `input` of the next call.
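The full-context resend can be sketched as a small helper that builds the next call's `input` array. This is an illustration under assumptions: the helper name `next_turn_input` and the `tool_results` mapping are hypothetical, the previous `input` is assumed to be in array form already, and output items are replayed verbatim (whether any fields need stripping first is not covered here).

```python
from typing import Any


def next_turn_input(
    previous_input: list[dict[str, Any]],
    response_output: list[dict[str, Any]],
    tool_results: dict[str, str],
) -> list[dict[str, Any]]:
    """Build the complete `input` array for the next stateless call.

    previous_input:  the input items sent on the last call
    response_output: the `output` array of the last response
    tool_results:    call_id -> tool output string, computed locally
    """
    items = list(previous_input)  # copy so the caller's list is not mutated
    # Replay everything the model returned (reasoning, message, function_call).
    items.extend(response_output)
    # Answer each function call that has a locally computed result.
    for item in response_output:
        if item.get("type") == "function_call" and item["call_id"] in tool_results:
            items.append({
                "type": "function_call_output",
                "call_id": item["call_id"],
                "output": tool_results[item["call_id"]],
            })
    return items
```

Because nothing is stored upstream, the returned list grows with every turn and is sent in full each time.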

Supported models

All paid OpenAI models: GPT-5.x (including -pro tiers), o-series, and the codex family. GET /v1/models lists current IDs and prices; the openai/ prefix is optional. Other providers (Claude, Gemini, DeepSeek, …) are served via Chat Completions and Anthropic Messages.

Payment flow

Identical to Chat Completions: send the request without payment, receive a 402 carrying a USDC quote (priced on the estimated input tokens plus 10% of `max_output_tokens`, with a $0.001 minimum), sign the x402 authorization, and resend with the `PAYMENT-SIGNATURE` header. The SDKs handle this automatically.
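For readers not using an SDK, the request, 402, sign, resend loop might look roughly like the sketch below. It is transport-agnostic on purpose: `send` and `sign` are hypothetical callables supplied by the caller, and the sketch makes no assumption about where in the 402 response the quote lives. It simply hands the whole response to `sign`, which is expected to return the base64 payload.

```python
from typing import Any, Callable

# Hypothetical callable shapes for this sketch:
#   send(body, headers)          -> (status_code, response_headers, response_body)
#   sign(headers_402, body_402)  -> base64-encoded x402 payment payload
Send = Callable[[dict[str, Any], dict[str, str]], tuple[int, dict[str, str], Any]]
Sign = Callable[[dict[str, str], Any], str]


def post_with_payment(body: dict[str, Any], send: Send, sign: Sign):
    """Send once unpaid; on a 402, sign the quote and resend the same body."""
    headers = {"Content-Type": "application/json"}
    status, resp_headers, resp_body = send(body, headers)
    if status != 402:
        # Either already accepted or failed for a reason unrelated to payment.
        return status, resp_headers, resp_body
    paid_headers = {**headers, "PAYMENT-SIGNATURE": sign(resp_headers, resp_body)}
    return send(body, paid_headers)
```

Keeping the signer separate from the transport means the same function works with any HTTP client and any wallet implementation.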

Examples

Non-streaming

curl https://blockrun.ai/api/v1/responses \
  -H "Content-Type: application/json" \
  -H "PAYMENT-SIGNATURE: <base64-x402-payload>" \
  -d '{
    "model": "gpt-5.2",
    "input": "Explain x402 in one sentence.",
    "max_output_tokens": 200
  }'

Tools + reasoning (the combination Chat Completions rejects)

curl https://blockrun.ai/api/v1/responses \
  -H "Content-Type: application/json" \
  -H "PAYMENT-SIGNATURE: <base64-x402-payload>" \
  -d '{
    "model": "gpt-5.4",
    "input": "What is the weather in SF? Use the tool.",
    "reasoning": {"effort": "high"},
    "tools": [{
      "type": "function",
      "name": "get_weather",
      "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"]
      }
    }]
  }'

Streaming

Set `"stream": true` and consume the native Responses SSE events (`response.created`, `response.output_text.delta`, …, `response.completed`). The stream is piped through byte-for-byte, so the official OpenAI SDK's event parsing works unchanged.
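Without the SDK, the event stream can be consumed with a few lines of parsing. The sketch below assumes the caller supplies already-decoded text lines (for example from an HTTP client's line iterator) and that each event's `data:` payload is a JSON object; the helper names are hypothetical.

```python
import json
from typing import Iterable, Iterator


def iter_sse_events(lines: Iterable[str]) -> Iterator[tuple[str, dict]]:
    """Yield (event_name, data) pairs from decoded SSE lines."""
    event, data_parts = "", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            value = line[len("data:"):]
            # SSE strips one optional space after the colon.
            data_parts.append(value[1:] if value.startswith(" ") else value)
        elif line == "" and data_parts:
            # A blank line terminates the current event.
            data = json.loads("\n".join(data_parts))
            yield (event or data.get("type", "")), data
            event, data_parts = "", []


def collect_text(lines: Iterable[str]) -> str:
    """Concatenate text deltas until the response completes."""
    chunks = []
    for name, data in iter_sse_events(lines):
        if name == "response.output_text.delta":
            chunks.append(data["delta"])
        elif name == "response.completed":
            break
    return "".join(chunks)
```

Comment lines and unknown fields are ignored, so other event types pass through `iter_sse_events` untouched for callers that need them.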

Response

The native OpenAI Responses object (or SSE event stream) is returned unmodified: `id` (`resp_…`), an `output` array with `reasoning` / `message` / `function_call` items, and `usage` with `input_tokens`, `output_tokens`, `input_tokens_details.cached_tokens`, and `output_tokens_details.reasoning_tokens`. The `X-Payment-Response` header carries the settlement transaction hash on non-streaming calls.
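As an illustration of walking that object, the sketch below pulls the assistant text, any function calls, and the token counters out of a parsed response. The function name `summarize_response` and the shape of its return value are hypothetical; the field names follow the Responses object described above.

```python
import json
from typing import Any


def summarize_response(resp: dict[str, Any]) -> dict[str, Any]:
    """Extract text, function calls, and selected usage counters."""
    text_parts, calls = [], []
    for item in resp.get("output", []):
        if item.get("type") == "message":
            for part in item.get("content", []):
                if part.get("type") == "output_text":
                    text_parts.append(part["text"])
        elif item.get("type") == "function_call":
            calls.append({
                "call_id": item["call_id"],
                "name": item["name"],
                # Arguments arrive as a JSON-encoded string.
                "arguments": json.loads(item["arguments"]),
            })
    usage = resp.get("usage", {})
    return {
        "text": "".join(text_parts),
        "function_calls": calls,
        "input_tokens": usage.get("input_tokens", 0),
        "output_tokens": usage.get("output_tokens", 0),
        "cached_tokens": usage.get("input_tokens_details", {}).get("cached_tokens", 0),
        "reasoning_tokens": usage.get("output_tokens_details", {}).get("reasoning_tokens", 0),
    }
```

Reasoning items are skipped here because their content is only useful when replayed in a later call.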