BlockRun

Rate Limits

BlockRun's rate-limiting model is intentionally minimal: paid inference endpoints have no platform-side quota. You pay per call in USDC via x402, and the cost of a paid request is the only cap on call volume. The effective rate limit your code will see comes from the upstream capacity behind the model you called — not from BlockRun's gateway.

No platform quota on paid inference

There is no per-wallet quota, no daily cap, no TPM/RPM limit imposed by BlockRun on paid inference. The economic cost of each call (settled in USDC at request time) is the abuse boundary. Only discovery/metadata endpoints carry small per-IP limits.

Summary

| Surface | Platform quota | Notes |
| --- | --- | --- |
| POST /v1/chat/completions (paid LLMs) | none | upstream limit applies |
| POST /v1/messages (Anthropic-compatible) | none | upstream limit applies |
| POST /v1/images/generations | none | upstream limit applies |
| POST /v1/videos/generations | none | upstream limit applies |
| POST /v1/audio/generations | none | upstream limit applies |
| POST /v1/voice/call | none | upstream limit applies |
| GET /v1/models, /v1/{image,video,audio}/models | 100 req / hour per IP | metadata endpoints |
| GET /api/pricing | 100 req / hour per IP | metadata endpoint |
| GET /api/health/* | 60 req / minute per IP | infrastructure health |

How upstream rate limits surface

When an upstream provider rate-limits a request, BlockRun returns a 429 Too Many Requests response with a source tag and a retry hint, so your client can either retry or fail over to a same-tier model.

Response shape

{
  "error": "Rate limited",
  "code": "RATE_LIMITED",
  "source": "<source-tag>",
  "retry_after_seconds": 60,
  "details": "<upstream error message>"
}

Response headers

HTTP/1.1 429 Too Many Requests
Retry-After: 60
X-RateLimit-Source: <source-tag>
  • Retry-After — RFC-7231 compliant; seconds to wait before retrying. BlockRun extracts this from the upstream error when available, otherwise defaults to 60.
  • X-RateLimit-Source — an opaque source tag for the capacity pool that hit the limit. Treat it as a coarse failover hint, not a stable identifier.
  • source field in JSON body — same value, mirrored for clients that prefer body parsing over headers.
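
The header and body fields above can be folded into one normalized hint before deciding what to do next. The following is a minimal sketch, not an official client: `parseRateLimit` is a hypothetical helper name, it assumes header names have already been lowercased into a plain object, and it prefers the header value over the body value as an arbitrary choice.

```javascript
// Sketch: normalize a 429 from a paid endpoint into { retryAfterSeconds, source }.
// `headers` is a plain object with lowercase header names (an assumption of this sketch);
// `body` is the parsed JSON error body, or null if it could not be parsed.
const DEFAULT_RETRY_SECONDS = 60; // mirrors the documented default

function parseRateLimit(headers, body) {
  const fromHeader = Number.parseInt(headers['retry-after'] ?? '', 10);
  const fromBody = body?.retry_after_seconds;

  let retryAfterSeconds = DEFAULT_RETRY_SECONDS;
  if (Number.isFinite(fromHeader) && fromHeader >= 0) {
    retryAfterSeconds = fromHeader;
  } else if (typeof fromBody === 'number' && Number.isFinite(fromBody) && fromBody >= 0) {
    retryAfterSeconds = fromBody;
  }

  // The same tag is mirrored in the header and in the body's `source` field.
  const source = headers['x-ratelimit-source'] ?? body?.source ?? 'unknown';
  return { retryAfterSeconds, source };
}
```

With a `fetch` response, the `headers` argument could be built with `Object.fromEntries(res.headers)`, which yields lowercase names.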

Recommended client behavior

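const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// `callWithModel(model)` stands for your own wrapper that sends the paid
// request (including x402 payment) and resolves to a fetch Response.
async function callWithFailover(model, fallbackModel, maxRetries = 2) {
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    const res = await callWithModel(model);
    if (res.status !== 429) return res;

    const parsed = parseInt(res.headers.get('retry-after') ?? '60', 10);
    const retryAfter = Number.isNaN(parsed) ? 60 : parsed;
    const source = res.headers.get('x-ratelimit-source') ?? 'unknown';
    console.warn(`429 from ${source} (attempt ${attempt + 1})`);

    // Option A: same model; wait the hinted delay, doubling it on each retry
    if (attempt < maxRetries) await sleep(retryAfter * 1000 * 2 ** attempt);
  }

  // Option B: retries exhausted, so fail over to a same-tier model
  // e.g. openai/gpt-5.4 -> anthropic/claude-sonnet-4.6 (200K out)
  return callWithModel(fallbackModel);
}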
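
Because the source tag is only a coarse hint, one way to use it is as a short-lived cooldown key when choosing which model to try next. The sketch below is illustrative only: `FailoverPicker` and the tags `pool-a` and `pool-b` are hypothetical, and the model-to-source mapping is learned purely from 429 responses the client has already seen.

```javascript
// Sketch: pick the first preferred model whose last-known source tag is not cooling down.
class FailoverPicker {
  constructor(models) {
    this.models = models;       // ordered by preference
    this.sourceOf = new Map();  // model -> source tag seen on its most recent 429
    this.coolUntil = new Map(); // source tag -> time (ms) after which it may be retried
  }

  // Record a 429 for `model` and put its source tag on cooldown.
  recordRateLimit(model, source, retryAfterSeconds, nowMs = Date.now()) {
    this.sourceOf.set(model, source);
    this.coolUntil.set(source, nowMs + retryAfterSeconds * 1000);
  }

  // Returns a model name, or null when every candidate is cooling down.
  pick(nowMs = Date.now()) {
    for (const model of this.models) {
      const source = this.sourceOf.get(model);
      const until = source === undefined ? 0 : (this.coolUntil.get(source) ?? 0);
      if (until <= nowMs) return model;
    }
    return null;
  }
}
```

Keeping the state in memory and keyed by time means stale tags simply expire, which fits the guidance that the tag is not a stable identifier.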

Upstream capacity (reference)

These are the orders of magnitude BlockRun's shared capacity currently runs at, by model family. They are not contractual and change as we re-tier capacity; treat them as ballpark, not SLAs.

| Model family | Typical RPM | Typical TPM | Notes |
| --- | --- | --- | --- |
| Flagship chat (openai/*, anthropic/*, google/*) | thousands / model | hundreds of K–millions / model | shared capacity across all paid traffic |
| Cost-efficient chat (deepseek/*, xai/*, moonshot/*, minimax/*, zai/*) | thousands+ / model | generous | usually no observed throttling at current traffic |
| Free tier (nvidia/* open-weight) | ~60 RPM per IP | varies | per-source IP throttling on the free tier; high-concurrency callers should use a paid model |
| Video (bytedance/*, */sora-2) | varies per model | varies | generation jobs are async; throttling typically surfaces as long queue waits, not 429 |
| Music / speech / voice | per-job | n/a | per-job or per-account concurrency caps |
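
For the free tier, a client can stay under the approximate per-IP ceiling by pacing itself rather than waiting for a 429. Below is a minimal sliding-window sketch. The class name is hypothetical, and the 60-per-minute figure is taken from the ballpark in the table, not from a guaranteed limit.

```javascript
// Sketch: client-side sliding-window limiter (at most maxRequests per windowMs).
class SlidingWindowLimiter {
  constructor(maxRequests, windowMs) {
    this.maxRequests = maxRequests;
    this.windowMs = windowMs;
    this.timestamps = []; // send times inside the current window, oldest first
  }

  // Returns 0 if a request may be sent now (and records it),
  // otherwise the number of milliseconds to wait before trying again.
  tryAcquire(nowMs = Date.now()) {
    while (this.timestamps.length > 0 && nowMs - this.timestamps[0] >= this.windowMs) {
      this.timestamps.shift(); // drop sends that have left the window
    }
    if (this.timestamps.length < this.maxRequests) {
      this.timestamps.push(nowMs);
      return 0;
    }
    return this.timestamps[0] + this.windowMs - nowMs;
  }
}

// Ballpark free-tier pacing: roughly 60 requests per minute from one IP.
const freeTierLimiter = new SlidingWindowLimiter(60, 60_000);
```

A caller would check `freeTierLimiter.tryAcquire()` before each request and sleep for the returned number of milliseconds when it is non-zero.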

Why no platform quota?

BlockRun's pay-per-call model uses economic pricing as the abuse boundary instead of platform quotas:

  • Every paid request costs USDC settled at request time via x402.
  • A bad actor running 10,000 calls/sec pays 10,000× the per-call price every second. At flagship-model prices that is real money out of their wallet, not free abuse.
  • Hard quotas would force every customer into the same bucket regardless of willingness-to-pay, defeating the value proposition.
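
The arithmetic behind the second bullet, and the matching client-side consequence (with no platform cap, a runaway loop in your own code is also billed per call), can be sketched as follows. The per-call price is a made-up figure for illustration, and `SpendGuard` is a hypothetical helper, not part of any BlockRun SDK. Amounts are kept as integers in micro-USDC (USDC has six decimal places) to avoid floating-point drift.

```javascript
const MICRO_USDC = 1_000_000; // 1 USDC = 1,000,000 micro-USDC

// Hypothetical numbers, for illustration only.
const pricePerCallMicro = 2_000; // 0.002 USDC per call (assumed, not a real price)
const callsPerSecond = 10_000;
const costPerSecondUsdc = (pricePerCallMicro * callsPerSecond) / MICRO_USDC; // 20 USDC every second

// Sketch: a local budget that refuses calls once a spend ceiling is reached.
class SpendGuard {
  constructor(budgetMicro) {
    this.budgetMicro = budgetMicro;
    this.spentMicro = 0;
  }

  // Reserves the price of one call; returns false if it would exceed the budget.
  tryCharge(priceMicro) {
    if (this.spentMicro + priceMicro > this.budgetMicro) return false;
    this.spentMicro += priceMicro;
    return true;
  }
}
```

Calling `tryCharge` before each paid request gives your own code the hard ceiling that the platform deliberately does not impose.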

If you need guaranteed capacity (dedicated key pool, reserved provider TPM, custom 429 behavior, or an SLA), reach out about enterprise dedicated capacity — we'll provision isolated capacity outside the shared pool. Email care@blockrun.ai or DM @bc1max on Telegram.

Discovery endpoint quotas (metadata only)

The IP-throttled endpoints listed at the top of this page protect against discovery-endpoint scraping. Real product traffic should never hit these limits.

If you exceed them you'll get:

{ "error": "Rate limit exceeded" }

along with HTTP 429 and an X-RateLimit-Reset: <unix-ms> header (a Unix timestamp in milliseconds). Wait until the reset time, then retry.
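
Turning the reset timestamp into a wait time is a one-line subtraction, but it is worth guarding against a missing or malformed header. A minimal sketch, where `msUntilReset` is a hypothetical helper and the 60-second fallback is an assumption of this example rather than documented behavior:

```javascript
// Sketch: how long to wait before retrying a discovery endpoint after a 429.
// `resetHeader` is the raw X-RateLimit-Reset value (Unix time in milliseconds) or null.
function msUntilReset(resetHeader, nowMs = Date.now(), fallbackMs = 60_000) {
  if (resetHeader == null || resetHeader === '') return fallbackMs;
  const resetAt = Number(resetHeader);
  if (!Number.isFinite(resetAt)) return fallbackMs;
  return Math.max(0, resetAt - nowMs); // never negative if the reset has already passed
}
```

With `fetch`, the first argument would be `res.headers.get('x-ratelimit-reset')`.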

Need higher limits?

  • Paid inference: there is no platform cap; the upstream provider's per-model RPM/TPM is your ceiling. Concurrency above that ceiling requires either fail-over to other providers or enterprise dedicated capacity.
  • Discovery endpoints: cache locally — /v1/models updates only when we ship a model change.
  • Enterprise dedicated capacity: isolated key pools, reserved provider TPM, custom SLAs. Contact us.
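
The local-caching advice for discovery endpoints can be sketched as a small time-based cache around whatever function fetches the model list. `createCachedLoader` is a hypothetical helper, and the one-hour TTL in the usage note is an arbitrary choice; pick whatever freshness your application needs.

```javascript
// Sketch: cache the result of an async loader for ttlMs, sharing one in-flight request.
function createCachedLoader(load, ttlMs, now = () => Date.now()) {
  let value;
  let expiresAt = -Infinity;
  let inFlight = null;

  return async function get() {
    if (now() < expiresAt) return value; // still fresh: no network call
    if (!inFlight) {
      inFlight = load()
        .then((v) => {
          value = v;
          expiresAt = now() + ttlMs;
          return v;
        })
        .finally(() => {
          inFlight = null; // allow a new attempt after success or failure
        });
    }
    return inFlight; // concurrent callers share the same request
  };
}
```

For example, `createCachedLoader(() => fetch(modelsUrl).then((r) => r.json()), 60 * 60 * 1000)` would hit /v1/models at most about once per hour per process, where `modelsUrl` is a placeholder for the full endpoint URL.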
