BlockRun

Rate Limits

BlockRun's rate-limiting model is intentionally minimal: paid inference endpoints have no platform-side quota. You pay per call in USDC via x402, and the cost of a paid request is the only cap on call volume. The effective rate limit your code will see comes from the upstream provider whose model you called (OpenAI, Anthropic, Google, etc.) — not from BlockRun's gateway.

Summary

SurfacePlatform quotaNotes
POST /v1/chat/completions (paid LLMs)noneupstream provider limit applies
POST /v1/messages (Anthropic-compatible)noneAnthropic upstream limit applies
POST /v1/images/generationsnoneupstream provider limit applies
POST /v1/videos/generationsnonetoken360 upstream limit applies
POST /v1/audio/generationsnoneSuno upstream limit applies
POST /v1/voice/callnoneBland.ai upstream limit applies
GET /v1/models, /v1/{image,video,audio}/models100 req / hour per IPmetadata endpoints
GET /api/pricing100 req / hour per IPmetadata endpoint
GET /api/health/*60 req / minute per IPinfrastructure health

There is no per-wallet quota, no daily cap, no TPM/RPM limit imposed by BlockRun on paid inference. The economic cost of each call (settled in USDC at request time) is the abuse mitigation.

How upstream rate limits surface

When the upstream provider rate-limits a request, BlockRun returns a 429 Rate Limited response with the upstream provider name and a retry hint, so your client can either retry or fail over to a same-tier model on a different provider.

Response shape

{
  "error": "Rate limited",
  "code": "RATE_LIMITED",
  "source": "openai",
  "retry_after_seconds": 60,
  "details": "<upstream provider's error message>"
}

Response headers

HTTP/1.1 429 Too Many Requests
Retry-After: 60
X-RateLimit-Source: openai
  • Retry-After — RFC-7231 compliant; seconds to wait before retrying. BlockRun extracts this from the upstream provider's error when available, otherwise defaults to 60.
  • X-RateLimit-Source — which provider hit the limit (openai, anthropic, google, deepseek, xai, nvidia, moonshot, minimax, zai, token360, suno, bland, etc.).
  • source field in JSON body — same value, mirrored for clients that prefer body parsing over headers.

Recommended client behavior

const res = await fetch(url, { method: 'POST', ... });

if (res.status === 429) {
  const retryAfter = parseInt(res.headers.get('retry-after') ?? '60', 10);
  const source = res.headers.get('x-ratelimit-source') ?? 'unknown';

  // Option A: same provider, exponential backoff
  await sleep(retryAfter * 1000);
  return retry();

  // Option B: fail over to a same-tier model on a different provider
  // e.g. openai/gpt-5.4 (8K out) -> anthropic/claude-sonnet-4.6 (200K out)
  if (source === 'openai') {
    return callWithModel('anthropic/claude-sonnet-4.6');
  }
}

Per-provider upstream limits (reference)

These are the upstream provider tiers BlockRun currently routes through. They are not contractual and change as we re-tier with each provider; treat them as orders of magnitude, not SLAs.

ProviderTypical RPMTypical TPMNotes
OpenAI (tier 5)~10,000 / model~2M / modelshared key pool across all paid traffic
Anthropic~4,000 / model~400K / modelshared key pool
Google Gemini~1,000 / model~4M / modelshared key pool
DeepSeek~10,000+generousshared key pool
xAI (Grok)~3,000 / model~1M / modelshared key pool
NVIDIA (free models)~60 RPM per IPvariesNVIDIA enforces per-source IP throttling on the free tier; high-concurrency callers should consider a paid model
Moonshot, MiniMax, ZAIprovider-specificprovider-specificusually generous (no observed throttling at current traffic)
token360 (video)varies per modelvariesSeedance generation jobs are async; throttling typically surfaces as long queue waits, not 429
Suno (music)provider-specificn/aper-job quotas
Bland.ai (voice)provider-specificn/aper-account concurrency cap

Why no platform quota?

BlockRun's pay-per-call model uses economic pricing as the abuse boundary instead of platform quotas:

  • Every paid request costs USDC settled at request time via x402.
  • A bad actor running 10,000 calls/sec costs themselves 10,000× the per-call price — at OpenAI/Anthropic prices that's actual money out of their wallet, not free abuse.
  • Hard quotas would force every customer into the same bucket regardless of willingness-to-pay, defeating the value proposition.

If you need guaranteed capacity (dedicated key pool, reserved provider TPM, custom 429 behavior, or an SLA), reach out about enterprise dedicated capacity — we'll provision isolated capacity outside the shared pool. Email care@blockrun.ai or DM @bc1max on Telegram.

Discovery endpoint quotas (metadata only)

The IP-throttled endpoints listed at the top of this page protect against discovery-endpoint scraping. Real product traffic should never hit these limits.

If you exceed them you'll get:

{ "error": "Rate limit exceeded" }

with HTTP 429 and X-RateLimit-Reset: <unix-ms>. Wait until reset, then retry.

Need higher limits?

  • Paid inference: there is no platform cap; the upstream provider's per-model RPM/TPM is your ceiling. Concurrency above that ceiling requires either fail-over to other providers or enterprise dedicated capacity.
  • Discovery endpoints: cache locally — /v1/models updates only when we ship a model change.
  • Enterprise dedicated capacity: isolated key pools, reserved provider TPM, custom SLAs. Contact us.