Rate limits

X-RP-* headers, the 429 response shape, and the recommended back-off strategy.

The Public API enforces three independent guards:

  • A per-key burst limit of 120 requests per rolling minute, applied to every API key regardless of plan. This is a request count, not a sum of request units, and exists so a runaway loop on one key cannot starve the others. State is enforced server-side; you discover saturation via a 429 with Retry-After.
  • A stricter LLM-tier limit layered on top of the burst limiter for the handful of endpoints that dispatch LLM work. See LLM-tier rate limit below.
  • A monthly request-unit quota sized by your API plan. Every metered response carries the post-debit snapshot in X-RP-Quota-* headers so you can schedule work without polling.

LLM-tier rate limit [#llm-tier]

A small set of /v1 endpoints dispatch LLM work and therefore sit behind a secondary, much smaller per-minute bucket on top of the global burst limiter. The tighter cap is what protects you (and the platform) from a runaway loop spending your monthly credits before you can react.

The reference page for each affected endpoint shows a rate-limit badge in the header next to scope, cost and idempotent. Today the LLM tier applies to:

  • POST /v1/brands/{brand_id}/facts: Re-runs the brand-facts research workflow.
  • POST /v1/reports/{report_id}/runs: Re-runs an existing report.
  • POST /v1/reports/{report_id}/prompts: Bulk-creates prompts (manual or auto-generated).
  • POST /v1/scheduled-reports/{scheduled_report_id}/executions: Triggers an out-of-band run of a schedule.

The buckets are split per principal type so a flooded dashboard tab cannot blow through your integration’s budget:

  • API keys: 10 requests per minute (per key).
  • Dashboard / JWT principals: 30 requests per minute (per user).
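Client-side pacing keeps a worker under the tighter bucket before the server ever has to say no. A minimal sketch, assuming the documented 10-requests-per-minute per-key cap; `MinuteBucket` is a hypothetical helper, not part of any SDK:

```python
import time
from collections import deque


class MinuteBucket:
    """Client-side pacer for a rolling-window limit (e.g. 10 LLM-tier calls/min per key)."""

    def __init__(self, limit: int = 10, window: float = 60.0):
        self.limit = limit
        self.window = window
        self.sent: deque[float] = deque()  # monotonic timestamps of recent calls

    def acquire(self) -> None:
        """Block until another call fits inside the rolling window, then record it."""
        while True:
            now = time.monotonic()
            # Drop timestamps that have aged out of the rolling window.
            while self.sent and now - self.sent[0] >= self.window:
                self.sent.popleft()
            if len(self.sent) < self.limit:
                self.sent.append(now)
                return
            # Sleep until the oldest recorded call leaves the window.
            time.sleep(self.window - (now - self.sent[0]))
```

Call `acquire()` before each LLM-tier request. This only lowers the odds of a 429; the server-side limiter remains authoritative, so the back-off handling below is still required.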

Saturation returns the same 429 rate_limit_exceeded envelope as the burst limiter, but with details.scope = "llm" so a client can branch on the remediation:

{
  "error": {
    "code": "rate_limit_exceeded",
    "message": "Per-minute rate limit exceeded"
  },
  "details": {
    "scope": "llm"
  },
  "request_id": "01HZK3X4P5..."
}

Use the same Retry-After-driven back-off described below — there is no extra header to read.
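A handler can branch on the scope once the envelope is parsed. A sketch, assuming the body has been decoded to a dict; `classify_429` is a hypothetical helper, and it checks both plausible locations of `details` since burst-limiter 429s may omit the field entirely:

```python
def classify_429(body: dict) -> str:
    """Return a remediation hint for a parsed 429 rate_limit_exceeded envelope."""
    code = body.get("error", {}).get("code")
    # details may sit at the top level or be nested under error, depending on parsing.
    details = body.get("details") or body.get("error", {}).get("details") or {}
    if code == "rate_limit_exceeded" and details.get("scope") == "llm":
        return "llm-tier bucket saturated: back off and consider queueing LLM jobs"
    if code == "rate_limit_exceeded":
        return "burst limit hit: sleep Retry-After seconds and resume"
    return f"unexpected 429 code: {code!r}"
```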

Response headers

Every metered /v1/* response (i.e. any response that actually moved the quota counter: 2xx responses, plus 4xx auth/validation errors returned after admission) carries:

  • X-RP-Request-Units: Request units this single call cost. 0 for free endpoints (e.g. GET /v1/me).
  • X-RP-Quota-Limit: Total request units included in the active billing period.
  • X-RP-Quota-Used: Request units consumed in the current period (post-debit).
  • X-RP-Quota-Remaining: Request units left until the next period reset.
  • X-RP-Quota-Reset: Unix timestamp (seconds) when the billing period and quota counter reset.
  • X-RP-Duration-Ms: Server-side processing time for the request, in milliseconds.
  • Retry-After: Seconds to wait before retrying. Only present on 429 rate_limit_exceeded responses.

The per-minute burst limit is not mirrored in headers; saturate it and you get a 429 rate_limit_exceeded with Retry-After (typically < 60s). The X-Correlation-ID response header echoes the request_id field surfaced in every error envelope.
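A worker can lift the snapshot straight off any metered response. A minimal sketch, assuming a mapping of header names to string values (the shape both httpx and requests expose); `QuotaSnapshot` and `read_quota` are hypothetical names:

```python
import time
from dataclasses import dataclass


@dataclass
class QuotaSnapshot:
    units: int      # X-RP-Request-Units: cost of this single call
    limit: int      # X-RP-Quota-Limit
    used: int       # X-RP-Quota-Used (post-debit)
    remaining: int  # X-RP-Quota-Remaining
    reset_at: int   # X-RP-Quota-Reset (unix seconds)

    @property
    def seconds_until_reset(self) -> float:
        return max(0.0, self.reset_at - time.time())


def read_quota(headers) -> QuotaSnapshot:
    """Parse the X-RP-Quota-* snapshot from a metered response's headers."""
    return QuotaSnapshot(
        units=int(headers["X-RP-Request-Units"]),
        limit=int(headers["X-RP-Quota-Limit"]),
        used=int(headers["X-RP-Quota-Used"]),
        remaining=int(headers["X-RP-Quota-Remaining"]),
        reset_at=int(headers["X-RP-Quota-Reset"]),
    )
```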

When you get a 429

There are two distinct flavors of 429:

  • rate_limit_exceeded: you blew the per-minute burst window. The response carries Retry-After (seconds) and a details.retry_after_seconds mirror in the body. Sleep that long and resume.
  • request_quota_exceeded: you’ve used your full monthly allotment of request units. The response body’s details includes used, limit and current_period_end. Upgrade your plan or wait until the period resets (also exposed as X-RP-Quota-Reset on every prior metered response). No Retry-After is set, since waiting seconds doesn’t help.

Both bodies follow the canonical error envelope:

{
  "error": {
    "code": "rate_limit_exceeded",
    "message": "Per-minute rate limit exceeded"
  },
  "request_id": "01HZK3X4P5..."
}

A bounded exponential back-off keyed on Retry-After (when present) handles both flavors safely. Cap retries so a saturated quota doesn’t loop forever.

async function callWithBackoff(input, init, max = 5) {
  for (let attempt = 0; attempt < max; attempt++) {
    const res = await fetch(input, init);
    if (res.status !== 429) return res;
    // Prefer the server's Retry-After; fall back to exponential growth.
    const retryAfter = Number(res.headers.get('Retry-After')) || 2 ** attempt;
    await new Promise((r) => setTimeout(r, retryAfter * 1000));
  }
  throw new Error('rate_limit_exceeded: gave up after retries');
}

The same pattern in Python, using httpx:

import time

import httpx

def call_with_backoff(client: httpx.Client, request: httpx.Request, max_attempts: int = 5):
    for attempt in range(max_attempts):
        response = client.send(request)
        if response.status_code != 429:
            return response
        # Prefer the server's Retry-After; fall back to exponential growth.
        retry_after = float(response.headers.get("Retry-After") or 2 ** attempt)
        time.sleep(retry_after)
    raise RuntimeError("rate_limit_exceeded: gave up after retries")

Tips

  • Poll cheaply. GET /v1/me/quota and GET /v1/me cost 0 request units, so use them to check remaining budget before kicking off an expensive job.
  • Read the headers, don’t poll. Every metered response already includes the post-debit X-RP-Quota-* snapshot, so a worker can keep its own view of the quota fresh without ever calling /v1/me/quota between writes.
  • Mind the burst limit. The per-minute guard counts requests, not units: 120 free GET /v1/me calls in the same minute will trip rate_limit_exceeded even though they cost zero quota. Pace concurrent workers per key, or mint extra keys to fan out across buckets.
  • Refunds on 5xx. When a metered call returns >= 500, the platform refunds the request units automatically. You don’t need to re-poll /v1/me/quota to “undo” a server-side failure.
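
The first two tips combine into a pre-flight check: cache the latest X-RP-Quota-* values from each metered response and decide whether the next job can start now. A sketch, assuming the job's cost in request units can be estimated up front; `wait_if_underfunded` is a hypothetical helper:

```python
import time


def wait_if_underfunded(estimated_units: int, last_remaining: int, last_reset: int) -> float:
    """Seconds to sleep before a job whose cost would exceed the cached quota view.

    last_remaining / last_reset are the X-RP-Quota-Remaining and X-RP-Quota-Reset
    values from the most recent metered response. 0.0 means the job can start now.
    """
    if estimated_units <= last_remaining:
        return 0.0
    # Not enough budget left this period: wait for the reset (or upgrade the plan).
    return max(0.0, last_reset - time.time())
```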