Rate limits
X-RP-* headers, the 429 response shape, and the recommended back-off strategy.
The Public API enforces three independent guards:
- A per-key burst limit of 120 requests per rolling minute, applied to
every API key regardless of plan. This is a request count, not a sum of
request units, and exists so a runaway loop on one key cannot starve the
others. State is enforced server-side; you discover saturation via a
429 with Retry-After.
- A stricter LLM-tier limit layered on top of the burst limiter for the handful of endpoints that dispatch LLM work. See the LLM-tier rate limit section below.
- A monthly request-unit quota sized by your API plan. Every metered
response carries the post-debit snapshot in
X-RP-Quota-* headers, so you can schedule work without polling.
LLM-tier rate limit
A small set of /v1 endpoints dispatch LLM work and therefore sit behind a
secondary, much smaller per-minute bucket on top of the global burst limiter.
The tighter cap is what protects you (and the platform) from a runaway loop
spending your monthly credits before you can react.
The reference page for each affected endpoint shows a rate-limit badge in
the header next to scope, cost and idempotent. Today the LLM tier
applies to:
| Endpoint | What it does |
|---|---|
| `POST /v1/brands/{brand_id}/facts` | Re-runs the brand-facts research workflow. |
| `POST /v1/reports/{report_id}/runs` | Re-runs an existing report. |
| `POST /v1/reports/{report_id}/prompts` | Bulk-creates prompts (manual or auto-generated). |
| `POST /v1/scheduled-reports/{scheduled_report_id}/executions` | Triggers an out-of-band run of a schedule. |
The buckets are split per principal type so a flooded dashboard tab cannot blow through your integration’s budget:
- API keys: 10 requests per minute (per key).
- Dashboard / JWT principals: 30 requests per minute (per user).
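These buckets are enforced server-side, but a client can also pace itself so it never hits a 429 in the first place. A minimal client-side sliding-window limiter sketch (the 10-per-minute default mirrors the API-key bucket above; the class and method names are illustrative, not part of any SDK):

```python
import time
from collections import deque
from typing import Optional

class SlidingWindowLimiter:
    """Client-side pacing for a per-minute bucket (e.g. the 10/min LLM tier per API key)."""

    def __init__(self, max_requests: int = 10, window_s: float = 60.0):
        self.max_requests = max_requests
        self.window_s = window_s
        self.sent = deque()  # monotonic timestamps of requests inside the window

    def wait_time(self, now: Optional[float] = None) -> float:
        """Seconds to wait before the next request is allowed (0.0 if clear)."""
        now = time.monotonic() if now is None else now
        # Drop timestamps that have aged out of the window.
        while self.sent and now - self.sent[0] >= self.window_s:
            self.sent.popleft()
        if len(self.sent) < self.max_requests:
            return 0.0
        return self.window_s - (now - self.sent[0])

    def record(self, now: Optional[float] = None) -> None:
        """Call this after each request is actually sent."""
        self.sent.append(time.monotonic() if now is None else now)
```

Sleep for `wait_time()` before each LLM-tier call and `record()` after it; server-side enforcement still applies, this only keeps a well-behaved worker under the cap.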
Saturation returns the same 429 rate_limit_exceeded envelope as the burst
limiter, but with details.scope = "llm" so a client can branch on the
remediation:
```json
{
  "error": {
    "code": "rate_limit_exceeded",
    "message": "Per-minute rate limit exceeded"
  },
  "request_id": "01HZK3X4P5..."
}
```
Use the same Retry-After-driven back-off described below — there is no
extra header to read.
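For illustration, a client might branch on that scope like this (a sketch; the assumption that `details` nests inside the `error` object is mine, not confirmed by the envelope above):

```python
def classify_429(body: dict) -> str:
    """Classify a parsed 429 envelope for remediation.

    Returns "llm" when details.scope marks the tighter LLM-tier bucket,
    "burst" for the plain per-minute limiter, "other" for anything else.
    Assumption: `details` lives inside the `error` object.
    """
    error = body.get("error", {})
    if error.get("code") != "rate_limit_exceeded":
        return "other"
    scope = (error.get("details") or {}).get("scope")
    return "llm" if scope == "llm" else "burst"
```

A caller could use "llm" to pause only its LLM-tier queue while letting cheap reads continue.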
Response headers
Every metered /v1/* response (one that actually moved the quota counter: 2xx responses, plus 4xx auth/validation errors that passed admission) carries:
| Header | Meaning |
|---|---|
| `X-RP-Request-Units` | Request units this single call cost. 0 for free endpoints (e.g. `GET /v1/me`). |
| `X-RP-Quota-Limit` | Total request units included in the active billing period. |
| `X-RP-Quota-Used` | Request units consumed in the current period (post-debit). |
| `X-RP-Quota-Remaining` | Request units left until the next period reset. |
| `X-RP-Quota-Reset` | Unix timestamp (seconds) when the billing period and quota counter reset. |
| `X-RP-Duration-Ms` | Server-side processing time for the request, in milliseconds. |
| `Retry-After` | Seconds to wait before retrying. Only present on 429 `rate_limit_exceeded` responses. |
The per-minute burst limit is not mirrored in headers; saturate it and you
get a 429 rate_limit_exceeded with Retry-After (typically < 60s). The
X-Correlation-ID response header echoes the request_id field surfaced in
every error envelope.
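As a sketch, the headers above can be folded into a local snapshot after each call (the `QuotaSnapshot` type is illustrative; real HTTP header maps are case-insensitive, a plain dict is used here for brevity):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class QuotaSnapshot:
    limit: int      # X-RP-Quota-Limit
    used: int       # X-RP-Quota-Used (post-debit)
    remaining: int  # X-RP-Quota-Remaining
    reset: int      # X-RP-Quota-Reset, Unix timestamp in seconds

def snapshot_from_headers(headers: dict) -> Optional[QuotaSnapshot]:
    """Build a post-debit quota view from X-RP-Quota-* response headers.

    Returns None for unmetered responses, which omit the headers.
    """
    if "X-RP-Quota-Limit" not in headers:
        return None
    return QuotaSnapshot(
        limit=int(headers["X-RP-Quota-Limit"]),
        used=int(headers["X-RP-Quota-Used"]),
        remaining=int(headers["X-RP-Quota-Remaining"]),
        reset=int(headers["X-RP-Quota-Reset"]),
    )
```

Refreshing this struct on every metered response is what lets a worker track its budget without extra polling.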
When you get a 429
There are two distinct flavors of 429:
- `rate_limit_exceeded`: you blew the per-minute burst window. The response carries `Retry-After` (seconds) and a `details.retry_after_seconds` mirror in the body. Sleep that long and resume.
- `request_quota_exceeded`: you’ve used your full monthly allotment of request units. The response body’s `details` includes `used`, `limit` and `current_period_end`. Upgrade your plan or wait until the period resets (also exposed as `X-RP-Quota-Reset` on every prior metered response). No `Retry-After` is set, since waiting seconds doesn’t help.
Both bodies follow the canonical error envelope:
```json
{
  "error": {
    "code": "rate_limit_exceeded",
    "message": "Per-minute rate limit exceeded"
  },
  "request_id": "01HZK3X4P5..."
}
```
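A sketch of the remediation logic the two codes imply (field names are from the descriptions above; the placement of `details` inside the `error` object is an assumption):

```python
from typing import Optional

def retry_delay(status: int, body: dict, retry_after_header: Optional[str]) -> Optional[float]:
    """Return seconds to sleep before retrying, or None if retrying won't help.

    rate_limit_exceeded carries Retry-After; request_quota_exceeded does not,
    because the quota only refills at the billing-period boundary.
    """
    if status != 429:
        return 0.0  # not rate-limited; caller proceeds normally
    code = body.get("error", {}).get("code")
    if code == "request_quota_exceeded":
        return None  # wait for X-RP-Quota-Reset or upgrade the plan
    if retry_after_header is not None:
        return float(retry_after_header)
    # Fall back to the details.retry_after_seconds mirror in the body.
    details = body.get("error", {}).get("details") or {}
    return float(details.get("retry_after_seconds", 1))
```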
Recommended back-off
A bounded exponential back-off keyed on Retry-After (when present) handles
both flavors safely. Cap retries so a saturated quota doesn’t loop forever.
```js
async function callWithBackoff(input, init, max = 5) {
  for (let attempt = 0; attempt < max; attempt++) {
    const res = await fetch(input, init);
    if (res.status !== 429) return res;
    // Prefer the server's Retry-After; fall back to exponential back-off.
    const retryAfter = Number(res.headers.get('Retry-After')) || 2 ** attempt;
    await new Promise((r) => setTimeout(r, retryAfter * 1000));
  }
  throw new Error('rate_limit_exceeded: gave up after retries');
}
```

```python
import time

import httpx

def call_with_backoff(client: httpx.Client, request: httpx.Request, max_attempts: int = 5):
    for attempt in range(max_attempts):
        response = client.send(request)
        if response.status_code != 429:
            return response
        # Prefer the server's Retry-After; fall back to exponential back-off.
        retry_after = float(response.headers.get("Retry-After") or 2 ** attempt)
        time.sleep(retry_after)
    raise RuntimeError("rate_limit_exceeded: gave up after retries")
```

Tips
- Poll cheaply. `GET /v1/me/quota` and `GET /v1/me` cost 0 request units, so use them to check remaining budget before kicking off an expensive job.
- Read the headers, don’t poll. Every metered response already includes the post-debit `X-RP-Quota-*` snapshot, so a worker can keep its own view of the quota fresh without ever calling `/v1/me/quota` between writes.
- Mind the burst limit. The per-minute guard counts requests, not units: 120 free `GET /v1/me` calls in the same minute will trip `rate_limit_exceeded` even though they cost zero quota. Pace concurrent workers per key, or mint extra keys to fan out across buckets.
- Refunds on 5xx. When a metered call returns >= 500, the platform refunds the request units automatically. You don’t need to re-poll `/v1/me/quota` to “undo” a server-side failure.