Rate limits

X-RP-* headers, the 429 response shape, and the recommended back-off strategy.

The Public API enforces three independent guards:

  • A per-key burst limit of 120 requests per rolling minute, applied to every API key regardless of plan. This is a request count, not a sum of request units, and exists so a runaway loop on one key cannot starve the others. State is enforced server-side; you discover saturation via a 429 with Retry-After.
  • A stricter LLM-tier limit layered on top of the burst limiter for the handful of endpoints that dispatch LLM work. See LLM-tier rate limit below.
  • A monthly request-unit quota sized by your API plan. Every metered response carries the post-debit snapshot in X-RP-Quota-* headers so you can schedule work without polling.

LLM-tier rate limit [#llm-tier]

A small set of /v1 endpoints dispatch LLM work and therefore sit behind a secondary, much smaller per-minute bucket on top of the global burst limiter. The tighter cap is what protects you (and the platform) from a runaway loop spending your monthly credits before you can react.

The reference page for each affected endpoint shows a rate-limit badge in the header next to scope, cost and idempotent. Today the LLM tier applies to:

  • POST /v1/brands/{brand_id}/facts: Re-runs the brand-facts research workflow.
  • POST /v1/reports/{report_id}/runs: Re-runs an existing report.
  • POST /v1/reports/{report_id}/prompts: Bulk-creates prompts (manual or auto-generated).
  • POST /v1/scheduled-reports/{scheduled_report_id}/executions: Triggers an out-of-band run of a schedule.

The buckets are split per principal type so a flooded dashboard tab cannot blow through your integration’s budget:

  • API keys: 10 requests per minute (per key).
  • Dashboard / JWT principals: 30 requests per minute (per user).
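Client-side pacing keeps a worker under the tighter bucket before the server ever has to say no. A minimal sketch, assuming the documented 10-requests-per-minute per-key cap; `MinuteBucket` is a hypothetical helper, not part of any SDK:

```python
import time
from collections import deque


class MinuteBucket:
    """Client-side pacer for a rolling-window limit (e.g. 10 LLM-tier calls/min per key)."""

    def __init__(self, limit: int = 10, window: float = 60.0):
        self.limit = limit
        self.window = window
        self.sent: deque[float] = deque()  # monotonic timestamps of recent calls

    def acquire(self) -> None:
        """Block until another call fits inside the rolling window, then record it."""
        while True:
            now = time.monotonic()
            # Drop timestamps that have aged out of the rolling window.
            while self.sent and now - self.sent[0] >= self.window:
                self.sent.popleft()
            if len(self.sent) < self.limit:
                self.sent.append(now)
                return
            # Sleep until the oldest recorded call leaves the window.
            time.sleep(self.window - (now - self.sent[0]))
```

Call `acquire()` before each LLM-tier request. This only lowers the odds of a 429; the server-side limiter remains authoritative, so the back-off handling below is still required.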

Saturation returns the same 429 rate_limit_exceeded envelope as the burst limiter, but with details.scope = "llm" so a client can branch on the remediation:

{
  "error": {
    "code": "rate_limit_exceeded",
    "message": "Per-minute rate limit exceeded"
  },
  "details": {
    "scope": "llm"
  },
  "request_id": "01HZK3X4P5..."
}

Use the same Retry-After-driven back-off described below — there is no extra header to read.
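A handler can branch on the scope once the envelope is parsed. A sketch, assuming the body has been decoded to a dict; `classify_429` is a hypothetical helper, and it checks both plausible locations of `details` since burst-limiter 429s may omit the field entirely:

```python
def classify_429(body: dict) -> str:
    """Return a remediation hint for a parsed 429 rate_limit_exceeded envelope."""
    code = body.get("error", {}).get("code")
    # details may sit at the top level or be nested under error, depending on parsing.
    details = body.get("details") or body.get("error", {}).get("details") or {}
    if code == "rate_limit_exceeded" and details.get("scope") == "llm":
        return "llm-tier bucket saturated: back off and consider queueing LLM jobs"
    if code == "rate_limit_exceeded":
        return "burst limit hit: sleep Retry-After seconds and resume"
    return f"unexpected 429 code: {code!r}"
```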

Response headers

Every metered /v1/* response (i.e. any response that actually moved the quota counter: 2xx responses, plus 4xx auth/validation errors returned after admission) carries:

  • X-RP-Request-Units: Request units this single call cost. 0 for free endpoints (e.g. GET /v1/me).
  • X-RP-Quota-Limit: Total request units included in the active billing period.
  • X-RP-Quota-Used: Request units consumed in the current period (post-debit).
  • X-RP-Quota-Remaining: Request units left until the next period reset.
  • X-RP-Quota-Reset: Unix timestamp (seconds) when the billing period and quota counter reset.
  • X-RP-Duration-Ms: Server-side processing time for the request, in milliseconds.
  • Retry-After: Seconds to wait before retrying. Only present on 429 rate_limit_exceeded responses.

The per-minute burst limit is not mirrored in headers; saturate it and you get a 429 rate_limit_exceeded with Retry-After (typically < 60s). The X-Correlation-ID response header echoes the request_id field surfaced in every error envelope.
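A worker can lift the snapshot straight off any metered response. A minimal sketch, assuming a mapping of header names to string values (the shape both httpx and requests expose); `QuotaSnapshot` and `read_quota` are hypothetical names:

```python
import time
from dataclasses import dataclass


@dataclass
class QuotaSnapshot:
    units: int      # X-RP-Request-Units: cost of this single call
    limit: int      # X-RP-Quota-Limit
    used: int       # X-RP-Quota-Used (post-debit)
    remaining: int  # X-RP-Quota-Remaining
    reset_at: int   # X-RP-Quota-Reset (unix seconds)

    @property
    def seconds_until_reset(self) -> float:
        return max(0.0, self.reset_at - time.time())


def read_quota(headers) -> QuotaSnapshot:
    """Parse the X-RP-Quota-* snapshot from a metered response's headers."""
    return QuotaSnapshot(
        units=int(headers["X-RP-Request-Units"]),
        limit=int(headers["X-RP-Quota-Limit"]),
        used=int(headers["X-RP-Quota-Used"]),
        remaining=int(headers["X-RP-Quota-Remaining"]),
        reset_at=int(headers["X-RP-Quota-Reset"]),
    )
```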

When you get a 429

There are two distinct flavors of 429:

  • rate_limit_exceeded: you blew the per-minute burst window. The response carries Retry-After (seconds) and a details.retry_after_seconds mirror in the body. Sleep that long and resume.
  • request_quota_exceeded: you’ve used your full monthly allotment of request units. The response body’s details includes used, limit and current_period_end. Upgrade your plan or wait until the period resets (also exposed as X-RP-Quota-Reset on every prior metered response). No Retry-After is set, since waiting seconds doesn’t help.

Both bodies follow the canonical error envelope:

{
  "error": {
    "code": "rate_limit_exceeded",
    "message": "Per-minute rate limit exceeded"
  },
  "request_id": "01HZK3X4P5..."
}

A bounded exponential back-off keyed on Retry-After (when present) handles both flavors safely. Cap retries so a saturated quota doesn’t loop forever.

async function callWithBackoff(input, init, max = 5) {
  for (let attempt = 0; attempt < max; attempt++) {
    const res = await fetch(input, init);
    if (res.status !== 429) return res;
    // Prefer the server's Retry-After; fall back to exponential growth.
    const retryAfter = Number(res.headers.get('Retry-After')) || 2 ** attempt;
    await new Promise((r) => setTimeout(r, retryAfter * 1000));
  }
  throw new Error('rate_limit_exceeded: gave up after retries');
}

The same pattern in Python, using httpx:

import time

import httpx

def call_with_backoff(client: httpx.Client, request: httpx.Request, max_attempts: int = 5):
    for attempt in range(max_attempts):
        response = client.send(request)
        if response.status_code != 429:
            return response
        # Prefer the server's Retry-After; fall back to exponential growth.
        retry_after = float(response.headers.get("Retry-After") or 2 ** attempt)
        time.sleep(retry_after)
    raise RuntimeError("rate_limit_exceeded: gave up after retries")

Tips

  • Poll cheaply. GET /v1/me/quota and GET /v1/me cost 0 request units, so use them to check remaining budget before kicking off an expensive job.
  • Read the headers, don’t poll. Every metered response already includes the post-debit X-RP-Quota-* snapshot, so a worker can keep its own view of the quota fresh without ever calling /v1/me/quota between writes.
  • Mind the burst limit. The per-minute guard counts requests, not units: 120 free GET /v1/me calls in the same minute will trip rate_limit_exceeded even though they cost zero quota. Pace concurrent workers per key, or mint extra keys to fan out across buckets.
  • Refunds on 5xx. When a metered call returns >= 500, the platform refunds the request units automatically. You don’t need to re-poll /v1/me/quota to “undo” a server-side failure.
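
The first two tips combine into a pre-flight check: cache the latest X-RP-Quota-* values from each metered response and decide whether the next job can start now. A sketch, assuming the job's cost in request units can be estimated up front; `wait_if_underfunded` is a hypothetical helper:

```python
import time


def wait_if_underfunded(estimated_units: int, last_remaining: int, last_reset: int) -> float:
    """Seconds to sleep before a job whose cost would exceed the cached quota view.

    last_remaining / last_reset are the X-RP-Quota-Remaining and X-RP-Quota-Reset
    values from the most recent metered response. 0.0 means the job can start now.
    """
    if estimated_units <= last_remaining:
        return 0.0
    # Not enough budget left this period: wait for the reset (or upgrade the plan).
    return max(0.0, last_reset - time.time())
```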