Rate limits

bRRAIn rate-limits API traffic to protect both your pod and our shared infrastructure. This page covers what's limited, how to read the headers we return, and how to handle hitting a limit.

Three layers

Per-token limits

Every API token has its own limit, typically:

600 requests per minute for read operations.
120 requests per minute for write operations.

Service-account tokens can be granted higher limits at issue time.

Per-organization limits

Across all tokens for an organization:

6,000 requests per minute on Free / Team plans.
30,000 requests per minute on Business.
60,000 requests per minute on Enterprise.
Negotiated on Sovereign-Cloud.

Per-endpoint limits

A few high-cost endpoints have their own caps:

/api/handler/answer (Handler Q&A): 60 per minute per token (Handler is GPU-bound).
/api/orchestrator/orchestrations/{id}/runs (start a run): 30 per minute per token.
/api/exports/full (full org export): 1 per hour per organization.

Headers

Every successful response includes:

X-RateLimit-Limit: 600
X-RateLimit-Remaining: 537
X-RateLimit-Reset: 1715638200
X-RateLimit-Scope: token

Limit: the cap that applies to this request.
Remaining: how many requests you have left in the current window.
Reset: Unix timestamp (in seconds) at which the window resets.
Scope: which limit this request was counted against (token, org, or endpoint).
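
As an illustration of reading these headers on the client side, here is a minimal Python sketch. The names RateLimitInfo and parse_rate_limit_headers are hypothetical helpers, not part of the bRRAIn SDK, and the sketch assumes your HTTP client exposes headers as a string-to-string mapping.

```python
import time
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class RateLimitInfo:
    limit: int        # X-RateLimit-Limit
    remaining: int    # X-RateLimit-Remaining
    reset_epoch: int  # X-RateLimit-Reset (Unix timestamp, seconds)
    scope: str        # X-RateLimit-Scope: token, org, or endpoint

    def seconds_until_reset(self, now: Optional[float] = None) -> float:
        """Seconds left in the current window, never negative."""
        now = time.time() if now is None else now
        return max(0.0, self.reset_epoch - now)


def parse_rate_limit_headers(headers: Mapping[str, str]) -> RateLimitInfo:
    # HTTP header names are case-insensitive, so normalize before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    return RateLimitInfo(
        limit=int(lowered["x-ratelimit-limit"]),
        remaining=int(lowered["x-ratelimit-remaining"]),
        reset_epoch=int(lowered["x-ratelimit-reset"]),
        scope=lowered["x-ratelimit-scope"],
    )
```

Passing the four example headers above yields limit=600, remaining=537, and scope="token".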

When you've hit a limit, the response is 429 Too Many Requests:

HTTP/1.1 429 Too Many Requests
Retry-After: 12
X-RateLimit-Limit: 600
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1715638212
X-RateLimit-Scope: token
Content-Type: application/json

{
  "type": "https://docs.brrain.io/api-errors#rate_limit_exceeded",
  "title": "Rate limit exceeded",
  "status": 429,
  "detail": "You've exceeded the per-token limit (600/min). Try again in 12 seconds.",
  "scope": "token"
}

Retry-After is the recommended wait in seconds.
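
For clients that handle the 429 themselves, the following sketch pulls the wait time and the scope out of a response like the one above. The function name retry_delay_from_429 is hypothetical, and falling back to 1 second when Retry-After is missing is an assumption of this sketch, not documented behavior.

```python
import json
from typing import Mapping, Tuple


def retry_delay_from_429(
    status: int, headers: Mapping[str, str], body: str
) -> Tuple[float, str]:
    """Return (seconds to wait, scope) for a 429 response."""
    if status != 429:
        raise ValueError(f"expected a 429 response, got {status}")
    lowered = {name.lower(): value for name, value in headers.items()}
    # Assumption: wait 1 second if the Retry-After header is absent.
    delay = float(lowered.get("retry-after", "1"))
    problem = json.loads(body) if body else {}
    # Prefer the scope from the JSON body, then fall back to the header.
    scope = problem.get("scope") or lowered.get("x-ratelimit-scope", "unknown")
    return delay, scope
```

For the example response above, this returns a 12-second delay and the scope "token".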

Recommended client behavior

Read the X-RateLimit-Remaining header. If it drops below 10% of Limit, slow down proactively.
On a 429, sleep for Retry-After seconds plus a small random jitter (50–500 ms) before retrying.
Back off exponentially on repeated 429s: multiply the previous delay by 1.5 each time.
Cap the number of retry attempts (commonly 5). Once the cap is reached, surface the error to the caller.

The bRRAIn SDK handles this automatically with the default retry policy.
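
If you are not using the SDK, the recommendations above could be sketched roughly as follows. This is one interpretation, not the SDK's actual retry policy: it waits at least Retry-After seconds and grows the delay by 1.5× on repeated 429s. The names call_with_retries, should_slow_down, and RateLimitError are hypothetical, as is the send callable, which stands in for whatever performs one HTTP request and returns (status, headers, body).

```python
import random
import time
from typing import Callable, Mapping, Tuple

# (status code, headers, body): a stand-in for your HTTP client's response.
Response = Tuple[int, Mapping[str, str], str]


class RateLimitError(Exception):
    """Raised when the retry cap is reached and the server still returns 429."""


def should_slow_down(remaining: int, limit: int, threshold: float = 0.10) -> bool:
    """True when Remaining has dropped below 10% of Limit."""
    return remaining < limit * threshold


def call_with_retries(
    send: Callable[[], Response],
    max_retries: int = 5,
    sleep: Callable[[float], None] = time.sleep,
    jitter: Callable[[], float] = lambda: random.uniform(0.050, 0.500),
) -> Response:
    previous_delay = 0.0
    for attempt in range(max_retries + 1):
        status, headers, body = send()
        if status != 429:
            return status, headers, body
        if attempt == max_retries:
            break  # retry cap reached; surface the error below
        lowered = {name.lower(): value for name, value in headers.items()}
        # Assumption: wait 1 second if the Retry-After header is absent.
        retry_after = float(lowered.get("retry-after", "1"))
        # Never wait less than the server asked; grow 1.5x on repeated 429s.
        delay = max(retry_after, previous_delay * 1.5)
        sleep(delay + jitter())
        previous_delay = delay
    raise RateLimitError(f"still rate-limited after {max_retries} retries")
```

The sleep and jitter parameters are injectable so the loop can be exercised in tests without real waiting.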

Burst allowance

Limits are evaluated as a 60-second sliding window with a small burst allowance (~25% above the limit for short spikes). You shouldn't see 429s for normal bursty traffic; you'll see them on sustained over-limit traffic.
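
To stay under a limit without leaning on the burst allowance, a client can keep its own sliding-window count. The sketch below is a client-side throttle only; it does not claim to reproduce how the server evaluates its window or burst allowance, and the class name SlidingWindowThrottle is hypothetical.

```python
from collections import deque
from typing import Deque


class SlidingWindowThrottle:
    """Allows at most `limit` requests in any `window_seconds` span."""

    def __init__(self, limit: int, window_seconds: float = 60.0) -> None:
        self.limit = limit
        self.window_seconds = window_seconds
        self._sent: Deque[float] = deque()  # timestamps of allowed requests

    def try_acquire(self, now: float) -> bool:
        # Forget requests that have aged out of the window.
        while self._sent and now - self._sent[0] >= self.window_seconds:
            self._sent.popleft()
        if len(self._sent) >= self.limit:
            return False  # sending now would exceed the limit
        self._sent.append(now)
        return True
```

In practice you would pass time.monotonic() as now and delay or queue the request whenever try_acquire returns False.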

How to request a higher limit

If you're hitting the per-organization limit consistently and your usage is legitimate:

For Business and Enterprise, contact your customer success manager.
For Free and Team, upgrade to a higher tier.

Per-token limits can be raised by issuing the token with elevated quotas (Sovereign action under Settings → Service accounts).

Special endpoints

A few endpoints intentionally have no rate limit:

GET /healthz — health check.
GET /readyz — readiness check.
GET /version — version info.

These are safe to poll at any rate.

Webhook delivery rate limits

We also limit how fast we deliver webhooks to your endpoints — typically 100 deliveries per second per subscription. Delivery is queued; you don't have to handle bursts. See API: Webhooks.

Quotas vs rate limits

Rate limits are per-time-window (requests per minute). Quotas are per-period (requests per month, GB-stored, compute-hours). See Console: Settings → Quotas for quota management.

Hitting a quota returns 403 Forbidden with a quota-specific error code, not a 429.

Where to next

API: Errors — full error format.
API: Authentication — how token scoping interacts with limits.
Console: Observability — to see your current usage relative to limits.