Rate limits — bRRAIn Docs
Per-token, per-organization, and per-endpoint rate limits — how to read the headers and how to handle hitting them.
Rate limits
bRRAIn rate-limits API traffic to protect both your pod and our shared infrastructure. This page covers what's limited, how to read the headers we return, and how to handle hitting a limit.
Three layers
Per-token limits
Every API token has its own limit, typically:
- 600 requests per minute for read operations.
- 120 requests per minute for write operations.
Service-account tokens can be granted higher limits at issue time.
Per-organization limits
Across all tokens for an organization:
- 6,000 requests per minute on Free / Team plans.
- 30,000 requests per minute on Business.
- 60,000 requests per minute on Enterprise.
- Negotiated on Sovereign-Cloud.
Per-endpoint limits
A few high-cost endpoints have their own caps:
- /api/handler/answer (Handler Q&A): 60 per minute per token (Handler is GPU-bound).
- /api/orchestrator/orchestrations/{id}/runs (start a run): 30 per minute per token.
- /api/exports/full (full org export): 1 per hour per organization.
Headers
Every successful response includes:
X-RateLimit-Limit: 600
X-RateLimit-Remaining: 537
X-RateLimit-Reset: 1715638200
X-RateLimit-Scope: token
- Limit: the cap that applies for this request.
- Remaining: how many you have left in the window.
- Reset: Unix timestamp when the window resets.
- Scope: which limit ate this request: token, org, or endpoint.
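To make the header fields concrete, here is a minimal Python sketch that parses them into a small record. The names `RateLimitInfo` and `parse_rate_limit_headers` are illustrative and not part of the bRRAIn SDK; the sketch assumes all four headers are present and that the values are plain integers, as in the example above.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class RateLimitInfo:
    """Hypothetical container for the four X-RateLimit-* headers."""
    limit: int      # X-RateLimit-Limit: cap that applies to this request
    remaining: int  # X-RateLimit-Remaining: requests left in the window
    reset_at: int   # X-RateLimit-Reset: Unix timestamp of the window reset
    scope: str      # X-RateLimit-Scope: "token", "org", or "endpoint"


def parse_rate_limit_headers(headers: Mapping[str, str]) -> RateLimitInfo:
    """Build a RateLimitInfo from response headers (case-insensitive lookup)."""
    lowered = {name.lower(): value for name, value in headers.items()}
    return RateLimitInfo(
        limit=int(lowered["x-ratelimit-limit"]),
        remaining=int(lowered["x-ratelimit-remaining"]),
        reset_at=int(lowered["x-ratelimit-reset"]),
        scope=lowered["x-ratelimit-scope"],
    )


# Header values taken from the example response above.
info = parse_rate_limit_headers({
    "X-RateLimit-Limit": "600",
    "X-RateLimit-Remaining": "537",
    "X-RateLimit-Reset": "1715638200",
    "X-RateLimit-Scope": "token",
})
print(info.remaining, info.scope)
```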
When you've hit a limit, the response is 429 Too Many Requests:
HTTP/1.1 429 Too Many Requests
Retry-After: 12
X-RateLimit-Limit: 600
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1715638212
X-RateLimit-Scope: token
Content-Type: application/json
{
  "type": "https://docs.brrain.io/api-errors#rate_limit_exceeded",
  "title": "Rate limit exceeded",
  "status": 429,
  "detail": "You've exceeded the per-token limit (600/min). Try again in 12 seconds.",
  "scope": "token"
}
Retry-After is the recommended wait in seconds.
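As an illustration, the following Python sketch extracts the wait time and the scope from a 429 response. The function name `parse_429` is hypothetical. Falling back to `X-RateLimit-Reset` when `Retry-After` is missing is an assumption of this sketch, not documented behavior, as is treating `Retry-After` as a plain number of seconds.

```python
import json
from typing import Mapping, Tuple


def parse_429(headers: Mapping[str, str], body: str, now: float) -> Tuple[float, str]:
    """Return (seconds to wait, scope) for a 429 response."""
    lowered = {name.lower(): value for name, value in headers.items()}
    if "retry-after" in lowered:
        wait = float(lowered["retry-after"])
    else:
        # Assumption: if Retry-After is absent, wait until the window resets.
        wait = max(0.0, float(lowered["x-ratelimit-reset"]) - now)
    problem = json.loads(body)
    # Prefer the scope in the JSON body, then the header, then "unknown".
    scope = problem.get("scope", lowered.get("x-ratelimit-scope", "unknown"))
    return wait, scope


# Values taken from the example 429 response above.
headers = {
    "Retry-After": "12",
    "X-RateLimit-Reset": "1715638212",
    "X-RateLimit-Scope": "token",
}
body = '{"title": "Rate limit exceeded", "status": 429, "scope": "token"}'
wait, scope = parse_429(headers, body, now=1715638200.0)
print(wait, scope)
```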
Recommended client behavior
- Read the X-RateLimit-Remaining header. If it drops below 10% of Limit, slow down proactively.
- On a 429, sleep for Retry-After seconds plus a small random jitter (50–500 ms) before retrying.
- Back off exponentially on repeated 429s: 1.5× the previous delay each time.
- Cap the retry attempts (commonly 5 retries). After the cap, surface the error to the caller.
The bRRAIn SDK handles this automatically with the default retry policy.
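If you are not using the SDK, the recommendations above can be sketched in Python roughly as follows. This is not the SDK's implementation. `call_with_retries`, `should_slow_down`, and `RateLimitError` are hypothetical names, and `send` stands for any callable that performs one request and returns `(status, headers, body)`. The sketch reads "1.5× the previous delay" as a floor, so each wait is the larger of `Retry-After` and 1.5 times the previous wait; that reading is an assumption.

```python
import random
import time
from typing import Callable, Mapping, Tuple

Response = Tuple[int, Mapping[str, str], str]  # (status, headers, body)


class RateLimitError(Exception):
    """Raised when the retry cap is reached while still receiving 429s."""


def should_slow_down(remaining: int, limit: int) -> bool:
    """True when fewer than 10% of the window's requests are left."""
    return remaining < 0.1 * limit


def call_with_retries(
    send: Callable[[], Response],
    max_retries: int = 5,
    sleep: Callable[[float], None] = time.sleep,
) -> Response:
    previous_delay = 0.0
    for attempt in range(max_retries + 1):
        status, headers, body = send()
        if status != 429:
            return status, headers, body
        if attempt == max_retries:
            raise RateLimitError(f"still rate-limited after {max_retries} retries")
        # Assumes the headers mapping uses this exact key; default to 1 s if absent.
        retry_after = float(headers.get("Retry-After", "1"))
        # Exponential backoff: never wait less than 1.5x the previous delay.
        delay = max(retry_after, previous_delay * 1.5)
        # Add 50-500 ms of random jitter so clients do not retry in lockstep.
        sleep(delay + random.uniform(0.05, 0.5))
        previous_delay = delay
    raise AssertionError("unreachable")


# Demo with canned responses and a recording sleep, so nothing actually waits.
responses = iter([
    (429, {"Retry-After": "12"}, ""),
    (429, {"Retry-After": "12"}, ""),
    (200, {}, "ok"),
])
slept = []
result = call_with_retries(lambda: next(responses), sleep=slept.append)
print(result[0], len(slept))
```

Passing `sleep` as a parameter keeps the function testable; in production code the default `time.sleep` applies.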
Burst allowance
Limits are evaluated as a 60-second sliding window with a small burst allowance (~25% above the limit for short spikes). You shouldn't see 429s for normal bursty traffic; you'll see them on sustained over-limit traffic.
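One way to stay under the limit on the client side is a sliding-window throttle. The Python sketch below is a simple sliding-window log; it does not claim to mirror the server's algorithm, which is not specified here beyond the description above. It enforces the nominal limit and does not rely on the burst allowance. `SlidingWindowThrottle` is a hypothetical name.

```python
import time
from collections import deque
from typing import Callable, Deque


class SlidingWindowThrottle:
    """Allow at most `limit` requests in any `window`-second span."""

    def __init__(
        self,
        limit: int,
        window: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._limit = limit
        self._window = window
        self._clock = clock
        self._sent: Deque[float] = deque()  # timestamps of recent requests

    def try_acquire(self) -> bool:
        """Record a request and return True if it fits in the window."""
        now = self._clock()
        # Drop timestamps that have aged out of the window.
        while self._sent and now - self._sent[0] >= self._window:
            self._sent.popleft()
        if len(self._sent) >= self._limit:
            return False
        self._sent.append(now)
        return True


# Demo with a fake clock and a tiny limit of 3 requests per 60 seconds.
fake_now = [0.0]
throttle = SlidingWindowThrottle(limit=3, window=60.0, clock=lambda: fake_now[0])
results = []
for t in (0.0, 1.0, 2.0, 3.0, 60.0):
    fake_now[0] = t
    results.append(throttle.try_acquire())
print(results)  # [True, True, True, False, True]
```

The fourth call is refused because three requests are already in the window; by t=60 the first request has aged out, so the fifth call succeeds.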
How to request a higher limit
If you're hitting the per-organization limit consistently and your usage is legitimate:
- For Business and Enterprise, contact your customer success manager.
- For Free and Team, upgrade to a higher tier.
Per-token limits can be raised by issuing the token with elevated quotas (Sovereign action under Settings → Service accounts).
Special endpoints
A few endpoints intentionally have no rate limit:
- GET /healthz: health check.
- GET /readyz: readiness check.
- GET /version: version info.
These are safe to poll at any rate.
Webhook delivery rate limits
We also limit how fast we deliver webhooks to your endpoints — typically 100 deliveries per second per subscription. Delivery is queued; you don't have to handle bursts. See API: Webhooks.
Quotas vs rate limits
Rate limits are per-time-window (requests per minute). Quotas are per-period (requests per month, GB-stored, compute-hours). See Console: Settings → Quotas for quota management.
Hitting a quota returns 403 Forbidden with a quota-specific error code, not a 429.
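The distinction matters for retry logic: a 429 clears once the window resets, while a quota error will not clear by waiting a few seconds. A minimal Python sketch of that decision follows. The `Action` enum and `action_for_status` are hypothetical names. Because the quota-specific error code is not listed on this page, the sketch treats every 403 as non-retryable rather than inspecting the code.

```python
from enum import Enum


class Action(Enum):
    PROCEED = "proceed"          # response is usable as-is
    RETRY_LATER = "retry_later"  # rate limit: wait for Retry-After, then retry
    STOP = "stop"                # quota or permission problem: do not retry


def action_for_status(status: int) -> Action:
    """Map an HTTP status to a coarse client action."""
    if status == 429:
        return Action.RETRY_LATER
    if status == 403:
        # Assumption: all 403s are treated alike, since retrying a quota
        # (or permission) error within the same period will not succeed.
        return Action.STOP
    return Action.PROCEED


for code in (200, 429, 403):
    print(code, action_for_status(code).value)
```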
Where to next
- API: Errors — full error format.
- API: Authentication — how token scoping interacts with limits.
- Console: Observability — to see your current usage relative to limits.