Windows, tiers, headers

Rate limits

Limits are per project over a sliding 60-second window, with separate buckets for run creation, reads, and embeddings. Deep runs additionally have a concurrency cap, since they hold execution slots for seconds at a time.
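To stay under the limits proactively, a client can mirror the sliding window locally. The following is a minimal sketch of a sliding-window log with one bucket per endpoint class; the class name `SlidingWindowLimiter`, the bucket names, and the injectable clock are illustrative assumptions, and the server's own accounting may differ in detail.

```python
import time
from collections import deque
from typing import Callable, Deque, Dict


class SlidingWindowLimiter:
    """Client-side approximation of per-bucket sliding-window limits.

    Hypothetical helper: it only tracks requests sent by this process and
    does not know about other clients sharing the same project.
    """

    def __init__(self, limits: Dict[str, int], window: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._limits = dict(limits)
        self._window = window
        self._clock = clock
        # One timestamp log per bucket, oldest entry on the left.
        self._events: Dict[str, Deque[float]] = {name: deque() for name in limits}

    def try_acquire(self, bucket: str) -> bool:
        """Record a request and return True if the bucket has room."""
        now = self._clock()
        events = self._events[bucket]
        # Drop timestamps that have slid out of the window.
        while events and now - events[0] >= self._window:
            events.popleft()
        if len(events) >= self._limits[bucket]:
            return False
        events.append(now)
        return True
```

With the Free tier numbers this could be constructed as `SlidingWindowLimiter({"runs": 10, "reads": 60, "embeddings": 300})`; a `False` result means the caller should wait before sending.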

Limits by tier

tier	runs / min	reads / min	embeddings / min	concurrent deep
Free	10	60	300	1
Pro	60	600	3,000	4
Scale	300	3,000	20,000	16
Enterprise	custom	custom	custom	custom
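The concurrency cap for deep runs can also be respected on the client side. This sketch gates deep runs behind a semaphore sized from the table above; `DeepRunGate` and `CONCURRENT_DEEP` are hypothetical names, and Enterprise is omitted because its limits are custom.

```python
import threading
from contextlib import contextmanager
from typing import Iterator, Optional

# "concurrent deep" column from the tier table (Enterprise is custom).
CONCURRENT_DEEP = {"Free": 1, "Pro": 4, "Scale": 16}


class DeepRunGate:
    """Hypothetical helper that limits in-flight deep runs in this process."""

    def __init__(self, tier: str) -> None:
        self._slots = threading.BoundedSemaphore(CONCURRENT_DEEP[tier])

    @contextmanager
    def slot(self, timeout: Optional[float] = None) -> Iterator[None]:
        # Block until a slot is free, or give up after `timeout` seconds.
        if not self._slots.acquire(timeout=timeout):
            raise TimeoutError("no deep-run slot became free in time")
        try:
            yield
        finally:
            self._slots.release()
```

A caller would wrap each deep run in `with gate.slot(): ...` so that a slot is released even if the run raises.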

Reading the headers

header	meaning
X-RateLimit-Limit	Bucket size for this endpoint class
X-RateLimit-Remaining	Requests left in the window
X-RateLimit-Reset	Unix seconds until refill
Retry-After	On 429 responses: seconds to wait; treat as a floor
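A raw integration might collect these headers into one structure before deciding what to do next. The sketch below assumes the values arrive as plain decimal strings and that any header may be absent; `RateLimitInfo` and `parse_rate_limit_headers` are illustrative names, not part of any SDK. The `reset` value is kept as the raw integer and is not interpreted further.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class RateLimitInfo:
    limit: Optional[int]          # X-RateLimit-Limit
    remaining: Optional[int]      # X-RateLimit-Remaining
    reset: Optional[int]          # X-RateLimit-Reset, left uninterpreted
    retry_after: Optional[float]  # Retry-After, normally only on 429


def parse_rate_limit_headers(headers: Mapping[str, str]) -> RateLimitInfo:
    """Extract rate-limit headers, matching names case-insensitively."""
    lowered = {key.lower(): value for key, value in headers.items()}

    def as_int(name: str) -> Optional[int]:
        raw = lowered.get(name)
        return int(raw) if raw is not None else None

    raw_retry = lowered.get("retry-after")
    return RateLimitInfo(
        limit=as_int("x-ratelimit-limit"),
        remaining=as_int("x-ratelimit-remaining"),
        reset=as_int("x-ratelimit-reset"),
        # Assumption: Retry-After is given in seconds, as the table states.
        retry_after=float(raw_retry) if raw_retry is not None else None,
    )
```

Checking `remaining` after each response lets a client slow down before it reaches zero instead of waiting for a 429.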

Behavior under limit

A 429 never partially executes: the run was not created, so retrying with the same Idempotency-Key is always safe. The SDKs automatically retry with exponential backoff and jitter; raw integrations should wait at least as long as Retry-After says and cap backoff at 60 seconds. Sustained 429s at your tier ceiling are the signal to upgrade or to batch embedding traffic.
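For raw integrations, the retry policy described above could look roughly like this. It is a transport-agnostic sketch: `send` is a hypothetical callable that performs one HTTP request with the given extra headers and returns `(status, headers, body)`, and the base delay of one second, the full-jitter strategy, and the attempt count are assumptions rather than documented SDK behavior. Header lookup here is case-sensitive, so `send` is assumed to return `Retry-After` in that exact casing.

```python
import random
import time
import uuid
from typing import Callable, Mapping, Optional, Tuple

Response = Tuple[int, Mapping[str, str], bytes]

MAX_BACKOFF_SECONDS = 60.0  # backoff cap stated in the text


def backoff_delay(attempt: int, retry_after: Optional[float],
                  base: float = 1.0, cap: float = MAX_BACKOFF_SECONDS,
                  rand: Callable[[], float] = random.random) -> float:
    """Seconds to sleep before retrying after failed attempt `attempt` (0-based)."""
    exponential = min(cap, base * (2 ** attempt))
    jittered = exponential * rand()  # full jitter within the capped delay
    floor = retry_after if retry_after is not None else 0.0
    # Retry-After is a floor: never wait less than the server asked for.
    return max(floor, jittered)


def create_with_retry(send: Callable[[Mapping[str, str]], Response],
                      max_attempts: int = 5,
                      sleep: Callable[[float], None] = time.sleep) -> Response:
    """Call `send` until it stops returning 429 or attempts run out."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    # One key for all attempts, so a retry can never create a second run.
    headers = {"Idempotency-Key": str(uuid.uuid4())}
    for attempt in range(max_attempts):
        response = send(headers)
        status, response_headers, _body = response
        if status != 429 or attempt == max_attempts - 1:
            return response
        raw = response_headers.get("Retry-After")
        retry_after = float(raw) if raw is not None else None
        sleep(backoff_delay(attempt, retry_after))
    return response  # unreachable; keeps type checkers satisfied
```

If the final attempt still returns 429, the sketch hands that response back to the caller rather than raising, so the caller can decide whether to queue the work or surface the error.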
