Windows, tiers, headers

Rate limits

Limits are per project over a sliding 60-second window, with separate buckets for run creation, reads, and embeddings. Deep runs additionally have a concurrency cap, since they hold execution slots for seconds at a time.
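A raw integration can avoid most 429s by counting its own requests before it sends them. The following is a minimal sketch of a client-side sliding-window log in Python. It assumes the server counts every request in the trailing 60 seconds, separately per bucket. The server's exact counting algorithm isn't specified here, and the class name, the bucket keys ("runs", "reads", "embeddings"), and the injectable clock are hypothetical.

```python
import collections
import time
from typing import Callable, Deque, Dict


class SlidingWindowLimiter:
    """Hypothetical client-side limiter: at most `limit` requests per bucket
    in any trailing `window` seconds (a sliding-window log)."""

    def __init__(self, limits: Dict[str, int], window: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        # e.g. Free tier: {"runs": 10, "reads": 60, "embeddings": 300}
        self.limits = limits
        self.window = window
        self.clock = clock
        # One timestamp log per bucket, so buckets never share capacity.
        self.logs: Dict[str, Deque[float]] = {b: collections.deque() for b in limits}

    def _evict(self, bucket: str, now: float) -> None:
        log = self.logs[bucket]
        # Drop timestamps that have slid out of the trailing window.
        while log and log[0] <= now - self.window:
            log.popleft()

    def try_acquire(self, bucket: str) -> bool:
        """Record a request and return True if the bucket has room, else False."""
        now = self.clock()
        self._evict(bucket, now)
        if len(self.logs[bucket]) >= self.limits[bucket]:
            return False
        self.logs[bucket].append(now)
        return True

    def wait_time(self, bucket: str) -> float:
        """Seconds until the oldest request leaves the window (0 if there is room now)."""
        now = self.clock()
        self._evict(bucket, now)
        log = self.logs[bucket]
        if len(log) < self.limits[bucket]:
            return 0.0
        return log[0] + self.window - now
```

The server remains the source of truth, so a limiter like this only reduces 429s; the client still has to handle them when they arrive.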

Limits by tier

tier        runs / min   reads / min   embeddings / min   concurrent deep runs
Free        10           60            300                1
Pro         60           600           3,000              4
Scale       300          3,000         20,000             16
Enterprise  custom       custom        custom             custom
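The tier table can also be carried into client code as data, with the concurrency column driving a gate on in-flight deep runs. The following is a sketch under stated assumptions: the names TierLimits, TIER_LIMITS, and DeepRunGate are hypothetical, and `start_deep_run` stands in for whatever call actually starts a deep run. Enterprise values are custom, so the sketch refuses to guess them.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable, Dict, Optional, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class TierLimits:
    runs_per_min: int
    reads_per_min: int
    embeddings_per_min: int
    concurrent_deep: int


# Values from the tier table; Enterprise limits are custom (None here),
# so they must come from your own agreement.
TIER_LIMITS: Dict[str, Optional[TierLimits]] = {
    "free": TierLimits(10, 60, 300, 1),
    "pro": TierLimits(60, 600, 3_000, 4),
    "scale": TierLimits(300, 3_000, 20_000, 16),
    "enterprise": None,
}


class DeepRunGate:
    """Hypothetical gate keeping in-flight deep runs at or below the tier cap.
    Create it inside a running event loop."""

    def __init__(self, tier: str) -> None:
        limits = TIER_LIMITS[tier]
        if limits is None:
            raise ValueError(f"{tier!r} limits are custom; supply them explicitly")
        self._sem = asyncio.Semaphore(limits.concurrent_deep)

    async def run(self, start_deep_run: Callable[[], Awaitable[T]]) -> T:
        # Waits here while `concurrent_deep` runs are already in flight.
        async with self._sem:
            return await start_deep_run()
```

A semaphore fits the concurrency cap better than a per-minute counter because deep runs hold their slot for their whole duration, not just at submission.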

Reading the headers

header                  meaning
X-RateLimit-Limit       Bucket size for this endpoint class
X-RateLimit-Remaining   Requests left in the window
X-RateLimit-Reset       Unix seconds until refill
Retry-After             On 429, seconds to wait; treat as a floor
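A small parser keeps these headers in one place for a raw integration. This is a minimal sketch; the function and dataclass names are hypothetical. It keeps X-RateLimit-Reset as the raw integer the server sends rather than converting it, and it handles only the delay-in-seconds form of Retry-After described above.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class RateLimitInfo:
    limit: Optional[int]          # X-RateLimit-Limit: bucket size for this endpoint class
    remaining: Optional[int]      # X-RateLimit-Remaining: requests left in the window
    reset: Optional[int]          # X-RateLimit-Reset: raw value, left uninterpreted
    retry_after: Optional[float]  # Retry-After: sent on 429; a floor, not an exact time


def _int_or_none(value: Optional[str]) -> Optional[int]:
    try:
        return int(value) if value is not None else None
    except ValueError:
        return None


def parse_rate_limit_headers(headers: Mapping[str, str]) -> RateLimitInfo:
    # HTTP header names are case-insensitive, so normalize before lookup.
    h = {k.lower(): v for k, v in headers.items()}
    raw_retry = h.get("retry-after")
    try:
        retry_after = float(raw_retry) if raw_retry is not None else None
    except ValueError:
        # HTTP also allows an HTTP-date here; this sketch only parses seconds.
        retry_after = None
    return RateLimitInfo(
        limit=_int_or_none(h.get("x-ratelimit-limit")),
        remaining=_int_or_none(h.get("x-ratelimit-remaining")),
        reset=_int_or_none(h.get("x-ratelimit-reset")),
        retry_after=retry_after,
    )
```

Checking `remaining` on successful responses lets a client slow down before it reaches zero instead of waiting for a 429.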

Behavior under limit

A 429 never partially executes: no run was created, so retrying with the same Idempotency-Key is always safe. The SDKs retry automatically with exponential backoff and jitter. Integrations that call the API directly should wait at least the Retry-After value before retrying and cap their own backoff at 60 seconds. Sustained 429s at your tier's ceiling are the signal to upgrade or to batch embedding traffic.
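For integrations that call the API directly, the retry rules above can be sketched as a loop. The sketch makes several assumptions. `send` is a hypothetical transport that sets the Idempotency-Key header to the key it receives and returns (status, headers, body). The 1-second base delay, the 6-attempt limit, and the "full jitter" strategy are illustrative choices, not the SDKs' documented behavior. Only 429 is retried. Where the 60-second cap and a larger Retry-After conflict, the sketch honors Retry-After.

```python
import random
import time
import uuid
from typing import Callable, Mapping, Optional, Tuple

# Hypothetical transport: send(idempotency_key) -> (status, headers, body).
Response = Tuple[int, Mapping[str, str], bytes]
Send = Callable[[str], Response]

MAX_BACKOFF = 60.0  # seconds; cap on the computed exponential backoff


def backoff_delay(attempt: int, retry_after: Optional[float], base: float = 1.0,
                  rng: Callable[[], float] = random.random) -> float:
    # Full jitter: uniform in [0, min(cap, base * 2**attempt)).
    delay = rng() * min(MAX_BACKOFF, base * (2 ** attempt))
    if retry_after is not None:
        # Retry-After is a floor: never wait less than the server asked.
        delay = max(delay, retry_after)
    return delay


def _retry_after_seconds(headers: Mapping[str, str]) -> Optional[float]:
    h = {k.lower(): v for k, v in headers.items()}
    try:
        return float(h["retry-after"]) if "retry-after" in h else None
    except ValueError:
        return None


def request_with_retries(send: Send, max_attempts: int = 6,
                         sleep: Callable[[float], None] = time.sleep,
                         rng: Callable[[], float] = random.random) -> Response:
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    # One key for every attempt: a 429 means nothing ran, so replaying is safe.
    key = str(uuid.uuid4())
    attempt = 0
    while True:
        status, headers, body = send(key)
        if status != 429 or attempt + 1 >= max_attempts:
            # Success, a non-429 error, or out of attempts: hand it back.
            return status, headers, body
        sleep(backoff_delay(attempt, _retry_after_seconds(headers), rng=rng))
        attempt += 1
```

Jitter spreads retries from many clients across the window, so they don't all return at the moment the bucket refills.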