Windows, tiers, headers
Rate limits
Limits are per project over a sliding 60-second window, with separate buckets for run creation, reads, and embeddings. Deep runs additionally have a concurrency cap, since they hold execution slots for seconds at a time.
Limits by tier
| tier | runs / min | reads / min | embeddings / min | concurrent deep |
|---|---|---|---|---|
| Free | 10 | 60 | 300 | 1 |
| Pro | 60 | 600 | 3,000 | 4 |
| Scale | 300 | 3,000 | 20,000 | 16 |
| Enterprise | custom | custom | custom | custom |
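A client can stay under these limits by tracking its own requests over the same 60-second window. The following is a minimal sketch of a client-side sliding-window throttle, not a description of how the server counts requests. `SlidingWindowLimiter`, `TIER_LIMITS`, and the bucket keys are illustrative names, and the numbers are copied from the table above.

```python
import time
from collections import deque

WINDOW_SECONDS = 60.0

# Per-minute limits copied from the tier table.
# Enterprise is custom, so it is left out of this hypothetical mapping.
TIER_LIMITS = {
    "free": {"runs": 10, "reads": 60, "embeddings": 300},
    "pro": {"runs": 60, "reads": 600, "embeddings": 3000},
    "scale": {"runs": 300, "reads": 3000, "embeddings": 20000},
}


class SlidingWindowLimiter:
    """Allow at most `limit` requests in any trailing `window` seconds."""

    def __init__(self, limit, window=WINDOW_SECONDS, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock
        self._stamps = deque()  # send times of requests still inside the window

    def try_acquire(self):
        """Record a request and return True, or return False if the bucket is full."""
        now = self.clock()
        # Drop timestamps that have aged out of the window.
        while self._stamps and now - self._stamps[0] >= self.window:
            self._stamps.popleft()
        if len(self._stamps) < self.limit:
            self._stamps.append(now)
            return True
        return False


# One limiter per bucket, since runs, reads, and embeddings are counted separately.
limiters = {name: SlidingWindowLimiter(n) for name, n in TIER_LIMITS["pro"].items()}
```

Call `limiters["reads"].try_acquire()` before each read and hold the request back when it returns `False`. A local throttle like this only reduces 429s; other clients in the same project share the bucket, so 429 handling is still needed.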
Reading the headers
| header | meaning |
|---|---|
| X-RateLimit-Limit | Bucket size for this endpoint class |
| X-RateLimit-Remaining | Requests left in the window |
| X-RateLimit-Reset | Unix time, in seconds, when the bucket refills |
| Retry-After | Sent on 429: seconds to wait before retrying; treat as a floor |
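For raw integrations, the headers in the table can be collected into one small structure. This is a sketch only: `RateLimitInfo` and `parse_rate_limit_headers` are hypothetical names, the values are assumed to be plain decimal numbers, and `X-RateLimit-Reset` is kept as the raw integer without interpreting it.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class RateLimitInfo:
    limit: Optional[int]          # X-RateLimit-Limit
    remaining: Optional[int]      # X-RateLimit-Remaining
    reset: Optional[int]          # X-RateLimit-Reset, raw value as sent
    retry_after: Optional[float]  # Retry-After, normally present only on 429


def _lookup(headers: Mapping[str, str], name: str) -> Optional[str]:
    """Case-insensitive header lookup that works on a plain dict."""
    wanted = name.lower()
    for key, value in headers.items():
        if key.lower() == wanted:
            return value
    return None


def _number(raw: Optional[str], cast):
    """Convert a header value, returning None if it is missing or malformed."""
    if raw is None:
        return None
    try:
        return cast(raw.strip())
    except ValueError:
        return None


def parse_rate_limit_headers(headers: Mapping[str, str]) -> RateLimitInfo:
    return RateLimitInfo(
        limit=_number(_lookup(headers, "X-RateLimit-Limit"), int),
        remaining=_number(_lookup(headers, "X-RateLimit-Remaining"), int),
        reset=_number(_lookup(headers, "X-RateLimit-Reset"), int),
        retry_after=_number(_lookup(headers, "Retry-After"), float),
    )
```

Checking `remaining` after each response lets a client slow down before it reaches zero rather than waiting for the first 429.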
Behavior under limit
A 429 never partially executes: the run was not created, so retrying with the same Idempotency-Key is always safe. The SDKs retry automatically with exponential backoff and jitter; raw integrations should honor Retry-After and cap backoff at 60 seconds. Sustained 429s at your tier ceiling are the signal to upgrade or to batch embedding traffic.
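For a raw integration, the retry policy described here might look like the sketch below. It is not the SDKs' implementation. `backoff_delay` and `send_with_retry` are hypothetical helpers, the 1-second base delay and the limit of five attempts are assumptions, and `send` stands in for whatever function issues the request. That function should reuse the same Idempotency-Key on every call and return `(status, headers, body)`, with `headers` as a plain dict keyed by the canonical header name.

```python
import random
import time
from typing import Callable, Mapping, Optional, Tuple

MAX_BACKOFF_SECONDS = 60.0  # cap on the computed backoff

Response = Tuple[int, Mapping[str, str], bytes]


def backoff_delay(attempt: int, retry_after: Optional[float],
                  base: float = 1.0, rng=random.random) -> float:
    """Seconds to wait before retry number `attempt` (0-based)."""
    # Exponential growth, capped at 60 seconds.
    ceiling = min(MAX_BACKOFF_SECONDS, base * 2 ** min(attempt, 16))
    delay = ceiling * rng()  # full jitter: uniform in [0, ceiling)
    if retry_after is not None:
        # Retry-After is a floor. Assumption: it is honored even if it exceeds the cap.
        delay = max(delay, retry_after)
    return delay


def send_with_retry(send: Callable[[], Response], max_attempts: int = 5,
                    sleep=time.sleep) -> Response:
    """Call `send` until it returns a non-429 response or attempts run out."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status, headers, body = send()
        if status != 429 or attempt == max_attempts - 1:
            return status, headers, body
        raw = headers.get("Retry-After")
        try:
            retry_after = float(raw) if raw is not None else None
        except ValueError:
            retry_after = None  # unparseable value: fall back to plain backoff
        sleep(backoff_delay(attempt, retry_after))
    raise AssertionError("unreachable")
```

If the last attempt still returns 429, the function hands that response back to the caller instead of raising, so the caller decides whether to queue the work, batch it, or surface the error.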