Rate limits
Limits are per project over a sliding 60-second window, in separate buckets for run creation, reads, and embeddings, plus a concurrency cap on deep runs. Every response carries X-RateLimit-* headers; this endpoint exposes the same data on demand. Tier numbers live at /docs/rate-limits.
[ 01 ]get — /v1/rate_limits
GET /v1/rate_limits
Current bucket state for your key's project: limits, remaining, and reset times.
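A minimal sketch of how a client might act on this endpoint's payload, assuming the response shape documented for the 200 response. The helper names `seconds_until_reset` and `can_start_deep_run` are hypothetical, and the sketch assumes `reset_at` is the moment a bucket's capacity becomes available again. The HTTP call itself is left out because the base URL and auth scheme are not specified here.

```python
from datetime import datetime, timezone
from typing import Optional

# Sample payload mirroring the documented 200 response for GET /v1/rate_limits.
SAMPLE = {
    "object": "rate_limits",
    "tier": "pro",
    "buckets": {
        "runs": {"limit": 60, "remaining": 57, "reset_at": "2026-06-11T07:31:40Z"},
        "reads": {"limit": 600, "remaining": 588, "reset_at": "2026-06-11T07:31:40Z"},
        "embeddings": {"limit": 3000, "remaining": 2996, "reset_at": "2026-06-11T07:31:40Z"},
    },
    "deep_concurrency": {"limit": 4, "in_use": 1},
}


def parse_timestamp(value: str) -> datetime:
    """Parse an ISO 8601 UTC timestamp such as '2026-06-11T07:31:40Z'."""
    # Older Python versions reject a trailing 'Z', so normalize it to an offset.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def seconds_until_reset(payload: dict, bucket: str, now: Optional[datetime] = None) -> float:
    """Return how long to wait before using `bucket` (hypothetical helper).

    Returns 0.0 while the bucket still has requests remaining.
    Assumption: `reset_at` marks when capacity becomes available again.
    """
    state = payload["buckets"][bucket]
    if state["remaining"] > 0:
        return 0.0
    now = now or datetime.now(timezone.utc)
    wait = (parse_timestamp(state["reset_at"]) - now).total_seconds()
    return max(0.0, wait)


def can_start_deep_run(payload: dict) -> bool:
    """True when the deep-run concurrency cap has a free slot (hypothetical helper)."""
    cap = payload["deep_concurrency"]
    return cap["in_use"] < cap["limit"]
```

A caller could sleep for `seconds_until_reset(payload, "runs")` before creating a run, and check `can_start_deep_run(payload)` before starting a deep run.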
[ response ]200 ok
{
  "object": "rate_limits",
  "tier": "pro",
  "buckets": {
    "runs": { "limit": 60, "remaining": 57, "reset_at": "2026-06-11T07:31:40Z" },
    "reads": { "limit": 600, "remaining": 588, "reset_at": "2026-06-11T07:31:40Z" },
    "embeddings": { "limit": 3000, "remaining": 2996, "reset_at": "2026-06-11T07:31:40Z" }
  },
  "deep_concurrency": { "limit": 4, "in_use": 1 }
}
[ 02 ]get — /v1/rate_limits/history
GET /v1/rate_limits/history
Hourly utilization for the past 7 days. Find your bursts before they find you.
| param | type | req | description |
|---|---|---|---|
| bucket | string | optional | Filter: "runs", "reads", or "embeddings". Query parameter. |
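As a rough illustration of the optional `bucket` query parameter, the sketch below builds the request path and rejects values outside the three documented buckets. The function name `history_path` is hypothetical. Only the path and query string are built, since the API's base URL and authentication are not specified here.

```python
from typing import Optional
from urllib.parse import urlencode

HISTORY_PATH = "/v1/rate_limits/history"
# The three filter values listed in the parameter table.
ALLOWED_BUCKETS = ("runs", "reads", "embeddings")


def history_path(bucket: Optional[str] = None) -> str:
    """Build the request path for the history endpoint (hypothetical helper).

    With no bucket, the unfiltered path is returned. An unknown bucket
    raises ValueError before any request would be sent.
    """
    if bucket is None:
        return HISTORY_PATH
    if bucket not in ALLOWED_BUCKETS:
        raise ValueError(f"bucket must be one of {ALLOWED_BUCKETS}, got {bucket!r}")
    return f"{HISTORY_PATH}?{urlencode({'bucket': bucket})}"
```

Validating on the client side is a design choice, not a requirement: it surfaces typos such as `"read"` without spending a request.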
[ response ]200 ok
{
  "object": "list",
  "bucket": "runs",
  "data": [
    { "hour": "2026-06-11T06:00:00Z", "peak_utilization": 0.42, "throttled": 0 },
    { "hour": "2026-06-11T05:00:00Z", "peak_utilization": 0.97, "throttled": 12 }
  ]
}
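One way to use the history response is to flag the hours worth investigating. The sketch below assumes the response shape documented for `/v1/rate_limits/history`. The function name `hot_hours` and the 0.9 utilization threshold are illustrative assumptions, not values defined by the API.

```python
# Sample payload mirroring the documented 200 response for the history endpoint.
SAMPLE_HISTORY = {
    "object": "list",
    "bucket": "runs",
    "data": [
        {"hour": "2026-06-11T06:00:00Z", "peak_utilization": 0.42, "throttled": 0},
        {"hour": "2026-06-11T05:00:00Z", "peak_utilization": 0.97, "throttled": 12},
    ],
}


def hot_hours(history: dict, utilization_threshold: float = 0.9) -> list:
    """Return the hours that saw throttling or near-limit usage (hypothetical helper).

    An hour is flagged when any request was throttled, or when its peak
    utilization reached `utilization_threshold` (an assumed cutoff).
    Hours are returned in the order they appear in the response.
    """
    return [
        row["hour"]
        for row in history["data"]
        if row["throttled"] > 0 or row["peak_utilization"] >= utilization_threshold
    ]
```

With the sample above, only the 05:00 hour is flagged: it peaked at 0.97 utilization and had 12 throttled requests.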