Rate limits

Limits are per project over a sliding 60-second window, in separate buckets for run creation, reads, and embeddings, plus a concurrency cap on deep runs. Every response carries X-RateLimit-* headers; this endpoint exposes the same data on demand. Tier numbers live at /docs/rate-limits.
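
To stay under a sliding 60-second limit, a client can mirror the window locally. The following is a minimal sketch, not the server's actual algorithm, which this page does not describe. The class name `SlidingWindowLimiter` and its methods are hypothetical, and the limit of 60 in the usage note is taken from the sample "runs" bucket rather than from any tier table.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Client-side approximation of a sliding-window limit (hypothetical helper)."""

    def __init__(self, limit, window_seconds=60.0, clock=time.monotonic):
        self.limit = limit
        self.window_seconds = window_seconds
        self.clock = clock
        self._sent = deque()  # timestamps of requests still inside the window

    def _evict(self, now):
        # Drop timestamps that have slid out of the window.
        while self._sent and now - self._sent[0] >= self.window_seconds:
            self._sent.popleft()

    def try_acquire(self):
        """Record a request and return True if it fits in the current window."""
        now = self.clock()
        self._evict(now)
        if len(self._sent) >= self.limit:
            return False
        self._sent.append(now)
        return True

    def wait_time(self):
        """Seconds until the next request would fit (0.0 if one fits now)."""
        now = self.clock()
        self._evict(now)
        if len(self._sent) < self.limit:
            return 0.0
        return self.window_seconds - (now - self._sent[0])
```

A caller would create one limiter per bucket, for example `SlidingWindowLimiter(limit=60)` for run creation, and sleep for `wait_time()` whenever `try_acquire()` returns False. The server remains the source of truth, so treat this as a way to reduce throttled requests rather than to eliminate them.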

[ 01 ]get — /v1/rate_limits

GET /v1/rate_limits

Current bucket state for your key's project: the tier, each bucket's limit, remaining count, and reset time, plus deep-run concurrency in use.

[ response ]200 ok

{
  "object": "rate_limits",
  "tier": "pro",
  "buckets": {
    "runs":       { "limit": 60,   "remaining": 57,   "reset_at": "2026-06-11T07:31:40Z" },
    "reads":      { "limit": 600,  "remaining": 588,  "reset_at": "2026-06-11T07:31:40Z" },
    "embeddings": { "limit": 3000, "remaining": 2996, "reset_at": "2026-06-11T07:31:40Z" }
  },
  "deep_concurrency": { "limit": 4, "in_use": 1 }
}
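
As a rough illustration of how a client might act on this payload, the sketch below works on an abbreviated copy of the sample response above and makes no HTTP call. The helper names (`seconds_until_reset`, `delay_before_next`, `can_start_deep_run`) are hypothetical, and the code assumes that waiting until `reset_at` is enough to regain capacity in an exhausted bucket, which this page does not state explicitly.

```python
import json
from datetime import datetime, timezone

# Abbreviated copy of the sample response (only the "runs" bucket is kept).
SAMPLE = """
{
  "object": "rate_limits",
  "tier": "pro",
  "buckets": {
    "runs": { "limit": 60, "remaining": 57, "reset_at": "2026-06-11T07:31:40Z" }
  },
  "deep_concurrency": { "limit": 4, "in_use": 1 }
}
"""


def parse_timestamp(ts):
    # Older Python versions do not accept a trailing "Z" in fromisoformat.
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def seconds_until_reset(bucket, now):
    """Seconds from `now` until the bucket's reset_at, never negative."""
    delta = parse_timestamp(bucket["reset_at"]) - now
    return max(0.0, delta.total_seconds())


def delay_before_next(state, bucket_name, now):
    """0.0 if the bucket has capacity left, else the time until its reset."""
    bucket = state["buckets"][bucket_name]
    if bucket["remaining"] > 0:
        return 0.0
    return seconds_until_reset(bucket, now)


def can_start_deep_run(state):
    """True if the deep-run concurrency cap has a free slot."""
    cap = state["deep_concurrency"]
    return cap["in_use"] < cap["limit"]


state = json.loads(SAMPLE)
now = datetime.now(timezone.utc)
print(delay_before_next(state, "runs", now), can_start_deep_run(state))
```

In a real client, `state` would come from the JSON body of `GET /v1/rate_limits` instead of the embedded sample.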

[ 02 ]get — /v1/rate_limits/history

GET /v1/rate_limits/history

Hourly utilization for the past 7 days — find your bursts before they find you.

param	type	req	description
bucket	string	optional	Filter: "runs", "reads", or "embeddings". Query parameter.

[ response ]200 ok

{
  "object": "list",
  "bucket": "runs",
  "data": [
    { "hour": "2026-06-11T06:00:00Z", "peak_utilization": 0.42, "throttled": 0 },
    { "hour": "2026-06-11T05:00:00Z", "peak_utilization": 0.97, "throttled": 12 }
  ]
}
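
One way to use this data is to flag the hours that came close to a limit or were throttled. The sketch below filters the sample response above. The function name `bursty_hours` and the 0.9 threshold are illustrative choices, and the code assumes `peak_utilization` is a fraction of the bucket limit and `throttled` is a count of rejected requests, which the sample values suggest but this page does not define.

```python
# Copy of the sample history response shown above.
SAMPLE_HISTORY = {
    "object": "list",
    "bucket": "runs",
    "data": [
        {"hour": "2026-06-11T06:00:00Z", "peak_utilization": 0.42, "throttled": 0},
        {"hour": "2026-06-11T05:00:00Z", "peak_utilization": 0.97, "throttled": 12},
    ],
}


def bursty_hours(history, utilization_threshold=0.9):
    """Return entries that were throttled or peaked at or above the threshold."""
    return [
        entry
        for entry in history["data"]
        if entry["throttled"] > 0
        or entry["peak_utilization"] >= utilization_threshold
    ]


for entry in bursty_hours(SAMPLE_HISTORY):
    print(entry["hour"], entry["peak_utilization"], entry["throttled"])
```

Hours returned here are candidates for smoothing out request bursts or for checking whether a higher tier is warranted.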