Tiers, headers, backoff

Rate limits

What you get per tier, how the headers tell you where you stand, and how to back off.

Limits are enforced per project, measured over a sliding 60-second window, with separate buckets for run creation and read traffic. Hitting a limit returns 429 with a Retry-After header — honor it; retrying early extends the window.

Limits by tier

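| tier       | runs / min | reads / min | embeddings / min | concurrent deep runs |
|------------|------------|-------------|------------------|----------------------|
| Free       | 10         | 60          | 300              | 1                    |
| Pro        | 60         | 600         | 3,000            | 4                    |
| Scale      | 300        | 3,000       | 20,000           | 16                   |
| Enterprise | custom     | custom      | custom           | custom               |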
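
To stay under a per-minute limit without waiting for a 429, a client can keep its own sliding-window count. The sketch below is only an illustration of that idea: `createSlidingWindow` and its method names are hypothetical, it is not how the service enforces limits, and the server's count remains authoritative.

```ts
// Client-side sliding-window throttle (illustrative sketch, names are hypothetical).
interface Throttle {
  // Records a request at nowMs and returns true if it fits in the window.
  tryAcquire(nowMs?: number): boolean;
  // Milliseconds until the next request would fit (0 if one fits now).
  waitMs(nowMs?: number): number;
}

function createSlidingWindow(limit: number, windowMs = 60_000): Throttle {
  if (!Number.isInteger(limit) || limit < 1) {
    throw new RangeError("limit must be a positive integer");
  }
  const stamps: number[] = []; // send times, oldest first

  // Drop send times that have aged out of the window.
  const prune = (nowMs: number): void => {
    while (stamps.length > 0 && nowMs - (stamps[0] as number) >= windowMs) {
      stamps.shift();
    }
  };

  return {
    tryAcquire(nowMs = Date.now()) {
      prune(nowMs);
      if (stamps.length >= limit) return false;
      stamps.push(nowMs);
      return true;
    },
    waitMs(nowMs = Date.now()) {
      prune(nowMs);
      if (stamps.length < limit) return 0;
      return (stamps[0] as number) + windowMs - nowMs;
    },
  };
}
```

For example, `createSlidingWindow(10)` would model the Free tier's 10 runs per minute. Because runs and reads use separate buckets, one throttle per bucket is the natural fit.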

Response headers

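| header                | meaning                                           |
|-----------------------|---------------------------------------------------|
| X-RateLimit-Limit     | Bucket size for this endpoint class.              |
| X-RateLimit-Remaining | Requests left in the current window.              |
| X-RateLimit-Reset     | Unix time (seconds) at which the window refills.  |
| Retry-After           | On 429 only: seconds to wait before retrying.     |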
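
A raw HTTP client can read these headers off each response to see how close it is to the limit. The following is a minimal sketch: `readRateLimit` and `isNearLimit` are hypothetical helper names, the 10% threshold is an arbitrary choice, and `HeaderSource` is a stand-in for anything with a `get` method, such as the `headers` of a `fetch` response. The reset value is passed through as a raw number without interpretation.

```ts
// Anything with a case-insensitive get(), e.g. the Headers object from fetch.
interface HeaderSource {
  get(name: string): string | null;
}

interface RateLimitState {
  limit: number | null;     // X-RateLimit-Limit
  remaining: number | null; // X-RateLimit-Remaining
  reset: number | null;     // X-RateLimit-Reset, raw value
}

// Parses a base-10 integer header; returns null if absent or malformed.
function toInt(value: string | null): number | null {
  if (value === null) return null;
  const n = Number.parseInt(value, 10);
  return Number.isNaN(n) ? null : n;
}

function readRateLimit(headers: HeaderSource): RateLimitState {
  return {
    limit: toInt(headers.get("X-RateLimit-Limit")),
    remaining: toInt(headers.get("X-RateLimit-Remaining")),
    reset: toInt(headers.get("X-RateLimit-Reset")),
  };
}

// True when the remaining share of the bucket is at or below the threshold.
function isNearLimit(state: RateLimitState, threshold = 0.1): boolean {
  if (state.limit === null || state.remaining === null || state.limit <= 0) {
    return false; // not enough information to decide
  }
  return state.remaining / state.limit <= threshold;
}
```

A caller might slow down or queue work once `isNearLimit` returns true, rather than running into a 429.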

Backoff that behaves

Use exponential backoff with jitter, cap at 60 seconds, and treat Retry-After as a floor, not a suggestion. The SDKs do this automatically; raw HTTP integrations should copy the pattern.

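```ts
// Retries fn on HTTP 429 using exponential backoff with jitter.
// Expects rejected errors to carry `status` and, on 429, `retryAfter` in seconds.
async function withBackoff<T>(fn: () => Promise<T>): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err: unknown) {
      const e = (err ?? {}) as { status?: number; retryAfter?: number };
      // Only 429 is retried; give up after 5 retries.
      if (e.status !== 429 || attempt >= 5) throw err;
      // Exponential step capped at 60 s; Retry-After is a floor the cap never undercuts.
      const base = Math.max(e.retryAfter ?? 0, Math.min(2 ** attempt, 60));
      // Up to 1 s of jitter so clients do not retry in lockstep.
      await new Promise((r) =>
        setTimeout(r, (base + Math.random()) * 1000),
      );
    }
  }
}
```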
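
`withBackoff` reads `status` and `retryAfter` from the thrown error, but a raw `fetch` call resolves normally on a 429 and throws nothing. One way to bridge that gap is sketched below. `HttpError`, `parseRetryAfter`, and `assertOk` are hypothetical names, not part of any SDK. This page describes Retry-After as a number of seconds; the HTTP-date branch is included only because the HTTP specification also permits that form.

```ts
// Error shape that withBackoff can inspect (hypothetical class).
class HttpError extends Error {
  status: number;
  retryAfter: number | undefined; // seconds, when the server supplied it

  constructor(status: number, retryAfter?: number) {
    super(`HTTP ${status}`);
    this.name = "HttpError";
    this.status = status;
    this.retryAfter = retryAfter;
  }
}

// Converts a Retry-After header value to whole seconds from nowMs.
// Accepts delta-seconds ("7") or an HTTP-date; returns undefined otherwise.
function parseRetryAfter(
  value: string | null,
  nowMs: number = Date.now(),
): number | undefined {
  if (value === null) return undefined;
  const trimmed = value.trim();
  if (/^\d+$/.test(trimmed)) return Number(trimmed);
  const dateMs = Date.parse(trimmed);
  if (Number.isNaN(dateMs)) return undefined;
  return Math.max(0, Math.ceil((dateMs - nowMs) / 1000));
}

// Minimal view of a fetch Response, so the helper is easy to test.
interface ResponseLike {
  ok: boolean;
  status: number;
  headers: { get(name: string): string | null };
}

// Throws HttpError for any non-2xx response; returns normally otherwise.
function assertOk(res: ResponseLike, nowMs: number = Date.now()): void {
  if (res.ok) return;
  throw new HttpError(
    res.status,
    parseRetryAfter(res.headers.get("Retry-After"), nowMs),
  );
}
```

Calling `assertOk(res)` on the response inside the function passed to `withBackoff` makes a 429 surface as an error with the fields the retry loop expects.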