Quotas and rate limits

Two limits per team: a per-second request rate and a per-month event allowance. Short bursts above the rate are tolerated; sustained excess is rejected.

Per-tier limits

Tier        Sustained rate  Burst rate     Events / month
Starter     100 req/sec     200 req/sec    100k
Team        500 req/sec     1,000 req/sec  1M
Business    2,000 req/sec   5,000 req/sec  10M
Enterprise  custom          custom         custom

What counts as an event

An event is one row in the events table or one ingest delivery. Each call below counts against the monthly allowance as shown:

  • POST /v1/runs = 1 event
  • POST /v1/runs/{id}/events = 1 per item in the batch
  • POST /v1/runs/{id}/finish = 1 event
  • POST /v1/heartbeats = 1 event
  • GET /v1/ping = 1 event

Score writes are part of the finish payload and don't count separately. A typical run with one LLM call and a clean finish is three events: insert, llm_call, finish. Read endpoints (control API) have their own, more permissive limits.
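
The counting rules above reduce to simple arithmetic. A minimal sketch, assuming a run is one POST /v1/runs, any number of event batches, and one finish; the function name is illustrative and not part of any SDK. Heartbeats and pings would add one event each and are left out here.

```python
def events_for_run(batch_sizes):
    """Count events consumed by one run.

    batch_sizes: number of items in each POST /v1/runs/{id}/events call.
    """
    run_insert = 1              # POST /v1/runs
    batched = sum(batch_sizes)  # 1 per item in each batch
    finish = 1                  # POST /v1/runs/{id}/finish (scores included)
    return run_insert + batched + finish


print(events_for_run([1]))       # one llm_call event: 3
print(events_for_run([10, 25]))  # two batches of 10 and 25 items: 37
```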

Event cap behaviour

Three phases per calendar month, UTC:

Range                    Behaviour
Below 100%               Normal ingest, no banner, no email.
100% to 150% (soft cap)  Ingest continues; a banner and a warning email are triggered.
Above 150% (hard cap)    429 Too Many Requests with Retry-After = seconds to next UTC month boundary.

A Starter team on 100k events/month can ingest up to 150k before the door closes. Past 150k, every subsequent ingest returns 429 until 00:00 UTC on the first of the next month.
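
The three phases can be expressed as a small classifier. This is a sketch only: the function and phase names are made up for illustration, and the handling of the exact boundaries (100% counts as soft cap, 150% is still soft cap) is an assumption read off the table, not confirmed server behaviour.

```python
def cap_phase(events_used, monthly_allowance):
    """Return 'normal', 'soft_cap', or 'hard_cap' for the current month."""
    if events_used < monthly_allowance:
        return "normal"
    # Integer arithmetic avoids float rounding at the 150% boundary:
    # events_used <= 1.5 * allowance  <=>  2 * events_used <= 3 * allowance
    if events_used * 2 <= monthly_allowance * 3:
        return "soft_cap"
    return "hard_cap"


# Starter tier, 100k events/month
print(cap_phase(99_999, 100_000))   # normal
print(cap_phase(150_000, 100_000))  # soft_cap
print(cap_phase(150_001, 100_000))  # hard_cap
```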

Retry-After semantics

HTTP/1.1 429 Too Many Requests
Content-Type: application/json
Retry-After: 1209600

{
  "error": "quota_exceeded",
  "message": "Monthly event cap exceeded. Resets at 2026-06-01T00:00:00Z."
}

Retry-After counts down to the reset. Hit the cap at noon UTC on the 15th of a 31-day month and see Retry-After: 1425600 (16.5 days); hit it at 23:59 UTC on the 31st and see Retry-After: 60. Reset is always the start of the next UTC month.
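
To sanity-check a Retry-After value, the seconds to the next UTC month boundary can be computed locally. A minimal sketch using only the standard library; the function name is illustrative, and whether the server truncates or rounds fractional seconds is an assumption (truncation here).

```python
from datetime import datetime, timezone


def seconds_to_next_utc_month(now):
    """Seconds from `now` (timezone-aware) to 00:00 UTC on the 1st of next month."""
    now = now.astimezone(timezone.utc)
    if now.month == 12:
        reset = datetime(now.year + 1, 1, 1, tzinfo=timezone.utc)
    else:
        reset = datetime(now.year, now.month + 1, 1, tzinfo=timezone.utc)
    return int((reset - now).total_seconds())


# Matches the sample response: 14 days before 2026-06-01T00:00:00Z
print(seconds_to_next_utc_month(datetime(2026, 5, 18, 0, 0, tzinfo=timezone.utc)))   # 1209600
# One minute before the boundary
print(seconds_to_next_utc_month(datetime(2026, 5, 31, 23, 59, tzinfo=timezone.utc)))  # 60
```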

SDKs honour Retry-After automatically. After 5 retries, the affected envelopes drop and dropped_count increments. Agent code is never blocked or slowed.
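
For clients that don't use an SDK, the same behaviour can be approximated by hand. The sketch below is not the SDK's implementation: `send` is a hypothetical callable returning (status, headers), `stats` is a plain dict standing in for the SDK's dropped_count counter, and the header lookup assumes an exact "Retry-After" key.

```python
import time

MAX_RETRIES = 5  # from the text: envelopes drop after 5 retries


def deliver(envelope, send, stats, sleep=time.sleep):
    """Send one envelope, waiting out Retry-After on each 429.

    Returns True if the request was not rejected with 429,
    False if the envelope was dropped after MAX_RETRIES retries.
    """
    for attempt in range(MAX_RETRIES + 1):  # first try + 5 retries
        status, headers = send(envelope)
        if status != 429:
            return True  # other error handling is outside this sketch
        if attempt == MAX_RETRIES:
            break  # retries exhausted, do not wait again
        sleep(float(headers.get("Retry-After", 0)))
    stats["dropped_count"] += 1
    return False
```

To keep agent code unblocked, a loop like this would presumably run on a background worker rather than on the agent's own thread, since a monthly-cap Retry-After can be days long.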

Sustained-rate excess

When you exceed the sustained rate for long enough to drain the burst bucket, requests get an immediate 429 with no Retry-After header. Slow down. SDKs apply exponential back-off.

Distinct from the monthly 429: a rate-limit 429 clears as soon as incoming volume drops below the sustained rate, typically within seconds.
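
A client can tell the two 429s apart by the presence of Retry-After and pick its wait accordingly. A sketch under stated assumptions: the function name, the 0.5-second base, the 30-second ceiling, and the use of full jitter are all illustrative choices, not documented SDK parameters.

```python
import random


def retry_delay(headers, attempt, base=0.5, cap=30.0):
    """Seconds to wait after a 429.

    Monthly-cap 429s carry Retry-After; rate-limit 429s do not,
    so those fall back to exponential back-off with full jitter.
    """
    retry_after = headers.get("Retry-After")
    if retry_after is not None:
        return float(retry_after)
    # Back-off ceiling doubles per attempt (0-based), bounded by `cap`
    ceiling = min(cap, base * (2 ** attempt))
    return random.uniform(0, ceiling)
```

Jitter spreads retries from many workers so they don't all return at the same instant and drain the bucket again.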

Agent count and retention

Quota      Behaviour
Agent cap  At ceiling, you can't create new agents. Existing agents keep ingesting. Upgrade to lift.
Retention  Events past the tier's retention window prune on partition drop, no user action. Upgrading widens the window for events still in range; pruned events are gone.

Resets

Monthly allowance resets at the first of each calendar month, 00:00 UTC. Hard-cap 429s clear at that moment. Banners and warning emails clear too; cross the threshold again later in the month and they reappear.

Sustained-rate enforcement is continuous; the bucket refills constantly.
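
A token bucket is one common way to model this kind of limit. The sketch below illustrates the general technique and is not a description of the service's actual limiter: the class is hypothetical, and the bucket capacity of 200 is an assumed value, since the tier table gives a burst rate rather than a bucket size.

```python
class TokenBucket:
    """Token bucket: refills at `rate` tokens/sec up to `capacity`."""

    def __init__(self, rate, capacity):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity  # start full
        self.last = 0.0         # timestamp of the previous call, in seconds

    def allow(self, now, cost=1):
        # Refill continuously for the time elapsed since the last call
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True   # request accepted
        return False      # bucket drained: would be a 429 with no Retry-After


# Starter sustained rate (100/sec) with an assumed capacity of 200
bucket = TokenBucket(rate=100, capacity=200)
accepted = sum(bucket.allow(now=0.0) for _ in range(250))
print(accepted)  # 200: the burst drains the bucket, the remaining 50 are rejected
```

Once traffic falls back under the refill rate, tokens accumulate again and requests succeed, which is why a rate-limit 429 tends to clear within seconds.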

Monitoring

The billing page surfaces live usage: events this month, percentage of cap, sustained-rate utilisation in the last hour, historical usage. Soft cap triggers an email and banner; hard cap triggers 429.

When you're close to the cap

  1. Upgrade. One click in the dashboard, immediate effect.
  2. Reduce volume at source. Audit auto-instrumentation. One event per stream chunk burns through caps fast.
  3. Sample at the call site. Gate run.event(...) with a random check for high-volume agents. Token counts on llm_call events remain authoritative for cost; sampling logs doesn't skew Spend.
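
Option 3 can be a thin wrapper around run.event(...). A minimal sketch: the wrapper name and the `rng` parameter are illustrative, and the arguments are passed through untouched because the signature of run.event is not assumed here.

```python
import random


def sampled_event(run, sample_rate, *args, rng=random.random, **kwargs):
    """Forward to run.event(...) for roughly `sample_rate` of calls.

    sample_rate: fraction between 0.0 and 1.0 of events to keep.
    Returns True if the event was sent, False if it was skipped.
    """
    if rng() < sample_rate:
        run.event(*args, **kwargs)
        return True
    return False
```

With sample_rate=0.1, about one call in ten reaches run.event. Since token counts on llm_call events drive cost, it is probably safest to apply a wrapper like this only to high-volume log-style events and leave llm_call events unsampled.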

Sales reaches out to teams that hit the soft cap two months running. The conversation is free.