000 spend · symptom

Why your OpenAI or Anthropic bill keeps growing.

If multiple agents share one or two API keys, you have no way to know which agent is responsible when the bill jumps. The provider dashboard shows tokens by key, not by agent, customer, or feature. By the time the invoice arrives, the spike is three weeks old and nobody on the team remembers what shipped.

001 what's actually happening

The bill rarely jumps for one big reason. It creeps for three small ones.

Across the teams we've talked to, the same three patterns account for most runaway-spend incidents. None of them raises an exception, none of them trips an existing alert, and none of them is visible on the provider's usage page.

The three usual suspects

  • Retry loops that carry full context. A new retry policy ships, a soft failure starts to recur, and each retry re-sends the full conversation history. One bad agent runs up two-thirds of the month's bill on its own.
  • Prompt creep. A handful of small additions to the system prompt over six weeks doubles the average input token count. No single change looked suspicious in code review; the cumulative effect is invisible until the invoice lands.
  • Ungated model swaps. An engineer changes claude-haiku to claude-opus for one agent because they were testing quality. The change ships. The cost-per-run for that agent jumps 12x and nobody notices for a week.
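To see why the first pattern gets expensive so quickly, here is a rough sketch of the arithmetic. The token counts and the function name are made up for illustration; the only assumption taken from the pattern above is that every retry re-sends the full history, plus whatever earlier failed attempts appended to it.

```python
def total_input_tokens(history_tokens: int, tokens_added_per_failure: int, attempts: int) -> int:
    """Input tokens billed across all attempts when each retry re-sends the
    full history plus whatever the previous failed attempts appended."""
    return sum(
        history_tokens + attempt * tokens_added_per_failure
        for attempt in range(attempts)
    )


# Hypothetical numbers: an 8,000-token conversation, 500 tokens appended per failure.
single_attempt = total_input_tokens(8_000, 500, attempts=1)
six_attempts = total_input_tokens(8_000, 500, attempts=6)
print(single_attempt, six_attempts)  # 8000 55500
```

Six attempts bill nearly seven times the input of one, and nothing in that loop raises an exception.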

002 what catches it on the day

Cost-per-run, per agent, on a baseline that pages.

The smallest instrumentation that catches all three patterns is the same: a cost number per run, attributed to the agent that produced the run, with a rolling baseline that knows what normal looks like for each agent.

A real breakdown

spend · march 2026 · top customers by agent team rollup
Agent             Customer         Runs     Cost     Cost / run
support-triage    acme-corp        12,408   £1,460   £0.118
research-agent    (unattributed)   214      £409     £1.910
content-writer    acme-corp        3,114    £967     £0.310
email-classifier  acme-corp        18,672   £163     £0.009

The flagged row is research-agent. It runs at roughly 16x the cost-per-run of support-triage, the team's largest line item (£1.910 against £0.118), and it is missing its customer_id tag. That's a backfill job that started looping; the daily anomaly check finds it the morning it starts, not on the first of next month.

The three things to instrument

  • Per-agent cost attribution. Every run records the agent, optional customer and feature tags, and provider-side token counts. Pricing is resolved server-side from a rate card so the SDK can't be wrong about your bill.
  • Cost-per-successful-run as the headline. Total cost divided by the number of runs that finished cleanly. Loops and timeouts are left out of the denominator, so the money they burn pushes the metric up instead of hiding inside a larger run count.
  • Anomaly detection on a 14-day baseline. Today's spend more than 2.5σ above the trailing mean fires the agent's alert route. Default threshold; configurable per team.
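The anomaly check in the last item can be sketched in a few lines. This is a minimal illustration, not AgentPing's implementation: the function name, the use of the sample standard deviation, and the handling of a perfectly flat baseline are all assumptions. Only the 14-day window and the 2.5σ default come from the description above.

```python
from statistics import mean, stdev


def is_anomalous(trailing_spend: list[float], today: float, threshold: float = 2.5) -> bool:
    """True when today's spend sits more than `threshold` standard deviations
    above the mean of the trailing window (14 daily totals in the text above)."""
    if len(trailing_spend) < 2:
        return False  # not enough history to estimate a spread
    mu = mean(trailing_spend)
    sigma = stdev(trailing_spend)  # sample standard deviation (an assumption)
    if sigma == 0:
        return today > mu  # flat baseline: treat any increase as anomalous (an assumption)
    return (today - mu) / sigma > threshold


# Hypothetical daily spend for one agent over the previous 14 days.
baseline = [40, 42, 38, 41, 39, 43, 40, 37, 44, 41, 39, 42, 40, 38]
print(is_anomalous(baseline, today=41))  # False
print(is_anomalous(baseline, today=95))  # True
```

Keeping one baseline per agent is what lets a cheap classifier and an expensive research agent each have their own idea of normal.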

003 frequently asked
My OpenAI dashboard shows tokens by API key. Why isn't that enough?
One API key usually serves multiple agents in the same environment. When the bill spikes, the dashboard can tell you "key A went up", but not which of the seven agents using key A caused it. Per-agent attribution requires tagging each run with the agent that produced it at the time the run starts.
What's the difference between cost and cost-per-successful-run?
Total cost over total runs uses the wrong denominator. A loop that retries 50 times and fails inflates the run count and depresses the average, hiding what a finished result actually costs. Cost over successful runs (status=success and quality threshold met) is the number finance can actually budget against.
How do I know which feature or customer is driving the spend?
Tag each run with a customer_id and a feature at run start. Aggregation works over every run that carries the tags, so adding the tag today lets you slice every run from today forward. Runs without a tag surface as unattributed rows in the dashboard, so you can find the cron or backfill job that forgot to tag.
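As a sketch of what that slicing looks like, assuming each run is stored as a plain record with optional customer_id and feature fields (the record shape, the `spend_by` helper, and the sample costs are illustrative, not AgentPing's API):

```python
from collections import defaultdict

UNATTRIBUTED = "(unattributed)"


def spend_by(runs: list[dict], tag: str) -> dict[str, float]:
    """Sum run cost per value of `tag`; runs missing the tag are grouped
    under an explicit unattributed bucket instead of being dropped."""
    totals: dict[str, float] = defaultdict(float)
    for run in runs:
        key = run.get(tag) or UNATTRIBUTED
        totals[key] += run["cost"]
    return dict(totals)


runs = [
    {"agent": "support-triage", "customer_id": "acme-corp", "feature": "triage", "cost": 0.12},
    {"agent": "content-writer", "customer_id": "acme-corp", "feature": "drafts", "cost": 0.31},
    {"agent": "research-agent", "cost": 1.91},  # a job that forgot to tag
]

by_customer = spend_by(runs, "customer_id")
print(sorted(by_customer))  # ['(unattributed)', 'acme-corp']
```

Keeping the untagged bucket visible is the point: the missing tag is itself the signal.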
Can I get alerted before the invoice arrives?
Yes. Anomaly detection runs a z-score on each agent against its trailing 14-day baseline. When today's spend exceeds 2.5 standard deviations above the trailing mean, the agent's alert route fires (Slack, PagerDuty, Microsoft Teams, email, or generic webhook). Most teams catch the spike on the same day it starts.
What about cached tokens? Are those priced separately?
Yes. Anthropic prompt-cache reads and OpenAI cached_tokens are priced from a separate row on the rate card. Cache creation is billed as regular input. The SDK never sends a cost number; pricing happens server-side from the rate card so you can't be wrong about your own bill.
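A rate-card lookup along those lines might look like the sketch below. The model name, the prices, and the function are placeholders rather than real provider pricing, and the sketch assumes the token counts have already been normalized so that uncached input, cache reads, and cache creation do not overlap. Billing cache creation at the regular input rate follows the answer above.

```python
from decimal import Decimal

# Hypothetical rate card: price per 1M tokens, in whatever currency you bill in.
RATE_CARD = {
    "example-model": {
        "input": Decimal("3.00"),
        "cached_input": Decimal("0.30"),  # separate row for cache reads
        "output": Decimal("15.00"),
    },
}

PER_MILLION = Decimal(1_000_000)


def price_run(model: str, input_tokens: int, cached_read_tokens: int,
              cache_creation_tokens: int, output_tokens: int) -> Decimal:
    """Resolve a run's cost from token counts; no cost number comes from the client."""
    rates = RATE_CARD[model]
    # Cache creation is charged at the regular input rate, as described above.
    regular_input = input_tokens + cache_creation_tokens
    total = (
        regular_input * rates["input"]
        + cached_read_tokens * rates["cached_input"]
        + output_tokens * rates["output"]
    )
    return total / PER_MILLION


cost = price_run("example-model", input_tokens=10_000, cached_read_tokens=40_000,
                 cache_creation_tokens=0, output_tokens=2_000)
print(cost)  # 0.072
```

Because the client only reports counts, a price change means editing one row on the rate card rather than redeploying every agent.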
004 read next

How AgentPing implements cost attribution.

Spend is the cost-attribution side of AgentPing. The features page walks through the rate card model, anomaly detection, and per-customer cost rollups. The docs go one level deeper.

  • Spend features
  • Spend docs
  • What is AI agent observability?