Why your OpenAI or Anthropic bill keeps growing.

If multiple agents share one or two API keys, you have no way to know which agent is responsible when the bill jumps. The provider dashboard shows tokens by key, not by agent, customer, or feature. By the time the invoice arrives, the spike is three weeks old and nobody on the team remembers what shipped.

The bill rarely jumps for one big reason. It creeps for three small ones.

Across the teams we've talked to, the same three patterns account for most runaway-spend incidents. None of them produce an exception, none of them fail an existing alert, and none of them are visible in the provider's usage page.

The three usual suspects

Retry loops that carry full context. A new retry policy ships, a soft failure starts to recur, and each retry re-sends the full conversation history. One bad agent runs up two-thirds of the month's bill on its own.
Prompt creep. A handful of small additions to the system prompt over six weeks doubles the average input token count. No single change looked suspicious in code review; the cumulative effect is invisible until the invoice lands.
Ungated model swaps. An engineer changes claude-haiku to claude-opus for one agent because they were testing quality. The change ships. The cost-per-run for that agent jumps 12x and nobody notices for a week.

Cost-per-run, per agent, on a baseline that pages.

The smallest instrumentation that catches all three patterns is the same: a cost number per run, attributed to the agent that produced the run, with a rolling baseline that knows what normal looks like for each agent.

A real breakdown

          spend · march 2026 · top customers by agent
          team rollup
        

          
              Agent
              Customer
              Runs
              Cost
              Cost / run
            

          
              support-triage
              acme-corp
              12,408
              $1,460
              $0.118
            

              research-agent
              (unattributed)
              214
              $409
              $1.910
            

              content-writer
              acme-corp
              3,114
              $967
              $0.310
            

              email-classifier
              acme-corp
              18,672
              $163
              $0.009
            

        

Agent	Customer	Runs	Cost	Cost / run
support-triage	acme-corp	12,408	$1,460	$0.118
research-agent	(unattributed)	214	$409	$1.910
content-writer	acme-corp	3,114	$967	$0.310
email-classifier	acme-corp	18,672	$163	$0.009

The flagged row is a research-agent invocation running at 16x the cost-per-run of the other agents on the team, and missing its customer_id tag. That's a backfill job that started looping; the daily anomaly check finds it the morning it starts, not the first of next month.

The four things to instrument

Per-agent cost attribution. Every run records the agent, optional customer and feature tags, and provider-side token counts. Pricing is resolved server-side from a rate card so the SDK can't be wrong about your bill.
Cost-per-successful-run as the headline. Total cost over runs that finished cleanly. Loops and timeouts are excluded so the metric matches reality.
Anomaly detection on a 14-day baseline. Today's spend more than 2.5σ above the trailing mean fires the agent's alert route. Default threshold; configurable per team.
A spend guard on scheduled scripts. An opt-in check at the top of the script that refuses to start the next run once a spend rule trips. The alert tells you about last night's runaway; the guard stops tonight's.

My OpenAI dashboard shows tokens by API key. Why isn't that enough?

One API key usually serves multiple agents in the same environment. When the bill spikes, the dashboard can tell you "key A went up", but not which of the seven agents using key A caused it. Per-agent attribution requires tagging each run with the agent that produced it at the time the run starts.

What's the difference between cost and cost-per-successful-run?

Total cost over total runs is the wrong denominator. A loop that retries 50 times and fails inflates the run count and depresses the average, hiding the per-attempt cost. Cost over successful runs (status=success and quality threshold met) is the number finance can actually budget against.

How do I know which feature or customer is driving the spend?

Tag each run with a customer_id and a feature at run start. Aggregation is retroactive once tagged, so adding the tag today lets you slice every run going forward, and unattributed rows surface in the dashboard so you can find the cron or backfill job that forgot to tag.

Can I get alerted before the invoice arrives?

Yes. Anomaly detection runs a z-score on each agent against its trailing 14-day baseline. When today's spend exceeds 2.5 standard deviations above the trailing mean, the agent's alert route fires (Slack, PagerDuty, Microsoft Teams, email, or generic webhook). Most teams catch the spike on the same day it starts.

What about cached tokens? Are those priced separately?

Yes. Anthropic prompt-cache reads and OpenAI cached_tokens are priced from a separate row on the rate card. Cache creation is billed as regular input. The SDK never sends a cost number; pricing happens server-side from the rate card so you can't be wrong about your own bill.

Can I cap spend, not just get alerted about it?

Yes, with the opt-in spend guard. One call at the top of a scheduled script checks your rules (total spend, spend in a window, or run count, keyed by customer, agent, and function) against real cumulative spend before the run starts. Once a rule trips, or you hit pause in the dashboard, the next run refuses to start. Available on Team and Business plans.

How AgentPing implements cost attribution.

Spend is the cost-attribution side of AgentPing. The features page walks through the rate card model, anomaly detection, and per-customer cost rollups. The docs go one level deeper.

Spend features → Spend docs → Spend guard → What is AI agent observability? →