002 the hidden cost

Your agents misbehave quietly.

A web service crashes loudly: 500s, pages, customers screaming. An AI agent rarely does. It keeps returning something. Usually plausible. Sometimes wrong. Always billable.

the silent leak

Token spend, 4x in a week.

One retry loop. One agent that suddenly starts pulling the full transcript instead of a summary. You see it on the 1st of the month, when the invoice lands. By then it's $8,400 on the wrong cost centre.

the missed cron

Three weeks of dead silence.

Your nightly summarizer hasn't run since the deploy on the 14th. Nothing crashed. Nothing paged. The job just stopped firing, and no one looked at the empty table until a customer asked.

the drift

Quality slid 14%. No alert fired.

A prompt change, a model upgrade, a temperature tweak. The agent still returns an answer. The answers are subtly worse. Two weeks later, support tickets tick up. Someone has to bisect.

already have observability? here's the gap.

Your existing tools watch the infrastructure. None of them watch the agent itself.

SENTRY
Watches for exceptions. Your agent didn't throw; it confidently returned the wrong answer.
DATADOG
Watches latency, CPU, request count. None of that tells you the answer changed.
OPENAI USAGE
Tokens by API key. You can't trace cost back to a specific agent, run, customer, or feature.
YOUR LOGS
A wall of text. Searchable at 3am. Not browsable at planning time.