A web service crashes loud: 500s, paging, customer screams. An AI agent rarely does. It keeps returning something. Usually plausible. Sometimes wrong. Always billable.
One retry loop. One agent that suddenly starts pulling the full transcript instead of a summary. You only see it on the 1st of the month, when the invoice lands. By then it's $8,400 on the wrong cost centre.
Your nightly summarizer hasn't run since the deploy on the 14th. Nothing crashed. Nothing paged. The job just stopped firing, and no one looked at the empty table until a customer asked.
A prompt change, a model upgrade, a temperature tweak. The agent still returns. The answers are subtly worse. Two weeks later, support tickets tick up. Someone has to bisect two weeks of changes to find the cause.
Your existing tools watch the infrastructure. None of them watch the agent itself.