Scheduled agents fail differently. They go quiet. The cron was removed during a refactor, the upstream API changed shape, the prompt slot started returning empty strings. No exception, no 500, no paging signal. The team usually finds out from a customer asking why their dashboard is stale, eleven days in.
Sentry needs an exception. Datadog needs a latency spike. Both watch the system while it's running and report when something goes wrong. Neither watches the gap when a scheduled run simply doesn't happen. There's no exception to catch, no request to time out, and no log line to alert on.
The minimal instrumentation is a cron expression on each scheduled agent, plus a tolerance window. A scheduler worker checks every minute; when an expected run is past the window, the agent's alert route fires with the last-successful-run context so the on-call engineer can act without opening a browser.
{
"alert": "schedule_missed",
"agent": "daily-summary",
"team": "production",
"expected_at": "2026-05-16T09:00:00Z",
"tolerance_minutes": 5,
"now": "2026-05-16T09:06:14Z",
"last_successful_run": {
"id": "run_018f3a2b9c1d7e8fa4b9c2d7e8f1a3b6",
"finished_at": "2026-05-15T09:00:42Z",
"duration_ms": 41892,
"cost_gbp": 0.149
},
"dashboard_url": "https://app.agentping.io/agents/daily-summary"
}
Same payload across the generic webhook route. PagerDuty gets a protocol-shaped variant; Slack and Teams get formatted message blocks; email gets a digest. The on-call engineer sees the missed run plus the last good run's context in one notification.
Pulse is the live-monitoring half of AgentPing. The features page walks through the live activity feed, schedule freshness, run traces, and the alert routes. The docs go one level deeper.
Pulse features → Pulse docs → What is AI agent observability? →