000 glossary

The terms that keep coming up.

Plain-English definitions of the words AgentPing uses across the dashboard, docs, and SDK. Each term has a stable anchor so you can link to a definition directly. If something here is missing, tell us and we'll add it.

Agent run

A single invocation of an AI agent, captured as one telemetry record. The unit of analysis for cost attribution, monitoring, and quality scoring.

A run may make zero or many LLM calls; it is still one record.

Run ID

A client-generated UUIDv7 identifier for an agent run. Constructed locally before any network call so the agent has an ID immediately and idempotent retries work.

Format: run_ prefix plus 32 hex characters.

Rate card

A server-side mapping from (provider, model) to per-token prices. AgentPing prices every run from the rate card so the SDK never sends a cost number.

Defaults ship for every Anthropic and OpenAI model; override per team.

Cached tokens

Tokens served from a provider-side prompt cache. Anthropic prompt-cache reads and OpenAI cached_tokens are priced separately from fresh input.

Cache creation is billed as regular input; cache hits are usually 90% cheaper.

Cost-per-successful-run

Total cost divided by the number of runs that finished cleanly (status=success and quality threshold met). The number finance can build budgets around.

Excluding failed and timed-out runs avoids hiding loops in the average.

Anomaly detection

Per-agent statistical check on today's spend against a rolling 14-day baseline. Today's spend more than 2.5σ above the trailing mean fires the agent's alert route.

Threshold configurable per team; default catches most retry loops on day one.

Heartbeat

A one-call signal sent at the end of a scheduled job, indicating "I ran". The cheapest form of monitoring for systems that cannot host a full SDK.

AgentPing accepts heartbeats at GET /v1/ping and POST /v1/heartbeats.

Schedule freshness

A monitoring pattern that pages when a scheduled agent does not run inside its expected window. Set a cron expression plus a tolerance window per agent.

Distinct from APM exceptions: the signal is the absence of a run, not an error.

Tolerance window

The grace period after a scheduled run's expected time before the alert fires. Prevents flapping on normal queue jitter.

Default 5 minutes; configurable per agent.

Cron expression

Five-field schedule string (minute, hour, day of month, month, day of week) telling AgentPing when a scheduled agent should run.

AgentPing also accepts plain-English shorthands like "@daily" or "every 5 minutes".

p95 latency

The 95th-percentile run duration over a rolling window (24h, 7d, 30d). Most runs land below it; the upper tail is where regressions hide.

A sudden p95 move usually means a model swap or a longer prompt.

Rubric

A plain-English scoring guide for an agent's output. The judge model reads the rubric and produces a numeric score with reasoning.

Stored as YAML with a model name, sample rate, pass threshold, and calibration anchors.

Judge model

The LLM that scores production runs against a rubric. AgentPing defaults to Claude Haiku 4.5; configurable per rubric.

Cost is bounded by a hard per-team monthly cap (default £50/month).

LLM-as-judge

The technique of scoring one LLM's output by asking another LLM to evaluate it against a rubric. Sample-based, with calibration anchors to keep scores stable.

Catches qualitative drift that deterministic checks cannot see.

Deterministic check

A non-LLM validator that runs on every output at zero LLM cost: JSON schema, regex, length bounds, required fields, tool-call assertions, numeric range.

Answers pass/fail questions about shape, not quality.

Calibration anchors

A short list of good and bad runs tagged by ID. The judge prompt includes them in-context for every call, so scores stay comparable across rubric versions.

10-20 anchors typically; 5 good and 5 bad is a fine starting point.

Sample rate

The fraction of normal runs scored by LLM-as-judge. Failed runs are always sampled regardless.

Default 0.10 (10%); configurable 1-100%. Stratified to bias toward failed and high-cost runs.

Pass threshold

The minimum score (typically on a 1-5 scale) that an output must reach to count as successful.

Used by cost-per-successful-run and by alert rules.

Rubric versioning

Every rubric edit creates a new version. Scores are tagged with the version that generated them, so changes do not retroactively re-score history.

The dashboard shows a visible break when a version changes.

Drift detection

A statistical change-detection signal on the score distribution. A single 2/5 is noise; the mean dropping from 4.2 to 3.8 over a week is drift.

z-score against the trailing 14-day baseline.

Parent run

When agent A spawns agent B, A is the parent and B is the child. AgentPing surfaces the full tree so a downstream failure can be traced to its root run.

Propagated via env var AGENTPING_PARENT_RUN or explicit parameter.

API key

A full-access team credential, prefixed apk_. Used by the SDK in header authentication.

A leaked API key compromises the whole team; rotate immediately.

Ping token

A per-agent scoped credential, prefixed ping_. Safe to embed in URLs for cron heartbeats and shell scripts.

A leaked ping token only compromises that one agent's heartbeat.

001 read next

The longer reads that put these terms in context.

What is AI agent observability? Features Docs