Checks

A deterministic, server-side assertion against a finished run's output. Pass or fail with metadata. Configured per agent in the dashboard. No LLM cost.

Use checks for things expressible as a rule. Save LLM-as-judge for things that require taste.

Check types

Type What it asserts
JSON schema Output conforms to a schema
Regex match Output matches (or doesn't match) a pattern
Length bounds Output length within N to M characters or tokens
Required fields Output object contains specific keys
Tool-call assertion Agent did (or didn't) call tool X
Numeric range Numeric output falls within bounds

Each check returns pass or fail with metadata like {"expected": "100-500", "got": 47}. Multiple checks evaluate independently; one failure doesn't short-circuit the rest.

A failing check can mark the run with a badge, feed into the score distribution (status='success' AND all checks passed is the success criterion for cost-per-successful-run), and trigger an alert on the same routes Pulse uses.

Tier availability

Tier Checks
Starter Basic (length, required fields, simple regex)
Team Full (schema, tool-call, complex regex, numeric)
Business Full
Enterprise Full

Example: schema check

A support-triage agent returns:

{
  "department": "billing",
  "severity": 2,
  "reasoning": "Customer asked about an invoice"
}

The schema pins the contract:

{
  "type": "object",
  "required": ["department", "severity"],
  "properties": {
    "department": {"enum": ["billing", "support", "engineering"]},
    "severity": {"type": "integer", "minimum": 1, "maximum": 5}
  }
}

Runs that come back with a different shape or unknown department fail the check immediately. The dashboard surfaces the diff; failures cluster so a prompt regression appears as one incident.