Checks
A check is a deterministic, server-side assertion against a finished run's output. It returns pass or fail with metadata. Checks are configured per agent in the dashboard and incur no LLM cost.
Use checks for things expressible as a rule. Save LLM-as-judge for things that require taste.
Check types
| Type | What it asserts |
|---|---|
| JSON schema | Output conforms to a schema |
| Regex match | Output matches (or doesn't match) a pattern |
| Length bounds | Output length within N to M characters or tokens |
| Required fields | Output object contains specific keys |
| Tool-call assertion | Agent did (or didn't) call tool X |
| Numeric range | Numeric output falls within bounds |
Each check returns pass or fail with metadata like {"expected": "100-500", "got": 47}. Multiple checks evaluate independently; one failure doesn't short-circuit the rest.
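As a rough illustration of the behavior described above, a few of the check types can be sketched as plain functions that each return a pass/fail result with metadata. This is not the product's implementation: `CheckResult`, `length_bounds`, `regex_match`, `required_fields`, and `run_checks` are hypothetical names, and the metadata keys simply mirror the `expected`/`got` example.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class CheckResult:
    """Hypothetical result shape: pass/fail plus free-form metadata."""
    name: str
    passed: bool
    metadata: dict[str, Any] = field(default_factory=dict)


def length_bounds(output: str, low: int, high: int) -> CheckResult:
    # Character-based length; a token-based variant would need a tokenizer.
    got = len(output)
    return CheckResult("length_bounds", low <= got <= high,
                       {"expected": f"{low}-{high}", "got": got})


def regex_match(output: str, pattern: str, should_match: bool = True) -> CheckResult:
    # should_match=False covers the "doesn't match" form of the check.
    found = re.search(pattern, output) is not None
    return CheckResult("regex_match", found == should_match,
                       {"pattern": pattern, "should_match": should_match, "got": found})


def required_fields(output: dict[str, Any], keys: list[str]) -> CheckResult:
    missing = [k for k in keys if k not in output]
    return CheckResult("required_fields", not missing,
                       {"expected": keys, "missing": missing})


def run_checks(checks: list[Callable[[], CheckResult]]) -> list[CheckResult]:
    # Every check is evaluated; a failure never short-circuits the rest.
    return [check() for check in checks]


text = "x" * 47
results = run_checks([
    lambda: length_bounds(text, 100, 500),
    lambda: regex_match(text, r"^x+$"),
    lambda: required_fields({"department": "billing"}, ["department", "severity"]),
])
for r in results:
    print(r.name, "pass" if r.passed else "fail", r.metadata)
```

Here the length check fails with the same metadata as the example, yet the regex and required-fields checks still run and report their own results.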
A failing check can mark the run with a badge, feed into the score distribution, and trigger an alert on the same routes Pulse uses. For cost-per-successful-run, a run counts as successful only when status='success' AND all checks passed.
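The success criterion can be made concrete with a small sketch. The `Run` record and its field names are hypothetical, and the formula (total spend across all runs divided by the number of successful runs) is an assumption about how cost-per-successful-run is computed, not a documented definition.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class Run:
    """Hypothetical run record; field names are assumptions for illustration."""
    status: str
    cost_usd: float
    check_results: list[bool]  # one entry per configured check


def is_successful(run: Run) -> bool:
    # Success requires status == "success" AND every check passing.
    return run.status == "success" and all(run.check_results)


def cost_per_successful_run(runs: list[Run]) -> Optional[float]:
    # Assumed formula: total spend divided by the count of successful runs.
    successes = sum(1 for r in runs if is_successful(r))
    if successes == 0:
        return None  # undefined when nothing succeeded
    return sum(r.cost_usd for r in runs) / successes


runs = [
    Run("success", 0.25, [True, True]),   # counts as successful
    Run("success", 0.50, [True, False]),  # completed, but one check failed
    Run("error", 0.25, []),               # never produced a checkable output
]
print(cost_per_successful_run(runs))
```

Under this assumption, a run that finishes with status='success' but fails a check still adds to the cost while not counting toward the denominator.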
Tier availability
| Tier | Checks |
|---|---|
| Starter | Basic (length, required fields, simple regex) |
| Team | Full (schema, tool-call, complex regex, numeric) |
| Business | Full |
| Enterprise | Full |
Example: schema check
A support-triage agent returns:
{
  "department": "billing",
  "severity": 2,
  "reasoning": "Customer asked about an invoice"
}
The schema pins the contract:
{
  "type": "object",
  "required": ["department", "severity"],
  "properties": {
    "department": {"enum": ["billing", "support", "engineering"]},
    "severity": {"type": "integer", "minimum": 1, "maximum": 5}
  }
}
Runs that come back with a different shape, an unknown department, or a severity outside 1 to 5 fail the check immediately. The dashboard surfaces the diff, and failures cluster so that a prompt regression appears as one incident.
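To try the contract locally, the schema can be exercised with a minimal hand-rolled validator. This is a sketch under stated assumptions: `validate` is a hypothetical helper that handles only the keywords used in this example (object type, `required`, `enum`, integer `type`, `minimum`, `maximum`). It is not a full JSON Schema implementation and says nothing about how the server evaluates schema checks.

```python
from __future__ import annotations

from typing import Any

SCHEMA = {
    "type": "object",
    "required": ["department", "severity"],
    "properties": {
        "department": {"enum": ["billing", "support", "engineering"]},
        "severity": {"type": "integer", "minimum": 1, "maximum": 5},
    },
}


def validate(output: Any, schema: dict[str, Any]) -> list[str]:
    """Return a list of violations; an empty list means the check passes."""
    # This sketch only supports object schemas.
    if not isinstance(output, dict):
        return [f"expected object, got {type(output).__name__}"]
    errors = [f"missing required field: {key}"
              for key in schema.get("required", []) if key not in output]
    for key, rule in schema.get("properties", {}).items():
        if key not in output:
            continue
        value = output[key]
        if "enum" in rule and value not in rule["enum"]:
            errors.append(f"{key}: {value!r} not in {rule['enum']}")
        if rule.get("type") == "integer":
            # bool is a subclass of int in Python, so exclude it explicitly.
            if not isinstance(value, int) or isinstance(value, bool):
                errors.append(f"{key}: expected integer, got {type(value).__name__}")
                continue
            if "minimum" in rule and value < rule["minimum"]:
                errors.append(f"{key}: {value} is below minimum {rule['minimum']}")
            if "maximum" in rule and value > rule["maximum"]:
                errors.append(f"{key}: {value} is above maximum {rule['maximum']}")
    return errors


good = {"department": "billing", "severity": 2,
        "reasoning": "Customer asked about an invoice"}
bad = {"department": "sales", "severity": 9}
print(validate(good, SCHEMA))
print(validate(bad, SCHEMA))
```

Because the schema does not restrict additional properties, the extra `reasoning` field is accepted, while the unknown department and out-of-range severity each produce a violation.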