Verify catches quality drift before your users do.
A run can finish without error and still be worse than yesterday's. Verify grades every run against standards you write in plain English. It catches the slow decline after a prompt or model change while it is happening, not when the tickets arrive.
checks
A JSON schema and a handful of rules. No model is involved, so these checks run on every single run at zero marginal cost, and the obvious failures never reach a customer.
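To show what a deterministic check amounts to, here is a minimal sketch in plain Python. It is not Verify's SDK or check format. `run_checks`, `REQUIRED_FIELDS` and the two rules are hypothetical stand-ins for a schema and a couple of rules you might write. Each run's output is parsed, compared with the expected fields, and tested against the rules, with no model call anywhere.

```python
import json

# Hypothetical schema for illustration: required field -> expected Python type.
REQUIRED_FIELDS = {"answer": str, "confidence": float}


def run_checks(output: str, max_chars: int = 500) -> list[str]:
    """Return a list of failure messages; an empty list means the run passed."""
    try:
        data = json.loads(output)
    except json.JSONDecodeError:
        return ["output is not valid JSON"]
    if not isinstance(data, dict):
        return ["output is not a JSON object"]

    failures = []
    # Schema check: every required field is present with the right type.
    for field, expected in REQUIRED_FIELDS.items():
        if field not in data:
            failures.append(f"missing field: {field}")
        elif not isinstance(data[field], expected):
            failures.append(f"wrong type for {field}")

    # Hypothetical rule 1: keep answers short.
    answer = data.get("answer")
    if isinstance(answer, str) and len(answer) > max_chars:
        failures.append("answer too long")

    # Hypothetical rule 2: confidence must be a probability.
    confidence = data.get("confidence")
    if isinstance(confidence, float) and not 0.0 <= confidence <= 1.0:
        failures.append("confidence out of range")

    return failures


print(run_checks('{"answer": "Refund issued.", "confidence": 0.9}'))  # []
print(run_checks('{"answer": "Refund issued."}'))  # ['missing field: confidence']
```

The checks are pure functions of the output text, so they cost almost nothing to run and always return the same verdict for the same input. That is why they can cover every run.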
llm-as-judge
For the judgement calls, a separate model scores each run against your rubric. It grades against the standard you wrote, not its own opinion. You set the sample rate (every run, or one in fifty), so you decide exactly how much quality assurance costs.
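A sample rate like "one in fifty" can be applied as follows. This is a minimal sketch that assumes each run carries a unique ID string. It does not describe how Verify samples. `should_judge` is a hypothetical helper that hashes the run ID, which makes the decision deterministic: the same run always lands on the same side of the line, and the share of runs sent to the judge converges on the rate you set.

```python
import hashlib


def should_judge(run_id: str, sample_rate: float) -> bool:
    """Deterministically decide whether a run is sent to the judge model."""
    if sample_rate >= 1.0:
        return True  # score every run
    if sample_rate <= 0.0:
        return False  # judge switched off
    digest = hashlib.sha256(run_id.encode("utf-8")).digest()
    # Map the first 8 bytes of the hash to a number in [0, 1).
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return bucket < sample_rate


# One in fifty: out of 10,000 runs, roughly 200 go to the judge.
judged = sum(should_judge(f"run-{i}", 1 / 50) for i in range(10_000))
print(judged)
```

Judge spend then scales as runs × sample rate × cost per judge call. Moving the rate from 1.0 to 0.02 cuts that line item by a factor of fifty.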
drift detection
A prompt change on Monday quietly drops your average score. Support tickets arrive Wednesday. Verify watches the live score distribution and flags the slide on day one, not after the churn.
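One simple way to flag a slide in judge scores is to compare a recent window against a baseline window. The sketch below shows that approach. It is not Verify's detection method. `score_dropped`, the windows, and the threshold of 3 standard errors are all assumptions. The test is one-sided: it fires only when the recent mean falls well below the baseline mean, measured against a Welch-style standard error, so the two windows may differ in size and spread.

```python
from math import sqrt
from statistics import mean, stdev


def score_dropped(baseline: list[float], recent: list[float],
                  z_threshold: float = 3.0) -> bool:
    """Return True when recent judge scores sit significantly below the baseline."""
    if len(baseline) < 2 or len(recent) < 2:
        return False  # not enough data to say anything
    # Standard error of the difference in means (Welch-style).
    se = sqrt(stdev(baseline) ** 2 / len(baseline)
              + stdev(recent) ** 2 / len(recent))
    if se == 0:
        return mean(recent) < mean(baseline)
    z = (mean(baseline) - mean(recent)) / se
    return z > z_threshold


last_week = [0.80, 0.82, 0.78, 0.81, 0.79] * 20   # hypothetical baseline scores
after_change = [0.70, 0.72, 0.68, 0.71, 0.69] * 20  # hypothetical Monday scores
print(score_dropped(last_week, after_change))  # True
```

A mean-shift test like this catches a broad, uniform drop. A regression that affects only a slice of traffic would need a per-segment comparison or a test that looks at the whole distribution.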
The difference between finding out from your dashboard and finding out from an angry customer.
rubrics
Describe what good looks like in a sentence; no eval framework, no labelled dataset.
checks
Deterministic checks cover every run at zero marginal cost, catching the obvious breaks.
control
Score every run or one in fifty. Quality assurance costs exactly what you choose.
drift
A downward shift in the live judge-score distribution surfaces on your dashboard before the tickets do.
trends
Average score and pass rate tracked over time, so a slow regression is visible, not a surprise.
coverage
Quality measured on live production traffic, not a stale offline batch from last sprint.
A rubric you write in plain English, or a schema, becomes the bar every run is held to.
Checks on every run, the judge on a sample you control, all on the live stream.
The distribution moves the day a prompt regresses, and you hear about it then.
Verify catches quality drift before your users do. Two lines of code, or one curl. Live in minutes, free while we are in private beta.
Free to start. No card. The SDK never blocks your agents.