ArticlesMethod

Write acceptance criteria your agent can check itself

The difference between an agent that drifts and one that self-verifies is criteria you can run.

Stuart LeoJune 8, 20265 min read

There's a clean line between an AI agent that builds what you meant and one that drifts off into plausible nonsense. It isn't the model. It's whether you told the agent how to know it was done — in a way it can actually check.

That's what acceptance criteria are: the definition of done, written so the agent can verify it itself. Get them right and the agent stops handing you confident, wrong work. Here's how.

Why agents drift from intent

Give an agent a vague goal — "improve the checkout" — and it will do something. It has no way to know what "improved" means to you, so it optimises for something plausible. Each step takes it a little further from what you actually had in mind, and because every step looks reasonable in isolation, you don't notice until you're well off course.

This is the drift that makes people distrust agents. It isn't the agent being careless. It's the agent flying without a target. The spec-driven development community has a sharp diagnosis for it: the failure mode of agents that produce plausible code that drifts from intent is the absence of a checkable spec, not a weak model.

Criteria versus opinions

Here's the test that fixes most of it. For everything you'd call a requirement, ask: can I run this?

  • "The form rejects an empty email." → You can test it. Criterion.
  • "The form should be robust." → Nobody can check that objectively. Opinion.
  • "Bookings past capacity are refused." → Testable. Criterion.
  • "Make the checkout feel premium." → Unrunnable as written. Opinion.

Stuart Leo

If you can't run the criterion, it's an opinion — and the agent will quietly ignore it.

Opinions aren't useless — they belong in the design conversation. But they can't be acceptance criteria, because the agent can't verify them, which means it won't.

Writing runnable acceptance criteria

Turn each requirement into something with an observable, checkable outcome:

Acceptance criteria — booking form
- Submitting with no email is rejected with an error.
- Booking a class at capacity (10/10) is refused.
- Booking the 10th seat of a 10-seat class succeeds.
- A successful booking sends one confirmation email.
- The full test suite passes.

Each line is a thing you — or the agent — can actually do and observe. Where you can, make them literal commands or tests. "The test suite passes" is the strongest criterion of all, because it's unambiguous and the agent can run it without you.

Letting the agent self-verify

Here's the payoff. Once the criteria are runnable, the agent can check its own work before it hands it back. It builds, runs the criteria, sees the 11th-booking case still succeeds when it shouldn't, fixes it, and re-runs — all without you in the loop.

That's the difference between an agent that needs babysitting and one that can run a task to a real finish. Criteria you can run turn "I think this is done" into "this meets the bar, here's the proof." For genuinely autonomous work — the agent building while you're away — this isn't optional. Unrunnable criteria mean unattended drift.

Criteria in the brief cascade

Acceptance criteria aren't a separate document. They belong in the brief — the short spec of what to build. A C² Prompt Brief carries them as a standard section: what to build, what not to build, what to read first, and how you'll know it worked. Put them there and every task starts with its finish line drawn.

When the work is done, the criteria also tell your session brief what to record as verified — and what isn't. The honesty about what's actually been checked carries forward to the next session.

If you cannot run the criterion, it is an opinion — and the agent will quietly ignore it. Make the finish line something the agent can see.

Start here: see C² vs spec-driven development, how to write a session brief, or read the method.

FAQ

What are acceptance criteria for an AI agent?
Acceptance criteria are the conditions a piece of work must meet to be considered done — written so the agent can check them itself. The strongest ones are runnable: a command, a test, a concrete observable outcome. If the agent can run the check, it can verify its own work before handing it back.
Why do AI agents drift from what I asked?
Because 'done' was never defined in a way they could check. Given a vague goal, an agent optimises for something plausible and drifts further from your intent with each step. Runnable acceptance criteria give it a target it can test against, which stops the drift.
What makes a good acceptance criterion?
You can run it. 'The form rejects an empty email' is a criterion — you can test it. 'The form should be robust' is an opinion — nobody can check it objectively. The rule: if you can't run it, it's not a criterion, and the agent will quietly ignore it.