What is an agent harness? Agent = model + harness

The model supplies intelligence. The harness supplies hands, eyes, memory and the loop — and your contextbase lives inside it.

Stuart Leo · June 8, 2026 · 5 min read

There's a phrase worth learning if you're building with AI agents, because it explains most of why two people using the same model get wildly different results: Agent = Model + Harness.

The model is the intelligence — Claude, GPT, Gemini, whichever. The harness is everything else that turns that intelligence into something that can actually build software. And once you can see the harness, you can see where your own results are won or lost — because it's almost never the model.

The equation: agent = model + harness

On its own, a model is a brain in a jar. It can reason brilliantly about code it's shown, but it can't open a file, run a test, read your repo, or remember what it did an hour ago. It just produces text.

The harness is what wraps the model to make it a working agent. It gives the model:

  • Hands — the ability to edit files, run commands, call tools, commit to git.
  • Eyes — the ability to read your codebase, inspect output, see an error.
  • Memory — some way to carry what it's learned across steps and sessions.
  • A loop — the cycle of gather information, act, check the result, retry on failure.
  • Boundaries — the limits on what it's allowed to touch.

Same model, weak harness: a clever assistant that forgets everything and guesses at your code. Same model, strong harness: a partner that reads the right context, makes the change, runs the tests, and fixes its own mistakes. The shorthand has caught on because it's true — Anthropic's engineers frame the whole challenge of capable agents as building effective harnesses for long-running work, not just picking a smarter model.
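To make the split concrete, here is a minimal sketch of the "hands, eyes, memory, boundaries" part of a harness. It leaves the loop out. Every name in it (Harness, read_file, delete_file, the in-memory files dict) is a hypothetical stand-in for illustration, not the API of any real agent framework.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set

# All names below are hypothetical; this is an illustration, not a real framework.


@dataclass
class Harness:
    tools: Dict[str, Callable[[str], str]]            # hands and eyes
    allowed: Set[str]                                  # boundaries
    memory: List[str] = field(default_factory=list)   # in-run memory

    def call_tool(self, name: str, arg: str) -> str:
        """Run a tool the model asked for, if the boundaries permit it."""
        if name not in self.allowed or name not in self.tools:
            result = f"denied: {name} is not an allowed tool"
        else:
            result = self.tools[name](arg)
        # Record every call so later steps in the same run can see it.
        self.memory.append(f"{name}({arg}) -> {result}")
        return result


# A toy "repo" held in memory so the sketch runs anywhere.
files = {"app.py": "print('hi')"}


def read_file(path: str) -> str:
    return files.get(path, "not found")


def delete_file(path: str) -> str:
    files.pop(path, None)
    return "deleted"


harness = Harness(
    tools={"read_file": read_file, "delete_file": delete_file},
    allowed={"read_file"},  # the agent may look, but not delete
)
seen = harness.call_tool("read_file", "app.py")
blocked = harness.call_tool("delete_file", "app.py")
```

The model never appears in this sketch, and that is the point: everything above is harness. The model would only ever produce the text that names a tool and an argument.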

What the harness provides

The most visible part of the harness is the loop. Terminal-first coding agents show it to you in real time: the agent reads a file, plans, calls a tool, edits code, runs a command, sees the output, reacts to an error, tries again. That gather-act-verify cycle, running until the work is done or a step or budget limit is hit, is the harness doing its job.
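The cycle described above can be sketched in a few lines. This is a toy under stated assumptions: gather, act and verify are hypothetical callables standing in for "read the repo", "let the model propose a change" and "run the tests", and max_steps is an assumed name for the limit.

```python
from typing import Callable, List, Tuple


def run_loop(
    gather: Callable[[List[str]], str],
    act: Callable[[str], str],
    verify: Callable[[str], Tuple[bool, str]],
    max_steps: int = 5,
) -> Tuple[bool, List[str]]:
    """Gather, act, verify; repeat until success or the step limit."""
    history: List[str] = []              # in-run memory of earlier attempts
    for step in range(1, max_steps + 1):
        context = gather(history)        # gather: what do we know so far?
        result = act(context)            # act: make a change
        ok, message = verify(result)     # verify: check the result
        history.append(f"step {step}: {message}")
        if ok:
            return True, history         # the work is done
    return False, history                # the limit was hit first


# Toy stand-ins: the third attempt is the one that passes.
def gather(history: List[str]) -> str:
    return f"{len(history)} failed attempts so far"


def act(context: str) -> str:
    return str(int(context.split()[0]) + 1)   # the attempt number


def verify(result: str) -> Tuple[bool, str]:
    if int(result) == 3:
        return True, "tests pass"
    return False, f"attempt {result} failed"


done, history = run_loop(gather, act, verify)
gave_up, short_history = run_loop(gather, act, verify, max_steps=2)
```

With the default limit the toy run succeeds on its third step. With max_steps=2 it stops early and reports failure, which is the "limit is hit" branch.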

You feel a good harness most when something goes wrong. A weak one fails on the first error and hands you a broken result. A strong one surfaces the failing test, feeds the message back to the model, lets it form a hypothesis, and tries a fix, the way a careful developer would. That recovery behaviour isn't just the model being smart. It depends on the harness being well built.

The three nested layers

Here's the cleanest way to place the harness against the terms you've probably already met — prompt engineering and context engineering. They're not rivals. They're three nested layers, each containing the one before:

  • Prompt engineering — the message. One composed input.
  • Context engineering — the memory. What a curator keeps or drops in a finite window.
  • Harness engineering — the machine. The gather-act-verify loop, retrying on failure.

Prompt engineering is the message. Context engineering is the memory. Harness engineering is the machine that runs them both in a loop.

This is why "isn't this just prompting?" misses it. The harness contains the prompt and the context. Martin Fowler's team has started calling the discipline harness engineering for exactly this reason: the leverage for a serious builder is in the machine, not in any single message.
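One way to see the nesting is to write each layer as a function and notice which one calls the others. The sketch below is illustrative only: the function names, the character budget, and the "keep the most recent notes" policy are all assumptions chosen to keep the example small, not a description of how any particular tool curates context.

```python
from typing import Callable, List, Optional


def build_prompt(task: str) -> str:
    """Prompt layer: one composed message."""
    return f"Task: {task}"


def curate_context(notes: List[str], budget_chars: int) -> List[str]:
    """Context layer: keep the newest notes that fit a finite window."""
    kept: List[str] = []
    used = 0
    for note in reversed(notes):          # newest first
        if used + len(note) > budget_chars:
            break
        kept.append(note)
        used += len(note)
    return list(reversed(kept))           # restore chronological order


def run_harness(
    task: str,
    notes: List[str],
    model: Callable[[str], str],
    check: Callable[[str], bool],
    max_steps: int = 3,
    budget_chars: int = 200,
) -> Optional[str]:
    """Harness layer: runs the other two inside a retrying loop."""
    for _ in range(max_steps):
        window = curate_context(notes, budget_chars)
        message = "\n".join(window + [build_prompt(task)])
        answer = model(message)
        if check(answer):
            return answer
        notes.append(f"Rejected answer: {answer}")   # feeds the next attempt
    return None


# Toy model: it only improves once it can see its earlier rejection.
def toy_model(message: str) -> str:
    return "v2" if "Rejected answer: v1" in message else "v1"


final = run_harness("fix the bug", [], toy_model, lambda a: a == "v2")
trimmed = curate_context(["aaaa", "bbbb", "cc"], budget_chars=6)
```

The call graph is the argument: run_harness calls curate_context and build_prompt, never the other way round.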

Why the harness alone still forgets

Here's the catch, and it's the one most harness conversations skip. By default, the memory a harness gives the agent lasts only for a run: it can carry state across the steps of a task, but when the run ends, that state is gone. Close the session, and the agent wakes up blank tomorrow.

So a great harness with no durable knowledge still has amnesia. It can build beautifully today and rediscover yesterday's bug tomorrow, because the loop is excellent but nothing it learned was written down. The machine is strong. The memory is a sieve.

That gap is exactly where most of your real-world pain lives — the re-explaining, the repeated mistakes, the agent that's sharp in the moment and forgetful across days.

C² as a harness methodology

This is the layer C² works at. C² is a harness methodology — not a tool you install, but the discipline that makes the machinery reliable and, crucially, gives it a memory that lasts:

  • The Prompt Brief is the prompt layer — what to build, what not to, what to read first.
  • The contextbase and the Router (your AGENTS.md / CLAUDE.md) are the context layer — durable, version-controlled knowledge the agent reads before it acts.
  • The Cascade of briefs and the verification chain are the harness loop — gather, act, verify, and write down what was learned.

That last move is the one that fixes the sieve. C² has the agent capture what it learned into the contextbase before the run ends — so the next run, in any harness, starts ahead. The harness gives your agent hands and a loop. The contextbase gives it a memory that survives the session. You want both, and they multiply.
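The "capture before the run ends" move can be sketched with nothing more than a file on disk. To be clear about assumptions: this is not C²'s actual format or tooling. The file name gotchas.md, the one-bullet-per-learning layout, and the function names are all hypothetical, picked only to show the idea of a memory that outlives the session.

```python
import tempfile
from pathlib import Path
from typing import List


def load_contextbase(path: Path) -> List[str]:
    """Read durable learnings, one '- ' bullet per line."""
    if not path.exists():
        return []
    lines = path.read_text(encoding="utf-8").splitlines()
    return [line[2:] for line in lines if line.startswith("- ")]


def capture_learnings(path: Path, learnings: List[str]) -> None:
    """Append only the learnings that are not already written down."""
    known = set(load_contextbase(path))
    with path.open("a", encoding="utf-8") as handle:
        for item in learnings:
            if item not in known:
                handle.write(f"- {item}\n")
                known.add(item)


def run_session(path: Path, discovered: List[str]) -> List[str]:
    """Return what the agent knew at the START of this run."""
    known_at_start = load_contextbase(path)
    # ... the gather-act-verify work would happen here ...
    capture_learnings(path, discovered)   # write it down before the run ends
    return known_at_start


with tempfile.TemporaryDirectory() as tmp:
    contextbase = Path(tmp) / "gotchas.md"    # hypothetical file name
    lesson = "Tests need the DB container running"
    first = run_session(contextbase, [lesson])
    second = run_session(contextbase, [lesson])
    stored = load_contextbase(contextbase)
```

The first run starts with nothing and the second starts already knowing the lesson. Because the file lives outside the run, it does not matter which harness opens it next.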

If you've been chasing a better model to fix flaky results, look at the harness first — and then at whether anything it learns ever gets kept.

FAQ

What is an agent harness?
An agent harness is everything in an AI agent except the model itself: the tools, memory, and the loop that lets it gather information, act, and verify the result. The shorthand is Agent = Model + Harness: the model supplies intelligence, the harness supplies the hands, eyes, memory and safety boundaries that turn intelligence into action.

What's the difference between a harness and a contextbase?
The harness is the machinery: tools, the gather-act-verify loop, retries. The contextbase is the knowledge that machinery reads: decisions, patterns, gotchas, session memory. The harness gives the agent reach; the contextbase gives it a memory that survives the session. A good setup needs both.

What is harness engineering?
Harness engineering is the practice of designing and tuning that machinery around the model: how it loads context, calls tools, verifies output, and recovers from failure. It's the outermost of three nested layers: prompt engineering (the message), context engineering (the memory), harness engineering (the machine).

Is C² a harness?
C² is a harness methodology. It works at the harness layer and contains the other two: the Prompt Brief is its prompt layer, the contextbase and Router are its context layer, and the Cascade plus verification chain are the harness loop. It's the discipline that makes the machinery reliable, rather than a specific tool.