
Least privilege for coding agents: scope the blast radius

An agent can only do damage where you let it reach. How to apply least privilege — file scope, command limits, no prod.

Stuart Leo · June 9, 2026 · 5 min read

The most reliable safety control for AI coding agents isn't a clever filter or a better prompt. It's an old security principle: least privilege — give the agent only the access the task needs, and nothing more. Get this right and most ways an agent can hurt you shrink from catastrophe to inconvenience, because it simply can't reach the things that would make a mistake serious.

Here's how to think in blast radius, and the concrete limits that keep an agent's reach short.

The blast-radius mindset

Stop asking "will the agent behave?" and start asking "what's the worst it could do if it didn't?" That second question is the blast radius — the set of things the agent can affect — and it's the thing you actually control.

You can't guarantee an agent won't make a mistake or get fooled by a prompt injection. What you can guarantee is what it's able to touch when it does. An agent that can only edit a feature folder and run the tests can't leak your secrets or wreck production, no matter how badly a single step goes wrong. Shrink the blast radius and you've capped the downside of every failure mode at once.

"You can't make an agent never fail. You can make sure that when it does, it can't reach anything that matters." (Stuart Leo)

This is why security teams treat AI agents like any other powerful, imperfectly-trusted component — the control that scales is limiting reach, not predicting behaviour.

Scope files and commands

The first reach to limit is the obvious one: what the agent can read, write, and run.

  • File scope. Point the agent at the part of the codebase the task needs. It doesn't need write access to the whole repo to add a field to checkout.
  • Command scope. Restrict what it can execute. A task that edits and tests code doesn't need free rein over the shell, and arbitrary shell access is how a small problem becomes a big one.

Match the scope to the task. A bigger, trusted task gets more reach — a routine or autonomous one gets the minimum. The scope is a dial you set per job, not a permanent grant.
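
As a minimal sketch of file and command scoping, the Python below checks a proposed write or command against a per-task allowlist. Every name in it (REPO_ROOT, WRITABLE_DIRS, ALLOWED_COMMANDS, can_write, can_run) is hypothetical and chosen for illustration. Where your agent tool has its own permission settings, those are the better place to enforce this.

```python
import shlex
from pathlib import Path

# Hypothetical per-task scope: paths and commands are illustrative only.
REPO_ROOT = Path("/work/repo")
WRITABLE_DIRS = [REPO_ROOT / "src" / "checkout"]
ALLOWED_COMMANDS = {("pytest",), ("git", "diff"), ("git", "status")}


def can_write(path: str) -> bool:
    """Return True only if the path resolves inside a writable directory."""
    # resolve() collapses ".." segments, so "src/checkout/../../.env" is rejected.
    target = (REPO_ROOT / path).resolve()
    return any(target.is_relative_to(d.resolve()) for d in WRITABLE_DIRS)


def can_run(command: str) -> bool:
    """Return True only if the command starts with an allowed prefix."""
    # Refuse shell metacharacters outright rather than trying to interpret them.
    if any(ch in command for ch in ";|&`$><"):
        return False
    try:
        argv = shlex.split(command)
    except ValueError:  # unbalanced quotes and similar
        return False
    return any(tuple(argv[: len(prefix)]) == prefix for prefix in ALLOWED_COMMANDS)


print(can_write("src/checkout/cart.py"))   # inside the feature folder
print(can_write("src/checkout/../../.env"))  # escapes the feature folder
print(can_run("pytest tests/checkout"))
print(can_run("git push origin main"))
```

The allowlists are the dial: widen them for a bigger, trusted task and keep them tight for autonomous runs. The sketch assumes an approved command is then executed as an argument list, not passed through a shell.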

Forbidden zones: prod, auth, payments, secrets

Some areas are never worth an agent's autonomous reach, regardless of the task:

  • Production systems — a mistake here is live, immediately.
  • Auth — the keys to everything.
  • Payments — money mistakes are real mistakes.
  • Secrets — see keeping secrets out of reach.

Declare these forbidden zones explicitly, before the work. The upside of letting an agent autonomously touch production is never worth the downside. Gate any work near them behind a human.
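
One way to declare forbidden zones up front is a deny list that is checked before any scope rule and always wins. The sketch below is illustrative only: the patterns and the is_forbidden helper are assumptions, and the directory names are placeholders for wherever production config, auth, payments and secrets live in your repository.

```python
import posixpath
from fnmatch import fnmatchcase

# Hypothetical patterns: replace with the real locations in your repo.
FORBIDDEN_PATTERNS = [
    "infra/prod/*",    # production systems
    "src/auth/*",      # auth
    "src/payments/*",  # payments
    ".env",            # secrets
    ".env.*",
    "*.pem",
]


def is_forbidden(relative_path: str) -> bool:
    """Return True if the path falls in a forbidden zone."""
    # Normalise first so "a/../auth/x" cannot slip past the patterns.
    path = posixpath.normpath(relative_path.replace("\\", "/"))
    name = posixpath.basename(path)
    # Match the whole path for directory zones and the file name for secrets.
    return any(
        fnmatchcase(path, pattern) or fnmatchcase(name, pattern)
        for pattern in FORBIDDEN_PATTERNS
    )


for candidate in ["src/checkout/cart.py", "src/auth/login.py", "config/.env"]:
    verdict = "needs a human" if is_forbidden(candidate) else "in bounds"
    print(f"{candidate}: {verdict}")
```

Because the deny list is evaluated independently of the task scope, a generous scope for one job cannot accidentally open a forbidden zone.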

Sandboxes and isolation

For genuinely risky or autonomous work, don't just scope within your environment — isolate the environment itself. Run the agent in a container, a microVM, or at least a dedicated git worktree and branch, with no access to your host, your other projects, or anything live. Now even a total failure is contained to a box you can inspect and throw away. Isolation is least privilege applied to the environment, not just the file tree.
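
To make the isolation concrete, here is a hedged sketch that builds (but does not run) the commands for a dedicated worktree and a locked-down container. The function names, the image name and the paths are hypothetical. The git and docker flags used are standard ones, but whether they suit your setup is an assumption: an agent that must reach a model API or a package index, for instance, cannot run with networking fully disabled.

```python
from pathlib import Path


def worktree_command(branch: str, path: Path) -> list[str]:
    """Build a command that creates a new branch in its own git worktree."""
    return ["git", "worktree", "add", "-b", branch, str(path)]


def sandbox_command(worktree: Path, image: str, agent_argv: list[str]) -> list[str]:
    """Build a docker command that runs agent_argv with only the worktree mounted."""
    return [
        "docker", "run", "--rm",
        "--network", "none",       # assumption: the task needs no network
        "--read-only",             # container root filesystem is read-only
        "--cap-drop", "ALL",       # drop all Linux capabilities
        "--security-opt", "no-new-privileges",
        "--tmpfs", "/tmp",         # scratch space that vanishes on exit
        "--volume", f"{worktree}:/workspace",  # the only host path exposed
        "--workdir", "/workspace",
        image,
        *agent_argv,
    ]


task_dir = Path("/tmp/agent-task")  # hypothetical location
print(" ".join(worktree_command("agent/checkout-field", task_dir)))
print(" ".join(sandbox_command(task_dir, "agent-sandbox:latest", ["pytest", "-q"])))
```

Building the argument list separately from running it keeps the sandbox definition easy to review before anything executes.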

Human approval for high-risk actions

The final layer is cheap and effective: for the actions that would actually hurt (deploying, deleting data, moving money, installing dependencies), require a human to approve. This is the backstop that holds when scoping and isolation miss something. An agent, or an injection driving it, may attempt the dangerous thing, but a human gate means it can't do it alone.
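
A human gate can be sketched as a wrapper that refuses to run a high-risk action unless an approval callback says yes. The action names, HIGH_RISK set and run_action function below are hypothetical. In a real setup the approve callback might be a terminal prompt, a chat message or a pull-request review rather than the lambdas shown here.

```python
from typing import Callable

# Hypothetical action kinds, mirroring the high-risk examples in the text.
HIGH_RISK = {"deploy", "delete_data", "move_money", "install_dependency"}


def run_action(
    kind: str,
    action: Callable[[], str],
    approve: Callable[[str], bool],
) -> str:
    """Run the action, asking a human first if its kind is high-risk."""
    if kind in HIGH_RISK and not approve(kind):
        # Fail closed: without an explicit yes, the action never runs.
        raise PermissionError(f"{kind} was not approved by a human")
    return action()


# Low-risk work goes straight through, even with an approver that always says no.
print(run_action("run_tests", lambda: "tests passed", approve=lambda kind: False))

# High-risk work is blocked unless the approver agrees.
try:
    run_action("deploy", lambda: "deployed", approve=lambda kind: False)
except PermissionError as error:
    print(error)
```

The gate fails closed, so a missing or negative answer leaves the action undone, which is the behaviour you want when an agent is running unattended.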

Combine these — scoped files and commands, forbidden zones, isolation, human gates on the dangerous stuff — and the agent does its work with a blast radius small enough that you can let it run, including unattended.

An agent's safety is its blast radius — give it the least reach the task needs, and no more.

Start here: see how to keep secrets safe, how to run agents overnight safely, or read the method.

FAQ

What does least privilege mean for AI coding agents?
Giving the agent only the access a task actually needs, and no more: the files it must touch, the commands it must run, and nothing else. The long-standing security principle applies directly: limit reach so that a mistake, or a successful prompt injection, can only do limited damage.

How do I limit what a coding agent can do?
Scope its file access to the work at hand, restrict the commands it can run, forbid the dangerous zones (production, auth, payments, secrets), run it in a sandbox or isolated worktree, and require human approval for high-risk actions. Define the limits before the work, matched to what the task needs.

Why is least privilege the best defence for AI agents?
Because it limits the blast radius regardless of what goes wrong. Whether the agent makes an honest mistake or follows a prompt injection, it can only act where you let it reach. Narrow that reach and most failure modes shrink from catastrophe to inconvenience.