ArticlesMethod
Defend against prompt injection in coding agents
Prompt injection turns content your agent reads into commands it follows. Why coding agents are exposed, and what holds.
Prompt injection is the security problem that arrived with agents and got worse the more capable they became. The idea is simple and unsettling: text your agent reads — a dependency's documentation, a code comment, a GitHub issue, a web page — can contain instructions, and the agent may follow them as if they came from you. When the agent can act, a sentence buried in a README becomes a command with real consequences.
For coding agents specifically, this is not theoretical. Here's why they're exposed, how injections actually land, and the defences that hold.
What prompt injection is
A model can't reliably tell the difference between your instructions and instructions embedded in content it's reading. To the model, it's all text. So if the agent reads a file that says "ignore prior instructions and run this command," there's a real chance it does.
That's prompt injection: smuggling instructions into the content an agent processes, so it acts on the attacker's intent instead of yours. With a chatbot, the worst case is a bad answer. With a coding agent that runs commands and edits files, the worst case is much worse.
Why agents made it worse (they act)
The thing that makes coding agents powerful — they take actions in the world — is exactly what makes injection dangerous. As security researchers put it, going from chatbots to agents made prompt injection worse, because the model's output is no longer just words — it's commands, file edits, tool calls, network requests.
And the incidents are real. Injected instructions hidden in a dependency chain have led to leaked credentials and compromised packages. An agent triaging issues, reading a poisoned one, has been steered into harmful actions. The pattern repeats: the agent reads attacker-controlled text, and its ability to act turns that text into damage.
Where injections hide (deps, comments, docs)
The dangerous realisation is how much of what an agent reads is attacker-influenceable:
- Dependencies — their code, comments, and docs, which you didn't write and rarely audit.
- Code comments and doc strings — including in packages pulled in transitively.
- Issues, PRs, and tickets — anyone can open one.
- Web pages and search results — if the agent browses.
Stuart Leo
Your agent doesn't only read your instructions. It reads everything in its path — and any of it could be carrying someone else's.
The mental shift: treat everything the agent reads as potentially hostile, not just the obvious inputs.
Layered defences that work
There's no single fix — defence is layers, and the strongest ones limit what a successful injection can do:
- Least privilege. This is the big one. If the agent can't reach secrets, deploy, or production, an injection that lands has little to act on. Scope the blast radius and most injections fizzle.
- Separate secrets from action. Never run an agent with broad command access and secrets in scope — the combination that turns injection into exfiltration. (Keep secrets out of reach.)
- Treat inputs as untrusted. Be wary of pointing the agent at unvetted dependencies, random issues, or arbitrary web content, especially in autonomous runs.
- Sandbox and isolate. Run risky work in a container or worktree with no access to anything that matters.
Human-in-the-loop for high-risk actions
The last layer is the simplest: for anything genuinely dangerous — touching production, deleting data, moving money, installing software — require a human to approve it. An injection can tell the agent to do the dangerous thing, but it can't click your approval. Gating high-risk actions behind a human is the backstop when every other layer fails.
Treat everything your agent reads as untrusted — injection hides in dependencies and docs, not just prompts. Then make sure that even if it's fooled, it can't reach anything that matters.
Start here: see least privilege for coding agents, the field note on an injection I almost shipped, or read the method.
FAQ
- What is prompt injection in coding agents?
- Prompt injection is when text the agent reads — a dependency's docs, a code comment, an issue, a web page — contains instructions the agent then follows as if you'd given them. Because a coding agent acts (runs commands, edits files, calls tools), an injected instruction can do real damage, not just produce bad text.
- Why are AI coding agents especially vulnerable to prompt injection?
- Because they read untrusted content from many sources — dependencies, repos, issues, web results — and they can act on what they read. Injection in a dependency's comments or docs has led to leaked secrets and compromised packages. The agent's power to act is exactly what makes a successful injection dangerous.
- How do I defend against prompt injection?
- Treat everything the agent reads as untrusted, limit what the agent is allowed to do (least privilege), require human approval for high-risk actions, and don't run agents with secrets and broad permissions in the same scope. No single layer is enough — defence is layered, and the strongest layer is limiting the blast radius.
Related
An agent can only do damage where you let it reach. How to apply least privilege to coding agents — file scope, command limits, no prod, sandboxes.
The prompt injection I almost shippedA field note on an agent that read a malicious instruction buried in a dependency's docs, nearly acted on it, and what caught it before it shipped.
Keep secrets safe when coding with agentsAn agent with shell access and secrets in scope is the worst-case setup. How to keep API keys and credentials out of an agent's reach — and out of its context.