ArticlesMethod
Compaction: keeping your agent's context window lean
Every model degrades as its window fills. How to keep the working set lean without losing what matters.
Here's a failure that catches everyone eventually: a session that started sharp slowly goes wrong. The agent gets vaguer, forgets a constraint you set earlier, contradicts itself — and it's more confident than ever. You blame the model. The model is fine. The window is full.
Every model degrades as its context fills. Keeping it from filling is a skill, and it's called compaction. Here's what's happening and how to stay ahead of it.
Context rot: why full windows go wrong
The context window is the finite amount a model can hold at once. It's tempting to treat it like a backpack — keep stuffing things in until it's full. But it doesn't work like storage. As the window fills, the model gets worse at using what's in it. Details slip. Earlier instructions fade. Output drifts confidently off-target.
This is context rot, and it's a property of every frontier model, not a bug in any one of them. The practical consequence is counterintuitive: past a point, more context produces worse results. Teams running serious agent workloads treat managing this as core work — Arize's writeup on context management in agent harnesses frames it as a first-class engineering concern, not housekeeping.
Budget by fill percentage, not token count
The mental shift that fixes most of this: stop counting tokens, start watching the fill level.
Token counts are abstract and easy to ignore until you're over. Fill percentage is a dial you can act on. The working rule: compact well before full — many practitioners move once the window is past roughly half to sixty percent, because that's where rot starts to bite. You're not trying to use every last token. You're trying to keep the window in the range where the model is sharp.
Stuart Leo
Don't manage tokens — manage the fill percentage, and compact before the agent starts guessing.
The three compaction moves
Compaction isn't one action. C² keeps context lean with three mechanisms working at different timescales:
- The session brief — point-in-time compaction. At every context break, digest the session down to "what the next agent needs cold" and drop the raw transcript. Triggered by the break, not by the window filling.
- The Router — continuous compaction. The Router links rather than embeds, so each session loads a lean, curated slice of the contextbase, not the whole thing. Lazy-loading is compression: read the index, not every file.
- The Learn loop — capture then consolidate. Capture discoveries fast during a session, then periodically merge duplicates into canonical docs and prune the stale. The contextbase stays dense in signal, not bloated.
Together these handle the durable, cross-session half of the problem — the context that should persist, kept lean.
What to keep, summarise, and drop
For the within-session half — a single task that's overflowing — the moves are simpler:
- Keep the original intent and the last couple of steps. The agent always needs to know what it's doing and where it just was.
- Summarise what's gone stale. Twenty steps of exploration compress to "here's what I found." The conclusions matter — the transcript doesn't.
- Drop what's irrelevant to what's left. A tool output you've already acted on is dead weight in the window.
When a single task still overflows after this, the cleanest move is to have the agent write down where it's up to, then start a fresh session from that note. You've traded a bloated, rotting window for a short, sharp one.
Compaction as a habit
The mistake is treating compaction as something you do in a panic when things go sideways. By then the rot has already cost you. Make it routine: a session brief at every break, a Router that links instead of embeds, a fresh start when a task runs long. Done as habit, you simply never reach the wall.
Don't manage tokens — manage the fill percentage, and compact before the agent starts guessing.
Start here: see how to stop your agent forgetting, the field note on context rot, or read the method.
FAQ
- What is compaction in AI agents?
- Compaction is keeping an agent's working context lean — digesting what matters and dropping the rest — so the context window doesn't fill up and degrade. It's how you keep a long task or a long-running project from hitting the wall where the model starts losing track.
- What is context rot?
- Context rot is the way every model degrades as its context window fills. Past a certain fill level the agent drops details and produces confident, wrong output. Compaction is the countermeasure — you keep the window well below the level where rot sets in.
- How do I know when to compact?
- Watch the fill level, not the token count. A good rule is to compact well before the window is full — many practitioners act around the 50–60% mark. Within a session, when it's getting long, have the agent summarise where it's up to and start fresh from that note.
Related
AI agents forget everything when the session ends or the context window fills. Why the memory wall happens — and the written-context fix that makes learning stick.
The day context rot bit me at 80% fullA field note on watching a long session quietly degrade as the context window filled — the wrong outputs, the late diagnosis, and the compaction habit I keep now.
Write your first session briefA session brief is a short note at the end of each working session: what changed, why, what's verified, what's next. How to write one and why it compounds.