What is model routing for AI coding agents?

Model routing means sending each task to the right-sized model instead of running everything on one. Cheap, fast models handle discovery and simple edits; mid-tier models handle most work; the expensive top model is reserved for genuinely hard reasoning. Matching model to task is the single highest-leverage cost optimisation.

How much can model routing save?

A lot. For an agent making hundreds of calls per session, running all-top-model versus a sensible mix can be the difference between several dollars and well under a dollar per session — often a 3-5x reduction — without a meaningful drop in quality, because most steps never needed the expensive model.

Doesn't using a cheaper model hurt quality?

Only if you use it for the wrong steps. Discovery, file reading, and simple edits run fine on a cheaper model; hard reasoning gets the strong one. And richer context lets smaller models punch above their weight, because they're guessing less. The goal is right-sized, not cheapest.

ArticlesMethod

Route models to cut your AI agent bill

Running everything on the top model is the easiest way to overpay. Route by task and watch the bill fall.

Stuart LeoJune 8, 20264 min read

If your AI agent bill is uncomfortable, there's a good chance you're making the most common mistake: running every step on the most powerful model. It feels safe — best model, best results. It's also the easiest way to overpay by several times, because most of what an agent does doesn't need the expensive model at all.

The fix is model routing: matching the model to the task. It's the single highest-leverage cost move in agentic development, and it doesn't cost you quality. Here's how to think about it.

Why one model for everything overpays

An agent's loop is a mix of work. Some steps are hard — untangling a subtle bug, designing an architecture. Most steps are not — reading a file, listing a directory, making a small edit, summarising output. They all flow to whatever model you set.

Set that to the top-tier model and you're paying premium rates for the easy 80% as well as the hard 20%. The analysis of AI coding costs makes the gap concrete: for an agent making a couple of hundred calls a session, all-premium can run several times the cost of a sensible Haiku/Sonnet/Opus-style mix — for work that's largely identical in outcome. You're not buying better results on the easy steps. You're just paying more for them.

Match the model to the task

The principle is simple: send each task to the smallest model that can do it well.

Cheap, fast models for discovery, reading, simple edits, summarising — the high-volume, low-difficulty steps.
Mid-tier models for the bulk of real implementation work.
The top model for genuinely hard reasoning — the gnarly bug, the architecture call, the thing that actually needs the horsepower.

Stuart Leo

The cheapest model isn't the goal and the best model isn't the goal. The right model for each step is.

You can route by hand (pick the model for a session based on the work) or use a router that classifies each request and sends it on. Either way, the savings come from no longer paying premium for the easy majority.

Cheap for discovery, strong for reasoning

A useful default split: let a cheap model do the exploring and a strong model do the deciding. Finding the relevant files, reading them, gathering context — that's bulk work a fast model handles fine. Reasoning about the tricky change once the context is gathered — that's where the strong model earns its rate.

This mirrors how the loop actually works: lots of cheap gathering, a little expensive thinking. Pay accordingly.

Agent-agnostic means you can route freely

Here's where the method matters. Because C² is agent-agnostic — plain markdown in git, no lock-in to one model or tool — you're free to route across models and even vendors without rework. The contextbase doesn't care which model reads it. That freedom is what makes routing practical: you can put the cheap model and the expensive model on the same project, same context, same day.

A method welded to one model can't do this. An agent-agnostic one routes at will.

Richer context lets smaller models win

The deepest cost lever loops back to context. A smaller model with rich, precise context often matches a bigger model working blind, because most of what the expensive model buys you is compensating for missing context. Give any model a good contextbase and it guesses less — which means you can push more work down to cheaper models without losing quality. Good context doesn't just cut re-sending cost. It lets you run cheaper models more often.

The highest-leverage cost move isn't the cheapest model — it's the right model for each step.

Start here: see why agents get expensive, the field note on a tripled bill, or read the method.

FAQ

What is model routing for AI coding agents?: Model routing means sending each task to the right-sized model instead of running everything on one. Cheap, fast models handle discovery and simple edits; mid-tier models handle most work; the expensive top model is reserved for genuinely hard reasoning. Matching model to task is the single highest-leverage cost optimisation.
How much can model routing save?: A lot. For an agent making hundreds of calls per session, running all-top-model versus a sensible mix can be the difference between several dollars and well under a dollar per session — often a 3-5x reduction — without a meaningful drop in quality, because most steps never needed the expensive model.
Doesn't using a cheaper model hurt quality?: Only if you use it for the wrong steps. Discovery, file reading, and simple edits run fine on a cheaper model; hard reasoning gets the strong one. And richer context lets smaller models punch above their weight, because they're guessing less. The goal is right-sized, not cheapest.

Why AI agents get expensive (and how to fix it)

Agentic tasks burn far more tokens than chat — mostly from re-sending context every call. Why the bill climbs, and how leaner context brings it down.

My agent bill tripled — here's what fixed it

A field note on an AI coding bill that tripled in a month, finding the culprit (re-sent context and the wrong model everywhere), and the two changes that fixed it.

Compaction: keeping your agent's context window lean

Every model degrades as its context fills — context rot. Compaction keeps the working set lean without losing what matters. The three mechanisms C² uses.

All articles