AI | July 4, 2026 | 7 min read

AI Agents That Write Code: What They Do Well and Where They Fail

We use coding agents every day. Not as a demo, not as a party trick, but as part of how we actually ship work for clients. So when people ask whether AI can write code, my answer is not a hot take. It is a field report from a shop that has both been rescued by these tools and burned by them, sometimes in the same afternoon.

The short version: they are a real, useful tool with real, specific limits. Anyone selling you either the miracle or the apocalypse is not doing the work. Here is what we have actually learned.

Where they genuinely earn their keep

There is a category of programming that is necessary, tedious, and not very hard. Agents are excellent at it, and honestly I am glad to hand it over.

  • Boilerplate. Scaffolding a form, wiring an API route, setting up a config file. This is typing, not thinking, and a good agent does it in seconds without the small mistakes tired humans make at 6pm.
  • Mechanical refactors. Rename a concept across forty files, pull a repeated block into a shared function, migrate an old pattern to a new one. The agent holds all forty files in its head at once, which a person cannot.
  • Tests. Writing the fifteenth test case for a function is soul-draining work, and the agent is happy to grind out the obvious cases so we can focus on the odd ones that actually matter.
  • First drafts. Give it a rough shape and it returns something to react to. A draft you can criticise beats a blank file, every time.
  • Tracing bugs. This one surprised me. Point an agent at a stack trace and a codebase and it is genuinely good at reading backwards through the logic to find where a value went wrong. It reads faster than we do, and it does not get bored on line 300.

Notice the pattern. In each case the problem is well defined and the answer can be checked. The agent is fast, tireless, and has read more code than any of us. When the task is "do this known thing, carefully, a lot," it wins.
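
To make "the answer can be checked" concrete, here is a minimal sketch of how a mechanical refactor might be verified. Everything in it is an illustration rather than code from a real project: the price formatter, both function names, and the input range are assumptions. The idea is simply to run the old and new versions over the same inputs and list every disagreement.

```python
from typing import Callable, Iterable, List


def format_price_old(cents: int) -> str:
    # Hypothetical original: hand-rolled string building.
    return "$" + str(cents // 100) + "." + str(cents % 100).zfill(2)


def format_price_new(cents: int) -> str:
    # Hypothetical refactor, the kind of rewrite an agent might propose.
    dollars, remainder = divmod(cents, 100)
    return f"${dollars}.{remainder:02d}"


def mismatches(
    old: Callable[[int], str],
    new: Callable[[int], str],
    inputs: Iterable[int],
) -> List[int]:
    # Collect every input where the two versions produce different output.
    return [value for value in inputs if old(value) != new(value)]


if __name__ == "__main__":
    # An empty list means the refactor preserved behaviour on these inputs.
    print(mismatches(format_price_old, format_price_new, range(0, 10_000)))
```

A check like this only covers the inputs you feed it, so it supports a human review rather than replacing one.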

That is not a small category. It is a large slice of most working days, and getting it off our plate means we spend more of our attention on the parts that need a human. We wrote more about how this reshaped our workflow in how coding agents changed the way we build.

Where they still fail

Now the other side, because this is where the confident marketing quietly goes silent.

Agents are bad at architecture. Ask one to build a feature and it will build a feature. Ask it whether the feature should exist, or how it fits the three systems around it, and it has no real opinion. It optimises the thing in front of it and ignores the shape of the whole. Left alone across a project, it produces code that works in each part and rots as a system, because nobody was holding the map.

They are bad at judgement calls. Should this be fast or should it be simple? Is it worth the extra complexity to handle a case that happens twice a year? Those questions have no correct answer, only trade-offs that depend on the business, the team, and next year's plan. An agent will pick one and present it with total confidence, which brings us to the real hazard.

They are confidently wrong. This is the failure that costs the most. A junior developer who is unsure will usually say so. An agent almost never does. It will produce a broken change, a subtly incorrect fix, or an invented function that does not exist, and it will describe all of it in the calm, fluent tone of someone who is certain. If you are not reading carefully, that confidence is contagious, and you ship the mistake.
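
The invented-function case is the one part of this that a machine can partly catch. As a rough sketch, assuming the agent's change calls top-level names on an importable Python module, a few lines can confirm those names exist before anyone reads further. The helper name `missing_names` and the example name `parse_fast` are made up for illustration.

```python
import importlib
from typing import Iterable, List


def missing_names(module_name: str, names: Iterable[str]) -> List[str]:
    # Import the module and report which of the given names it does not define.
    module = importlib.import_module(module_name)
    return [name for name in names if not hasattr(module, name)]


if __name__ == "__main__":
    # "parse_fast" is a deliberately invented name; "loads" is real.
    print(missing_names("json", ["loads", "parse_fast"]))
```

This says nothing about whether a real function is used correctly, which is why it sits alongside careful reading and does not stand in for it.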

Code on office screens waiting for a careful human review

They are weakest on genuinely novel problems. When the answer is not somewhere in the vast pile of code these models were trained on, they struggle. They pattern-match to the nearest familiar thing, which is often close enough to look right and wrong enough to hurt. The rarer and more specific your problem, the less help you get, exactly when you need it most.

And they do not know what NOT to build. An agent will cheerfully add the feature, the abstraction, the extra option. It never pushes back with "you do not need this yet." Restraint is one of the most valuable skills in software, and it is one the tools do not have at all.

How we keep the wheel

So we use them constantly and we never let them drive. The distinction matters.

Every line an agent writes gets read by a person before it goes anywhere near a client's site. Not skimmed. Read. We treat its output the way you would treat work from a fast, well-read intern who has no judgement and no shame about being wrong: useful, worth having, and never trusted on its own.

We keep the humans on the parts humans are still better at. We decide what to build and why. We own the architecture and the trade-offs. We make the judgement calls. Then we hand the well-defined, checkable pieces to the agent and check its work. The thinking stays with us; the typing goes to the machine. That division is the whole trick, and it is close to what we mean when we talk about software that is built AI-native from the start.

People sometimes read all this as a step toward developers disappearing. We think it is closer to the opposite, and we argued that case in whether AI will replace developers. The tools raise the floor on the boring work and leave the hard, human part of engineering exactly where it was: deciding what is worth doing and being responsible when it ships.

That is the honest picture from inside a studio that lives with these tools. They are genuinely good, genuinely limited, and worth using with your eyes open. If you are trying to work out where AI fits in your own product without handing over the wheel, talk to us about AI. We will tell you where it helps and, just as important, where it does not.
