Engineering · May 26, 2026

What we still do by hand when an agent writes the code

By Aaron McClendon, Founder & CTO, Arkitekt AI

IBM put a label on something the rest of the industry has been circling all year: the move from AI-assisted *coding* to AI-assisted *delivery*. Their pitch for Bob is that agents shouldn't just autocomplete functions; they should plan, build, test, and deploy (IBM, 2025). That framing matches what we see in practice. The hard part of shipping software was never typing the code.

So a fair question for any agency claiming AI-assisted delivery: what do you actually still do by hand?

Here's our honest answer.

Scoping the thing before any code gets written

Agents are good at building what you describe. They're bad at noticing you described the wrong thing. Before we open an editor, we sit with the client and walk through the actual workflow — who touches what, where the data lives now, what breaks when someone is on vacation. That conversation is the single highest-leverage hour in the whole project. No model replaces it, because the inputs aren't in any repo.

Reading every diff

METR ran a randomized controlled trial with experienced open-source developers using early-2025 AI tools and found they took roughly 19% longer to finish tasks in codebases they already knew well, even though they *felt* about 20% faster (METR, 2025). The gap between perceived and actual speed is the whole reason review matters. Agents produce code that looks right. Looking right and being right are different things, especially around auth, money, and anything touching a customer record.

We read every diff. We don't merge what we haven't read.

Writing the tests ourselves for anything that matters

Agents will happily write tests for the code they just wrote. Those tests tend to confirm the code does what the code does, which is not the same as confirming it does what you wanted. For critical paths — billing, permissions, data export, integrations with the client's existing systems — a human writes the test first or rewrites the agent's version against the actual spec.
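To make the difference concrete, here is a minimal Python sketch. The `prorate_refund` function and the refund rule it encodes are hypothetical, invented for this illustration and not taken from any client project. The first test recomputes the answer with the same formula as the code, so it would pass even if the formula were wrong. The second pins the behavior to numbers worked out by hand from a written spec.

```python
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")


def prorate_refund(price: Decimal, days_used: int, days_in_period: int) -> Decimal:
    """Hypothetical billing rule: refund the unused share of a period, rounded to the cent."""
    if days_in_period <= 0 or not 0 <= days_used <= days_in_period:
        raise ValueError("days_used must be between 0 and days_in_period")
    unused_days = Decimal(days_in_period - days_used)
    return (price * unused_days / Decimal(days_in_period)).quantize(
        CENT, rounding=ROUND_HALF_UP
    )


def test_mirrors_implementation():
    # Agent-style test: the expected value is derived with the same formula
    # as the code under test, so it only confirms the code agrees with itself.
    price, used, total = Decimal("30.00"), 10, 30
    expected = (price * Decimal(total - used) / Decimal(total)).quantize(
        CENT, rounding=ROUND_HALF_UP
    )
    assert prorate_refund(price, used, total) == expected


def test_against_spec():
    # Human-written test: expected values are literals taken from the spec,
    # including the boundaries and a case that forces rounding.
    assert prorate_refund(Decimal("30.00"), 10, 30) == Decimal("20.00")
    assert prorate_refund(Decimal("30.00"), 0, 30) == Decimal("30.00")
    assert prorate_refund(Decimal("30.00"), 30, 30) == Decimal("0.00")
    assert prorate_refund(Decimal("10.00"), 1, 3) == Decimal("6.67")
```

Both tests pass against this implementation. Only the second would fail if someone changed the rounding rule or flipped used and unused days, which is the point of writing it from the spec.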

Deploying, watching, and owning the pager

The New Stack's 2025 wrap-up noted how much of the agentic ecosystem now runs through MCP and similar connective tissue (The New Stack, 2025). Useful for development. Not a substitute for someone who notices when a queue is backing up at 2am. Our managed infrastructure setup means logs, metrics, and alerts route to us, not to a client who didn't sign up to be on call.
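As a rough sketch of what "noticing a queue is backing up" can look like in code, the Python below decides whether a run of queue-depth samples should page a human. The `BacklogRule` name, the threshold of 1,000 messages, and the three-sample window are all assumptions made for this example; they do not describe our actual alerting configuration or any particular monitoring product.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class BacklogRule:
    max_depth: int = 1000       # hypothetical threshold, in messages
    sustained_samples: int = 3  # consecutive samples required before paging


def should_page(depths: Sequence[int], rule: BacklogRule = BacklogRule()) -> bool:
    """Page only when the latest samples all exceed the threshold and the queue is not draining."""
    if len(depths) < rule.sustained_samples:
        return False  # not enough data to call it sustained
    recent = depths[-rule.sustained_samples:]
    over_threshold = all(depth > rule.max_depth for depth in recent)
    not_draining = recent[-1] >= recent[0]
    return over_threshold and not_draining
```

Requiring several consecutive samples avoids paging on a single spike, and the draining check avoids waking someone for a backlog that is already clearing. The rule only decides when to page; a person still has to pick up and own the response.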

What this adds up to

Agents let a small team like ours move faster than we could two years ago. They do not let us skip the parts of software delivery that have always mattered: understanding the problem, reviewing the work, testing the risky bits, and owning what runs in production.

If someone tells you AI removes those steps, they're selling the demo, not the system. Good automation is invisible because someone made it boring on purpose.

Arkitekt AI builds production-grade custom software on managed infrastructure, delivered autonomously at AI speed. If you're paying for tools that almost fit, let's talk.

arkitekt-ai.com

Source: “Inside Big Software's fight for its life,” Ashley Stewart, Business Insider, April 7, 2026.
