Engineering | May 20, 2026

Coding agents got fast in 2025. The hard part is what happens after.

By Aaron McClendon, Founder & CTO, Arkitekt AI

If you read any year-in-review on software in 2025, the headline was the same: coding agents got really good, really fast. The New Stack's wrap-up framed it as the year of agents, MCP, and vibe coding. Anthropic's 2026 Agentic Coding Trends Report backs that up with adoption data from real engineering teams.

We use these tools every day. They are genuinely good. But here's the part that doesn't make the headlines: an agent writing a working feature in twenty minutes is maybe 30% of the job. The other 70% is the boring discipline that turns a working demo into something a small business can actually run its operations on.

The speed is real. The risk is also real.

An agent will happily write you a 600-line file that runs on the first try and quietly does the wrong thing in production three weeks later. It will invent an API endpoint that doesn't exist. It will use a deprecated library because it saw that library in its training data. It will write tests that pass because the tests assert what the code does, not what the business needs.

None of that is a reason to stop using agents. It's a reason to put rails around them.

What we actually do after the agent writes code

A recent arXiv guide on deploying production-grade agentic workflows covers a lot of the same patterns we've landed on through trial and error. The short version of our process:

Humans write the spec. Before any code, we write down what the thing has to do, what inputs it takes, what it must never do, and how we'll know it's working. The agent doesn't get to define success.

Small, reviewable changes. We don't let agents write entire systems in one shot. Features get broken into pieces a human can actually read in ten minutes. If a pull request is too big to review, it's too big to ship.

Tests we wrote, not tests the agent wrote. Agent-generated tests are useful as a starting point. They are not a safety net. The tests that actually catch bugs are the ones grounded in the business rule: "a job marked complete must have a signed timestamp," not "function returns true."

Staging that looks like production. Same database engine, same managed infrastructure, same auth. We catch the boring environment bugs before your team does.

A human signs off before anything touches your live data. Always.
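
To make the testing point above concrete, here is a minimal sketch of what a test grounded in the business rule "a job marked complete must have a signed timestamp" might look like. It is an illustration, not our production code: the Job record, the mark_complete function, and the BusinessRuleViolation error are all hypothetical names made up for this example.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class Job:
    # Hypothetical job record; the field names are assumptions for illustration.
    job_id: str
    status: str = "open"
    signed_at: Optional[datetime] = None


class BusinessRuleViolation(Exception):
    """Raised when a change would break a rule written down in the spec."""


def mark_complete(job: Job, signed_at: Optional[datetime]) -> Job:
    # Rule from the spec: a job marked complete must have a signed timestamp.
    if signed_at is None:
        raise BusinessRuleViolation(f"job {job.job_id} has no signed timestamp")
    job.status = "complete"
    job.signed_at = signed_at
    return job


def test_complete_job_requires_signed_timestamp() -> None:
    # The test states the business rule directly, not "function returns true".
    job = Job(job_id="J-1")
    try:
        mark_complete(job, signed_at=None)
    except BusinessRuleViolation:
        pass
    else:
        raise AssertionError("completed a job without a signed timestamp")
    # The rejected attempt must leave the job untouched.
    assert job.status == "open"


def test_signed_job_can_be_completed() -> None:
    signed = datetime(2026, 5, 20, 9, 30, tzinfo=timezone.utc)
    job = mark_complete(Job(job_id="J-2"), signed_at=signed)
    assert job.status == "complete"
    assert job.signed_at == signed
```

The useful property is that a test like this fails if someone, human or agent, later rewrites mark_complete in a way that lets an unsigned job through. A test that only checks the return value would keep passing.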

Why this matters if you're buying software

If someone tells you they build with AI and quotes you a price that assumes the agent does everything, be careful. The speed-up from agents is real, but the cost of skipping review shows up later, usually as data you can't trust or a bug nobody can explain.

Good AI-assisted delivery should feel, from your side, exactly like good software delivery always has. It works. It keeps working. You don't think about it. The agent is a tool we use to get there faster. It's not the thing you're buying.

Arkitekt AI builds production-grade custom software on managed infrastructure, delivered autonomously at AI speed. If you're paying for tools that almost fit, let's talk.

arkitekt-ai.com

Source: “Inside Big Software's fight for its life,” Ashley Stewart, Business Insider, April 7, 2026.