AI | May 1, 2026

95% of GenAI pilots are failing. The 5% that work look boring.

By Aaron McClendon, Founder & CTO, Arkitekt AI

MIT's NANDA group looked at hundreds of corporate generative AI pilots and found that 95% of them aren't producing measurable P&L impact. That's the headline from their *State of AI in Business 2025* report, covered by Fortune in August 2025.

If you've been quietly suspicious that the AI demos in your LinkedIn feed don't match what's happening inside real companies, you're not wrong.

The demo-to-production gap is the whole story

A demo runs once, on a clean input, with a sympathetic audience. Production runs ten thousand times, on messy inputs, while someone's payroll depends on it. Those are different problems.

IBM's team made a similar point in their 2025 review of agentic AI: the agents that hold up in production are the ones with narrow scope, clear oversight, and a human somewhere in the loop. The fully autonomous "set it and forget it" agent is mostly still a conference talk.

Gartner is pointing at the same shift from a different angle. They predict 40% of enterprise apps will feature task-specific AI agents by 2026, up from less than 5% in 2025. The key word there is *task-specific*. Not general. Not autonomous. Scoped.

What the 5% actually looks like

In our experience, the AI work that pays for itself in an SMB is unglamorous:

- Reading inbound emails and routing them to the right person with a draft reply attached.
- Pulling line items off a PDF invoice and dropping them into the accounting system for a human to approve.
- Summarizing a support ticket thread before a manager picks it up.
- Tagging and deduplicating CRM records overnight.

None of these will trend on social media. All of them save real hours and reduce real errors. They share three traits:

1. Narrow scope. The model does one job, not ten.
2. A human checkpoint. Approval, review, or override is built in.
3. Deterministic plumbing around it. Queues, retries, logs, and clear failure modes, so when the model has a bad day, the business doesn't.

That last one is the part most pilots skip. Teams wire a chatbot to a database, demo it, and call it shipped. Then a customer asks something weird, the model improvises, and trust evaporates.
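To make the third trait concrete, here is a minimal Python sketch of deterministic plumbing around a model call: bounded retries, a validation check, logging, and an explicit "hand it to a human" failure mode. Everything named here (`run_with_retries`, `Outcome`, `make_flaky_model`, the queue names) is hypothetical and for illustration only; a real system would swap in its own model client and job queue.

```python
import logging
import time
from dataclasses import dataclass
from typing import Callable, Optional

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("plumbing")

# Assumption: the model's one job is to pick a queue for an inbound email.
ALLOWED_QUEUES = {"billing", "support"}


@dataclass
class Outcome:
    status: str            # "ok" or "needs_human"
    output: Optional[str]  # validated model output when status == "ok"
    attempts: int


def run_with_retries(
    call_model: Callable[[str], str],
    is_valid: Callable[[str], bool],
    task_input: str,
    max_attempts: int = 3,
    backoff_seconds: float = 0.0,
) -> Outcome:
    """Call the model, validate the result, and retry a bounded number of times.

    If every attempt fails, return a "needs_human" outcome rather than
    passing an unvalidated answer downstream.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            output = call_model(task_input)
        except Exception as exc:  # timeouts, rate limits, and so on
            log.warning("attempt %d raised %r", attempt, exc)
        else:
            if is_valid(output):
                log.info("attempt %d succeeded", attempt)
                return Outcome("ok", output, attempt)
            log.warning("attempt %d failed validation: %r", attempt, output)
        if attempt < max_attempts:
            time.sleep(backoff_seconds * attempt)  # simple linear backoff
    log.error("giving up after %d attempts; routing to a human", max_attempts)
    return Outcome("needs_human", None, max_attempts)


def make_flaky_model() -> Callable[[str], str]:
    """Hypothetical stand-in for a model client: fails once, then answers."""
    calls = {"n": 0}

    def model(text: str) -> str:
        calls["n"] += 1
        if calls["n"] == 1:
            raise TimeoutError("simulated timeout")
        return "billing"

    return model


if __name__ == "__main__":
    result = run_with_retries(
        make_flaky_model(), lambda out: out in ALLOWED_QUEUES, "Invoice question"
    )
    print(result)
```

The point of the design is that a bad answer has somewhere to go. The caller only ever sees a validated output or an explicit `needs_human` status, never an improvised one.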

What this means if you're considering a project

If you're an operator looking at AI right now, two suggestions.

First, pick the boring task. The one a junior employee does for forty minutes every morning. That's where the math works.

Second, treat the AI as one component in a system, not the system itself. The model writes the draft. Code routes it, validates it, and stores the audit trail. A human signs off until you've earned the right not to.
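As a rough sketch of that division of labor, the Python below treats the model as a single function that writes a draft, while ordinary code validates it, routes it, and records an audit trail, and a human approval callback gates the send. The names (`handle`, `Draft`, `ROUTES`, the example addresses) are assumptions made up for this example, not a reference to any particular product or library.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from typing import Callable, List, Optional

# Hypothetical routing table: category -> owner.
ROUTES = {"billing": "finance@example.com", "support": "help@example.com"}


@dataclass
class Draft:
    category: str
    reply: str


@dataclass
class AuditEntry:
    timestamp: str
    step: str
    detail: str


@dataclass
class Result:
    status: str                      # "sent", "rejected", or "declined"
    owner: Optional[str] = None
    reply: Optional[str] = None
    audit: List[AuditEntry] = field(default_factory=list)

    def record(self, step: str, detail: str) -> None:
        now = datetime.now(timezone.utc).isoformat()
        self.audit.append(AuditEntry(now, step, detail))


def validate(draft: Draft) -> List[str]:
    """Plain, deterministic checks on the model's output."""
    problems = []
    if draft.category not in ROUTES:
        problems.append(f"unknown category: {draft.category!r}")
    if not draft.reply.strip():
        problems.append("empty reply")
    return problems


def handle(
    message: str,
    write_draft: Callable[[str], Draft],     # the model's only job
    approve: Callable[[str, Draft], bool],   # the human checkpoint
) -> Result:
    result = Result(status="pending")

    draft = write_draft(message)
    result.record("draft", json.dumps(asdict(draft)))

    problems = validate(draft)
    if problems:
        result.status = "rejected"
        result.record("rejected", "; ".join(problems))
        return result

    result.owner = ROUTES[draft.category]
    result.record("routed", result.owner)

    if not approve(result.owner, draft):
        result.status = "declined"
        result.record("declined", result.owner)
        return result

    result.status = "sent"
    result.reply = draft.reply
    result.record("approved", result.owner)
    return result
```

In this sketch, removing the human later is a one-line change to `approve`, and the audit trail is what tells you whether you've earned it.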

The companies in the 5% aren't smarter. They just scoped smaller and built more around the model. That's the whole trick.

Arkitekt AI builds production-grade custom software on managed infrastructure, delivered autonomously at AI speed. If you're paying for tools that almost fit, let's talk.

arkitekt-ai.com

Source: “Inside Big Software's fight for its life,” Ashley Stewart, Business Insider, April 7, 2026.