95% of GenAI pilots are failing. The 5% that work look boring.
By Aaron McClendon, Founder & CTO, Arkitekt AI

MIT's NANDA group looked at hundreds of corporate generative AI pilots and found that 95% of them aren't producing measurable P&L impact. That's the headline from their *State of AI in Business 2025* report, covered by Fortune in August 2025.
If you've been quietly suspicious that the AI demos in your LinkedIn feed don't match what's happening inside real companies, you're not wrong.
The demo-to-production gap is the whole story
A demo runs once, on a clean input, with a sympathetic audience. Production runs ten thousand times, on messy inputs, while someone's payroll depends on it. Those are different problems.
IBM's team made a similar point in their 2025 review of agentic AI: the agents that hold up in production are the ones with narrow scope, clear oversight, and a human somewhere in the loop. The fully autonomous "set it and forget it" agent is mostly still a conference talk.
Gartner is pointing at the same shift from a different angle. They predict 40% of enterprise apps will feature task-specific AI agents by 2026, up from less than 5% in 2025. The key word there is *task-specific*. Not general. Not autonomous. Scoped.
What the 5% actually looks like
In our experience, the AI work that pays for itself in an SMB is unglamorous:
- Reading inbound emails and routing them to the right person with a draft reply attached.
- Pulling line items off a PDF invoice and dropping them into the accounting system for a human to approve.
- Summarizing a support ticket thread before a manager picks it up.
- Tagging and deduplicating CRM records overnight.
None of these will trend on social media. All of them save real hours and reduce real errors. They share three traits:
1. Narrow scope. The model does one job, not ten.
2. A human checkpoint. Approval, review, or override is built in.
3. Deterministic plumbing around it. Queues, retries, logs, and clear failure modes, so when the model has a bad day, the business doesn't.
That last one is the part most pilots skip. Teams wire a chatbot to a database, demo it, and call it shipped. Then a customer asks something weird, the model improvises, and trust evaporates.
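To make "deterministic plumbing" concrete, here is a minimal Python sketch of one piece of it: bounded retries, logging, and an explicit failure mode. Everything in it is an illustrative assumption rather than a description of any particular product: `run_with_fallback`, `call_model`, and `is_valid` are hypothetical names, and the model call is passed in as a plain function so the sketch does not depend on any vendor's API.

```python
import logging
from typing import Callable, Optional

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("ai_step")


def run_with_fallback(
    call_model: Callable[[str], str],   # hypothetical: wraps whatever model you use
    is_valid: Callable[[str], bool],    # hypothetical: deterministic output check
    text: str,
    max_attempts: int = 3,
) -> Optional[str]:
    """Return a validated model output, or None to signal 'send this to a human'."""
    for attempt in range(1, max_attempts + 1):
        try:
            output = call_model(text)
        except Exception as exc:  # the model call itself failed (timeout, API error)
            log.warning("attempt %d failed: %s", attempt, exc)
            continue
        if is_valid(output):
            log.info("attempt %d accepted", attempt)
            return output
        log.warning("attempt %d rejected by validator", attempt)
    # Clear failure mode: no improvising, the item goes to a person instead.
    log.error("giving up after %d attempts; routing to human queue", max_attempts)
    return None
```

The design choice worth noticing is the `None` return. When the model keeps failing or keeps producing output the validator rejects, the caller gets an unambiguous signal to hand the item to a person, and the log shows what happened on each attempt.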
What this means if you're considering a project
If you're an operator looking at AI right now, two suggestions.
First, pick the boring task. The one a junior employee does for forty minutes every morning. That's where the math works.
Second, treat the AI as one component in a system, not the system itself. The model writes the draft. Code routes it, validates it, and stores the audit trail. A human signs off until you've earned the right not to.
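As a rough illustration of that division of labor, the Python sketch below treats the model as a single pluggable function and keeps validation, the audit trail, and sign-off in ordinary code. The names here (`Draft`, `AuditEntry`, `handle_email`, `approve`, `ALLOWED_QUEUES`) and the specific validation rules are hypothetical, chosen only for the example; a real system would persist the audit log somewhere durable rather than in a list.

```python
from dataclasses import dataclass, asdict
from datetime import datetime, timezone
from typing import Callable, List

ALLOWED_QUEUES = {"billing", "support", "sales"}  # hypothetical routing targets


@dataclass
class Draft:
    queue: str   # where the model thinks the email should go
    reply: str   # the model's suggested reply


@dataclass
class AuditEntry:
    timestamp: str
    input_text: str
    draft: dict
    status: str  # "pending_review", "rejected_by_validation", or "approved_by:<name>"


def validate(draft: Draft) -> bool:
    """Deterministic checks: known queue, non-empty reply, bounded length."""
    return draft.queue in ALLOWED_QUEUES and 0 < len(draft.reply.strip()) <= 2000


def handle_email(
    text: str,
    write_draft: Callable[[str], Draft],  # the model is just this one function
    audit_log: List[AuditEntry],
) -> AuditEntry:
    draft = write_draft(text)
    status = "pending_review" if validate(draft) else "rejected_by_validation"
    entry = AuditEntry(datetime.now(timezone.utc).isoformat(), text, asdict(draft), status)
    audit_log.append(entry)  # every draft is recorded, accepted or not
    return entry


def approve(entry: AuditEntry, reviewer: str) -> None:
    """Human sign-off: only validated drafts can be approved."""
    if entry.status != "pending_review":
        raise ValueError("only drafts pending review can be approved")
    entry.status = f"approved_by:{reviewer}"
```

Nothing is sent anywhere in this sketch until `approve` is called by a person, and a draft that fails validation can never reach that step.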
The companies in the 5% aren't smarter. They just scoped smaller and built more around the model. That's the whole trick.
Arkitekt AI builds production-grade custom software on managed infrastructure, delivered autonomously at AI speed. If you're paying for tools that almost fit, let's talk.
Source: “Inside Big Software's fight for its life,” Ashley Stewart, Business Insider, April 7, 2026.