
When ChatGPT launched in November 2022, the reaction was immediate and visceral: this works. For the first time, millions of people experienced AI not as a distant promise, but as something useful, intuitive, and, even with its flaws, surprisingly capable.
That instinct was correct. The conclusion that followed was not. Because what works brilliantly for an individual at a keyboard has proven surprisingly ineffective within an organization.
Two years later, after billions in investments, countless pilots, and an endless stream of “co-pilots,” a different reality is emerging: generative AI is exceptional at language production. But companies don’t run on language: they run on memory, context, feedback, and constraints. That’s the gap. And that’s why so many enterprise AI initiatives are quietly failing.
High adoption, low impact… and a growing sense of déjà vu
This is not a story about a technology that failed to gain traction. It’s quite the opposite.
A widely cited MIT-backed analysis found that about 95% of enterprise generative AI pilots fail to deliver meaningful results, and only about 5% reach sustained production. Other coverage of the same findings points to the same pattern: massive experimentation, minimal transformation.
And the explanation is revealing: the problem is not enthusiasm, or even capacity: it is that the tools do not translate into real operational change.
This is not an adoption issue. It is an architectural problem.
The uncomfortable paradox: everyone uses AI, but nothing changes
Today, two realities coexist within most companies: on the one hand, employees use tools like ChatGPT constantly. They draft, summarize, ideate, and accelerate their work in ways that feel natural and effective.
On the other hand, official enterprise AI initiatives struggle to scale beyond carefully controlled pilots.
The same MIT-related analysis describes a widening “learning gap”: people quickly find value as individuals, but organizations fail to integrate that value into core workflows. The result is something akin to “shadow AI”: people use what works, while companies invest in what doesn’t.
That is not resistance to change.
That’s a sign.
The central mistake: treating a language model like an operating system
Most explanations for this failure focus on execution: bad data, unclear use cases, lack of training. All true. All secondary.
The real issue is simpler and much more fundamental: large language models are designed to predict text. That’s all. Everything else, from reasoning to summarization to conversation, is an emergent property of that one ability.
But companies don’t operate like sequences of text. They operate as evolving systems with state, memory, dependencies, incentives, and constraints.
This is the mismatch.
As I have argued before, this is the main architectural flaw of AI: LLMs do not “see” the world. They do not maintain persistent state. They don’t learn from real-world feedback unless they are explicitly designed to do so.
They generate convincing language about reality. They do not operate within it.
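To make that statelessness concrete, here is a minimal Python sketch. The `complete` function is a hypothetical placeholder for any text-in, text-out completion API, not a real library call; the point is that the model retains nothing between requests, and the only “memory” is whatever the caller re-sends each time.

```python
# Minimal sketch of statelessness. `complete` is a hypothetical stand-in
# for any text-in, text-out completion API, not a real library call.
def complete(prompt: str) -> str:
    return "(model output)"  # imagine a call to a model provider here

# Each call is independent: the model retains nothing between requests.
complete("Our Q3 churn rate was 12%.")
complete("What was our churn rate?")  # no access to the previous call

# The only "memory" is whatever the caller re-sends on every request.
history: list[str] = []

def chat(user_message: str) -> str:
    history.append(f"User: {user_message}")
    reply = complete("\n".join(history))  # full transcript, resent each turn
    history.append(f"Assistant: {reply}")
    return reply
```

Everything that feels like memory in a chat product is a variant of that resending trick, bounded by the context window.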
You can’t run a company based on word predictions
This leads to a pattern that should look familiar.
Ask an LLM to:
- “Increase my sales”
- “Design a marketing strategy”
- “Improve team performance”
And you will get an answer. Often a very good one: a structured, articulate, and persuasive response that is almost completely disconnected from the actual system it is supposed to influence.
Because an LLM cannot track a process, manage incentives, integrate CRM data, or adapt based on results. It can describe a strategy. It cannot run one.
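A rough sketch of that difference, with every integration point stubbed out as a hypothetical: describing a strategy is a single call, while running one is a loop of data, action, measurement, and revision that the model alone cannot close.

```python
from dataclasses import dataclass

# Hypothetical stubs standing in for real integrations (model, CRM, outreach).
def complete(prompt: str) -> str: return "(revised plan)"
def fetch_open_deals() -> list[str]: return ["deal-1", "deal-2"]
def run_campaign(plan: str) -> tuple[int, int]: return (40, 1000)  # conversions, sends

@dataclass
class SalesState:
    conversion_rate: float = 0.0

# Describing a strategy: one stateless call, text in, text out.
plan = complete("Design a strategy to increase sales by 10%.")

# Running a strategy: a stateful loop around the model.
state = SalesState()
for week in range(12):
    deals = fetch_open_deals()               # real data, not prose
    conversions, sends = run_campaign(plan)  # act in the world
    state.conversion_rate = conversions / max(sends, 1)
    plan = complete(                         # feed the results back in
        f"Week {week}: conversion was {state.conversion_rate:.1%}. "
        f"Open deals: {deals}. Revise this plan:\n{plan}"
    )
```

Everything the model contributes happens inside `complete`; everything that makes it a strategy rather than a description happens outside it.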
The MIT findings reinforce this point: generative AI tools are effective for flexible individual tasks, but fail in business contexts where adaptation, learning, and integration are required.
In other words: an LLM can write the memo. But it can’t run the company.
Throwing more compute at the problem won’t solve it
The industry response so far has been predictable: build bigger models, deploy more infrastructure, scale everything. But scale does not solve a design flaw. If a system is fundamentally ungrounded, more parameters will not ground it. If it lacks memory, more tokens will not give it memory. If it lacks feedback loops, more data centers will not create them.
Scale amplifies what exists. It doesn’t create what’s missing. And what is missing here is not more language. It’s more world.
The next layer will not be about better answers
The next phase of enterprise AI won’t be defined by better chat interfaces or more powerful LLMs. It will be defined by something completely different: systems that can maintain state, integrate into workflows, learn from results, and operate under constraints.
Systems that not only generate text, but act in real environments. That is why the future of AI in companies will not be built solely on LLMs, but on architectures that integrate them into richer reality models.
Or, as I have argued in previous work, why world models are likely to become a core capability rather than a niche concept.
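One hedged sketch of what such an architecture might look like, with every name invented for illustration: the language model sits inside a loop that owns the state, enforces the constraints, and records the outcomes.

```python
# Illustrative only: the LLM proposes, the system disposes.
# `Constraints`, `Memory`, `execute`, and `complete` are invented names,
# not references to any real framework.
def complete(prompt: str) -> str:
    return "(proposed action)"  # stand-in for a model call

class Constraints:
    """Hard rules that live outside the model and cannot be overridden."""
    def allows(self, action: str) -> bool:
        return "refund everything" not in action
    def fallback(self) -> str:
        return "escalate to a human"

class Memory:
    """State that persists across runs, unlike a context window."""
    def __init__(self):
        self.events: list[tuple[str, str]] = []
    def record(self, action: str, outcome: str):
        self.events.append((action, outcome))

def execute(action: str) -> str:
    return "done"  # imagine a real effector: an API call, a ticket, a transaction

class Workflow:
    def __init__(self):
        self.constraints = Constraints()
        self.memory = Memory()

    def step(self, observation: str) -> str:
        # The LLM is one component: it turns state into a proposed action.
        proposal = complete(f"History: {self.memory.events}\nNow: {observation}")
        # Validation is deterministic and sits outside the model.
        if not self.constraints.allows(proposal):
            proposal = self.constraints.fallback()
        outcome = execute(proposal)            # act in the world
        self.memory.record(proposal, outcome)  # learn from the result
        return outcome
```

The division of labor is the point: the model generates candidates; the surrounding system decides, acts, and remembers.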
Saying what many already know… but rarely say out loud
If this seems obvious, it is because many people within organizations already see it: they have run the pilots. They have seen the demos. They have experienced the gap. But saying it out loud is still uncomfortable.
There is too much momentum, too much investment, and too much narrative built around the idea that scaling LLMs will eventually solve everything. It won’t.
The emperor is not only poorly dressed. He’s wearing completely the wrong clothes.
The real opportunity
This is not the end of enterprise AI: it is the end of a misconception. Language models are not enterprise architecture: they are an interface layer. A powerful one, but insufficient on its own.
Companies that understand this first will not only implement AI better: they will build something fundamentally different.
And when that happens, it will feel, once again, like magic.
But this time it won’t be an illusion.

