- Type: Pillar
- Published: June 1, 2026
- Reading: 15 min
- Author: Charles Gautier
Why AI POCs never reach production - and what architecture changes
A consulting firm's reading of the gap between pilot and production: Forrester, Deloitte and Gartner converge - at least 30% of GenAI projects abandoned after the POC, only 20% of companies already growing revenue from AI. The five root causes, the move from POC to production as an operating-model problem, and five reflexes to avoid becoming one of those statistics.
Key points
- A POC that never reaches production is almost always an operating-model problem (architecture, data, governance), not a model problem.
- Deloitte finds that only 20% of organizations are already growing revenue from AI, and Forrester's financial data points the same way. The blocker is organizational, not technological.
- Five root causes recur: data not ready, missing success metrics, broken workflow integration, absent governance, escalating costs.
- The answer is not a better model but an operable system designed upfront: map first, design for operability, instrument metrics, govern.
The same scene plays out in many companies. A generative AI pilot is impressive in a demo: it summarizes files, answers customers, drafts reports. Leadership approves. Then, six months later, the project is still not in production - or it is, but no one uses it, and no one can say whether it created any value.
This is not an isolated anecdote. It is the dominant pattern of enterprise AI adoption in 2026. The numbers are harsh, and they do not say what people often make them say. This article offers an architecture-oriented reading of the gap between pilot and production: not a list of "10 reasons", but an analysis of the root causes and of what designing an operable system actually changes.
The thesis is simple, and it is now backed by the analyst firms: a POC that does not reach production is almost never a model problem. It is an operating-model problem - architecture, data and governance - decided long before the first prompt is written. As Forrester puts it in its 2026 predictions, AI moves "from hype to hard hat work": what blocks value is not the models, it is the ways of operating.
The numbers, and what they really mean
Before explaining anything, the figures in circulation need to be read correctly. Many are brandished as proof that "AI does not work". Read rigorously, they say the opposite: the technology often works; turning it into a system is what fails.
The gap between ambition and execution: what Forrester and Deloitte say
The most solid foundation is not a shock figure, it is a convergence of analyst firms. Deloitte, in its State of AI in the Enterprise report (2026 edition, a survey of 3,235 leaders conducted between August and September 2025 across 24 countries), measures the gap head-on: only 20% of organizations are already growing revenue from their AI initiatives, while 74% are still hoping to in the future. Growth remains an aspiration, not a result.
Forrester reaches the same reading through the finances. In its 2026 predictions, the firm notes that fewer than one-third of decision-makers can tie the value of AI to their organization's financial growth, and that only 15% reported an EBITDA lift attributable to AI over the past twelve months. The direct consequence: Forrester expects enterprises to defer a quarter of their planned AI spend into 2027, while they wait to prove the return. The title of its predictions captures the shift: AI is moving from hype to "hard hat work".
The message from both firms is convergent, and it grounds the angle of this article: the blocker is not technological, it is organizational. It is an operating-model problem - ways of working, data, processes and governance - not an AI-model problem.
At least 30% of POCs abandoned, 95% with no ROI: read without alarmism
Gartner quantifies the drop-off at the next stage. According to its 29 July 2024 press release, at least 30% of generative AI projects will be abandoned after the proof of concept by the end of 2025, because of poor data quality, inadequate risk controls, escalating costs or unclear business value. That is the official figure: the "30%" wording is Gartner's, not the "50%" often repeated by secondary sources. The firm adds two structuring projections, each from a distinct press release: up to 60% of AI projects may be abandoned for lack of "AI-ready" data by 2026 (press release of 26 February 2025), and more than 40% of agentic AI projects may be cancelled by the end of 2027, notably because of poorly anticipated costs, unclear business value and inadequate risk controls (press release of 25 June 2025, a poll of 3,412 respondents).
The most spectacular figure comes from MIT, through its Project NANDA initiative: roughly 95% of generative AI pilots reportedly produce no measurable return, according to the report relayed by Fortune on 18 August 2025. This figure must be handled with caution: its methodology (around a hundred interviews, a few hundred employees and deployments) has been judged preliminary and fragile by several analyses, and it should not be presented as an absolute fact. What matters is not the percentage, but the root cause MIT identifies: a learning gap, an organizational learning shortfall, not a failure of the models - exactly the same diagnosis as Forrester and Deloitte.
These figures do not describe a technology that fails. They describe organizations that demonstrate a capability without building the system that makes it operable. A pilot proves that a task can be done by a model. Production requires proving that it can be done reliably, traceably, inside existing workflows and sustainably over time. These are two different problems.
"Failure" is not "bad model"
The most expensive reflex is to read a stuck POC as a model problem, and to look for the fix in a change of tool: a newer model, a different provider, a fashionable framework. In most observed cases, the model was not the weak link.
RAND's work on the root causes of AI project failure points to convergent and remarkably stable factors: a problem poorly defined by leadership, data that is not ready, success metrics that are absent or badly chosen, and an integration into work processes that was never designed. None of these causes is fixed by changing the model.
This is precisely what our reading calls an architecture problem: the decision that determines the fate of the project is made upstream of the technology choice. If it is wrong, no model recovers it.
The five root causes of non-production
The causes of failure are known and overlap from one study to the next. Presenting them as a system, not a list, reveals that they share a single origin: a pilot designed to demonstrate, not to operate.
Cause 1: data not ready - the number one cause
This is the most cited and most underestimated cause. A model is only as good as the data it can access: quality, freshness, access rights, structure. Yet data preparation absorbs the bulk of the real effort: according to a widely cited estimate from the New York Times (Steve Lohr, 2014), data scientists spend 50 to 80% of their time collecting and preparing raw data before it can be used - effort that is largely invisible during the pilot phase, where work happens on a curated, frozen dataset.
Gartner estimates that up to 60% of AI projects could be abandoned by 2026 for lack of "AI-ready" data (press release of 26 February 2025). The finding is confirmed in 2026 by Fivetran's agentic AI readiness index: data availability and quality remain the top cited barrier (42% of respondents). Production reveals what the pilot hid: data scattered across silos, uneven quality, undefined access rights, stale records. A pilot that ignores this reality does not measure the real difficulty; it defers it.
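To make "AI-ready" concrete, here is a minimal sketch of a readiness check run on production-like records rather than on the pilot's curated dataset. The field names, the sample records, the 30-day freshness threshold and the `readiness_report` helper are all illustrative assumptions; none of them comes from Gartner or Fivetran.

```python
from datetime import date

# Hypothetical production-like records; field names are illustrative only.
RECORDS = [
    {"id": 1, "owner": "sales", "updated": date(2026, 5, 20), "email": "a@example.com"},
    {"id": 2, "owner": None, "updated": date(2025, 11, 2), "email": "b@example.com"},
    {"id": 3, "owner": "support", "updated": date(2026, 5, 28), "email": None},
    {"id": 4, "owner": "sales", "updated": date(2024, 1, 15), "email": "d@example.com"},
]


def readiness_report(records, today, max_age_days=30, required=("owner", "email")):
    """Return the share of records that are complete, fresh, and both at once."""
    if not records:
        raise ValueError("no records to assess")
    total = len(records)
    # Complete: every required field is present.
    complete = [r for r in records if all(r.get(f) is not None for f in required)]
    # Fresh: updated within the assumed freshness window.
    fresh = [r for r in records if (today - r["updated"]).days <= max_age_days]
    # Usable: complete and fresh at the same time.
    usable = [r for r in complete if r in fresh]
    return {
        "complete": len(complete) / total,
        "fresh": len(fresh) / total,
        "usable": len(usable) / total,
    }


report = readiness_report(RECORDS, today=date(2026, 6, 1))
print(report)
```

On this made-up sample, half of the records are complete and half are fresh, but only a quarter are both. That is the kind of gap a frozen pilot dataset tends to hide.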
Cause 2: poorly defined success metrics
Many pilots have no operational definition of success. No baseline before deployment, no link to a P&L line, no business-value indicator. "It impresses in the demo" is not a metric.
Without a baseline, it is impossible to prove a gain - and therefore impossible to justify the investment of moving to production. The project stays suspended in an undetermined status: neither abandoned nor industrialized. This is one of the main mechanisms feeding the "95% with no measurable ROI" figure: often the ROI is not negative, it is unmeasured, because it was never instrumented from the start.
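As an illustration of what "instrumented from the start" can mean, the sketch below compares a baseline measured before deployment with the same metric measured during the pilot. The metric (minutes per case), the sample values and the `relative_gain` helper are hypothetical; a real project would pick the indicator tied to its own value line.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Measurement:
    """Minutes spent per case, sampled over a fixed observation window."""
    label: str
    minutes_per_case: list[float]


def relative_gain(baseline: Measurement, pilot: Measurement) -> float:
    """Fraction of handling time saved versus the baseline (negative means regression)."""
    before = mean(baseline.minutes_per_case)
    after = mean(pilot.minutes_per_case)
    return (before - after) / before


# Illustrative numbers only, not drawn from any study cited in this article.
baseline = Measurement("before deployment", [30.0, 34.0, 26.0, 30.0])
pilot = Measurement("with assistant", [24.0, 22.0, 26.0, 24.0])

gain = relative_gain(baseline, pilot)
print(f"Time saved per case: {gain:.0%}")
```

Without the `baseline` measurement, the second series alone proves nothing: the gain only exists as a difference.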
Cause 3: broken workflow integration
A pilot usually lives next to the real workflow: in a separate chat window, a notebook, a demo interface. Production requires the opposite: AI must fit into existing tools, habits and processes, where the work actually happens.
This is the most brutal gap. An assistant that produces an excellent answer, but which has to be manually copy-pasted into the CRM, will not be adopted. RAND and McKinsey converge on this point: the value lever is not the model, it is the redesign of the workflow around it. A POC that did not think through its insertion point in the flow of work will not pass the adoption stage.
Cause 4: missing risk controls and governance
A pilot generally does not have to pass the security review, the legal review or the compliance review. Production does. This is often where the project hits a wall: no audit log, no control over data access, no human oversight, no traceability of decisions, no clear answer on personal-data handling.
In Europe, this requirement is no longer theoretical. The GDPR applies as soon as there is personal data, and the EU AI Act adds its own transparency and governance obligations. A system that cannot say what it does, with which data and under whose responsibility does not pass the review - and therefore does not reach production. Governance bolted on afterwards is a patch; designed from the start, it is a condition of entry into production.
Cause 5: costs that spiral
The last barrier is economic. A pilot's API bill is misleading: it represents only a fraction of the real cost of a production system, once data preparation, integration, maintenance, compliance and oversight are added. When the true cost appears at industrialization time, with no solid value case in front of it, the project is scaled down.
This is a major cause of the agentic project cancellations Gartner anticipates by the end of 2027. Fivetran's 2026 agentic AI readiness index quantifies exactly this mismatch: nearly 60% of enterprises are investing millions, even tens of millions, in agentic AI, while only 15% are actually ready to run it in production. Money precedes operability - that is the very definition of a cost that spirals. The total cost of ownership of an agentic system deserves a dedicated analysis, which we will treat in a separate article. The essential point here: a pilot that only budgeted inference did not budget the project.
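The arithmetic behind "a pilot that only budgeted inference did not budget the project" can be sketched in a few lines. Every amount below is a hypothetical annual figure chosen for illustration, and the cost categories simply mirror the ones listed above; this is not a cost model from any cited source.

```python
# All amounts are hypothetical annual figures in euros, for illustration only.
ANNUAL_COSTS = {
    "inference": 40_000,  # the API bill visible during the pilot
    "data_preparation": 90_000,
    "integration": 60_000,
    "maintenance": 50_000,
    "compliance": 30_000,
    "oversight": 30_000,
}


def total_cost(costs: dict[str, int]) -> int:
    """Total cost of the system, not just of the model calls."""
    return sum(costs.values())


def inference_share(costs: dict[str, int]) -> float:
    """Share of the total that the pilot's API bill actually represents."""
    return costs["inference"] / total_cost(costs)


def net_value(annual_value: int, costs: dict[str, int]) -> int:
    """Value case minus total cost; negative means the project does not pay for itself."""
    return annual_value - total_cost(costs)


print(f"Total cost: {total_cost(ANNUAL_COSTS)}")
print(f"Inference share: {inference_share(ANNUAL_COSTS):.1%}")
print(f"Net value for a 350,000 value case: {net_value(350_000, ANNUAL_COSTS)}")
```

With these assumed figures, the API bill is roughly 13% of the total. A value case sized against inference alone would look comfortable and still fail against the full system.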
Moving from POC to production, seen as an architecture problem
If the five causes share a single origin - a pilot designed to demonstrate, not to operate - then the answer is also single: design, from the start, an operable system. This is the central shift our reading proposes. The question is not "which model?", but "which system will hold in production, be governed and be measured?".
Map before automating
Before writing a prompt, you have to understand the terrain: which processes, which data, which roles, which decisions, which integration points. This mapping reveals the real constraints - data silos, workflow breaks, compliance requirements - that the pilot would have discovered too late, at the worst moment.
This is the purpose of an AI Opportunity Mapping: establish the diagnosis and prioritize use cases by their real value and feasibility, before any build commitment. Mapping first means refusing to pay the learning cost in production.
Design for operability from the start
An operable system is not a pilot with features added to it. It is an architecture designed for production: governed data, workflow integration, instrumented metrics, risk controls, human oversight. These properties cannot be added afterwards without cost or debt; they are decided upstream.
This is the logic of an Agentic Operating Blueprint: turn a prioritized opportunity into an architecture target, structured in layers - business, data, orchestration, decision, governance - so that moving to production is not a leap into the void, but the next step of a plan. This layering echoes the distinction developed in our analysis of agentic architecture in the enterprise: an isolated agent demonstrates, a governed architecture operates.
An emerging discipline names what separates a POC from an operable system: context engineering. Anthropic, which formalized the definition in September 2025, describes it as the set of strategies for "curating and maintaining the optimal set of tokens" supplied to the model during inference - well beyond the prompt alone: system instructions, tools, external data, history. The central question, as Anthropic puts it, becomes "what configuration of context is most likely to generate our model's desired behavior". Anthropic frames it as the natural progression of prompt engineering: writing a good instruction still matters, but what holds in production is managing the entire context state across the turns of an agent. A pilot optimizes a prompt; an operable system governs its context.
The same reasoning shows up, under a different image, among agentic practitioners. In its 2025 AI agent report, Composio sums up the gap with a telling metaphor: you have a powerful "kernel" - the model - but no "operating system to run it properly", that is, a layer that manages memory, input/output and permissions. Composio insists on the point: pilot failure is structural rather than technical - the same diagnosis as Forrester, Deloitte and RAND, stated from the engineer's point of view. That layer is not an add-on: it is the operating model made executable.
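One way to picture "governing the context" is a function that decides, at each turn, what enters the model's input under a fixed budget. The sketch below is an illustration only, not Anthropic's or Composio's method: the `build_context` helper, the priority order and the word-count token estimate are all assumptions made to keep the example self-contained.

```python
def estimate_tokens(text: str) -> int:
    """Crude stand-in for a real tokenizer: one token per whitespace-separated word."""
    return len(text.split())


def build_context(system, tools, documents, history, budget):
    """Assemble the context for one turn under a token budget.

    System instructions and tool descriptions are always kept; documents are
    added in the given priority order while they fit; history is filled from
    the most recent turn backwards.
    """
    fixed = [system, *tools]
    used = sum(estimate_tokens(part) for part in fixed)
    if used > budget:
        raise ValueError("budget too small for system instructions and tools")

    kept_docs = []
    for doc in documents:
        cost = estimate_tokens(doc)
        if used + cost <= budget:
            kept_docs.append(doc)
            used += cost

    kept_history = []
    for turn in reversed(history):
        cost = estimate_tokens(turn)
        if used + cost > budget:
            break  # older turns are dropped once the budget is reached
        kept_history.append(turn)
        used += cost
    kept_history.reverse()  # restore chronological order

    return fixed + kept_docs + kept_history


context = build_context(
    system="You answer billing questions.",
    tools=["lookup_invoice: fetch an invoice by id"],
    documents=["Refund policy: 30 days."],
    history=["user: hi", "assistant: hello", "user: where is my invoice"],
    budget=20,
)
print(context)
```

Here the two oldest turns are dropped because the budget is exhausted, while instructions, tools and the policy document survive. A production system would make that policy explicit, testable and logged instead of leaving it to whatever happens to fit in a prompt.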
The role of human oversight in production
Human oversight is not the enemy of automation; it is often the condition that makes production acceptable. Defining where human validation is required - disbursements, legal commitments, access to sensitive data - allows earlier deployment, because the residual risk is bounded and traced.
This is what separates a system that can enter production from a prototype no one dares to wire to the real world. Oversight and traceability do not slow industrialization down: they authorize it. We will dedicate a separate analysis to the governance of autonomous agents.
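A bounded and traced residual risk can be sketched as a gate that every agent action passes through. The action categories, the `OversightGate` class and the log fields below are hypothetical names chosen for illustration; the point is only that the approval rule and the audit trail live in one place.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional

# Hypothetical action categories that always require human validation.
REQUIRES_APPROVAL = {"disbursement", "legal_commitment", "sensitive_data_access"}


@dataclass
class OversightGate:
    """Routes actions to execution or human approval and traces every decision."""
    audit_log: list[dict] = field(default_factory=list)

    def submit(self, action: str, payload: dict, approved_by: Optional[str] = None) -> str:
        """Return 'executed' or 'pending_approval' and record the decision."""
        needs_human = action in REQUIRES_APPROVAL
        if needs_human and approved_by is None:
            status = "pending_approval"
        else:
            status = "executed"
        self.audit_log.append({
            "at": datetime.now(timezone.utc).isoformat(),
            "action": action,
            "payload": payload,
            "approved_by": approved_by,
            "status": status,
        })
        return status


gate = OversightGate()
statuses = [
    gate.submit("draft_reply", {"ticket": 42}),
    gate.submit("disbursement", {"amount": 1200}),
    gate.submit("disbursement", {"amount": 1200}, approved_by="finance.lead"),
]
print(statuses)
```

Low-risk actions run immediately, sensitive ones wait for a named approver, and all three decisions end up in the log that a security or legal review will ask for.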
Evals as infrastructure
One last element separates the demo from the system: evaluation. The 2026 AI engineering consensus converges on a point that practice confirms: teams that invest early in evals - a battery of tests that measures output quality at every change - turn their failures into test cases rather than production surprises. The practice is layered: deterministic code evals for what can be checked unambiguously, then evaluation by a judge model ("LLM-as-judge") for what requires discernment, then human review on sensitive cases. Without this layer, a pilot cannot tell whether it is improving or regressing; it only knows whether it impresses. That is precisely what separates a one-off demonstration from a system you can evolve without breaking it.
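The three layers described above can be sketched as a single evaluation function. This is a minimal illustration under stated assumptions: `stub_judge` is a stand-in for a call to a judge model (it is not a real API), and the checks, the threshold and the test cases are invented for the example.

```python
from typing import Callable


def deterministic_checks(answer: str) -> list[str]:
    """Layer 1: checks that can be verified unambiguously in code."""
    failures = []
    if not answer.strip():
        failures.append("empty answer")
    if len(answer) > 500:
        failures.append("answer longer than 500 characters")
    return failures


def stub_judge(question: str, answer: str) -> float:
    """Layer 2 stand-in: a real system would call a judge model here."""
    return 0.9 if "refund" in answer.lower() else 0.3


def evaluate(case: dict, judge: Callable[[str, str], float], threshold: float = 0.7) -> dict:
    """Run the layers in order and stop at the first one that decides."""
    failures = deterministic_checks(case["answer"])
    if failures:
        return {"verdict": "fail", "layer": "code", "details": failures}
    score = judge(case["question"], case["answer"])
    if score < threshold:
        return {"verdict": "fail", "layer": "judge", "details": [f"score {score:.2f}"]}
    # Layer 3: sensitive cases go to a human even when the judge is satisfied.
    if case.get("sensitive", False):
        return {"verdict": "needs_human_review", "layer": "human", "details": []}
    return {"verdict": "pass", "layer": "judge", "details": []}


# Hypothetical regression cases; each past failure would become one of these.
CASES = [
    {"question": "What is the refund window?", "answer": ""},
    {"question": "What is the refund window?", "answer": "Refunds are accepted within 30 days."},
    {"question": "What is the refund window?", "answer": "Please contact support."},
    {"question": "Can I get a refund on a closed account?",
     "answer": "A refund requires manual approval.", "sensitive": True},
]

verdicts = [evaluate(case, stub_judge)["verdict"] for case in CASES]
print(verdicts)
```

Run at every change to a prompt, a tool or a data source, a battery like this is what tells a team whether the system improved or regressed.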
Five reflexes to avoid the 95%
Our doctrine comes down to five reflexes, each answering one of the five root causes.
1. Define success before the pilot. Set a measurable baseline and a link to a value line (cost, revenue, delay, quality) before launching. A pilot with no baseline can never prove its ROI, even if it creates one.
2. Test on real data, not idealized data. Confront the system with data as it is - scattered, uneven, constrained by access rights. Data preparation is the real project; better to measure its scale early than to suffer it in production.
3. Design the workflow insertion point from the start. Decide where AI fits into existing tools and habits. An excellent result that lives outside the flow of work will not be adopted, and an unadopted system has no value.
4. Build governance in as a condition of entry, not a catch-up. Audit log, access control, human oversight, decision traceability, personal-data handling: designed from the start, these are the keys that open the security and legal review instead of blocking it.
5. Budget the system, not the inference. Estimate the total cost - data, integration, maintenance, compliance, oversight - and weigh it against the value case. A project whose real cost only appears at industrialization time is a project that will be scaled down.
What this changes for decisions
For leadership, the main shift in perspective is this: a successful POC is not a system. It is a proof of feasibility, useful and necessary, but it says almost nothing about the ability to operate. Confusing the two is precisely what leads to the abandonment statistics.
Our practice recommends asking the question before commitment: are we building a demonstration, or an operable system? Both are legitimate, but their costs, durations and requirements differ. The most expensive scenario observed in the field is a pilot funded as a demonstration, then ordered to become a system without ever having had the architecture for it.
Moving a POC to production is not a question of a better model. It is a work of mapping, design and governance - the work of an architecture firm. This is the position LeadsFlowAI holds: turning a promising demonstration into a system that can be operated, measured and evolved.
Where this connects to LeadsFlowAI
- Establish the diagnosis and prioritize use cases by value and feasibility with an AI Opportunity Mapping, before any build commitment.
- Turn a prioritized opportunity into an operable architecture target with an Agentic Operating Blueprint.
- Return to the distinction between isolated agent and governed system in our analysis of agentic architecture in the enterprise.
- Read the compliance dimension of moving to production in our reading of the EU AI Act for non-regulated companies.
- Browse the other LeadsFlowAI insights to separate architecture choices, regulatory obligations and run issues.
Sources consulted
- Forrester, "Predictions 2026: Artificial Intelligence" / press release "Forrester's 2026 Technology & Security Predictions" (28 October 2025) - fewer than one-third of decision-makers tie the value of AI to their financial growth; 15% reported an AI-attributable EBITDA lift over twelve months; a quarter of planned AI spend deferred into 2027. Reading: the blocker is the operating model, not the technology. Consulted on 1 June 2026: https://www.forrester.com/press-newsroom/forrester-tech-security-2026-predictions/ and https://www.forrester.com/blogs/predictions-2026-ai-moves-from-hype-to-hard-hat-work/
- Deloitte, "State of AI in the Enterprise" (2026 edition, survey of 3,235 leaders, August-September 2025, 24 countries) - 20% of organizations are already growing revenue from AI, versus 74% still hoping to. Consulted on 1 June 2026: https://www.deloitte.com/global/en/issues/generative-ai/state-of-ai-in-enterprise.html and press release https://www.deloitte.com/us/en/about/press-room/state-of-ai-report-2026.html
- Fivetran, "2026 Agentic AI Readiness Index" (Business Wire, 5 May 2026, survey of 400 data professionals) - only 15% of enterprises ready to run agentic AI in production while nearly 60% are investing millions; data quality/availability = top cited barrier (42%). Consulted on 1 June 2026: https://www.businesswire.com/news/home/20260505250301/en/Fivetran-Launches-2026-Agentic-AI-Readiness-Index-Revealing-Gap-Between-Enterprise-Investment-and-Data-Preparedness-for-Agentic-AI
- Gartner, "30% of Generative AI Projects Will Be Abandoned After Proof of Concept" (29 July 2024) - at least 30% of GenAI projects abandoned after the POC by the end of 2025, due to poor data quality, inadequate risk controls, escalating costs or unclear business value. Consulted on 1 June 2026: https://www.gartner.com/en/newsroom/press-releases/2024-07-29-gartner-predicts-30-percent-of-generative-ai-projects-will-be-abandoned-after-proof-of-concept-by-end-of-2025
- Gartner, "Lack of AI-Ready Data Puts AI Projects at Risk" (26 February 2025) - up to 60% of AI projects abandoned for lack of "AI-ready" data by 2026. Consulted on 1 June 2026: https://www.gartner.com/en/newsroom/press-releases/2025-02-26-lack-of-ai-ready-data-puts-ai-projects-at-risk
- Gartner, "Over 40% of Agentic AI Projects Will Be Canceled by End of 2027" (25 June 2025, poll of 3,412 respondents) - cancellations due to poorly anticipated costs, unclear business value and inadequate risk controls. Consulted on 1 June 2026: https://www.gartner.com/en/newsroom/press-releases/2025-06-25-gartner-predicts-over-40-percent-of-agentic-ai-projects-will-be-canceled-by-end-of-2027
- MIT Project NANDA, "The GenAI Divide: State of AI in Business 2025", relayed by Fortune on 18 August 2025 - roughly 95% of generative AI pilots with no measurable return; root cause = organizational learning gap. Methodology judged preliminary by several analyses: a figure to handle with caution. Consulted on 1 June 2026: https://fortune.com/2025/08/18/mit-report-95-percent-generative-ai-pilots-at-companies-failing-cfo/
- RAND, "The Root Causes of Failure for Artificial Intelligence Projects" (2024) - convergent causes: poorly defined problem, data not ready, wrong metrics, missing integration. Consulted on 1 June 2026: https://www.rand.org/pubs/research_reports/RRA2680-1.html
- Anthropic, "Effective context engineering for AI agents" (29 September 2025) - definition of context engineering as managing the optimal set of tokens supplied to the model during inference (beyond the prompt alone), the natural progression of prompt engineering; "what configuration of context is most likely to generate our model's desired behavior". Consulted on 1 June 2026: https://www.anthropic.com/engineering/effective-context-engineering-for-ai-agents
- Composio, "The 2025 AI Agent Report: Why AI Pilots Fail in Production and the 2026 Integration Roadmap" (Manveer Chawla, 30 November 2025) - the metaphor of a "kernel" (the model) with no "operating system" to run it; an agent layer managing memory, input/output and permissions; pilot failure structural rather than technical. Consulted on 1 June 2026: https://composio.dev/blog/why-ai-agent-pilots-fail-2026-integration-roadmap
- Data preparation = 50 to 80% of the effort - estimate from the New York Times (Steve Lohr, "For Big-Data Scientists, 'Janitor Work' Is Key Hurdle to Insights", 2014), confirmed by the data science literature consensus and 2025-2026 sector analyses. Consulted on 1 June 2026: https://www.nytimes.com/2014/08/18/technology/for-big-data-scientists-hurdle-to-insights-is-janitor-work.html
Key takeaways
- A successful POC proves feasibility; it says almost nothing about the ability to operate. Confusing the two leads to the abandonment statistics.
- Data preparation is cause number one: up to 60% of projects may be abandoned for lack of AI-ready data by 2026 (Gartner).
- The 95%-without-ROI figure usually means ROI unmeasured, not negative: set a baseline and metrics before you launch.
- Operability (workflow integration, governance, oversight, evals) is decided upfront; it cannot be bolted on after the POC without debt.
- Budget the whole system - data, integration, run, compliance, oversight - not just the API bill.
Frequently asked questions
Why do most AI projects never reach production?
Because most pilots are designed to demonstrate a capability, not to operate a system. The convergent root causes, identified by RAND, Gartner and Forrester among others, are data that is not ready, absent success metrics, an undesigned workflow integration, missing governance and poorly anticipated costs. Forrester and Deloitte both see an operating-model problem, not a technology problem. None of these is fixed by changing the model.
Does the 95% figure mean AI does not work?
No. That figure comes from MIT Project NANDA (2025, relayed by Fortune), and its methodology remains preliminary. It points to an organizational learning gap, not a technological failure. In many cases the ROI is not negative: it is simply unmeasured, for lack of a baseline and metrics defined at the start. It is consistent with Forrester, which finds that fewer than one-third of decision-makers tie the value of AI to their financial growth. The technology often works; turning it into a system is what fails.
What is the most frequent cause of failure?
Data preparation. According to an estimate from the New York Times, data scientists spend 50 to 80% of their time collecting and preparing data, effort that stays largely invisible during the pilot phase, where work happens on a curated dataset. Gartner estimates that up to 60% of projects could be abandoned for lack of AI-ready data by 2026.
Do you need a better model to move a POC to production?
Rarely. The model is almost always sufficient. What is missing is the architecture around it: governed data access, workflow insertion, metrics, risk controls and human oversight. Changing the model without addressing these causes only moves the problem.
Where should you start to avoid failure?
With mapping before automation. Understanding processes, data, roles and integration points lets you prioritize use cases by value and feasibility, and design an operable system from the outset rather than discovering constraints in production. That is the purpose of an AI Opportunity Mapping, upstream of an Agentic Operating Blueprint.