Most enterprise AI initiatives fail not because the technology doesn't work, but because nobody did the discovery work. Teams jump straight to model selection and prompt engineering. Six months later they have a demo that impressed the board and a production system that doesn't exist. The discovery phase is where you figure out whether AI will actually solve your problem, what it will cost to run, and what has to change in your organization to make it work.
I have watched this pattern play out at more than a dozen companies. A VP reads about GPT transforming their industry. An internal champion gets budget. An engineering team spins up a prototype in three weeks. The demo looks great. Then the prototype sits on a shelf because nobody answered the questions that matter: Does this solve a real business problem? Can we get the data into the right format? Who owns this system when it goes wrong? What does it cost at production volume?
These are not engineering questions. They are discovery questions. And they need to be answered before anyone writes a single line of AI code.
TL;DR
- 85% of enterprise AI projects never reach production, usually because the team skipped discovery and jumped straight to building
- Two weeks of structured discovery (workflow audit, data assessment, cost modeling) saves months of rework and prevents six-figure surprises
- Start with your most expensive, error-prone workflows, not with "where can we use AI?"
- Define success in business terms (e.g., "cut processing time from 5 days to 8 hours") before choosing any technology
- A focused discovery deliverable covering use cases, data readiness, architecture, costs, risks, governance, and a 90-day roadmap is the difference between a plan and a hope
Why Discovery Gets Skipped
There are three forces that kill the discovery phase before it starts.
The first is pressure to show results. The board approved an AI budget. The CEO mentioned it in the earnings call. The VP who championed it needs something to demo at the next quarterly review. In that environment, spending two weeks on research and planning feels like stalling. So teams skip straight to building, because a working prototype is a tangible artifact and a strategy document is not.
The second is vendor pressure. Every AI vendor wants to get you into a proof of concept as fast as possible. Not because POCs are the right starting point, but because POCs create switching costs. Once your team has spent eight weeks building on a platform, the sunk cost makes it hard to walk away. Vendors know this. "Just try it" is a sales strategy, not engineering advice.
The third is the "just try GPT" culture. Large language models are so accessible that anyone with an API key can build a demo in an afternoon. That accessibility creates the illusion that production AI is easy. The demo works on five test cases. It breaks on the five hundred edge cases you haven't tested yet. The gap between a demo and a production system is where most AI projects go to die.
The cost of skipping discovery is not a theoretical risk. It is a measurable expense. You either build the wrong thing (solving a problem nobody actually has) or you build the right thing in a way that cannot scale. Either way, you end up starting over, and restarting after six months of engineering work costs far more than two weeks of upfront research.
Audit Your Workflows First
The single most common mistake I see in enterprise AI planning is starting with the technology. "Where can we use AI?" is the wrong first question. The right first question is: What are our most expensive, most error-prone, or most time-consuming workflows?
Start with a workflow audit. Map every process that touches the problem space you are considering. Not at the abstract level. At the level of what a person actually does, step by step, when they complete this task today.
A claims processing team at an insurance company does not just "process claims." They receive a document, read it, cross-reference it against policy details, check for missing information, classify the claim type, route it to the right adjuster, draft an initial assessment, and flag anomalies. Each of those steps has a different profile for AI suitability.
Once you have the workflow mapped, classify each step into one of three categories:
- Mechanical steps that follow clear rules and rarely require judgment. Document classification, data extraction from structured forms, routing based on known criteria. AI can handle these reliably today.
- Assisted steps where judgment is required but AI can do the heavy lifting. Drafting responses that a human reviews before sending, summarizing complex documents, flagging anomalies for human investigation. AI handles the volume; humans handle the quality control.
- Human-only steps that require empathy, complex negotiation, novel problem solving, or regulatory sign-off. AI should not touch these. Trying to automate them creates more risk than value.
This classification is where I spend the most time with clients, because the boundaries are not always obvious. A step that looks mechanical might have edge cases that require deep domain expertise. A step that looks like it requires judgment might actually follow a decision tree that can be codified. The only way to know is to sit with the people who do the work.
The output of this audit is a prioritized list of workflow steps, ranked by a combination of volume (how often does this happen), cost (how much does it cost per occurrence), and error rate (how often does it go wrong today). The steps at the top of that list are your AI candidates. Everything else is a distraction.
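That ranking can be sketched in a few lines of code. This is an illustrative scoring scheme, not a standard formula; the step names, volumes, and weights below are hypothetical placeholders you would replace with figures from your own audit.

```python
from dataclasses import dataclass

@dataclass
class WorkflowStep:
    name: str
    category: str               # "mechanical" | "assisted" | "human_only"
    monthly_volume: int         # how often this step happens per month
    cost_per_occurrence: float  # fully loaded cost in dollars
    error_rate: float           # fraction of occurrences that go wrong today

def priority_score(step: WorkflowStep) -> float:
    # Monthly cost exposure, inflated by the error rate: frequent,
    # expensive, error-prone steps rise to the top of the list.
    return step.monthly_volume * step.cost_per_occurrence * (1 + step.error_rate)

def rank_candidates(steps: list[WorkflowStep]) -> list[WorkflowStep]:
    # Human-only steps are excluded from AI candidacy entirely.
    candidates = [s for s in steps if s.category != "human_only"]
    return sorted(candidates, key=priority_score, reverse=True)

# Hypothetical steps from a claims workflow:
steps = [
    WorkflowStep("classify claim type", "mechanical", 12_000, 1.50, 0.08),
    WorkflowStep("draft initial assessment", "assisted", 12_000, 9.00, 0.12),
    WorkflowStep("negotiate settlement", "human_only", 900, 80.00, 0.05),
]
ranked = rank_candidates(steps)
```

The point of the exercise is not the arithmetic. It is that the ranking forces you to put numbers on volume, cost, and error rate for every step, which is exactly the data the audit should produce.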
Map Your Data Landscape
Every AI system is a data system. The model is the least interesting part. The data that feeds it determines whether it works.
I run a data readiness assessment with every client, and the results are almost always sobering. Companies overestimate their data readiness by a wide margin. They say "we have ten years of customer data." What they actually have is ten years of data spread across four systems that don't talk to each other, with inconsistent field names, duplicate records, and three different date formats.
The data readiness assessment covers four areas:
Data inventory
What data do you actually have? Not what you think you have. What exists, where does it live, and who owns it? For each data source, document the format, the update frequency, the volume, and the access method. This sounds tedious. It is. It also prevents you from building an AI system that depends on data you cannot access.
Accessibility
Can the data be accessed programmatically? Data behind an API is gold. Data in a database with proper access controls is silver. Data in spreadsheets on someone's desktop is a problem. Data in email threads is a bigger problem. Data in someone's head is not data at all. If the answer is "someone exports a CSV every Tuesday," that is a pipeline dependency you need to plan for.
Quality assessment
Pull a representative sample from each data source. Check it for completeness, consistency, accuracy, and timeliness. I typically check 200 to 500 records per source. That is enough to identify systemic quality issues without turning the assessment into a data cleaning project.
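A minimal version of the completeness and consistency checks looks like this; the field names and the sample records are hypothetical, and a real assessment would add accuracy and timeliness checks on top.

```python
from datetime import datetime

def completeness(records: list[dict], required_fields: list[str]) -> dict[str, float]:
    # Fraction of sampled records where each required field is present and non-empty.
    total = len(records)
    return {f: sum(1 for r in records if r.get(f) not in (None, "")) / total
            for f in required_fields}

def format_consistency(records: list[dict], field: str, fmt: str = "%Y-%m-%d") -> float:
    # Fraction of non-empty values that parse under one expected date format;
    # anything below 1.0 means mixed formats your pipeline must normalize.
    values = [r[field] for r in records if r.get(field)]
    ok = 0
    for v in values:
        try:
            datetime.strptime(v, fmt)
            ok += 1
        except ValueError:
            pass
    return ok / len(values) if values else 0.0

# Hypothetical 3-record sample (a real check runs on 200 to 500):
sample = [
    {"claim_id": "C-1001", "filed": "2024-01-05"},
    {"claim_id": "C-1002", "filed": "05/01/2024"},   # mixed date format
    {"claim_id": "",       "filed": "2024-02-01"},   # missing identifier
]
```

Running these two functions over a few hundred records per source is usually enough to surface the "three different date formats" class of problem before it becomes a pipeline surprise.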
Gap analysis
Compare what you have to what the AI system will need. Some gaps can be filled by joining existing data sources. Some require new data collection. Some are dealbreakers. I worked with a financial services firm that wanted to build an AI system for personalized investment recommendations. They needed transaction history, risk profile data, and market sentiment signals. They had the first two. The third would have required a six-figure data licensing agreement they had not budgeted for. That is a gap you want to find in week two, not month six.
Define Success in Business Terms
"Deploy an AI model" is not a success metric. Neither is "implement machine learning" or "use LLMs in our workflow." These are activities, not outcomes. If you cannot define success without mentioning technology, you are not ready to build.
Good success metrics are operational and measurable:
- Reduce customer response time from 4 hours to 15 minutes
- Cut manual data entry by 80% while maintaining 99% accuracy
- Decrease claims processing time from 5 days to 8 hours
- Reduce compliance review backlog from 6 weeks to 1 week
- Increase first-call resolution rate from 45% to 75%
Each of these metrics has three properties that matter. They describe a business outcome, not a technical activity. They are measurable with data you already collect. And they have a clear "before" number, which means you can prove impact after deployment.
I draw a hard line between vanity metrics and operational metrics. Vanity metrics make the project look good in a slide deck. Operational metrics tell you whether the system is actually working.
| Vanity Metric | Operational Metric |
|---|---|
| "We processed 10,000 documents with AI" | "Error rate dropped from 12% to 2%" |
| "Our model has 95% accuracy on test data" | "Customer escalations decreased by 40%" |
| "We reduced response time by 80%" | "Customer satisfaction scores increased from 3.2 to 4.1" |
| "We handle 3x more volume" | "Cost per transaction decreased from $4.50 to $1.20" |
The distinction matters because vanity metrics can mask a failing system. You can process 10,000 documents with AI and still have a higher error rate than the manual process. Operational metrics force you to measure what actually matters.
Define your success metrics during discovery, not after deployment. Write them down. Get sign-off from the business stakeholders, not just the technical team. These metrics become the contract between the AI project and the organization.
Identify the Humans in the Loop
Every production AI system needs human oversight. That is not a philosophical statement about the limits of AI. It is a practical observation based on how production systems actually work.
The question is not whether you need humans in the loop. The question is where in the loop, how many, and what authority they have. Companies that skip this question during discovery end up in one of two failure modes.
The first failure mode is full autonomy. The AI system makes decisions with no human review. This works until it doesn't. A fully autonomous customer-facing system that starts generating incorrect information will do so at machine speed, creating hundreds of bad interactions before anyone notices.
The second failure mode is the opposite: the AI system is built, but nobody creates a process for using it. There is no clear workflow for when to check AI output or what to do when the AI gets something wrong. The system sits unused because the people who are supposed to use it don't know how it fits into their day.
During discovery, I map the human oversight structure for every proposed use case:
- Review points. Where in the workflow does a human check AI output before it reaches a customer or enters a system of record? What are they checking for?
- Escalation paths. When the AI produces output that the reviewer is unsure about, what happens next? Who makes the final call? What is the turnaround time?
- Error ownership. When the AI makes a mistake that reaches a customer, who is accountable? Not "the AI team" (too vague) but a specific person with a specific remediation process.
- Feedback loops. How do the humans in the loop report issues back to the team that maintains the AI system? Is there a structured process, or do problems get reported ad hoc over Slack?
- Volume capacity. If the AI system handles 1,000 interactions per day and 5% require human review, that is 50 reviews per day. Who handles those reviews? Do they have the capacity, or does this create a new bottleneck?
The oversight design has direct implications for the technical architecture. If every output needs human review, the system needs a review queue with an approval workflow. If only flagged items need review, the system needs confidence scoring. If escalations go to a specialist team, the system needs routing logic. These requirements need to be captured during discovery, because they affect the build timeline and cost.
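The confidence-based routing and the capacity check can both be sketched directly. The thresholds and the reviews-per-person figure below are placeholders to be calibrated against your own quality data, not recommended values.

```python
import math
from enum import Enum

class Route(Enum):
    AUTO_APPROVE = "auto_approve"    # ships without human review
    REVIEW_QUEUE = "review_queue"    # a generalist reviewer checks it
    SPECIALIST = "specialist"        # escalated to the expert team

def route_output(confidence: float, auto_threshold: float = 0.95,
                 review_threshold: float = 0.70) -> Route:
    # High-confidence output ships; mid-confidence goes to the review
    # queue; low-confidence escalates. Thresholds are placeholders.
    if confidence >= auto_threshold:
        return Route.AUTO_APPROVE
    if confidence >= review_threshold:
        return Route.REVIEW_QUEUE
    return Route.SPECIALIST

def daily_review_headcount(volume_per_day: int, review_fraction: float,
                           reviews_per_person_per_day: int) -> int:
    # The capacity question from the text: 1,000 interactions at a 5%
    # review rate is 50 reviews a day, and someone has to absorb them.
    reviews = volume_per_day * review_fraction
    return math.ceil(reviews / reviews_per_person_per_day)
```

Writing the routing down this explicitly during discovery also exposes the architectural requirements: a confidence score from the model, a queue for the reviewers, and routing logic for the escalations.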
Model Your Costs Before You Build
The vendor quote is just the starting point. The real cost picture includes infrastructure, monitoring, governance, and the team to run it. None of that is scary if you plan for it upfront. It only becomes a problem when it surprises you mid-project.
Here is what a realistic cost model includes:
Inference costs at scale
This is the cost most people think about, and even this one gets underestimated. In our model routing benchmarks, we found a 64x cost spread between the cheapest and most expensive models for the same task. At 1,000 users, the difference between naive model selection and optimized routing was $1.96 million per year. If you haven't benchmarked your specific workloads against multiple models, your cost estimate is a guess.
Inference costs also scale nonlinearly with input size. A system that processes 500-word documents costs dramatically less per call than one that processes 50,000-word documents. Know your token volumes before you build your cost model.
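To make that scaling concrete, here is a back-of-envelope cost function. The per-million-token prices are hypothetical placeholders, not any vendor's actual rates, and the tokens-per-word ratio is a rough rule of thumb for English text.

```python
def monthly_inference_cost(calls_per_day: int, avg_input_tokens: int,
                           avg_output_tokens: int, input_price_per_mtok: float,
                           output_price_per_mtok: float, days: int = 30) -> float:
    # Prices are dollars per million tokens. Rough rule of thumb:
    # ~1.3 tokens per English word, so a 500-word document is ~650 tokens.
    daily = (calls_per_day * avg_input_tokens / 1e6 * input_price_per_mtok
             + calls_per_day * avg_output_tokens / 1e6 * output_price_per_mtok)
    return daily * days

# Same 1,000 calls/day and the same placeholder prices ($3/M input,
# $15/M output); only the input document length changes.
short_docs = monthly_inference_cost(1000, 650, 300, 3.0, 15.0)     # ~500 words
long_docs = monthly_inference_cost(1000, 65_000, 300, 3.0, 15.0)   # ~50,000 words
```

With these placeholder numbers the long-document workload costs roughly 30x the short-document one at identical call volume, which is why knowing your token volumes matters more than knowing the per-token price.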
Hidden costs
These are the costs that don't appear on the vendor invoice but show up in your operating budget:
- Monitoring and observability. You need logging, alerting, and dashboards. Someone needs to watch them. Budget for tooling and at least 10 to 15 hours per week of engineering time for ongoing monitoring.
- Prompt engineering and maintenance. Prompts degrade as models update. Someone needs to test, tune, and version-control prompts on an ongoing basis.
- Data pipeline maintenance. The data feeding your AI system needs to keep flowing, stay clean, and adapt to schema changes upstream.
- Retraining and evaluation. If you fine-tune models, you need periodic retraining. Even with off-the-shelf models, you need periodic evaluation to confirm they still meet your quality bar.
- Governance and compliance. Audit trails, access controls, data retention policies, bias monitoring. The regulatory burden is increasing, and it requires dedicated effort.
Pilot costs vs. production costs
A pilot with 10 users and 100 transactions per day costs almost nothing in inference. The same system at 1,000 users and 70,000 transactions per day costs real money. More importantly, production requires infrastructure that pilots don't: load balancing, failover, rate limiting, caching, queue management, and SLA monitoring. I budget production infrastructure at 3 to 5x the pilot infrastructure cost, depending on the reliability requirements.
The cost model should be a living spreadsheet with three scenarios: optimistic, expected, and pessimistic. If the pessimistic scenario is still within budget, you have a viable project. If the expected scenario is already at the edge of budget, you have a risk.
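The living spreadsheet can start as something this simple. Every figure below is illustrative, not a benchmark; the break-even function reuses the $4.50-to-$1.20 per-transaction example from the metrics table.

```python
import math

def annual_cost(inference: float, infrastructure: float,
                monitoring_hours_per_week: float, eng_hourly_rate: float,
                governance: float) -> float:
    # Roll the hidden costs into one annual number.
    monitoring = monitoring_hours_per_week * 52 * eng_hourly_rate
    return inference + infrastructure + monitoring + governance

scenarios = {  # illustrative figures only
    "optimistic":  annual_cost(60_000, 40_000, 10, 120, 15_000),
    "expected":    annual_cost(120_000, 60_000, 12, 120, 25_000),
    "pessimistic": annual_cost(250_000, 100_000, 15, 120, 40_000),
}

def verdict(scenarios: dict[str, float], budget: float) -> str:
    # The viability test from the text: pessimistic within budget is a
    # viable project; expected already at the edge is a risk.
    if scenarios["pessimistic"] <= budget:
        return "viable"
    if scenarios["expected"] <= budget:
        return "at risk"
    return "over budget"

def breakeven_volume(annual_fixed: float, manual_cost_per_txn: float,
                     ai_cost_per_txn: float):
    # Annual transaction volume at which per-transaction savings cover
    # the fixed cost; None means the unit economics never pay it back.
    savings = manual_cost_per_txn - ai_cost_per_txn
    return math.ceil(annual_fixed / savings) if savings > 0 else None
```

The value of the three-scenario structure is that it turns "can we afford this?" into a single comparison against the pessimistic number, which is a decision leadership can actually make.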
Privacy and Compliance Constraints
Privacy and compliance constraints are not a section in the project plan. They are the frame that determines the shape of everything else. Get them wrong and you don't have a system that needs modifications. You have a system that needs to be rebuilt.
The fundamental question is: What data can leave your network? The answer to that question determines your entire architecture.
If the answer is "all of it" (rare), you can use cloud APIs from any provider, store data in vendor-hosted systems, and optimize purely for cost and performance.
If the answer is "some of it" (common), you need a classification system that separates data into tiers. Public data can go to cloud APIs. Internal data might require a private endpoint with a BAA or DPA. Regulated data (PII, PHI, financial records) might need to stay on-premise entirely. That classification drives your architecture: you may need different providers or deployment models for different data tiers.
If the answer is "none of it" (heavily regulated industries), you are looking at self-hosted models and on-premise infrastructure. Self-hosted models are viable. We have benchmarked them. But they come with tradeoffs in speed, quality, and operational complexity that need to be planned for.
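The tiered classification above can be sketched as a lookup from field sensitivity to deployment target. The field sets and target names are placeholders; a real classification comes from your data dictionary and your compliance team, not from code.

```python
from enum import Enum

class DataTier(Enum):
    PUBLIC = "public"          # may go to any cloud API
    INTERNAL = "internal"      # private endpoint with a BAA or DPA
    REGULATED = "regulated"    # stays on-premise

# Placeholder field sets; substitute your own data dictionary.
REGULATED_FIELDS = {"ssn", "diagnosis", "account_number"}
INTERNAL_FIELDS = {"employee_id", "internal_notes"}

DEPLOYMENT = {
    DataTier.PUBLIC: "cloud_api",
    DataTier.INTERNAL: "private_endpoint",
    DataTier.REGULATED: "on_premise",
}

def classify(fields: set[str]) -> DataTier:
    # The most sensitive field present determines the tier.
    if fields & REGULATED_FIELDS:
        return DataTier.REGULATED
    if fields & INTERNAL_FIELDS:
        return DataTier.INTERNAL
    return DataTier.PUBLIC

def deployment_for(fields: set[str]) -> str:
    return DEPLOYMENT[classify(fields)]
```

Even this toy version makes the architectural consequence visible: a single workload that touches one regulated field inherits the on-premise requirement for that data path.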
Beyond the data residency question, the compliance assessment during discovery should cover:
- Regulatory requirements. HIPAA, SOC 2, GDPR, CCPA, industry-specific regulations. Each one imposes specific requirements on data handling, logging, and audit trails.
- Logging and audit trails. What needs to be logged? For how long? Who can access the logs? Many enterprises discover during deployment that their AI system doesn't produce the audit trail their compliance team requires. Discover this during discovery, not during the compliance review.
- Model provenance. Some regulated industries require documentation of which model produced which output, when, and with what inputs. This requires versioned model tracking and input/output logging that most prototypes don't support.
- Bias and fairness requirements. If your AI system makes decisions that affect people (hiring, lending, insurance, healthcare), you may have legal obligations to demonstrate that the system does not discriminate. This requires testing infrastructure and ongoing monitoring that should be scoped during discovery.
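The model provenance requirement above can be met with a record as simple as this. It is a minimal sketch that hashes the raw text so the audit log does not become a second store of sensitive data; some regulatory regimes require the full inputs and outputs instead, so confirm the shape with your auditors first.

```python
import hashlib
from datetime import datetime, timezone

def provenance_record(model_id: str, model_version: str,
                      prompt: str, output: str) -> dict:
    # One append-only audit entry: which model and version produced
    # which output, when, with hashes standing in for the raw text.
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "model_id": model_id,
        "model_version": model_version,
        "input_sha256": hashlib.sha256(prompt.encode("utf-8")).hexdigest(),
        "output_sha256": hashlib.sha256(output.encode("utf-8")).hexdigest(),
    }

# Hypothetical usage:
rec = provenance_record("claims-summarizer", "2024-06-01",
                        "claim text ...", "summary ...")
```

Scoping this logging during discovery is cheap. Retrofitting it into a prototype that never captured model versions is not.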
Compliance surprises after months of engineering work are one of the most common and most expensive patterns in enterprise AI. The fix is simple: bring your compliance team into the discovery phase, not the deployment review.
The Discovery Deliverable
A discovery phase is only valuable if it produces an actionable deliverable. Not a 100-page report that nobody reads. A focused document that gives leadership enough information to make a go/no-go decision and gives engineering a clear starting point.
The discovery deliverable I produce for clients takes about two weeks to assemble. It covers seven sections:
1. Prioritized use cases
A ranked list of AI opportunities, scored by business impact, technical feasibility, and data readiness. The top two or three use cases get detailed treatment. The rest go on a backlog with clear criteria for when to revisit them.
2. Data readiness assessment
For each prioritized use case, a detailed assessment of data availability, quality, and gaps. Includes specific remediation steps for any data issues that need to be resolved before building, with estimated timelines and costs.
3. Architecture recommendation
The recommended technical architecture, including model selection (or model routing strategy), deployment model (cloud API vs. self-hosted vs. hybrid), data pipeline design, and integration points with existing systems. This is not a final architecture. It is a starting architecture with documented decision points that will need validation during the build phase.
4. Cost model
A three-scenario cost model (optimistic, expected, pessimistic) covering inference costs, infrastructure, monitoring, maintenance, and personnel. Includes break-even analysis: at what volume or what efficiency gain does the AI system pay for itself?
5. Risk assessment
Technical risks (model quality, data dependencies, integration complexity), organizational risks (change management, skills gaps, stakeholder alignment), and compliance risks (regulatory requirements, data privacy, audit obligations). Each risk gets a severity rating, a likelihood estimate, and a mitigation plan.
6. Governance framework outline
Who owns the AI system? Who monitors it? What are the escalation paths? How is model performance tracked? What triggers a review or a rollback? This does not need to be a complete governance policy during discovery. It needs to be a clear enough outline that the organization knows what governance decisions need to be made before production deployment.
7. 90-day implementation roadmap
A week-by-week plan for the first 90 days after the go decision. Covers team formation, data preparation, environment setup, initial model testing, integration development, human-in-the-loop process design, and the criteria for moving from pilot to production. Concrete enough to start executing on day one.
That document is typically 15 to 25 pages. It represents two weeks of focused work: stakeholder interviews, workflow mapping, data assessment, technical evaluation, and cost modeling. It is the difference between starting a build with a clear plan and starting with a vague hope.
The Bottom Line
The discovery phase is not a delay. It is the fastest path to a production system that actually works.
Every week spent on discovery saves a month of rework. That is not a platitude. It is arithmetic. A two-week discovery phase costs a small fraction of an AI initiative's budget, plus internal stakeholder hours for interviews and reviews. Rebuilding a failed AI system costs six to twelve months of engineering time, plus the opportunity cost of a system that isn't delivering value.
The companies that skip discovery end up with expensive demos. A working prototype that the board loved in Q2, a stalled deployment in Q3, a quiet write-off in Q4. The technology was never the problem. The preparation was.
The companies that do discovery end up with systems that compound in value. The first use case works and delivers measurable results. The second use case builds on the infrastructure and governance framework from the first. By the third, the cost per deployment drops and the time to production shrinks. That compounding effect is only possible when the foundation is solid.
The discovery phase requires someone who has seen enough AI deployments to know which questions matter and which are distractions. The frameworks in this article are what I walk through with every client before a single line of code gets written.