The 95% you've already heard about
Last year MIT's NANDA initiative published a number that took on a life of its own: 95% of enterprise GenAI pilots fail to produce measurable financial impact. Every consultant, every board, every CFO now quotes it back to you. It became the convenient excuse for slowing down, and the convenient warning for not starting.
Stanford's Digital Economy Lab decided to invert the lens. Instead of studying the 95% that failed, they studied the 5% that didn't. The result: 51 successful enterprise AI deployments across 41 organizations, studied through in-depth interviews between August 2025 and February 2026 and written up by Pereira, Graylin, and Brynjolfsson in a 116-page report titled Enterprise AI Playbook.
It's one of the most grounded pieces of enterprise-AI research we've read this year. The case studies are concrete, the framing is honest about its limits, and the methodology is unusually disciplined. But the report carries a quiet thesis underneath the playbook. It's about labor, about which kinds of organizations win, and about a bifurcation the authors hint at without quite naming. That thesis is the part people leaders need to read carefully.
Here's our read.
• • •
Five findings that should change how you plan
The report is dense with statistics, most of them directional rather than precise: the methodology is interview-based, the deployments self-reported, and the sample explicitly selected for success. Five findings still cut deep enough to reshape how a people leader should think about the next twelve months.
Each of them carries a downstream implication the report itself doesn't draw out. Taken together they say something simple and important: the constraints on enterprise AI success have moved.
• • •
The bottleneck has moved
For most of the GenAI era, the implicit story has been "the models aren't good enough yet." Boards used that to justify waiting. CTOs used it to justify pilots. CEOs used it to justify keeping the budget where it is.
This report, paired with the MIT NANDA finding it builds on, says that's no longer the binding constraint for a large share of enterprise work: 77% of the hardest challenges in the successful deployments were non-technical, matters of change management, data quality, and process redesign. The model has become a commodity. The variance lives somewhere else.
Companies still waiting for "better AI" before committing to redesign are misreading the problem. The capability already runs ahead of most organizations' ability to absorb it. Companies that figure out the absorption side compound capability quickly. Each successful deployment makes the next one cheaper, because the platform, the change-management muscle, and the proprietary data already exist. Companies still running proof-of-concept theater burn cycles without building any of that substrate.
The gap widens with every iteration, even when both sides have equal access to the underlying models. That's Brynjolfsson's productivity J-curve playing out at the firm level: returns lag while the complementary investments get built, then compound once they exist.
• • •
Where resistance actually comes from
This finding flips a common assumption. Ask any executive where resistance to AI rollouts comes from and they'll point to the frontline: the people whose jobs the tool might absorb. The report says the opposite: 35% of resistance came from staff functions, against 23% from frontline users.
Staff functions resist for legitimate reasons. They manage risk, ensure compliance, and slow things down enough to think. They also resist for territorial ones. Nobody consulted them. The team bought the tool without their input. The policy implications never ran past them. So they raise their objections at the last possible moment, when the cost of addressing them runs highest.
Most AI playbooks try to win over the frontline. The veto comes from the staff floor. People leaders who skip Legal, HR, Risk, and Compliance during week two pay for it in month six.
The implication is operational. Whoever owns the rollout needs the staff functions sitting at the same kickoff table as IT and the business unit. Not a separate briefing. Not a courtesy review at the end. Same table, week two. The cost of that conversation is small. The cost of skipping it is the entire program.
• • •
The redeployment story and its expiration date
The most reassuring finding in the report is the headcount split: 45% of cases reduced headcount, and in the remaining 55%, the majority, AI did not cost people their jobs.
The 55% breaks into three sub-patterns. Redeployment moves people from the automated task to higher-value work. The security operations case in the report shows this clearly: 4.5 FTEs shifted from alert triage to threat hunting after the team's monthly alert load jumped from 1,500 to 40,000. Hiring avoidance absorbs growth without adding bodies. The existing team handles the work the new hires would have done. Acceleration over cuts shows up in the EdTech case: 20–30% engineering productivity gains, PE owners pushing for layoffs, the CTO winning the argument to reinvest in the product roadmap instead.
The good news is real. Framing matters as much as the technology. Revenue-framed projects tend toward redeployment and acceleration. Cost-framed projects tend toward direct cuts. How a leader sets up the project at the start meaningfully shapes what happens to the people at the end. People leaders have more leverage here than they often use.
The harder news, which the report names but doesn't dwell on, is that this pattern is unlikely to hold. Three forces converge on a darker reading. First, models keep getting more capable. METR's measurements show the autonomous-task-length capability of frontier models doubling roughly every seven months. Frontier models handled tasks taking expert humans about an hour at 50% reliability as of early 2025. On that trend, multi-hour autonomous work arrives within a year, and day-long autonomous work within two to three. Second, cost pressure intensifies as the novelty wears off and CFOs ask harder ROI questions. Third, once early adopters prove the model works, competitive pressure to cut becomes harder to resist.
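On the first of those forces, a quick back-of-envelope extrapolation makes the timeline concrete. This is our arithmetic layered on METR's published trend, not the report's:

$$T(t) \approx T_0 \cdot 2^{\,t/7}, \qquad T_0 = 1 \text{ hour}, \quad t = \text{months from early 2025}$$

$$T(12) \approx 3.3 \text{ hours}, \qquad T(21) = 8 \text{ hours}, \qquad T(24) \approx 11 \text{ hours}$$

An eight-hour autonomous workday lands about 21 months out if the doubling holds, and all of it at METR's 50% reliability threshold, which is why "autonomous" still means "supervised" for anything that matters.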
The report's own leading indicator comes from Brynjolfsson's separate ADP-payroll research: employment among early-career workers aged 22–25 in AI-exposed roles has already dropped 16%, with software developers in that age band down close to 20%. That is not a future trend. It is happening now. The redeployment-friendly averages in the playbook do not reflect it.
• • •
The task-profile constraint
The 71% productivity gain headline travels well. It also travels misleadingly. It depends on a specific task profile, and most knowledge work doesn't fit it.
The work where agentic AI delivered the biggest gains shared four properties: high volume, clear success criteria, recoverable errors, and data accessible across systems. In the report's own data, agentic implementations clustered tightly in five functions: call center triage, alert filtering, invoice processing, procurement at scale, and document extraction.
The implication is uncomfortable for most enterprises. A lot of organizations don't have enough of the high-volume, clear-success-criteria work for the agentic numbers to move the P&L. A 200-person professional services firm doesn't have 40,000 monthly tickets. A mid-size manufacturer doesn't have 100,000 invoices. The supermarket case in the report (autonomous procurement, doubled EBITDA margin) worked partly because the company had thousands of SKUs across dozens of stores. Continuous, measurable, repeatable decisions. Strip out volume and the agentic ROI math gets much harder.
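How much harder is easy to see with a stylized payback calculation. The dollar figures below are ours and purely illustrative; only the volumes echo the report's cases:

$$\text{payback (months)} = \frac{C_{\text{fixed}}}{V \cdot s}$$

With an assumed \$500,000 to stand up the platform and integrations ($C_{\text{fixed}}$) and an assumed \$2 saved per item handled ($s$), an operation processing 40,000 items a month ($V$) pays back in about six months. At 1,500 items a month, the same tool takes nearly fourteen years. Identical technology, identical per-item saving, opposite business case.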
Gains scale with task volume and homogeneity, not company size. The "AI is for the big guys" assumption is roughly backwards: a mid-size firm with one high-volume, homogeneous process can capture more value than a giant with a thousand fragmented ones.
The honest move for a leader is to audit, before committing to an AI transformation roadmap, what fraction of the organization's work actually fits the high-leverage profile. If it's small, the right strategy is augmentation at the margins, not agentic transformation. Both are real strategies. Confusing them is how organizations end up in the 95% failure bucket.
• • •
The variable the report doesn't measure: worker capability
Here is where the playbook leaves the most important question unanswered. The report measures organizational readiness (sponsorship, process, change management) but never the individual capability of the humans operating the AI systems. That omission matters, because the productivity gains the report celebrates depend on a specific kind of worker, and most organizations don't have many of them.
The skill that matters isn't "prompt engineering" in the way most corporate training programs frame it. It's a fundamentally different cognitive mode: decomposing a work outcome into discrete events, mapping which system or agent handles each event, designing the handoffs, and supervising the chain. That's closer to systems thinking, process architecture, and product management than to writing better prompts. Most companies running prompt-engineering workshops and calling it AI training are roughly teaching people to type and calling it software engineering.
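To make that skill concrete, here's a minimal sketch of what "decomposing an outcome into events" looks like on paper. Everything in it is hypothetical, a thinking aid rather than anyone's real workflow:

```python
# Hypothetical decomposition of one work outcome ("resolve a customer
# billing dispute") into discrete events, each mapped to a handler.
# Illustrative only: the event names, handlers, and checks are invented.
from dataclasses import dataclass

@dataclass
class Event:
    name: str           # the discrete unit of work
    handler: str        # which agent, system, or human owns it
    handoff_check: str  # what the supervisor verifies before the next step

PIPELINE = [
    Event("extract_dispute_details", "llm_agent",
          "amounts and invoice IDs match the source email"),
    Event("pull_billing_history", "erp_system",
          "records cover the disputed period"),
    Event("draft_resolution", "llm_agent",
          "proposed credit is within policy limits"),
    Event("approve_and_send", "human_reviewer",
          "customer-facing tone and final amount"),
]

# The operator's job is the chain, not the steps: design it, watch the
# handoff checks, and intervene when one fails.
for step in PIPELINE:
    print(f"{step.name:26s} -> {step.handler:15s} | verify: {step.handoff_check}")
```

The test isn't the Python; it's whether someone can produce that table for their own function. That's the hiring and training signal to look for.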
The report flattens two very different strategies into a single "agentic" bucket. Call them Path A, replacement, where a small team builds and runs a system that does the work outright, and Path B, augmentation, where the existing workforce learns to operate AI inside its own roles. They look similar from a productivity-curve perspective. They imply completely different talent investments.
This distinction matters because the talent question for the two paths has nothing in common. Path A asks whether the organization has two or three people who can architect and maintain a replacement system. Path B asks whether the organization can develop or hire a meaningful share of its workforce into a new cognitive mode. One that didn't exist in the old org chart, that nobody was hired for, that almost nobody was trained for.
The kind of person who thrives in Path B carries a specific profile: comfortable with ambiguity, willing to be wrong publicly and iterate, cross-functional enough to know what good output looks like in domains they don't own, equipped with the systems-thinking instinct to decompose work into events. That is a real personality and cognitive profile. The trait distributes across the population, but not uniformly, and it does not correlate cleanly with tenure, title, or past performance in a narrow role.
Organizations are about to discover that their highest performers in the old model are not always their highest performers in the new one. That will get politically and culturally brutal.
The report also doesn't engage with a training ROI problem that sits underneath all of this. L&D budgets flow toward deepening expertise inside a role, because that's what can be measured. The training that actually matters now widens operating range across roles: it teaches a marketing manager to also do light analytics, a product manager to draft a customer email, a customer service rep to handle invoice exceptions. There is no clean ROI number for any of that. So companies default to the legible investment and miss the real one.
• • •
Why the middle market may win this
The PE partner quoted in the report says it plainly, and the rest of the document mostly walks around it: "SMEs can respond much better to this leverage, and they can actually be the winners of this revolution. They don't have that much legacy systems. They didn't know what to do with unstructured data, and now they can use it. And they lack resources, and the resources can get augmented with AI."
Read that carefully. A senior PE investor, someone whose entire business runs on buying and improving companies at scale, is saying that the structural advantages of large enterprises are turning into liabilities. That should be the headline of a different report.
Middle-market companies, somewhere between 500 and 5,000 employees, sit in the structural sweet spot. Big enough to have real data and real problems worth solving. Small enough to redesign work without political wars. Often hungry enough to take risks the Fortune 500 can't justify. They will produce a disproportionate share of the breakout stories over the next three to five years, and the case studies in this very report keep proving the point even when the report doesn't call it out.
• • •
The bifurcation nobody is modeling
Pull these threads together and a pattern emerges that the report hints at without quite naming. Not "AI takes jobs." Not "AI augments workers." Both framings are too clean. The real shape is a bifurcation playing out across three tiers of the labor market, and it is already starting. A small slice, roughly 10–20% of workers, becomes radically more valuable as cross-functional operators of AI systems. A large middle gets squeezed: competent at the old role, unable or unwilling to make the cognitive jump. And frontline work holds up better than knowledge work, the inverse of what most people predicted. Organizations are not preparing for any of it.
Two further dynamics make the picture more uncomfortable. Workers can opt out. They can refuse to learn, refuse to do more work, refuse to become the cross-functional generalist the org now needs. Historically that didn't matter much because employers held the leverage. The specific worker profile organizations need now is scarce, so the leverage is partially inverting. People who can operate this way will command premium pay and move freely. Everyone else will operate in a different market entirely.
And corporate redundancy is becoming visible. Most large organizations carry layers of work that exist because coordination is hard, information moves slowly, specialization runs deep, and handoffs require translation. AI collapses the cost of coordination, accelerates information flow, makes specialization optional in many domains, and eliminates many handoffs. The work those layers did doesn't disappear. But the people doing that work become visible as overhead in a way they weren't before. That isn't a productivity gain. That's an exposure of slack that was previously invisible. Once it's visible, the political pressure to act on it grows enormous, regardless of whether leadership wants it to.
The Stanford report tells us how the companies that figured it out got there. It doesn't tell us what happens to the people who don't make the jump. That part of the playbook hasn't been written. People leaders are going to write it, whether they're ready or not.
• • •
What people leaders should do this quarter
The above is the long view. Here is the short one. Three concrete moves to make before the end of Q3. First, seat Legal, HR, Risk, and Compliance at the kickoff table of every AI initiative now in flight; the veto lives there, and it gets cheaper the earlier you hear it. Second, audit what fraction of your organization's work actually fits the high-leverage profile (high volume, clear success criteria, recoverable errors, accessible data) before committing to an agentic roadmap. Third, start identifying the people who fit the operator profile, because they are not reliably your current top performers, and point L&D budget at widening range across roles rather than deepening specialty within them.
The Stanford playbook is a snapshot of what success has looked like for early adopters. The patterns are real. The percentages are directional. The bifurcation underneath them is the part that should change how you plan.
The technology question is largely settled. The organizational question is where the variance lives now. The labor question is the one we haven't really started answering. People leaders will be the ones asked to answer it first.
• • •
Frequently Asked Questions
What is the Stanford Enterprise AI Playbook?
A 116-page report from the Stanford Digital Economy Lab (Pereira, Graylin, Brynjolfsson, April 2026) built on 60-minute structured interviews across 51 successful enterprise AI deployments at 41 organizations. It deliberately inverts MIT NANDA's finding that 95% of GenAI pilots fail, and instead studies what success actually looked like.
What's the most important finding in the report?
77% of the hardest challenges in successful AI deployments were non-technical: change management, data quality, process redesign. The technology is no longer the bottleneck for most enterprise work. Organizational capability is. That's the single biggest reframe in the document.
Did AI cause layoffs in the successful deployments?
45% of cases reduced headcount. The other 55% split between redeploying staff to higher-value work, avoiding hiring, or maintaining headcount while accelerating output. The report's authors flag clearly that this 45% is likely a floor, not a ceiling. The redeployment-heavy pattern reflects an early adoption phase that's unlikely to hold as model capability scales and CFO pressure intensifies.
Where does resistance to AI adoption actually come from?
From staff functions, not end users. The report found 35% of resistance came from Legal, HR, Risk, and Compliance, versus 23% from frontline users. Most playbooks try to win over the frontline. The veto comes from the staff floor. Bring Legal, HR, Risk, and Compliance to the kickoff in week two, not the courtesy review in month six.
Why might smaller and mid-market companies win this?
Less legacy infrastructure, fewer staff functions with veto power, flatter decision-making, and willingness to redesign work without political wars. The structural advantages of large scale (process maturity, deep specialization, large workforces) are becoming liabilities. The PE partner quoted in the report says it directly: SMEs can be the winners of this revolution.
What does the labor bifurcation actually look like?
Three tiers. A small slice, roughly 10–20% of workers, becomes radically more valuable as cross-functional operators of AI systems. A large middle gets squeezed: competent at their old role, unable or unwilling to make the cognitive jump. Frontline work holds up better than knowledge work, which is the inverse of what most people predicted. Organizations are not modeling this, and they will not have time to react.
• • •