
The Enterprise AI Playbook has a labor problem Stanford didn't quite name.

Stanford studied the 51 enterprise AI deployments that actually worked. The technology question is largely settled. The labor question is the one nobody is answering yet.

The report: Enterprise AI Playbook by Pereira, Graylin, and Brynjolfsson. Stanford Digital Economy Lab, April 2026. 116 pages, 51 successful deployments across 41 organizations, built from 60-minute structured interviews conducted Aug 2025–Feb 2026. Read the original PDF →

The 95% you've already heard about

Last year MIT's NANDA initiative published a number that took on a life of its own: 95% of enterprise GenAI pilots fail to produce measurable financial impact. Every consultant, every board, every CFO now quotes it back to you. It became the convenient excuse for slowing down, and the convenient warning against starting.

Stanford's Digital Economy Lab decided to invert the lens. Instead of studying the 95% that failed, they studied the 5% that didn't. The result: 51 successful enterprise AI deployments across 41 organizations, interviewed in depth between August 2025 and February 2026, written up by Pereira, Graylin, and Brynjolfsson into a 116-page report titled Enterprise AI Playbook.

It's one of the most grounded pieces of enterprise-AI research we've read this year. The case studies are concrete, the framing is honest about its limits, and the methodology is unusually disciplined. But the report carries a quiet thesis underneath the playbook. It's about labor, about which kinds of organizations win, and about a bifurcation the authors hint at without quite naming. That thesis is the part people leaders need to read carefully.

Here's our read.

• • •

Five findings that should change how you plan

The report is dense with statistics. Most are directional rather than precise, given the methodology: interview-based, self-reported deployments with explicit selection bias toward success. Five findings still cut deep enough to reshape how a people leader should think about the next twelve months.

77% · Non-technical hard problems
Change management, data quality, and process redesign owned the hardest challenges. The AI itself rarely did.

61% · Had a prior failure
Most successful deployments were preceded by a failed one. Budget for the first attempt to break.

71% / 30% · Oversight model matters
Escalation-based AI delivered 71% median productivity gains. Approval-on-every-output delivered 30%.

42% · Model was commodity
In 42% of cases the choice of foundation model didn't matter. Rises to 71% for routine tasks.

45% / 55% · Headcount split
45% of successful deployments reduced headcount. 55% redeployed, avoided hiring, or maintained staff.

Each of these has a downstream implication that doesn't appear on the chart. Taken together they say something simple and important: the constraints on enterprise AI success have moved.

• • •

The bottleneck has moved

For most of the GenAI era, the implicit story has been "the models aren't good enough yet." Boards used that to justify waiting. CTOs used it to justify pilots. CEOs used it to justify keeping the budget where it is.

This report, paired with the MIT NANDA finding it builds on, says that's no longer the binding constraint for a large share of enterprise work. The model has become a commodity. The variance lives somewhere else.

Through 2024
Technology was the bottleneck.
Wait for better models. Wait for cheaper inference. Wait for context windows to expand. The strategic question was when to start.
2026 →
Organization is the bottleneck.
Same models. Same use case. Weeks at one company, years at another. 77% of the hard part is change, data, and process. The strategic question is whether you can absorb it.

Companies still waiting for "better AI" before committing to redesign are misreading the problem. The capability already runs ahead of most organizations' ability to absorb it. Companies that figure out the absorption side compound capability quickly. Each successful deployment makes the next one cheaper, because the platform, the change-management muscle, and the proprietary data already exist. Companies still running proof-of-concept theater burn cycles without building any of that substrate.

The gap widens with every iteration, even when both sides have equal access to the underlying models. That's the J-curve playing out at the firm level.

• • •

Where resistance actually comes from

This finding flips a common assumption. Ask any executive where resistance to AI rollouts comes from and they'll point to the frontline: the people whose jobs the tool might absorb. The report says the opposite.

Staff functions (Legal, HR, Risk, Compliance) · 35%
The veto players you didn't bring in early

End users and frontline staff · 23%
The audience most playbooks target

Staff functions resist for legitimate reasons. They manage risk, ensure compliance, and slow things down enough to think. They also resist for territorial ones. Nobody consulted them. The team bought the tool without their input. The policy implications never ran past them. So they raise their objections at the last possible moment, when the cost of addressing them runs highest.

Most AI playbooks try to win over the frontline. The veto comes from the staff floor. People leaders who skip Legal, HR, Risk, and Compliance during week two pay for it in month six.

The implication is operational. Whoever owns the rollout needs the staff functions sitting at the same kickoff table as IT and the business unit. Not a separate briefing. Not a courtesy review at the end. Same table, week two. The cost of that conversation is small. The cost of skipping it is the entire program.

• • •

The redeployment story and its expiration date

The most reassuring finding in the report is the headcount split. In the majority of cases, AI did not cost people their jobs.

45% · Reduced headcount
The function shrank. People exited. Often the top-line headline.

55% · Redeployed, avoided hiring, or maintained
People moved to higher-value work, growth was absorbed without new hires, or acceleration won the argument over cuts.

The report's authors flag this directly: 45% is likely a floor, not a ceiling. Today's pattern reflects an early adoption phase.

The 55% breaks into three sub-patterns. Redeployment moves people from the automated task to higher-value work. The security operations case in the report shows this clearly: 4.5 FTEs shifted from alert triage to threat hunting after the team's monthly alert load jumped from 1,500 to 40,000. Hiring avoidance absorbs growth without adding bodies. The existing team handles the work the new hires would have done. Acceleration over cuts shows up in the EdTech case: 20–30% engineering productivity gains, PE owners pushing for layoffs, the CTO winning the argument to reinvest in the product roadmap instead.

The good news is real. Framing matters as much as the technology. Revenue-framed projects tend toward redeployment and acceleration. Cost-framed projects tend toward direct cuts. How a leader sets up the project at the start meaningfully shapes what happens to the people at the end. People leaders have more leverage here than they often use.

The harder news, which the report names but doesn't dwell on, is that this pattern is unlikely to hold. Three forces converge on a darker reading. First, models keep getting more capable. METR's measurements show the autonomous-task-length capability of frontier models doubling roughly every seven months. Frontier models handled tasks taking expert humans about an hour at 50% reliability as of early 2025. On that trend, multi-hour autonomous work arrives within a year, and day-long autonomous work within two to three. Second, cost pressure intensifies as the novelty wears off and CFOs ask harder ROI questions. Third, once early adopters prove the model works, competitive pressure to cut becomes harder to resist.
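The extrapolation in that first point is simple compounding. A quick sketch, using only the two figures METR published (a roughly one-hour baseline in early 2025 and a roughly seven-month doubling time); this is illustrative arithmetic, not a forecast, and the real trend need not stay exponential:

```python
# Extrapolate METR's reported trend: autonomous-task length of frontier
# models doubling roughly every 7 months from a ~1-hour baseline (early 2025,
# at 50% reliability). Illustrative only; assumes the exponential holds.

BASELINE_HOURS = 1.0      # expert-equivalent task length, early 2025
DOUBLING_MONTHS = 7.0     # METR's estimated doubling time

def task_length_hours(months_after_baseline: float) -> float:
    """Projected autonomous task length after the given number of months."""
    return BASELINE_HOURS * 2 ** (months_after_baseline / DOUBLING_MONTHS)

for months in (12, 24, 36):
    print(f"{months} months out: ~{task_length_hours(months):.1f} h")
# ~3.3 h at 12 months (multi-hour work within a year);
# ~10.8 h at 24 months and ~35 h at 36 months (day-long-plus work in 2–3 years)
```

On those two inputs alone, the report's "multi-hour within a year, day-long within two to three" reading is just the curve evaluated at three points.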

The report's own leading indicator comes from Brynjolfsson's separate ADP-payroll research: relative employment among early-career workers aged 22–25 in AI-exposed roles has already fallen 16%, with software developers in that age band down close to 20%. That is not a future trend. It is happening now. The redeployment-friendly averages in the playbook do not reflect it.

• • •

The task-profile constraint

The 71% productivity gain headline travels well. It also travels misleadingly. It depends on a specific task profile, and most knowledge work doesn't fit it.

The work where agentic AI delivered the biggest gains shared four properties: high volume, clear success criteria, recoverable errors, and data accessible across systems. Call center triage, alert filtering, invoice processing, procurement at scale, document extraction. In the report's own data, agentic implementations clustered tightly in five functions:

Procurement
Field service
Security ops
Coding
Customer support triage
Everything else (strategy, sales relationships, design, most management work): human-in-the-loop, 22% median gains. Useful, but not transformative.

The implication is uncomfortable for most enterprises. A lot of organizations don't have enough of the high-volume, clear-success-criteria work for the agentic numbers to move the P&L. A 200-person professional services firm doesn't have 40,000 monthly tickets. A mid-size manufacturer doesn't have 100,000 invoices. The supermarket case in the report (autonomous procurement, doubled EBITDA margin) worked partly because the company had thousands of SKUs across dozens of stores. Continuous, measurable, repeatable decisions. Strip out volume and the agentic ROI math gets much harder.

Gains scale with task homogeneity, not company size. The "AI is for the big guys" assumption is roughly backwards.

The honest move for a leader is to audit, before committing to an AI transformation roadmap, what fraction of the organization's work actually fits the high-leverage profile. If it's small, the right strategy is augmentation at the margins, not agentic transformation. Both are real strategies. Confusing them is how organizations end up in the 95% failure bucket.
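One lightweight way to run that audit is to score each major workstream against the four properties named above and see what share of the portfolio clears all four. A minimal sketch; the workstream names, scores, and the all-four threshold are hypothetical illustrations, not criteria from the report:

```python
# Score workstreams against the four properties the report associates with
# high agentic gains: high volume, clear success criteria, recoverable
# errors, and accessible data. All example data below is hypothetical.

from dataclasses import dataclass

@dataclass
class Workstream:
    name: str
    high_volume: bool          # thousands of repeated instances per month?
    clear_success: bool        # unambiguous, measurable "done right"?
    recoverable_errors: bool   # can a mistake be caught and cheaply undone?
    accessible_data: bool      # inputs reachable across systems?

    def fits_agentic_profile(self) -> bool:
        # The report's high-gain cases shared all four properties.
        return all((self.high_volume, self.clear_success,
                    self.recoverable_errors, self.accessible_data))

portfolio = [
    Workstream("invoice processing",   True,  True,  True,  True),
    Workstream("client strategy work", False, False, True,  False),
    Workstream("support triage",       True,  True,  True,  False),
]

agentic = [w.name for w in portfolio if w.fits_agentic_profile()]
share = len(agentic) / len(portfolio)
print(f"Agentic-fit workstreams: {agentic} ({share:.0%} of portfolio)")
```

If that share comes back small, the honest conclusion is augmentation at the margins, not agentic transformation.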

• • •

The variable the report doesn't measure: worker capability

Here is where the playbook leaves the most important question unanswered. The report measures organizational readiness (sponsorship, process, change management) but never the individual capability of the humans operating the AI systems. That omission matters, because the productivity gains the report celebrates depend on a specific kind of worker, and most organizations don't have many of them.

The skill that matters isn't "prompt engineering" in the way most corporate training programs frame it. It's a fundamentally different cognitive mode: decomposing a work outcome into discrete events, mapping which system or agent handles each event, designing the handoffs, and supervising the chain. That's closer to systems thinking, process architecture, and product management than to writing better prompts. Most companies running prompt-engineering workshops and calling it AI training are roughly teaching people to type and calling it software engineering.

The report flattens two very different strategies into a single "agentic" bucket. They look similar from a productivity-curve perspective. They imply completely different talent investments.

Same productivity gain. Very different organizations.

PATH A · Autonomous agents operating end-to-end
Model: Build or buy systems that replace a function.
Talent: A few sophisticated builders. Many fewer operators.
Headcount: In the affected function, trends to zero.
Where it works: Narrow, high-volume, measurable domains (the supermarket case).

PATH B · Humans deploying autonomous agents across domains
Model: Workforce of high-leverage operators, each running multiple agentic workflows.
Talent: Many people with a specific profile that barely existed two years ago.
Headcount: Doesn't collapse. It transforms.
Where it works: Cross-functional knowledge work, when the people can be developed.

This distinction matters because the talent question for the two paths has nothing in common. Path A asks whether the organization has two or three people who can architect and maintain a replacement system. Path B asks whether the organization can develop or hire a meaningful share of its workforce into a new cognitive mode. One that didn't exist in the old org chart, that nobody was hired for, that almost nobody was trained for.

The kind of person who thrives in Path B carries a specific profile: comfortable with ambiguity, willing to be wrong publicly and iterate, cross-functional enough to know what good output looks like in domains they don't own, equipped with the systems-thinking instinct to decompose work into events. That is a real personality and cognitive profile. It is distributed across the population, but not uniformly, and it does not correlate cleanly with tenure, title, or past performance in a narrow role.

Organizations are about to discover that their highest performers in the old model are not always their highest performers in the new one. That will get politically and culturally brutal.

The report also doesn't engage with a training ROI problem that sits underneath all of this. L&D budgets flow toward deepening expertise inside a role, because that is what executives can measure. The training that actually matters now widens operating range across roles: it teaches a marketing manager to also do light analytics, a product manager to draft a customer email, a customer service rep to handle invoice exceptions. There is no clean ROI number for any of that. So companies default to the legible investment and miss the real opportunity.

• • •

Why the middle market may win this

The PE partner quoted in the report says it plainly, and the rest of the document mostly walks around it: "SMEs can respond much better to this leverage, and they can actually be the winners of this revolution. They don't have that much legacy systems. They didn't know what to do with unstructured data, and now they can use it. And they lack resources, and the resources can get augmented with AI."

Read that carefully. A senior PE investor, someone whose entire business runs on buying and improving companies at scale, is saying that the structural advantages of large enterprises are turning into liabilities. That should be the headline of a different report.

Used to be an advantage → What it becomes
Mature, codified processes → Political wars to redesign anything
Deep functional specialization → Siloed expertise that resists cross-functional work
Large staff functions (Legal, HR, Risk) → Veto layer slowing every deployment
Large workforces absorbing variance → Visible overhead once coordination collapses
Legacy systems integration → Sunk cost making rebuilds politically impossible

Middle-market companies, somewhere between 500 and 5,000 employees, sit in the structural sweet spot. Big enough to have real data and real problems worth solving. Small enough to redesign work without political wars. Often hungry enough to take risks the Fortune 500 can't justify. They will produce a disproportionate share of the breakout stories over the next three to five years, and the case studies in this very report keep proving the point even when the report doesn't call it out.

• • •

The bifurcation nobody is modeling

Pull these threads together and a pattern emerges that the report hints at without quite naming. Not "AI takes jobs." Not "AI augments workers." Both framings are too clean. The real shape is a three-tier bifurcation of the labor market, already starting, and organizations are not preparing for it.

High-leverage operators · small slice, roughly 10–20%
Cross-functional, comfortable with ambiguity, fluent with agentic systems. Radically more valuable. Premium compensation. Increasingly work in flatter structures that look more like partnerships than employment.

The squeezed middle · large band
Competent at their old role. Unable or unwilling to make the cognitive jump. Increasingly visible as redundant overhead in restructured workflows. Some redeploy successfully. Many don't.

Frontline resilience · medium band
Work that requires physical presence, emotional labor, or genuine human judgment in messy situations holds up better than knowledge work. The inverse of what most people predicted five years ago.

The size weighting is directional. The point is the shape: a small premium tier, a large squeezed middle, a resilient frontline. AI restructures knowledge work. It does not protect it.

Two further dynamics make the picture more uncomfortable. Workers can opt out. They can refuse to learn, refuse to do more work, refuse to become the cross-functional generalist the org now needs. Historically that didn't matter much because employers held the leverage. The specific worker profile organizations need now is scarce, so the leverage is partially inverting. People who can operate this way will command premium pay and move freely. Everyone else will operate in a different market entirely.

And corporate redundancy is becoming visible. Most large organizations carry layers of work that exist because coordination is hard, information moves slowly, specialization runs deep, and handoffs require translation. AI collapses the cost of coordination, accelerates information flow, makes specialization optional in many domains, and eliminates many handoffs. The work those layers did doesn't disappear. But the people doing that work become visible as overhead in a way they weren't before. That isn't a productivity gain. That's an exposure of slack that was previously invisible. Once it's visible, the political pressure to act on it grows enormous, regardless of whether leadership wants it to.

The Stanford report tells us how the companies that figured it out got there. It doesn't tell us what happens to the people who don't make the jump. That part of the playbook hasn't been written. People leaders are going to write it, whether they're ready or not.

• • •

What people leaders should do this quarter

The above is the long view. Here is the short one. Three concrete moves to make before the end of Q3.

01
Measure AI fluency, not just usage.
A license count is not a skill metric. Survey your team across Conceptual, Operational, and Governance. Find out where the fluency actually sits. It's usually not where you assume.
02
Choose Path A or Path B, explicitly.
Are you replacing functions, or building a workforce of operators? Most orgs are drifting into both without naming either. Name it. The talent and org design implications fork hard.
03
Bring staff functions to week-two kickoff.
Legal, HR, Risk, Compliance. Not a briefing. The same kickoff table. The veto comes from this floor. Addressing it later costs the program.
The Throughline

The Stanford playbook is a snapshot of what success has looked like for early adopters. The patterns are real. The percentages are directional. The bifurcation underneath them is the part that should change how you plan.

The technology question is largely settled. The organizational question is where the variance lives now. The labor question is the one we haven't really started answering. People leaders will be the ones asked to answer it first.

• • •

Frequently Asked Questions

What is the Stanford Enterprise AI Playbook?

A 116-page report from the Stanford Digital Economy Lab (Pereira, Graylin, Brynjolfsson, April 2026) built on 60-minute structured interviews across 51 successful enterprise AI deployments at 41 organizations. It deliberately inverts MIT NANDA's finding that 95% of GenAI pilots fail, and instead studies what success actually looked like.

What's the most important finding in the report?

77% of the hardest challenges in successful AI deployments were non-technical: change management, data quality, process redesign. The technology is no longer the bottleneck for most enterprise work. Organizational capability is. That's the single biggest reframe in the document.

Did AI cause layoffs in the successful deployments?

45% of cases reduced headcount. The other 55% split between redeploying staff to higher-value work, avoiding hiring, or maintaining headcount while accelerating output. The report's authors flag clearly that this 45% is likely a floor, not a ceiling. The redeployment-heavy pattern reflects an early adoption phase that's unlikely to hold as model capability scales and CFO pressure intensifies.

Where does resistance to AI adoption actually come from?

From staff functions, not end users. The report found 35% of resistance came from Legal, HR, Risk, and Compliance, versus 23% from frontline users. Most playbooks try to win over the frontline. The veto comes from the staff floor. Bring Legal, HR, Risk, and Compliance to the kickoff in week two, not the courtesy review in month six.

Why might smaller and mid-market companies win this?

Less legacy infrastructure, fewer staff functions with veto power, flatter decision-making, and willingness to redesign work without political wars. The structural advantages of large scale (process maturity, deep specialization, large workforces) are becoming liabilities. The PE partner quoted in the report says it directly: SMEs can be the winners of this revolution.

What does the labor bifurcation actually look like?

Three tiers. A small slice, roughly 10–20% of workers, becomes radically more valuable as cross-functional operators of AI systems. A large middle gets squeezed: competent at their old role, unable or unwilling to make the cognitive jump. Frontline work holds up better than knowledge work, which is the inverse of what most people predicted. Organizations are not modeling this, and they will not have time to react.

• • •

References

[1]
Pereira, J., Graylin, J., & Brynjolfsson, E. (April 2026). Enterprise AI Playbook. Stanford Digital Economy Lab. PDF
The report this article reviews. Studies 51 successful enterprise AI deployments across 41 organizations to identify patterns of success.
[2]
Challapally, A., Pease, C., Raskar, R., & Chari, P. (August 2025). The GenAI Divide: State of AI in Business 2025. MIT Project NANDA. PDF
Source of the widely-cited finding that 95% of enterprise GenAI pilots deliver no measurable P&L impact, based on 300+ initiative reviews and 153 survey responses.
[3]
Kwa, T., West, B., Becker, J., Deng, A., Garcia, K., et al. (March 2025). Measuring AI Ability to Complete Long Tasks. METR. Blog · arXiv
Documents a roughly seven-month doubling time in AI autonomous task-completion ability, with frontier models reaching about one hour of expert-equivalent work at 50% reliability as of early 2025.
[4]
Brynjolfsson, E., Chandar, B., & Chen, R. (November 2025). Canaries in the Coal Mine? Six Facts about the Recent Employment Effects of Artificial Intelligence. Stanford Digital Economy Lab. Paper
Uses ADP payroll microdata to document a 16% relative employment decline for ages 22–25 in AI-exposed occupations, with software developers in that cohort down close to 20%.

Start with the diagnostic.

Five minutes, anonymous. See where your team actually sits across Conceptual, Operational, and Governance, and which side of the bifurcation you're currently building toward.
