What did Anthropic's Claude Code research paper find?

The June 2026 paper, "Agentic coding and persistent returns to expertise," analyzed ~400,000 Claude Code sessions from ~235,000 users over seven months (October 2025 through April 2026). The core finding is that domain expertise predicts success in agentic coding more reliably than a software engineering background. Management occupations scored the highest verified success rates, and the gap between software engineers and non-software occupations did not widen over the observation period.

What is the division of labor between users and Claude in a typical session?

Users make approximately 70% of planning decisions (deciding what to build), while Claude handles approximately 80% of execution decisions (how to implement it). A typical session runs about 4 turns with Claude producing around 10 actions and 2,400 words of output per user prompt.

How much better do expert users perform compared to novices?

Expert users trigger 12 Claude actions and 3,200 words of output per prompt, compared to 5 actions and 600 words for novices. Verified success rates are 28-33% for intermediate and expert users versus 15% for novices. When a session runs into trouble, novices abandon 19% of the time compared to just 5-7% for more experienced users.

Do software engineers outperform non-engineers in Claude Code?

Only marginally. Software engineers achieve 30% verified success overall (34% in code-producing sessions), versus 26% overall (29% in code-producing sessions) for non-software occupations. Every major occupation tested falls within 7 percentage points of software engineers. Management occupations actually score highest on verified success, and the gap has not grown over seven months.

How has Claude Code usage changed over the seven months studied?

Fixing broken code fell from 33% to 19% of sessions. Operating software grew from 14% to 21%. Writing and data analysis roughly doubled (from 10% to 20%). Average session value rose 27%, with building (+43%), operating (+34%), and fixing (+32%) all growing substantially.

What occupations are growing fastest in Claude Code usage?

Among non-software occupations, management, sales, and legal professionals are the fastest-growing user groups. Management occupations score the highest verified success rates, suggesting that organizational and strategic judgment transfers well to directing AI coding agents.

What are the nine work modes identified in the research?

56% of sessions involve writing, fixing, or testing code (25% building, 26% fixing, 5% testing and orchestrating). Another 17% involve operating software, 14% planning and exploring, and 13% analysis and prose generation.

Anthropic Research: Domain Expertise Beats Coding

On June 16, 2026, Anthropic published a research paper with a finding that quietly overturns a long-held assumption in the software industry: coding background is not the primary determinant of success when working with AI coding agents. The paper, "Agentic coding and persistent returns to expertise," authored by Zoe Hitzig, Maxim Massenkoff, Eva Lyubich, Ryan Heller, and Peter McCrory, draws on roughly 400,000 Claude Code sessions from approximately 235,000 users collected between October 2025 and April 2026. The methodology is privacy-preserving throughout.

The headline numbers are striking but the implications run deeper. This post unpacks what the data actually says, why the findings matter differently for different groups of people, and what the seven-month trajectory tells us about where agentic development is heading.

The Division of Labor at the Heart of Agentic Coding

Before asking who succeeds, it helps to understand what the human-AI collaboration actually looks like moment to moment.

The research maps a clear split: users make roughly 70% of planning decisions — the "what" — while Claude handles roughly 80% of execution decisions — the "how." This is not a vague observation. A typical session runs about 4 turns. Per user prompt, Claude produces approximately 10 actions and 2,400 words of output.

The ratio shifts dramatically with expertise. Expert users trigger 12 Claude actions and 3,200 words of output per prompt. Novices trigger 5 actions and 600 words. That is a 2.4x gap in actions per prompt, and more than 5x in words of output, and it tracks the outcome differences closely.
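The expert-to-novice ratios follow directly from the per-prompt figures quoted above. A minimal sketch of the arithmetic, where the dictionary layout and function name are illustrative and not taken from the paper:

```python
# Per-prompt figures as quoted in this post (not re-derived from the paper).
per_prompt = {
    "expert": {"actions": 12, "words": 3200},
    "novice": {"actions": 5, "words": 600},
}

def expert_to_novice_ratio(metric: str) -> float:
    """Ratio of expert to novice Claude output per prompt for one metric."""
    return per_prompt["expert"][metric] / per_prompt["novice"][metric]

action_ratio = expert_to_novice_ratio("actions")
word_ratio = expert_to_novice_ratio("words")
print(f"actions: {action_ratio:.1f}x, words: {word_ratio:.1f}x")
# prints "actions: 2.4x, words: 5.3x"
```

The 2.4x figure corresponds to actions per prompt; measured in words of output, the gap is larger.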

The implication is direct: what separates high-performing users is not their ability to write code themselves but their ability to specify problems clearly, decompose work effectively, and judge whether Claude's output is correct. Those are skills that come from domain mastery and project experience, not from years of writing Python or TypeScript.

Nine Work Modes: What People Actually Use Claude Code For

The research categorizes usage into nine distinct work modes with precise breakdowns:

Code work (56% of all sessions)

Building new code: 25%
Fixing broken code: 26%
Testing and orchestrating: 5%

Beyond code (44% of sessions)

Operating software: 17%
Planning and exploring: 14%
Analysis and prose: 13%
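As a quick consistency check, the listed shares can be summed to confirm the 56% and 44% group totals. This is only a sketch; the grouping keys are labels chosen for illustration:

```python
# Session shares (percent) as listed in this post.
work_modes = {
    "code": {"building": 25, "fixing": 26, "testing_orchestrating": 5},
    "beyond_code": {"operating": 17, "planning_exploring": 14, "analysis_prose": 13},
}

# Sum each group, then the whole breakdown.
group_totals = {group: sum(shares.values()) for group, shares in work_modes.items()}
total = sum(group_totals.values())

print(group_totals)  # {'code': 56, 'beyond_code': 44}
print(total)         # 100
```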

The fact that 44% of sessions involve no direct code writing is itself significant. Claude Code is being used as a general-purpose technical assistant, not just a code generator. Users are asking it to run processes, navigate data, explore architectures, and produce written analysis. This is consistent with the occupational diversity the paper documents.

Expertise Matters — But Where It Comes From Has Changed

Verified success rates by expertise level

The paper uses a conservative "verified success" metric alongside a "partial success" measure. The numbers:

Novices: 15% verified success, 77% at least partial success
Intermediate users: 28% verified success, 91% partial success
Expert users: 33% verified success, 92% partial success

A crucial structural observation: most of the expertise gain occurs in the novice-to-intermediate transition. The jump from intermediate to expert is real but smaller. This means the biggest productivity unlock from training, better prompting habits, or improved mental models comes early — not after years of practice.
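The size of each step can be read off the verified success rates above. A minimal sketch, assuming the three expertise levels are ordered as listed; the variable names are illustrative:

```python
# Verified success rates (percent) by expertise level, as quoted in this post.
verified_success = {"novice": 15, "intermediate": 28, "expert": 33}

# Percentage-point gain for each consecutive step up in expertise.
levels = list(verified_success)
step_gains = {
    f"{lower}->{upper}": verified_success[upper] - verified_success[lower]
    for lower, upper in zip(levels, levels[1:])
}

total_gain = verified_success["expert"] - verified_success["novice"]
early_share = step_gains["novice->intermediate"] / total_gain

print(step_gains)  # {'novice->intermediate': 13, 'intermediate->expert': 5}
print(f"{early_share:.0%} of the total gain comes from the first step")
# prints "72% of the total gain comes from the first step"
```

On these figures, 13 of the 18 percentage points between novice and expert are gained in the first transition.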

When sessions hit trouble

The data on session abandonment is telling. When a session runs into trouble:

Novices abandon 19% of the time
Intermediate and expert users abandon only 5-7% of the time

Experienced users do not hit fewer walls. They navigate better when they do. They know how to reframe the problem, break it into smaller steps, or provide Claude with additional context. That is a learnable skill, and it does not require a computer science degree.

The Occupation Finding: Domain Expertise Outpaces Coding Background

This is the finding that deserves the most attention.

Software engineers achieve 30% verified success overall, rising to 34% in code-producing sessions specifically. Non-software occupations achieve 26% overall and 29% in code-producing sessions. That is a gap of 4-5 percentage points — meaningful but far smaller than most people would expect.

More importantly: every major occupation the researchers studied falls within 7 percentage points of software engineers. And management occupations actually score the highest verified success rates of any group measured.

The gap between software and non-software occupations has not widened over seven months of observation. If AI coding tools were primarily amplifying existing technical advantage, that gap would be growing. It is not.
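The occupation gaps quoted in this section can be reproduced from the four headline rates. A small sketch, using only the aggregate figures given in this post (per-occupation rates are not listed here, so they are not modeled):

```python
# Verified success rates (percent) as quoted in this post.
verified_success = {
    "software": {"overall": 30, "code_producing": 34},
    "non_software": {"overall": 26, "code_producing": 29},
}

# Percentage-point gap between software and non-software occupations.
gaps = {
    scope: verified_success["software"][scope] - verified_success["non_software"][scope]
    for scope in ("overall", "code_producing")
}

MAX_REPORTED_SPREAD = 7  # "within 7 percentage points", per the post
print(gaps)  # {'overall': 4, 'code_producing': 5}
print(all(gap <= MAX_REPORTED_SPREAD for gap in gaps.values()))  # True
```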

Why management occupations might lead

Management professionals succeed at directing AI coding agents for the same reasons they succeed at directing human engineers: they are practiced at translating business problems into requirements, at evaluating whether an output actually solves the right problem, and at managing the iterative process of refinement. These judgment skills transfer directly. What they historically lacked — the ability to personally implement solutions — Claude now handles.

The fastest-growing non-software occupations using Claude Code are management, sales, and legal. Each of these fields involves complex domain knowledge, regulatory or strategic context, and a chronic need for technical implementation that previously required either learning to code or hiring an engineer.

Seven Months of Trajectory Data

The longitudinal view in the research may be more important than the point-in-time statistics.

What is declining

Fixing broken code dropped from 33% to 19% of all sessions. This is a significant structural shift. As Claude's code quality improves and users become more practiced at specifying requirements, they are spending less time in error-correction loops.

What is growing

Operating software grew from 14% to 21%. Writing and data analysis roughly doubled from 10% to 20%. These are higher-level, more domain-specific activities. The user base is moving up the value chain.
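Share changes can be read either in percentage points or as relative change, and the two give quite different impressions. A minimal sketch using the start and end shares quoted in this section; the helper name `changes` is illustrative:

```python
# (start, end) share of sessions in percent, as quoted in this post.
share_start_end = {
    "fixing broken code": (33, 19),
    "operating software": (14, 21),
    "writing and data analysis": (10, 20),
}

def changes(start: int, end: int) -> tuple[int, float]:
    """Return (percentage-point change, relative change in percent)."""
    return end - start, (end - start) / start * 100

trajectory = {mode: changes(*pair) for mode, pair in share_start_end.items()}
for mode, (points, relative) in trajectory.items():
    print(f"{mode}: {points:+d} points, {relative:+.0f}% relative")
# fixing broken code: -14 points, -42% relative
# operating software: +7 points, +50% relative
# writing and data analysis: +10 points, +100% relative
```

In relative terms, the doubling of writing and data analysis is the largest shift of the three.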

Session value over time

Average session value rose 27% across the seven-month period:

Building sessions: +43%
Operating sessions: +34%
Fixing sessions: +32%

Every category is becoming more productive, but building — creating new things from scratch — shows the steepest growth. This is not a tool people are using more cautiously over time. They are using it more ambitiously.

Implications for Different Groups

For software engineers

The research does not suggest software engineers will be replaced. It suggests that the moat protecting technically complex work from non-technical contributors is narrowing. Engineers will face more competent non-technical collaborators who can participate meaningfully in implementation decisions. The response is not to guard knowledge but to compete on the things domain expertise cannot replicate: architectural judgment, systems thinking at scale, security instincts, performance optimization.

Software engineers' 30% verified success still sits above the 26% average for non-software occupations, which means technical depth still matters. It just matters less than it used to as a gatekeeper to building things.

For domain experts in non-technical fields

The paper's framing is explicit: "A person with domain command in any field may now be able to do technical work they previously could not." For lawyers, scientists, finance professionals, and healthcare workers, this represents a genuine capability expansion. Building a data pipeline to analyze case outcomes, writing a script to automate regulatory filings, or prototyping a clinical data dashboard are now accessible to people whose expertise is in law, medicine, or finance — not software.

The practical implication: invest in learning to specify problems clearly and evaluate outputs critically. Those skills now directly translate into working software.

For managers and business leaders

Management occupations scoring the highest verified success rates is not a coincidence. The research effectively validates that organizational judgment — knowing what to build, why it matters, and whether the output solves the real problem — is the most valuable input to an AI coding session.

For managers building or overseeing technical teams, this suggests a different kind of engagement with AI tools is now possible. Rather than delegating AI tool adoption entirely to engineers, management teams can engage directly with prototyping, analysis, and tooling.

For educators and hiring managers

Coding bootcamps and CS programs should note that the novice-to-intermediate transition is where the biggest productivity gains are unlocked. The curriculum question is no longer "how do we teach people to write code" but "how do we teach people to work with AI coding agents effectively." That means problem decomposition, specification writing, output evaluation, and iterative refinement — skills closer to technical writing and systems analysis than traditional programming instruction.

Hiring managers across industries should reconsider job descriptions that treat coding ability as a binary hiring criterion. The relevant question is no longer "can this person write code" but "can this person direct AI systems to produce good code for their domain."

For legal and sales professionals

The fastest-growing non-software occupation groups, legal and sales, share something important: they deal in complex, high-stakes domain knowledge that is hard to transfer to a generalist engineer. A legal professional who understands the specific requirements of a compliance workflow is now able to build tooling around that workflow directly. A sales operations analyst who understands the nuances of pipeline data can now build their own dashboards and automation without waiting for engineering capacity.

The barrier that remains is not technical aptitude. It is willingness to engage with the iterative, sometimes-failing nature of software development, which is exactly what the abandonment-rate data measures. When things go wrong, intermediate and expert users abandon 12 to 14 percentage points less often than novices (5-7% versus 19%).

What the Data Does Not Say

It is worth being clear about the limits of this research.

Verified success at 15-33% across the population means most sessions do not reach the researchers' threshold for verified completion. The paper is measuring a tool that is genuinely difficult to use well, not one that makes complex software trivially easy. The expertise effects are real precisely because there is still meaningful skill involved.

The paper also measures sessions, not projects. Building complex production software likely requires sustained expertise across many sessions in ways this study cannot fully capture. The gap of 4-5 percentage points between software engineers and non-software occupations may widen for longer, more complex work.

The Structural Shift

The paper's most important conclusion is about what AI is doing to the relationship between expertise and output. For most of human history, domain expertise and the ability to implement solutions based on that expertise were separate skills. A lawyer knew contract law; an engineer built the contract management system. A scientist understood the biology; a programmer wrote the analysis pipeline.

Agentic coding is collapsing that separation. Domain expertise, the research suggests, is increasingly sufficient — not just helpful — for technical implementation. The data from management occupations is the clearest signal: people who are practiced at translating domain knowledge into clear requirements and evaluating whether outputs are correct are outperforming groups who traditionally monopolized the ability to build.

That is a structural change, not a marginal improvement. The seven months of trajectory data suggest it is accelerating rather than plateauing.

Sources

Primary paper: Agentic coding and persistent returns to expertise — Anthropic, June 16, 2026. Authors: Zoe Hitzig, Maxim Massenkoff, Eva Lyubich, Ryan Heller, Peter McCrory.
Related on explainx.ai: What is Claude Code? Complete guide — Claude Code product overview
Related on explainx.ai: Karpathy-inspired Claude Code guidelines — prompting and specification best practices
Related on explainx.ai: Agent harness engineering — scaffolding for agentic systems

Paper data covers October 2025 through April 2026. Statistics and trajectories reflect that observation window. All occupation and expertise comparisons are drawn directly from the published research.
