Why don't we understand why AI works?

Despite AI's empirical success, we lack a complete theory explaining why massive transformer models trained on huge datasets perform reasoning, coding, and creative tasks so effectively. Scaling laws predict performance gains, but not the underlying mechanisms—similar to how thermodynamics worked before we understood atomic theory.

What are scaling laws in AI?

Scaling laws describe the predictable correlation between compute power, dataset size, and model performance. As you increase parameters, training compute, and data, model capabilities improve following power-law relationships. However, scaling laws tell us 'what' improves, not 'why' it improves.

What are emergent abilities in large language models?

Emergent abilities are capabilities that appear in larger models but not in smaller ones, and cannot be predicted by extrapolating smaller model performance. Examples include in-context learning, chain-of-thought reasoning, and instruction following. However, some researchers debate whether emergence is real or a measurement artifact.

Mandy Lu is an AI researcher at Google with a PhD in Computational and Mathematical Engineering from Stanford. Her research focuses on using computer vision for health and climate AI, including Parkinson's disease assessment and statistical techniques to reduce confounding variables in ML models.

Google AI Researcher Sparks Debate: We Still Don't Know | explainx.ai Blog

explainx.ainewsletter3.5k

workshops ↗

Google AI Researcher Sparks Debate: We Still Don't Know | explainx.ai Blog | explainx.ai

On June 3, 2026, Mandy Lu—a Google AI researcher with a Stanford PhD in computational mathematics—posted a simple statement on X (formerly Twitter) that ignited a viral debate:

"we still have no satisfying theory for why AI works"

The post struck a nerve. With over 80 reactions and hundreds of replies, it exposed an uncomfortable truth in the AI research community:

Despite transformers powering ChatGPT, Codex, and every major AI breakthrough since 2017, no one fully understands why they work so well.

We have scaling laws that predict performance. We have mechanistic interpretability that maps features. We have emergent abilities that appear unpredictably.

But we don't have a unified theory explaining why massive models trained on internet-scale data can reason, code, and create.

It's like building a rocket to the moon using thermodynamics—but without understanding atoms.

TL;DR

Topic	Key Facts
The Problem	No satisfying theory explains why transformers trained on massive datasets perform reasoning, coding, and creative tasks so effectively.
Scaling Laws	Predict performance gains from compute, data, and parameters—but not the underlying reasons why scaling works.
Emergent Abilities	Capabilities (reasoning, in-context learning) appear in large models unpredictably; some debate if emergence is real or measurement artifact.
Mechanistic Interpretability	Reverse-engineers neural networks to map features and pathways; named MIT's 2026 Breakthrough Technology.
Current State	AI works empirically (thermodynamics) but lacks fundamental theory (atomic physics). Practical results prioritized over theory.
Debate Context	Sparked by Mandy Lu (Google AI, Stanford PhD) on X; reflects broader tension in AI research community.

Who is Mandy Lu?

Mandy Lu is an AI researcher at Google working on health and climate applications. She holds a PhD in Computational and Mathematical Engineering from Stanford University, admitted Autumn 2025.

Her academic background includes:

Dual bachelor's degree in Math and Computer Science from Stanford
Master's degree with concentration in AI from Stanford
Research in the Stanford Vision Lab advised by Prof. Fei-Fei Li and Prof. Juan Carlos Niebles
Work in the Computational Neuroscience Laboratory under Prof. Ehsan Adeli and Prof. Kilian Pohl

Her research projects focus on:

Using computer vision to develop systems for Parkinson's disease assessment
Developing statistical techniques to reduce the effect of confounding variables on ML models

Her interdisciplinary background in computational mathematics, neuroscience, and AI makes her uniquely positioned to identify gaps between empirical results and theoretical understanding.

The Statement That Sparked the Debate

On June 3, 2026, Lu posted:

"we still have no satisfying theory for why AI works"

The simplicity and directness of the statement resonated across the AI research community, tech workers, and skeptics alike.

Replies ranged from:

Technical explanations of scaling laws and mechanistic interpretability
Comparisons to thermodynamics before atomic theory
Debates over whether practical results matter more than theory
Criticisms that AI is overhyped and doesn't actually "work" as claimed

The discussion exposed a fundamental tension in AI research: we can build increasingly powerful systems, but we can't fully explain them.

What We Know: The Empirical Evidence

Before exploring what we don't understand, let's establish what we do know.

1. Transformers Work Empirically

Since the "Attention Is All You Need" paper (Vaswani et al., 2017), transformers have dominated AI:

Large Language Models (GPT-4, Claude, Gemini)
Vision Models (CLIP, SAM, DALL-E)
Multimodal Models (GPT-4o, Gemini 1.5)
Code Generation (Codex, GitHub Copilot)
Scientific Discovery (AlphaFold 3, protein design)

The architecture works—undeniably and reproducibly.

2. Scaling Laws are Predictable

Research from OpenAI (2020) and DeepMind's Chinchilla (2022) established that model performance follows power-law relationships with:

Model size (number of parameters)
Dataset size (number of training tokens)
Compute (FLOPs during training)

You can predict the performance of a 100B parameter model trained on 2T tokens before you train it.

3. Emergent Abilities Appear

As models scale, new capabilities appear that weren't present in smaller models:

In-context learning (learning from examples in the prompt)
Chain-of-thought reasoning (step-by-step problem solving)
Instruction following (understanding and executing complex requests)
Multi-step planning (breaking down tasks)

These abilities emerge at certain scale thresholds—but we can't predict exactly when or why.

4. Mechanistic Interpretability is Advancing

Researchers can now:

Identify features corresponding to recognizable concepts (Anthropic's "dictionary learning")
Trace pathways a model takes from prompt to response
Intervene on specific attention heads to control model behavior
Map geometric representations of knowledge in high-dimensional space

MIT Technology Review named mechanistic interpretability one of its 10 Breakthrough Technologies for 2026.

What We Don't Know: The Theory Gap

Despite these advances, fundamental questions remain unanswered.

1. Why Does Self-Attention Work So Well?

The self-attention mechanism is the core of transformers. It allows the model to:

Look at every token in a sequence simultaneously
Compute relationships between all tokens
Parallelize computation across GPUs

We know how it works mathematically. We can implement it. We can optimize it.

But why does this particular mechanism enable reasoning, creativity, and generalization?

2. Why Do Scaling Laws Hold?

Scaling laws are descriptive, not explanatory.

They tell us:

"If you 10x the compute, performance improves by Y%"

They don't tell us:

"Why does more compute lead to better reasoning?"
"What changes in the model's internal structure as it scales?"
"Why do power laws govern AI performance?"

As one researcher put it: "Scaling laws are like the ideal gas law. They predict behavior, but they're not a fundamental theory."

3. Why Do Emergent Abilities Appear?

Emergent abilities are the most mysterious phenomenon in AI.

At some scale threshold, models suddenly:

Learn to follow multi-step instructions
Perform arithmetic they weren't explicitly trained on
Reason through novel problems

Why?

Some researchers argue emergence is real—a phase transition in the model's internal representations.

Others argue it's a measurement artifact: we're using crude metrics that fail to capture gradual improvements in smaller models, making progress look sudden when it's actually continuous.

A 2026 paper proposed that LLMs are non-ergodic systems where capabilities emerge through discrete transitions guided by constraint interactions—but this is still a hypothesis, not a proven theory.

4. Why Does Pre-Training on Internet Data Work?

Models are trained on:

Web pages (Reddit, Wikipedia, blogs)
Code repositories (GitHub, Stack Overflow)
Books (fiction, non-fiction, technical manuals)

Somehow, this leads to models that can:

Diagnose diseases
Write legal contracts
Prove mathematical theorems
Generate novel protein structures

Why does next-token prediction on internet text lead to general reasoning abilities?

As one X commenter put it:

"Tech VCs thought AI was sentient because they had never read books and hence the LLM was beyond anything they had ever seen."

The implication: LLMs might just be extremely good pattern matchers, not true reasoners. But if that's the case—why do they generalize so well?

The Thermodynamics Analogy

Multiple replies to Lu's post invoked a historical analogy:

"We're using AI like we used thermodynamics before we understood atomic theory."

In the 1800s, engineers built steam engines using thermodynamics. They could:

Predict how much work an engine would produce
Optimize efficiency
Build machines that powered the Industrial Revolution

But they didn't understand why thermodynamics worked until the kinetic theory of gases and statistical mechanics explained heat and pressure in terms of molecular motion.

Similarly, AI researchers today can:

Predict how models will perform (scaling laws)
Optimize architectures (transformers, MoE, SSMs)
Build systems that transform industries (ChatGPT, Codex)

But we lack the fundamental theory that explains why transformers work at a mechanistic, first-principles level.

The Mechanistic Interpretability Response

One camp of researchers argues we're making progress toward a theory through mechanistic interpretability.

What is Mechanistic Interpretability?

Mechanistic interpretability reverse-engineers neural networks to understand how AI actually thinks. It aims to uncover how a model computes outputs by analyzing:

Weights (learned parameters)
Neuron activations (what fires when)
Information pathways (how data flows through layers)

Recent Breakthroughs

Anthropic has led this field with several major advances:

1. Feature Dictionary Learning (2024)

Anthropic announced a "microscope" that identified features corresponding to recognizable concepts:

Neurons that activate for "Golden Gate Bridge"
Neurons that activate for "code syntax errors"
Neurons that activate for "sarcasm"

2. Circuit Tracing (2025)

Anthropic traced whole sequences of features and the path a model takes from prompt to response, showing:

How models compose simpler features into complex concepts
How attention heads route information
How models perform multi-step reasoning

3. Targeted Intervention (2026)

Researchers demonstrated selective control of model behavior by:

Suppressing toxic outputs
Manipulating semantic content
Enhancing factual accuracy

Anthropic has stated its goal: "Reliably detect most AI model problems by 2027 using interpretability tools."

The Geometric Foundation

Recent research suggests knowledge is encoded as geometry in high-dimensional space.

Models represent concepts as vectors, and relationships between concepts correspond to geometric relationships (distances, angles, subspaces).

This explains:

Word analogies (king - man + woman = queen)
Concept composition (combining features to form new ideas)
Transfer learning (representations generalize across tasks)

But why does gradient descent on next-token prediction lead to these semantically meaningful geometric structures?

That's still an open question.

The Scaling Law Debate

Another reply thread focused on scaling laws as a partial theory.

What Scaling Laws Tell Us

The OpenAI Scaling Laws paper (2020) and DeepMind's Chinchilla paper (2022) established:

Loss decreases as a power law with:
- Model size (N)
- Dataset size (D)
- Compute (C)
Optimal allocation of compute requires balancing model size and data:
- Training a 70B model on 1.4T tokens is better than training a 175B model on 300B tokens (same compute budget)
Emergent abilities correlate with scale:
- In-context learning improves predictably
- Instruction following emerges at ~10B parameters
- Chain-of-thought reasoning emerges at ~100B parameters

What Scaling Laws Don't Tell Us

Scaling laws are phenomenological: they describe what happens, not why.

They don't explain:

Why power laws govern AI performance
Why the exponents have the values they do
Why emergent abilities appear at specific thresholds
What changes inside the model as it scales

A 2026 unified framework connected scaling laws to in-context learning emergence, showing that ICL performance follows power-law relationships with model depth, width, context length, and training data—but the exponents are determined by task structure.

This is progress toward theory, but still descriptive, not first-principles.

The "AI Doesn't Actually Work" Argument

Some replies pushed back on the premise, arguing AI doesn't work as well as claimed.

One X commenter wrote:

"I would be way more bullish on AI if it actually worked and was actually replacing real humans at scale. Nothing is changing and we're being lied to. The tools don't work! They are expensive! And high maintenance. Dot com bubble 2.0."

This reflects growing AI skepticism as enterprises face:

Cost explosions (Microsoft banning Claude Code due to token costs)
Unreliable outputs (arXiv imposing one-year bans for AI-generated errors)
Overhyped capabilities (agentic fatigue)

The Counterargument

Others countered:

AI augmentation is real (Gary Tan's 400x productivity with Claude Code)
AI job impact is gradual (~11,000 net U.S. jobs lost monthly, not millions)
AI science is advancing (AlphaFold 3, drug discovery, materials science)

The debate reflects a gap between hype and reality—but also genuine progress amid unrealistic expectations.

The Neuroscience Parallel

Some replies drew parallels to neuroscience, where we:

Understand how neurons work (action potentials, synapses)
Can map brain regions (fMRI, EEG)
Still don't fully understand consciousness, reasoning, or memory

One commenter noted:

"We know how AI works. We don't fully know why it works as well as it does. That's an important distinction."

This mirrors neuroscience:

We know how neurons fire
We don't know why consciousness emerges

Similarly:

We know how transformers compute (matrix multiplications, attention)
We don't know why they generalize so well

The "Practical Results Matter More" Argument

A pragmatic camp argued: Who cares about theory if it works?

One reply stated:

"The real-world deployments showed augmentation, not replacement. Humans plus AI is the winning formula."

This reflects the engineering mindset: prioritize building useful systems over understanding fundamental mechanisms.

Historically, this has worked:

Steam engines powered the Industrial Revolution before we understood thermodynamics
Antibiotics saved millions before we understood molecular biology
Vaccines worked before we understood immunology at the cellular level

But in each case, theory eventually caught up and enabled:

Better engines (internal combustion, turbines)
Better drugs (targeted therapies)
Better vaccines (mRNA vaccines)

Why Theory Matters for AI

Understanding why AI works isn't just academic—it has practical implications.

1. Safety and Alignment

If we don't understand why models produce certain outputs, we can't:

Predict when models will fail
Detect deceptive or harmful behavior
Align models with human values reliably

Anthropic's interpretability research aims to solve this by 2027, but we're not there yet.

2. Efficiency

Understanding why scaling works could help us:

Train smaller models that perform as well as larger ones
Reduce compute costs (currently spiraling out of control)
Design better architectures (moving beyond transformers)

State Space Models and Mixture of Experts architectures are attempts to move beyond transformers, but they're still empirical experiments, not theory-driven designs.

3. Generalization

Understanding why pre-training on internet data leads to general reasoning could help us:

Design better training data
Improve out-of-distribution generalization
Reduce hallucinations and errors

4. Scientific Discovery

A theoretical understanding could:

Accelerate AI progress (rather than trial-and-error scaling)
Unlock new capabilities (analogous to quantum mechanics enabling semiconductors)
Predict limits (what AI can and can't do)

The Current State of AI Theory

So where are we now?

What We Have

Scaling laws that predict performance
Mechanistic interpretability that maps features and circuits
Emergent abilities that correlate with scale
Geometric interpretations of knowledge representation

What We're Missing

A unified theory explaining why transformers work
First-principles understanding of emergent abilities
Predictive models of what capabilities will appear at what scale
Fundamental limits of current architectures

Current Research Directions

2026 breakthroughs include:

Non-ergodic frameworks for emergence
Unified scaling law theories connecting ICL and model size
Circuit-level interventions for targeted control
Geometric theories of knowledge representation

But none of these constitute a complete, first-principles theory.

What Happens Next?

Three scenarios:

Scenario 1: Theory Catches Up

AI research develops a unified theory (like statistical mechanics for thermodynamics) that explains:

Why transformers work
Why scaling laws hold
How emergent abilities appear

This enables theory-driven AI design and predictable capabilities.

Likelihood: Medium. Mechanistic interpretability is making progress, but we're not close to a unified theory yet.

Scenario 2: Empirical Progress Continues Without Theory

AI systems keep improving through:

Scaling (bigger models, more data)
Architectural tweaks (MoE, SSMs, new attention mechanisms)
Empirical optimization (RLHF, better training techniques)

Theory lags behind, but practical results drive adoption.

Likelihood: High. This is the current trajectory.

Scenario 3: Scaling Hits a Wall

Without theoretical understanding, we:

Can't predict when scaling will stop working
Can't design better architectures
Hit diminishing returns on compute investment

AI progress slows dramatically as empirical scaling plateaus.

Likelihood: Low-Medium. Some evidence of diminishing returns emerging, but not a hard wall yet.

What This Means for Developers

If you're building with AI, here's what Lu's observation means:

1. Expect Unpredictability

Without a theory, you can't fully predict when models will:

Fail on edge cases
Hallucinate confidently
Refuse valid requests

Design systems with human oversight and fallbacks.

2. Empirical Testing is Critical

Since theory can't predict behavior, test extensively:

Red-teaming
Adversarial examples
Real-world deployments

3. Stay Updated on Interpretability

Mechanistic interpretability tools are improving. Follow:

Anthropic's research (leading the field)
OpenAI's safety work
Academic conferences (NeurIPS, ICML)

4. Prepare for Paradigm Shifts

If a theoretical breakthrough happens, it could:

Obsolete current architectures (like transformers replacing RNNs)
Unlock new capabilities (like attention unlocked sequence modeling)
Change cost structures (making current systems cheaper or obsolete)

ACM — causal abstraction & LLM reasoning (Jul 7, 2026) — Icard / Geiger; science catching up
Anthropic J-space & J-lens — frontier causal interventions
Anthropic Natural Language Autoencoders — Latest interpretability research
Gary Tan's 400x Productivity with Claude Code — AI augmentation in practice
Sam Altman and Dario Amodei Walk Back AI Jobs Apocalypse — Reality check on AI impact
Agentic Fatigue and the Productivity Paradox — When AI disappoints
Scalable Oversight: RLHF, Constitutional AI, Weak-to-Strong — AI alignment techniques

Conclusion

Mandy Lu's statement—"we still have no satisfying theory for why AI works"—is both alarming and accurate.

We've built systems that:

Pass medical exams
Write production code
Reason through novel problems
Generate photorealistic images

But we can't fully explain why they work.

This isn't just an academic curiosity. Without theory, we:

Can't predict failures
Can't guarantee safety
Can't optimize efficiency
Can't anticipate limits

We're flying blind on the most powerful technology of the 21st century.

The good news: progress is happening. Mechanistic interpretability, scaling law research, and geometric theories are advancing. Anthropic aims to "reliably detect most AI model problems by 2027."

The bad news: we're not there yet. And in the meantime, billions of dollars and critical decisions depend on systems we don't fully understand.

For developers, the lesson is clear: build with humility. AI is powerful, but unpredictable. Test rigorously. Keep humans in the loop. Stay updated on interpretability research.

And watch for the breakthrough—when it comes, it could change everything.

Sources

AI theory and interpretability research evolve rapidly. This analysis reflects the state of knowledge as of June 2026. For the latest research, follow Anthropic Research, OpenAI Research, and leading ML conferences.