Can LLMs Drive Cars or Plan in the Real World? Yann LeCun vs the AV Optimists
Yann LeCun argues LLMs cannot handle continuous physical signals or predict action consequences, after Andrew Gordon Wilson said AVs beat human drivers. Moravec's paradox, the teenager-driving analogy, digital AGI debate, and what comes after tokens.
Are autonomous vehicles proof that AI has solved physical intelligence, or proof that narrow engineering can outrun general models? A July 2026 X thread between Yann LeCun and Andrew Gordon Wilson reopened one of AI's oldest fault lines: what large language models can never do versus what specialized systems already do better than humans.
Wilson conceded physical agents are not broadly competent yet, then pointed at self-driving cars: safer than human drivers under typical conditions. LeCun widened the frame: the problem is not just robots. It is any high-dimensional, continuous, noisy signal, and any agent that cannot predict consequences.
TL;DR: the thread in one table
| Voice | Claim |
| --- | --- |
| @andrewgwils | Physical agents need new paradigms · AVs already beat humans in typical driving |
| @ylecun | LLMs only handle discrete symbols · real-world modalities are out of reach · agents need consequence prediction + planning |
| @CrunchyAI | Teen drivers bring 15 years of world learning + hours of practice |
| @ylecun | "Thank you, you are making my point" |
| @CrunchyAI | Digital AGI may arrive via memory + harness without physical competence |
| @ylecun | Humans are missing the G too: specialized but fast learners · optimistic, not about LLMs |
| @TonyZador | Moravec's paradox wins again |
| @ylecun | Paradox is 38 years old; every generation forgets it |
What Wilson said, and what LeCun heard
Andrew Gordon Wilson (machine learning professor) posted a nuanced yes-and:
Physical agents are not generally competent yet, and certainly new paradigms are needed for significant further advances in AI. But autonomous vehicles are now generally safer than human drivers under typical driving conditions.
That is a narrow-systems argument: decades of LIDAR, maps, simulation, safety cases, and regulatory gates produced one domain where machines statistically win, not a general recipe for humanoid housework or open-world manipulation.
Yann LeCun did not dispute AV statistics directly. He reframed:
It's not merely physical agents, it's anything that deals with something else than sequences of discrete symbols.
Any data modality that is high-dimensional, continuous, and possibly noisy is completely out of reach of current generative models.
That includes pretty much all real-world signals (aside from human language, computer languages and mathematics).
Furthermore, you can't have reliable agents unless they have the ability to predict the consequences of their actions and plan accordingly. LLMs simply don't.
explainx.ai read: Wilson cited engineering success in a closed world (roads, rules, ODD). LeCun cited architectural limits of next-token models (no latent physics, no action-conditioned rollouts). Both can be true.
The symbol world vs the sensor world
LeCun's taxonomy splits AI inputs into two buckets:
For the sensor world (high-dimensional, continuous, possibly noisy signals), LLM fit is weak: patchy via multimodal fine-tune, not native.
Token predictors compress the world into text about the world. That works for Stack Overflow and earnings calls. It does not substitute for a dynamics model that answers: if the gripper closes 2mm more, what happens to the glass?
That is the thesis behind world models, systems that learn environment transitions (Odyssey Starchild-1, NVIDIA Cosmos, Meta V-JEPA, Tencent HY-World), and why LeCun left Meta's LLM orbit for AMI Labs-style research.
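To make the gripper question concrete, here is a minimal sketch of an action-conditioned dynamics model: a function that takes a state and an action and returns the predicted next state. Everything in it is an illustrative assumption rather than anything from the thread or from the systems named above: the `GripState` type, the linear contact-force rule, and the three constants are hypothetical, and a learned world model would replace the hand-written `step` with a trained predictor.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GripState:
    gap_mm: float              # distance between the gripper fingers
    glass_intact: bool = True


# Hypothetical constants, chosen only so the toy example is easy to follow.
GLASS_DIAMETER_MM = 60.0
STIFFNESS_N_PER_MM = 5.0       # assumed linear contact stiffness
BREAK_FORCE_N = 12.0           # assumed force at which the glass cracks


def contact_force(gap_mm: float) -> float:
    """Force on the glass once the fingers squeeze past its diameter."""
    squeeze_mm = max(0.0, GLASS_DIAMETER_MM - gap_mm)
    return STIFFNESS_N_PER_MM * squeeze_mm


def step(state: GripState, close_by_mm: float) -> GripState:
    """Predict the next state if the gripper closes by `close_by_mm`."""
    new_gap = state.gap_mm - close_by_mm
    intact = state.glass_intact and contact_force(new_gap) <= BREAK_FORCE_N
    return GripState(gap_mm=new_gap, glass_intact=intact)


def rollout(state: GripState, actions: list[float]) -> list[GripState]:
    """Imagine a whole action sequence before executing any of it."""
    trajectory = [state]
    for action in actions:
        state = step(state, action)
        trajectory.append(state)
    return trajectory
```

Under these assumed numbers, `rollout(GripState(gap_mm=61.0), [2.0, 2.0])` predicts that the first 2 mm of closing is safe and the second 2 mm breaks the glass. The point is the interface, not the physics: the model answers "what happens if" before the action is taken.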
The teenager-driving thread: 15 years, then hours
@CrunchyAI (Luis Rosias) pushed back on any narrative that machines "learn to drive in a few hours like a teenager":
A teenager doesn't learn to drive a car in a few hours. They learn to drive a car with 15 years of in-world learning and a few hours of practice on top of that.
LeCun:
Thank you, you are making my point
The analogy is not anti-AV. It is anti-shortcut narrative:
    Years of passive world model (gravity, objects, social cues, depth)
    + Hours of active skill practice (steering, pedals, mirrors)
    = Competent novice driver
Compare to:
    Pretrain on internet text
    + Few-shot "you are a driver" prompt
    = ???
AV stacks spent years on simulation, mapping, and edge-case mining, not "a few hours of GPT fine-tune." The teenager metaphor explains why data efficiency and prior world knowledge matter for embodied tasks.
Digital AGI vs embodied AGI: talking past each other?
Luis Rosias extended the thread in a direction LeCun did not fully engage:
The G in AGI is environment-specific right now.
The environment that LLMs operate in is a very different one than the ones humans or cats generally operate in.
With a few improvements to memory/context management and continual learning, they can easily become what most would consider AGI without being able to understand or take any action in the physical world the way we can.
Replies mocked "just a few improvements," which is fair skepticism. Luis doubled down:
We have most of the pieces already… At this point it's less of a fundamental technology/model architecture problem and just a harness problem.
An LLM base with intelligent harness will be AGI in digital environment.
This maps to the agentic vs agentive split Eric Xing's paper names: agentic systems (Claude Code, Hermes orchestrators, MCP graphs) derive competence from external scaffolding; agentive systems internalize goals, world models, and learning.
Luis is betting on agentic digital AGI; LeCun is betting that without endogenous consequence models, harnesses hit a ceiling in the physical world and possibly in long-horizon planning.
@TonyZador replied that Moravec's paradox wins again. LeCun:
The Moravec paradox is 38 years old and we still need to remind every new generation of non-physical AI researchers about it.
Moravec's paradox (1988): evolution optimized perception and motor control for millions of years; reasoning is a thin layer on top. AI inverted the difficulty: chess before grasping.
| Easy for humans, hard for machines | Hard for humans, easy for machines (historically) |
| --- | --- |
| Walking, catching, seeing | Chess, arithmetic, trivia |
| Folding laundry, opening doors | Syntax, log proofs, token completion |
LLMs spectacularly conquered the right column. They did not dissolve the left column, though narrow AVs and factory arms chip away at slices of it.
"Humans are missing the G too"
@das_rdsm asked a sharp question: humans cannot do everything machines do, so are humans also not general?
LeCun:
Yes, humans are missing the G.
Humans are quite specialized.
But they are also adaptable and can learn new skills very quickly.
Machines are doing many things better than humans.
That's why they are useful and why we build them.
This defuses both hype and doom. General is not "does everything." It is fast adaptation in new environments, where humans still lead on one-shot manipulation and novel tool use, and machines lead on bandwidth, precision, and tireless repetition.
"Pessimistic": LeCun's actual position
@SteveGrubbsVXR: "Your pessimism is suffocating."
LeCun, two hours later:
I'm quite optimistic. Just not about LLMs.
That is the headline for 2026 research politics: skepticism toward LLM-as-universal-intelligence is not skepticism toward AI. Capital and talent are bifurcating.
| What AV results show | What they do not show |
| --- | --- |
| Per-mile fatality rates can beat human averages in geo-fenced, mapped domains | Humanoid robots doing dishes |
| Redundant sensors + simulation at scale | Open-world LLM agents with hands |
| Regulatory + engineering culture around failure modes | "AGI in a weekend" from prompt tuning |
Waymo, Cruise successors, and Chinese robotaxi fleets are systems engineering victories, not receipts that GPT-5 understands torque.
When evaluating any "physical AI is solved" headline, ask:
Operational design domain (ODD): weather, geography, construction zones?
Consequence model: sim before act, or text before act?
Data modality: symbols about driving, or native sensor rollouts?
Planning horizon: milliseconds (lane keep) vs minutes (multi-step chores)?
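The four questions above can be kept as a small checklist structure. This is only a sketch of one way to record them: the `PhysicalAIClaim` fields, the wording of the returned messages, and the 60-second cut-off between "milliseconds" and "minutes" are all assumptions made for illustration, not criteria from the thread.

```python
from dataclasses import dataclass


@dataclass
class PhysicalAIClaim:
    headline: str
    odd_documented: bool            # weather, geography, construction zones stated?
    simulates_before_acting: bool   # True: sim before act; False: text before act
    uses_native_sensor_data: bool   # True: sensor rollouts; False: symbols about the task
    planning_horizon_s: float       # longest horizon the evidence actually covers


# Hypothetical threshold separating reflex-level control from multi-step chores.
LONG_HORIZON_S = 60.0


def open_questions(claim: PhysicalAIClaim) -> list[str]:
    """Return the checklist items the claim leaves unanswered."""
    questions = []
    if not claim.odd_documented:
        questions.append("Operational design domain is not stated")
    if not claim.simulates_before_acting:
        questions.append("No consequence model: acts on text, not simulation")
    if not claim.uses_native_sensor_data:
        questions.append("Trained on symbols about the task, not sensor rollouts")
    if claim.planning_horizon_s < LONG_HORIZON_S:
        questions.append("Short planning horizon: evidence covers reflex-level control only")
    return questions
```

A lane-keeping result with a documented ODD, simulation, and sensor data would still come back with one open item (the short horizon), which is the distinction the checklist is meant to surface.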
What would change LeCun's mind?
Not a bigger context window alone. Indicators he'd likely accept as progress:
Action-conditioned world models that roll forward physical state under intervention
Sample-efficient motor learning: robot skills from minutes, not millions of trajectories
Unified architectures for vision-audio-touch without tokenizing away geometry
Internal planning loops, not MCP graphs bolted onto a frozen LLM
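As a rough illustration of what "predict consequences and plan accordingly" means mechanically, the sketch below plans by searching over imagined rollouts of a dynamics model. It is a toy under stated assumptions, not LeCun's proposal or any named system: the one-dimensional cart `model`, the three-value action set, the terminal-only `cost`, and the exhaustive search are all hypothetical simplifications. Real systems would use a learned model and a smarter optimizer.

```python
from itertools import product

# Hypothetical discrete action set: decelerate, coast, accelerate.
ACTIONS = (-1.0, 0.0, 1.0)


def model(state, action):
    """Toy dynamics: the action changes velocity, velocity changes position."""
    pos, vel = state
    vel = vel + action
    return (pos + vel, vel)


def imagine(state, actions):
    """Roll the model forward over an action sequence without acting."""
    for action in actions:
        state = model(state, action)
    return state


def cost(state, target):
    """Terminal cost: distance from the target plus leftover speed."""
    pos, vel = state
    return abs(pos - target) + abs(vel)


def plan_actions(state, target, horizon=4):
    """Pick the action sequence whose imagined outcome has the lowest cost."""
    candidates = product(ACTIONS, repeat=horizon)
    return min(candidates, key=lambda acts: cost(imagine(state, acts), target))
```

With these definitions, `plan_actions((0.0, 0.0), target=3.0)` returns `(1.0, 0.0, 0.0, -1.0)`: accelerate once, coast twice, brake once, arriving at position 3 with zero velocity. The search happens entirely inside the model, before any action is taken, which is the capability the list above asks for.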
Until then, the practical split remains:
Digital work: LLM + harness (loop engineering, tools, verification)
Physical work: specialized stacks (AV, drones, factory arms) + emerging world models
FAQ: quick answers
Did LeCun say self-driving cars don't work?
No. He shifted the conversation to LLM limits and general physical intelligence, not a takedown of AV safety stats.
Is "digital AGI" a real thing?
Operationally, many firms already treat autonomous knowledge work as the near-term AGI target. LeCun's camp would call that powerful automation, not intelligence that grounds in physical consequence.
Who is Andrew Gordon Wilson?
ML professor, Gaussian processes and Bayesian deep learning researcher, not an AV vendor advocate. His post was qualified optimism about one domain.
What should builders do Monday morning?
Match architecture to environment: don't prompt your way into robotics; don't build a world model for summarizing PDFs. Use LLMs where symbols dominate; use sim + sensors where physics dominates.
Thread summarized from @ylecun, @andrewgwils, @CrunchyAI, @TonyZador, and replies, July 2026. AV safety statistics vary by operator, geography, and ODD; verify primary sources before policy or investment decisions.