Will AI replace mathematicians? What IEEE’s “Big Mathematics” debate means for proofs, Lean, and your career

IEEE Spectrum’s June 2026 feature maps three futures for AI in math—tool, partner, or oracle. Here is what Aletheia, IMO gold, Lean formalization, and Terence Tao’s “Big Mathematics” actually change for researchers and students.

Jun 27, 2026 · 13 min read · Yash Thakker
Mathematics · AI Research · IEEE Spectrum · Proof Assistants · Lean · Terence Tao · Education

TL;DR — the questions people actually ask

  • Will AI replace mathematicians? Not wholesale—but it is already competitive on some abstract reasoning tasks, and the kind of work valued is shifting.
  • What changed in 2025–2026? IMO gold, Aletheia’s research-level output, OpenAI’s autonomous disproof of a major geometry conjecture, and LLMs accelerating Lean formalization.
  • What is Terence Tao’s “Big Mathematics”? Humans plus machines at scale, with formal verification as the trust layer.
  • Should students use AI for homework? The community’s worry: yes, they will—and they may skip building the intuition that makes someone a mathematician, not just an answer-fetcher.
  • Does “Lean compiles” settle everything? No—you still need humans to specify the right theorem and interpret whether the artifact is worth keeping.
  • Where is the primary source? Benjamin Skuse’s feature in IEEE Spectrum (June 25, 2026)—plus the skeptical thread that followed on Hacker News when the piece hit the front page.

Why this IEEE piece landed like a bombshell

Benjamin Skuse opens with a confession many applied-math Ph.D.s will recognize: re-read your thesis a decade later and you realize a modern LLM-assisted workflow might finish it in days. The pure mathematicians at the next desk over looked idle for years, and some never published. Only later did he understand they were not performing intelligence or masochism; they were pursuing the slow, private joy that comes when a hard idea clicks.

Jeremy Avigad at Carnegie Mellon describes that joy as neither purely aesthetic nor athletic, but as the feeling when “you’ve been thinking long and hard about something complex, difficult, and then—all of a sudden—it just comes together.” The IEEE piece asks whether AI threatens it.

The article is not a product launch. It is a status report from the Heidelberg Laureate Forum (September 2025), where young researchers heard elders describe futures in which superhuman AI forms conjectures, searches spaces, proves theorems, verifies proofs, and generalizes—without humans in the loop. Yang-Hui He’s line stuck: human mathematicians could become “priests to oracles.” Students in the hall, Skuse reports, looked devastated.

If you ship AI agents for a living, this is the same existential question wearing a chalkboard costume: When the machine can do the hard part, what is left for people—and does “left” still matter?


What AI has actually done in mathematics (not hype)

Computation helping math is old. The four-color theorem (1976) used a computer to check 1,936 cases humans could not realistically audit—controversial, but the human still proposed the strategy.

What shifted in the last two years is reasoning depth on open problems, not just brute enumeration.

Competition math at IMO gold level

In July 2025, systems from Google DeepMind and OpenAI reached gold-medal-level performance at the International Mathematical Olympiad, a contest of six brutal problems across algebra, combinatorics, geometry, and number theory. That put them alongside the world’s strongest high-school competitors. It is not “research mathematics,” but it ended the era of dismissing LLMs as stochastic parrots that regurgitate textbook exercises.

Aletheia and Ph.D.-level research output

Earlier in 2026, Google DeepMind’s experimental system Aletheia autonomously produced publishable Ph.D.-level results—work on structure constants in arithmetic geometry that is obscure to outsiders but significant because of the reasoning chain, not the headline theorem.

OpenAI and combinatorial geometry

Skuse ties this thread to a result explainx.ai covered in depth: a general-purpose OpenAI reasoning model disproved an important conjecture in combinatorial geometry—the same research line as the planar unit distance / Erdős problem. Fields medalist Tim Gowers and other external mathematicians treated it as a milestone: independent, original, sophisticated thinking from a model not trained only on that subfield.

Formalization: Gauss, Viazovska, and Lean

Another axis is proof assistants—Lean, Isabelle, Rocq—languages that verify mathematics step-by-step. Formalizing an informal proof used to be a multi-year slog.

In February 2026, Math, Inc.’s reasoning agent Gauss helped formalize Maryna Viazovska’s Fields Medal-winning sphere-packing proof in Lean in days, then autonomously formalized the harder 24-dimensional case in two weeks. IEEE walks through Euclid’s infinitude-of-primes argument in human prose versus Lean code: every hidden “clearly” becomes an explicit theorem invocation.
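To make the “every hidden clearly becomes an explicit theorem invocation” point concrete, here is a sketch of Euclid’s argument in Lean 4. It is not the listing from the IEEE piece. It closely follows the proof of `Nat.exists_infinite_primes` in Mathlib, assumes a project with Mathlib installed, and uses lemma names that may have been renamed or deprecated in newer Mathlib releases. The theorem name `euclid_sketch` is made up for illustration.

```lean
import Mathlib

open Nat

-- Euclid: for every n there is a prime p with n ≤ p.
-- Assumption: lemma names follow Mathlib's own proof and may vary by version.
theorem euclid_sketch (n : ℕ) : ∃ p, n ≤ p ∧ Nat.Prime p :=
  -- Candidate: the smallest factor greater than 1 of n! + 1.
  let p := minFac (n ! + 1)
  -- "Clearly n! + 1 is not 1": this needs the fact that n! is positive.
  have f1 : n ! + 1 ≠ 1 := ne_of_gt <| succ_lt_succ <| factorial_pos _
  -- "Clearly the smallest such factor is prime": a named lemma.
  have pp : Nat.Prime p := minFac_prime f1
  -- "Clearly p cannot be ≤ n": otherwise p ∣ n! and p ∣ n! + 1, so p ∣ 1.
  have np : n ≤ p :=
    le_of_not_ge fun h =>
      have h₁ : p ∣ n ! := dvd_factorial (minFac_pos _) h
      have h₂ : p ∣ 1 := (Nat.dvd_add_iff_right h₁).2 (minFac_dvd _)
      pp.not_dvd_one h₂
  ⟨p, np, pp⟩
```

Each step a textbook would wave through with “clearly” shows up here as a named lemma the kernel can check, which is the property that makes contributions from strangers or machines auditable.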

That matters for collaboration mechanics, not just vanity. Terence Tao’s point (below): when verification is mechanical, you can take ideas from strangers—or machines—without reputation as the gatekeeper.

But the Gauss milestone also triggered the sharpest public fight in formal mathematics this year—not about whether Lean accepted the proof, but about what kind of artifact counts as mathematics.

The Mathlib vs “verified blob” fight

David Bessis’s “The fall of the theorem economy” captures the backlash. Mathlib—Lean’s dominant library—is human-curated formal math with clean abstractions other proofs can import. Gauss’s Viazovska formalization, by contrast, landed as a ~200,000-line codebase that verifies but, critics argue, exposes no intelligible interface: who merges an unaudited machine-generated blob into the shared trunk of global science?

Alex Kontorovich’s account (cited in that essay) frames the worry bluntly: autoformalization without Mathlib’s scaffolding produces “mathslop”—results known to be correct via proofs no human has understood—sitting beside the maintainable layer humans actually build on.

The counterargument in the ensuing debate is compilation-shaped: if Lean’s kernel accepts the proof, logical validity is mechanical, like trusting that a C++ binary compiled without treating every opcode as required reading. Correctness and comprehensibility decouple. We use anesthesia, gravity, and x86 without full mechanistic understanding; perhaps mathematics can too.

Both sides have a point:

  • Skeptics ask what you do with a verified orphan—refactor cost, peer-review backlog, and whether funders confuse “formalized” with “understood.” The human-led sphere-packing formalization project (arXiv, April 2026) explicitly wanted a readable, maintainable, reusable artifact—not only a checkmark.
  • Optimists note AI’s main win is often search, not verification: finding the creative route through a solution space humans exhausted. Formal checkers then handle the tedious expansion—projects like Flyspeck (formalizing the Kepler conjecture) already showed that verification alone can consume a career.

Eugene Wigner’s line from the computer-assisted era still haunts both camps: “It is nice to know that the computer understands the problem, but I would like to understand the problem, too.”


What skeptics on Hacker News got right (without the hype)

When IEEE’s piece reached HN, the comment thread was less “AI will replace everyone tomorrow” and more epistemology with commit hashes. Useful corrections to headline narratives:

“Lean compiles or it doesn’t” — half true

If an LLM outputs Lean and the typechecker accepts it, the inference chain is valid for the formal statement encoded—every step is a mechanical check, not a seminar-room “clearly.” That is genuinely stronger than a PDF that looks like a proof.

The catch: you must still verify the formal statement matches the mathematical claim you care about. Mis-stated definitions, wrong imports, or vacuous lemmas are the formal analog of assert True in a test suite: green CI, wrong product. “Lean compiles” answers “is this derivation valid?”, not “did we formalize the right problem?”
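A minimal Lean 4 sketch of that failure mode, using only core Lean (no Mathlib). The theorem name `impressive_looking` is hypothetical; both declarations are accepted by the checker, and neither says what a casual reader might assume.

```lean
-- Looks like a striking claim about every natural number...
theorem impressive_looking (n : Nat) (h : n < 0) : n * n = 42 :=
  -- ...but `n < 0` can never hold for a natural number, so the
  -- statement is vacuously true and carries no information.
  absurd h (Nat.not_lt_zero n)

-- Definitions matter too: subtraction on `Nat` truncates at zero,
-- so an equation that is false over the integers is a theorem here.
example : (2 : Nat) - 3 = 0 := rfl
```

Catching the first problem means reading the hypotheses; catching the second means knowing which definition of subtraction is in play. Neither is something the kernel will flag.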

You still need Terence Tao—or someone close

A recurring theme: oversight skill scales with the model. Spotting a subtle error in a 300-step LLM proof sketch—or knowing when to reject a plausible-looking lemma—requires the same depth that makes you useful reviewing senior engineers’ code. That cuts against “mathematicians obsolete” narratives: the oracle future still needs priests who can read the entrails, even if the entrails are Lean goals.

Tao’s actual workflow (often misread) is interactive: humans and AI alternate on the same Lean file; the proof assistant gives feedback to both. The AI is not a solo author you blindly publish.

Mochizuki reminds us verification culture ≠ instant certainty

Shinichi Mochizuki’s claimed proof of the abc conjecture, posted in 2012, sat for years before Peter Scholze and Jakob Stix reported in 2018 what they consider a fatal gap, a conclusion Mochizuki disputes. The delay came despite serious human effort, and not because Lean was unavailable: novel frameworks resist quick audit. Formal tools help most when definitions already live in a shared library; they do not magically dissolve social trust problems.

AI is still mostly at the “you missed a trick” stage

Several commenters noted today’s wins resemble applying known machinery to problems humans almost had—OpenAI’s Erdős geometry result connected existing algebraic number theory to a geometric question, rather than inventing a wholly new field. Inductive leaps that create new ideas—not just new proofs of old conjectures—remain where human mathematicians expect to lead for a while.

Why not all three futures at once?

IEEE’s tool / partner / oracle table is a teaching device, not a forced choice. You might use AI as a calculator for algebra, a collaborator on Lean formalization, and still treat a Millennium Problem solution as an oracle you merely want true—even if you never understand the proof. The “answer is 42” anxiety is real: Hard Theorem platforms—results built atop lemmas you cannot reconstruct yourself—already exist in human mathematics, like using the four-color theorem without hand-checking every case.

Access and review queues get worse before they get better

Centralization worries (proprietary models, token burn, rich-school vs poor-school access) echo IEEE’s elitism section—and HN’s blunt “always has been” replies about inequality do not make the new layer harmless. If every AI draft proof joins the peer-review pile, verification throughput becomes the bottleneck, not idea generation.
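The throughput argument is simple arithmetic, sketched below in Python. The function `review_backlog` and all the numbers are hypothetical, chosen only to show the shape of the problem: once drafts arrive faster than reviewers can check them, the backlog grows linearly and never clears.

```python
def review_backlog(arrivals_per_week, reviews_per_week, weeks):
    """Return the unreviewed-draft backlog at the end of each week."""
    backlog = 0
    history = []
    for _ in range(weeks):
        # New drafts join the pile; reviewers clear what they can.
        backlog = max(0, backlog + arrivals_per_week - reviews_per_week)
        history.append(backlog)
    return history


# Hypothetical rates: reviewers keep up before AI drafts, not after.
before = review_backlog(arrivals_per_week=8, reviews_per_week=10, weeks=4)
after = review_backlog(arrivals_per_week=40, reviews_per_week=10, weeks=4)
print(before)  # [0, 0, 0, 0]
print(after)   # [30, 60, 90, 120]
```

In this toy model, raising idea generation does nothing for the number of checked results per week; only raising `reviews_per_week` does.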

For benchmark context on how labs measure this progress—and where gaps remain—see our AI benchmarks guide and FrontierMath discussion in the GPT-5.6 preview breakdown.


Three futures: tool, partner, oracle

IEEE distills the community split into a table worth memorizing:

| | AI as tool | AI as partner | AI as oracle |
| --- | --- | --- | --- |
| Role of AI | Assistant | Collaborator | Autonomous researcher |
| What matters most | Human understanding | Shared discovery | Answers |
| Human fear level | Low | Mixed | High |

Tool camp (Akshay Venkatesh, Maia Fraser): Mathematics is partly a social technology for agreement—numbers mean the same thing to everyone. The struggle to understand is not wasted effort; it is the point. An AI proof of a stubborn conjecture is useful information, Fraser argues, but the open problem of finding an elegant human proof remains valuable even if none exists.

Partner camp (Terence Tao): Welcome “Big Mathematics”—massive, decentralized projects where creative humans set direction and AI chews through technical work. Tao already collaborates online with people he has never met; he notes that in the future he may not know whether a collaborator is human.

Oracle camp (pragmatists, some laureates): For Millennium Prize Problems and similar, many mathematicians “would sell their soul for the solution,” Jeremy Avigad jokes. If AI gets there first, fine—as long as the answer is true.

Most practitioners IEEE quotes do not want full replacement. The dread at Heidelberg was about the oracle path becoming default—funders, students, and society treating the journey as optional.

One HN nuance worth keeping: the “priests to oracles” metaphor may misfire. Delphi’s interpreters were embedded in politics and subjectivity; mathematics aspires to objective agreement. If humans become interpreters, the job looks less like mystic ritual and more like technical due diligence—still human, still skilled, still not obsolete, but very different from the romantic lone-genius image.


Terence Tao on formal verification as the trust layer

Tao’s version of partnership depends on formalization. Opening a collaborative proof project without safeguards, he says, “would just be a disaster.” But mathematics uniquely allows a verification layer: checked proofs filter rubbish.

That rhymes with how serious AI engineering teams treat evals and verifiers—you do not trust an agent because it sounds confident; you trust a pipeline that checks outputs. explainx.ai’s loop engineering for coding agents is the software analogue: humans set intent; machines iterate; gates catch failure.

For math, Lean is the gate. For your repo, it might be tests, linters, or proof-carrying specifications. The pattern is the same: scale collaboration by automating verification, not by automating trust in charisma.
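The pattern can be sketched in a few lines of Python. This is a toy, not anyone’s production pipeline: `propose_factor_pair` is a hypothetical stand-in for an unreliable model, `verify_factor_pair` plays the role of Lean or a test suite, and factoring is used only because checking an answer is far cheaper than finding one.

```python
import random
from typing import Callable, Optional, Tuple

Pair = Tuple[int, int]


def propose_factor_pair(n: int, rng: random.Random) -> Pair:
    """Hypothetical proposer: a confident but usually wrong guess."""
    a = rng.randint(2, n - 1)
    return a, n // a  # only correct when a happens to divide n


def verify_factor_pair(n: int, pair: Pair) -> bool:
    """The gate: a cheap mechanical check that ignores who proposed."""
    a, b = pair
    return a > 1 and b > 1 and a * b == n


def gated_search(
    n: int,
    propose: Callable[[int, random.Random], Pair],
    verify: Callable[[int, Pair], bool],
    max_attempts: int = 10_000,
    seed: int = 0,
) -> Optional[Tuple[Pair, int]]:
    """Iterate proposals; return the first verified one and its attempt count."""
    rng = random.Random(seed)
    for attempt in range(1, max_attempts + 1):
        candidate = propose(n, rng)
        if verify(n, candidate):
            return candidate, attempt
    return None  # nothing passed the gate, so nothing is accepted


result = gated_search(91, propose_factor_pair, verify_factor_pair)
print(result)
```

The design choice worth copying is that `gated_search` never inspects how a candidate was produced. Swapping in a better proposer changes how many attempts it takes, not what gets accepted.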


Risks IEEE highlights (and why developers should care)

Access and elitism

Historically, a mathematician needed paper, pen, and time. If frontier progress requires proprietary models at DeepMind or OpenAI scale, math risks becoming an activity of well-funded labs—a concern parallel to closed-source frontier models vs local alternatives in software.

Motivation and intellectual atrophy

Venkatesh asks a personal question: if you spent years slowly understanding a proof and a computer could do large chunks overnight, will you still spend the years? The risk is worse for students, who can jump straight to answers and never build the subconscious intuition Maia Fraser describes: starting from a fuzzy sense that something should be true and refining it into rigor.

That is not abstract. It is the same debate as letting Copilot write your first Rust program before you understand ownership—or using an agent to skip system design. The tool is not evil; skipping the struggle has a compounding cost.

Publication and norms

The community is responding with essays, workshops, and journal debates—plus institutional guidelines on AI in research and publication. Akshay Venkatesh’s “Mathematicians in the Age of AI” essay (arXiv, March 2026) urges the field to own the technology rather than resist it, while still teaching core intuition.


So is AI “sucking the soul out of math”?

Skuse’s closing is nuanced. In one sense AI does the opposite: it forces the profession to articulate why mathematics exists—beauty, shared understanding, better problem-solving in ordinary life (Jessica Randall’s point that math trained her mind for everything else).

In another sense, the practice is reshaping in ways that may not reverse: more formal proofs, more machine-generated lemmas, more collaborations where humans never read every line.

For builders on explainx.ai, the transferable lesson is not “learn Lean tomorrow.” It is:

  1. Separate answers from understanding. Useful in math, essential in agentic systems that can hallucinate convincingly.
  2. Invest in verification layers. Tao’s Lean; your tests, types, and eval harnesses.
  3. Pick your future intentionally. Tool, partner, or oracle—IEEE’s table applies to how you use AI in engineering, not only how Fields medalists use it in geometry.

What to watch next

  • Formalization velocity: If LLMs keep compressing Lean work from years to weeks, expect large public math projects (similar in spirit to Polymath, but machine-augmented).
  • Mathlib integration vs orphan repos: Whether autoformalized proofs get refactored into reusable libraries—or stay as verified one-offs humans cite but never read.
  • Open vs closed capability: Whether publishable math stays concentrated in a few labs—or whether open models catch up on reasoning benchmarks like FrontierMath Tier 4.
  • Education policy: University honor codes and olympiad rules are already adapting; the first generation raised on “AI did my proof sketch” will expose whether intuition atrophies or shifts form.
  • Peer review: OpenAI’s geometry result and Aletheia’s output still move through human verification cultures built for PDFs and seminars—not agent logs. Journal rules also still exclude LLMs as authors, which is a separate question from proof quality.
  • Elegance vs brute force: A quiet bet in the skeptical thread: remaining open problems may yield correct but ugly proofs—verified blobs with no short human story.

Bottom line

Benjamin Skuse’s IEEE Spectrum feature (June 25, 2026) is the clearest mainstream synthesis yet of mathematicians’ motivation crisis and collaboration opportunity in the AI era. AI is not yet a universal replacement for human mathematicians—but it is no longer confined to homework help or four-color-style case crunching.

Whether you become a priest to an oracle, a partner in Big Mathematics, or a stubborn humanist insisting on beautiful proofs is partly personal—and partly structural, depending on what funders, journals, and classrooms reward.

If you build AI systems for a living, watch math closely. It is the field where correctness is checkable, progress is measurable, and the social contract around human effort is being renegotiated in public—with better rigor than most tech keynote rhetoric.

For the Erdős geometry milestone specifically, read our full breakdown of OpenAI’s planar unit distance result. For how labs score reasoning progress more broadly, see AI benchmarks in 2026.
