
One DNA Test to Replace 15: Long-Read Sequencing and the Future of Rare Disease Diagnosis

Researchers at Radboud University Medical Center published a landmark study in the New England Journal of Medicine showing long-read genome sequencing yields 3% more diagnoses and can replace 15 existing tests — recommending it as the global first-choice diagnostic for rare genetic disorders.

By Yash Thakker · 10 min read
Topics: Genomics · Rare Diseases · Medical Research · DNA Sequencing · Biotechnology

For millions of people living with rare genetic diseases, the diagnostic journey is often measured not in days but in years — sometimes decades. A battery of tests, one after another, each with its own waiting period, its own specialist, and its own chance of coming back inconclusive. An estimated 400 million people worldwide live with rare diseases. For most of them, getting a name for what they have is the beginning of everything: treatment access, prognosis, family planning, community.

A study published this week in the New England Journal of Medicine describes a technology that could sharply compress that journey. It also makes the current diagnostic standard look like assembling a jigsaw puzzle from tiny pieces when much larger pieces were available from the start.

What the Radboud Study Found

Researchers at Radboud University Medical Center in the Netherlands tested long-read genome sequencing against conventional diagnostics in approximately 1,000 patients with suspected rare genetic disease.

The headline result: of 832 patients with rare genetic disease, 160 (19.2%) received a conclusive diagnosis using long-read sequencing — a 3% improvement over what conventional short-read methods delivered in the same cohort.
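
The arithmetic behind the headline figure can be checked in a few lines. This is a minimal sketch using only the numbers quoted above. The article does not say whether "3%" means percentage points of the cohort or a relative increase in the number of diagnoses, so both readings are computed and each is labelled as an assumption.

```python
# Figures quoted in the article: 160 conclusive diagnoses among 832 patients.
DIAGNOSED = 160
COHORT = 832


def yield_pct(diagnosed: int, cohort: int) -> float:
    """Diagnostic yield as a percentage of the cohort."""
    return 100 * diagnosed / cohort


long_read_yield = yield_pct(DIAGNOSED, COHORT)

# Reading 1 (assumption): "3% more" means 3 percentage points of the cohort.
extra_if_points = 0.03 * COHORT

# Reading 2 (assumption): "3% more" means 3% more diagnoses than the baseline,
# i.e. DIAGNOSED = baseline * 1.03.
extra_if_relative = DIAGNOSED - DIAGNOSED / 1.03

print(f"long-read yield: {long_read_yield:.1f}%")
print(f"extra diagnoses, percentage-point reading: {extra_if_points:.0f}")
print(f"extra diagnoses, relative reading: {extra_if_relative:.0f}")
```

The two readings differ by roughly a factor of five in this cohort, which is why the definition matters when the figure is extrapolated to larger populations.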

Three percent sounds modest. In context, it is not. When applied across a patient population of hundreds of millions globally, a 3% improvement in diagnostic yield represents an enormous number of families who get an answer where they otherwise would not.

But the diagnostic yield number is almost secondary to the other finding: the single long-read test can replace up to 15 separate existing diagnostic tests that patients would otherwise undergo sequentially.

Professor Lisenka Vissers, Professor of Translational Genomics at Radboudumc, summarised the recommendation directly: "We showed that the new test yields 3% more diagnoses. It can also replace 15 other tests. We recommend using this test worldwide as the first choice."

The Jigsaw Puzzle Problem with Current Diagnostics

To understand why long-read sequencing is a step change rather than an incremental improvement, it helps to understand what current short-read sequencing actually does.

The standard approach — short-read sequencing — breaks DNA into fragments of roughly 300 base pairs each, sequences those fragments individually, and then uses computational assembly to piece them back together into a coherent picture of the genome. The analogy the researchers use is apt: it is like trying to assemble a jigsaw puzzle from tiny pieces. Each piece contains accurate local information, but reassembling it correctly — especially across complex, repetitive, or structurally variable regions — is computationally hard and error-prone.

Long-read sequencing, using technologies like Oxford Nanopore and PacBio, reads continuous DNA segments of up to 20,000 base pairs. That is roughly 67 times the length of a 300-base-pair short read. Using larger puzzle pieces does not just make assembly easier; it makes previously invisible regions of the genome accessible for the first time.

Specifically, long reads can detect:

  • Structural variants: large rearrangements, duplications, or deletions that span hundreds or thousands of base pairs and are frequently missed by short-read methods
  • Repeat expansions: a class of mutations where a DNA sequence repeats more times than normal, common in neurological rare diseases, that cannot be sized accurately once the repeat tract is longer than a short read
  • Complex rearrangements: inversions and translocations that require long-range context to interpret
  • Challenging pathogenic variants: a January 2025 study by the same Radboud team in the American Journal of Human Genetics showed long-read sequencing could identify 93% of pathogenic variants that are difficult or impossible to detect with conventional short reads
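
The repeat-expansion case lends itself to a small worked example. The sketch below is a toy model, not the study's pipeline: it assumes a read can size a repeat only if it covers the whole repeat tract plus a short unique "anchor" on each side, and it counts how many read start positions satisfy that. The locus coordinates, the 20-base anchor, and the function name are all illustrative assumptions.

```python
def spanning_read_starts(locus_len: int, rep_start: int, rep_end: int,
                         read_len: int, anchor: int = 20) -> int:
    """Count read start positions whose read covers the repeat [rep_start, rep_end)
    plus `anchor` unique bases on both sides. Toy model, hypothetical coordinates."""
    # Earliest start that still reaches `anchor` bases past the end of the repeat.
    first = max(0, rep_end + anchor - read_len)
    # Latest start that begins `anchor` bases before the repeat and fits in the locus.
    last = min(rep_start - anchor, locus_len - read_len)
    return max(0, last - first + 1)


# Hypothetical locus: 5,000 bp flank, a 1,200 bp repeat tract, 5,000 bp flank.
LOCUS_LEN, REP_START, REP_END = 11_200, 5_000, 6_200

short = spanning_read_starts(LOCUS_LEN, REP_START, REP_END, read_len=300)
long_ = spanning_read_starts(LOCUS_LEN, REP_START, REP_END, read_len=10_000)

print("300 bp reads that span the repeat:", short)
print("10,000 bp reads that span the repeat:", long_)
```

In this toy locus no 300-base read can span the 1,200-base repeat, so its length has to be inferred indirectly, while a 10,000-base read covers it from many start positions and the repeat units can simply be counted.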

The Epigenetic Bonus: 2 in 1

Here is the part of the story that pushes long-read sequencing from "better diagnostic" to "genuinely different class of test."

Human DNA is not just a sequence. It also carries epigenetic modifications: chemical marks, most commonly methyl groups attached to cytosine bases, that do not change the underlying sequence but strongly affect which genes are active. These marks, of which methylation patterns are the most studied, can switch genes on or off, and some rare disorders are caused not by sequence mutations but by aberrant epigenetic states.

With current short-read diagnostics, detecting these modifications requires entirely separate, specialised tests run after the initial sequencing. They add time, cost, and complexity to an already lengthy diagnostic process.

Long-read sequencing captures methylation and other epigenetic marks as part of the same sequencing run. The DNA's chemical state is read simultaneously with its sequence — no additional test required.

Professor Christian Gilissen, Professor of Genome Bioinformatics at Radboudumc, described this as a built-in advantage: "With current diagnostics, this requires additional specialized tests, but with long reads we capture these modifications as a bonus — 2 in 1."

For clinical practice, this matters in two ways. First, it catches a category of disorders that short-read sequencing would not even flag as candidates for epigenetic investigation. Second, it removes the conditional testing logic — the "run this test, and if it comes back negative, run that other test" cascade that currently defines rare disease diagnostics and adds months to diagnosis timelines.

The Scale of the Problem This Addresses

The context for why this matters so much: rare diseases are not actually rare in aggregate.

  • More than 7,000 distinct rare diseases have been identified
  • Up to 400 million people worldwide live with one of them
  • 80% have a genetic cause
  • The average time to diagnosis is several years, with many patients spending a decade or more in diagnostic limbo
  • For paediatric patients, delayed diagnosis often means delayed or absent treatment during critical developmental windows

The emotional and financial cost of the diagnostic odyssey is well documented. Patients undergo unnecessary procedures. Families receive incorrect diagnoses and potentially harmful treatments targeted at the wrong condition. Some never receive a diagnosis at all.

Replacing 15 tests with one, while simultaneously improving diagnostic yield, compresses the odyssey. The first test becomes far more likely to be the definitive one rather than the first in a long queue.

What Still Needs to Happen

The Radboud team is clear that the 19.2% diagnostic rate is a floor, not a ceiling. Professor Alexander Hoischen, Professor of Genomic Technologies at Radboudumc, noted the expected trajectory: "Thanks to long reads, we obtain an even more complete view of DNA and can detect complex and hard-to-find abnormalities. We then link these to specific conditions. In this way, our knowledge grows and we can make more diagnoses."

This points to the second driver of future improvement: the variant databases. A long-read sequencer can detect a structural variant that a short-read machine would miss — but the clinical interpretation of that variant requires matching it against a database of known pathogenic variants. Those databases are built from clinical experience. As more patients are sequenced with long-read methods and more variants are linked to specific conditions, the diagnostic yield will rise even in the absence of any further technical improvement to the sequencing itself.
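
The database effect can be made concrete with a toy model. The sketch below assumes, for illustration only, that a patient counts as diagnosed when at least one of their detected variants appears in the set of known pathogenic variants. The patient labels, variant IDs, and database contents are all made up.

```python
def diagnostic_yield(patients: dict[str, set[str]], known_pathogenic: set[str]) -> float:
    """Fraction of patients with at least one detected variant in the database."""
    if not patients:
        return 0.0
    diagnosed = sum(1 for variants in patients.values() if variants & known_pathogenic)
    return diagnosed / len(patients)


# Hypothetical detected variants per patient (IDs are placeholders, not real variants).
patients = {
    "patient_1": {"SV_A"},
    "patient_2": {"SV_B", "REP_C"},
    "patient_3": {"SV_D"},
    "patient_4": set(),
}

db_today = {"SV_A"}              # what is known to be pathogenic now
db_later = db_today | {"REP_C"}  # after one more variant is linked to a condition

print(diagnostic_yield(patients, db_today))
print(diagnostic_yield(patients, db_later))
```

The sequencing data is identical in both calls. Only the database grows, and the yield doubles in this toy cohort, which is the mechanism the paragraph above describes.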

There are also practical barriers to universal adoption. Long-read sequencing equipment remains more expensive than short-read infrastructure, and the computational pipelines for interpreting long-read data are newer and require specialised expertise. Healthcare systems that have invested heavily in short-read infrastructure will face transition costs. Regulatory approvals vary by country.

None of these are arguments against adoption — the Radboud recommendation stands. They are acknowledgments that the path from published recommendation to global standard of care involves health system economics, not just scientific evidence.

Where AI Fits In — and Why It Changes the Trajectory

The Radboud study is fundamentally a sequencing hardware story. But the reason the diagnostic yield will keep rising — as Hoischen explicitly predicted — is a software and AI story.

The Variant Interpretation Bottleneck

Detecting a structural variant with a long-read sequencer is step one. Knowing what that variant means for a patient is step two — and step two requires matching the detected variant against a database of known pathogenic variants linked to specific conditions.

Those databases are built from annotated clinical cases, and this is where machine learning comes in. Models trained on growing genomic datasets help link newly observed variants to specific conditions, so the interpretation layer improves even without any further change to the sequencing hardware.

This is the same dynamic that Anthropic Science researchers identified when building VirBench, an evaluation framework for AI agents performing biological database queries. Their key finding: AI reasoning alone struggles with biological data that was built for humans clicking through browsers, not for agents querying programmatically. Deterministic, structured access to curated databases transformed accuracy from 17% to 99.7% on viral queries. The same principle applies to variant interpretation: reliable curated data is the bottleneck, and AI models are only as good as the structured datasets they can access.

Long-read sequencing is building exactly that curated dataset — at scale, with a completeness that short-read methods could not provide.

Biohub's $500M Bet on the Same Layer

The Biohub Virtual Biology Initiative, announced in April 2026, pledged $500 million toward open multimodal cell data — combining genomic, transcriptomic, proteomic, and imaging data in a shared, machine-readable format. The Allen Institute, the Broad Institute, Human Cell Atlas, and NVIDIA are among the partners.

The initiative's explicit goal is to build foundation models of the cell: AI systems trained on enough biological data to make accurate predictions about how genetic variants affect cellular function and disease outcomes. That is, in structural terms, the same problem Radboud's diagnostic yield depends on — linking newly detected variants to known conditions.

As long-read sequencing generates richer variant data and Biohub-style initiatives build the training sets, the two tracks converge: better detection plus better interpretation compounds into meaningfully higher diagnostic yields.

Dario Amodei's Biomedical Case

The biomedical AI acceleration argument is not just a research aspiration. Anthropic CEO Dario Amodei, in his June 2026 policy essay on the AI exponential, listed accelerating biomedical innovation as one of five areas where AI policy urgently needs attention. His specific prediction: AI will greatly increase the rate of drug candidates entering the pipeline, improve effect sizes, develop therapies for previously untreatable diseases, and generate entirely new therapy categories.

The Radboud finding is a concrete step in that direction — not through drug discovery, but through the diagnostic layer that drug development depends on. You cannot develop therapies for rare diseases you cannot identify. Better diagnostics are the prerequisite for better treatments.

Why This Matters Beyond Rare Disease

The long-read sequencing story connects to a broader shift in how genomics is being used to understand human biology — one that the Radboud team alludes to in framing this as the beginning of an accumulating knowledge base rather than a fixed diagnostic test.

The same long-read technologies enabling the Radboud diagnostic advance are also the tools making previously inaccessible regions of the genome visible for the first time. Research this week showed that "junk DNA" — the repetitive, structurally complex regions that short-read sequencing could never properly sequence — may play significant roles in cancer. Long-read sequencing made those regions accessible to study.

The epigenetic data captured as a bonus in the Radboud test is also the same category of biological information driving new therapeutic approaches — including reprogramming techniques now entering early human trials for conditions like glaucoma.

The infrastructure built to diagnose rare diseases with long reads is, in parallel, the infrastructure that will tell us things about human biology that were simply inaccessible to investigation with the tools we had before.


The study "Long-Read Genome Sequencing for the Genetic Diagnosis of Rare Diseases" was published in the New England Journal of Medicine in June 2026. The research team is at Radboud University Medical Center, Nijmegen, Netherlands.
