What is long-read genome sequencing?

Long-read genome sequencing is a DNA analysis technique that reads continuous segments of up to 20,000 base pairs at a time, compared to the current standard (short-read sequencing) which reads about 300 base pairs per fragment. Reading longer continuous stretches — like using larger jigsaw puzzle pieces — makes it far easier to detect complex structural variants, repeat expansions, and rearrangements that short reads miss entirely. Technologies like Oxford Nanopore and PacBio enable this approach.

What did the Radboud study find?

The study included approximately 1,000 patients with suspected rare genetic disease. Of 832 patients with rare genetic disease, long-read genome sequencing delivered a conclusive diagnosis in 160 (19.2%), a 3% improvement over conventional short-read methods. The study, published in the New England Journal of Medicine in June 2026, also showed that the single test can replace up to 15 separate existing diagnostic tests and simultaneously captures epigenetic modifications that normally require additional specialised tests.

Why does 3% more diagnoses matter?

For rare diseases, where patients often wait years for a diagnosis, a 3% improvement across the global rare disease population represents hundreds of thousands of additional families getting answers. With 400 million people worldwide living with rare diseases, incremental diagnostic yield gains translate directly to life-changing outcomes — treatment access, family planning clarity, and connections to others with the same condition.

What is the epigenetic bonus from long-read sequencing?

Epigenetic modifications are chemical marks on the outside of DNA that can switch genes on or off. Some rare disorders are caused by these marks rather than by changes in the DNA sequence itself. Detecting them normally requires separate, specialised tests. Long-read sequencing captures these modifications automatically as part of the same run — what the Radboud researchers call a "2 in 1" benefit.

Could this test work for all rare diseases?

Not yet universally, but the Radboud team expects diagnostic yield to continue rising. Alexander Hoischen noted that as more variants are linked to specific conditions through long-read data, the knowledge base grows and more diagnoses become possible. The team recommends the test as the first-choice starting point, replacing the multi-test diagnostic odyssey that patients undergo today.

Long-Read DNA Test Replaces 15 Others for Rare Diseases | explainx.ai Blog


For millions of people living with rare genetic diseases, the diagnostic journey is often measured not in days but in years — sometimes decades. A battery of tests, one after another, each with its own waiting period, its own specialist, and its own chance of coming back inconclusive. An estimated 400 million people worldwide live with rare diseases. For most of them, getting a name for what they have is the beginning of everything: treatment access, prognosis, family planning, community.

A study published this week in the New England Journal of Medicine describes a technology that could fundamentally compress that journey — and in the process make the current diagnostic standard look like assembling a jigsaw puzzle piece by piece when you could have used much larger pieces from the start.

What the Radboud Study Found

Researchers at Radboud University Medical Center in the Netherlands tested long-read genome sequencing against conventional diagnostics in approximately 1,000 patients with suspected rare genetic disease.

The headline result: of 832 patients with rare genetic disease, 160 (19.2%) received a conclusive diagnosis using long-read sequencing — a 3% improvement over what conventional short-read methods delivered in the same cohort.
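As a quick check on the headline figures, the arithmetic can be reproduced in a few lines of Python. This is a minimal sketch, not the study's analysis. The article does not say whether "3%" means three percentage points or a relative 3% increase, so both readings are computed as labelled assumptions, and all variable names are illustrative.

```python
# Figures quoted in the article.
diagnosed = 160
cohort = 832

# Diagnostic yield as a percentage of the cohort.
yield_pct = 100 * diagnosed / cohort
print(f"Long-read diagnostic yield: {yield_pct:.1f}%")  # 19.2%

# ASSUMPTION A: "3%" means three percentage points of the cohort.
extra_if_absolute = 0.03 * cohort

# ASSUMPTION B: "3%" means a relative increase over the short-read count.
short_read_if_relative = diagnosed / 1.03
extra_if_relative = diagnosed - short_read_if_relative

print(f"Extra diagnoses under assumption A: about {extra_if_absolute:.0f}")
print(f"Extra diagnoses under assumption B: about {extra_if_relative:.0f}")
```

The two readings differ by roughly a factor of five in this cohort, which is why the distinction matters when the figure is extrapolated to larger populations.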

Three percent sounds modest. In context, it is not. When applied across a patient population of hundreds of millions globally, a 3% improvement in diagnostic yield represents an enormous number of families who get an answer where they otherwise would not.

But the diagnostic yield number is almost secondary to the other finding: the single long-read test can replace up to 15 separate existing diagnostic tests that patients would otherwise undergo sequentially.

Professor Lisenka Vissers, Professor of Translational Genomics at Radboudumc, summarised the recommendation directly: "We showed that the new test yields 3% more diagnoses. It can also replace 15 other tests. We recommend using this test worldwide as the first choice."

The Jigsaw Puzzle Problem with Current Diagnostics

To understand why long-read sequencing is a step change rather than an incremental improvement, it helps to understand what current short-read sequencing actually does.

The standard approach — short-read sequencing — breaks DNA into fragments of roughly 300 base pairs each, sequences those fragments individually, and then uses computational assembly to piece them back together into a coherent picture of the genome. The analogy the researchers use is apt: it is like trying to assemble a jigsaw puzzle from tiny pieces. Each piece contains accurate local information, but reassembling it correctly — especially across complex, repetitive, or structurally variable regions — is computationally hard and error-prone.

Long-read sequencing, using technologies like Oxford Nanopore and PacBio, reads continuous DNA segments of up to 20,000 base pairs. That is up to roughly 67 times longer per read (20,000 / 300 ≈ 67). Using larger puzzle pieces does not just make assembly easier; it makes previously invisible regions of the genome accessible for the first time.

Specifically, long reads can detect:

Structural variants — large rearrangements, duplications, or deletions that span hundreds or thousands of base pairs and are simply invisible to short-read methods
Repeat expansions — a class of mutations where a DNA sequence repeats more times than normal, common in neurological rare diseases, that are impossible to size accurately with short reads
Complex rearrangements — inversions and translocations that require long-range context to interpret
Challenging pathogenic variants — a January 2025 study by the same Radboud team in the American Journal of Human Genetics showed long-read sequencing could identify 93% of pathogenic variants that are difficult or impossible to detect with conventional short reads
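The repeat expansion case can be illustrated with a toy example. The sketch below is a deliberate simplification, not how a real variant caller works: it assumes error-free reads, an exact repeat unit, and made-up flanking sequences. The function name `size_repeat` and all sequences are hypothetical. It shows that a repeat tract can only be sized from a read that contains both flanks, which a 300-base read cannot do once the tract itself is longer than the read.

```python
from typing import Optional


def size_repeat(read: str, left_flank: str, right_flank: str, unit: str) -> Optional[int]:
    """Count repeat units between two flanks; None if the read does not span both."""
    start = read.find(left_flank)
    if start == -1:
        return None
    start += len(left_flank)
    end = read.find(right_flank, start)
    if end == -1:
        return None
    tract = read[start:end]
    count, remainder = divmod(len(tract), len(unit))
    # Only report a size when the tract is a clean run of the repeat unit.
    if remainder != 0 or tract != unit * count:
        return None
    return count


# Made-up sequences for illustration only.
LEFT = "ACGTTGCAAGCT"
RIGHT = "TTGACCGGTAAC"
UNIT = "CAG"

# A toy locus: 360 bases of repeat (120 units) between the flanks, with padding.
locus = "G" * 500 + LEFT + UNIT * 120 + RIGHT + "A" * 500

short_read = locus[530:830]  # 300 bases, entirely inside the repeat tract
long_read = locus            # one read spanning the whole toy locus

print(size_repeat(short_read, LEFT, RIGHT, UNIT))  # None: no flank is visible
print(size_repeat(long_read, LEFT, RIGHT, UNIT))   # 120
```

A short read that starts in one flank still returns `None`, because it runs out before reaching the other flank. That is the sense in which short reads cannot size such a tract directly.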

The Epigenetic Bonus: 2 in 1

Here is the part of the story that pushes long-read sequencing from "better diagnostic" to "genuinely different class of test."

Human DNA is not just a sequence. It also carries epigenetic modifications — chemical marks attached to the outside of the DNA double helix that do not change the underlying sequence but dramatically affect which genes are active. These marks — methylation patterns being the most studied — can switch genes on or off, and some rare disorders are caused not by sequence mutations but by aberrant epigenetic states.

With current short-read diagnostics, detecting these modifications requires entirely separate, specialised tests run after the initial sequencing. They add time, cost, and complexity to an already lengthy diagnostic process.

Long-read sequencing captures methylation and other epigenetic marks as part of the same sequencing run. The DNA's chemical state is read simultaneously with its sequence — no additional test required.

Professor Christian Gilissen, Professor of Genome Bioinformatics at Radboudumc, described this as a built-in advantage: "With current diagnostics, this requires additional specialized tests, but with long reads we capture these modifications as a bonus — 2 in 1."

For clinical practice, this matters in two ways. First, it catches a category of disorders that short-read sequencing would not even flag as candidates for epigenetic investigation. Second, it removes the conditional testing logic — the "run this test, and if it comes back negative, run that other test" cascade that currently defines rare disease diagnostics and adds months to diagnosis timelines.

The Scale of the Problem This Addresses

The context for why this matters so much: rare diseases are not actually rare in aggregate.

More than 7,000 distinct rare diseases have been identified
Up to 400 million people worldwide live with one of them
80% of these diseases have a genetic cause
The average time to diagnosis is several years, with many patients spending a decade or more in diagnostic limbo
For paediatric patients, delayed diagnosis often means delayed or absent treatment during critical developmental windows

The emotional and financial cost of the diagnostic odyssey is well documented. Patients undergo unnecessary procedures. Families receive incorrect diagnoses and potentially harmful treatments targeted at the wrong condition. Some never receive a diagnosis at all.

Replacing 15 tests with one, while simultaneously improving diagnostic yield, compresses the odyssey. The first test can now be the definitive one rather than the first in a long queue.

What Still Needs to Happen

The Radboud team is clear that the 19.2% diagnostic rate is a floor, not a ceiling. Professor Alexander Hoischen, Professor of Genomic Technologies at Radboudumc, noted the expected trajectory: "Thanks to long reads, we obtain an even more complete view of DNA and can detect complex and hard-to-find abnormalities. We then link these to specific conditions. In this way, our knowledge grows and we can make more diagnoses."

This points to the second driver of future improvement: the variant databases. A long-read sequencer can detect a structural variant that a short-read machine would miss — but the clinical interpretation of that variant requires matching it against a database of known pathogenic variants. Those databases are built from clinical experience. As more patients are sequenced with long-read methods and more variants are linked to specific conditions, the diagnostic yield will rise even in the absence of any further technical improvement to the sequencing itself.
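The effect of a growing database can be sketched in a few lines of Python. This is a toy model under stated assumptions, not a real interpretation pipeline: the `Variant` fields, the coordinates, and the condition labels are all hypothetical, and real matching has to tolerate imprecise breakpoints rather than rely on exact equality. It shows the same detected variant going from uninterpretable to diagnostic purely because the catalogue grew.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    """A detected variant; the fields are a simplification for illustration."""
    chrom: str
    pos: int
    kind: str  # e.g. "DEL", "DUP", "INV"


def interpret(detected, catalogue):
    """Return the conditions linked to detected variants found in the catalogue."""
    return [catalogue[v] for v in detected if v in catalogue]


# Hypothetical variant detected in one patient (made-up coordinates).
inversion = Variant("chr7", 2_500_000, "INV")
detected = [inversion]

# Hypothetical catalogue of variants already linked to conditions.
catalogue = {Variant("chr1", 1_000_000, "DEL"): "condition A"}
before = interpret(detected, catalogue)

# Later, clinical experience links the inversion to a condition.
catalogue[inversion] = "condition B"
after = interpret(detected, catalogue)

print(before)  # []
print(after)   # ['condition B']
```

Nothing about the sequencing step changes between the two calls. Only the catalogue does, which is the mechanism by which yield can rise without any hardware improvement.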

There are also practical barriers to universal adoption. Long-read sequencing equipment remains more expensive than short-read infrastructure, and the computational pipelines for interpreting long-read data are newer and require specialised expertise. Healthcare systems that have invested heavily in short-read infrastructure will face transition costs. Regulatory approvals vary by country.

None of these are arguments against adoption — the Radboud recommendation stands. They are acknowledgments that the path from published recommendation to global standard of care involves health system economics, not just scientific evidence.

Where AI Fits In — and Why It Changes the Trajectory

The Radboud study is fundamentally a sequencing hardware story. But the reason the diagnostic yield will keep rising — as Hoischen explicitly predicted — is a software and AI story.

The Variant Interpretation Bottleneck

Detecting a structural variant with a long-read sequencer is step one. Knowing what that variant means for a patient is step two — and step two requires matching the detected variant against a database of known pathogenic variants linked to specific conditions.

Those databases are built from annotated clinical cases. As more patients are sequenced with long-read methods and more variants are linked to specific conditions — through machine learning models trained on growing genomic datasets — the interpretation layer gets better even without any further improvement to the sequencing hardware.

This is the same dynamic that Anthropic Science researchers identified when building VirBench, an evaluation framework for AI agents performing biological database queries. Their key finding: AI reasoning alone struggles with biological data that was built for humans clicking through browsers, not for agents querying programmatically. Deterministic, structured access to curated databases transformed accuracy from 17% to 99.7% on viral queries. The same principle applies to variant interpretation: reliable curated data is the bottleneck, and AI models are only as good as the structured datasets they can access.

Long-read sequencing is building exactly that curated dataset — at scale, with a completeness that short-read methods could not provide.

Biohub's $500M Bet on the Same Layer

The Biohub Virtual Biology Initiative, announced in April 2026, pledged $500 million toward open multimodal cell data — combining genomic, transcriptomic, proteomic, and imaging data in a shared, machine-readable format. The Allen Institute, the Broad Institute, Human Cell Atlas, and NVIDIA are among the partners.

The initiative's explicit goal is to build foundation models of the cell: AI systems trained on enough biological data to make accurate predictions about how genetic variants affect cellular function and disease outcomes. That is, in structural terms, the same problem Radboud's diagnostic yield depends on — linking newly detected variants to known conditions.

As long-read sequencing generates richer variant data and Biohub-style initiatives build the training sets, the two tracks converge: better detection plus better interpretation compounds into meaningfully higher diagnostic yields.

Dario Amodei's Biomedical Case

It is worth noting that the biomedical AI acceleration argument is not just a research aspiration. Anthropic CEO Dario Amodei, in his June 2026 policy essay on the AI exponential, listed accelerating biomedical innovation as one of five areas where AI policy urgently needs attention. His specific prediction: AI will greatly increase the rate of drug candidates entering the pipeline, improve effect sizes, develop therapies for previously untreatable diseases, and generate entirely new therapy categories.

The Radboud finding is a concrete step in that direction — not through drug discovery, but through the diagnostic layer that drug development depends on. You cannot develop therapies for rare diseases you cannot identify. Better diagnostics are the prerequisite for better treatments.

Why This Matters Beyond Rare Disease

The long-read sequencing story connects to a broader shift in how genomics is being used to understand human biology — one that the Radboud team alludes to in framing this as the beginning of an accumulating knowledge base rather than a fixed diagnostic test.

The same long-read technologies enabling the Radboud diagnostic advance are also the tools making previously inaccessible regions of the genome visible for the first time. Research this week showed that "junk DNA" — the repetitive, structurally complex regions that short-read sequencing could never properly sequence — may play significant roles in cancer. Long-read sequencing made those regions accessible to study.

The epigenetic data captured as a bonus in the Radboud test is also the same category of biological information driving new therapeutic approaches — including reprogramming techniques now entering early human trials for conditions like glaucoma.

The infrastructure built to diagnose rare diseases with long reads is, in parallel, the infrastructure that will tell us things about human biology that were simply inaccessible to investigation with the tools we had before.

The study "Long-Read Genome Sequencing for the Genetic Diagnosis of Rare Diseases" was published in the New England Journal of Medicine in June 2026. The research team is at Radboud University Medical Center, Nijmegen, Netherlands.

One DNA Test to Replace 15: Long-Read Sequencing and the Future of Rare Disease Diagnosis
