Multipath Reliable Connection (MRC) is a networking protocol that OpenAI developed with AMD, Broadcom, Intel, Microsoft, and NVIDIA over roughly two years. It extends RDMA over Converged Ethernet (RoCE), draws on Ultra Ethernet Consortium techniques, and uses SRv6-based source routing. It targets very large GPU clusters: it sprays transfers across many paths, reacts quickly to loss and congestion, and runs a simpler control plane. Overview: https://openai.com/index/mrc-supercomputer-networking/

Where are the specification and research paper?

OpenAI released MRC through the Open Compute Project; their post links to an OCP PDF contribution. OpenAI also co-authored “Resilient AI Supercomputer Networking using MRC and SRv6” (PDF on cdn.openai.com). Cite those documents, not second-hand summaries, for topology math and protocol behavior.

What production footprints does OpenAI claim?

They state MRC is deployed on their largest NVIDIA GB200 supercomputers used to train frontier models, naming Oracle Cloud Infrastructure in Abilene, Texas, and Microsoft Fairwater clusters, with hardware involvement from NVIDIA and Broadcom. Treat operational stories as vendor field reports until independently replicated.

Is MRC the same as LLM tokens?

No. MRC covers datacenter GPU interconnect and forwarding behavior. Tokens are vocabulary chunks produced by a tokenizer: they are what APIs bill for and what must fit in a context window. Reliable training fabrics help ship better models; your product still optimizes prompts, tool output, and token budgets separately.

What is multi-plane networking in this context?

OpenAI describes splitting high-rate NIC capacity across multiple parallel switch planes (example: eight 100Gb/s planes instead of treating the link only as one 800Gb/s pipe). That increases path diversity and, in their counting, can enable very large all-to-all fabrics with fewer switch tiers than some conventional 800Gb/s-only designs. Verify exact scale claims in the OCP spec and paper.
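
The tier-count claim can be sanity-checked with the standard folded-Clos (fat-tree) bound, where a non-blocking fabric of radix-k switches with L tiers reaches at most 2·(k/2)^L endpoints. The sketch below is a rough illustration, not OpenAI's own calculation: the 51.2 Tb/s switch capacity is an assumption that does not come from the post, and the function names are hypothetical. Real designs may use oversubscription or different radices, so defer to the OCP spec and paper for the actual numbers.

```python
def max_hosts(radix: int, tiers: int) -> int:
    """Upper bound on endpoints in a non-blocking folded Clos (fat tree).

    Every tier except the top splits its ports half down, half up;
    the top tier points all ports down. That gives 2 * (radix / 2) ** tiers.
    """
    if radix < 2 or radix % 2 != 0 or tiers < 1:
        raise ValueError("radix must be an even integer >= 2 and tiers >= 1")
    return 2 * (radix // 2) ** tiers


def tiers_needed(radix: int, hosts: int) -> int:
    """Smallest tier count whose non-blocking bound covers `hosts` endpoints."""
    tiers = 1
    while max_hosts(radix, tiers) < hosts:
        tiers += 1
    return tiers


# ASSUMPTION (not stated in the source): one switch ASIC with 51.2 Tb/s
# of total capacity, carved into either 800 Gb/s or 100 Gb/s ports.
SWITCH_GBPS = 51_200
radix_800g = SWITCH_GBPS // 800  # 64 ports per switch
radix_100g = SWITCH_GBPS // 100  # 512 ports per switch

target = 131_072  # roughly the "131k GPUs" figure quoted in the post

print(max_hosts(radix_100g, 2))          # 131072: two tiers suffice per plane
print(max_hosts(radix_800g, 2))          # 2048
print(max_hosts(radix_800g, 3))          # 65536
print(tiers_needed(radix_800g, target))  # 4
print(tiers_needed(radix_100g, target))  # 2
```

Under these assumptions, each of the eight 100Gb/s planes is its own two-tier fabric, and every GPU connects one port to each plane. The same switch used at 800Gb/s per port would need more tiers to reach the same endpoint count without oversubscription.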

What should I do next as an app developer?

Read this for context on why frontier capacity keeps scaling. For day-to-day shipping, pair model choice with token literacy ([What are tokens?](/blog/what-are-llm-tokens)), context limits ([LLM context window](/blog/llm-context-window-explained-2026)), and agent tooling ([MCP](https://explainx.ai/mcp-servers), [skills](https://explainx.ai/skills)).

OpenAI MRC explained: Multipath Reliable Connection for | explainx.ai Blog

| Dimension | Typical single-path RoCE-style mental model | OpenAI’s MRC narrative |
| --- | --- | --- |
| Path use | One primary path preserves order | Spray across many paths; reorder at destination |
| Multi-plane | Often under-utilized or poorly balanced | Designed to load-balance across planes |
| Loss interpretation | May conflate congestion and failure | Trimming + probes to reduce false path retirement |
| Control plane | Dynamic routing common | SRv6 + static tables; endpoints steer around faults |
| Time scale | Seconds of instability possible | Claims microsecond-scale path decisions |
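
To make the "spray across many paths; reorder at destination" row concrete, here is a minimal toy model. It is not MRC's actual algorithm: the round-robin path choice, the fixed per-path delays, and the names `spray`, `ReorderBuffer`, and `simulate` are all assumptions made for illustration, and real NIC hardware may handle out-of-order arrival quite differently.

```python
from itertools import cycle


def spray(payloads, num_paths):
    """Tag each payload with a sequence number and pick paths round-robin."""
    paths = cycle(range(num_paths))
    return [(seq, next(paths), data) for seq, data in enumerate(payloads)]


class ReorderBuffer:
    """Release payloads in sequence order regardless of arrival order."""

    def __init__(self):
        self.next_seq = 0   # next sequence number owed to the application
        self.pending = {}   # seq -> payload that arrived early

    def receive(self, seq, data):
        """Accept one packet; return the payloads that are now deliverable."""
        self.pending[seq] = data
        delivered = []
        while self.next_seq in self.pending:
            delivered.append(self.pending.pop(self.next_seq))
            self.next_seq += 1
        return delivered


def simulate(payloads, path_delays):
    """Spray over len(path_delays) paths and replay packets by arrival time."""
    packets = spray(payloads, len(path_delays))
    # Arrival time = send slot (the sequence number) + that path's delay.
    # Ties are broken by sequence number to keep the run deterministic.
    arrivals = sorted(packets, key=lambda p: (p[0] + path_delays[p[1]], p[0]))
    buffer = ReorderBuffer()
    delivered = []
    for seq, _path, data in arrivals:
        delivered.extend(buffer.receive(seq, data))
    return [seq for seq, _, _ in arrivals], delivered


arrival_order, delivered = simulate(list("abcdef"), path_delays=[5, 1, 3])
print(arrival_order)  # [1, 0, 2, 4, 3, 5]
print(delivered)      # ['a', 'b', 'c', 'd', 'e', 'f']
```

Packets arrive out of order because the paths have unequal delay, yet the receiver still hands data up in order. That is the property a single-path design gets for free and a spraying design has to rebuild at the endpoint.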

| Topic | Summary |
| --- | --- |
| Problem | Synchronous pretraining is tail-latency sensitive; collectives wait on the straggler; link and switch faults become routine at 100k+ GPU scale |
| MRC | Multipath over RoCE-class Ethernet, adaptive spraying, packet trimming, SRv6 source routing, static switch forwarding tables |
| Topology | Multi-plane fabrics: OpenAI cites roughly 131k GPUs with two switch tiers under their stated assumptions, versus three to four tiers for some single-plane designs |
| Artifacts | OCP MRC 1.0 PDF · Paper PDF |
| Tokens | Not your chat tokenizer: tokens live at the model API layer; MRC lives in the GPU fabric layer |
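
The pairing of source routing with static switch forwarding tables can be sketched as follows. This is a conceptual toy, not the MRC or SRv6 wire format: real SRv6 carries segments as IPv6 addresses in a Segment Routing Header, whereas the string labels, the four-switch topology, and the `walk` and `send` helpers here are hypothetical. The toy also lets the sender observe a dead link directly, where a real endpoint would have to infer it from loss or probes.

```python
# Static per-switch tables: segment label -> next switch. They never change
# at runtime; only the sender's choice of segment list changes.
STATIC_TABLES = {
    "leaf0": {"sid:spine0": "spine0", "sid:spine1": "spine1"},
    "spine0": {"sid:leaf1": "leaf1"},
    "spine1": {"sid:leaf1": "leaf1"},
    "leaf1": {},
}

# Segment lists the sending endpoint may choose from (hypothetical).
CANDIDATE_PATHS = [
    ("sid:spine0", "sid:leaf1"),
    ("sid:spine1", "sid:leaf1"),
]


def walk(first_switch, segments, tables, down_links=frozenset()):
    """Follow a segment list hop by hop using only static lookups.

    Returns the list of switches visited, or None if a link on the way is down.
    """
    node = first_switch
    visited = [node]
    for sid in segments:
        next_node = tables[node][sid]  # static lookup, no routing protocol
        if (node, next_node) in down_links:
            return None
        node = next_node
        visited.append(node)
    return visited


def send(first_switch, candidates, tables, down_links, retired):
    """Try candidate segment lists in order, retiring any that fail."""
    for segments in candidates:
        if segments in retired:
            continue
        visited = walk(first_switch, segments, tables, down_links)
        if visited is not None:
            return visited
        retired.add(segments)  # the endpoint, not the switch, reacts
    raise RuntimeError("no usable path")


retired = set()
print(send("leaf0", CANDIDATE_PATHS, STATIC_TABLES, frozenset(), retired))
# ['leaf0', 'spine0', 'leaf1']

broken = {("leaf0", "spine0")}
print(send("leaf0", CANDIDATE_PATHS, STATIC_TABLES, broken, retired))
# ['leaf0', 'spine1', 'leaf1']
```

The design point this illustrates is that the switches hold no dynamic state: steering around the fault happens entirely through the sender picking a different segment list.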

OpenAI MRC explained: Multipath Reliable Connection for GPU supercomputer networking (2026)

Visuals and attribution

TL;DR

Why AI training turns supercomputer networking into a bottleneck

Multipath Reliable Connection (MRC): what it is

Multi-plane GPU fabrics (the topology MRC assumes)

Packet spraying, path retirement, and packet trimming

SRv6 source routing vs dynamic interior routing (BGP-style)

Deployment claims: GB200, OCI Abilene, Microsoft Fairwater

What OpenAI reports in production (and how to read it)

Contrast table: classic single-path RoCE vs MRC-style fabrics

Who needs to care (and who does not)

LLM tokens: what they are—and why they are not “packets on the wire”

explainx.ai resources (builder side of the stack)

Primary sources