MinerU 3.4: PDF and Office Parsing for LLM, RAG, and Agent Workflows

MinerU 3.4 upgrades OCR to PP-OCRv6 (+11% accuracy), doubles OCR speed, and parses PDF, DOCX, PPTX, and XLSX to Markdown/JSON. 69.7k GitHub stars. Full pipeline, hybrid, and VLM backends.

Jun 26, 2026 · 8 min read · Yash Thakker
Tags: Document AI, RAG, Open Source, OCR, PDF

If your RAG pipeline still treats PDFs as "extract text with PyPDF and hope," you are leaving layout, tables, formulas, and multi-column structure on the floor. MinerU — OpenDataLab's document parsing engine with ~69.7k GitHub stars — exists to fix that: turn complex PDFs and Office documents into LLM-ready Markdown and JSON with headings, tables, formulas, and images preserved.

Version 3.4.0 landed June 18, 2026 with a focused upgrade: PP-OCRv6 for the pipeline backend (~11% OCR accuracy gain on OmniDocBench v1.6), roughly 100% faster OCR processing, and smarter model download / cache reuse. For agent builders, MinerU is increasingly the default ingestion layer before chunking, embedding, and retrieval.


TL;DR

| Detail | MinerU 3.4 |
|---|---|
| Repo | github.com/opendatalab/MinerU |
| Docs | opendatalab.github.io/MinerU |
| Latest release | mineru-3.4.0 (June 2026) |
| Stars / forks | ~69.7k / ~5.9k |
| Inputs | PDF, images, DOCX, PPTX, XLSX |
| Outputs | Markdown, JSON, multimodal formats |
| License | MinerU Open Source License (Apache 2.0–based) |
| Install | uv pip install -U "mineru[all]" |
| CLI | mineru -p <input> -o <output> |
| CPU path | -b pipeline |

Why MinerU Matters for RAG and Agents

Document ingestion is the silent failure mode in most RAG systems. Chunk a badly parsed PDF and you get:

  • Tables split across chunks with no header context
  • Formulas rendered as garbage Unicode
  • Multi-column layouts read in wrong order
  • Headers, footers, and page numbers polluting embeddings

MinerU addresses parsing before chunking. It removes headers/footers/page numbers, preserves document structure (headings, lists, paragraphs), converts formulas to LaTeX, tables to HTML, extracts images with captions, and detects scanned PDFs for automatic OCR.
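
To illustrate why parsing before chunking matters, here is a minimal sketch of a heading-aware chunker that keeps each table in the same chunk as the section that introduces it. It assumes the parsed Markdown uses `#`-style headings and carries tables as HTML; `chunk_by_heading` and the sample text are hypothetical and not part of MinerU.

```python
import re

# Matches Markdown ATX headings such as "# Title" or "## Subsection".
HEADING = re.compile(r"^(#{1,6})\s+(.*)$")


def chunk_by_heading(markdown: str) -> list[dict]:
    """Split Markdown into one chunk per heading section."""
    chunks = []
    title, lines = "", []

    def flush() -> None:
        # Skip empty preambles so we never emit a blank chunk.
        if title or any(line.strip() for line in lines):
            chunks.append({"heading": title, "text": "\n".join(lines).strip()})

    for line in markdown.splitlines():
        match = HEADING.match(line)
        if match:
            flush()  # close the previous section
            title, lines = match.group(2).strip(), []
        else:
            lines.append(line)
    flush()  # close the final section
    return chunks


# Hypothetical parsed output: a table stays attached to its "Results" heading.
sample = """# Results
Intro paragraph.

<table><tr><th>Model</th><th>Score</th></tr><tr><td>A</td><td>0.9</td></tr></table>

## Discussion
Closing remarks.
"""
chunks = chunk_by_heading(sample)
```

This sketch ignores fenced code blocks, so a `#` comment inside a fence would be misread as a heading; a production chunker would need to track fence state.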

The project originated during InternLM pre-training — built to solve symbol conversion in scientific literature. That pedigree shows in formula and table handling, where generic text extractors fail.

June 2026 is a crowded moment for document AI: Baidu Unlimited-OCR targets one-shot long-horizon parsing, and Mistral OCR 4 offers managed API extraction with bounding boxes. MinerU positions itself differently: full-stack open ingestion with multiple backends, local deployment, and production routing (mineru-router), rather than a single-model demo.


Version 3.4: What Changed (June 18, 2026)

PP-OCRv6 upgrade

The pipeline backend's OCR model moved to PP-OCRv6, improving OCR accuracy by about 11% on OmniDocBench v1.6. Japanese, Traditional Chinese, English, and Latin were removed as separate OCR language options — those scenarios now route through the ch OCR model, simplifying configuration.

~100% OCR speed improvement

MinerU optimized the OCR inference and processing pipeline, roughly doubling OCR throughput — significant for batch document jobs and OCR-heavy scans.

Model download and cache

  • Automatic model source selection on first install based on network environment (HuggingFace, ModelScope, etc.)
  • Local cache priority — checks downloaded model files before remote requests
  • Reduces repeated downloads across dev/staging/prod environments

See OpenDataLab's Model Source Documentation for configuration details.
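
The cache-first behavior described above can be pictured with a small, generic sketch. This is not MinerU's implementation: `resolve_model`, `fake_download`, and the `ocr.bin` file name are all hypothetical, and the downloader is a stand-in that only writes a placeholder file.

```python
import tempfile
from pathlib import Path
from typing import Callable


def resolve_model(
    name: str, cache_dir: Path, download: Callable[[str, Path], None]
) -> Path:
    """Return the local path for a model, downloading only on a cache miss."""
    target = cache_dir / name
    if target.exists():
        return target  # cache hit: no remote request is made
    cache_dir.mkdir(parents=True, exist_ok=True)
    download(name, target)  # cache miss: fetch once, then reuse
    return target


calls = []


def fake_download(name: str, target: Path) -> None:
    # Hypothetical downloader: records the call and writes placeholder bytes.
    calls.append(name)
    target.write_bytes(b"weights")


with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp) / "models"
    first = resolve_model("ocr.bin", cache, fake_download)
    second = resolve_model("ocr.bin", cache, fake_download)
    download_count = len(calls)
```

The second call returns the same path without invoking the downloader, which is the property that avoids repeated downloads when a cache directory is shared or preserved between runs.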


Parsing Backends Compared

MinerU is not one model — it is an orchestration stack with backend selection:

| Backend | Accuracy (OmniDocBench v1.6 E2E) | CPU | GPU | Best for |
|---|---|---|---|---|
| pipeline | 86.47 | ✅ | Optional | Homelab, CPU-only, batch OCR |
| hybrid medium (default) | 95.26 | ❌ | 8GB+ VRAM | Daily production: speed/accuracy balance |
| hybrid high | 95.39 | ❌ | 8GB+ VRAM | Max accuracy, image analysis |
| vlm / vlm-http-client | 95.30 | ❌ | 2GB+ VRAM (client) | OpenAI-compatible remote servers |

Hybrid medium (added in v3.3, now default) sacrifices only 0.13 accuracy points vs high while delivering 35–220% speed improvements by platform:

| Platform | Text PDF speedup | OCR scenario speedup |
|---|---|---|
| Linux | ~80% | ~35% |
| Windows | ~90% | ~45% |
| macOS | ~220% | ~50% |

Medium does not support image analysis inside documents — switch to effort=high when you need that.
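
To make the medium-versus-high trade-off concrete, the short sketch below converts the quoted speedups into remaining wall-clock time and recomputes the accuracy gap. It assumes "X% speedup" means throughput rises by X%, which is an interpretation rather than something the release notes spell out; `time_fraction` is a hypothetical helper.

```python
def time_fraction(speedup_pct: float) -> float:
    """Fraction of the original wall-clock time left after a throughput gain.

    Assumes an X% speedup means throughput is multiplied by (1 + X/100).
    """
    return 1.0 / (1.0 + speedup_pct / 100.0)


# Text PDF speedups for hybrid medium, as quoted per platform.
text_pdf_speedup = {"Linux": 80, "Windows": 90, "macOS": 220}
remaining = {name: time_fraction(pct) for name, pct in text_pdf_speedup.items()}

# Accuracy cost of medium relative to high on OmniDocBench v1.6.
accuracy_gap = round(95.39 - 95.26, 2)

for name, fraction in remaining.items():
    print(f"{name}: about {fraction:.0%} of the original time")
print(f"accuracy gap: {accuracy_gap} points")
```

Under that assumption, a batch that took an hour on Linux would take a little over half an hour, and on macOS a bit under a third of the original time, in exchange for 0.13 accuracy points.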

VLM model: MinerU2.5-Pro

The primary VLM is MinerU2.5-Pro-2605-1.2B (v3.3+) with native multilingual OCR, image/chart parsing, truncated paragraph merging, and cross-page table merging. v3.1.0 added native PPTX and XLSX parsing alongside PDF, DOCX, and images.


Key Features

  • Multi-format input: PDF, PNG/JPG, DOCX, PPTX, XLSX
  • Layout-aware output: reading order for single/multi-column and complex layouts
  • Formula → LaTeX, table → HTML
  • OCR: 109 languages; auto-detect scanned/garbled PDFs
  • Outputs: NLP Markdown, multimodal Markdown, JSON by reading order, layout/span visualizations
  • Interfaces: CLI, FastAPI (mineru-api), Gradio WebUI, mineru-router for multi-GPU load balancing
  • Async tasks: POST /tasks for submit/status/result (v3.0+)
  • Long documents: sliding-window parsing + streaming disk writes — tens of thousands of pages without manual splitting
  • Thread-safe multi-threaded inference for high-concurrency production

Quick Start

Install

pip install --upgrade pip
pip install uv
uv pip install -U "mineru[all]"

mineru[all] is the recommended bundle for Windows, Linux, and macOS.

Parse a document (GPU path)

mineru -p document.pdf -o ./output

Parse on CPU only

mineru -p document.pdf -o ./output -b pipeline

The CLI accepts a single file or a directory. Structured Markdown/JSON output is written under the output path.
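
For pipelines that call the parser from Python, one option is a thin wrapper around the same CLI. The sketch below only uses the `-p`, `-o`, and `-b pipeline` flags shown above; `build_mineru_command` and `parse_document` are hypothetical helper names, and the wrapper assumes the `mineru` executable is on PATH.

```python
import subprocess
from pathlib import Path
from typing import Optional


def build_mineru_command(
    input_path: str, output_dir: str, backend: Optional[str] = None
) -> list[str]:
    """Build the argument list for the mineru CLI without running it."""
    cmd = ["mineru", "-p", input_path, "-o", output_dir]
    if backend is not None:
        cmd += ["-b", backend]  # e.g. "pipeline" for the CPU-only path
    return cmd


def parse_document(
    input_path: str, output_dir: str, cpu_only: bool = False
) -> subprocess.CompletedProcess:
    """Run mineru on a file or directory; raises CalledProcessError on failure."""
    Path(output_dir).mkdir(parents=True, exist_ok=True)
    cmd = build_mineru_command(
        input_path, output_dir, "pipeline" if cpu_only else None
    )
    return subprocess.run(cmd, check=True)


# Building the commands is side-effect free, so it is safe to inspect them.
gpu_cmd = build_mineru_command("document.pdf", "./output")
cpu_cmd = build_mineru_command("document.pdf", "./output", backend="pipeline")
```

Passing the arguments as a list rather than a shell string avoids quoting problems with file names that contain spaces.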

Docker

Docker deployment is documented for Linux and Windows WSL2 — macOS should use pip/uv install instead. See Docker deployment docs.


Production: mineru-router and Multi-GPU

mineru-router (v3.0+) provides a single entry point in front of multiple services and GPUs:

  • Interfaces fully compatible with mineru-api
  • Automatic task load balancing
  • Designed for high-concurrency, high-throughput parsing farms

Combined with thread-safe concurrent inference and streaming writes, MinerU 3.x targets enterprise document pipelines — not just one-off CLI conversions. That aligns with Liquid AI LFM2.5-230M's data-extraction positioning: parse at scale upstream, route structured chunks to small edge models downstream.


Hardware Requirements (Summary)

| | pipeline | hybrid / vlm |
|---|---|---|
| OS | Linux 2019+, Windows, macOS 14+ | Same |
| Python | 3.10–3.13 (Windows: 3.10–3.12) | Same |
| RAM | Min 16GB, rec 32GB+ | Min 16GB |
| VRAM | 4GB optional | Min 8GB (hybrid), 2GB (http client) |
| Disk | Min 20GB SSD recommended | Min 2GB (+ models) |

Pure CPU inference is pipeline-only. Apple Silicon supports GPU acceleration via MPS on supported backends.
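
The requirements above can be turned into a rough preflight check. This is a sketch based only on the summary figures in this section: `supported_python` and `feasible_backends` are hypothetical helpers, the backend labels are descriptive rather than exact CLI values, and disk space and OS version are not checked.

```python
import platform
import sys


def supported_python(version: tuple[int, int], system: str) -> bool:
    """Check the interpreter against the 3.10-3.13 range (3.10-3.12 on Windows)."""
    upper = (3, 12) if system == "Windows" else (3, 13)
    return (3, 10) <= version <= upper


def feasible_backends(ram_gb: float, vram_gb: float) -> list[str]:
    """List backends whose minimum RAM and VRAM figures are met."""
    backends = []
    if ram_gb >= 16:  # every backend lists 16GB RAM as the minimum
        backends.append("pipeline")  # runs on CPU; a GPU is optional
        if vram_gb >= 8:
            backends.append("hybrid")
        if vram_gb >= 2:
            backends.append("vlm-http-client")
    return backends


python_ok = supported_python(sys.version_info[:2], platform.system())
cpu_only_box = feasible_backends(ram_gb=32, vram_gb=0)
gpu_workstation = feasible_backends(ram_gb=16, vram_gb=8)
```

A machine with 32GB of RAM and no GPU qualifies only for the pipeline backend, while 16GB of RAM with 8GB of VRAM meets the listed minimums for all three.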


MinerU vs Alternatives (June 2026)

| Tool | Strength | Trade-off |
|---|---|---|
| MinerU 3.4 | Full stack, multi-backend, Office formats, router | Heavy install, GPU for best accuracy |
| Unlimited-OCR | One-shot long PDFs, SGLang throughput | Vision-model path, different architecture |
| Mistral OCR 4 | Managed API, bounding boxes, confidence | Not self-hosted weights |
| Generic PyPDF | Fast, trivial | No layout, tables, or formulas |

For RAG specifically, parsed output quality directly affects chunking and retrieval strategy. MinerU's JSON-sorted-by-reading-order output is designed for downstream indexing — or wire parsed Markdown into a Langflow RAG pipeline for visual retriever tuning. At the extreme end of "hard documents," the Vesuvius Challenge applies a similar parse-then-verify loop to carbonized 2,000-year-old scrolls — with papyrologists, not chunkers, as the final gate.
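
As a sketch of what "designed for downstream indexing" can look like, the snippet below turns a reading-order block list into index records that carry their section heading as metadata. The block schema here (`type`, `text`, `page`) is hypothetical and chosen for illustration; check the actual keys in MinerU's JSON output before adapting it.

```python
def blocks_to_records(blocks: list[dict]) -> list[dict]:
    """Attach the most recent heading to each block, preserving input order."""
    records, section = [], ""
    for position, block in enumerate(blocks):
        if block["type"] == "heading":
            section = block["text"]  # headings become metadata, not records
            continue
        records.append(
            {
                "position": position,  # index in the original reading order
                "section": section,
                "kind": block["type"],
                "page": block["page"],
                "text": block["text"],
            }
        )
    return records


# Hypothetical parsed blocks, already sorted by reading order.
blocks = [
    {"type": "heading", "text": "Results", "page": 3},
    {"type": "paragraph", "text": "Accuracy improved on all splits.", "page": 3},
    {"type": "table", "text": "<table><tr><td>A</td><td>0.9</td></tr></table>", "page": 3},
    {"type": "equation", "text": "E = mc^2", "page": 4},
]
records = blocks_to_records(blocks)
```

Keeping `section`, `kind`, and `page` alongside the text lets a retriever filter by block type or cite a page, which plain text extraction cannot support.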


License Evolution

v3.1.0 (April 2026) moved MinerU from AGPLv3 to the MinerU Open Source License — Apache 2.0–based with additional conditions. The change explicitly targets lower adoption friction for commercial deployments while keeping the codebase open.

The earlier v3.0 release had already removed dependencies on AGPLv3 models (doclayoutyolo, mfd_yolov8) and a CC-BY-NC-SA layoutreader, cleaning up the license stack for enterprise use.


Online Demos (Try Before Deploy)

| Demo | Notes |
|---|---|
| Official web app | Full features, login required |
| OpenDataLab | Same as official |
| ModelScope Gradio | Core parsing, no login |
| HuggingFace Gradio | Core parsing, no login |

MinerU's own docs recommend trying online demos first — complex layouts, scans, and handwriting may still fall short of expectations.


Related ExplainX coverage

| Post | Connection |
|---|---|
| Baidu Unlimited-OCR | Alternative long-horizon parsing approach |
| Mistral OCR 4 | Managed document AI API comparison |
| RAG vs agentic RAG | What to do with parsed documents |
| Liquid AI LFM2.5-230M | Edge extraction after MinerU ingestion |
| arXiv AI-generated errors ban | Why grounded document pipelines matter |
| Vesuvius Challenge scroll read | Extreme document recovery: ML ink detection + human transcription |

Summary

MinerU 3.4 reinforces its role as the default open-source document ingestion engine for LLM workflows: PP-OCRv6 accuracy, doubled OCR speed, smarter model caching, hybrid backends scoring above 95 on OmniDocBench v1.6, full Office format support, and mineru-router for production scale.

69.7k stars reflect years of iteration from InternLM's pre-training needs to today's agent/RAG stacks. If your agents read PDFs, MinerU is the layer to install before you embed a single chunk.


Last updated: June 26, 2026. Version details from github.com/opendatalab/MinerU release mineru-3.4.0 and project README.
