# 📄 @sylphx/pdf-reader-mcp
> Production-ready PDF processing server for AI agents
**5-10x faster parallel processing** • **Y-coordinate content ordering** • **94%+ test coverage** • **103 tests passing**
---
## 🚀 Overview
PDF Reader MCP is a **production-ready** Model Context Protocol server that gives AI agents PDF processing capabilities. It extracts text, images, and metadata from local files and URLs, processing pages in parallel and isolating errors per page.
**The Problem:**
```text
// Traditional PDF processing
- Sequential page processing (slow)
- No natural content ordering
- Complex path handling
- Poor error isolation
```
**The Solution:**
```text
// PDF Reader MCP
- 5-10x faster parallel processing ⚡
- Y-coordinate based ordering 📐
- Flexible path support (absolute/relative) 🎯
- Per-page error resilience 🛡️
- 94%+ test coverage ✅
```
**Result: Production-ready PDF processing that scales.**
---
## ⚡ Key Features
### Performance
- 🚀 **5-10x faster** than sequential with automatic parallelization
- ⚡ **12,933 ops/sec** error handling, 5,575 ops/sec text extraction
- 💨 **Process 50-page PDFs** in seconds with multi-core utilization
- 📦 **Lightweight** with minimal dependencies
### Developer Experience
- 🎯 **Path Flexibility** - Absolute & relative paths, Windows/Unix support (v1.3.0)
- 🖼️ **Smart Ordering** - Y-coordinate based content preserves document layout
- 🛡️ **Type Safe** - Full TypeScript with strict mode enabled
- 📚 **Battle-tested** - 103 tests, 94%+ coverage, 98%+ function coverage
- 🎨 **Simple API** - Single tool handles all operations elegantly
---
## 📊 Performance Benchmarks
Real-world performance from production testing:
| Operation | Ops/sec | Performance | Use Case |
|-----------|---------|-------------|----------|
| **Error handling** | 12,933 | ⚡⚡⚡⚡⚡ | Validation & safety |
| **Extract full text** | 5,575 | ⚡⚡⚡⚡ | Document analysis |
| **Extract page** | 5,329 | ⚡⚡⚡⚡ | Single page ops |
| **Multiple pages** | 5,242 | ⚡⚡⚡⚡ | Batch processing |
| **Metadata only** | 4,912 | ⚡⚡⚡ | Quick inspection |
### Parallel Processing Speedup
| Document | Sequential | Parallel | Speedup |
|----------|-----------|----------|---------|
| **10-page PDF** | ~2s | ~0.3s | **5-8x faster** |
| **50-page PDF** | ~10s | ~1s | **10x faster** |
| **100+ pages** | ~20s | ~2s | **Linear scaling** with CPU cores |
*Benchmarks vary based on PDF complexity and system resources.*
---
## 📦 Installation
### Claude Code
```bash
claude mcp add pdf-reader -- npx @sylphx/pdf-reader-mcp
```
### Claude Desktop
Add to `claude_desktop_config.json`:
```json
{
  "mcpServers": {
    "pdf-reader": {
      "command": "npx",
      "args": ["@sylphx/pdf-reader-mcp"]
    }
  }
}
```
**📍 Config file locations**
- **macOS**: `~/Library/Application Support/Claude/claude_desktop_config.json`
- **Windows**: `%APPDATA%\Claude\claude_desktop_config.json`
- **Linux**: `~/.config/Claude/claude_desktop_config.json`
### VS Code
```bash
code --add-mcp '{"name":"pdf-reader","command":"npx","args":["@sylphx/pdf-reader-mcp"]}'
```
### Cursor
1. Open **Settings** → **MCP** → **Add new MCP Server**
2. Select **Command** type
3. Enter: `npx @sylphx/pdf-reader-mcp`
### Windsurf
Add to your Windsurf MCP config:
```json
{
  "mcpServers": {
    "pdf-reader": {
      "command": "npx",
      "args": ["@sylphx/pdf-reader-mcp"]
    }
  }
}
```
### Cline
Add to Cline's MCP settings:
```json
{
  "mcpServers": {
    "pdf-reader": {
      "command": "npx",
      "args": ["@sylphx/pdf-reader-mcp"]
    }
  }
}
```
### Warp
1. Go to **Settings** → **AI** → **Manage MCP Servers** → **Add**
2. Command: `npx`, Args: `@sylphx/pdf-reader-mcp`
### Smithery (One-click)
```bash
npx -y @smithery/cli install @sylphx/pdf-reader-mcp --client claude
```
### Manual Installation
```bash
# Quick start - zero installation
npx @sylphx/pdf-reader-mcp
# Or install globally
npm install -g @sylphx/pdf-reader-mcp
```
---
## 🎯 Quick Start
### Basic Usage
```json
{
  "sources": [{
    "path": "documents/report.pdf"
  }],
  "include_full_text": true,
  "include_metadata": true,
  "include_page_count": true
}
```
**Result:**
- ✅ Full text content extracted
- ✅ PDF metadata (author, title, dates)
- ✅ Total page count
- ✅ Structural sharing - unchanged parts preserved
### Extract Specific Pages
```json
{
  "sources": [{
    "path": "documents/manual.pdf",
    "pages": "1-5,10,15-20"
  }],
  "include_full_text": true
}
```
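The `pages` string above combines ranges and single pages. A minimal sketch of how such a spec could be expanded into page numbers follows; `parsePageRanges` is a hypothetical helper written for illustration, not the server's actual parser, and it assumes pages are 1-based.

```typescript
// Hypothetical helper (not part of the package's documented API):
// expands a spec such as "1-5,10,15-20" into sorted, unique, 1-based pages.
function parsePageRanges(spec: string | number[]): number[] {
  const pages = new Set<number>();

  if (Array.isArray(spec)) {
    for (const p of spec) pages.add(p);
  } else {
    for (const part of spec.split(",")) {
      const token = part.trim();
      if (token === "") continue;

      // Accept "N" or "N-M" only.
      const match = /^(\d+)(?:-(\d+))?$/.exec(token);
      if (!match) throw new Error(`Invalid page range: "${token}"`);

      const start = Number(match[1]);
      const end = match[2] !== undefined ? Number(match[2]) : start;
      if (start < 1 || end < start) {
        throw new Error(`Invalid page range: "${token}"`);
      }
      for (let p = start; p <= end; p++) pages.add(p);
    }
  }

  return Array.from(pages).sort((a, b) => a - b);
}
```

For instance, `parsePageRanges("1-5,10")` yields `[1, 2, 3, 4, 5, 10]`, and a reversed range such as `"5-1"` throws.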
### Absolute Paths (v1.3.0+)
```json
// Windows - backslashes (escaped in JSON) and forward slashes both work
{
  "sources": [{
    "path": "C:\\Users\\John\\Documents\\report.pdf"
  }],
  "include_full_text": true
}
// Unix/Mac
{
  "sources": [{
    "path": "/home/user/documents/contract.pdf"
  }],
  "include_full_text": true
}
```
**No more** `"Absolute paths are not allowed"` **errors!**
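To illustrate how absolute and relative paths can be handled uniformly, here is a small sketch built on Node's `path` module. `resolveSourcePath` and its `baseDir` and `flavor` parameters are assumptions made for this example; the server's own resolution logic may differ.

```typescript
import * as path from "node:path";

// Hypothetical helper: returns an absolute path for a source entry.
// `flavor` selects Windows or POSIX rules; it defaults to the host platform.
function resolveSourcePath(
  input: string,
  baseDir: string,
  flavor: typeof path.posix = process.platform === "win32" ? path.win32 : path.posix,
): string {
  return flavor.isAbsolute(input)
    ? flavor.normalize(input) // absolute: only clean up separators and dots
    : flavor.resolve(baseDir, input); // relative: anchor to the base directory
}
```

Passing `path.win32` or `path.posix` explicitly makes the behavior independent of the machine running the code, which is convenient in tests.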
### Extract Images with Natural Ordering
```json
{
  "sources": [{
    "path": "presentation.pdf",
    "pages": [1, 2, 3]
  }],
  "include_images": true,
  "include_full_text": true
}
```
**Response includes:**
- Text and images in **exact document order** (Y-coordinate sorted)
- Base64-encoded images with metadata (width, height, format)
- Natural reading flow preserved for AI comprehension
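The ordering idea can be sketched as a sort over positioned items. The `ContentItem` shape below is hypothetical and is not the server's documented response type. The sketch also assumes coordinates in default PDF user space, where the origin is at the bottom-left, so a larger `y` means higher on the page.

```typescript
// Hypothetical item shape, for illustration only.
interface ContentItem {
  kind: "text" | "image";
  x: number; // horizontal position
  y: number; // vertical position; larger values are closer to the top
  value: string; // text content or base64 image data
}

// Returns a new array ordered top-to-bottom, then left-to-right.
function orderByPosition(items: ContentItem[]): ContentItem[] {
  return [...items].sort((a, b) => (b.y !== a.y ? b.y - a.y : a.x - b.x));
}
```

A real implementation would likely also group items whose `y` values are nearly equal into the same line; this sketch only treats exactly equal values as one line.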
### Batch Processing
```json
{
  "sources": [
    { "path": "C:\\Reports\\Q1.pdf", "pages": "1-10" },
    { "path": "/home/user/Q2.pdf", "pages": "1-10" },
    { "url": "https://example.com/Q3.pdf" }
  ],
  "include_full_text": true
}
```
⚡ **All PDFs processed in parallel automatically!**
---
## ✨ Features
### Core Capabilities
- ✅ **Text Extraction** - Full document or specific pages with intelligent parsing
- ✅ **Image Extraction** - Base64-encoded with complete metadata (width, height, format)
- ✅ **Content Ordering** - Y-coordinate based layout preservation for natural reading flow
- ✅ **Metadata Extraction** - Author, title, creation date, and custom properties
- ✅ **Page Counting** - Fast enumeration without loading full content
- ✅ **Dual Sources** - Local files (absolute or relative paths) and HTTP/HTTPS URLs
- ✅ **Batch Processing** - Multiple PDFs processed concurrently
### Advanced Features
- ⚡ **5-10x Performance** - Parallel page processing with `Promise.all`
- 🎯 **Smart Pagination** - Extract ranges like "1-5,10-15,20"
- 🖼️ **Multi-Format Images** - RGB, RGBA, Grayscale with automatic detection
- 🛡️ **Path Flexibility** - Windows, Unix, and relative paths all supported (v1.3.0)
- 🔍 **Error Resilience** - Per-page error isolation with detailed messages
- 📏 **Large File Support** - Efficient streaming and memory management
- 📝 **Type Safe** - Full TypeScript with strict mode enabled
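Parallel page processing and per-page error isolation can be combined in one pattern, sketched below. `PageExtractor`, `PageResult`, and `extractPages` are hypothetical names; the extractor stands in for whatever PDF library call does the real work, and this is not presented as the server's actual code.

```typescript
// Hypothetical per-page extractor supplied by the caller.
type PageExtractor = (pageNumber: number) => Promise<string>;

interface PageResult {
  page: number;
  text?: string; // set when extraction succeeded
  error?: string; // set when extraction failed
}

// Starts all page extractions at once and waits for them with Promise.all.
// Each page catches its own failure, so one bad page does not reject the batch.
async function extractPages(
  pages: number[],
  extract: PageExtractor,
): Promise<PageResult[]> {
  return Promise.all(
    pages.map(async (page): Promise<PageResult> => {
      try {
        return { page, text: await extract(page) };
      } catch (err) {
        const message = err instanceof Error ? err.message : String(err);
        return { page, error: message };
      }
    }),
  );
}
```

Results come back in the same order as the input `pages` array, regardless of which extraction finishes first.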
---
## 🆕 What's New in v1.3.0
### 🎉 Absolute Paths Now Supported!
```json
// ✅ Windows
{ "path": "C:\\Users\\John\\Documents\\report.pdf" }
{ "path": "C:/Users/John/Documents/report.pdf" }
// ✅ Unix/Mac
{ "path": "/home/john/documents/report.pdf" }
{ "path": "/Users/john/Documents/report.pdf" }
// ✅ Relative (still works)
{ "path": "documents/report.pdf" }
```
**Other Improvements:**
- 🐛 Fixed Zod validation error handling
- 📦 Updated all dependencies to latest versions
- ✅ 103 tests passing, 94%+ coverage maintained
**v1.2.0 - Content Ordering**
- Y-coordinate based text and image ordering
- Natural reading flow for AI models
- Intelligent line grouping
**v1.1.0 - Image Extraction & Performance**
- Base64-encoded image extraction
- 10x speedup with parallel processing
- Comprehensive test coverage (94%+)
[View Full Changelog →](./CHANGELOG.md)
---
## 📖 API Reference
### `read_pdf` Tool
The single tool that handles all PDF operations.
#### Parameters
| Parameter | Type | Description | Default |
|-----------|------|-------------|---------|
| `sources` | Array | List of PDF sources to process | Required |
| `include_full_text` | boolean | Extract full text content | `false` |
| `include_metadata` | boolean | Extract PDF metadata | `true` |
| `include_page_count` | boolean | Include total page count | `true` |
| `include_images` | boolean | Extract embedded images | `false` |
#### Source Object
```typescript
{
  path?: string;             // Local file path (absolute or relative)
  url?: string;              // HTTP/HTTPS URL to PDF
  pages?: string | number[]; // Pages to extract: "1-5,10" or [1,2,3]
}
```
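For client code, the parameter table and source object can be captured as types. The sketch below is an assumption-laden illustration: the interface names and `withDefaults` are hypothetical, and only the field names and default values are taken from the table above.

```typescript
// Field names follow the documented source object; the interface name is hypothetical.
interface PdfSource {
  path?: string;
  url?: string;
  pages?: string | number[];
}

// Field names follow the documented parameters; the interface name is hypothetical.
interface ReadPdfRequest {
  sources: PdfSource[];
  include_full_text?: boolean;
  include_metadata?: boolean;
  include_page_count?: boolean;
  include_images?: boolean;
}

// Hypothetical helper: fills in the defaults listed in the parameter table.
function withDefaults(req: ReadPdfRequest): Required<ReadPdfRequest> {
  return {
    sources: req.sources,
    include_full_text: req.include_full_text ?? false,
    include_metadata: req.include_metadata ?? true,
    include_page_count: req.include_page_count ?? true,
    include_images: req.include_images ?? false,
  };
}
```

Spelling the defaults out on the client side makes it explicit that a request with only `sources` returns metadata and a page count but no text or images.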
#### Examples
**Metadata only (fast):**
```json
{
  "sources": [{ "path": "large.pdf" }],
  "include_metadata": true,
  "include_page_count": true,
  "include_full_text": false
}
```
**From URL:**
```json
{
  "sources": [{
    "url": "https://arxiv.org/pdf/2301.00001.pdf"
  }],
  "include_full_text": true
}
```
**Page ranges:**
```json
{
---