How do I install pdf-extraction?

Run `npx skills add https://github.com/claude-office-skills/skills --skill pdf-extraction` in your terminal. You need to have run `npx skills init` once in your project first.

Which agent frameworks does pdf-extraction support?

pdf-extraction works with any agent framework supported by the Skills registry, including Claude Code, Cursor, GitHub Copilot, Cline, Codex, and Gemini CLI.

Is pdf-extraction free to use?

Yes. pdf-extraction is free to install and use. It is published by claude-office-skills and listed in the open explainx.ai skill registry.

Where can I read ratings and reviews for pdf-extraction?

Community ratings and review text appear on this explainx.ai skill page below the description. Reviews use a 1–5 scale and may include short written feedback from signed-in members.

pdf-extraction

claude-office-skills/skills · updated Apr 20, 2026

$ npx skills add https://github.com/claude-office-skills/skills --skill pdf-extraction

summary

Extract text, tables, and metadata from PDF documents with character-level precision.

›Supports text extraction with layout preservation, word-level positioning, and character-level access including font and size metadata
›Includes advanced table detection with customizable strategies (lines, text, explicit) and tolerance tuning for complex layouts
›Provides visual debugging via image rendering with overlays for characters, words, lines, and detected table boundaries
›Handles cropping

skill.md

PDF Extraction Skill

Overview

This skill enables precise extraction of text, tables, and metadata from PDF documents using pdfplumber, a Python library for PDF data extraction. Unlike basic PDF readers, pdfplumber exposes character-level positions, detects tables, and can render pages for visual debugging.

How to Use

1. Provide the PDF file you want to extract from
2. Specify what you need: text, tables, images, or metadata
3. I'll generate pdfplumber code and execute it

Example prompts:

"Extract all tables from this financial report"
"Get text from pages 5-10 of this document"
"Find and extract the invoice total from this PDF"
"Convert this PDF table to CSV/Excel"

Domain Knowledge

pdfplumber Fundamentals

import pdfplumber

# Open PDF
with pdfplumber.open('document.pdf') as pdf:
    # Access pages
    first_page = pdf.pages[0]
    
    # Document metadata
    print(pdf.metadata)
    
    # Number of pages
    print(len(pdf.pages))

PDF Structure

PDF Document
├── metadata (title, author, creation date)
├── pages[]
│   ├── chars (individual characters with position)
│   ├── words (grouped characters, via extract_words())
│   ├── lines (horizontal/vertical lines)
│   ├── rects (rectangles)
│   ├── curves (bezier curves)
│   └── images (embedded images)
└── outline (bookmarks/TOC)

Text Extraction

Basic Text

with pdfplumber.open('document.pdf') as pdf:
    # Single page
    text = pdf.pages[0].extract_text()
    
    # All pages
    full_text = ''
    for page in pdf.pages:
        full_text += page.extract_text() or ''
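
For a request such as "Get text from pages 5-10", the loop above can be limited to a page range. The following is a minimal sketch: `extract_page_range` is a hypothetical helper name, not part of pdfplumber, and it assumes only that each page object exposes `extract_text()`. A small stand-in page class is used so the example runs without a PDF file.

```python
def extract_page_range(pages, first, last):
    """Return the text of pages first..last (1-based, inclusive), newline-joined.

    `pages` is any sequence whose items expose extract_text(), such as
    pdf.pages. Pages without extractable text contribute an empty string.
    """
    if first < 1 or last < first:
        raise ValueError("expected 1 <= first <= last")
    selected = pages[first - 1:last]  # slicing clamps to the pages that exist
    return "\n".join((page.extract_text() or "") for page in selected)


class FakePage:
    """Stand-in for a pdfplumber page, used only to exercise the helper."""

    def __init__(self, text):
        self._text = text

    def extract_text(self):
        return self._text


demo_pages = [FakePage("one"), FakePage(None), FakePage("three")]
print(extract_page_range(demo_pages, 1, 3))
```

With a real document, the call would look like `extract_page_range(pdf.pages, 5, 10)` inside the `with pdfplumber.open(...)` block.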

Advanced Text Options

# With layout preservation
text = page.extract_text(
    x_tolerance=3,      # Horizontal tolerance for grouping
    y_tolerance=3,      # Vertical tolerance
    layout=True,        # Preserve layout
    x_density=7.25,     # Chars per unit width
    y_density=13        # Chars per unit height
)

# Extract words with positions
words = page.extract_words(
    x_tolerance=3,
    y_tolerance=3,
    keep_blank_chars=False,
    use_text_flow=False
)

# Each word includes: text, x0, top, x1, bottom, etc.
for word in words:
    print(f"{word['text']} at ({word['x0']}, {word['top']})")
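
The word positions can be used to rebuild visual lines, which helps with prompts like "Find and extract the invoice total". This is a sketch under stated assumptions: `group_words_into_lines` is a hypothetical helper, and the input is a list of dicts with the `text`, `x0`, and `top` keys shown above. The sample data is made up for illustration.

```python
def group_words_into_lines(words, y_tolerance=3):
    """Group word dicts into lines of text using their 'top' coordinate.

    A word joins the current line when its 'top' is within y_tolerance of the
    first word on that line; otherwise it starts a new line.
    """
    lines = []
    for word in sorted(words, key=lambda w: (w["top"], w["x0"])):
        if lines and abs(word["top"] - lines[-1][0]["top"]) <= y_tolerance:
            lines[-1].append(word)
        else:
            lines.append([word])
    # Within each line, order words left to right before joining.
    return [
        " ".join(w["text"] for w in sorted(line, key=lambda w: w["x0"]))
        for line in lines
    ]


# Illustrative data shaped like extract_words() output (values are invented).
sample_words = [
    {"text": "$42.00", "x0": 120, "top": 101.2},
    {"text": "Total:", "x0": 50, "top": 100.0},
    {"text": "Invoice", "x0": 50, "top": 20.0},
]
print(group_words_into_lines(sample_words))
```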

Character-Level Access

# Get all characters
chars = page.chars

for char in chars:
    print(f"'{char['text']}' at ({char['x0']}, {char['top']})")
    print(f"  Font: {char['fontname']}, Size: {char['size']}")
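
The font and size metadata can be summarised to spot headings or body text. The sketch below assumes character dicts with the `text`, `fontname`, and `size` keys used above; `font_usage` is a hypothetical helper name and the sample characters are invented.

```python
from collections import Counter


def font_usage(chars):
    """Count non-whitespace characters per (fontname, rounded size) pair.

    Returns a list of ((fontname, size), count) tuples, most common first.
    """
    counts = Counter(
        (c["fontname"], round(c["size"], 1))
        for c in chars
        if c["text"].strip()
    )
    return counts.most_common()


# Illustrative data shaped like page.chars (values are invented).
sample_chars = [
    {"text": "T", "fontname": "Helvetica-Bold", "size": 18.0},
    {"text": "a", "fontname": "Helvetica", "size": 10.0},
    {"text": " ", "fontname": "Helvetica", "size": 10.0},
    {"text": "b", "fontname": "Helvetica", "size": 10.0},
    {"text": "c", "fontname": "Helvetica", "size": 10.0},
]
print(font_usage(sample_chars))
```

On a real page this would be called as `font_usage(page.chars)`; the most frequent pair is usually the body font.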

Table Extraction

Basic Table Extraction

with pdfplumber.open('report.pdf') as pdf:
    page = pdf.pages[0]
    
    # Extract all tables
    tables = page.extract_tables()
    
    for i, table in enumerate(tables):
        print(f"Table {i+1}:")
        for row in table:
            print(row)
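
Each extracted table is a list of rows, and empty cells come back as `None`. For a prompt like "Convert this PDF table to CSV/Excel", one option is the standard `csv` module. This is a minimal sketch; `table_to_csv` is a hypothetical helper name and the sample rows are invented.

```python
import csv
import io


def table_to_csv(rows):
    """Serialise a table (list of rows of str or None) to CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for row in rows:
        # Replace missing cells with empty strings so columns stay aligned.
        writer.writerow(["" if cell is None else cell for cell in row])
    return buffer.getvalue()


sample_table = [
    ["Item", "Price"],
    ["Widget", None],
    ["Gadget, large", "9.50"],
]
print(table_to_csv(sample_table))
```

With a real page, `table_to_csv(page.extract_tables()[0])` would serialise the first detected table, assuming at least one table was found.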

Advanced Table Settings

# Custom table detection
table_settings = {
    "vertical_strategy": "lines",      # or "text", "explicit"
    "horizontal_strategy": "lines",
    "explicit_vertical_lines": [],     # Custom line positions
    "explicit_horizontal_lines": [],
    "snap_tolerance": 3,
    "snap_x_tolerance": 3,
    "snap_y_tolerance": 3,
    "join_tolerance": 3,
    "edge_min_length": 3,
    "min_words_vertical": 3,
    "min_words_horizontal": 1,
    "intersection_tolerance": 3,
    "text_tolerance": 3,
    "text_x_tolerance": 3,
    "text_y_tolerance": 3,
}

tables = page.extract_tables(table_settings)
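
When a table has no ruling lines, the "lines" strategy may find nothing. One possible approach is to try "lines" first and fall back to "text". The sketch below assumes only that `page.extract_tables(settings)` accepts a settings dict as shown above; `extract_tables_with_fallback` is a hypothetical helper, not a pdfplumber function.

```python
def extract_tables_with_fallback(page, strategies=("lines", "text")):
    """Try each strategy in order and return (strategy, tables) for the first hit.

    Returns (None, []) when no strategy detects a table.
    """
    for strategy in strategies:
        settings = {
            "vertical_strategy": strategy,
            "horizontal_strategy": strategy,
        }
        tables = page.extract_tables(settings)
        if tables:
            return strategy, tables
    return None, []


class FakeBorderlessPage:
    """Stand-in page whose table is only detectable with the 'text' strategy."""

    def extract_tables(self, table_settings=None):
        if table_settings and table_settings.get("vertical_strategy") == "text":
            return [[["a", "b"], ["1", "2"]]]
        return []


print(extract_tables_with_fallback(FakeBorderlessPage()))
```

The "text" strategy tends to produce more false positives, so the result is worth checking with the visual debugger.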

Table Finding

# Find tables (without extracting)
found_tables = page.find_tables()  # list of table objects

for table in found_tables:
    print(f"Table at: {table.bbox}")  # (x0, top, x1, bottom)
    
    # Extract specific table
    data = table.extract()
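
When a page contains several tables, the bounding boxes can be used to pick one before extracting. The sketch below selects the largest box by area; `bbox_area` and `largest_bbox` are hypothetical helpers that work on plain `(x0, top, x1, bottom)` tuples, and the sample boxes are invented.

```python
def bbox_area(bbox):
    """Area of an (x0, top, x1, bottom) box; degenerate boxes count as 0."""
    x0, top, x1, bottom = bbox
    return max(x1 - x0, 0) * max(bottom - top, 0)


def largest_bbox(bboxes):
    """Return the box with the greatest area, or None for an empty list."""
    return max(bboxes, key=bbox_area) if bboxes else None


sample_bboxes = [(0, 0, 100, 50), (10, 60, 300, 400), (0, 0, 10, 10)]
print(largest_bbox(sample_bboxes))
```

With real tables, `max(page.find_tables(), key=lambda t: bbox_area(t.bbox))` would return the largest table object, assuming the page has at least one table.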

Visual Debugging

# Create visual debug image
im = page.to_image(resolution=150)

# Draw detected objects
im.draw_rects(page.chars)        # Character bounding boxes
im.draw_rects(page.extract_words())  # Word bounding boxes
im.draw_lines(page.lines)        # Lines
im.draw_rects(page.rects)        # Rectangles

# Save debug image
im.save('debug.png')

# Debug tables
im.reset()
im.debug_tablefinder()
im.save('table_debug.png')

Cropping and Filtering

Crop to Region

# Define bounding box (x0, top, x1, bottom)
bbox = (0, 0, 300, 200)

# Crop page
cropped = page.crop(bbox)

# Extract from cropped area
text = cropped.extract_text()
tables = cropped.extract_tables()
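
A crop box that extends past the page edge can cause `crop` to fail in some pdfplumber versions, so it may help to clamp the box first. This is a sketch under an assumption that does not hold for every PDF: the page's own bounding box starts at (0, 0). `clamp_bbox` is a hypothetical helper name.

```python
def clamp_bbox(bbox, page_width, page_height):
    """Clamp an (x0, top, x1, bottom) box to a page whose origin is (0, 0).

    Raises ValueError when the box does not overlap the page at all.
    """
    x0, top, x1, bottom = bbox
    clamped = (
        max(x0, 0),
        max(top, 0),
        min(x1, page_width),
        min(bottom, page_height),
    )
    if clamped[0] >= clamped[2] or clamped[1] >= clamped[3]:
        raise ValueError("bounding box does not overlap the page")
    return clamped


# 612 x 792 points is US Letter; the requested box overshoots on two sides.
print(clamp_bbox((-10, 0, 700, 200), 612, 792))
```

With a real page the call would be `page.crop(clamp_bbox(bbox, page.width, page.height))`.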

Filter by Position

# Filter characters by region
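def within_bbox(obj, bbox):
    # The source listing is cut off after "x0, top"; the lines below are an assumed completion.
    x0, top, x1, bottom = bbox
    return (obj['x0'] >= x0 and obj['top'] >= top
            and obj['x1'] <= x1 and obj['bottom'] <= bottom)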

general reviews

Ratings

4.5 ★★★★★ (69 reviews)

★★★★★ Harper Gupta · Dec 24, 2024
pdf-extraction reduced setup friction for our internal harness; good balance of opinion and flexibility.
★★★★★ Meera Liu · Dec 24, 2024
Registry listing for pdf-extraction matched our evaluation — installs cleanly and behaves as described in the markdown.
★★★★★ Jin Park · Dec 20, 2024
Keeps context tight: pdf-extraction is the kind of skill you can hand to a new teammate without a long onboarding doc.
★★★★★ Naina Martinez · Dec 20, 2024
I recommend pdf-extraction for anyone iterating fast on agent tooling; clear intent and a small, reviewable surface area.
★★★★★ Nia Okafor · Dec 12, 2024
Useful defaults in pdf-extraction — fewer surprises than typical one-off scripts, and it plays nicely with `npx skills` flows.
★★★★★ Meera Shah · Dec 8, 2024
We added pdf-extraction from the explainx registry; install was straightforward and the SKILL.md answered most questions upfront.
★★★★★ Nia Abebe · Dec 8, 2024
Keeps context tight: pdf-extraction is the kind of skill you can hand to a new teammate without a long onboarding doc.
★★★★★ Li Srinivasan · Dec 4, 2024
pdf-extraction reduced setup friction for our internal harness; good balance of opinion and flexibility.
★★★★★ Luis Johnson · Nov 27, 2024
Registry listing for pdf-extraction matched our evaluation — installs cleanly and behaves as described in the markdown.
★★★★★ Rahul Santra · Nov 19, 2024
pdf-extraction reduced setup friction for our internal harness; good balance of opinion and flexibility.
