pdf-processing
davila7/claude-code-templates · updated Apr 8, 2026
PDF Processing
Quick start
Use pdfplumber to extract text from PDFs:
import pdfplumber

# Open the file and print the text of the first page
with pdfplumber.open("document.pdf") as pdf:
    text = pdf.pages[0].extract_text()
    print(text)
Extracting tables
Extract tables from PDFs with automatic detection:
import pdfplumber

with pdfplumber.open("report.pdf") as pdf:
    page = pdf.pages[0]
    # Each table is a list of rows; each row is a list of cell values
    tables = page.extract_tables()
    for table in tables:
        for row in table:
            print(row)
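Each table returned by page.extract_tables() is a list of rows, and empty cells come back as None. A minimal sketch for turning one such table into dictionaries follows. It assumes the first row is a header, which is not guaranteed for every PDF, and the helper name table_to_records is hypothetical.

```python
from typing import Dict, List, Optional

Table = List[List[Optional[str]]]


def table_to_records(table: Table) -> List[Dict[str, str]]:
    """Turn one extracted table into dicts keyed by its first row.

    Assumption: the first row holds the column names.
    """
    if not table:
        return []
    # Empty cells arrive as None; normalise them to empty strings.
    header = [(cell or "").strip() for cell in table[0]]
    records = []
    for row in table[1:]:
        cells = [(cell or "").strip() for cell in row]
        records.append(dict(zip(header, cells)))
    return records


# Hypothetical sample shaped like one entry of page.extract_tables()
sample = [["Name", "Qty"], ["Widget", "3"], ["Gadget", None]]
records = table_to_records(sample)
```

In real use, pass each element of page.extract_tables() to the helper instead of the sample data.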
Extracting all pages
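Process every page of a multi-page document:
import pdfplumber

with pdfplumber.open("document.pdf") as pdf:
    page_texts = []
    for page in pdf.pages:
        # extract_text() can return None or "" for pages without a text layer
        page_texts.append(page.extract_text() or "")
    full_text = "\n\n".join(page_texts)
print(full_text)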
Form filling
For PDF form filling, see FORMS.md for the complete guide including field analysis and validation.
Merging PDFs
Combine multiple PDF files:
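from pypdf import PdfWriter

# PdfMerger was deprecated and then removed in pypdf 5.0;
# PdfWriter.append() covers the same use case.
writer = PdfWriter()
for path in ["file1.pdf", "file2.pdf", "file3.pdf"]:
    writer.append(path)
writer.write("merged.pdf")
writer.close()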
Splitting PDFs
Extract specific pages or ranges:
from pypdf import PdfReader, PdfWriter

reader = PdfReader("input.pdf")
writer = PdfWriter()
# Extract pages 2-5 (1-based). reader.pages is zero-indexed,
# so these are indices 1 through 4.
for page_num in range(1, 5):
    writer.add_page(reader.pages[page_num])
with open("output.pdf", "wb") as output:
    writer.write(output)
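The off-by-one between human page numbers and zero-based indices is easy to get wrong when the range comes from user input. The sketch below converts a 1-based specification such as "2-5,8" into indices suitable for reader.pages. The function name parse_page_ranges and the spec syntax are assumptions for illustration, not part of pypdf.

```python
from typing import List


def parse_page_ranges(spec: str, total_pages: int) -> List[int]:
    """Convert a 1-based spec such as "2-5,8" into zero-based page indices."""
    indices: List[int] = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
        else:
            start = end = int(part)
        if start < 1 or end < start or end > total_pages:
            raise ValueError(
                f"invalid page range {part!r} for a {total_pages}-page document"
            )
        # Shift the start to zero-based; range() excludes its stop,
        # so the 1-based end can be passed unchanged.
        indices.extend(range(start - 1, end))
    return indices


page_indices = parse_page_ranges("2-5,8", total_pages=10)
```

With a reader and writer set up as above, the result could be consumed with a loop like `for i in parse_page_ranges("2-5", len(reader.pages)): writer.add_page(reader.pages[i])`.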
Available packages
- pdfplumber - Text and table extraction (recommended)
- pypdf - PDF manipulation, merging, splitting
- pdf2image - Convert PDFs to images (requires poppler)
- pytesseract - OCR for scanned PDFs (requires tesseract)
Common patterns
Extract and save text:
import pdfplumber

with pdfplumber.open("input.pdf") as pdf:
    # Fall back to "" so pages without extractable text do not break the join
    text = "\n\n".join(page.extract_text() or "" for page in pdf.pages)
with open("output.txt", "w", encoding="utf-8") as f:
    f.write(text)
Extract tables to CSV:
import pdfplumber
import csv

with pdfplumber.open("tables.pdf") as pdf:
    tables = pdf.pages[0].extract_tables()
# All tables from the first page are written one after another into a single file
with open("output.csv", "w", newline="") as f:
    writer = csv.writer(f)
    for table in tables:
        writer.writerows(table)
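Writing every table into one CSV mixes tables that may have different columns. As an alternative, here is a minimal sketch that writes one file per table. The helper name write_tables and the table_N.csv naming scheme are assumptions made for this example.

```python
import csv
from pathlib import Path
from typing import List, Optional, Sequence


def write_tables(
    tables: Sequence[Sequence[Sequence[Optional[str]]]],
    out_dir: str,
    stem: str = "table",
) -> List[Path]:
    """Write each table to its own CSV file and return the paths written."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written: List[Path] = []
    for number, table in enumerate(tables, start=1):
        path = out / f"{stem}_{number}.csv"
        with open(path, "w", newline="", encoding="utf-8") as f:
            writer = csv.writer(f)
            for row in table:
                # csv already writes None as an empty field;
                # doing it explicitly keeps the intent visible.
                writer.writerow(["" if cell is None else cell for cell in row])
        written.append(path)
    return written
```

The tables argument has the same shape as the list returned by extract_tables(), so the result of that call can be passed in directly.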
Error handling
Handle common PDF issues:
import pdfplumber

try:
    with pdfplumber.open("document.pdf") as pdf:
        if len(pdf.pages) == 0:
            print("PDF has no pages")
        else:
            text = pdf.pages[0].extract_text()
            # No text usually means an image-only (scanned) page
            if text is None or text.strip() == "":
                print("Page contains no extractable text (might be scanned)")
            else:
                print(text)
except Exception as e:
    print(f"Error processing PDF: {e}")
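When a page yields no extractable text, the pdf2image and pytesseract packages listed above can serve as an OCR fallback. The following is a sketch under stated assumptions: both packages and their system dependencies (poppler, tesseract) are installed, and the helper names needs_ocr and extract_page_text are hypothetical. The min_chars threshold is a tunable heuristic; its default of 1 matches the empty-text check above.

```python
from typing import Optional


def needs_ocr(text: Optional[str], min_chars: int = 1) -> bool:
    """Heuristic: treat a page as scanned when too little text was extracted."""
    return text is None or len(text.strip()) < min_chars


def extract_page_text(path: str, page_number: int = 1) -> str:
    """Return the text of one 1-based page, using OCR if extraction finds none."""
    # Third-party imports stay local so needs_ocr() works without them.
    import pdfplumber

    with pdfplumber.open(path) as pdf:
        text = pdf.pages[page_number - 1].extract_text()
    if not needs_ocr(text):
        return text

    from pdf2image import convert_from_path
    import pytesseract

    # Render only the requested page to an image, then run OCR on it.
    images = convert_from_path(path, first_page=page_number, last_page=page_number)
    return pytesseract.image_to_string(images[0])
```

OCR is much slower than direct extraction, so it is worth running only on pages where the heuristic fires.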
Performance tips
- Process pages in batches for large PDFs
- Use multiprocessing for multiple files
- Extract only needed pages rather than entire document
- Close PDF objects after use
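The first two tips can be sketched as follows: a small batching helper for slicing a page list, and a process pool that extracts text from several files in parallel. The function names, the default of four workers, and the choice of ProcessPoolExecutor are assumptions for illustration; the right pool size depends on the machine and the documents.

```python
from concurrent.futures import ProcessPoolExecutor
from typing import Dict, Iterator, List, Sequence


def batches(items: Sequence, size: int) -> Iterator[Sequence]:
    """Yield consecutive slices of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def extract_file_text(path: str) -> str:
    """Worker: extract the text of every page in one PDF."""
    import pdfplumber  # local import so each worker process loads it itself

    # The with-block closes the PDF as soon as extraction finishes.
    with pdfplumber.open(path) as pdf:
        return "\n\n".join(page.extract_text() or "" for page in pdf.pages)


def extract_many(paths: List[str], max_workers: int = 4) -> Dict[str, str]:
    """Extract text from several PDFs in parallel, one file per task."""
    with ProcessPoolExecutor(max_workers=max_workers) as pool:
        return dict(zip(paths, pool.map(extract_file_text, paths)))
```

On platforms that start worker processes with spawn (Windows, and macOS by default), call extract_many from under an `if __name__ == "__main__":` guard. Within a single large file, batches(pdf.pages, 50) would walk the page list 50 pages at a time.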