Confirm successful installation by checking the skill directory location:
.cursor/skills/document-processing
Restart Cursor to activate document-processing. Access via /document-processing in your agent's command palette.
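The directory check above can also be scripted. This is a minimal sketch, assuming the path is relative to the project root and that an installed skill directory is never empty; the function name skill_installed is hypothetical and not part of the skill.

```python
from pathlib import Path

# Path taken from the text above, relative to the project root.
SKILL_DIR = Path(".cursor/skills/document-processing")


def skill_installed(project_root="."):
    """Return True if the skill directory exists and contains at least one entry."""
    skill_path = Path(project_root) / SKILL_DIR
    return skill_path.is_dir() and any(skill_path.iterdir())
```

Calling skill_installed() from the project root returns False when the directory is missing or empty, which usually means the installation did not complete.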
Security Notice
We run automated surface-level scans (Gen AI Scanner, Socket, Snyk) during installation. These checks detect common vulnerabilities but do not guarantee complete security. Skills execute code in your environment, so always review the skill's source code, verify the publisher's reputation, and test in isolation before production use.
Source: This skill is adapted from Anthropic's document-processing skill
Document processing skills (pdf, docx, pptx, xlsx) for Claude Code and AI agents.
Create, edit, and analyze office documents including PDFs, Word documents, PowerPoint presentations, and Excel spreadsheets.
Quick Reference: Which Tool to Use
| Task | Document Type | Best Tool |
|---|---|---|
| Extract text | PDF | pdfplumber, pdftotext |
| Merge/split | PDF | pypdf, qpdf |
| Fill forms | PDF | pdf-lib (JS), pypdf |
| Create new | PDF | reportlab |
| OCR scanned | PDF | pytesseract + pdf2image |
| Extract text | DOCX | pandoc, markitdown |
| Create new | DOCX | docx-js (JS) |
| Edit existing | DOCX | OOXML (unpack/edit/pack) |
| Extract text | PPTX | markitdown |
| Create new | PPTX | html2pptx, PptxGenJS |
| Edit existing | PPTX | OOXML (unpack/edit/pack) |
| Data analysis | XLSX | pandas |
| Formulas/formatting | XLSX | openpyxl |
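The "OOXML (unpack/edit/pack)" entries rely on the fact that .docx, .pptx, and .xlsx files are ZIP archives of XML parts. The following is a minimal sketch of that workflow using only the standard library. The helper names unpack, replace_text, and pack are hypothetical, and the skill may ship its own scripts for these steps.

```python
import zipfile
from pathlib import Path


def unpack(office_file, dest_dir):
    """Extract every part of a .docx/.pptx/.xlsx archive into dest_dir."""
    with zipfile.ZipFile(office_file) as archive:
        archive.extractall(dest_dir)


def replace_text(xml_file, old, new):
    """Naive in-place string replacement inside one XML part."""
    path = Path(xml_file)
    content = path.read_bytes().decode("utf-8")
    path.write_bytes(content.replace(old, new).encode("utf-8"))


def pack(src_dir, office_file):
    """Zip the directory tree back into a single Office file."""
    src = Path(src_dir)
    with zipfile.ZipFile(office_file, "w", zipfile.ZIP_DEFLATED) as archive:
        for part in sorted(src.rglob("*")):
            if part.is_file():
                # Archive names must be relative paths with forward slashes.
                archive.write(part, part.relative_to(src).as_posix())
```

A typical call sequence would be unpack("report.docx", "unpacked"), then replace_text("unpacked/word/document.xml", "Draft", "Final"), then pack("unpacked", "report_edited.docx"), where the file names are placeholders. Plain string replacement only works when the target text sits in a single run and the new text needs no XML escaping, so only unpack files you trust and treat the edit step as a starting point.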
PDF Processing
Text Extraction
import pdfplumber

# Extract text with layout preservation
with pdfplumber.open("document.pdf") as pdf:
    for page in pdf.pages:
        text = page.extract_text()
        print(text)
Table Extraction
import pdfplumber
import pandas as pd

with pdfplumber.open("document.pdf") as pdf:
    all_tables = []
    for page in pdf.pages:
        tables = page.extract_tables()
        for table in tables:
            if table:
                df = pd.DataFrame(table[1:], columns=table[0])
                all_tables.append(df)

# Combine all tables
if all_tables:
    combined_df = pd.concat(all_tables, ignore_index=True)
    combined_df.to_excel("extracted_tables.xlsx", index=False)
Merge PDFs
from pypdf import PdfWriter, PdfReader

writer = PdfWriter()
for pdf_file in ["doc1.pdf", "doc2.pdf", "doc3.pdf"]:
    reader = PdfReader(pdf_file)
    for page in reader.pages:
        writer.add_page(page)

with open("merged.pdf", "wb") as output:
    writer.write(output)
Split PDF
from pypdf import PdfReader, PdfWriter

reader = PdfReader("input.pdf")
for i, page in enumerate(reader.pages):
    writer = PdfWriter()
    writer.add_page(page)
    with open(f"page_{i+1}.pdf", "wb") as output:
        writer.write(output)
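The split example writes one file per page. When fixed-size chunks are more useful, the same pypdf calls can be driven by a small range helper. This is a sketch under stated assumptions: chunk_ranges and split_pdf_in_chunks are hypothetical names, and the output naming scheme is illustrative only.

```python
def chunk_ranges(total_pages, chunk_size):
    """Return (start, stop) page-index pairs covering total_pages in order."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    return [
        (start, min(start + chunk_size, total_pages))
        for start in range(0, total_pages, chunk_size)
    ]


def split_pdf_in_chunks(input_path, chunk_size, prefix="part"):
    """Write consecutive chunk_size-page files named <prefix>_<n>.pdf."""
    # Imported here so chunk_ranges stays usable without pypdf installed.
    from pypdf import PdfReader, PdfWriter

    reader = PdfReader(input_path)
    outputs = []
    ranges = chunk_ranges(len(reader.pages), chunk_size)
    for n, (start, stop) in enumerate(ranges, start=1):
        writer = PdfWriter()
        for index in range(start, stop):
            writer.add_page(reader.pages[index])
        out_name = f"{prefix}_{n}.pdf"
        with open(out_name, "wb") as output:
            writer.write(output)
        outputs.append(out_name)
    return outputs
```

The stop index is exclusive, so the last chunk simply holds whatever pages remain.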
OCR Scanned PDFs
# Requires: pip install pytesseract pdf2image
import pytesseract
from pdf2image import convert_from_path

# Convert PDF to images
images = convert_from_path('scanned.pdf')

# OCR each page
text = ""
for i, image in enumerate(images):
    text += f"Page {i+1}:\n"
    text += pytesseract.image_to_string(image)
    text += "\n\n"

print(text)
Add Watermark
from pypdf import PdfReader, PdfWriter

watermark = PdfReader("watermark.pdf").pages[0]
reader = PdfReader("document.pdf")
writer = PdfWriter()

for page in reader.pages:
    page.merge_page(watermark)
    writer.add_page(page)

with open("watermarked.pdf", "wb") as output:
    writer.write(output)
Password Protection
from pypdf import PdfReader, PdfWriter

reader = PdfReader("input.pdf")
writer = PdfWriter()

for page in reader.pages:
    writer.add_page(page)

writer.encrypt("userpassword", "ownerpassword")

with open("encrypted.pdf", "wb") as output:
    writer.write(output)
Create PDF with ReportLab
from reportlab.lib.pagesizes import letter
from reportlab.platypus import SimpleDocTemplate, Paragraph, Spacer, PageBreak
from reportlab.lib.styles import getSampleStyleSheet

doc = SimpleDocTemplate("report.pdf", pagesize=letter)
styles = getSampleStyleSheet()
story = []

# Add content
title = Paragraph("Report Title", styles['Title'])
story.append(title)
story.append(Spacer(1, 12))
body = Paragraph("This is the body of the report. " * 20, styles['Normal'])
story.append(body)
story.append(PageBreak())

# Page 2
story.append(Paragraph("Page 2", styles['Heading1']))
story.append(Paragraph("Content for page 2", styles['Normal']))

# Build the PDF
doc.build(story)
Implementation Guide
Prerequisites
- Claude Desktop or a compatible AI client with skill support
- A clear understanding of the task or problem to solve
- Willingness to iterate and refine outputs
Time Estimate
15-45 minutes depending on use case complexity
Steps
1. Install the skill using the provided installation command
2. Test with a simple use case relevant to your work
3. Evaluate output quality and relevance
4. Iterate on prompts to improve results
5. Integrate into your regular workflow if valuable
Common Pitfalls
- Expecting perfect results without iteration
- Not providing enough context in prompts
- Using the skill for tasks outside its intended scope
- Accepting outputs without review and validation
Best Practices
Do
+ Start with clear, specific prompts
+ Provide relevant context and constraints
+ Review and refine all outputs before using
+ Iterate to improve output quality
+ Document successful prompt patterns
Don't
- Don't use without understanding the skill's limitations
- Don't skip validation of outputs
- Don't share sensitive information in prompts
- Don't expect the skill to replace human judgment
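Part of output validation can be automated before a human review. The sketch below runs two superficial structural checks using only the standard library: a PDF header and end-of-file marker, and an intact ZIP container with a content-types part for Office files. The function names are hypothetical, and passing these checks does not mean the document content is correct.

```python
import zipfile
from pathlib import Path


def looks_like_pdf(path):
    """Check for the %PDF- header and an %%EOF marker near the end of the file."""
    data = Path(path).read_bytes()
    return data.startswith(b"%PDF-") and b"%%EOF" in data[-1024:]


def looks_like_ooxml(path):
    """Check that a .docx/.pptx/.xlsx file is an intact ZIP with a content-types part."""
    if not zipfile.is_zipfile(path):
        return False
    with zipfile.ZipFile(path) as archive:
        # testzip() returns the first corrupt member name, or None if all CRCs pass.
        intact = archive.testzip() is None
        return intact and "[Content_Types].xml" in archive.namelist()
```

These checks catch truncated or mislabeled files early. Opening the result in the target application and reading it remains the real validation step.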
Pro Tips
- Be specific about desired format and style
- Ask for multiple options to choose from
- Request explanations to understand reasoning
- Combine AI efficiency with human expertise
When to Use This
Use when
Use when the skill's capabilities match your task, the time saved gives a clear return, and you can validate the outputs. Best for repetitive tasks, learning, and quality improvement.
Avoid when
Avoid when the task requires deep expertise you can't validate, involves sensitive decisions, or when the learning process is more valuable than speed of completion.
Learning Path
1. Familiarize yourself with the skill's capabilities and limitations
2. Start with low-risk, non-critical tasks
3. Progress to more complex and valuable use cases
4. Build expertise through regular use and experimentation