document-processing

eyadsibai/ltk · updated Apr 8, 2026

$ npx skills add https://github.com/eyadsibai/ltk --skill document-processing
summary

Process, extract, and manipulate PDF, Excel, Word, and PowerPoint documents programmatically.

  • Supports four major office formats (PDF, XLSX, DOCX, PPTX) with format-specific tools: pypdf and pdfplumber for PDFs, openpyxl and pandas for Excel, python-docx for Word, python-pptx for PowerPoint
  • Core operations include text and table extraction, document merging and splitting, format conversion, and OCR for scanned PDFs
  • Excel-specific guidance emphasizes writing formulas rather than stati
skill.md

Document Processing Guide

Work with office documents: PDF, Excel, Word, and PowerPoint.


Format Overview

| Format | Extension | Structure | Best For |
|---|---|---|---|
| PDF | .pdf | Binary/text | Reports, forms, archives |
| Excel | .xlsx | XML in ZIP | Data, calculations, models |
| Word | .docx | XML in ZIP | Text documents, contracts |
| PowerPoint | .pptx | XML in ZIP | Presentations, slides |

Key concept: XLSX, DOCX, and PPTX are all ZIP archives containing XML files. You can unzip them to access raw content.


PDF Processing

PDF Tools

| Task | Best Tool |
|---|---|
| Basic read/write | pypdf |
| Text extraction | pdfplumber |
| Table extraction | pdfplumber |
| Create PDFs | reportlab |
| OCR scanned PDFs | pytesseract + pdf2image |
| Command line | qpdf, pdftotext |

Common Operations

| Operation | Approach |
|---|---|
| Merge | Loop through files, add pages to writer |
| Split | Create new writer per page |
| Extract tables | Use pdfplumber, convert to DataFrame |
| Rotate | Call .rotate(degrees) on page |
| Encrypt | Use writer's .encrypt() method |
| OCR | Convert to images, run pytesseract |

Excel Processing

Excel Tools

| Task | Best Tool |
|---|---|
| Data analysis | pandas |
| Formulas & formatting | openpyxl |
| Simple CSV | pandas |
| Financial models | openpyxl |

Critical Rule: Use Formulas

| Approach | Result |
|---|---|
| Wrong: Calculate in Python, write value | Static number, breaks when data changes |
| Right: Write Excel formula | Dynamic, recalculates automatically |
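
As an illustration of the rule above, here is a small sketch using openpyxl that writes a `SUM` formula instead of a precomputed total. The function name `build_totals_sheet` and the sheet layout (amounts in column A) are assumptions made for the example.

```python
from openpyxl import Workbook


def build_totals_sheet(amounts):
    """Write amounts into column A and a live SUM formula beneath them."""
    if not amounts:
        raise ValueError("amounts must not be empty")
    workbook = Workbook()
    sheet = workbook.active
    sheet.title = "Totals"
    for row_index, amount in enumerate(amounts, start=1):
        sheet.cell(row=row_index, column=1, value=amount)
    total_row = len(amounts) + 1
    # Store the formula text, not sum(amounts): the spreadsheet application
    # evaluates it on open and recalculates when the inputs change.
    sheet.cell(row=total_row, column=1, value=f"=SUM(A1:A{len(amounts)})")
    return workbook
```

openpyxl stores the formula string but does not evaluate it, so the computed total only appears once the file has been opened and recalculated in a spreadsheet application.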

Financial Model Standards

| Convention | Meaning |
|---|---|
| Blue text | Hardcoded inputs |
| Black text | Formulas |
| Green text | Links to other sheets |
| Yellow fill | Needs attention |
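
One way to apply these conventions with openpyxl is sketched below. The exact hex colors and the rule for spotting cross-sheet links (a `!` in the formula text) are assumptions chosen for illustration, not part of the original guide, and `style_cell` is a hypothetical helper.

```python
from openpyxl import Workbook
from openpyxl.styles import Font, PatternFill

# Assumed color values for the four conventions.
INPUT_FONT = Font(color="0000FF")    # blue: hardcoded inputs
FORMULA_FONT = Font(color="000000")  # black: formulas
LINK_FONT = Font(color="008000")     # green: links to other sheets
ATTENTION_FILL = PatternFill(fill_type="solid", start_color="FFFF00")


def style_cell(cell, needs_attention=False):
    """Color a cell according to what kind of value it holds."""
    value = cell.value
    if isinstance(value, str) and value.startswith("="):
        # Crude heuristic: a '!' suggests a reference to another sheet.
        cell.font = LINK_FONT if "!" in value else FORMULA_FONT
    else:
        cell.font = INPUT_FONT
    if needs_attention:
        cell.fill = ATTENTION_FILL
```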

Common Formula Errors

| Error | Cause |
|---|---|
| #REF! | Invalid cell reference |
| #DIV/0! | Division by zero |
| #VALUE! | Wrong data type |
| #NAME? | Unknown function name |
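
A simple check for the errors listed above could scan cached cell values for these markers. The sketch below is library-agnostic and assumes the values have already been read into a plain dictionary keyed by cell reference; `find_formula_errors` is a hypothetical helper name.

```python
FORMULA_ERRORS = {
    "#REF!": "Invalid cell reference",
    "#DIV/0!": "Division by zero",
    "#VALUE!": "Wrong data type",
    "#NAME?": "Unknown function name",
}


def find_formula_errors(cells):
    """Return (reference, error, cause) for each cell holding a known error.

    `cells` maps a cell reference such as 'B2' to its cached value.
    """
    findings = []
    for reference, value in cells.items():
        if isinstance(value, str) and value in FORMULA_ERRORS:
            findings.append((reference, value, FORMULA_ERRORS[value]))
    return findings


if __name__ == "__main__":
    sample = {"A1": 5, "B2": "#DIV/0!", "C3": "ok", "D4": "#REF!"}
    for finding in find_formula_errors(sample):
        print(finding)
```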

Word Processing

Word Tools

| Task | Best Tool |
|---|---|
| Text extraction | pandoc |
| Create new | python-docx or docx-js |
| Simple edits | python-docx |
| Tracked changes | Direct XML editing |

Document Structure

| File | Contains |
|---|---|
| word/document.xml | Main content |
| word/comments.xml | Comments |
| word/media/ | Images |

Tracked Changes (Redlining)

| Element | XML Tag |
|---|---|
| Deletion | `<w:del><w:r><w:delText>...</w:delText></w:r></w:del>` |
| Insertion | `<w:ins><w:r><w:t>...</w:t></w:r></w:ins>` |

Key concept: For professional/legal documents, use tracked changes XML rather than replacing text directly.


PowerPoint Processing

PowerPoint Tools

| Task | Best Tool |
|---|---|
| Text extraction | markitdown |
| Create new | pptxgenjs (JS) or python-pptx |
| Edit existing | Direct XML or python-pptx |

Slide Structure

| Path | Contains |
|---|---|
| ppt/slides/slide{N}.xml | Slide content |
| ppt/notesSlides/ | Speaker notes |
| ppt/slideMasters/ | Master templates |
| ppt/media/ | Images |

Design Principles

| Principle | Guideline |
|---|---|
| Fonts | Use web-safe: Arial, Helvetica, Georgia |
| Layout | Two-column preferred, avoid vertical stacking |
| Hierarchy | Size, weight, color for emphasis |
| Consistency | Repeat patterns across slides |

Converting Between Formats

| Conversion | Tool |
|---|---|
| Any → PDF | LibreOffice headless |
| PDF → Images | pdftoppm |
| DOCX → Markdown | pandoc |
| Any → Text | Appropriate extractor |
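
The first three conversions above can be driven from Python by building argument lists for the command-line tools. This is a sketch, assuming the tools are installed and on `PATH`; in particular, the LibreOffice binary is assumed to be named `soffice` (some systems expose it as `libreoffice`), and the function names and the 150 DPI default are illustrative choices.

```python
import subprocess


def to_pdf_command(source, out_dir):
    """Any -> PDF via LibreOffice in headless mode."""
    return [
        "soffice", "--headless",
        "--convert-to", "pdf",
        "--outdir", str(out_dir),
        str(source),
    ]


def pdf_to_images_command(source, prefix, dpi=150):
    """PDF -> PNG images via pdftoppm; output files start with `prefix`."""
    return ["pdftoppm", "-png", "-r", str(dpi), str(source), str(prefix)]


def docx_to_markdown_command(source, target):
    """DOCX -> Markdown via pandoc."""
    return ["pandoc", str(source), "-f", "docx", "-t", "markdown",
            "-o", str(target)]


def run(command):
    """Execute a command list, raising CalledProcessError on failure."""
    subprocess.run(command, check=True)
```

Passing arguments as a list rather than a shell string avoids quoting problems with file names that contain spaces.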

Best Practices

| Practice | Why |
|---|---|
| Use formulas in Excel | Dynamic calculations |
| Preserve formatting on edit | Don't lose styles |
| Test output opens correctly | Catch corruption early |
| Use tracked changes for contracts | Audit trail |
| Extract to markdown for analysis | Easier to process |
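
For the "test output opens correctly" practice, a cheap first check on XLSX, DOCX, and PPTX output is to confirm that the file is a readable ZIP archive containing `[Content_Types].xml`, which Office Open XML packages require. The sketch below is only a basic sanity check and does not prove the file opens in Office; `check_office_package` is a hypothetical helper name.

```python
import io
import zipfile


def check_office_package(source):
    """Return a list of problems; an empty list means basic checks passed."""
    problems = []
    try:
        with zipfile.ZipFile(source) as package:
            corrupt = package.testzip()  # first member with a bad CRC, if any
            if corrupt is not None:
                problems.append(f"corrupt member: {corrupt}")
            if "[Content_Types].xml" not in package.namelist():
                problems.append("missing [Content_Types].xml")
    except zipfile.BadZipFile:
        problems.append("not a ZIP archive")
    return problems


if __name__ == "__main__":
    print(check_office_package(io.BytesIO(b"plain text, not a package")))
```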

Common Packages

| Language | Packages |
|---|---|
| Python | pypdf, pdfplumber, openpyxl, python-docx, python-pptx |
| JavaScript | docx, pptxgenjs |
| CLI | pandoc, qpdf, pdftotext, libreoffice |
How to use document-processing on Cursor

AI-first code editor with Composer

1. Prerequisites

Before installing skills in Cursor, ensure your development environment meets these requirements:

  • Cursor installed and configured on your development machine
  • Node.js version 16.0+ with npm package manager (verify with node --version)
  • Active project directory or workspace where you want to add document-processing

2. Execute installation command

Execute the skills CLI command in your project's root directory to begin installation:

$ npx skills add https://github.com/eyadsibai/ltk --skill document-processing

The skills CLI fetches document-processing from GitHub repository eyadsibai/ltk and configures it for Cursor.

3. Select Cursor when prompted

The CLI will show a list of available agents. Use arrow keys to navigate and space to select Cursor:

◆ Which agents do you want to install to?
│ ── Universal (.agents/skills) ── always included ────
│ • Amp
│ • Antigravity
│ • Cline
│ • Codex
│ ● Cursor (selected)
│ • Windsurf

4. Verify installation

Confirm successful installation by checking the skill directory location:

.cursor/skills/document-processing

Reload or restart Cursor to activate document-processing. Access the skill through slash commands (e.g., /document-processing) or your agent's skill management interface.

Security & Verification Notice

We perform automated surface-level scans (Gen AI Scanner, Socket, Snyk) during installation. These checks detect common vulnerabilities but do not guarantee complete security. Always review skill source code and verify the publisher's reputation before production use.

Skills execute code in your development environment. Always verify the publisher's identity, review recent commits, and test in isolated environments before production deployment.

Use Cases

Task Automation & Efficiency

Automate repetitive workflows and reduce manual effort

Example

Generate reports, summarize documents, draft communications

Save 3-5 hours per week on routine tasks

Knowledge Enhancement

Learn new skills, understand complex topics, get expert guidance

Example

Explain concepts, provide examples, suggest learning resources

Accelerate learning and skill development by 2x

Quality Improvement

Enhance output quality through reviews, suggestions, and refinements

Example

Review drafts, suggest improvements, catch errors

Improve work quality by 30-40% with less effort

Implementation Guide

Prerequisites

  • Claude Desktop or compatible AI client with skill support
  • Clear understanding of task or problem to solve
  • Willingness to iterate and refine outputs

Time Estimate

15-45 minutes depending on use case complexity

Installation Steps

  1. Install the skill using the provided installation command
  2. Test with a simple use case relevant to your work
  3. Evaluate output quality and relevance
  4. Iterate on prompts to improve results
  5. Integrate into your regular workflow if valuable

Common Pitfalls

  • Expecting perfect results without iteration
  • Not providing enough context in prompts
  • Using skill for tasks outside its intended scope
  • Accepting outputs without review and validation

Best Practices

✓ Do

  • Start with clear, specific prompts
  • Provide relevant context and constraints
  • Review and refine all outputs before using
  • Iterate to improve output quality
  • Document successful prompt patterns

✗ Don't

  • Don't use without understanding skill limitations
  • Don't skip validation of outputs
  • Don't share sensitive information in prompts
  • Don't expect skill to replace human judgment

💡 Pro Tips

  • Be specific about desired format and style
  • Ask for multiple options to choose from
  • Request explanations to understand reasoning
  • Combine AI efficiency with human expertise

When to Use This

✓ Use When

Use when the skill's capabilities match your task, the time saved gives a clear return, and you can validate the outputs. Best for repetitive tasks, learning, and quality improvement.

✗ Avoid When

Avoid when the task requires deep expertise you can't validate, involves sensitive decisions, or when the learning process is more valuable than speed of completion.

Learning Path

  1. Familiarize yourself with skill capabilities and limitations
  2. Start with low-risk, non-critical tasks
  3. Progress to more complex and valuable use cases
  4. Build expertise through regular use and experimentation

Ratings

4.6 · 68 reviews
  • Hana Mensah· Dec 28, 2024

    I recommend document-processing for anyone iterating fast on agent tooling; clear intent and a small, reviewable surface area.

  • Anika Haddad· Dec 20, 2024

    Solid pick for teams standardizing on skills: document-processing is focused, and the summary matches what you get after install.

  • Zara Chawla· Dec 16, 2024

    document-processing fits our agent workflows well — practical, well scoped, and easy to wire into existing repos.

  • Emma Jain· Dec 12, 2024

    Registry listing for document-processing matched our evaluation — installs cleanly and behaves as described in the markdown.

  • Dhruvi Jain· Dec 8, 2024

    Keeps context tight: document-processing is the kind of skill you can hand to a new teammate without a long onboarding doc.

  • Ren Ramirez· Dec 8, 2024

    I recommend document-processing for anyone iterating fast on agent tooling; clear intent and a small, reviewable surface area.

  • Omar Sethi· Dec 8, 2024

    Keeps context tight: document-processing is the kind of skill you can hand to a new teammate without a long onboarding doc.

  • Oshnikdeep· Nov 27, 2024

    document-processing has been reliable in day-to-day use. Documentation quality is above average for community skills.

  • Tariq Thomas· Nov 27, 2024

    document-processing has been reliable in day-to-day use. Documentation quality is above average for community skills.

  • Chinedu Sanchez· Nov 11, 2024

    document-processing is among the better-maintained entries we tried; worth keeping pinned for repeat workflows.
