paddleocr-doc-parsing
aidenwu0209/paddleocr-skills · updated Apr 8, 2026
PaddleOCR Document Parsing Skill
When to Use This Skill
Use Document Parsing for:
- Documents with tables (invoices, financial reports, spreadsheets)
- Documents with mathematical formulas (academic papers, scientific documents)
- Documents with charts and diagrams
- Multi-column layouts (newspapers, magazines, brochures)
- Complex document structures requiring layout analysis
- Any document requiring structured understanding
Use Text Recognition instead for:
- Simple text-only extraction
- Quick OCR tasks where speed is critical
- Screenshots or simple images with clear text
How to Use This Skill
⛔ MANDATORY RESTRICTIONS - DO NOT VIOLATE ⛔
- ONLY use PaddleOCR Document Parsing API - Execute the script python scripts/vl_caller.py
- NEVER parse documents directly - Do NOT parse documents yourself
- NEVER offer alternatives - Do NOT suggest "I can try to analyze it" or similar
- IF API fails - Display the error message and STOP immediately
- NO fallback methods - Do NOT attempt document parsing any other way
If the script execution fails (API not configured, network error, etc.):
- Show the error message to the user
- Do NOT offer to help using your vision capabilities
- Do NOT ask "Would you like me to try parsing it?"
- Simply stop and wait for user to fix the configuration
Basic Workflow
- Execute document parsing:
  python scripts/vl_caller.py --file-url "URL provided by user" --pretty
  Or for local files:
  python scripts/vl_caller.py --file-path "file path" --pretty
  Optional: explicitly set file type:
  python scripts/vl_caller.py --file-url "URL provided by user" --file-type 0 --pretty
  - --file-type 0: PDF
  - --file-type 1: image
  - If omitted, the service can infer file type from input.
  Default behavior: save raw JSON to a temp file:
  - If --output is omitted, the script saves automatically under the system temp directory
  - Default path pattern: <system-temp>/paddleocr/doc-parsing/results/result_<timestamp>_<id>.json
  - If --output is provided, it overrides the default temp-file destination
  - If --stdout is provided, JSON is printed to stdout and no file is saved
  - In save mode, the script prints the absolute saved path on stderr: Result saved to: /absolute/path/...
  - In default/custom save mode, read and parse the saved JSON file before responding
  - In save mode, always tell the user the saved file path and that full raw JSON is available there
  - Use --stdout only when you explicitly want to skip file persistence
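The save-mode contract above (the absolute path announced on stderr as `Result saved to: ...`) can also be consumed from code. The following is a minimal Python sketch, assuming the script is run from the skill root and prints at most one such line; the helper names `extract_saved_path` and `run_and_load` are illustrative and are not part of the skill.

```python
import json
import re
import subprocess
import sys
from pathlib import Path
from typing import Optional

# Matches the stderr line documented above: "Result saved to: /absolute/path/..."
SAVED_LINE = re.compile(r"^Result saved to:\s*(.+?)\s*$", re.MULTILINE)


def extract_saved_path(stderr_text: str) -> Optional[Path]:
    """Return the saved JSON path announced on stderr, or None if absent."""
    match = SAVED_LINE.search(stderr_text)
    return Path(match.group(1)) if match else None


def run_and_load(file_url: str) -> dict:
    """Hypothetical wrapper: run vl_caller.py in default save mode, then load the saved JSON."""
    proc = subprocess.run(
        [sys.executable, "scripts/vl_caller.py", "--file-url", file_url, "--pretty"],
        capture_output=True,
        text=True,
    )
    saved = extract_saved_path(proc.stderr)
    if saved is None:
        # In line with the restrictions above: surface the error and stop, no fallback.
        raise RuntimeError(proc.stderr.strip() or "vl_caller.py did not report a saved path")
    return json.loads(saved.read_text(encoding="utf-8"))
```

Keeping the path extraction separate from the subprocess call makes the stderr parsing easy to check without network access.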
- The output JSON contains COMPLETE content with all document data:
  - Headers, footers, page numbers
  - Main text content
  - Tables with structure
  - Formulas (with LaTeX)
  - Figures and charts
  - Footnotes and references
  - Seals and stamps
  - Layout and reading order
  Input type note:
  - Supported file types depend on the model and endpoint configuration.
  - Always follow the file type constraints documented by your endpoint API.
- Extract what the user needs from the output JSON using these fields:
  - Top-level text
  - result[n].markdown
  - result[n].prunedResult
IMPORTANT: Complete Content Display
CRITICAL: You must display the COMPLETE extracted content to the user based on their needs.
- The output JSON contains ALL document content in a structured format
- In save mode, the raw provider result can be inspected in the saved JSON file
- Display the full content requested by the user, do NOT truncate or summarize
- If user asks for "all text", show the entire text field
- If user asks for "tables", show ALL tables in the document
- If user asks for "main content", filter out headers/footers but show ALL body text
What this means:
- DO: Display complete text, all tables, all formulas as requested
- DO: Present content using these fields: top-level text, result[n].markdown, and result[n].prunedResult
- DON'T: Truncate with "..." unless content is excessively long (>10,000 chars)
- DON'T: Summarize or provide excerpts when user asks for full content
- DON'T: Say "Here's a preview" when user expects complete output
Example - Correct:
User: "Extract all the text from this document"
Agent: I've parsed the complete document. Here's all the extracted text:
[Display entire text field or concatenated regions in reading order]
Document Statistics:
- Total regions: 25
- Text blocks: 15
- Tables: 3
- Formulas: 2
Quality: Excellent (confidence: 0.92)
Example - Incorrect:
User: "Extract all the text"
Agent: "I found a document with multiple sections. Here's the beginning:
'Introduction...' (content truncated for brevity)"
Understanding the JSON Response
The output JSON uses an envelope wrapping the raw API result:
{
  "ok": true,
  "text": "Full markdown/HTML text extracted from all pages",
  "result": { ... }, // raw provider response
  "error": null
}
Key fields:
- text: extracted markdown text from all pages (use this for quick text display)
- result: raw provider response object
- result[n].prunedResult: structured parsing output for each page (layout/content/confidence and related metadata)
- result[n].markdown: full rendered page output in markdown/HTML
Raw result location (default): the temp-file path printed by the script on stderr
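As a rough illustration of working with the envelope, the sketch below checks `ok` before reading `text` and raises with the reported `error` otherwise. It relies only on the four top-level fields shown above. `ParsingFailed` and `full_text` are hypothetical names, and the shape of `error` on failure is an assumption, so it is simply stringified. Page-level access under `result` is left out on purpose, since its exact layout is defined in references/output_schema.md rather than here.

```python
from typing import Any, Dict


class ParsingFailed(Exception):
    """Raised when the envelope reports ok=false (hypothetical helper type)."""


def full_text(envelope: Dict[str, Any]) -> str:
    """Return the top-level `text` field, or raise with the reported error."""
    if not envelope.get("ok"):
        # Show the error as-is and stop; do not attempt any fallback parsing.
        raise ParsingFailed(str(envelope.get("error")))
    return envelope.get("text") or ""
```

A caller would typically load the saved JSON file with `json.load` and pass the resulting dict to `full_text`.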
Usage Examples
Example 1: Extract Full Document Text
python scripts/vl_caller.py \
--file-url "https://example.com/paper.pdf" \
--pretty
Then use:
- Top-level text for quick full-text output
- result[n].markdown when page-level output is needed
Example 2: Extract Structured Page Data
python scripts/vl_caller.py \
--file-path "./financial_report.pdf" \
--pretty
Then use:
- result[n].prunedResult for structured parsing data (layout/content/confidence)
- result[n].markdown for rendered page content
Example 3: Print JSON Without Saving
python scripts/vl_caller.py \
--file-url "URL" \
--stdout \
--pretty
Then return:
- Full text when user asks for full document content
- result[n].prunedResult and result[n].markdown when user needs complete structured page data
First-Time Configuration
You can generally assume that the required environment variables have already been configured. Only when a parsing task fails should you analyze the error message to determine whether it is caused by a configuration issue. If it is indeed a configuration problem, you should notify the user to fix it.
When API is not configured:
The error will show:
CONFIG_ERROR: PADDLEOCR_DOC_PARSING_API_URL not configured. Get your API at: https://paddleocr.com
Configuration workflow:
- Show the exact error message to the user (including the URL).
- Guide the user to configure securely:
  - Recommend configuring through the host application's standard method (e.g., settings file, environment variable UI) rather than pasting credentials in chat.
  - List the required environment variables:
    - PADDLEOCR_DOC_PARSING_API_URL
    - PADDLEOCR_ACCESS_TOKEN
    - Optional: PADDLEOCR_DOC_PARSING_TIMEOUT
- If the user provides credentials in chat anyway (accept any reasonable format), for example:
  - PADDLEOCR_DOC_PARSING_API_URL=https://xxx.paddleocr.com/layout-parsing, PADDLEOCR_ACCESS_TOKEN=abc123...
  - Here's my API: https://xxx and token: abc123
  - Copy-pasted code format
  - Any other reasonable format
  - Security note: Warn the user that credentials shared in chat may be stored in conversation history. Recommend setting them through the host application's configuration instead when possible.
  Then parse and validate the values:
  - Extract PADDLEOCR_DOC_PARSING_API_URL (look for URLs with paddleocr.com or similar)
  - Confirm PADDLEOCR_DOC_PARSING_API_URL is a full endpoint ending with /layout-parsing
  - Extract PADDLEOCR_ACCESS_TOKEN (long alphanumeric string, usually 40+ chars)
- Ask the user to confirm the environment is configured.
- Retry only after confirmation:
  - Once the user confirms the environment variables are available, retry the original parsing task
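The validation step in the workflow above could look roughly like the following Python sketch. Both function names are hypothetical. The token check is a heuristic only, since the text says tokens are "usually" 40+ alphanumeric characters, so a failed check is better treated as a prompt to double-check than as a hard rejection.

```python
import re
from urllib.parse import urlparse


def check_api_url(url: str) -> bool:
    """True if `url` is an http(s) URL whose path ends with /layout-parsing."""
    parsed = urlparse(url.strip())
    return (
        parsed.scheme in ("http", "https")
        and bool(parsed.netloc)
        and parsed.path.endswith("/layout-parsing")
    )


def token_looks_plausible(token: str, min_len: int = 40) -> bool:
    """Heuristic: alphanumeric only and at least `min_len` characters long."""
    token = token.strip()
    return len(token) >= min_len and re.fullmatch(r"[A-Za-z0-9]+", token) is not None
```

Neither helper logs or echoes the values it inspects, which keeps credentials out of any additional output.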
Handling Large Files
There is no file size limit for the API. For PDFs, the maximum is 100 pages per request.
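For PDFs longer than the 100-page limit, one option is to process the document in consecutive page ranges. The sketch below only computes the range strings; `page_batches` is a hypothetical helper, and the assumption is that each "start-end" string is then passed to the `--pages` option of scripts/split_pdf.py described later in this section.

```python
from typing import List


def page_batches(total_pages: int, max_per_request: int = 100) -> List[str]:
    """Split pages 1..total_pages into "start-end" ranges of at most max_per_request pages."""
    if total_pages < 1:
        return []
    batches = []
    for start in range(1, total_pages + 1, max_per_request):
        end = min(start + max_per_request - 1, total_pages)
        batches.append(f"{start}-{end}")
    return batches


print(page_batches(250))  # ['1-100', '101-200', '201-250']
```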
Tips for large files:
Use URL for Large Local Files (Recommended)
For very large local files, prefer --file-url over --file-path to avoid base64 encoding overhead:
python scripts/vl_caller.py --file-url "https://your-server.com/large_file.pdf"
Process Specific Pages (PDF Only)
If you only need certain pages from a large PDF, extract them first:
# Extract pages 1-5
python scripts/split_pdf.py large.pdf pages_1_5.pdf --pages "1-5"
# Mixed ranges are supported
python scripts/split_pdf.py large.pdf selected_pages.pdf --pages "1-5,8,10-12"
# Then process the smaller file
python scripts/vl_caller.py --file-path "pages_1_5.pdf"
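To make the `--pages` syntax concrete, here is a small Python sketch that expands a spec such as "1-5,8,10-12" into explicit page numbers. It illustrates the notation only and is not the implementation used by split_pdf.py; `parse_page_spec` is a hypothetical name, and treating pages as 1-based is an assumption drawn from the examples above.

```python
from typing import List


def parse_page_spec(spec: str) -> List[int]:
    """Expand a spec like "1-5,8,10-12" into a sorted list of unique 1-based pages."""
    pages = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas
        if "-" in part:
            start_s, end_s = part.split("-", 1)
            start, end = int(start_s), int(end_s)
            if start < 1 or end < start:
                raise ValueError(f"invalid range: {part!r}")
            pages.update(range(start, end + 1))
        else:
            page = int(part)
            if page < 1:
                raise ValueError(f"invalid page: {part!r}")
            pages.add(page)
    return sorted(pages)


print(parse_page_spec("1-5,8,10-12"))  # [1, 2, 3, 4, 5, 8, 10, 11, 12]
```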
Error Handling
Authentication failed (403):
error: Authentication failed
→ Token is invalid, reconfigure with correct credentials
API quota exceeded (429):
error: API quota exceeded
→ Daily API quota exhausted, inform user to wait or upgrade
Unsupported format:
error: Unsupported file format
→ File format not supported, convert to PDF/PNG/JPG
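The three cases above can be summarized as a small lookup. This is a sketch only: how the script exposes the HTTP status code is not specified here, so passing `status` as an optional integer is an assumption, and `guidance_for` is a hypothetical helper name.

```python
from typing import Optional

# Guidance taken from the cases listed above; keys are HTTP status codes.
STATUS_GUIDANCE = {
    403: "Token is invalid; reconfigure with correct credentials.",
    429: "Daily API quota exhausted; wait or upgrade.",
}


def guidance_for(status: Optional[int], message: str) -> str:
    """Map a reported error to the guidance above, falling back to the raw message."""
    if status in STATUS_GUIDANCE:
        return STATUS_GUIDANCE[status]
    if "Unsupported file format" in message:
        return "File format not supported; convert to PDF/PNG/JPG."
    # Unknown errors are shown verbatim, as required by the restrictions above.
    return message
```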
Important Notes
- The script NEVER filters content - It always returns complete data
- The AI agent decides what to present - Based on user's specific request
- All data is always available - Can be re-interpreted for different needs
- No information is lost - Complete document structure preserved
Reference Documentation
- references/output_schema.md - Output format specification
Note: Model version and capabilities are determined by your API endpoint (PADDLEOCR_DOC_PARSING_API_URL).
Load these reference documents into context when:
- Debugging complex parsing issues
- Need to understand output format
- Working with provider API details
Testing the Skill
To verify the skill is working properly:
python scripts/smoke_test.py
This tests configuration and optionally API connectivity.
How to use paddleocr-doc-parsing on Cursor
Prerequisites
Before installing skills in Cursor, ensure your development environment meets these requirements:
- Cursor installed and configured on your development machine
- Node.js version 16.0+ with npm package manager (verify with node --version)
- Active project directory or workspace where you want to add paddleocr-doc-parsing
Execute installation command
Execute the skills CLI command in your project's root directory to begin installation:
The skills CLI fetches paddleocr-doc-parsing from GitHub repository aidenwu0209/paddleocr-skills and configures it for Cursor.
Select Cursor when prompted
The CLI will show a list of available agents. Use arrow keys to navigate and space to select Cursor:
Verify installation
Confirm successful installation by checking the skill directory location:
Reload or restart Cursor to activate paddleocr-doc-parsing. Access the skill through slash commands (e.g., /paddleocr-doc-parsing) or your agent's skill management interface.
Security & Verification Notice
We perform automated surface-level scans (Gen AI Scanner, Socket, Snyk) during installation. These checks detect common vulnerabilities but do not guarantee complete security. Always review skill source code and verify the publisher's reputation before production use.
Skills execute code in your development environment. Always verify the publisher's identity, review recent commits, and test in isolated environments before production deployment.