baoyu-url-to-markdown▌
jimliu/baoyu-skills · updated Apr 8, 2026
MDX-style export adds YAML metadata + attribution linking explainx.ai and this canonical listing URL.
Fetch any URL and convert to clean markdown using Chrome CDP with intelligent fallback conversion.
- ›Supports two capture modes: auto-capture on page load or wait-for-user-signal for login-required and lazy-loading pages
- ›Saves rendered HTML snapshot alongside markdown output with YAML front matter including metadata, URL, title, and capture timestamp
- ›Upgraded Defuddle-based conversion pipeline with automatic fallback to legacy HTML-to-Markdown extractor; falls back to hosted defuddle.m
URL to Markdown
Fetches any URL via baoyu-fetch CLI (Chrome CDP + site-specific adapters) and converts it to clean markdown.
CLI Setup
Important: The CLI source is vendored in the scripts/vendor/baoyu-fetch/ subdirectory of this skill.
Agent Execution Instructions:
- Determine this SKILL.md file's directory path as
{baseDir} - CLI entry point =
{baseDir}/scripts/vendor/baoyu-fetch/src/cli.ts - Resolve
${BUN_X}runtime: ifbuninstalled →bun; ifnpxavailable →npx -y bun; else suggest installing bun ${READER}=${BUN_X} {baseDir}/scripts/vendor/baoyu-fetch/src/cli.ts- Replace all
${READER}in this document with the resolved value
Preferences (EXTEND.md)
Check EXTEND.md existence (priority order):
# macOS, Linux, WSL, Git Bash
test -f .baoyu-skills/baoyu-url-to-markdown/EXTEND.md && echo "project"
test -f "${XDG_CONFIG_HOME:-$HOME/.config}/baoyu-skills/baoyu-url-to-markdown/EXTEND.md" && echo "xdg"
test -f "$HOME/.baoyu-skills/baoyu-url-to-markdown/EXTEND.md" && echo "user"
# PowerShell (Windows)
if (Test-Path .baoyu-skills/baoyu-url-to-markdown/EXTEND.md) { "project" }
$xdg = if ($env:XDG_CONFIG_HOME) { $env:XDG_CONFIG_HOME } else { "$HOME/.config" }
if (Test-Path "$xdg/baoyu-skills/baoyu-url-to-markdown/EXTEND.md") { "xdg" }
if (Test-Path "$HOME/.baoyu-skills/baoyu-url-to-markdown/EXTEND.md") { "user" }
| Path | Location |
|---|---|
.baoyu-skills/baoyu-url-to-markdown/EXTEND.md |
Project directory |
$HOME/.baoyu-skills/baoyu-url-to-markdown/EXTEND.md |
User home |
| Result | Action |
|---|---|
| Found | Read, parse, apply settings |
| Not found | MUST run first-time setup (see below) — do NOT silently create defaults |
EXTEND.md Supports: Download media by default | Default output directory
First-Time Setup (BLOCKING)
CRITICAL: When EXTEND.md is not found, you MUST use AskUserQuestion to ask the user for their preferences before creating EXTEND.md. NEVER create EXTEND.md with defaults without asking. This is a BLOCKING operation — do NOT proceed with any conversion until setup is complete.
Use AskUserQuestion with ALL questions in ONE call:
Question 1 — header: "Media", question: "How to handle images and videos in pages?"
- "Ask each time (Recommended)" — After saving markdown, ask whether to download media
- "Always download" — Always download media to local imgs/ and videos/ directories
- "Never download" — Keep original remote URLs in markdown
Question 2 — header: "Output", question: "Default output directory?"
- "url-to-markdown (Recommended)" — Save to ./url-to-markdown/{domain}/{slug}.md
- (User may choose "Other" to type a custom path)
Question 3 — header: "Save", question: "Where to save preferences?"
- "User (Recommended)" — ~/.baoyu-skills/ (all projects)
- "Project" — .baoyu-skills/ (this project only)
After user answers, create EXTEND.md at the chosen location, confirm "Preferences saved to [path]", then continue.
Full reference: references/config/first-time-setup.md
Supported Keys
| Key | Default | Values | Description |
|---|---|---|---|
download_media |
ask |
ask / 1 / 0 |
ask = prompt each time, 1 = always download, 0 = never |
default_output_dir |
empty | path or empty | Default output directory (empty = ./url-to-markdown/) |
EXTEND.md → CLI mapping:
| EXTEND.md key | CLI argument | Notes |
|---|---|---|
download_media: 1 |
--download-media |
Requires --output to be set |
default_output_dir: ./posts/ |
Agent constructs --output ./posts/{domain}/{slug}.md |
Agent generates path, not a direct CLI flag |
Value priority:
- CLI arguments (
--download-media,--output) - EXTEND.md
- Skill defaults
Features
- Chrome CDP for full JavaScript rendering via
baoyu-fetchCLI - Site-specific adapters: X/Twitter, YouTube, Hacker News, generic (Defuddle)
- Automatic adapter selection based on URL, or force with
--adapter - Interaction gate detection: Cloudflare, reCAPTCHA, hCAPTCHA, custom challenges
- Two capture modes: headless (default) or interactive with wait-for-interaction
- Clean markdown output with YAML front matter
- Structured JSON output available via
--format json - X/Twitter: extracts tweets, threads, and X Articles with media
- YouTube: transcript/caption extraction, chapters, cover images
- Hacker News: threaded comment parsing with proper nesting
- Generic: Defuddle extraction with Readability fallback
- Download images and videos to local directories
- Chrome profile persistence for authenticated sessions
- Debug artifact output for troubleshooting
Usage
# Default: headless capture, markdown to stdout
${READER} <url>
# Save to file
${READER} <url> --output article.md
# Save with media download
${READER} <url> --output article.md --download-media
# Headless mode (explicit)
${READER} <url> --headless --output article.md
# Wait for interaction (login/CAPTCHA) — auto-detect and continue
${READER} <url> --wait-for interaction --output article.md
# Wait for interaction — manual control (Enter to continue)
${READER} <url> --wait-for force --output article.md
# JSON output
${READER} <url> --format json --output article.json
# Force specific adapter
${READER} <url> --adapter youtube --output transcript.md
# Connect to existing Chrome
${READER} <url> --cdp-url http://localhost:9222 --output article.md
# Debug artifacts
${READER} <url> --output article.md --debug-dir ./debug/
Options
| Option | Description |
|---|---|
<url> |
URL to fetch |
--output <path> |
Output file path (default: stdout) |
--format <type> |
Output format: markdown (default) or json |
--json |
Shorthand for --format json |
--adapter <name> |
Force adapter: x, youtube, hn, or generic (default: auto-detect) |
--headless |
Force headless Chrome (no visible window) |
--wait-for <mode> |
Interaction wait mode: none (default), interaction, or force |
--wait-for-interaction |
Alias for --wait-for interaction |
--wait-for-login |
Alias for --wait-for interaction |
--timeout <ms> |
Page load timeout (default: 30000) |
--interaction-timeout <ms> |
Login/CAPTCHA wait timeout (default: 600000 = 10 min) |
--interaction-poll-interval <ms> |
Poll interval for interaction checks (default: 1500) |
--download-media |
Download images/videos to local imgs/ and videos/, rewrite markdown links. Requires --output |
--media-dir <dir> |
Base directory for downloaded media (default: same as --output directory) |
--cdp-url <url> |
Reuse existing Chrome DevTools Protocol endpoint |
--browser-path <path> |
Custom Chrome/Chromium binary path |
--chrome-profile-dir <path> |
Chrome user data directory (default: BAOYU_CHROME_PROFILE_DIR env or ./baoyu-skills/chrome-profile) |
--debug-dir <dir> |
Write debug artifacts (document.json, markdown.md, page.html, network.json) |
Capture Modes
| Mode | Behavior | Use When |
|---|---|---|
| Default | Headless Chrome, auto-extract on network idle | Public pages, static content |
--headless |
Explicit headless (same as default) | Clarify intent |
--wait-for interaction |
Opens visible Chrome, auto-detects login/CAPTCHA gates, waits for them to clear, then continues | Login-required, CAPTCHA-protected |
--wait-for force |
Opens visible Chrome, auto-detects OR accepts Enter keypress to continue | Complex flows, lazy loading, paywalls |
Interaction gate auto-detection:
- Cloudflare Turnstile / "just a moment" pages
- Google reCAPTCHA
- hCaptcha
- Custom challenge / verification screens
Wait-for-interaction workflow:
- Run with
--wait-for interaction→ Chrome opens visibly - CLI auto-detects login/CAPTCHA gates
- User completes login or solves CAPTCHA in the browser
- CLI auto-detects gate cleared → captures page
- If
--wait-for forceis used, user can also press Enter to trigger capture manually
Agent Quality Gate
CRITICAL: The agent must treat default headless capture as provisional. Some sites render differently in headless mode and can silently return low-quality content without causing the CLI to fail.
After every headless run, the agent MUST inspect the saved markdown output.
Quality checks the agent must perform
- Confirm the markdown title matches the target page, not a generic site shell
- Confirm the body contains the expected article or page content, not just navigation, footer, or a generic error
- Watch for obvious failure signs:
Application errorThis page could not be found- Login, signup, subscribe, or verification shells
- Extremely short markdown for a page that should be long-form
- Raw framework payloads or mostly boilerplate content
- If the result is low quality, incomplete, or clearly wrong, do not accept the run as successful just because the CLI exited with code 0
Tip: Use --format json to get structured output including status, login.state, and interaction fields for programmatic quality assessment. A "status": "needs_interaction" response means the page requires manual interaction.
Recovery workflow the agent must follow
- First run headless (default) unless there is already a clear reason to use interaction mode
- Review markdown quality immediately after the run
- If the content is low quality or indicates login/CAPTCHA:
--wait-for interactionfor auto-detected gates (login, CAPTCHA, Cloudflare)--wait-for forcewhen the page needs manual browsing, scroll loading, or complex interaction
- If
--wait-foris used, tell the user exactly what to do:- If login is required, ask them to sign in in the browser
- If CAPTCHA appears, ask them to solve it
- If the page needs time to load, ask them to wait until content is visible
- For
--wait-for force: tell them to press Enter when ready
- If JSON output shows
"status": "needs_interaction", switch to--wait-for interactionautomatically
Output Path Generation
The agent must construct the output file path since baoyu-fetch does not auto-generate paths.
Algorithm:
- Determine base directory from EXTEND.md
default_output_diror default./url-to-markdown/ - Extract domain from URL (e.g.,
example.com) - Generate slug from URL path or page title (kebab-case, 2-6 words)
- Construct:
{base_dir}/{domain}/{slug}/{slug}.md— each URL gets its own directory so media files stay isolated - Conflict resolution: append timestamp
{slug}-YYYYMMDD-HHMMSS/{slug}-YYYYMMDD-HHMMSS.md
Pass the constructed path to --output. Media files (--download-media) are saved into subdirectories next to the markdown file, keeping each URL's assets self-contained.
Output Format
Markdown output to stdout (or file with --output) as clean markdown text.
JSON output (--format json) returns structured data including:
adapter— which adapter handled the URLstatus—"ok"or"needs_interaction"login— login state detection (logged_in,logged_out,unknown)interaction— interaction gate details (kind, provider, prompt)document— structured content (url, title, author, publishedAt, content blocks, metadata)media— collected media assets with url, kind, rolemarkdown— converted markdown textdownloads— media download results (when--download-mediaused)
When --download-media is enabled:
- Images are saved to
imgs/next to the output file (or in--media-dir) - Videos are saved to
videos/next to the output file (or in--media-dir) - Markdown media links are rewritten to local relative paths
Built-in Adapters
| Adapter | URLs | Key Features |
|---|---|---|
x |
x.com, twitter.com | Tweets, threads, X Articles, media, login detection |
youtube |
youtube.com, youtu.be | Transcript/captions, chapters, cover image, metadata |
hn |
news.ycombinator.com | Threaded comments, story metadata, nested replies |
generic |
Any URL (fallback) | Defuddle extraction, Readability fallback, auto-scroll, network idle detection |
Adapter is auto-selected based on URL. Use --adapter <name> to override.
Media Download Workflow
Based on download_media setting in EXTEND.md:
| Setting | Behavior |
|---|---|
1 (always) |
Run CLI with --download-media --output <path> |
0 (never) |
Run CLI with --output <path> (no media download) |
ask (default) |
Follow the ask-each-time flow below |
Ask-Each-Time Flow
- Run CLI without
--download-mediawith--output <path>→ markdown saved - Check saved markdown for remote media URLs (
https://in image/video links) - If no remote media found → done, no prompt needed
- If remote media found → use
AskUserQuestion:- header: "Media", question: "Download N images/videos to local files?"
- "Yes" — Download to local directories
- "No" — Keep remote URLs
- If user confirms → run CLI again with
--download-media --output <same-path>(overwrites markdown with localized links)
Environment Variables
| Variable | Description |
|---|---|
BAOYU_CHROME_PROFILE_DIR |
Chrome user data directory (can also use --chrome-profile-dir) |
Troubleshooting: Chrome not found → use --browser-path. Timeout → increase --timeout. Login/CAPTCHA pages → use --wait-for interaction. Debug → use --debug-dir to inspect captured HTML and network logs.
How to use baoyu-url-to-markdown on Cursor
AI-first code editor with Composer
Prerequisites
Before installing skills in Cursor, ensure your development environment meets these requirements:
- ›Cursor installed and configured on your development machine
- ›Node.js version 16.0+ with npm package manager (verify with
node --version) - ›Active project directory or workspace where you want to add baoyu-url-to-markdown
Execute installation command
Execute the skills CLI command in your project's root directory to begin installation:
The skills CLI fetches baoyu-url-to-markdown from GitHub repository jimliu/baoyu-skills and configures it for Cursor.
Select Cursor when prompted
The CLI will show a list of available agents. Use arrow keys to navigate and space to select Cursor:
Verify installation
Confirm successful installation by checking the skill directory location:
Reload or restart Cursor to activate baoyu-url-to-markdown. Access the skill through slash commands (e.g., /baoyu-url-to-markdown) or your agent's skill management interface.
Security & Verification Notice
We perform automated surface-level scans (Gen AI Scanner, Socket, Snyk) during installation. These checks detect common vulnerabilities but do not guarantee complete security. Always review skill source code and verify the publisher's reputation before production use.
Skills execute code in your development environment. Always verify the publisher's identity, review recent commits, and test in isolated environments before production deployment.
List & Monetize Your Skill
Submit your Claude Code skill and start earning
Use Cases▌
User Story & Requirements Generation
Create detailed user stories, acceptance criteria, and feature specs
Example
Generate user stories for 'password reset feature' with acceptance criteria, edge cases, and test scenarios
Reduce spec writing time by 50%, ensure comprehensive coverage
Competitive Analysis
Research competitors, compare features, identify gaps
Example
Analyze 5 competitor products, create feature comparison matrix, suggest differentiation opportunities
Complete competitive research in 2 hours instead of 2 days
Roadmap Prioritization
Evaluate features using frameworks (RICE, ICE, Kano) and create prioritized backlogs
Example
Score 20 feature ideas using RICE framework, generate prioritized roadmap with rationale
Make data-driven prioritization decisions faster
Stakeholder Communication
Draft PRDs, status updates, and stakeholder presentations
Example
Create executive summary of Q3 roadmap, monthly progress report, feature launch announcement
Save 3-5 hours/week on communication overhead
Implementation Guide▌
Prerequisites
- ›Claude Desktop or compatible AI client
- ›Access to product documentation and roadmap tools (Jira, Notion, etc.)
- ›Understanding of product management frameworks (RICE, Jobs-to-be-Done, etc.)
- ›Stakeholder contact information and communication channels
Time Estimate
30-60 minutes to see productivity improvements
Installation Steps
- 1.Install product management skill
- 2.Start with user story generation for known feature
- 3.Progress to competitive analysis: research 2-3 competitors
- 4.Use for roadmap prioritization: apply RICE/ICE scoring
- 5.Draft stakeholder communications and refine based on feedback
- 6.Build template library for recurring PM tasks
- 7.Share effective prompts with product team
Common Pitfalls
- ⚠Not validating competitive research—verify facts before sharing
- ⚠Accepting user stories without involving engineering team
- ⚠Over-relying on frameworks without qualitative judgment
- ⚠Not customizing outputs to company culture and communication style
- ⚠Skipping stakeholder validation of generated requirements
Best Practices▌
✓ Do
- +Validate research and competitive analysis with real data
- +Collaborate with engineering when generating technical requirements
- +Customize frameworks and templates to your company context
- +Use skill for first drafts, refine with stakeholder input
- +Document successful prompt patterns for PM tasks
- +Combine AI efficiency with human judgment and intuition
✗ Don't
- −Don't publish competitive analysis without fact-checking
- −Don't finalize user stories without engineering review
- −Don't make prioritization decisions solely on AI scoring
- −Don't skip customer validation of generated requirements
- −Don't ignore company-specific context and culture
💡 Pro Tips
- ★Provide context: company goals, constraints, customer feedback
- ★Ask for alternatives: 'Show 3 ways to prioritize this roadmap'
- ★Request stakeholder-specific formatting: 'Executive summary vs. engineering spec'
- ★Use skill for 70% generation + 30% customization to company needs
When to Use This▌
✓ Use When
Use for user story writing, competitive research, roadmap prioritization, stakeholder communication, and PRD drafting. Best for reducing repetitive documentation and research work.
✗ Avoid When
Avoid for strategic product vision (requires deep customer empathy), pricing decisions (needs market and financial expertise), or when face-to-face customer discovery is more valuable than speed.
Learning Path▌
- 1Basic: user stories, feature specs, status updates
- 2Intermediate: competitive analysis, prioritization frameworks, PRDs
- 3Advanced: product strategy, go-to-market planning, OKR setting
- 4Expert: product vision, market positioning, business model innovation
Discussion
Product Hunt–style comments (not star reviews)- No comments yet — start the thread.
Ratings
4.8★★★★★73 reviews- ★★★★★Liam Chen· Dec 20, 2024
baoyu-url-to-markdown is among the better-maintained entries we tried; worth keeping pinned for repeat workflows.
- ★★★★★Advait Thompson· Dec 20, 2024
Keeps context tight: baoyu-url-to-markdown is the kind of skill you can hand to a new teammate without a long onboarding doc.
- ★★★★★Li Bhatia· Dec 12, 2024
baoyu-url-to-markdown fits our agent workflows well — practical, well scoped, and easy to wire into existing repos.
- ★★★★★Dhruvi Jain· Dec 8, 2024
baoyu-url-to-markdown fits our agent workflows well — practical, well scoped, and easy to wire into existing repos.
- ★★★★★Kiara Chawla· Dec 8, 2024
Useful defaults in baoyu-url-to-markdown — fewer surprises than typical one-off scripts, and it plays nicely with `npx skills` flows.
- ★★★★★Oshnikdeep· Nov 27, 2024
baoyu-url-to-markdown is among the better-maintained entries we tried; worth keeping pinned for repeat workflows.
- ★★★★★Evelyn Anderson· Nov 27, 2024
baoyu-url-to-markdown has been reliable in day-to-day use. Documentation quality is above average for community skills.
- ★★★★★Kiara Agarwal· Nov 11, 2024
baoyu-url-to-markdown fits our agent workflows well — practical, well scoped, and easy to wire into existing repos.
- ★★★★★Camila Wang· Nov 11, 2024
We added baoyu-url-to-markdown from the explainx registry; install was straightforward and the SKILL.md answered most questions upfront.
- ★★★★★Fatima Ghosh· Nov 3, 2024
baoyu-url-to-markdown is among the better-maintained entries we tried; worth keeping pinned for repeat workflows.
showing 1-10 of 73