web-scraping

mindrally/skills · updated Apr 8, 2026

$ npx skills add https://github.com/mindrally/skills --skill web-scraping
summary

Web scraping and data extraction using Python tools for static, dynamic, and large-scale content.

  • Supports static sites via requests and BeautifulSoup, dynamic content via Selenium and Playwright, and large-scale extraction via Scrapy and firecrawl
  • Includes specialized tools for AI-powered extraction (jina), structured queries (agentQL), and complex automation workflows (multion)
  • Built-in guidance on rate limiting, robots.txt compliance, error handling, session management, and pagination
skill.md

Web Scraping

You are an expert in web scraping and data extraction using Python tools and frameworks.

Core Tools

Static Sites

  • Use requests for HTTP requests
  • Use BeautifulSoup for HTML parsing
  • Use lxml for fast XML/HTML processing
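A minimal sketch of the static-site workflow with requests and BeautifulSoup. It assumes a page whose items are `<h2 class="title">` elements; that markup, the `HEADERS` value, and the helper names `fetch_html` and `extract_titles` are hypothetical and only illustrate the pattern.

```python
import requests
from bs4 import BeautifulSoup

# Hypothetical descriptive User-Agent; replace with your own contact details.
HEADERS = {"User-Agent": "example-scraper/0.1 (+https://example.com/contact)"}


def fetch_html(url: str, timeout: float = 10.0) -> str:
    """Download a page and return its text, raising on HTTP errors."""
    response = requests.get(url, headers=HEADERS, timeout=timeout)
    response.raise_for_status()  # 4xx/5xx responses become requests.HTTPError
    return response.text


def extract_titles(html: str) -> list[str]:
    """Return the text of every <h2 class="title"> element (assumed markup)."""
    # "html.parser" ships with Python; pass "lxml" instead if lxml is installed.
    soup = BeautifulSoup(html, "html.parser")
    return [h2.get_text(strip=True) for h2 in soup.select("h2.title")]


if __name__ == "__main__":
    sample = '<h2 class="title"> First </h2><h2 class="title">Second</h2><h2>Skip</h2>'
    print(extract_titles(sample))  # ['First', 'Second']
```

Keeping fetching and parsing in separate functions lets you test the parser against saved HTML without any network access.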

Dynamic Content

  • Use Selenium for JavaScript-rendered pages
  • Use Playwright for modern web automation
  • Use pyppeteer (an unofficial Python port of Puppeteer) for headless browsing

Large-Scale Extraction

  • Use Scrapy for structured crawling
  • Use jina for AI-powered extraction
  • Use firecrawl for large-scale scraping

Complex Workflows

  • Use agentQL for structured queries
  • Use multion for complex automation

Best Practices

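  • Implement rate limiting and delays between requests
  • Respect robots.txt rules, including any Crawl-delay
  • Send an accurate, descriptive User-Agent header
  • Handle errors gracefully
  • Implement retry logic with exponential backoff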
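The robots.txt and rate-limiting practices can be combined in a small gate, sketched here with only the standard library. `PoliteGate` and `USER_AGENT` are hypothetical names; the class assumes one robots.txt policy per host and raises its minimum delay to the site's Crawl-delay when one is declared. The clock and sleep functions are injectable so the timing logic can be tested.

```python
import time
import urllib.robotparser

USER_AGENT = "example-scraper/0.1 (+https://example.com/contact)"  # hypothetical


class PoliteGate:
    """Check robots.txt permissions and enforce a delay between requests."""

    def __init__(self, robots_lines, min_delay=1.0, user_agent=USER_AGENT,
                 clock=time.monotonic, sleep=time.sleep):
        self.user_agent = user_agent
        self.parser = urllib.robotparser.RobotFileParser()
        self.parser.parse(robots_lines)  # robots.txt content as a list of lines
        # Honor a declared Crawl-delay if it is longer than our own minimum.
        crawl_delay = self.parser.crawl_delay(user_agent)
        self.min_delay = max(min_delay, float(crawl_delay or 0))
        self._clock = clock
        self._sleep = sleep
        self._last = None

    def allowed(self, url):
        """True if robots.txt permits this user agent to fetch url."""
        return self.parser.can_fetch(self.user_agent, url)

    def wait(self):
        """Sleep just long enough to keep min_delay seconds between calls."""
        if self._last is not None:
            remaining = self.min_delay - (self._clock() - self._last)
            if remaining > 0:
                self._sleep(remaining)
        self._last = self._clock()
```

Call `allowed()` before each request and `wait()` immediately before sending it. In a live crawler, `RobotFileParser.set_url()` followed by `read()` downloads and parses a host's `/robots.txt` instead of passing lines in by hand.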

Error Handling

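  • Set timeouts on every request and handle timeout errors
  • Detect blocked requests (for example, HTTP 403 or 429 responses) and back off
  • Manage session cookies, for example with a persistent session
  • Follow pagination until no next page remains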
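One way to cover these points with requests is a persistent `Session` (which keeps cookies between requests) mounted with urllib3's `Retry` for backoff on transient failures, plus a pagination loop. This is a sketch under assumptions: `make_session` and `iter_pages` are hypothetical helpers, the API is assumed to return JSON pages with a `"next"` URL field, and treating 401/403 as "blocked" is an illustrative policy. When retries are exhausted, requests raises an exception, so callers should still catch `requests.exceptions.RequestException`.

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

TIMEOUT = (5, 30)  # (connect, read) timeouts in seconds; illustrative values


def make_session(user_agent="example-scraper/0.1"):
    """Return a session that keeps cookies and retries transient failures."""
    retry = Retry(
        total=3,
        backoff_factor=1.0,  # exponentially growing sleeps between attempts
        status_forcelist=[429, 500, 502, 503, 504],
        allowed_methods=["GET", "HEAD"],
        respect_retry_after_header=True,
    )
    session = requests.Session()  # cookies persist across requests
    session.headers["User-Agent"] = user_agent
    adapter = HTTPAdapter(max_retries=retry)
    session.mount("https://", adapter)
    session.mount("http://", adapter)
    return session


def iter_pages(session, url):
    """Yield JSON pages, following a hypothetical "next" URL field."""
    while url:
        response = session.get(url, timeout=TIMEOUT)
        if response.status_code in (401, 403):
            # Likely blocked or logged out: stop rather than keep hitting the site.
            raise PermissionError(f"blocked ({response.status_code}) at {url}")
        response.raise_for_status()
        page = response.json()
        yield page
        url = page.get("next")
```

A typical loop would be `for page in iter_pages(make_session(), start_url): ...`, where `start_url` is the first page of the listing.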

Ethical Considerations

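  • Follow website terms of service
  • Don't overload servers; keep request rates modest
  • Cache results when possible to avoid repeat downloads
  • Be transparent about scraping, for example by identifying your scraper in the User-Agent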
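Caching is the simplest way to avoid re-downloading pages you already have. The sketch below is a hypothetical on-disk cache (`DiskCache` and `cached_fetch` are illustrative names) that stores one JSON file per URL and treats entries older than `ttl` seconds as stale; `fetch` is any function that downloads a URL. HTTP conditional requests (`ETag`/`If-None-Match`, `Last-Modified`/`If-Modified-Since`) are an alternative when the server supports them.

```python
import hashlib
import json
import time
from pathlib import Path


class DiskCache:
    """One JSON file per URL; entries expire after ttl seconds."""

    def __init__(self, directory, ttl=24 * 3600, clock=time.time):
        self.dir = Path(directory)
        self.dir.mkdir(parents=True, exist_ok=True)
        self.ttl = ttl
        self._clock = clock

    def _path(self, url):
        # Hash the URL so any URL maps to a safe, fixed-length file name.
        return self.dir / (hashlib.sha256(url.encode("utf-8")).hexdigest() + ".json")

    def get(self, url):
        path = self._path(url)
        if not path.exists():
            return None
        entry = json.loads(path.read_text(encoding="utf-8"))
        if self._clock() - entry["fetched_at"] > self.ttl:
            return None  # stale: the caller should re-fetch
        return entry["body"]

    def put(self, url, body):
        entry = {"fetched_at": self._clock(), "body": body}
        self._path(url).write_text(json.dumps(entry), encoding="utf-8")


def cached_fetch(cache, url, fetch):
    """Return the cached body for url, calling fetch(url) only on a miss."""
    body = cache.get(url)
    if body is None:
        body = fetch(url)
        cache.put(url, body)
    return body
```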

Data Processing

  • Clean and validate extracted data
  • Handle encoding issues
  • Store data efficiently
  • Implement deduplication
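The data-processing steps can be sketched with the standard library alone. The helpers below are hypothetical: `decode_body` tries a declared charset and then UTF-8 before falling back to replacement characters, `clean_text` applies Unicode NFKC normalization and collapses whitespace, and `clean_records` drops records missing required fields and deduplicates by an assumed `"url"` key.

```python
import unicodedata
from typing import Optional


def decode_body(raw: bytes, declared: Optional[str] = None) -> str:
    """Decode with the declared charset, then UTF-8, then UTF-8 with replacement."""
    for encoding in (declared, "utf-8"):
        if encoding:
            try:
                return raw.decode(encoding)
            except (LookupError, UnicodeDecodeError):
                continue  # unknown codec name or bytes that don't fit it
    return raw.decode("utf-8", errors="replace")


def clean_text(value: str) -> str:
    """Normalize Unicode (NFKC) and collapse runs of whitespace to one space."""
    return " ".join(unicodedata.normalize("NFKC", value).split())


def clean_records(records, required=("url", "title"), key="url"):
    """Clean string fields, skip invalid records, and drop duplicates by key."""
    seen = set()
    for record in records:
        cleaned = {k: clean_text(v) if isinstance(v, str) else v
                   for k, v in record.items()}
        if any(not cleaned.get(field) for field in required):
            continue  # invalid: a required field is missing or empty
        if cleaned.get(key) in seen:
            continue  # duplicate of a record already yielded
        seen.add(cleaned.get(key))
        yield cleaned
```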

Ratings

4.6 (47 reviews)