browser-automation, search-web

Puppeteer Vision Web Scraper

by djannot

Automates web scraping by handling cookie banners, CAPTCHAs, and paywalls, then extracts clean markdown content from websites.

github stars

47

  • / AI-powered interaction with blocking elements
  • / Run instantly via npx
  • / Real-time browser viewing option

best for

  • / Content researchers scraping protected websites
  • / Data analysts extracting articles from news sites
  • / Developers building content aggregation systems
  • / Anyone needing clean text from complex modern websites

capabilities

  • / Scrape webpages with stealth mode Puppeteer
  • / Handle cookie banners and consent prompts automatically
  • / Bypass CAPTCHAs and paywalls with AI interaction
  • / Extract main content using Mozilla Readability
  • / Convert HTML to well-formatted markdown
  • / Process code blocks and tables with special formatting
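
To make the HTML-to-markdown step in the list above concrete, here is a deliberately small sketch. It is not the server's implementation: it is a regex-based toy covering a handful of tags, and the function names (`htmlToMarkdown`, `decodeEntities`) are made up for illustration. A real pipeline would run a DOM parser and a dedicated conversion library over the content that Readability extracts, and would also handle tables, which this sketch skips.

```typescript
// Toy HTML-to-markdown converter for a small tag subset (illustrative only).
// All names here are hypothetical; this is not the server's code.

function decodeEntities(text: string): string {
  return text
    .replace(/&lt;/g, "<")
    .replace(/&gt;/g, ">")
    .replace(/&quot;/g, '"')
    .replace(/&#39;/g, "'")
    .replace(/&amp;/g, "&"); // decode &amp; last to avoid double-decoding
}

function htmlToMarkdown(html: string): string {
  const fence = "`".repeat(3);
  const codeBlocks: string[] = [];

  // Pull code blocks out first so the inline rules never touch their contents.
  let out = html.replace(
    /<pre><code(?: class="language-([\w-]+)")?>([\s\S]*?)<\/code><\/pre>/g,
    (_match: string, lang: string | undefined, code: string) => {
      const body = decodeEntities(code).trimEnd();
      codeBlocks.push(`${fence}${lang ?? ""}\n${body}\n${fence}`);
      return `\n\n@@CODE${codeBlocks.length - 1}@@\n\n`;
    },
  );

  out = out
    .replace(/<h([1-6])>([\s\S]*?)<\/h\1>/g, (_m: string, level: string, text: string) =>
      `\n\n${"#".repeat(Number(level))} ${text.trim()}\n\n`)
    .replace(/<(strong|b)>([\s\S]*?)<\/\1>/g, "**$2**")
    .replace(/<(em|i)>([\s\S]*?)<\/\1>/g, "*$2*")
    .replace(/<a href="([^"]*)">([\s\S]*?)<\/a>/g, "[$2]($1)")
    .replace(/<code>([\s\S]*?)<\/code>/g, "`$1`")
    .replace(/<li>([\s\S]*?)<\/li>/g, "- $1\n")
    .replace(/<p>([\s\S]*?)<\/p>/g, "\n\n$1\n\n")
    .replace(/<[^>]+>/g, ""); // drop any remaining tags

  out = decodeEntities(out);

  // Put the protected code blocks back, then normalise blank lines.
  out = out.replace(/@@CODE(\d+)@@/g, (_m: string, i: string) => codeBlocks[Number(i)] ?? "");
  return out.replace(/\n{3,}/g, "\n\n").trim();
}

const sampleHtml =
  '<h2>Install</h2><p>Run <code>npx</code> to start.</p>' +
  '<pre><code class="language-sh">echo hello\n</code></pre>';
console.log(htmlToMarkdown(sampleHtml));
```

Extracting code blocks before applying the inline rules is the one design choice worth noting: it keeps characters such as `<` or `*` inside code from being rewritten as markup.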

what it does

Automatically scrapes web content by using AI to handle cookie banners, CAPTCHAs, paywalls, and other blocking elements, then converts the extracted content to clean markdown.
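
The description above implies a loop: look at the page, decide how to dismiss whatever is blocking it, act, and repeat until the content is reachable. The sketch below shows one way such a loop could be structured. It is an assumption, not the server's documented design: every type and function name (`PageSnapshot`, `Action`, `Decide`, `Apply`, `clearBlockers`) is hypothetical, the model and the browser are passed in as plain functions, and the step limit is an arbitrary safeguard.

```typescript
// Hypothetical sketch of a "clear blocking elements" loop.
// None of these names come from the server's source.

type Blocker = "cookie-banner" | "captcha" | "paywall";

interface PageSnapshot {
  url: string;
  blockers: Blocker[]; // blocking elements currently visible on the page
}

interface Action {
  kind: "click" | "type" | "wait";
  target: string; // description of the element to act on
}

// Stand-in for a vision-capable model that proposes one action per snapshot.
type Decide = (snapshot: PageSnapshot) => Promise<Action>;
// Stand-in for the browser driver that applies an action and returns the new state.
type Apply = (snapshot: PageSnapshot, action: Action) => Promise<PageSnapshot>;

async function clearBlockers(
  start: PageSnapshot,
  decide: Decide,
  apply: Apply,
  maxSteps = 5, // give up after a bounded number of interactions
): Promise<{ snapshot: PageSnapshot; steps: Action[] }> {
  let snapshot = start;
  const steps: Action[] = [];
  while (snapshot.blockers.length > 0 && steps.length < maxSteps) {
    const action = await decide(snapshot);
    steps.push(action);
    snapshot = await apply(snapshot, action);
  }
  return { snapshot, steps };
}

// Demo with stubs: each click dismisses the first visible blocker.
const demoDecide: Decide = async (s) => ({ kind: "click", target: s.blockers[0] ?? "none" });
const demoApply: Apply = async (s) => ({ url: s.url, blockers: s.blockers.slice(1) });

clearBlockers({ url: "https://example.com", blockers: ["cookie-banner"] }, demoDecide, demoApply)
  .then((result) => console.log(`cleared after ${result.steps.length} step(s)`));
```

Passing the model and the browser in as functions keeps the loop testable without a real browser; only after the loop ends with no blockers left would content extraction and markdown conversion run.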

about

Puppeteer Vision Web Scraper is a community-built MCP server published by djannot that provides AI assistants with tools and capabilities via the Model Context Protocol. It handles CAPTCHAs, cookie banners, and paywalls to extract clean markdown content from websites. It is categorized under browser automation and search web.

how to install

You can install Puppeteer Vision Web Scraper in your AI client of choice. Use the install panel on this page to get one-click setup for Cursor, Claude Desktop, VS Code, and other MCP-compatible clients. This server runs locally on your machine via the stdio transport.
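
For clients configured by hand rather than through the install panel, a stdio server launched via npx is usually registered as a command plus arguments. The sketch below builds such an entry. Several assumptions apply: the `mcpServers` key is the shape used by a number of clients, but the exact key and config file location vary by client; `<package-name>` is a placeholder because this listing does not state the npm package name; and the helper `npxServerEntry` and the server label `puppeteer-vision` are invented for illustration.

```typescript
// Sketch of a client config entry for a stdio MCP server started with npx.
// The helper name, the server label, and the package placeholder are hypothetical.

interface StdioServerEntry {
  command: string;
  args: string[];
  env?: Record<string, string>; // optional environment variables for the server process
}

function npxServerEntry(
  packageName: string,
  env: Record<string, string> = {},
): StdioServerEntry {
  // "-y" tells npx to skip the install confirmation prompt.
  const entry: StdioServerEntry = { command: "npx", args: ["-y", packageName] };
  if (Object.keys(env).length > 0) {
    entry.env = env;
  }
  return entry;
}

const clientConfig = {
  mcpServers: {
    // Replace "<package-name>" with the package named in the GitHub repository.
    "puppeteer-vision": npxServerEntry("<package-name>"),
  },
};

console.log(JSON.stringify(clientConfig, null, 2));
```

Any settings the server needs, such as credentials for the AI model it uses, would go in `env`. Check the repository for the actual variable names.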

license

MIT

Puppeteer Vision Web Scraper is released under the MIT license. This is a permissive open-source license, meaning you can freely use, modify, and distribute the software.

readme

README content is unavailable from source data for this server.
