agent-browser
Fast, persistent browser automation with session continuity across sequential agent commands.
Works with
29
total installs
29
this week
27.7K
GitHub stars
0
upvotes
Install Skill
Run in your terminal
29
installs
29
this week
27.7K
stars
What it does
Supports three browser modes: headless Chromium, real Chrome with profile support, and cloud-hosted remote browsers with proxy configuration
Includes 15+ command categories covering navigation, page inspection, interactions, data extraction, cookie management, and JavaScript execution
Offers cloud session management, local server tunneling via Cloudflare, and parallel subagent execution thro
Installation Guide
How to use agent-browser on Cursor
AI-first code editor with Composer
Prerequisites
Before installing skills in Cursor, ensure your development environment meets these requirements:
- ›Cursor installed and configured on your machine
- ›Node.js 16+ with npm — verify with
node --version - ›Active project directory where you want to add
agent-browser
Run the install command
Execute the skills CLI command in your project's root directory to begin installation:
Fetches agent-browser from vercel-labs/agent-browser and configures it for Cursor.
Select Cursor when prompted
The CLI shows a list of agents. Use arrow keys and space to select Cursor:
Verify installation
Confirm successful installation by checking the skill directory location:
Restart Cursor to activate agent-browser. Access via /agent-browser in your agent's command palette.
Security Notice
We perform automated surface-level scans (Gen AI Scanner, Socket, Snyk) during installation. These checks detect common vulnerabilities but do not guarantee complete security. Always review skill source code and verify the publisher's reputation before production use.
Skills execute code in your environment. Always review source, verify the publisher, and test in isolation before production.
Documentation
Browser Automation with agent-browser
The CLI uses Chrome/Chromium via CDP directly. Install via npm i -g agent-browser, brew install agent-browser, or cargo install agent-browser. Run agent-browser install to download Chrome. Existing Chrome, Brave, Playwright, and Puppeteer installations are detected automatically. Run agent-browser upgrade to update to the latest version.
Core Workflow
Every browser automation follows this pattern:
- Navigate:
agent-browser open <url> - Snapshot:
agent-browser snapshot -i(get element refs like@e1,@e2) - Interact: Use refs to click, fill, select
- Re-snapshot: After navigation or DOM changes, get fresh refs
agent-browser open https://example.com/form
agent-browser snapshot -i
# Output: @e1 [input type="email"], @e2 [input type="password"], @e3 [button] "Submit"
agent-browser fill @e1 "[email protected]"
agent-browser fill @e2 "password123"
agent-browser click @e3
agent-browser wait 2000
agent-browser snapshot -i # Check result
Command Chaining
Commands can be chained with && in a single shell invocation. The browser persists between commands via a background daemon, so chaining is safe and more efficient than separate calls.
# Chain open + snapshot in one call (open already waits for page load)
agent-browser open https://example.com && agent-browser snapshot -i
# Chain multiple interactions
agent-browser fill @e1 "[email protected]" && agent-browser fill @e2 "password123" && agent-browser click @e3
# Navigate and capture
agent-browser open https://example.com && agent-browser screenshot
When to chain: Use && when you don't need to read the output of an intermediate command before proceeding (e.g., open + wait + screenshot). Run commands separately when you need to parse the output first (e.g., snapshot to discover refs, then interact using those refs).
Handling Authentication
When automating a site that requires login, choose the approach that fits:
Option 1: Import auth from the user's browser (fastest for one-off tasks)
# Connect to the user's running Chrome (they're already logged in)
agent-browser --auto-connect state save ./auth.json
# Use that auth state
agent-browser --state ./auth.json open https://app.example.com/dashboard
State files contain session tokens in plaintext -- add to .gitignore and delete when no longer needed. Set AGENT_BROWSER_ENCRYPTION_KEY for encryption at rest.
Option 2: Chrome profile reuse (zero setup)
# List available Chrome profiles
agent-browser profiles
# Reuse the user's existing Chrome login state
agent-browser --profile Default open https://gmail.com
Option 3: Persistent profile (for recurring tasks)
# First run: login manually or via automation
agent-browser --profile ~/.myapp open https://app.example.com/login
# ... fill credentials, submit ...
# All future runs: already authenticated
agent-browser --profile ~/.myapp open https://app.example.com/dashboard
Option 4: Session name (auto-save/restore cookies + localStorage)
agent-browser --session-name myapp open https://app.example.com/login
# ... login flow ...
agent-browser close # State auto-saved
# Next time: state auto-restored
agent-browser --session-name myapp open https://app.example.com/dashboard
Option 5: Auth vault (credentials stored encrypted, login by name)
echo "$PASSWORD" | agent-browser auth save myapp --url https://app.example.com/login --username user --password-stdin
agent-browser auth login myapp
auth login navigates with load and then waits for login form selectors to appear before filling/clicking, which is more reliable on delayed SPA login screens.
Option 6: State file (manual save/load)
# After logging in:
agent-browser state save ./auth.json
# In a future session:
agent-browser state load ./auth.json
agent-browser open https://app.example.com/dashboard
See references/authentication.md for OAuth, 2FA, cookie-based auth, and token refresh patterns.
Essential Commands
# Batch: ALWAYS use batch for 2+ sequential commands. Commands run in order.
agent-browser batch "open https://example.com" "snapshot -i"
agent-browser batch "open https://example.com" "screenshot"
agent-browser batch "click @e1" "wait 1000" "screenshot"
# Navigation
agent-browser open <url> # Navigate (aliases: goto, navigate)
agent-browser close # Close browser
agent-browser close --all # Close all active sessions
# Snapshot
agent-browser snapshot -i # Interactive elements with refs (recommended)
agent-browser snapshot -i --urls # Include href URLs for links
agent-browser snapshot -s "#selector" # Scope to CSS selector
# Interaction (use @refs from snapshot)
agent-browser click @e1 # Click element
agent-browser click @e1 --new-tab # Click and open in new tab
agent-browser fill @e2 "text" # Clear and type text
agent-browser type @e2 "text" # Type without clearing
agent-browser select @e1 "option" # Select dropdown option
agent-browser check @e1 # Check checkbox
agent-browser press Enter # Press key
agent-browser keyboard type "text" # Type at current focus (no selector)
agent-browser keyboard inserttext "text" # Insert without key events
agent-browser scroll down 500 # Scroll page
agent-browser scroll down 500 --selector "div.content" # Scroll within a specific container
# Get information
agent-browser get text @e1 # Get element text
agent-browser get url # Get current URL
agent-browser get title # Get page title
agent-browser get cdp-url # Get CDP WebSocket URL
# Wait
agent-browser wait @e1 # Wait for element
agent-browser wait 2000 # Wait milliseconds
agent-browser wait --url "**/page" # Wait for URL pattern
agent-browser wait --text "Welcome" # Wait for text to appear (substring match)
agent-browser wait --load networkidle # Wait for network idle (caution: see Pitfalls)
agent-browser wait --fn "!document.body.innerText.includes('Loading...')" # Wait for text to disappear
agent-browser wait "#spinner" --state hidden # Wait for element to disappear
# Downloads
agent-browser download @e1 ./file.pdf # Click element to trigger download
agent-browser wait --download ./output.zip # Wait for any download to complete
agent-browser --download-path ./downloads open <url> # Set default download directory
# Tab management
agent-browser tab list # List all open tabs
agent-browser tab new # Open a blank new tab
agent-browser tab new https://example.com # Open URL in a new tab
agent-browser tab 2 # Switch to tab by index (0-based)
agent-browser tab close # Close the current tab
agent-browser tab close 2 # Close tab by index
# Network
agent-browser network requests # Inspect tracked requests
agent-browser network requests --type xhr,fetch # Filter by resource type
agent-browser network requests --method POST # Filter by HTTP method
agent-browser network requests --status 2xx # Filter by status (200, 2xx, 400-499)
agent-browser network request <requestId> # View full request/response detail
agent-browser network route "**/api/*" --abort # Block matching requests
agent-browser network har start # Start HAR recording
agent-browser network har stop ./capture.har # Stop and save HAR file
# Viewport & Device Emulation
agent-browser set viewport 1920 1080 # Set viewport size (default: 1280x720)
agent-browser set viewport 1920 1080 2 # 2x retina (same CSS size, higher res screenshots)
agent-browser set device "iPhone 14" # Emulate device (viewport + user agent)
# Capture
agent-browser screenshot # Screenshot to temp dir
agent-browser screenshot --full # Full page screenshot
agent-browser screenshot --annotate # Annotated screenshot with numbered element labels
agent-browser screenshot --screenshot-dir ./shots # Save to custom directory
agent-browser screenshot --screenshot-format jpeg --screenshot-quality 80
agent-browser pdf output.pdf # Save as PDF
# Live preview / streaming
agent-browser stream enable # Start runtime WebSocket streaming on an auto-selected port
agent-browser stream enable --port 9223 # Bind a specific localhost port
agent-browser stream status # Inspect enabled state, port, connection, and screencasting
agent-browser stream disable # Stop runtime streaming and remove the .stream metadata file
# Clipboard
agent-browser clipboard read # Read text from clipboard
agent-browser clipboard write "Hello, World!" # Write text to clipboard
agent-browser clipboard copy # Copy current selection
agent-browser clipboard paste # Paste from clipboard
# Dialogs (alert, confirm, prompt, beforeunload)
# By default, alert and beforeunload dialogs are auto-accepted so they never block the agent.
# confirm and prompt dialogs still require explicit handling.
# Use --no-auto-dialog (or AGENT_BROWSER_NO_AUTO_DIALOG=1) to disable automatic handling.
agent-browser dialog accept # Accept dialog
agent-browser dialog accept "my input" # Accept prompt dialog with text
agent-browser dialog dismiss # Dismiss/cancel dialog
agent-browser dialog status # Check if a dialog is currently open
# Diff (compare page states)
agent-browser diff snapshot # Compare current vs last snapshot
agent-browser diff snapshot --baseline before.txt # Compare current vs saved file
agent-browser diff screenshot --baseline before.png List & Monetize Your Skill
Submit your Claude Code skill and start earning
Use Cases
User Story & Requirements Generation
Create detailed user stories, acceptance criteria, and feature specs
Example
Generate user stories for 'password reset feature' with acceptance criteria, edge cases, and test scenarios
Reduce spec writing time by 50%, ensure comprehensive coverage
Competitive Analysis
Research competitors, compare features, identify gaps
Example
Analyze 5 competitor products, create feature comparison matrix, suggest differentiation opportunities
Complete competitive research in 2 hours instead of 2 days
Roadmap Prioritization
Evaluate features using frameworks (RICE, ICE, Kano) and create prioritized backlogs
Example
Score 20 feature ideas using RICE framework, generate prioritized roadmap with rationale
Make data-driven prioritization decisions faster
Stakeholder Communication
Draft PRDs, status updates, and stakeholder presentations
Example
Create executive summary of Q3 roadmap, monthly progress report, feature launch announcement
Save 3-5 hours/week on communication overhead
Implementation Guide
Prerequisites
- ›Claude Desktop or compatible AI client
- ›Access to product documentation and roadmap tools (Jira, Notion, etc.)
- ›Understanding of product management frameworks (RICE, Jobs-to-be-Done, etc.)
- ›Stakeholder contact information and communication channels
Time Estimate
30-60 minutes to see productivity improvements
Steps
- 1Install product management skill
- 2Start with user story generation for known feature
- 3Progress to competitive analysis: research 2-3 competitors
- 4Use for roadmap prioritization: apply RICE/ICE scoring
- 5Draft stakeholder communications and refine based on feedback
- 6Build template library for recurring PM tasks
- 7Share effective prompts with product team
Common Pitfalls
- ⚠Not validating competitive research—verify facts before sharing
- ⚠Accepting user stories without involving engineering team
- ⚠Over-relying on frameworks without qualitative judgment
- ⚠Not customizing outputs to company culture and communication style
- ⚠Skipping stakeholder validation of generated requirements
Best Practices
✓ Do
- +Validate research and competitive analysis with real data
- +Collaborate with engineering when generating technical requirements
- +Customize frameworks and templates to your company context
- +Use skill for first drafts, refine with stakeholder input
- +Document successful prompt patterns for PM tasks
- +Combine AI efficiency with human judgment and intuition
✗ Don't
- −Don't publish competitive analysis without fact-checking
- −Don't finalize user stories without engineering review
- −Don't make prioritization decisions solely on AI scoring
- −Don't skip customer validation of generated requirements
- −Don't ignore company-specific context and culture
💡 Pro Tips
- ★Provide context: company goals, constraints, customer feedback
- ★Ask for alternatives: 'Show 3 ways to prioritize this roadmap'
- ★Request stakeholder-specific formatting: 'Executive summary vs. engineering spec'
- ★Use skill for 70% generation + 30% customization to company needs
When to Use This
✓ Use when
Use for user story writing, competitive research, roadmap prioritization, stakeholder communication, and PRD drafting. Best for reducing repetitive documentation and research work.
✗ Avoid when
Avoid for strategic product vision (requires deep customer empathy), pricing decisions (needs market and financial expertise), or when face-to-face customer discovery is more valuable than speed.
Learning Path
- 1Basic: user stories, feature specs, status updates
- 2Intermediate: competitive analysis, prioritization frameworks, PRDs
- 3Advanced: product strategy, go-to-market planning, OKR setting
- 4Expert: product vision, market positioning, business model innovation
Related Skills
cmux-browser
44manaflow-ai/cmux
grill-me
384mattpocock/skills
premortem
197parcadei/continuous-claude-v3
deslop
118cursor/plugins
framer-motion
98pproenca/dot-skills
write-a-prd
91mattpocock/skills
Reviews
- MMin Martin★★★★★Dec 28, 2024
agent-browser has been reliable in day-to-day use. Documentation quality is above average for community skills.
- WWilliam Anderson★★★★★Dec 28, 2024
agent-browser is among the better-maintained entries we tried; worth keeping pinned for repeat workflows.
- MMin Torres★★★★★Dec 24, 2024
Solid pick for teams standardizing on skills: agent-browser is focused, and the summary matches what you get after install.
- SSophia Taylor★★★★★Dec 12, 2024
Useful defaults in agent-browser — fewer surprises than typical one-off scripts, and it plays nicely with `npx skills` flows.
- AAisha Farah★★★★★Dec 12, 2024
Registry listing for agent-browser matched our evaluation — installs cleanly and behaves as described in the markdown.
- SSofia White★★★★★Dec 4, 2024
Useful defaults in agent-browser — fewer surprises than typical one-off scripts, and it plays nicely with `npx skills` flows.
- SSofia Robinson★★★★★Nov 23, 2024
I recommend agent-browser for anyone iterating fast on agent tooling; clear intent and a small, reviewable surface area.
- AAlexander Kim★★★★★Nov 19, 2024
agent-browser reduced setup friction for our internal harness; good balance of opinion and flexibility.
- AAisha Nasser★★★★★Nov 19, 2024
Keeps context tight: agent-browser is the kind of skill you can hand to a new teammate without a long onboarding doc.
- KKiara Martin★★★★★Nov 15, 2024
We added agent-browser from the explainx registry; install was straightforward and the SKILL.md answered most questions upfront.
showing 1-10 of 69
Discussion
Comments — not star reviews- No comments yet — start the thread.