Firecrawl Scraping
Overview
Scrape individual web pages and convert them to clean, LLM-ready markdown. Handles JavaScript rendering, anti-bot protection, and dynamic content.
Quick Decision Tree
What are you scraping?
โ
โโโ Single page (article, blog, docs)
โ โโโ references/single-page.md
โ โโโ Script: scripts/firecrawl_scrape.py
โ
โโโ Entire website (multiple pages, crawling)
โโโ references/website-crawler.md
โโโ (Use Apify Website Content Crawler for multi-page)
Environment Setup
FIRECRAWL_API_KEY=fc-your-api-key-here
Get your API key: https://firecrawl.dev/app/api-keys
Common Usage
Simple Scrape
python scripts/firecrawl_scrape.py "https://example.com/article"
With Options
python scripts/firecrawl_scrape.py "https://wsj.com/article" \
--proxy stealth \
--format markdown summary \
--timeout 60000
Proxy Modes
| Mode |
Use Case |
basic |
Standard sites, fastest |
stealth |
Anti-bot protection, premium content (WSJ, NYT) |
auto |
Let Firecrawl decide (recommended) |
Output Formats
markdown - Clean markdown content (default)
html - Raw HTML
summary - AI-generated summary
screenshot - Page screenshot
links - All links on page
Cost
~1 credit per page. Stealth proxy may use additional credits.
Security Notes
Credential Handling
- Store
FIRECRAWL_API_KEY in .env file (never commit to git)
- API keys can be regenerated at https://firecrawl.dev/app/api-keys
- Never log or print API keys in script output
- Use environment variables, not hardcoded values
Data Privacy
- Only scrapes publicly accessible web pages
- Scraped content is processed by Firecrawl servers temporarily
- Markdown output stored locally in
.tmp/ directory
- Screenshots (if requested) are stored locally
- No persistent data retention by Firecrawl after request
Access Scopes
- API key provides full access to scraping features
- No granular permission scopes available
- Monitor usage via Firecrawl dashboard
Compliance Considerations
- Robots.txt: Firecrawl respects robots.txt by default
- Public Content Only: Only scrape publicly accessible pages
- Terms of Service: Respect target site ToS
- Rate Limiting: Built-in rate limiting prevents abuse
- Stealth Proxy: Use stealth mode only when necessary (paywalled news, not auth bypass)
- GDPR: Scraped content may contain PII - handle accordingly
- Copyright: Respect intellectual property rights of scraped content
Troubleshooting
Common Issues
Issue: Credits exhausted
Symptoms: API returns "insufficient credits" or quota exceeded error
Cause: Account credits depleted
Solution:
- Check credit balance at https://firecrawl.dev/app
- Upgrade plan or purchase additional credits
- Reduce scraping frequency
- Use
basic proxy mode to conserve credits
Issue: Page not rendering correctly
Symptoms: Empty content or partial HTML returned
Cause: JavaScript-heavy page not fully loading
Solution:
- Enable JavaScript rendering with
--js-render flag
- Increase timeout with
--timeout 60000 (60 seconds)
- Try
stealth proxy mode for protected sites
- Wait for specific elements with
--wait-for selector
Issue: 403 Forbidden error
Symptoms: Script returns 403 status code
Cause: Site blocking automated access
Solution:
- Enable
stealth proxy mode
- Add delay between requests
- Try at different times (some sites rate limit by time)
- Check if site requires login (not supported)
Issue: Empty markdown output
Symptoms: Scrape succeeds but markdown is empty or malformed
Cause: Dynamic content loaded after page load, or unusual page structure
Solution:
- Increase wait time for JavaScript to execute
- Use
--wait-for to wait for specific content
- Try
html format to see raw content
- Check if content is in an iframe (not always supported)
Issue: Timeout errors
Symptoms: Request times out before completion
Cause: Slow page load or large page content
Solution:
- Increase timeout value (up to 120000ms)
- Use
basic proxy for faster response
- Target specific page sections if possible
- Check if site is experiencing issues
Resources
- references/single-page.md - Single page scraping details
- references/website-crawler.md - Multi-page website crawling
Integration Patterns
Scrape and Analyze
Skills: firecrawl-scraping โ parallel-research
Use case: Scrape competitor pages, then analyze content strategy
Flow:
- Scrape competitor website pages with Firecrawl
- Convert to clean markdown
- Use parallel-research to analyze positioning, messaging, features
Scrape and Document
Skills: firecrawl-scraping โ content-generation
Use case: Create summary documents from web research
Flow:
- Scrape multiple article pages on a topic
- Combine markdown content
- Generate summary document via content-generation
Scrape and Enrich CRM
Skills: firecrawl-scraping โ attio-crm
Use case: Enrich company records with website data
Flow:
- Scrape company website (about page, team page, product pages)
- Extract key information (funding, team size, products)
- Update company record in Attio CRM with enriched data