Apify Scrapers
Overview
Scrape content from major social platforms using Apify actors. Each platform has optimized settings for cost and quality.
Quick Decision Tree
What do you want to scrape?
│
├── Social Media Posts
│   ├── Twitter/X → references/twitter.md
│   │   └── Script: scripts/scrape_twitter_ai_trends.py
│   │
│   ├── Reddit → references/reddit.md
│   │   └── Script: scripts/scrape_reddit_ai_tech.py
│   │
│   ├── LinkedIn → references/linkedin.md
│   │   └── Script: scripts/scrape_linkedin_posts.py
│   │
│   ├── Instagram → references/instagram.md
│   │   ├── Script: scripts/scrape_instagram.py
│   │   └── Modes: profile, posts, hashtag, reels, comments
│   │
│   ├── Facebook → references/facebook.md
│   │   ├── Script: scripts/scrape_facebook.py
│   │   └── Modes: page, posts, reviews, groups, marketplace
│   │
│   ├── TikTok → references/multi-platform.md
│   │   └── Script: scripts/scrape_multi_platform.py
│   │
│   └── YouTube → references/multi-platform.md
│       └── Script: scripts/scrape_multi_platform.py
│
├── Business/Places
│   ├── Google Maps businesses → references/google-maps.md
│   │   ├── Script: scripts/scrape_google_maps.py
│   │   └── Modes: search, place, reviews
│   │
│   └── Contact info from websites → references/contact-enrichment.md
│       ├── Script: scripts/scrape_contact_info.py
│       └── Extract: emails, phone numbers, social profiles
│
├── Auto-detect URL type → references/url-detect.md
│   └── Script: scripts/scrape_content_by_url.py
│
├── Trend Analysis (NEW)
│   └── Enriched trend analysis → workflows/trend-analysis.md
│       ├── Script: scripts/analyze_trends.py
│       └── Features: velocity scoring, lifecycle staging, opportunity scoring
│
└── Workflows (multi-step)
    ├── Lead generation → workflows/lead-generation.md
    ├── Influencer discovery → workflows/influencer-discovery.md
    ├── Competitor analysis → workflows/competitor-intel.md
    ├── Trend analysis → workflows/trend-analysis.md
    └── Competitor Ads Intelligence (NEW) → workflows/competitor-ads.md
        ├── Script: scripts/scrape_competitor_ads.py
        ├── Platforms: Facebook Ads Library, Google Ads Transparency
        └── Features: Spend estimates, creative analysis, benchmarking
Environment Setup
APIFY_TOKEN=apify_api_xxxxx
Get your API key: https://console.apify.com/account/integrations
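A script could read the token along these lines. This is a minimal sketch, not the code used by the bundled scripts: the helper name `load_apify_token` is hypothetical, and it assumes `APIFY_TOKEN` has already been exported or loaded from `.env` into the process environment.

```python
import os


def load_apify_token(env=None):
    """Return the Apify token from the environment, or raise a clear error.

    `env` defaults to os.environ; passing a dict makes the helper easy to test.
    """
    env = os.environ if env is None else env
    # Strip stray whitespace, a common cause of "invalid token" errors.
    token = env.get("APIFY_TOKEN", "").strip()
    if not token:
        raise RuntimeError("APIFY_TOKEN is not set; add it to .env or export it")
    return token
```

Failing early with an explicit message keeps a missing token from surfacing later as an opaque authentication error.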
Common Usage Patterns
Scrape Twitter Trends
python scripts/scrape_twitter_ai_trends.py --query "AI agents" --max-tweets 50
Scrape Reddit Discussions
python scripts/scrape_reddit_ai_tech.py --subreddits "MachineLearning,LocalLLaMA" --max-posts 100
Scrape LinkedIn Author
python scripts/scrape_linkedin_posts.py author "https://linkedin.com/in/username" --max-posts 30
Auto-detect and Scrape URL
python scripts/scrape_content_by_url.py "https://x.com/user/status/123456"
Scrape Instagram Profile
python scripts/scrape_instagram.py profile "https://instagram.com/username" --max-posts 20
Scrape Instagram Hashtag
python scripts/scrape_instagram.py hashtag "#artificialintelligence" --max-posts 50
Scrape Instagram Reels
python scripts/scrape_instagram.py reels "https://instagram.com/username" --max-reels 30
Scrape Facebook Page
python scripts/scrape_facebook.py page "https://facebook.com/pagename" --max-posts 50
Scrape Facebook Reviews
python scripts/scrape_facebook.py reviews "https://facebook.com/pagename" --max-reviews 100
Scrape Facebook Marketplace
python scripts/scrape_facebook.py marketplace "laptops in san francisco" --max-items 30
Scrape Google Maps Businesses
python scripts/scrape_google_maps.py search "AI consulting firms in New York" --max-results 50
Scrape Google Maps Reviews
python scripts/scrape_google_maps.py reviews "ChIJN1t_tDeuEmsRUsoyG83frY4" --max-reviews 100
Extract Contact Info from Websites
python scripts/scrape_contact_info.py "https://example.com" --depth 2
Bulk Contact Enrichment
python scripts/scrape_contact_info.py --urls-file companies.txt --output contacts.json
Scrape Competitor Ads (Single Competitor)
python scripts/scrape_competitor_ads.py "Nike" --platforms facebook google --country US --days 30
Compare Multiple Competitors' Ads
python scripts/scrape_competitor_ads.py "Nike" "Adidas" "Puma" --compare --output comparison.json
Discover Advertisers by Keyword
python scripts/scrape_competitor_ads.py --search "running shoes" --country US --max-ads 200
Filter Competitor Ads by Media Type
python scripts/scrape_competitor_ads.py "Netflix" "Disney+" --platforms facebook --media-types video --days 7
Analyze Trends (NEW)
python scripts/analyze_trends.py "artificial intelligence" --sources google instagram tiktok --days 90
python scripts/analyze_trends.py --category technology --discover --top 50
python scripts/analyze_trends.py "AI" "blockchain" "metaverse" --compare
python scripts/analyze_trends.py "sustainable fashion" --format html --output trend_report.html
Cost Estimates
| Platform | Actor | Cost per Item |
|----------|-------|---------------|
| Twitter | kaitoeasyapi/twitter-x-data-tweet-scraper | ~$0.00025 |
| Reddit | trudax/reddit-scraper | ~$0.001-0.005 |
| LinkedIn | harvestapi/linkedin-post-search | ~$0.01-0.05 |
| YouTube | streamers/youtube-scraper | ~$0.01-0.05 |
| TikTok | clockworks/tiktok-scraper | ~$0.005 |
| Instagram (profile) | apify/instagram-profile-scraper | ~$0.005 |
| Instagram (posts) | apify/instagram-post-scraper | ~$0.002-0.005 |
| Instagram (hashtag) | apify/instagram-hashtag-scraper | ~$0.002-0.005 |
| Instagram (reels) | apify/instagram-reel-scraper | ~$0.005-0.01 |
| Instagram (comments) | apify/instagram-comment-scraper | ~$0.001-0.003 |
| Facebook (page) | apify/facebook-pages-scraper | ~$0.005-0.01 |
| Facebook (posts) | apify/facebook-posts-scraper | ~$0.003-0.005 |
| Facebook (reviews) | apify/facebook-reviews-scraper | ~$0.002-0.005 |
| Facebook (groups) | apify/facebook-groups-scraper | ~$0.005-0.01 |
| Facebook (marketplace) | apify/facebook-marketplace-scraper | ~$0.005-0.01 |
| Google Maps (search) | compass/crawler-google-places | ~$0.01-0.02 |
| Google Maps (place) | compass/google-maps-business-scraper | ~$0.01 |
| Google Maps (reviews) | compass/google-maps-reviews-scraper | ~$0.003-0.005 |
| Contact Enrichment | lukaskrivka/contact-info-scraper | ~$0.01-0.03 |
| Google Trends | apify/google-trends-scraper | ~$0.01 |
| Trend Analysis (multi) | Multiple actors | ~$0.50-1.50/run |
| Facebook Ads Library | apify/facebook-ads-scraper | ~$0.75/1K ads |
| Facebook Ads (alt) | curious_coder/facebook-ads-library-scraper | ~$0.50/1K ads |
| Google Ads Transparency | lexis-solutions/google-ads-scraper | ~$1.00/1K ads |
| Google Ads (alt) | xtech/google-ad-transparency-scraper | ~$0.80/1K ads |
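To budget a run before launching it, multiply the per-item cost by the number of items requested. The sketch below does that arithmetic for a few platforms; the `COST_PER_ITEM` dictionary and `estimate_cost` helper are illustrative names, and the figures are the approximate values listed in this section, not live Apify pricing.

```python
# Approximate (low, high) per-item costs in USD, copied from this section.
COST_PER_ITEM = {
    "twitter": (0.00025, 0.00025),
    "reddit": (0.001, 0.005),
    "linkedin": (0.01, 0.05),
    "tiktok": (0.005, 0.005),
}


def estimate_cost(platform, items):
    """Return the (low, high) estimated cost in USD for scraping `items` items."""
    low, high = COST_PER_ITEM[platform]
    return round(low * items, 4), round(high * items, 4)


if __name__ == "__main__":
    print(estimate_cost("twitter", 50))   # 50 tweets
    print(estimate_cost("reddit", 100))   # 100 Reddit posts
```

Treat the result as a rough range: actual charges depend on the actor's current pricing and on proxy usage.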
Output Location
All scraped data is saved to .tmp/ with timestamped filenames:
.tmp/twitter_ai_trends_YYYYMMDD.json
.tmp/reddit_ai_tech_YYYYMMDD.json
.tmp/linkedin_posts_YYYYMMDD_HHMMSS.json
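The naming scheme can be reproduced as follows. This is a sketch under the assumption that filenames are built from a prefix plus a `strftime` stamp; the helper `output_path` is hypothetical, and the real scripts may construct their paths differently.

```python
from datetime import datetime
from pathlib import Path


def output_path(prefix, with_time=False, now=None, out_dir=".tmp"):
    """Build a path like .tmp/<prefix>_YYYYMMDD.json (or _YYYYMMDD_HHMMSS.json)."""
    now = now or datetime.now()
    stamp = now.strftime("%Y%m%d_%H%M%S" if with_time else "%Y%m%d")
    return Path(out_dir) / f"{prefix}_{stamp}.json"
```

A date-only stamp means a second run on the same day would reuse the same filename, while the `HHMMSS` variant keeps each run separate.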
Security Notes
Credential Handling
- Store APIFY_TOKEN in a .env file (never commit it to git)
- Rotate API tokens periodically via Apify Console
- Never log or print API tokens in script output
- Use environment variables, not hardcoded values
Data Privacy
- Scraped data contains only publicly available content
- Social media posts may include PII (names, handles, profile info)
- Data is stored locally in the .tmp/ directory
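- Apify also keeps each run's results (datasets, logs) in its own storage for your plan's data retention period; delete them in Apify Console if they should not persist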
- Consider data minimization - only scrape what you need
Access Scopes
- The default Apify API token has full account access; if scoped tokens are available in your Apify Console, prefer one restricted to what the scripts need
- Use separate Apify accounts for different projects if needed
- Monitor usage via Apify Console dashboard
Compliance Considerations
- Terms of Service: Respect each platform's ToS (Twitter, Reddit, LinkedIn)
- Rate Limiting: Actors have built-in rate limiting to avoid bans
- Robots.txt: Some actors may bypass robots.txt - use responsibly
- GDPR: Scraped PII may be subject to GDPR if it relates to EU residents
- Ethical Use: Only scrape public data; never bypass authentication
- Proxy Ethics: Residential proxies should be used ethically
Troubleshooting
Common Issues
Issue: Actor run failed
Symptoms: Script terminates with "Actor run failed" or timeout error
Cause: Invalid actor ID, insufficient proxy credits, or actor configuration issue
Solution:
- Verify the actor ID is correct in the script
- Check Apify Console for actor run logs
- Ensure proxy settings match actor requirements
- Try running with default proxy settings first
Issue: Empty results returned
Symptoms: Script completes but returns 0 items
Cause: Content blocked by platform, invalid query, or proxy being detected
Solution:
- Try a different proxy type (residential vs datacenter)
- Simplify the search query
- Reduce the number of results requested
- Check if the platform is blocking scraping attempts
Issue: Rate limited by platform
Symptoms: Script fails with 429 errors or "rate limited" messages
Cause: Too many requests in a short time period
Solution:
- Add delays between requests (actor settings)
- Reduce concurrent requests
- Use proxy rotation
- Wait and retry after a cooldown period
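The "wait and retry" step can be sketched as exponential backoff around whatever function starts the scrape. Everything here is illustrative: `RateLimited` is a hypothetical exception that a wrapper would raise on a 429 response, and `retry_with_backoff` is not part of the bundled scripts or of the Apify client.

```python
import time


class RateLimited(Exception):
    """Hypothetical error raised by a wrapper when the platform returns 429."""


def retry_with_backoff(func, max_attempts=4, base_delay=2.0, sleep=time.sleep):
    """Call func(); on RateLimited, wait base_delay * 2**attempt, then retry."""
    for attempt in range(max_attempts):
        try:
            return func()
        except RateLimited:
            if attempt == max_attempts - 1:
                raise  # all attempts used; surface the error to the caller
            sleep(base_delay * 2 ** attempt)
```

With the defaults, the waits are 2, 4, and 8 seconds before the final attempt. Passing `sleep` as a parameter lets the delays be checked without actually waiting.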
Issue: Invalid API token
Symptoms: Authentication error or "invalid token" message
Cause: Token expired, revoked, or incorrectly set
Solution:
- Regenerate API token in Apify Console
- Verify token is correctly set in the .env file
- Check for leading/trailing whitespace in token
- Ensure the APIFY_TOKEN environment variable is loaded
Issue: Proxy connection errors
Symptoms: Connection timeout or proxy errors
Cause: Proxy pool exhausted or geo-restriction issues
Solution:
- Switch proxy type (basic, residential, or datacenter)
- Verify proxy credit balance in Apify Console
- Try a different proxy country/region
- Disable proxy to test if that's the root cause
Resources
Platform References
- references/twitter.md - Twitter/X scraping details
- references/reddit.md - Reddit scraping with subreddit targeting
- references/linkedin.md - LinkedIn post scraping (author or search mode)
- references/instagram.md - Instagram profile, posts, hashtag, reels, and comments scraping
- references/facebook.md - Facebook page, posts, reviews, groups, and marketplace scraping
- references/multi-platform.md - TikTok and YouTube scraping
- references/url-detect.md - Auto-detect URL type and scrape
Business/Places References
- references/google-maps.md - Google Maps business search, place details, and reviews
- references/contact-enrichment.md - Extract emails, phone numbers, and social profiles from websites
Workflow References
- workflows/lead-generation.md - Multi-step lead generation workflow
- workflows/influencer-discovery.md - Find and analyze influencers across platforms
- workflows/competitor-intel.md - Competitive intelligence gathering workflow
- workflows/trend-analysis.md - Enriched multi-platform trend analysis with scoring
Integration Patterns
Scrape and Enrich
Skills: apify-scrapers → parallel-research
Use case: Scrape social media posts, then enrich with deep research
Flow:
- Scrape Twitter/Reddit for mentions of a topic
- Extract company names or URLs from posts
- Use parallel-research to get detailed info on each company
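The extraction step in this flow might look like the sketch below, which pulls linked domains out of scraped posts so they can be handed to parallel-research. It assumes each post is a dict with its body under a `text` key; actual field names depend on the actor's output, and `extract_domains` is a hypothetical helper.

```python
import re
from urllib.parse import urlparse

URL_PATTERN = re.compile(r"https?://\S+")


def extract_domains(posts, text_key="text"):
    """Collect unique domains linked from post dicts, in first-seen order."""
    seen = []
    for post in posts:
        for url in URL_PATTERN.findall(post.get(text_key, "")):
            # Drop trailing punctuation that belongs to the sentence, not the URL.
            cleaned = url.rstrip(".,;:!?)\"'")
            domain = urlparse(cleaned).netloc.lower()
            if domain.startswith("www."):
                domain = domain[4:]
            if domain and domain not in seen:
                seen.append(domain)
    return seen
```

Deduplicating by domain keeps the follow-up research to one lookup per company even when many posts link to the same site.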
Scrape and Summarize
Skills: apify-scrapers → content-generation
Use case: Create newsletter content from social media trends
Flow:
- Scrape trending AI posts from Twitter
- Pass scraped data to content-generation summarize
- Generate a formatted newsletter section
Scrape and Archive
Skills: apify-scrapers → google-workspace
Use case: Save scraped data to Google Drive for team access
Flow:
- Scrape LinkedIn posts from target accounts
- Format data as CSV or JSON
- Upload to Google Drive client folder via google-workspace
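The formatting step can be done with the standard library alone. The following is a sketch, assuming the scraped posts are a list of dicts; the field names passed in are up to you, and `posts_to_csv` is a hypothetical helper, not part of the bundled scripts.

```python
import csv
import io


def posts_to_csv(posts, fields):
    """Serialize a list of post dicts to CSV text, keeping only `fields`."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fields)
    writer.writeheader()
    for post in posts:
        # Missing keys become empty cells; extra keys are dropped.
        writer.writerow({name: post.get(name, "") for name in fields})
    return buffer.getvalue()
```

Listing the columns explicitly also supports data minimization: fields you do not name never reach the shared file.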
Trend Analysis + Content Strategy
Skills: apify-scrapers (trend-analysis) → content-generation
Use case: Identify trending topics and create content strategy
Flow:
- Run trend analysis: python scripts/analyze_trends.py "AI productivity" --sources all
- Review lifecycle stage and opportunity score
- Use content-generation to create content for high-opportunity trends
- Focus on emerging trends with high velocity scores
Competitive Trend Monitoring
Skills: apify-scrapers (trend-analysis) → parallel-research
Use case: Monitor competitor visibility in trending topics
Flow:
- Analyze industry trends: python scripts/analyze_trends.py --category "your-industry" --discover
- Compare your brand vs competitors in those trends
- Use parallel-research for deep dive on gaps
- Generate competitive intelligence report