detecting-s3-data-exfiltration-attempts
mukul975/Anthropic-Cybersecurity-Skills · updated May 25, 2026
Detecting data exfiltration attempts from AWS S3 buckets by analyzing CloudTrail S3 data events, VPC Flow Logs, GuardDuty findings, Amazon Macie alerts, and S3 access patterns to identify unauthorized bulk downloads and cross-account data transfers.
| Field | Value |
|---|---|
| name | detecting-s3-data-exfiltration-attempts |
| description | Detecting data exfiltration attempts from AWS S3 buckets by analyzing CloudTrail S3 data events, VPC Flow Logs, GuardDuty findings, Amazon Macie alerts, and S3 access patterns to identify unauthorized bulk downloads and cross-account data transfers. |
| domain | cybersecurity |
| subdomain | cloud-security |
| tags | cloud-security, aws, s3, data-exfiltration, guardduty, macie, threat-detection |
| version | 1.0 |
| author | mahipal |
| license | Apache-2.0 |
| nist_csf | PR.IR-01, ID.AM-08, GV.SC-06, DE.CM-01 |
Detecting S3 Data Exfiltration Attempts
When to Use
- When GuardDuty detects anomalous S3 access patterns such as bulk downloads from unusual IPs
- When investigating a suspected data breach involving sensitive data stored in S3
- When building detection rules for S3 data loss prevention monitoring
- When responding to Macie alerts about sensitive data being accessed or moved
- When compliance requires monitoring and logging of all access to classified data stores
Do not use for preventing data exfiltration (use S3 bucket policies, VPC endpoints, and SCPs), for data classification (use Amazon Macie discovery jobs), or for network-level exfiltration detection (use VPC Flow Logs with network analysis tools).
Prerequisites
- CloudTrail configured with S3 data event logging (GetObject, PutObject, CopyObject)
- GuardDuty enabled with S3 Protection feature activated
- Amazon Macie enabled for sensitive data discovery in target buckets
- CloudWatch Logs or Athena for querying CloudTrail logs at scale
- VPC endpoint policies configured for S3 access monitoring
Workflow
Step 1: Enable S3 Data Event Logging in CloudTrail
Configure CloudTrail to capture all S3 object-level operations for forensic analysis.
# Enable S3 data events on an existing trail
aws cloudtrail put-event-selectors \
--trail-name management-trail \
--event-selectors '[{
"ReadWriteType": "All",
"IncludeManagementEvents": true,
"DataResources": [{
"Type": "AWS::S3::Object",
"Values": ["arn:aws:s3:::sensitive-data-bucket/", "arn:aws:s3:::customer-records/"]
}]
}]'
# Verify data event configuration
aws cloudtrail get-event-selectors --trail-name management-trail \
--query 'EventSelectors[*].DataResources' --output json
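# Enable GuardDuty S3 Protection
# (--features supersedes the deprecated --data-sources '{"S3Logs":{"Enable":true}}' form)
aws guardduty update-detector \
--detector-id $(aws guardduty list-detectors --query 'DetectorIds[0]' --output text) \
--features '[{"Name":"S3_DATA_EVENTS","Status":"ENABLED"}]'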
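When more than a couple of buckets need data event logging, the event-selector JSON can be generated rather than hand-edited. The following is a minimal sketch, assuming the same selector shape used in the command above; `build_s3_event_selectors` is a hypothetical helper name, not part of any AWS SDK.

```python
import json


def build_s3_event_selectors(bucket_names, read_write_type="All"):
    """Return a CloudTrail event-selector list covering every object in the given buckets."""
    if read_write_type not in {"All", "ReadOnly", "WriteOnly"}:
        raise ValueError(f"unsupported ReadWriteType: {read_write_type}")
    return [{
        "ReadWriteType": read_write_type,
        "IncludeManagementEvents": True,
        "DataResources": [{
            "Type": "AWS::S3::Object",
            # A trailing slash after the bucket name selects all objects in that bucket.
            "Values": [f"arn:aws:s3:::{name}/" for name in bucket_names],
        }],
    }]


# Bucket names taken from the example above.
selectors = build_s3_event_selectors(["sensitive-data-bucket", "customer-records"])
print(json.dumps(selectors, indent=2))
```

The printed JSON can be passed to `aws cloudtrail put-event-selectors --event-selectors`, for example through a temporary file.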
Step 2: Query CloudTrail for Anomalous S3 Access Patterns
Analyze CloudTrail logs for bulk download activity, unusual access times, and unfamiliar source IPs.
# Athena query: Top S3 downloaders by volume in last 24 hours
# bytesTransferredOut is recorded under additionalEventData, not requestParameters;
# eventtime is assumed to be the ISO 8601 string column of the standard CloudTrail table
cat << 'EOF'
SELECT
useridentity.arn AS principal,
sourceipaddress,
COUNT(*) AS request_count,
SUM(TRY_CAST(json_extract_scalar(additionaleventdata, '$.bytesTransferredOut') AS double)) AS bytes_downloaded
FROM cloudtrail_logs
WHERE eventname = 'GetObject'
AND eventsource = 's3.amazonaws.com'
AND from_iso8601_timestamp(eventtime) > date_add('hour', -24, now())
GROUP BY useridentity.arn, sourceipaddress
ORDER BY bytes_downloaded DESC
LIMIT 50
EOF
# CloudWatch Logs Insights: S3 GetObject request counts by source IP and principal
# (date -d is GNU date syntax; on macOS/BSD use date -v-24H +%s)
aws logs start-query \
--log-group-name cloudtrail-logs \
--start-time $(date -d "24 hours ago" +%s) \
--end-time $(date +%s) \
--query-string '
fields @timestamp, userIdentity.arn, sourceIPAddress, requestParameters.bucketName, requestParameters.key
| filter eventSource = "s3.amazonaws.com" and eventName = "GetObject"
| stats count() as requestCount by sourceIPAddress, userIdentity.arn
| sort requestCount desc
| limit 25
'
# start-query returns a queryId; fetch results with: aws logs get-query-results --query-id QUERY_ID
# Detect copy and upload activity by principals outside our account (potential cross-account exfiltration)
# Replace OUR_ACCOUNT_ID with the 12-digit ID of the account that owns the buckets
aws logs start-query \
--log-group-name cloudtrail-logs \
--start-time $(date -d "7 days ago" +%s) \
--end-time $(date +%s) \
--query-string '
fields @timestamp, userIdentity.arn, sourceIPAddress, requestParameters.bucketName
| filter eventName in ["CopyObject", "ReplicateObject", "UploadPart"]
| filter userIdentity.accountId != "OUR_ACCOUNT_ID"
| sort @timestamp desc
| limit 100
'
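The same "top downloaders" aggregation can be run offline against exported CloudTrail records, which is handy for small investigations or for testing detection logic. This is a sketch under stated assumptions: records are already parsed into Python dicts with CloudTrail's field names, and the byte count lives in `additionalEventData.bytesTransferredOut`. The function name `top_downloaders` and the sample records are illustrative only.

```python
from collections import defaultdict


def top_downloaders(records, limit=50):
    """Rank (principal, source IP) pairs by bytes downloaded via S3 GetObject."""
    totals = defaultdict(lambda: {"request_count": 0, "bytes_downloaded": 0})
    for rec in records:
        if rec.get("eventSource") != "s3.amazonaws.com" or rec.get("eventName") != "GetObject":
            continue
        principal = (rec.get("userIdentity") or {}).get("arn", "unknown")
        source_ip = rec.get("sourceIPAddress", "unknown")
        extra = rec.get("additionalEventData") or {}
        stats = totals[(principal, source_ip)]
        stats["request_count"] += 1
        # The value may arrive as an int, a float, or a numeric string.
        stats["bytes_downloaded"] += int(float(extra.get("bytesTransferredOut") or 0))
    ranked = sorted(totals.items(), key=lambda kv: kv[1]["bytes_downloaded"], reverse=True)
    return [
        {"principal": principal, "source_ip": ip, **stats}
        for (principal, ip), stats in ranked[:limit]
    ]


# Hypothetical sample records for illustration.
JANE = "arn:aws:iam::123456789012:user/developer-jane"
BOT = "arn:aws:iam::123456789012:user/ci-bot"
sample_records = [
    {"eventSource": "s3.amazonaws.com", "eventName": "GetObject",
     "userIdentity": {"arn": JANE}, "sourceIPAddress": "198.51.100.7",
     "additionalEventData": {"bytesTransferredOut": 2048}},
    {"eventSource": "s3.amazonaws.com", "eventName": "GetObject",
     "userIdentity": {"arn": JANE}, "sourceIPAddress": "198.51.100.7",
     "additionalEventData": {"bytesTransferredOut": 1024.0}},
    {"eventSource": "s3.amazonaws.com", "eventName": "GetObject",
     "userIdentity": {"arn": BOT}, "sourceIPAddress": "10.0.0.5",
     "additionalEventData": {"bytesTransferredOut": "512"}},
    {"eventSource": "s3.amazonaws.com", "eventName": "PutObject",
     "userIdentity": {"arn": BOT}, "sourceIPAddress": "10.0.0.5"},
]

ranking = top_downloaders(sample_records)
for row in ranking:
    print(row)
```

Ranking by bytes rather than request count surfaces a small number of very large downloads that a count-based ranking would miss.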
Step 3: Review GuardDuty S3 Findings
Check for GuardDuty S3-specific finding types that indicate exfiltration activity.
# List active S3 exfiltration-related findings
# The *.Unusual types are legacy names; newer findings are reported under the AnomalousBehavior types
aws guardduty list-findings \
--detector-id $(aws guardduty list-detectors --query 'DetectorIds[0]' --output text) \
--finding-criteria '{
"Criterion": {
"type": {
"Equals": [
"Exfiltration:S3/MaliciousIPCaller",
"Exfiltration:S3/AnomalousBehavior",
"Exfiltration:S3/ObjectRead.Unusual",
"Discovery:S3/MaliciousIPCaller.Custom",
"Discovery:S3/AnomalousBehavior",
"Discovery:S3/BucketEnumeration.Unusual",
"UnauthorizedAccess:S3/MaliciousIPCaller.Custom",
"UnauthorizedAccess:S3/TorIPCaller",
"Impact:S3/AnomalousBehavior.Delete"
]
}
}
}' --output json
# Get detailed finding information
aws guardduty get-findings \
--detector-id $(aws guardduty list-detectors --query 'DetectorIds[0]' --output text) \
--finding-ids FINDING_IDS \
--query 'Findings[*].{Type:Type,Severity:Severity,Resource:Resource.S3BucketDetails[0].Name,Action:Service.Action}' \
--output table
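Once findings are exported as JSON, a small triage pass can narrow them to S3-related types and order them by severity. The sketch below assumes each finding is a dict shaped like the `get-findings` output (`Id`, `Type`, `Severity`, `Resource.S3BucketDetails`); `triage_s3_findings`, the severity cutoff, and the sample findings are illustrative assumptions.

```python
S3_TYPE_PREFIXES = (
    "Exfiltration:S3/",
    "Discovery:S3/",
    "UnauthorizedAccess:S3/",
    "Impact:S3/",
)


def triage_s3_findings(findings, min_severity=4.0):
    """Keep S3 finding types at or above min_severity, highest severity first."""
    selected = []
    for finding in findings:
        finding_type = finding.get("Type", "")
        severity = finding.get("Severity", 0)
        if not finding_type.startswith(S3_TYPE_PREFIXES) or severity < min_severity:
            continue
        bucket_details = (finding.get("Resource") or {}).get("S3BucketDetails") or []
        selected.append({
            "id": finding.get("Id"),
            "type": finding_type,
            "severity": severity,
            "buckets": [b.get("Name") for b in bucket_details],
        })
    return sorted(selected, key=lambda item: item["severity"], reverse=True)


# Hypothetical findings for illustration.
sample_findings = [
    {"Id": "f-1", "Type": "Discovery:S3/MaliciousIPCaller.Custom", "Severity": 2.0,
     "Resource": {"S3BucketDetails": [{"Name": "customer-records"}]}},
    {"Id": "f-2", "Type": "Exfiltration:S3/MaliciousIPCaller", "Severity": 8.0,
     "Resource": {"S3BucketDetails": [{"Name": "sensitive-data-bucket"}]}},
    {"Id": "f-3", "Type": "Recon:EC2/PortProbeUnprotectedPort", "Severity": 5.0,
     "Resource": {}},
    {"Id": "f-4", "Type": "UnauthorizedAccess:S3/TorIPCaller", "Severity": 5.0,
     "Resource": {"S3BucketDetails": [{"Name": "sensitive-data-bucket"}]}},
]

triaged = triage_s3_findings(sample_findings)
for item in triaged:
    print(item["severity"], item["type"], item["buckets"])
```

A cutoff of 4.0 corresponds roughly to GuardDuty's Medium band and above; lower it when investigating reconnaissance that precedes exfiltration.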
Step 4: Analyze Macie Findings for Sensitive Data Access
Review Macie findings to correlate data sensitivity with access anomalies.
# List Macie findings for sensitive data exposure
# Macie severity descriptions are Low, Medium, and High; there is no Critical level
aws macie2 list-findings \
--finding-criteria '{
"criterion": {
"category": {"eq": ["CLASSIFICATION"]},
"severity.description": {"eq": ["High"]}
}
}' \
--sort-criteria '{"attributeName": "updatedAt", "orderBy": "DESC"}' \
--max-results 25
# Get detailed finding with data classification
aws macie2 get-findings \
--finding-ids FINDING_IDS \
--query 'findings[*].{Type:type,Severity:severity.description,Bucket:resourcesAffected.s3Bucket.name,SensitiveDataTypes:classificationDetails.result.sensitiveData[*].category}' \
--output table
# Run a sensitive data discovery job on target bucket
aws macie2 create-classification-job \
--job-type ONE_TIME \
--name "exfiltration-investigation" \
--s3-job-definition '{
"bucketDefinitions": [{
"accountId": "ACCOUNT_ID",
"buckets": ["sensitive-data-bucket"]
}]
}'
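Correlating data sensitivity with access anomalies amounts to joining the objects read in CloudTrail against the objects Macie has flagged. Here is a minimal sketch, assuming CloudTrail records carry `requestParameters.bucketName` and `requestParameters.key`, and Macie findings carry `resourcesAffected.s3Bucket.name` and `resourcesAffected.s3Object.key`. The function name and sample data are hypothetical.

```python
def sensitive_objects_accessed(cloudtrail_records, macie_findings):
    """Return Macie-classified objects that also appear in GetObject events."""
    accessed = set()
    for rec in cloudtrail_records:
        if rec.get("eventName") != "GetObject":
            continue
        params = rec.get("requestParameters") or {}
        if params.get("bucketName") and params.get("key"):
            accessed.add((params["bucketName"], params["key"]))

    hits = []
    for finding in macie_findings:
        affected = finding.get("resourcesAffected") or {}
        bucket = (affected.get("s3Bucket") or {}).get("name")
        key = (affected.get("s3Object") or {}).get("key")
        if (bucket, key) not in accessed:
            continue
        result = (finding.get("classificationDetails") or {}).get("result") or {}
        hits.append({
            "bucket": bucket,
            "key": key,
            "severity": (finding.get("severity") or {}).get("description"),
            "categories": [s.get("category") for s in result.get("sensitiveData", [])],
        })
    return hits


# Hypothetical inputs for illustration.
trail = [
    {"eventName": "GetObject",
     "requestParameters": {"bucketName": "sensitive-data-bucket", "key": "exports/customers.csv"}},
    {"eventName": "GetObject",
     "requestParameters": {"bucketName": "sensitive-data-bucket", "key": "public/readme.txt"}},
    {"eventName": "PutObject",
     "requestParameters": {"bucketName": "sensitive-data-bucket", "key": "exports/payroll.csv"}},
]
macie = [
    {"severity": {"description": "High"},
     "resourcesAffected": {"s3Bucket": {"name": "sensitive-data-bucket"},
                           "s3Object": {"key": "exports/customers.csv"}},
     "classificationDetails": {"result": {"sensitiveData": [
         {"category": "PERSONAL_INFORMATION"}, {"category": "FINANCIAL_INFORMATION"}]}}},
    {"severity": {"description": "High"},
     "resourcesAffected": {"s3Bucket": {"name": "sensitive-data-bucket"},
                           "s3Object": {"key": "exports/payroll.csv"}},
     "classificationDetails": {"result": {"sensitiveData": [
         {"category": "FINANCIAL_INFORMATION"}]}}},
]

exposed = sensitive_objects_accessed(trail, macie)
for hit in exposed:
    print(hit)
```

Only `exports/customers.csv` is reported here: `exports/payroll.csv` is classified as sensitive but was written, not read, in the sample trail.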
Step 5: Build Automated Detection Rules
Create CloudWatch alarms and EventBridge rules for real-time exfiltration detection.
# CloudWatch metric filter for high-volume S3 downloads
aws logs put-metric-filter \
--log-group-name cloudtrail-logs \
--filter-name s3-bulk-download \
--filter-pattern '{$.eventName = "GetObject" && $.eventSource = "s3.amazonaws.com"}' \
--metric-transformations '[{
"metricName": "S3GetObjectCount",
"metricNamespace": "SecurityMetrics",
"metricValue": "1",
"defaultValue": 0
}]'
# Alarm for anomalous download volume (>1000 objects/hour)
aws cloudwatch put-metric-alarm \
--alarm-name s3-exfiltration-alert \
--metric-name S3GetObjectCount \
--namespace SecurityMetrics \
--statistic Sum \
--period 3600 \
--threshold 1000 \
--comparison-operator GreaterThanThreshold \
--evaluation-periods 1 \
--alarm-actions arn:aws:sns:us-east-1:ACCOUNT:security-alerts
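# EventBridge rule for GuardDuty S3 findings
aws events put-rule \
--name guardduty-s3-exfiltration \
--event-pattern '{
"source": ["aws.guardduty"],
"detail-type": ["GuardDuty Finding"],
"detail": {
"type": [{"prefix": "Exfiltration:S3/"}]
}
}'
# A rule does nothing until it has a target; route matches to the same SNS topic used by the alarm
aws events put-targets \
--rule guardduty-s3-exfiltration \
--targets "Id"="1","Arn"="arn:aws:sns:us-east-1:ACCOUNT:security-alerts"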
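Before settling on the 1000 objects/hour threshold, it can help to replay historical CloudTrail records and see which hours would have fired. The sketch below approximates the alarm logic by bucketing GetObject events into clock hours; CloudWatch's actual evaluation windows may not align exactly with clock hours, and `hours_over_threshold` and the sample events are illustrative assumptions.

```python
from collections import Counter
from datetime import datetime, timezone


def hours_over_threshold(records, threshold=1000):
    """Return (hour_start, count) pairs where S3 GetObject volume exceeded threshold."""
    counts = Counter()
    for rec in records:
        if rec.get("eventName") != "GetObject" or rec.get("eventSource") != "s3.amazonaws.com":
            continue
        # CloudTrail eventTime is assumed to be UTC in the form 2026-02-23T02:47:00Z.
        ts = datetime.strptime(rec["eventTime"], "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)
        counts[ts.replace(minute=0, second=0)] += 1
    return sorted((hour, n) for hour, n in counts.items() if n > threshold)


def _event(time_str, name="GetObject"):
    # Helper for building hypothetical sample events.
    return {"eventSource": "s3.amazonaws.com", "eventName": name, "eventTime": time_str}


sample_events = [
    _event("2026-02-23T02:47:00Z"),
    _event("2026-02-23T02:48:10Z"),
    _event("2026-02-23T02:59:59Z"),
    _event("2026-02-23T03:00:00Z"),
    _event("2026-02-23T02:50:00Z", name="PutObject"),
]

# A tiny threshold keeps the example small; production values would be far higher.
breaches = hours_over_threshold(sample_events, threshold=2)
for hour, count in breaches:
    print(hour.isoformat(), count)
```

Running this over a few weeks of normal traffic gives a baseline for choosing a threshold that does not fire on routine batch jobs.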
Step 6: Implement Preventive Controls
Deploy bucket policies and VPC endpoint policies to restrict data movement paths.
# VPC endpoint policy restricting S3 access to specific buckets
aws ec2 modify-vpc-endpoint \
--vpc-endpoint-id vpce-ENDPOINT_ID \
--policy-document '{
"Version": "2012-10-17",
"Statement": [{
"Sid": "RestrictToOwnBuckets",
"Effect": "Allow",
"Principal": "*",
"Action": ["s3:GetObject", "s3:PutObject"],
"Resource": ["arn:aws:s3:::approved-bucket-1/*", "arn:aws:s3:::approved-bucket-2/*"]
}]
}'
# Bucket policy denying access from outside the VPC
aws s3api put-bucket-policy --bucket sensitive-data-bucket --policy '{
"Version": "2012-10-17",
"Statement": [{
"Sid": "DenyNonVpcAccess",
"Effect": "Deny",
"Principal": "*",
"Action": "s3:GetObject",
"Resource": "arn:aws:s3:::sensitive-data-bucket/*",
"Condition": {
"StringNotEquals": {
"aws:sourceVpce": "vpce-ENDPOINT_ID"
}
}
}]
}'
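The Deny statement above is easy to misread, so the following sketch models how its `StringNotEquals` condition behaves. It is a deliberately simplified model of one statement, not an IAM policy evaluator; `deny_applies` is a hypothetical name. The key point is that a negated operator also matches when the condition key is absent, so requests that do not pass through any VPC endpoint (for example from the console or the public internet) are denied as well.

```python
def deny_applies(action, source_vpce, allowed_vpce="vpce-ENDPOINT_ID"):
    """Model the DenyNonVpcAccess statement: True when the request is denied.

    source_vpce is the endpoint ID the request arrived through,
    or None when the request did not use a VPC endpoint at all.
    """
    if action != "s3:GetObject":
        # The statement only covers GetObject; other actions are unaffected by it.
        return False
    # StringNotEquals is true when the value differs OR the key is missing.
    return source_vpce != allowed_vpce


print(deny_applies("s3:GetObject", "vpce-ENDPOINT_ID"))
print(deny_applies("s3:GetObject", "vpce-other"))
print(deny_applies("s3:GetObject", None))
print(deny_applies("s3:PutObject", None))
```

Because the statement denies every GetObject outside the endpoint, confirm that administrators and dependent services have a path through the approved endpoint before applying it.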
Key Concepts
| Term | Definition |
|---|---|
| S3 Data Events | CloudTrail object-level logging that captures GetObject, PutObject, DeleteObject, and CopyObject API calls with request details |
| GuardDuty S3 Protection | Threat detection feature analyzing CloudTrail S3 data events to identify anomalous access patterns and exfiltration attempts |
| Amazon Macie | Data security service that discovers and classifies sensitive data in S3 and generates findings for data exposure risks |
| VPC Endpoint Policy | Access control policy on an S3 VPC endpoint that restricts which buckets and actions can be accessed through the endpoint |
| Data Exfiltration | Unauthorized transfer of data from an organization's S3 storage to an external location controlled by an attacker |
| Anomalous Behavior Detection | Machine learning-based identification of S3 access patterns that deviate from established baselines for a principal |
Tools & Systems
- AWS CloudTrail: Audit logging of S3 object-level operations for forensic analysis and anomaly detection
- Amazon GuardDuty: ML-based threat detection with S3-specific finding types for exfiltration and unauthorized access
- Amazon Macie: Sensitive data discovery and classification for correlating access anomalies with data sensitivity
- Amazon Athena: SQL query engine for analyzing CloudTrail logs at scale to identify bulk download patterns
- CloudWatch Logs Insights: Real-time log analysis for building detection queries against CloudTrail data
Common Scenarios
Scenario: Compromised IAM Credentials Used for Bulk S3 Data Download
Context: GuardDuty reports an Exfiltration:S3/ObjectRead.Unusual finding indicating that a developer's access key is downloading thousands of objects from a sensitive data bucket at 3 AM from an IP address in a foreign country.
Approach:
- Immediately deactivate the compromised access key
- Query CloudTrail for all S3 actions by the compromised principal in the last 72 hours
- Identify which buckets and objects were accessed using Athena queries
- Cross-reference accessed objects with Macie classifications to assess data sensitivity
- Check for CopyObject calls to external accounts (cross-account exfiltration)
- Review how the credentials were compromised (TruffleHog scan, phishing investigation)
- Implement VPC endpoint policies to restrict future S3 access to approved network paths
Pitfalls: CloudTrail S3 data events can generate massive log volume. Use Athena with partitioned tables rather than CloudWatch Logs Insights for queries spanning more than 24 hours. GuardDuty baseline learning requires 7-14 days, so new accounts may generate false positives for normal access patterns.
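The cross-account copy check in the approach above can be sketched as a filter over CopyObject events whose destination bucket is not on an approved list. This assumes `requestParameters.bucketName` holds the destination and `requestParameters["x-amz-copy-source"]` holds the source as `bucket/key`; both field locations should be verified against real events from the account under investigation. The function name, bucket names, and the second account ID are hypothetical.

```python
def copies_to_unapproved_buckets(records, approved_buckets):
    """Flag copy events whose destination bucket is not in approved_buckets."""
    approved = set(approved_buckets)
    suspicious = []
    for rec in records:
        if rec.get("eventName") not in {"CopyObject", "UploadPartCopy"}:
            continue
        params = rec.get("requestParameters") or {}
        destination = params.get("bucketName")
        if destination in approved:
            continue
        source = (params.get("x-amz-copy-source") or "").lstrip("/")
        suspicious.append({
            "time": rec.get("eventTime"),
            "principal": (rec.get("userIdentity") or {}).get("arn"),
            "source_bucket": source.split("/", 1)[0] if source else None,
            "destination_bucket": destination,
        })
    return suspicious


# Hypothetical copy events for illustration.
copy_events = [
    {"eventName": "CopyObject", "eventTime": "2026-02-23T03:05:00Z",
     "userIdentity": {"arn": "arn:aws:iam::123456789012:user/developer-jane"},
     "requestParameters": {"bucketName": "customer-records",
                           "x-amz-copy-source": "sensitive-data-bucket/exports/customers.csv"}},
    {"eventName": "CopyObject", "eventTime": "2026-02-23T03:06:00Z",
     "userIdentity": {"arn": "arn:aws:iam::123456789012:user/developer-jane"},
     "requestParameters": {"bucketName": "unknown-external-bucket",
                           "x-amz-copy-source": "/sensitive-data-bucket/exports/customers.csv"}},
]

flagged = copies_to_unapproved_buckets(
    copy_events, approved_buckets=["sensitive-data-bucket", "customer-records"]
)
for item in flagged:
    print(item)
```

An allow-list of known buckets is used here because a bucket name alone does not reveal the owning account; anything off the list deserves a manual ownership check.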
Output Format
S3 Data Exfiltration Investigation Report
============================================
Account: 123456789012
Detection Source: GuardDuty Exfiltration:S3/ObjectRead.Unusual
Investigation Date: 2026-02-23
INCIDENT TIMELINE:
2026-02-23 02:47 UTC - First anomalous GetObject from 185.x.x.x
2026-02-23 02:47-04:12 UTC - 12,847 GetObject requests
2026-02-23 04:15 UTC - GuardDuty finding generated
2026-02-23 04:20 UTC - PagerDuty alert received by SOC
2026-02-23 04:25 UTC - Access key deactivated
COMPROMISED PRINCIPAL:
ARN: arn:aws:iam::123456789012:user/developer-jane
Access Key: AKIA...WXYZ
Source IP: 185.x.x.x (Tor exit node)
DATA IMPACT ASSESSMENT:
Buckets accessed: 3
Objects downloaded: 12,847
Total data volume: 4.7 GB
Sensitive data types: PII (SSN, email), Financial (credit card)
Macie severity: HIGH
CONTAINMENT ACTIONS:
[x] Access key deactivated
[x] User password reset and MFA re-enrolled
[x] VPC endpoint policy applied to sensitive buckets
[x] Bucket policy restricting to VPC-only access
[x] TruffleHog scan initiated on developer repositories