detecting-s3-data-exfiltration-attempts

mukul975/Anthropic-Cybersecurity-Skills · updated May 25, 2026

MDX-style export adds YAML metadata + attribution linking explainx.ai and this canonical listing URL.

$npx skills install mukul975/Anthropic-Cybersecurity-Skills/detecting-s3-data-exfiltration-attempts
0 commentsdiscussion
summary

Detecting data exfiltration attempts from AWS S3 buckets by analyzing CloudTrail S3 data events, VPC Flow Logs, GuardDuty findings, Amazon Macie alerts, and S3 access patterns to identify unauthorized bulk downloads and cross-account data transfers.

skill.md
name
detecting-s3-data-exfiltration-attempts
description
'Detecting data exfiltration attempts from AWS S3 buckets by analyzing CloudTrail S3 data events, VPC Flow Logs, GuardDuty findings, Amazon Macie alerts, and S3 access patterns to identify unauthorized bulk downloads and cross-account data transfers. '
domain
cybersecurity
subdomain
cloud-security
tags
- cloud-security - aws - s3 - data-exfiltration - guardduty - macie - threat-detection
version
'1.0'
author
mahipal
license
Apache-2.0
nist_csf
- PR.IR-01 - ID.AM-08 - GV.SC-06 - DE.CM-01

Detecting S3 Data Exfiltration Attempts

When to Use

  • When GuardDuty detects anomalous S3 access patterns such as bulk downloads from unusual IPs
  • When investigating suspected data breach involving S3-stored sensitive data
  • When building detection rules for S3 data loss prevention monitoring
  • When responding to Macie alerts about sensitive data being accessed or moved
  • When compliance requires monitoring and logging of all access to classified data stores

Do not use for preventing data exfiltration (use S3 bucket policies, VPC endpoints, and SCPs), for data classification (use Amazon Macie discovery jobs), or for network-level exfiltration detection (use VPC Flow Logs with network analysis tools).

Prerequisites

  • CloudTrail configured with S3 data event logging (GetObject, PutObject, CopyObject)
  • GuardDuty enabled with S3 Protection feature activated
  • Amazon Macie enabled for sensitive data discovery in target buckets
  • CloudWatch Logs or Athena for querying CloudTrail logs at scale
  • VPC endpoint policies configured for S3 access monitoring

Workflow

Step 1: Enable S3 Data Event Logging in CloudTrail

Configure CloudTrail to capture all S3 object-level operations for forensic analysis.

# Enable S3 data events on an existing trail
aws cloudtrail put-event-selectors \
  --trail-name management-trail \
  --event-selectors '[{
    "ReadWriteType": "All",
    "IncludeManagementEvents": true,
    "DataResources": [{
      "Type": "AWS::S3::Object",
      "Values": ["arn:aws:s3:::sensitive-data-bucket/", "arn:aws:s3:::customer-records/"]
    }]
  }]'

# Verify data event configuration
aws cloudtrail get-event-selectors --trail-name management-trail \
  --query 'EventSelectors[*].DataResources' --output json

# Enable GuardDuty S3 Protection
aws guardduty update-detector \
  --detector-id $(aws guardduty list-detectors --query 'DetectorIds[0]' --output text) \
  --data-sources '{"S3Logs":{"Enable":true}}'

Step 2: Query CloudTrail for Anomalous S3 Access Patterns

Analyze CloudTrail logs for bulk download activity, unusual access times, and unfamiliar source IPs.

# Athena query: Top S3 downloaders by volume in last 24 hours
cat << 'EOF'
SELECT
  useridentity.arn as principal,
  sourceipaddress,
  COUNT(*) as request_count,
  SUM(CAST(json_extract_scalar(requestparameters, '$.bytesTransferredOut') AS bigint)) as bytes_downloaded
FROM cloudtrail_logs
WHERE eventname = 'GetObject'
  AND eventsource = 's3.amazonaws.com'
  AND eventtime > date_add('hour', -24, now())
GROUP BY useridentity.arn, sourceipaddress
ORDER BY request_count DESC
LIMIT 50
EOF

# CloudWatch Logs Insights: S3 GetObject requests from unusual IPs
aws logs start-query \
  --log-group-name cloudtrail-logs \
  --start-time $(date -d "24 hours ago" +%s) \
  --end-time $(date +%s) \
  --query-string '
    fields @timestamp, userIdentity.arn, sourceIPAddress, requestParameters.bucketName, requestParameters.key
    | filter eventName = "GetObject"
    | stats count() as requestCount by sourceIPAddress, userIdentity.arn
    | sort requestCount desc
    | limit 25
  '

# Detect cross-account copies (potential exfiltration)
aws logs start-query \
  --log-group-name cloudtrail-logs \
  --start-time $(date -d "7 days ago" +%s) \
  --end-time $(date +%s) \
  --query-string '
    fields @timestamp, userIdentity.arn, sourceIPAddress, requestParameters.bucketName
    | filter eventName in ["CopyObject", "ReplicateObject", "UploadPart"]
    | filter userIdentity.accountId != "OUR_ACCOUNT_ID"
    | sort @timestamp desc
    | limit 100
  '

Step 3: Review GuardDuty S3 Findings

Check for GuardDuty S3-specific finding types that indicate exfiltration activity.

# List active S3 exfiltration-related findings
aws guardduty list-findings \
  --detector-id $(aws guardduty list-detectors --query 'DetectorIds[0]' --output text) \
  --finding-criteria '{
    "Criterion": {
      "type": {
        "Eq": [
          "Exfiltration:S3/MaliciousIPCaller",
          "Exfiltration:S3/ObjectRead.Unusual",
          "Discovery:S3/MaliciousIPCaller.Custom",
          "Discovery:S3/BucketEnumeration.Unusual",
          "UnauthorizedAccess:S3/MaliciousIPCaller.Custom",
          "UnauthorizedAccess:S3/TorIPCaller",
          "Impact:S3/AnomalousBehavior.Delete"
        ]
      }
    }
  }' --output json

# Get detailed finding information
aws guardduty get-findings \
  --detector-id $(aws guardduty list-detectors --query 'DetectorIds[0]' --output text) \
  --finding-ids FINDING_IDS \
  --query 'Findings[*].{Type:Type,Severity:Severity,Resource:Resource.S3BucketDetails[0].Name,Action:Service.Action}' \
  --output table

Step 4: Analyze Macie Findings for Sensitive Data Access

Review Macie findings to correlate data sensitivity with access anomalies.

# List Macie findings for sensitive data exposure
aws macie2 list-findings \
  --finding-criteria '{
    "criterion": {
      "category": {"eq": ["CLASSIFICATION"]},
      "severity.description": {"eq": ["High", "Critical"]}
    }
  }' \
  --sort-criteria '{"attributeName": "updatedAt", "orderBy": "DESC"}' \
  --max-results 25

# Get detailed finding with data classification
aws macie2 get-findings \
  --finding-ids FINDING_IDS \
  --query 'findings[*].{Type:type,Severity:severity.description,Bucket:resourcesAffected.s3Bucket.name,SensitiveDataTypes:classificationDetails.result.sensitiveData[*].category}' \
  --output table

# Run a sensitive data discovery job on target bucket
aws macie2 create-classification-job \
  --job-type ONE_TIME \
  --name "exfiltration-investigation" \
  --s3-job-definition '{
    "bucketDefinitions": [{
      "accountId": "ACCOUNT_ID",
      "buckets": ["sensitive-data-bucket"]
    }]
  }'

Step 5: Build Automated Detection Rules

Create CloudWatch alarms and EventBridge rules for real-time exfiltration detection.

# CloudWatch metric filter for high-volume S3 downloads
aws logs put-metric-filter \
  --log-group-name cloudtrail-logs \
  --filter-name s3-bulk-download \
  --filter-pattern '{$.eventName = "GetObject" && $.eventSource = "s3.amazonaws.com"}' \
  --metric-transformations '[{
    "metricName": "S3GetObjectCount",
    "metricNamespace": "SecurityMetrics",
    "metricValue": "1",
    "defaultValue": 0
  }]'

# Alarm for anomalous download volume (>1000 objects/hour)
aws cloudwatch put-metric-alarm \
  --alarm-name s3-exfiltration-alert \
  --metric-name S3GetObjectCount \
  --namespace SecurityMetrics \
  --statistic Sum \
  --period 3600 \
  --threshold 1000 \
  --comparison-operator GreaterThanThreshold \
  --evaluation-periods 1 \
  --alarm-actions arn:aws:sns:us-east-1:ACCOUNT:security-alerts

# EventBridge rule for GuardDuty S3 findings
aws events put-rule \
  --name guardduty-s3-exfiltration \
  --event-pattern '{
    "source": ["aws.guardduty"],
    "detail-type": ["GuardDuty Finding"],
    "detail": {
      "type": [{"prefix": "Exfiltration:S3/"}]
    }
  }'

Step 6: Implement Preventive Controls

Deploy bucket policies and VPC endpoint policies to restrict data movement paths.

# VPC endpoint policy restricting S3 access to specific buckets
aws ec2 modify-vpc-endpoint \
  --vpc-endpoint-id vpce-ENDPOINT_ID \
  --policy-document '{
    "Statement": [{
      "Sid": "RestrictToOwnBuckets",
      "Effect": "Allow",
      "Principal": "*",
      "Action": ["s3:GetObject", "s3:PutObject"],
      "Resource": ["arn:aws:s3:::approved-bucket-1/*", "arn:aws:s3:::approved-bucket-2/*"]
    }]
  }'

# Bucket policy denying access from outside the VPC
aws s3api put-bucket-policy --bucket sensitive-data-bucket --policy '{
  "Version": "2012-10-17",
  "Statement": [{
    "Sid": "DenyNonVpcAccess",
    "Effect": "Deny",
    "Principal": "*",
    "Action": "s3:GetObject",
    "Resource": "arn:aws:s3:::sensitive-data-bucket/*",
    "Condition": {
      "StringNotEquals": {
        "aws:sourceVpce": "vpce-ENDPOINT_ID"
      }
    }
  }]
}'

Key Concepts

TermDefinition
S3 Data EventsCloudTrail object-level logging that captures GetObject, PutObject, DeleteObject, and CopyObject API calls with request details
GuardDuty S3 ProtectionThreat detection feature analyzing CloudTrail S3 data events to identify anomalous access patterns and exfiltration attempts
Amazon MacieData security service that discovers and classifies sensitive data in S3 and generates findings for data exposure risks
VPC Endpoint PolicyAccess control policy on an S3 VPC endpoint that restricts which buckets and actions can be accessed through the endpoint
Data ExfiltrationUnauthorized transfer of data from an organization's S3 storage to an external location controlled by an attacker
Anomalous Behavior DetectionMachine learning-based identification of S3 access patterns that deviate from established baselines for a principal

Tools & Systems

  • AWS CloudTrail: Audit logging of S3 object-level operations for forensic analysis and anomaly detection
  • Amazon GuardDuty: ML-based threat detection with S3-specific finding types for exfiltration and unauthorized access
  • Amazon Macie: Sensitive data discovery and classification for correlating access anomalies with data sensitivity
  • Amazon Athena: SQL query engine for analyzing CloudTrail logs at scale to identify bulk download patterns
  • CloudWatch Logs Insights: Real-time log analysis for building detection queries against CloudTrail data

Common Scenarios

Scenario: Compromised IAM Credentials Used for Bulk S3 Data Download

Context: GuardDuty reports an Exfiltration:S3/ObjectRead.Unusual finding indicating that a developer's access key is downloading thousands of objects from a sensitive data bucket at 3 AM from an IP address in a foreign country.

Approach:

  1. Immediately deactivate the compromised access key
  2. Query CloudTrail for all S3 actions by the compromised principal in the last 72 hours
  3. Identify which buckets and objects were accessed using Athena queries
  4. Cross-reference accessed objects with Macie classifications to assess data sensitivity
  5. Check for CopyObject calls to external accounts (cross-account exfiltration)
  6. Review how the credentials were compromised (TruffleHog scan, phishing investigation)
  7. Implement VPC endpoint policies to restrict future S3 access to approved network paths

Pitfalls: CloudTrail S3 data events can generate massive log volume. Use Athena with partitioned tables rather than CloudWatch Logs Insights for queries spanning more than 24 hours. GuardDuty baseline learning requires 7-14 days, so new accounts may generate false positives for normal access patterns.

Output Format

S3 Data Exfiltration Investigation Report
============================================
Account: 123456789012
Detection Source: GuardDuty Exfiltration:S3/ObjectRead.Unusual
Investigation Date: 2026-02-23

INCIDENT TIMELINE:
  2026-02-23 02:47 UTC - First anomalous GetObject from 185.x.x.x
  2026-02-23 02:47-04:12 UTC - 12,847 GetObject requests
  2026-02-23 04:15 UTC - GuardDuty finding generated
  2026-02-23 04:20 UTC - PagerDuty alert received by SOC
  2026-02-23 04:25 UTC - Access key deactivated

COMPROMISED PRINCIPAL:
  ARN: arn:aws:iam::123456789012:user/developer-jane
  Access Key: AKIA...WXYZ
  Source IP: 185.x.x.x (Tor exit node)

DATA IMPACT ASSESSMENT:
  Buckets accessed: 3
  Objects downloaded: 12,847
  Total data volume: 4.7 GB
  Sensitive data types: PII (SSN, email), Financial (credit card)
  Macie severity: CRITICAL

CONTAINMENT ACTIONS:
  [x] Access key deactivated
  [x] User password reset and MFA re-enrolled
  [x] VPC endpoint policy applied to sensitive buckets
  [x] Bucket policy restricting to VPC-only access
  [x] TruffleHog scan initiated on developer repositories
how to use detecting-s3-data-exfiltration-attempts

How to use detecting-s3-data-exfiltration-attempts on Cursor

AI-first code editor with Composer

1

Prerequisites

Before installing skills in Cursor, ensure your development environment meets these requirements:

  • Cursor installed and configured on your development machine
  • Node.js version 16.0+ with npm package manager (verify with node --version)
  • Active project directory or workspace where you want to add detecting-s3-data-exfiltration-attempts
2

Execute installation command

Execute the skills CLI command in your project's root directory to begin installation:

$npx skills install mukul975/Anthropic-Cybersecurity-Skills/detecting-s3-data-exfiltration-attempts

The skills CLI fetches detecting-s3-data-exfiltration-attempts from GitHub repository mukul975/Anthropic-Cybersecurity-Skills and configures it for Cursor.

3

Select Cursor when prompted

The CLI will show a list of available agents. Use arrow keys to navigate and space to select Cursor:

◆ Which agents do you want to install to?
│ ── Universal (.agents/skills) ── always included ────
│ • Amp
│ • Antigravity
│ • Cline
│ • Codex
│ ●Cursor(selected)
│ • Cursor
│ • Windsurf
4

Verify installation

Confirm successful installation by checking the skill directory location:

.cursor/skills/detecting-s3-data-exfiltration-attempts

Reload or restart Cursor to activate detecting-s3-data-exfiltration-attempts. Access the skill through slash commands (e.g., /detecting-s3-data-exfiltration-attempts) or your agent's skill management interface.

Security & Verification Notice

We perform automated surface-level scans (Gen AI Scanner, Socket, Snyk) during installation. These checks detect common vulnerabilities but do not guarantee complete security. Always review skill source code and verify the publisher's reputation before production use.

Skills execute code in your development environment. Always verify the publisher's identity, review recent commits, and test in isolated environments before production deployment.

List & Monetize Your Skill

Submit your Claude Code skill and start earning

GET_STARTED →

Use Cases

Exploratory Data Analysis

Quickly understand datasets, identify patterns, and generate insights

Example

Analyze CSV with 100K rows, identify outliers, visualize correlations, suggest hypotheses

Reduce EDA time from hours to minutes, uncover insights faster

Data Cleaning & Transformation

Write scripts to clean messy data, handle missing values, normalize formats

Example

Generate Python/SQL to fix date formats, impute missing values, remove duplicates

Automate 80% of data preprocessing work

Statistical Analysis

Perform hypothesis testing, regression, and statistical modeling

Example

Run A/B test analysis, calculate confidence intervals, interpret p-values

Get statistically sound analysis without PhD in statistics

Data Visualization

Create charts, dashboards, and visual reports

Example

Generate matplotlib/seaborn code for time series plots, distribution charts, heatmaps

Build presentation-ready visualizations 3x faster

Implementation Guide

Prerequisites

  • Claude Desktop or compatible AI client
  • Python environment (pandas, numpy, matplotlib) or SQL database access
  • Basic understanding of data analysis concepts
  • Sample datasets for testing skill capabilities

Time Estimate

20-40 minutes to set up and run first analysis

Installation Steps

  1. 1.Install data analysis skill using provided command
  2. 2.Prepare a sample dataset (CSV, JSON, or database connection)
  3. 3.Start with descriptive statistics: 'Summarize this dataset'
  4. 4.Progress to visualization: 'Create a scatter plot of X vs Y'
  5. 5.Advanced analysis: 'Run linear regression and interpret results'
  6. 6.Validate outputs: check calculations, verify visualizations make sense
  7. 7.Document analysis workflow for reproducibility

Common Pitfalls

  • Not validating statistical assumptions before applying tests
  • Accepting visualizations without checking data accuracy
  • Overlooking data quality issues (missing values, outliers)
  • Misinterpreting correlation as causation
  • Using wrong statistical test for data distribution
  • Not considering sample size and statistical power

Best Practices

✓ Do

  • +Always validate data quality before analysis
  • +Check statistical assumptions (normality, independence, etc.)
  • +Visualize data before running statistical tests
  • +Document analysis steps for reproducibility
  • +Cross-validate findings with domain experts
  • +Use skill for initial exploration, then dive deeper manually
  • +Save generated code for reuse on similar datasets

✗ Don't

  • Don't trust analysis without verifying data quality
  • Don't apply statistical tests without checking assumptions
  • Don't make business decisions solely on AI-generated analysis
  • Don't ignore outliers without investigating cause
  • Don't skip data validation and sanity checks
  • Don't use for mission-critical financial or medical analysis without expert review

💡 Pro Tips

  • Describe data context: 'This is user behavior data from e-commerce site'
  • Ask for interpretation: 'What does this correlation mean for business?'
  • Request multiple approaches: 'Show 3 ways to handle missing data'
  • Combine AI analysis with domain expertise for best insights
  • Use for rapid prototyping, then refine analysis manually

When to Use This

✓ Use When

Use for exploratory data analysis, data cleaning, statistical testing, visualization prototyping, and learning new analysis techniques. Best for initial exploration and rapid insights.

✗ Avoid When

Avoid for mission-critical financial analysis, medical research requiring regulatory compliance, production ML models, or when deep statistical expertise is required for nuanced interpretation.

Learning Path

  1. 1Basic: descriptive statistics, data cleaning, simple visualizations
  2. 2Intermediate: hypothesis testing, regression, correlation analysis
  3. 3Advanced: time series analysis, clustering, predictive modeling
  4. 4Expert: causal inference, experimental design, advanced statistical methods

Discussion

Product Hunt–style comments (not star reviews)
  • No comments yet — start the thread.
general reviews

Ratings

4.638 reviews
  • Mia Khanna· Dec 28, 2024

    Registry listing for detecting-s3-data-exfiltration-attempts matched our evaluation — installs cleanly and behaves as described in the markdown.

  • Mia Malhotra· Dec 16, 2024

    I recommend detecting-s3-data-exfiltration-attempts for anyone iterating fast on agent tooling; clear intent and a small, reviewable surface area.

  • Chaitanya Patil· Dec 4, 2024

    detecting-s3-data-exfiltration-attempts fits our agent workflows well — practical, well scoped, and easy to wire into existing repos.

  • Piyush G· Nov 23, 2024

    detecting-s3-data-exfiltration-attempts is among the better-maintained entries we tried; worth keeping pinned for repeat workflows.

  • Rahul Santra· Nov 19, 2024

    detecting-s3-data-exfiltration-attempts has been reliable in day-to-day use. Documentation quality is above average for community skills.

  • Chinedu Martinez· Nov 19, 2024

    Useful defaults in detecting-s3-data-exfiltration-attempts — fewer surprises than typical one-off scripts, and it plays nicely with `npx skills` flows.

  • Mia Johnson· Nov 7, 2024

    detecting-s3-data-exfiltration-attempts reduced setup friction for our internal harness; good balance of opinion and flexibility.

  • Zara Dixit· Oct 26, 2024

    Registry listing for detecting-s3-data-exfiltration-attempts matched our evaluation — installs cleanly and behaves as described in the markdown.

  • Shikha Mishra· Oct 14, 2024

    Keeps context tight: detecting-s3-data-exfiltration-attempts is the kind of skill you can hand to a new teammate without a long onboarding doc.

  • Pratham Ware· Oct 10, 2024

    Solid pick for teams standardizing on skills: detecting-s3-data-exfiltration-attempts is focused, and the summary matches what you get after install.

showing 1-10 of 38

1 / 4