implementing-aws-macie-for-data-classification
Implement Amazon Macie to automatically discover, classify, and protect sensitive data in S3 buckets using machine learning and pattern matching for PII, financial data, and credentials detection.
Installation Guide
How to use implementing-aws-macie-for-data-classification on Cursor
Prerequisites
Before installing skills in Cursor, ensure your development environment meets these requirements:
- Cursor installed and configured on your machine
- Node.js 16+ with npm (verify with `node --version`)
- An active project directory where you want to add implementing-aws-macie-for-data-classification
Run the install command
Execute the skills CLI command in your project's root directory to begin installation:
Fetches implementing-aws-macie-for-data-classification from mukul975/Anthropic-Cybersecurity-Skills and configures it for Cursor.
Select Cursor when prompted
The CLI shows a list of agents. Use arrow keys and space to select Cursor:
Verify installation
Confirm successful installation by checking the skill directory location:
Restart Cursor to activate implementing-aws-macie-for-data-classification. Access via /implementing-aws-macie-for-data-classification in your agent's command palette.
Security Notice
We perform automated surface-level scans (Gen AI Scanner, Socket, Snyk) during installation. These checks detect common vulnerabilities but do not guarantee complete security. Skills execute code in your environment, so always review the skill source code, verify the publisher's reputation, and test in isolation before production use.
Documentation
| Field | Value |
|---|---|
| name | implementing-aws-macie-for-data-classification |
| description | Implement Amazon Macie to automatically discover, classify, and protect sensitive data in S3 buckets using machine learning and pattern matching for PII, financial data, and credentials detection. |
| domain | cybersecurity |
| subdomain | cloud-security |
| tags | aws, macie, data-classification, s3, pii, sensitive-data, dlp, compliance |
| version | 1.0 |
| author | mahipal |
| license | Apache-2.0 |
| atlas_techniques | AML.T0043, AML.T0018 |
| nist_ai_rmf | GOVERN-1.1, GOVERN-4.2, MAP-2.3, MEASURE-2.7, MEASURE-2.5 |
| nist_csf | PR.IR-01, ID.AM-08, GV.SC-06, DE.CM-01 |
Implementing AWS Macie for Data Classification
Overview
Amazon Macie is a fully managed data security and privacy service that uses machine learning and pattern matching to discover and protect sensitive data in Amazon S3. Macie automatically evaluates your S3 bucket inventory on a daily basis and identifies objects containing PII, financial information, credentials, and other sensitive data types. It provides two discovery approaches: automated sensitive data discovery for broad visibility and targeted discovery jobs for deep analysis.
When to Use
- When deploying or configuring Amazon Macie for data classification in your environment
- When establishing security controls aligned to compliance requirements
- When building or improving security architecture for this domain
- When conducting security assessments that require this implementation
Prerequisites
- AWS account with S3 buckets containing data to classify
- IAM permissions for Macie service configuration
- AWS Organizations setup (for multi-account deployment)
- S3 buckets in AWS Regions where Macie is available
Enable Macie
Via AWS CLI
```bash
# Enable Macie in the current account/region
aws macie2 enable-macie

# Verify Macie is enabled
aws macie2 get-macie-session

# Enable automated sensitive data discovery
aws macie2 update-automated-discovery-configuration \
  --status ENABLED
```
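The verification step can be scripted by checking the `status` field in the JSON that `aws macie2 get-macie-session` prints. The following is a minimal sketch. It assumes the session response reports `status` as `ENABLED` or `PAUSED`. The helper name `macie_session_enabled` and the sample JSON are hypothetical and are not captured output.

```python
import json


def macie_session_enabled(session_json: str) -> bool:
    """Return True if get-macie-session output reports status ENABLED (assumed field)."""
    session = json.loads(session_json)
    # A paused Macie session reports PAUSED instead of ENABLED.
    return session.get("status") == "ENABLED"


# Hand-written sample, not real CLI output.
sample = '{"status": "ENABLED", "findingPublishingFrequency": "FIFTEEN_MINUTES"}'
print(macie_session_enabled(sample))  # True
```

A pre-flight script could call this helper before creating jobs and stop early when Macie is paused.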
Via Terraform
```hcl
resource "aws_macie2_account" "main" {}

resource "aws_macie2_classification_export_configuration" "main" {
  depends_on = [aws_macie2_account.main]

  s3_destination {
    bucket_name = aws_s3_bucket.macie_results.id
    key_prefix  = "macie-findings/"
    kms_key_arn = aws_kms_key.macie.arn
  }
}
```
Configure Discovery Jobs
Create a classification job for specific buckets
```bash
aws macie2 create-classification-job \
  --job-type ONE_TIME \
  --name "pii-scan-production-buckets" \
  --s3-job-definition '{
    "bucketDefinitions": [{
      "accountId": "123456789012",
      "buckets": [
        "production-data-bucket",
        "customer-records-bucket"
      ]
    }]
  }' \
  --managed-data-identifier-selector ALL
```
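Hand-quoting nested JSON inside `--s3-job-definition` is error-prone. The sketch below generates the same structure with the standard library. It assumes the `bucketDefinitions` and `scoping` shapes used in the job examples in this section. `build_s3_job_definition` is a hypothetical helper name.

```python
import json


def build_s3_job_definition(account_id, buckets, key_prefixes=None):
    """Build the --s3-job-definition JSON for `aws macie2 create-classification-job`."""
    if not (account_id.isdigit() and len(account_id) == 12):
        raise ValueError("AWS account IDs are 12 digits")
    definition = {"bucketDefinitions": [{"accountId": account_id, "buckets": list(buckets)}]}
    if key_prefixes:
        # Only analyze objects whose keys start with one of the given prefixes.
        definition["scoping"] = {
            "includes": {
                "and": [{
                    "simpleScopeTerm": {
                        "comparator": "STARTS_WITH",
                        "key": "OBJECT_KEY",
                        "values": list(key_prefixes),
                    }
                }]
            }
        }
    return json.dumps(definition)


print(build_s3_job_definition("123456789012", ["production-data-bucket", "customer-records-bucket"]))
```

The printed string can then be passed on the command line, for example `--s3-job-definition "$(python3 build_job_definition.py)"`. The script name here is hypothetical.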
Create a scheduled recurring job
```bash
aws macie2 create-classification-job \
  --job-type SCHEDULED \
  --name "weekly-sensitive-data-scan" \
  --schedule-frequency '{
    "weeklySchedule": {
      "dayOfWeek": "MONDAY"
    }
  }' \
  --s3-job-definition '{
    "bucketDefinitions": [{
      "accountId": "123456789012",
      "buckets": ["all-data-bucket"]
    }],
    "scoping": {
      "includes": {
        "and": [{
          "simpleScopeTerm": {
            "comparator": "STARTS_WITH",
            "key": "OBJECT_KEY",
            "values": ["uploads/", "documents/"]
          }
        }]
      }
    }
  }'
```
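The schedule object takes one of three shapes. The sketch below validates input and emits the `--schedule-frequency` JSON. It assumes the `dailySchedule`, `weeklySchedule` (`dayOfWeek`), and `monthlySchedule` (`dayOfMonth`) fields of Macie's job schedule structure. The function name is hypothetical.

```python
import json

DAYS = {"MONDAY", "TUESDAY", "WEDNESDAY", "THURSDAY", "FRIDAY", "SATURDAY", "SUNDAY"}


def build_schedule_frequency(kind, day=None):
    """Return JSON for the --schedule-frequency option of create-classification-job."""
    if kind == "daily":
        schedule = {"dailySchedule": {}}
    elif kind == "weekly":
        if day not in DAYS:
            raise ValueError(f"dayOfWeek must be one of {sorted(DAYS)}")
        schedule = {"weeklySchedule": {"dayOfWeek": day}}
    elif kind == "monthly":
        if not isinstance(day, int) or not 1 <= day <= 31:
            raise ValueError("dayOfMonth must be an integer from 1 to 31")
        schedule = {"monthlySchedule": {"dayOfMonth": day}}
    else:
        raise ValueError("kind must be 'daily', 'weekly', or 'monthly'")
    return json.dumps(schedule)


print(build_schedule_frequency("weekly", "MONDAY"))  # {"weeklySchedule": {"dayOfWeek": "MONDAY"}}
```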
Custom Data Identifiers
Create a custom identifier for internal IDs
```bash
aws macie2 create-custom-data-identifier \
  --name "internal-employee-id" \
  --description "Matches internal employee ID format EMP-XXXXXX" \
  --regex "EMP-[0-9]{6}" \
  --severity-levels '[
    {"occurrencesThreshold": 1, "severity": "LOW"},
    {"occurrencesThreshold": 10, "severity": "MEDIUM"},
    {"occurrencesThreshold": 50, "severity": "HIGH"}
  ]'
```
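You can exercise the regex and severity thresholds locally before creating the identifier. The sketch below counts matches and returns the highest severity whose `occurrencesThreshold` is met by that count. This is how the thresholds above grade findings. It is only an approximation, because Python's `re` module is not Macie's regex engine. `local_severity` is a hypothetical helper.

```python
import re

# Thresholds and pattern mirror the create-custom-data-identifier example above.
SEVERITY_LEVELS = [(1, "LOW"), (10, "MEDIUM"), (50, "HIGH")]
EMPLOYEE_ID = re.compile(r"EMP-[0-9]{6}")


def local_severity(text, pattern=EMPLOYEE_ID, levels=SEVERITY_LEVELS):
    """Count matches and return (count, highest severity whose threshold is met)."""
    count = len(pattern.findall(text))
    severity = None
    for threshold, level in sorted(levels):
        if count >= threshold:
            severity = level
    return count, severity


sample = " ".join(f"EMP-{n:06d}" for n in range(12))
print(local_severity(sample))  # (12, 'MEDIUM')
```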
Create identifier for project codes
```bash
aws macie2 create-custom-data-identifier \
  --name "project-code-identifier" \
  --description "Matches project codes in format PRJ-XXXX-XX" \
  --regex "PRJ-[A-Z]{4}-[0-9]{2}" \
  --keywords '["project", "code", "initiative"]' \
  --maximum-match-distance 50
```
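The `--keywords` and `--maximum-match-distance` options restrict matches to text that appears near a keyword. The sketch below approximates that rule. It assumes a keyword must end before the match starts and within `max_distance` characters of the end of the matched text. It also treats keywords as case-insensitive whole words, and the whole-word handling is an assumption rather than documented Macie behavior. `proximity_matches` is a hypothetical name.

```python
import re

# Values mirror the create-custom-data-identifier example above.
PROJECT_CODE = re.compile(r"PRJ-[A-Z]{4}-[0-9]{2}")
KEYWORDS = ["project", "code", "initiative"]
MAX_DISTANCE = 50


def proximity_matches(text, pattern=PROJECT_CODE, keywords=KEYWORDS,
                      max_distance=MAX_DISTANCE):
    """Return pattern matches preceded by a keyword within max_distance characters.

    Distance is measured from the end of the keyword to the end of the match.
    """
    # End offsets of whole-word, case-insensitive keyword occurrences.
    keyword_ends = [
        m.end()
        for kw in keywords
        for m in re.finditer(r"\b" + re.escape(kw) + r"\b", text, re.IGNORECASE)
    ]
    results = []
    for match in pattern.finditer(text):
        if any(end <= match.start() and match.end() - end <= max_distance
               for end in keyword_ends):
            results.append(match.group())
    return results


text = "PRJ-WXYZ-99 in footer. Budget for project PRJ-ABCD-12 approved."
print(proximity_matches(text))  # ['PRJ-ABCD-12']
```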
Allow Lists
Create an allow list to suppress false positives
```bash
aws macie2 create-allow-list \
  --name "test-data-exclusions" \
  --description "Exclude known test data patterns" \
  --criteria '{
    "regex": "TEST-[0-9]{4}-[0-9]{4}-[0-9]{4}-[0-9]{4}"
  }'
```
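It is worth confirming that an allow-list pattern suppresses only the test values it targets. The sketch below assumes Macie ignores detected text that fully matches the allow-list regex, which it models with `re.fullmatch`. The sample strings are made up, and `suppressed` is a hypothetical helper.

```python
import re

# Allow-list pattern from the create-allow-list example above.
ALLOW_LIST = re.compile(r"TEST-[0-9]{4}-[0-9]{4}-[0-9]{4}-[0-9]{4}")


def suppressed(detected_text, allow=ALLOW_LIST):
    """Assumed semantics: detected text that fully matches the pattern is ignored."""
    return allow.fullmatch(detected_text) is not None


for sample in ["TEST-4111-1111-1111-1111", "4111-1111-1111-1111", "TEST-1234-5678"]:
    print(sample, suppressed(sample))
```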
Managed Data Identifiers
Macie provides 300+ managed data identifiers covering:
| Category | Examples |
|---|---|
| PII | SSN, passport numbers, driver's license, date of birth, names, addresses |
| Financial | Credit card numbers, bank account numbers, SWIFT codes |
| Credentials | AWS secret keys, API keys, SSH private keys, OAuth tokens |
| Health | HIPAA identifiers, health insurance claim numbers |
| Legal | Tax identification numbers, national ID numbers |
Findings Management
List findings
```bash
# Get sensitive data findings
aws macie2 list-findings \
  --finding-criteria '{
    "criterion": {
      "severity.description": {
        "eq": ["High"]
      },
      "category": {
        "eq": ["CLASSIFICATION"]
      }
    }
  }' \
  --sort-criteria '{"attributeName": "updatedAt", "orderBy": "DESC"}' \
  --max-results 25
```
Get finding details
```bash
aws macie2 get-findings \
  --finding-ids '["finding-id-1", "finding-id-2"]'
```
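The detailed findings from `get-findings` nest individual sensitive data types under `classificationDetails.result.sensitiveData[].detections[]`. The sketch below tallies detections per bucket and type from that JSON. The sample finding is hand-written and trimmed to the fields the helper reads. `summarize_findings` is a hypothetical name.

```python
import json
from collections import Counter


def summarize_findings(get_findings_json):
    """Sum sensitive data detections per (bucket, type) in get-findings output."""
    totals = Counter()
    for finding in json.loads(get_findings_json).get("findings", []):
        resources = finding.get("resourcesAffected") or {}
        bucket = (resources.get("s3Bucket") or {}).get("name", "unknown")
        details = finding.get("classificationDetails") or {}
        result = details.get("result") or {}
        # Managed identifier hits are grouped by category, then listed under "detections".
        for group in result.get("sensitiveData", []):
            for detection in group.get("detections", []):
                totals[(bucket, detection["type"])] += detection.get("count", 0)
    return totals


# Hand-written sample trimmed to the fields read above; not real Macie output.
finding = {
    "resourcesAffected": {"s3Bucket": {"name": "customer-records-bucket"}},
    "classificationDetails": {"result": {"sensitiveData": [{
        "category": "PERSONAL_INFORMATION",
        "detections": [{"type": "USA_SOCIAL_SECURITY_NUMBER", "count": 3}],
    }]}},
}
sample = json.dumps({"findings": [finding, finding]})
print(summarize_findings(sample))
```

In practice, the input would be the JSON printed by the `get-findings` command above.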
Export findings to Security Hub
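```bash
# By default, Macie publishes policy findings to Security Hub; sensitive data
# (classification) findings are published only if you enable them.
aws macie2 get-findings-publication-configuration

# Publish both policy and sensitive data findings to Security Hub
aws macie2 put-findings-publication-configuration \
  --security-hub-configuration publishClassificationFindings=true,publishPolicyFindings=true

# Check how often updated findings are published
aws macie2 get-macie-session --query 'findingPublishingFrequency'
```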
EventBridge Integration for Automated Response
Macie rates findings as Low, Medium, or High, so this event pattern matches the highest severity:

```json
{
  "source": ["aws.macie"],
  "detail-type": ["Macie Finding"],
  "detail": {
    "severity": {
      "description": ["High"]
    }
  }
}
```
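A small matcher lets you check an event pattern against a sample event before you deploy the rule. The sketch below implements only a simplified subset of EventBridge matching: exact string values listed in arrays, plus nested objects. It ignores content filters such as `prefix` or `numeric` and does not handle array-valued event fields. The `matches` function and the sample events are hypothetical.

```python
# Hypothetical pattern and events for local testing; not real Macie events.
PATTERN = {
    "source": ["aws.macie"],
    "detail-type": ["Macie Finding"],
    "detail": {"severity": {"description": ["High"]}},
}


def matches(pattern, event):
    """Return True if event satisfies a simplified EventBridge pattern."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            # Nested object: recurse into the corresponding event field.
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif actual not in expected:
            # Array of values: the event value must equal one of them.
            return False
    return True


high = {"source": "aws.macie", "detail-type": "Macie Finding",
        "detail": {"severity": {"description": "High", "score": 3}}}
medium = {**high, "detail": {"severity": {"description": "Medium", "score": 2}}}
print(matches(PATTERN, high), matches(PATTERN, medium))  # True False
```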
Lambda function for automated remediation
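```python
import json
import os

import boto3

s3 = boto3.client('s3')
sns = boto3.client('sns')

# Example topic ARN; can be overridden with the SNS_TOPIC_ARN environment variable.
TOPIC_ARN = os.environ.get('SNS_TOPIC_ARN', 'arn:aws:sns:us-east-1:123456789012:security-alerts')


def lambda_handler(event, context):
    finding = event['detail']
    severity = finding['severity']['description']
    resources = finding['resourcesAffected']

    # Policy findings describe a bucket, not an object; only handle object-level findings.
    if not resources.get('s3Object'):
        return {'statusCode': 200}

    bucket = resources['s3Bucket']['name']
    key = resources['s3Object']['key']

    # Sensitive data is grouped by category; individual types are listed under "detections".
    result = (finding.get('classificationDetails') or {}).get('result') or {}
    sensitive_types = sorted({
        detection['type']
        for group in result.get('sensitiveData', [])
        for detection in group.get('detections', [])
    })

    # Macie severity descriptions are Low, Medium, and High.
    if severity == 'High':
        # Tag the object for review (this replaces the object's existing tag set).
        s3.put_object_tagging(
            Bucket=bucket,
            Key=key,
            Tagging={
                'TagSet': [
                    {'Key': 'macie-finding', 'Value': severity},
                    # S3 tag values are limited to 256 characters.
                    {'Key': 'sensitive-data', 'Value': ' '.join(sensitive_types)[:256]},
                    {'Key': 'requires-review', 'Value': 'true'},
                ]
            },
        )

        # Notify the security team; SNS subjects are limited to 100 characters.
        sns.publish(
            TopicArn=TOPIC_ARN,
            Subject=f'Macie {severity} Finding: {bucket}/{key}'[:100],
            Message=json.dumps({
                'bucket': bucket,
                'key': key,
                'severity': severity,
                'sensitive_data_types': sensitive_types,
                'finding_id': finding['id'],
            }, indent=2),
        )

    return {'statusCode': 200}
```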
Multi-Account Deployment
Designate Macie administrator account
```bash
# From the management account
aws macie2 enable-organization-admin-account \
  --admin-account-id 111111111111
```
Add member accounts
```bash
# From the administrator account (replace the email placeholder with the member account's email)
aws macie2 create-member \
  --account '{"accountId": "222222222222", "email": "<member-account-email>"}'
```
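When you add several member accounts, generating the `--account` JSON avoids quoting mistakes. The sketch below builds one `create-member` argument list per account. The account-to-email mapping is hypothetical, and `create_member_commands` is an illustrative name.

```python
import json
import shlex


def create_member_commands(members):
    """Build `aws macie2 create-member` argument lists from {account_id: email}."""
    commands = []
    for account_id, email in sorted(members.items()):
        if not (account_id.isdigit() and len(account_id) == 12):
            raise ValueError(f"invalid account ID: {account_id!r}")
        account = json.dumps({"accountId": account_id, "email": email})
        commands.append(["aws", "macie2", "create-member", "--account", account])
    return commands


# Hypothetical member accounts; replace with your own.
for argv in create_member_commands({"222222222222": "member-a@example.com"}):
    print(shlex.join(argv))
```

You can pass each list directly to `subprocess.run`, which avoids shell quoting altogether.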
Monitoring Macie Operations
Usage statistics
```bash
# Usage statistics for all accounts, sorted by account ID
aws macie2 get-usage-statistics \
  --sort-by '{"key": "accountId", "orderBy": "ASC"}'
```
Classification job status
```bash
aws macie2 list-classification-jobs \
  --filter-criteria '{"includes": [{"comparator": "EQ", "key": "jobStatus", "values": ["RUNNING"]}]}'
```
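To see every job state rather than only `RUNNING`, you can tally the output of `list-classification-jobs` without a filter. The sketch below assumes each entry in the response's `items` array carries a `jobStatus`. Examples are `RUNNING`, `COMPLETE`, and `IDLE`; a recurring job sits in `IDLE` between scheduled runs. The sample data and `job_status_counts` are hypothetical.

```python
import json
from collections import Counter


def job_status_counts(list_jobs_json):
    """Count jobs per jobStatus in list-classification-jobs output."""
    items = json.loads(list_jobs_json).get("items", [])
    return Counter(job.get("jobStatus", "UNKNOWN") for job in items)


# Hand-written sample, not real CLI output.
sample = json.dumps({"items": [
    {"name": "pii-scan-production-buckets", "jobStatus": "COMPLETE"},
    {"name": "weekly-sensitive-data-scan", "jobStatus": "IDLE"},
]})
print(job_status_counts(sample))
```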