exploiting-excessive-data-exposure-in-api
Tests APIs for excessive data exposure where endpoints return more data than the client application needs, relying on the frontend to filter sensitive fields. The tester intercepts API responses and analyzes them for leaked PII, internal identifiers, debug information, or sensitive business data that the UI does not display but the API transmits. This maps to OWASP API3:2023 Broken Object Property Level Authorization. Activates for requests involving API data leakage testing, excessive data exposure, response filtering bypass, or API over-fetching.
Works with
0
total installs
0
this week
8.6K
GitHub stars
0
upvotes
Install Skill
Run in your terminal
0
installs
0
this week
8.6K
stars
Installation Guide
How to use exploiting-excessive-data-exposure-in-api on Cursor
AI-first code editor with Composer
Prerequisites
Before installing skills in Cursor, ensure your development environment meets these requirements:
- ›Cursor installed and configured on your machine
- ›Node.js 16+ with npm — verify with
node --version - ›Active project directory where you want to add
exploiting-excessive-data-exposure-in-api
Run the install command
Execute the skills CLI command in your project's root directory to begin installation:
Fetches exploiting-excessive-data-exposure-in-api from mukul975/Anthropic-Cybersecurity-Skills and configures it for Cursor.
Select Cursor when prompted
The CLI shows a list of agents. Use arrow keys and space to select Cursor:
Verify installation
Confirm successful installation by checking the skill directory location:
Restart Cursor to activate exploiting-excessive-data-exposure-in-api. Access via /exploiting-excessive-data-exposure-in-api in your agent's command palette.
Security Notice
We perform automated surface-level scans (Gen AI Scanner, Socket, Snyk) during installation. These checks detect common vulnerabilities but do not guarantee complete security. Always review skill source code and verify the publisher's reputation before production use.
Skills execute code in your environment. Always review source, verify the publisher, and test in isolation before production.
Documentation
| name | exploiting-excessive-data-exposure-in-api |
| description | 'Tests APIs for excessive data exposure where endpoints return more data than the client application needs, relying on the frontend to filter sensitive fields. The tester intercepts API responses and analyzes them for leaked PII, internal identifiers, debug information, or sensitive business data that the UI does not display but the API transmits. This maps to OWASP API3:2023 Broken Object Property Level Authorization. Activates for requests involving API data leakage testing, excessive data exposure, response filtering bypass, or API over-fetching. ' |
| domain | cybersecurity |
| subdomain | api-security |
| tags | - api-security - owasp - data-exposure - rest-security - pii-leakage |
| version | 1.0.0 |
| author | mahipal |
| license | Apache-2.0 |
| nist_csf | - PR.PS-01 - ID.RA-01 - PR.DS-10 - DE.CM-01 |
Exploiting Excessive Data Exposure in API
When to Use
- Testing APIs where the frontend displays a subset of data but the API response includes additional fields
- Assessing mobile application APIs where responses are designed for multiple client types and may contain excess data
- Identifying PII leakage in API responses that include email addresses, phone numbers, SSNs, or payment data not shown in the UI
- Testing GraphQL APIs where clients can request arbitrary fields including sensitive attributes
- Evaluating APIs after microservice refactoring where internal service-to-service data leaks into public endpoints
Do not use without written authorization. Data exposure testing involves capturing and analyzing potentially sensitive personal data.
Prerequisites
- Written authorization specifying target API endpoints and scope
- Burp Suite Professional or mitmproxy configured as intercepting proxy
- Two test accounts at different privilege levels (regular user and admin)
- Browser developer tools or mobile proxy setup for traffic capture
- Python 3.10+ with
requestsandjsonlibraries - API documentation (OpenAPI spec) for comparison against actual responses
Legal Notice: This skill is for authorized security testing and educational purposes only. Unauthorized use against systems you do not own or have written permission to test is illegal and may violate computer fraud laws.
Workflow
Step 1: Response Schema Discovery
Compare documented API responses with actual responses:
import requests
import json
BASE_URL = "https://target-api.example.com/api/v1"
headers = {"Authorization": "Bearer <user_token>", "Content-Type": "application/json"}
# Fetch a resource and analyze all returned fields
endpoints_to_test = [
("GET", "/users/me", None),
("GET", "/users/me/orders", None),
("GET", "/products", None),
("GET", "/users/me/settings", None),
("GET", "/transactions", None),
]
for method, path, body in endpoints_to_test:
resp = requests.request(method, f"{BASE_URL}{path}", headers=headers, json=body)
if resp.status_code == 200:
data = resp.json()
# Recursively extract all field names
def extract_fields(obj, prefix=""):
fields = []
if isinstance(obj, dict):
for k, v in obj.items():
full_key = f"{prefix}.{k}" if prefix else k
fields.append(full_key)
fields.extend(extract_fields(v, full_key))
elif isinstance(obj, list) and obj:
fields.extend(extract_fields(obj[0], f"{prefix}[]"))
return fields
all_fields = extract_fields(data)
print(f"\n{method} {path} - {len(all_fields)} fields returned:")
for f in sorted(all_fields):
print(f" {f}")
Step 2: Sensitive Data Pattern Detection
Scan API responses for sensitive data patterns:
import re
SENSITIVE_PATTERNS = {
"email": r'[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}',
"phone": r'(\+?1?\s?\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4})',
"ssn": r'\b\d{3}-\d{2}-\d{4}\b',
"credit_card": r'\b(?:4[0-9]{12}(?:[0-9]{3})?|5[1-5][0-9]{14}|3[47][0-9]{13})\b',
"password_hash": r'\$2[aby]?\$\d{2}\$[./A-Za-z0-9]{53}',
"api_key": r'(?:api[_-]?key|apikey)["\s:=]+["\']?([a-zA-Z0-9_\-]{20,})',
"internal_ip": r'\b(?:10\.\d{1,3}|172\.(?:1[6-9]|2\d|3[01])|192\.168)\.\d{1,3}\.\d{1,3}\b',
"aws_key": r'AKIA[0-9A-Z]{16}',
"jwt_token": r'eyJ[A-Za-z0-9_-]+\.eyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+',
"uuid": r'[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}',
}
SENSITIVE_FIELD_NAMES = [
"password", "password_hash", "secret", "token", "ssn", "social_security",
"credit_card", "card_number", "cvv", "pin", "private_key", "api_key",
"internal_id", "debug", "trace", "stack_trace", "created_by_ip",
"last_login_ip", "salt", "session_id", "refresh_token", "mfa_secret",
"date_of_birth", "bank_account", "routing_number", "tax_id"
]
def scan_response(endpoint, response_text):
findings = []
# Check for sensitive data patterns in values
for pattern_name, pattern in SENSITIVE_PATTERNS.items():
matches = re.findall(pattern, response_text)
if matches:
findings.append({
"endpoint": endpoint,
"type": "sensitive_value",
"pattern": pattern_name,
"count": len(matches),
"sample": matches[0][:20] + "..." if len(matches[0]) > 20 else matches[0]
})
# Check for sensitive field names
response_lower = response_text.lower()
for field in SENSITIVE_FIELD_NAMES:
if f'"{field}"' in response_lower or f"'{field}'" in response_lower:
findings.append({
"endpoint": endpoint,
"type": "sensitive_field",
"field_name": field
})
return findings
# Scan all endpoint responses
for method, path, body in endpoints_to_test:
resp = requests.request(method, f"{BASE_URL}{path}", headers=headers, json=body)
if resp.status_code == 200:
findings = scan_response(f"{method} {path}", resp.text)
for f in findings:
print(f"[FINDING] {f['endpoint']}: {f['type']} - {f.get('pattern', f.get('field_name'))}")
Step 3: Compare UI Display vs API Response
# Fields the UI shows (observed from the frontend application)
ui_displayed_fields = {
"/users/me": {"name", "email", "avatar_url", "role"},
"/users/me/orders": {"order_id", "date", "status", "total"},
"/products": {"id", "name", "price", "image_url", "description"},
}
# Fields the API actually returns
for method, path, body in endpoints_to_test:
resp = requests.request(method, f"{BASE_URL}{path}", headers=headers, json=body)
if resp.status_code == 200:
data = resp.json()
if isinstance(data, list):
actual_fields = set(data[0].keys()) if data else set()
elif isinstance(data, dict):
# Handle paginated responses
items_key = next((k for k in data if isinstance(data[k], list)), None)
if items_key and data[items_key]:
actual_fields = set(data[items_key][0].keys())
else:
actual_fields = set(data.keys())
else:
continue
expected = ui_displayed_fields.get(path, set())
excess = actual_fields - expected
if excess:
print(f"\n{method} {path} - EXCESS FIELDS (not shown in UI):")
for field in sorted(excess):
print(f" - {field}")
Step 4: Test User Object Exposure in Related Endpoints
# Many APIs embed full user objects in responses for orders, comments, etc.
endpoints_with_user_objects = [
"/orders", # Each order may include full seller/buyer profile
"/comments", # Comments may include full author profile
"/reviews", # Reviews may expose reviewer details
"/transactions", # Transactions may include counterparty info
"/team/members", # Team listing may expose excessive member data
]
for path in endpoints_with_user_objects:
resp = requests.get(f"{BASE_URL}{path}", headers=headers)
if resp.status_code == 200:
text = resp.text
# Check for user data leakage in nested objects
user_fields_found = []
for field in ["password_hash", "last_login_ip", "mfa_enabled", "phone_number",
"date_of_birth", "ssn", "internal_notes", "salary", "address"]:
if f'"{field}"' in text:
user_fields_found.append(field)
if user_fields_found:
print(f"[EXCESSIVE] {path} exposes user fields: {user_fields_found}")
Step 5: GraphQL Over-Fetching Analysis
# GraphQL allows clients to request any available field
GRAPHQL_URL = f"{BASE_URL}/graphql"
# Introspection query to discover all fields on User type
introspection = {
"query": """
{
__type(name: "User") {
fields {
name
type {
name
kind
}
}
}
}
"""
}
resp = requests.post(GRAPHQL_URL, headers=headers, json=introspection)
if resp.status_code == 200:
fields = resp.json().get("data", {}).get("__type", {}).get("fields", [])
print("Available User fields via GraphQL:")
for f in fields:
sensitivity = "SENSITIVE" if f["name"] in SENSITIVE_FIELD_NAMES else "normal"
print(f" {f['name']} ({f['type']['name']}) [{sensitivity}]")
# Try to query sensitive fields
sensitive_query = {
"query": """
query {
users {
id
email
passwordHash
socialSecurityNumber
internalNotes
lastLoginIp
mfaSecret
apiKey
}
}
"""
}
resp = requests.post(GRAPHQL_URL, headers=headers, json=sensitive_query)
if resp.status_code == 200 and "errors" not in resp.json():
print("[CRITICAL] GraphQL exposes sensitive user fields without restriction")
Step 6: Debug and Internal Data Leakage
# Test for debug information in responses
debug_headers_to_check = [
"X-Debug-Token", "X-Debug-Info", "Server", "X-Powered-By",
"X-Request-Id", "X-Correlation-Id", "X-Backend-Server",
"X-Runtime", "X-Version", "X-Build-Version"
]
resp = requests.get(f"{BASE_URL}/users/me", headers=headers)
for h in debug_headers_to_check:
if h.lower() in {k.lower(): v for k, v in resp.headers.items()}:
print(f"[INFO LEAK] Header {h}: {resp.headers.get(h)}")
# Test error responses for stack traces
error_payloads = [
("GET", "/users/invalid-id-format", None),
("POST", "/orders", {"invalid": "payload"}),
("GET", "/users/-1", None),
("GET", "/users/0", None),
]
for method, path, body in error_payloads:
resp = requests.request(method, f"{BASE_URL}{path}", headers=headers, json=body)
if resp.status_code >= 400:
text = resp.text.lower()
if any(kw in text for kw in ["stack trace", "traceback", "at com.", "at org.",
"file \"", "line ", "exception", "sql", "query"]):
print(f"[DEBUG LEAK] {method} {path} -> {resp.status_code}: Contains stack trace or query info")
Key Concepts
| Term | Definition |
|---|---|
| Excessive Data Exposure | API returns more data fields than the client needs, relying on frontend filtering to hide sensitive information from users |
| Over-Fetching | Requesting or receiving more data than needed for a specific operation, common in REST APIs that return fixed response schemas |
| Response Filtering | Client-side filtering of API response data to display only relevant fields, which provides zero security since the full response is interceptable |
| Object Property Level Authorization | OWASP API3:2023 - ensuring that users can only read/write object properties they are authorized to access |
| PII Leakage | Unintended exposure of Personally Identifiable Information in API responses including names, emails, addresses, SSNs, or financial data |
| Schema Validation | Enforcing that API responses conform to a defined schema, stripping unauthorized fields before transmission |
Tools & Systems
- Burp Suite Professional: Intercept API responses and use the Comparer tool to diff expected vs actual response schemas
- mitmproxy: Scriptable proxy for automated response analysis with Python-based content inspection scripts
- OWASP ZAP: Passive scanner detects information disclosure in headers, error messages, and response bodies
- Postman: Compare documented response schemas against actual API responses using test scripts
- jq: Command-line JSON processor for extracting and analyzing specific fields from API responses
Common Scenarios
Scenario: Mobile Banking API Data Exposure Assessment
Context: A mobile banking application's API returns full account objects to the mobile client, which only displays account nickname and balance. The API is accessed by both iOS and Android apps and a web portal.
Approach:
- Configure mitmproxy on a test device and authenticate as the test user
- Capture all API responses during a complete user session (login, view accounts, transfer, logout)
- Analyze
GET /api/v1/accountsresponse: UI shows 4 fields but API returns 23 fields - Discover that the API returns
routing_number,account_holder_ssn_last4,internal_risk_score,kyc_verification_status, andlinked_external_accounts- none shown in UI - Analyze
GET /api/v1/transactionsresponse: API returnsmerchant_id,terminal_id,authorization_code,processor_responsefields not needed by the client - Check
GET /api/v1/users/me: API returnslast_login_ip,mfa_backup_codes_remaining,account_officer_name, andcredit_score_band - Test error responses:
POST /api/v1/transferswith invalid payload returns SQL table name in error message
Pitfalls:
- Only checking top-level fields and missing sensitive data in deeply nested objects
- Not testing paginated responses where subsequent pages may include different fields
- Ignoring response headers that may leak server version, backend technology, or internal routing information
- Missing data exposure in error responses which often contain stack traces, SQL queries, or internal paths
- Assuming that HTTPS encryption prevents data exposure (it protects in transit, not from the authenticated client)
Output Format
## Finding: Excessive Data Exposure in Account and Transaction APIs
**ID**: API-DATA-001
**Severity**: High (CVSS 7.1)
**OWASP API**: API3:2023 - Broken Object Property Level Authorization
**Affected Endpoints**:
- GET /api/v1/accounts
- GET /api/v1/transactions
- GET /api/v1/users/me
**Description**:
The API returns full database objects to the client, including sensitive fields
that are not displayed in the mobile application UI. The mobile app filters
these fields client-side, but they are fully accessible by intercepting the
API response. This exposes SSN fragments, internal risk scores, and KYC
verification data for any authenticated user.
**Excess Fields Discovered**:
- /accounts: routing_number, account_holder_ssn_last4, internal_risk_score,
kyc_verification_status, linked_external_accounts (18 excess fields total)
- /transactions: merchant_id, terminal_id, authorization_code,
processor_response (12 excess fields total)
- /users/me: last_login_ip, mfa_backup_codes_remaining, credit_score_band
**Impact**:
An authenticated user can extract sensitive financial data, internal risk
assessments, and PII for their own account that the application is not
intended to reveal. Combined with BOLA vulnerabilities, this data could
be extracted for all users.
**Remediation**:
1. Implement server-side response filtering using DTOs/view models that only include fields needed by the client
2. Use GraphQL field-level authorization or REST response schemas per endpoint per role
3. Remove sensitive fields from API responses at the serialization layer
4. Implement response schema validation in the API gateway to strip undocumented fields
5. Add automated tests that verify response schemas match documentation
List & Monetize Your Skill
Submit your Claude Code skill and start earning
Use Cases
Exploratory Data Analysis
Quickly understand datasets, identify patterns, and generate insights
Example
Analyze CSV with 100K rows, identify outliers, visualize correlations, suggest hypotheses
Reduce EDA time from hours to minutes, uncover insights faster
Data Cleaning & Transformation
Write scripts to clean messy data, handle missing values, normalize formats
Example
Generate Python/SQL to fix date formats, impute missing values, remove duplicates
Automate 80% of data preprocessing work
Statistical Analysis
Perform hypothesis testing, regression, and statistical modeling
Example
Run A/B test analysis, calculate confidence intervals, interpret p-values
Get statistically sound analysis without PhD in statistics
Data Visualization
Create charts, dashboards, and visual reports
Example
Generate matplotlib/seaborn code for time series plots, distribution charts, heatmaps
Build presentation-ready visualizations 3x faster
Implementation Guide
Prerequisites
- ›Claude Desktop or compatible AI client
- ›Python environment (pandas, numpy, matplotlib) or SQL database access
- ›Basic understanding of data analysis concepts
- ›Sample datasets for testing skill capabilities
Time Estimate
20-40 minutes to set up and run first analysis
Steps
- 1Install data analysis skill using provided command
- 2Prepare a sample dataset (CSV, JSON, or database connection)
- 3Start with descriptive statistics: 'Summarize this dataset'
- 4Progress to visualization: 'Create a scatter plot of X vs Y'
- 5Advanced analysis: 'Run linear regression and interpret results'
- 6Validate outputs: check calculations, verify visualizations make sense
- 7Document analysis workflow for reproducibility
Common Pitfalls
- ⚠Not validating statistical assumptions before applying tests
- ⚠Accepting visualizations without checking data accuracy
- ⚠Overlooking data quality issues (missing values, outliers)
- ⚠Misinterpreting correlation as causation
- ⚠Using wrong statistical test for data distribution
- ⚠Not considering sample size and statistical power
Best Practices
✓ Do
- +Always validate data quality before analysis
- +Check statistical assumptions (normality, independence, etc.)
- +Visualize data before running statistical tests
- +Document analysis steps for reproducibility
- +Cross-validate findings with domain experts
- +Use skill for initial exploration, then dive deeper manually
- +Save generated code for reuse on similar datasets
✗ Don't
- −Don't trust analysis without verifying data quality
- −Don't apply statistical tests without checking assumptions
- −Don't make business decisions solely on AI-generated analysis
- −Don't ignore outliers without investigating cause
- −Don't skip data validation and sanity checks
- −Don't use for mission-critical financial or medical analysis without expert review
💡 Pro Tips
- ★Describe data context: 'This is user behavior data from e-commerce site'
- ★Ask for interpretation: 'What does this correlation mean for business?'
- ★Request multiple approaches: 'Show 3 ways to handle missing data'
- ★Combine AI analysis with domain expertise for best insights
- ★Use for rapid prototyping, then refine analysis manually
When to Use This
✓ Use when
Use for exploratory data analysis, data cleaning, statistical testing, visualization prototyping, and learning new analysis techniques. Best for initial exploration and rapid insights.
✗ Avoid when
Avoid for mission-critical financial analysis, medical research requiring regulatory compliance, production ML models, or when deep statistical expertise is required for nuanced interpretation.
Learning Path
- 1Basic: descriptive statistics, data cleaning, simple visualizations
- 2Intermediate: hypothesis testing, regression, correlation analysis
- 3Advanced: time series analysis, clustering, predictive modeling
- 4Expert: causal inference, experimental design, advanced statistical methods
Related Skills
performing-cryptographic-audit-of-application
5mukul975/Anthropic-Cybersecurity-Skills
implementing-soar-playbook-with-palo-alto-xsoar
3mukul975/Anthropic-Cybersecurity-Skills
exploiting-deeplink-vulnerabilities
3mukul975/Anthropic-Cybersecurity-Skills
generating-threat-intelligence-reports
2mukul975/Anthropic-Cybersecurity-Skills
analyzing-network-traffic-with-wireshark
2mukul975/Anthropic-Cybersecurity-Skills
scanning-docker-images-with-trivy
2mukul975/Anthropic-Cybersecurity-Skills
Reviews
- LLucas Chawla★★★★★Dec 28, 2024
We added exploiting-excessive-data-exposure-in-api from the explainx registry; install was straightforward and the SKILL.md answered most questions upfront.
- SSoo Gonzalez★★★★★Dec 20, 2024
I recommend exploiting-excessive-data-exposure-in-api for anyone iterating fast on agent tooling; clear intent and a small, reviewable surface area.
- SSakura Johnson★★★★★Dec 8, 2024
Keeps context tight: exploiting-excessive-data-exposure-in-api is the kind of skill you can hand to a new teammate without a long onboarding doc.
- GGanesh Mohane★★★★★Dec 4, 2024
exploiting-excessive-data-exposure-in-api fits our agent workflows well — practical, well scoped, and easy to wire into existing repos.
- IIra Chawla★★★★★Nov 27, 2024
Registry listing for exploiting-excessive-data-exposure-in-api matched our evaluation — installs cleanly and behaves as described in the markdown.
- NNaina Liu★★★★★Nov 11, 2024
Solid pick for teams standardizing on skills: exploiting-excessive-data-exposure-in-api is focused, and the summary matches what you get after install.
- MMia Lopez★★★★★Oct 18, 2024
exploiting-excessive-data-exposure-in-api reduced setup friction for our internal harness; good balance of opinion and flexibility.
- SSoo Diallo★★★★★Oct 2, 2024
exploiting-excessive-data-exposure-in-api has been reliable in day-to-day use. Documentation quality is above average for community skills.
- VValentina Farah★★★★★Sep 21, 2024
Keeps context tight: exploiting-excessive-data-exposure-in-api is the kind of skill you can hand to a new teammate without a long onboarding doc.
- IIra Bhatia★★★★★Sep 13, 2024
exploiting-excessive-data-exposure-in-api reduced setup friction for our internal harness; good balance of opinion and flexibility.
showing 1-10 of 43
Discussion
Comments — not star reviews- No comments yet — start the thread.