data-analysis

supercent-io/skills-template · updated Apr 8, 2026

$ npx skills add https://github.com/supercent-io/skills-template --skill data-analysis
summary

Dataset exploration, cleaning, statistical analysis, and visualization in Python or SQL.

  • Supports CSV, JSON, and SQL data sources with pandas DataFrames and direct database queries
  • Covers the full analysis pipeline: data loading, missing value handling, outlier detection, grouping, correlation analysis, and pivot tables
  • Includes visualization templates for histograms, boxplots, heatmaps, and time series using matplotlib and seaborn
  • Generates structured markdown reports with dataset overviews, key findings, statistical summaries, and recommendations
skill.md

Data Analysis

When to use this skill

  • Data exploration: Understand a new dataset
  • Report generation: Derive data-driven insights
  • Quality validation: Check data consistency
  • Decision support: Make data-driven recommendations

Instructions

Step 1: Load and explore data

Python (Pandas):

import pandas as pd
import numpy as np

# Load CSV
df = pd.read_csv('data.csv')

# Basic info
print(df.info())
print(df.describe())
print(df.head(10))

# Check missing values
print(df.isnull().sum())

# Data types
print(df.dtypes)

SQL:

-- Inspect table schema
DESCRIBE table_name;

-- Sample data
SELECT * FROM table_name LIMIT 10;

-- Basic stats
SELECT
    COUNT(*) AS total_rows,
    COUNT(DISTINCT column_name) AS unique_values,
    MIN(numeric_column) AS min_val,
    MAX(numeric_column) AS max_val,
    AVG(numeric_column) AS avg_val
FROM table_name;
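The same exploratory queries can also be run from Python and landed in a DataFrame. A minimal, self-contained sketch using the standard-library sqlite3 driver (the in-memory table and its column are illustrative, not part of the skill):

```python
import sqlite3

import pandas as pd

# Build a tiny in-memory table so the sketch is self-contained
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_name (numeric_column REAL)")
conn.executemany("INSERT INTO table_name VALUES (?)", [(1.0,), (2.0,), (4.0,)])
conn.commit()

# Run the same basic-stats query and collect the result as a DataFrame
stats = pd.read_sql_query(
    """
    SELECT COUNT(*) AS total_rows,
           MIN(numeric_column) AS min_val,
           MAX(numeric_column) AS max_val,
           AVG(numeric_column) AS avg_val
    FROM table_name
    """,
    conn,
)
print(stats)
```

For anything beyond a quick look, a SQLAlchemy engine is the better-supported connection type for `read_sql_query`, but a raw sqlite3 connection works for local exploration.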

Step 2: Data cleaning

# Handle missing values (column-level fillna with inplace=True is
# deprecated in pandas 2.x; assign the result back instead)
df['column'] = df['column'].fillna(df['column'].mean())
df = df.dropna(subset=['required_column'])

# Remove duplicates
df.drop_duplicates(inplace=True)

# Type conversions
df['date'] = pd.to_datetime(df['date'])
df['category'] = df['category'].astype('category')

# Remove outliers (IQR method)
Q1 = df['value'].quantile(0.25)
Q3 = df['value'].quantile(0.75)
IQR = Q3 - Q1
df = df[(df['value'] >= Q1 - 1.5*IQR) & (df['value'] <= Q3 + 1.5*IQR)]
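The IQR filter above is worth wrapping in a small helper so the same fence is applied consistently across columns. A sketch under that assumption (the function name and sample data are ours, not part of the skill):

```python
import pandas as pd


def remove_outliers_iqr(df: pd.DataFrame, column: str, k: float = 1.5) -> pd.DataFrame:
    """Keep rows whose `column` value lies within [Q1 - k*IQR, Q3 + k*IQR]."""
    q1 = df[column].quantile(0.25)
    q3 = df[column].quantile(0.75)
    iqr = q3 - q1
    mask = df[column].between(q1 - k * iqr, q3 + k * iqr)
    return df[mask]


# Example: 1000 falls far outside the IQR fence and is dropped
data = pd.DataFrame({"value": [1, 2, 3, 4, 5, 1000]})
clean = remove_outliers_iqr(data, "value")
```

Keeping the multiplier `k` as a parameter makes it easy to tighten or relax the fence per column instead of hard-coding 1.5 everywhere.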

Step 3: Statistical analysis

# Descriptive statistics
print(df['numeric_column'].describe())

# Grouped analysis
grouped = df.groupby('category').agg({
    'value': ['mean', 'sum', 'count'],
    'other': 'nunique'
})
print(grouped)

# Correlation
correlation = df[['col1', 'col2', 'col3']].corr()
print(correlation)

# Pivot table
pivot = pd.pivot_table(df,
    values='sales',
    index='region',
    columns='month',
    aggfunc='sum'
)

Step 4: Visualization

import matplotlib.pyplot as plt
import seaborn as sns

# Histogram
plt.figure(figsize=(10, 6))
df['value'].hist(bins=30)
plt.title('Distribution of Values')
plt.savefig('histogram.png')

# Boxplot
plt.figure(figsize=(10, 6))
sns.boxplot(x='category', y='value', data=df)
plt.title('Value by Category')
plt.savefig('boxplot.png')

# Heatmap (correlation)
plt.figure(figsize=(10, 8))
sns.heatmap(correlation, annot=True, cmap='coolwarm')
plt.title('Correlation Matrix')
plt.savefig('heatmap.png')

# Time series
plt.figure(figsize=(12, 6))
df.groupby('date')['value'].sum().plot()
plt.title('Time Series of Values')
plt.savefig('timeseries.png')

Step 5: Derive insights

# Top/bottom analysis
top_10 = df.nlargest(10, 'value')
bottom_10 = df.nsmallest(10, 'value')

# Trend analysis
df['month'] = df['date'].dt.to_period('M')
monthly_trend = df.groupby('month')['value'].sum()
growth = monthly_trend.pct_change() * 100

# Segment analysis
segments = df.groupby('segment').agg({
    'revenue': 'sum',
    'customers': 'nunique',
    'orders': 'count'
})
segments['avg_order_value'] = segments['revenue'] / segments['orders']

Output format

Analysis report structure

# Data Analysis Report

## 1. Dataset overview
- Dataset: [name]
- Records: X,XXX
- Columns: XX
- Date range: YYYY-MM-DD ~ YYYY-MM-DD

## 2. Key findings
- Insight 1
- Insight 2
- Insight 3

## 3. Statistical summary
| Metric | Value |
|------|-----|
| Mean | X.XX |
| Median | X.XX |
| Std dev | X.XX |

## 4. Recommendations
1. [Recommendation 1]
2. [Recommendation 2]
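Once the statistics are computed, the report skeleton above can be assembled programmatically rather than filled in by hand. A minimal sketch with illustrative numbers (the sample column and values are ours):

```python
import pandas as pd

# Illustrative data standing in for a real analysis result
df = pd.DataFrame({"value": [10.0, 12.0, 15.0, 11.0, 13.0]})
stats = df["value"].describe()

# Assemble the report sections from computed values
report = "\n".join([
    "# Data Analysis Report",
    "",
    "## 1. Dataset overview",
    f"- Records: {len(df):,}",
    f"- Columns: {df.shape[1]}",
    "",
    "## 3. Statistical summary",
    "| Metric | Value |",
    "|------|-----|",
    f"| Mean | {stats['mean']:.2f} |",
    f"| Median | {df['value'].median():.2f} |",
    f"| Std dev | {stats['std']:.2f} |",
])
print(report)
```

Generating the table from `describe()` keeps the report and the underlying numbers from drifting apart, which also serves the reproducibility practice below.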

Best practices

  1. Understand the data first: Learn structure and meaning before analysis
  2. Incremental analysis: Move from simple to complex analyses
  3. Use visualization: Use a variety of charts to spot patterns
  4. Validate assumptions: Always verify assumptions about the data
  5. Reproducibility: Document analysis code and results

Constraints

Required rules (MUST)

  1. Preserve raw data (work on a copy)
  2. Document the analysis process
  3. Validate results
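The first and third rules can be enforced mechanically: transform a copy of the raw frame and assert invariants after each step. A minimal sketch (the sample data is illustrative):

```python
import pandas as pd

raw = pd.DataFrame({"value": [1.0, None, 3.0]})

# Rule 1: never mutate the raw frame; work on a copy
df = raw.copy()
df["value"] = df["value"].fillna(df["value"].mean())

# Rule 3: validate results before reporting them
assert raw["value"].isnull().sum() == 1, "raw data must stay untouched"
assert df["value"].isnull().sum() == 0, "no missing values after cleaning"
```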

Prohibited (MUST NOT)

  1. Do not expose sensitive personal data
  2. Do not draw unsupported conclusions

Examples

Example 1: Basic usage

Example 2: Advanced usage
