artificialanalysis/stirrup · updated Apr 8, 2026
# Data Analysis Skill

Comprehensive data analysis toolkit using Polars, a blazingly fast DataFrame library. This skill provides instructions, reference documentation, and ready-to-use scripts for common data analysis tasks.
## Iteration Checkpoints
| Step | What to Present | Confirmation Prompt |
|---|---|---|
| Data Loading | Shape, columns, sample rows | "Is this the right data?" |
| Data Exploration | Summary stats, data quality issues | "Any columns to focus on?" |
| Transformation | Before/after comparison | "Does this transformation look correct?" |
| Analysis | Key findings, charts | "Should I dig deeper into anything?" |
| Export | Output preview | "Ready to save, or any changes?" |
## Quick Start

```python
import polars as pl
from polars import col

# Load data
df = pl.read_csv("data.csv")

# Explore
print(df.shape, df.schema)
print(df.describe())

# Transform and analyze
result = (
    df.filter(col("value") > 0)
    .group_by("category")
    .agg(col("value").sum().alias("total"))
    .sort("total", descending=True)
)

# Export
result.write_csv("output.csv")
```
## When to Use This Skill
- Loading datasets (CSV, JSON, Parquet, Excel, databases)
- Data cleaning, filtering, and transformation
- Aggregations, grouping, and pivot tables
- Statistical analysis and summary statistics
- Time series analysis and resampling
- Joining and merging multiple datasets
- Creating visualizations and charts
- Exporting results to various formats
## Skill Contents

### Reference Documentation

Detailed API reference and patterns for specific operations:

- `reference/loading.md` - Loading data from all supported formats
- `reference/transformations.md` - Column operations, filtering, sorting, type casting
- `reference/aggregations.md` - Group by, window functions, running totals
- `reference/time_series.md` - Date parsing, resampling, lag features
- `reference/statistics.md` - Correlations, distributions, hypothesis testing setup
- `reference/visualization.md` - Creating charts with matplotlib/plotly
### Ready-to-Use Scripts

Executable Python scripts for common tasks:

- `scripts/explore_data.py` - Quick dataset exploration and profiling
- `scripts/summary_stats.py` - Generate comprehensive statistics report
## Core Patterns

### Loading Data

```python
# CSV (most common)
df = pl.read_csv("data.csv")

# Lazy loading for large files
df = pl.scan_csv("large.csv").filter(col("x") > 0).collect()

# Parquet (recommended for large datasets)
df = pl.read_parquet("data.parquet")

# JSON
df = pl.read_json("data.json")
df = pl.read_ndjson("data.ndjson")  # Newline-delimited
```
### Filtering and Selection

```python
# Select columns
df.select("col1", "col2")
df.select(col("name"), col("value") * 2)

# Filter rows
df.filter(col("age") > 25)
df.filter((col("status") == "active") & (col("value") > 100))
df.filter(col("name").str.contains("Smith"))
```
### Transformations

```python
# Add/modify columns
df = df.with_columns(
    (col("price") * col("qty")).alias("total"),
    col("date_str").str.to_date("%Y-%m-%d").alias("date"),
)

# Conditional values
df = df.with_columns(
    pl.when(col("score") >= 90).then(pl.lit("A"))
    .when(col("score") >= 80).then(pl.lit("B"))
    .otherwise(pl.lit("C"))
    .alias("grade")
)
```
### Aggregations

```python
# Group by
df.group_by("category").agg(
    col("value").sum().alias("total"),
    col("value").mean().alias("avg"),
    pl.len().alias("count"),
)

# Window functions
df.with_columns(
    col("value").sum().over("group").alias("group_total"),
    col("value").rank().over("group").alias("rank_in_group"),
)
```
### Exporting

```python
df.write_csv("output.csv")
df.write_parquet("output.parquet")
df.write_ndjson("output.ndjson")  # one JSON object per line
df.write_json("output.json")      # row-oriented; the row_oriented flag was removed in Polars 1.0
```
## Best Practices

- Use lazy evaluation for large datasets: `pl.scan_csv()` + `.collect()`
- Filter early to reduce data volume before expensive operations
- Select only needed columns to minimize memory usage
- Prefer Parquet for storage - faster I/O, better compression
- Use `.explain()` to understand and optimize query plans