profiling-tables

astronomer/agents · updated Apr 8, 2026

$ npx skills add https://github.com/astronomer/agents --skill profiling-tables
summary

Comprehensive statistical and quality analysis of database tables with structured profiling output.

  • Generates column-level statistics tailored to data type: min/max/percentiles for numeric columns, length metrics for strings, date ranges for timestamps
  • Performs cardinality analysis to identify categorical vs. high-cardinality columns and detect skewed distributions
  • Assesses data quality across five dimensions: completeness (NULL rates), uniqueness (duplicates), freshness (update recency), validity (range and format checks), and consistency (cross-column logic)
skill.md

Data Profile

Generate a comprehensive profile of a table that a new team member could use to understand the data.

Step 1: Basic Metadata

Query column metadata:

SELECT COLUMN_NAME, DATA_TYPE, COMMENT
FROM <database>.INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_SCHEMA = '<schema>' AND TABLE_NAME = '<table>'
ORDER BY ORDINAL_POSITION

If the table name isn't fully qualified, search INFORMATION_SCHEMA.TABLES to locate it first.
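The lookup above can be sketched as a small helper. This is a minimal illustration, not part of the skill: sqlite3's `sqlite_master` catalog stands in for `INFORMATION_SCHEMA.TABLES`, and the `locate_table` helper and table names are assumptions for the demo; in the warehouse you would filter `TABLE_NAME` the same way.

```python
import sqlite3

# Sketch: resolve an unqualified table name before profiling.
# sqlite_master stands in for INFORMATION_SCHEMA.TABLES.
def locate_table(conn, name):
    rows = conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name LIKE ?",
        (f"%{name}%",),
    ).fetchall()
    return [r[0] for r in rows]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales_orders (id INTEGER)")
conn.execute("CREATE TABLE sales_order_items (id INTEGER)")
print(locate_table(conn, "order"))  # both demo tables match
```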

Step 2: Size and Shape

Run via run_sql:

SELECT
    COUNT(*) as total_rows,
    COUNT(*) / 1000000.0 as millions_of_rows
FROM <table>

Step 3: Column-Level Statistics

For each column, gather appropriate statistics based on data type:

Numeric Columns

SELECT
    MIN(column_name) as min_val,
    MAX(column_name) as max_val,
    AVG(column_name) as avg_val,
    STDDEV(column_name) as std_dev,
    PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY column_name) as median,
    SUM(CASE WHEN column_name IS NULL THEN 1 ELSE 0 END) as null_count,
    COUNT(DISTINCT column_name) as distinct_count
FROM <table>

String Columns

SELECT
    MIN(LEN(column_name)) as min_length,
    MAX(LEN(column_name)) as max_length,
    AVG(LEN(column_name)) as avg_length,
    SUM(CASE WHEN column_name IS NULL OR column_name = '' THEN 1 ELSE 0 END) as empty_count,
    COUNT(DISTINCT column_name) as distinct_count
FROM <table>

Date/Timestamp Columns

SELECT
    MIN(column_name) as earliest,
    MAX(column_name) as latest,
    DATEDIFF('day', MIN(column_name), MAX(column_name)) as date_range_days,
    SUM(CASE WHEN column_name IS NULL THEN 1 ELSE 0 END) as null_count
FROM <table>
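A driver for Step 3 can dispatch the right statistics query per column type. A minimal sketch, assuming the column types come from the Step 1 metadata query; the `column_stats` helper, the demo table, and the type buckets are illustrative, and percentiles/stddev are left to the warehouse, which supports them natively (sqlite3 does not).

```python
import sqlite3

# Sketch: pick a per-column stats query based on declared type.
NUMERIC = {"INTEGER", "REAL", "NUMERIC"}

def column_stats(conn, table, column, decl_type):
    if decl_type.upper() in NUMERIC:
        sql = (f"SELECT MIN({column}), MAX({column}), AVG({column}), "
               f"COUNT(*) - COUNT({column}), COUNT(DISTINCT {column}) "
               f"FROM {table}")
        keys = ("min", "max", "avg", "nulls", "distinct")
    else:  # treat everything else as string-like
        sql = (f"SELECT MIN(LENGTH({column})), MAX(LENGTH({column})), "
               f"COUNT(*) - COUNT({column}), COUNT(DISTINCT {column}) "
               f"FROM {table}")
        keys = ("min_len", "max_len", "nulls", "distinct")
    return dict(zip(keys, conn.execute(sql).fetchone()))

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (amount REAL, status TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?)",
                 [(10.0, "open"), (20.0, "closed"), (None, "open")])
for name, decl in [("amount", "REAL"), ("status", "TEXT")]:
    print(name, column_stats(conn, "t", name, decl))
```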

Step 4: Cardinality Analysis

For columns that look like categorical/dimension keys:

SELECT
    column_name,
    COUNT(*) as frequency,
    ROUND(COUNT(*) * 100.0 / SUM(COUNT(*)) OVER(), 2) as percentage
FROM <table>
GROUP BY column_name
ORDER BY frequency DESC
LIMIT 20

This reveals:

  • High-cardinality columns (likely IDs or unique values)
  • Low-cardinality columns (likely categories or status fields)
  • Skewed distributions (one value dominates)
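The classification above can be made explicit with a distinct-to-row ratio. A sketch only: the 0.95 ratio and the 20-distinct-values cutoff are assumed thresholds to tune per table, not rules from the skill.

```python
# Sketch: bucket a column by its distinct-to-row ratio.
# Thresholds (0.95, 20) are illustrative assumptions.
def classify_cardinality(distinct_count, total_rows):
    ratio = distinct_count / total_rows
    if ratio > 0.95:
        return "high-cardinality (likely ID)"
    if distinct_count <= 20:
        return "low-cardinality (likely category/status)"
    return "mid-cardinality"

print(classify_cardinality(999_990, 1_000_000))
print(classify_cardinality(4, 1_000_000))
```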

Step 5: Sample Data

Get representative rows:

SELECT *
FROM <table>
LIMIT 10

If the table is large and you want variety, sample from different time periods or categories.
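One way to get that variety is a per-category sample via a window function. A sketch under assumptions: the `orders` table and `status` column are invented for the demo, and the query needs a SQLite new enough for `ROW_NUMBER` (bundled with recent Python); warehouses like Snowflake support it natively.

```python
import sqlite3

# Sketch: pull a couple of rows per category instead of a flat LIMIT.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, status TEXT)")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [(i, s) for i, s in enumerate(
                     ["open"] * 5 + ["closed"] * 5 + ["refunded"] * 2)])
sample = conn.execute("""
    SELECT id, status FROM (
        SELECT id, status,
               ROW_NUMBER() OVER (PARTITION BY status ORDER BY id) AS rn
        FROM orders
    ) WHERE rn <= 2
""").fetchall()
print(sample)  # two rows per status value
```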

Step 6: Data Quality Assessment

Summarize quality across dimensions:

Completeness

  • Which columns have NULLs? What percentage?
  • Are NULLs expected or problematic?

Uniqueness

  • Does the apparent primary key have duplicates?
  • Are there unexpected duplicate rows?

Freshness

  • When was data last updated? (MAX of timestamp columns)
  • Is the update frequency as expected?

Validity

  • Are there values outside expected ranges?
  • Are there invalid formats (dates, emails, etc.)?
  • Are there orphaned foreign keys?

Consistency

  • Do related columns make sense together?
  • Are there logical contradictions?
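Two of these dimensions reduce directly to queries. A minimal sketch: completeness as a null rate per column and uniqueness as duplicate counts on an assumed key; the `users` table, `user_id` key, and helper names are illustrative.

```python
import sqlite3

# Sketch: completeness (null rate) and uniqueness (duplicate keys).
def null_rate(conn, table, column):
    total, filled = conn.execute(
        f"SELECT COUNT(*), COUNT({column}) FROM {table}").fetchone()
    return (total - filled) / total if total else 0.0

def duplicate_key_count(conn, table, key):
    return conn.execute(
        f"SELECT COUNT(*) FROM (SELECT {key} FROM {table} "
        f"GROUP BY {key} HAVING COUNT(*) > 1)").fetchone()[0]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (user_id INTEGER, email TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)",
                 [(1, "a@x.com"), (2, None), (2, "b@x.com"), (3, "c@x.com")])
print(null_rate(conn, "users", "email"))              # 1 NULL in 4 rows
print(duplicate_key_count(conn, "users", "user_id"))  # user_id 2 repeats
```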

Step 7: Output Summary

Provide a structured profile:

Overview

2-3 sentences describing what this table contains, who uses it, and how fresh it is.

Schema

Column | Type | Nulls% | Distinct | Description
------ | ---- | ------ | -------- | -----------
...    | ...  | ...    | ...      | ...

Key Statistics

  • Row count: X
  • Date range: Y to Z
  • Last updated: timestamp

Data Quality Score

  • Completeness: X/10
  • Uniqueness: X/10
  • Freshness: X/10
  • Overall: X/10
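The scores can be rolled up mechanically. A sketch under stated assumptions: the mapping from null rate to a completeness score and the simple average for the overall score are illustrative choices, not part of the skill, and teams may weight dimensions differently.

```python
# Sketch: turn findings into the 0-10 scores used in the summary.
def completeness_score(worst_null_rate):
    # Assumed mapping: 0% NULLs -> 10, 100% NULLs -> 0.
    return round((1 - worst_null_rate) * 10, 1)

def overall(scores):
    # Assumed weighting: plain average of the dimension scores.
    return round(sum(scores) / len(scores), 1)

dims = [completeness_score(0.05), 10.0, 8.0]  # completeness, uniqueness, freshness
print(dims, overall(dims))
```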

Potential Issues

List any data quality concerns discovered.

Recommended Queries

3-5 useful queries for common questions about this data.

Discussion

  • No comments yet.

Ratings

4.6 · 67 reviews
  • Chaitanya Patil · Dec 28, 2024

    Registry listing for profiling-tables matched our evaluation — installs cleanly and behaves as described in the markdown.

  • Mateo Patel · Dec 28, 2024

    Useful defaults in profiling-tables — fewer surprises than typical one-off scripts, and it plays nicely with `npx skills` flows.

  • Hana Brown · Dec 20, 2024

    profiling-tables fits our agent workflows well — practical, well scoped, and easy to wire into existing repos.

  • Diya Agarwal · Dec 20, 2024

    Registry listing for profiling-tables matched our evaluation — installs cleanly and behaves as described in the markdown.

  • Ishan Chawla · Dec 8, 2024

    We added profiling-tables from the explainx registry; install was straightforward and the SKILL.md answered most questions upfront.

  • Soo Thomas · Nov 27, 2024

    Keeps context tight: profiling-tables is the kind of skill you can hand to a new teammate without a long onboarding doc.

  • Piyush G · Nov 19, 2024

    profiling-tables reduced setup friction for our internal harness; good balance of opinion and flexibility.

  • Maya Liu · Nov 11, 2024

    I recommend profiling-tables for anyone iterating fast on agent tooling; clear intent and a small, reviewable surface area.

  • Charlotte Flores · Nov 11, 2024

    profiling-tables reduced setup friction for our internal harness; good balance of opinion and flexibility.

  • Dev Flores · Nov 11, 2024

    profiling-tables is among the better-maintained entries we tried; worth keeping pinned for repeat workflows.

showing 1-10 of 67
