Generate a comprehensive, user-editable schema reference file for the data warehouse.
Before installing skills in Cursor, ensure your development environment meets these requirements:
node --version
init
Execute the skills CLI command in your project's root directory to begin installation:
Fetches init from astronomer/agents and configures it for Cursor.
The CLI shows a list of agents. Use arrow keys and space to select Cursor:
Confirm successful installation by checking the skill directory location:
Restart Cursor to activate init. Access via /init in your agent's command palette.
We perform automated surface-level scans (Gen AI Scanner, Socket, Snyk) during installation. These checks detect common vulnerabilities but do not guarantee complete security. Always review skill source code and verify the publisher's reputation before production use.
Skills execute code in your environment. Always review source, verify the publisher, and test in isolation before production.
Scripts: ../analyzing-data/scripts/ (all CLI commands below are relative to the analyzing-data skill's directory). Before running any scripts/cli.py command, cd to ../analyzing-data/ relative to this file.
.astro/warehouse.md - a version-controllable, team-shareable reference
cat ~/.astro/agents/warehouse.yml
Get the list of databases to discover (e.g., databases: [HQ, ANALYTICS, RAW]).
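As a rough illustration of this step, the sketch below pulls the database names out of the config text. The real layout of warehouse.yml is not shown here, so the sketch assumes an inline `databases: [...]` list like the example above; `configured_databases` is a hypothetical helper name, and a proper YAML parser would be the safer choice for any other layout.

```python
import re


def configured_databases(yaml_text: str) -> list[str]:
    """Return the names in the first inline `databases: [...]` list.

    Assumption: the list is written inline on a single line, as in the
    example `databases: [HQ, ANALYTICS, RAW]`. Block-style YAML lists
    are not handled by this sketch.
    """
    match = re.search(r"^\s*databases:\s*\[([^\]]*)\]", yaml_text, flags=re.MULTILINE)
    if match is None:
        return []
    # Split on commas, dropping whitespace and optional quotes around each name.
    return [
        name.strip().strip("'\"")
        for name in match.group(1).split(",")
        if name.strip()
    ]


print(configured_databases("databases: [HQ, ANALYTICS, RAW]\n"))
```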
Launch a subagent to find business context in code:
Task(
subagent_type="Explore",
prompt="""
Search for data model documentation in the codebase:
1. dbt models: **/models/**/*.yml, **/schema.yml
- Extract table descriptions, column descriptions
- Note primary keys and tests
2. Gusty/declarative SQL: **/dags/**/*.sql with YAML frontmatter
- Parse frontmatter for: description, primary_key, tests
- Note schema mappings
3. AGENTS.md or CLAUDE.md files with data layer documentation
Return a mapping of:
table_name -> {description, primary_key, important_columns, layer}
"""
)
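The file search described in that prompt can be approximated with plain glob matching. The sketch below only locates candidate files; it does not parse dbt YAML or SQL frontmatter. The dbt and SQL patterns come from the prompt, while the `**/AGENTS.md` and `**/CLAUDE.md` patterns and the `find_doc_candidates` name are assumptions made for illustration.

```python
from pathlib import Path

# dbt and SQL patterns are copied from the search prompt above;
# the AGENTS.md / CLAUDE.md patterns are an assumption.
PATTERNS = {
    "dbt": ["**/models/**/*.yml", "**/schema.yml"],
    "gusty_sql": ["**/dags/**/*.sql"],
    "agent_docs": ["**/AGENTS.md", "**/CLAUDE.md"],
}


def find_doc_candidates(root: str) -> dict[str, list[str]]:
    """Map each documentation kind to the sorted relative paths that match it."""
    base = Path(root)
    found: dict[str, list[str]] = {}
    for kind, patterns in PATTERNS.items():
        paths = set()
        for pattern in patterns:
            # A set removes files matched by more than one pattern.
            paths.update(p for p in base.glob(pattern) if p.is_file())
        found[kind] = sorted(p.relative_to(base).as_posix() for p in paths)
    return found
```

Calling `find_doc_candidates(".")` from the project root returns the files a subagent would then read for descriptions, primary keys, and tests.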
Launch one subagent per database using the Task tool:
For each database in configured_databases:
Task(
subagent_type="general-purpose",
prompt="""
Discover all metadata for database {DATABASE}.
Use the CLI to run SQL queries:
# Scripts are relative to ../analyzing-data/
uv run scripts/cli.py exec "df = run_sql('...')"
uv run scripts/cli.py exec "print(df)"
1. Query schemas:
SELECT SCHEMA_NAME FROM {DATABASE}.INFORMATION_SCHEMA.SCHEMATA
2. Query tables with row counts:
SELECT TABLE_SCHEMA, TABLE_NAME, ROW_COUNT, COMMENT
FROM {DATABASE}.INFORMATION_SCHEMA.TABLES
ORDER BY TABLE_SCHEMA, TABLE_NAME
3. For important schemas (MODEL_*, METRICS_*, MART_*), query columns:
SELECT TABLE_NAME, COLUMN_NAME, DATA_TYPE, COMMENT
FROM {DATABASE}.INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_SCHEMA = 'X'
Return a structured summary:
- Database name
- List of schemas with table counts
- For each table: name, row_count, key columns
- Flag any tables with >100M rows as "large"
"""
)
Run all subagents in parallel (single message with multiple Task calls).
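To make the per-database discovery concrete, here is a small sketch that builds the three INFORMATION_SCHEMA queries from the prompt and applies the 100M-row "large" flag. The helper names (`metadata_queries`, `large_tables`) and the identifier check are assumptions added for illustration; they are not part of the skill's CLI.

```python
import re
from typing import Iterable, Optional

LARGE_TABLE_ROWS = 100_000_000  # threshold named in the prompt above


def _check_identifier(name: str) -> str:
    """Reject anything that is not a plain unquoted identifier (an added safeguard)."""
    if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_$]*", name):
        raise ValueError(f"unexpected identifier: {name!r}")
    return name


def metadata_queries(database: str, schema: Optional[str] = None) -> dict[str, str]:
    """Build the schema and table queries, plus the column query if a schema is given."""
    db = _check_identifier(database)
    queries = {
        "schemas": f"SELECT SCHEMA_NAME FROM {db}.INFORMATION_SCHEMA.SCHEMATA",
        "tables": (
            "SELECT TABLE_SCHEMA, TABLE_NAME, ROW_COUNT, COMMENT "
            f"FROM {db}.INFORMATION_SCHEMA.TABLES "
            "ORDER BY TABLE_SCHEMA, TABLE_NAME"
        ),
    }
    if schema is not None:
        queries["columns"] = (
            "SELECT TABLE_NAME, COLUMN_NAME, DATA_TYPE, COMMENT "
            f"FROM {db}.INFORMATION_SCHEMA.COLUMNS "
            f"WHERE TABLE_SCHEMA = '{_check_identifier(schema)}'"
        )
    return queries


def large_tables(rows: Iterable[tuple]) -> list[str]:
    """rows are (schema, table, row_count) tuples; row_count may be None for views."""
    return [f"{s}.{t}" for s, t, n in rows if n is not None and n > LARGE_TABLE_ROWS]
```

Each query string would be passed to `run_sql(...)` through the `exec` command shown in the prompt.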
For key categorical columns (like OPERATOR, STATUS, TYPE, FEATURE), discover value families:
uv run scripts/cli.py exec "df = run_sql('''
SELECT DISTINCT column_name, COUNT(*) as occurrences
FROM table
WHERE column_name IS NOT NULL
GROUP BY column_name
ORDER BY occurrences DESC
LIMIT 50
''')"
uv run scripts/cli.py exec "print(df)"
Group related values into families by common prefix/suffix (e.g., Export* for ExportCSV, ExportJSON, ExportParquet).
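One possible way to do this grouping is sketched below. It handles common prefixes only (not suffixes) and assumes the prefix is either the first CamelCase word or the text before the first underscore, hyphen, or space. Both that heuristic and the names `leading_token` and `value_families` are assumptions for illustration.

```python
import re
from collections import defaultdict


def leading_token(value: str) -> str:
    """Return the first word of a value such as 'ExportCSV' or 'export_csv'."""
    # Delimited values: keep the part before the first _, -, or whitespace.
    head = re.split(r"[_\-\s]", value, maxsplit=1)[0]
    # CamelCase values: keep the first capitalized word, e.g. 'Export'.
    match = re.match(r"[A-Z][a-z0-9]+", head)
    return match.group(0) if match else head


def value_families(values: list[str], min_size: int = 2) -> dict[str, list[str]]:
    """Group values that share a leading token; drop groups smaller than min_size."""
    groups: dict[str, list[str]] = defaultdict(list)
    for value in values:
        groups[leading_token(value)].append(value)
    return {
        f"{prefix}*": sorted(members)
        for prefix, members in groups.items()
        if len(members) >= min_size
    }


print(value_families(["ExportCSV", "ExportJSON", "ExportParquet", "Login"]))
# {'Export*': ['ExportCSV', 'ExportJSON', 'ExportParquet']}
```

Values that do not fall into any family (such as `Login` here) are left out of the result, so they can be listed individually in the Value Families column if needed.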
Combine warehouse metadata + codebase context:
Write the file to:
- .astro/warehouse.md (default - project-specific, version-controllable)
- ~/.astro/agents/warehouse.md (if --global flag)
# Warehouse Schema
> Generated by `/data:init` on {DATE}. Edit freely to add business context.
## Quick Reference
| Concept | Table | Key Column | Date Column |
|---------|-------|------------|-------------|
| customers | HQ.MODEL_ASTRO.ORGANIZATIONS | ORG_ID | CREATED_AT |
<!-- Add your concept mappings here -->
## Categorical Columns
When filtering on these columns, explore value families first (values often have variants):
| Table | Column | Value Families |
|-------|--------|----------------|
| {TABLE} | {COLUMN} | `{PREFIX}*` ({VALUE1}, {VALUE2}, ...) |
<!-- Populated by /data:init from actual warehouse data -->
## Data Layer Hierarchy
Query downstream first: `reporting` > `mart_*` > `metric_*` > `model_*` > `IN_*`
| Layer | Prefix | Purpose |
|-------|--------|---------|
| Reporting | `reporting.*` | Dashboard-optimized |
| Mart | `mart_*` | Combined analytics |
| Metric | `metric_*` | KPIs at various grains |
| Model | `model_*` | Cleansed sources of truth |
| Raw | `IN_*` | Source data - avoid |
## {DATABASE} Database
### {SCHEMA} Schema
#### {TABLE_NAME}
{DESCRIPTION from code if found}
| Column | Type | Description |
|--------|------|-------------|
| COL1 | VARCHAR | {from code or inferred} |
- **Rows:** {ROW_COUNT}
- **Key column:** {PRIMARY_KEY from code or inferred}
{IF ROW_COUNT > 100M: - **⚠️ WARNING:** Large table - always add date filters}
## Relationships
{Inferred relationships based on column names like *_ID}
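The per-table part of the template above can be filled in mechanically. The following sketch renders one table section, including the conditional large-table warning. `render_table_section` is a hypothetical helper, and the sample values in the usage line are made up for the example.

```python
LARGE_TABLE_ROWS = 100_000_000  # the template's ">100M" threshold


def render_table_section(name, description, columns, row_count, key_column) -> str:
    """Render one '#### TABLE' block; columns is a list of (name, type, description)."""
    lines = [f"#### {name}"]
    if description:
        lines.append(description)
    lines.append("| Column | Type | Description |")
    lines.append("|--------|------|-------------|")
    lines.extend(f"| {col} | {typ} | {desc} |" for col, typ, desc in columns)
    lines.append(f"- **Rows:** {row_count}")
    lines.append(f"- **Key column:** {key_column}")
    # The warning line is emitted only for tables above the threshold.
    if row_count > LARGE_TABLE_ROWS:
        lines.append("- **⚠️ WARNING:** Large table - always add date filters")
    return "\n".join(lines)


print(render_table_section(
    "ORGANIZATIONS",
    "One row per organization.",  # made-up description
    [("ORG_ID", "VARCHAR", "Primary key")],
    250_000_000,
    "ORG_ID",
))
```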
| Option | Effect |
|---|---|
| `/data:init` | Generate `.astro/warehouse.md` |
| `/data:init --refresh` | Regenerate, preserving user edits |
| `/data:init --database HQ` | Only discover specific database |
| `/data:init --global` | Write to `~/.astro/agents/` instead |
After generating warehouse.md, populate the concept cache:
# Scripts are relative to ../analyzing-data/
uv run scripts/cli.py concept import -p .astro/warehouse.md
uv run scripts/cli.py concept learn customers HQ.MART_CUST.CURRENT_ASTRO_CUSTS -k ACCT_ID
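To show what kind of data the Quick Reference table contributes, the sketch below reads its rows into a concept mapping. This is only an illustration of the table's structure; how `concept import` actually works is not documented here, and `parse_quick_reference` is a hypothetical name.

```python
def parse_quick_reference(markdown: str) -> dict[str, dict[str, str]]:
    """Extract concept rows from the '## Quick Reference' table of warehouse.md."""
    concepts: dict[str, dict[str, str]] = {}
    in_section = False
    for line in markdown.splitlines():
        stripped = line.strip()
        if stripped.startswith("## "):
            # Only rows under the Quick Reference heading are considered.
            in_section = stripped == "## Quick Reference"
            continue
        if not in_section or not stripped.startswith("|"):
            continue
        cells = [cell.strip() for cell in stripped.strip("|").split("|")]
        # Skip the header row, the separator row, and malformed rows.
        if len(cells) != 4 or cells[0] == "Concept" or set(cells[0]) <= {"-", ":"}:
            continue
        concept, table, key_column, date_column = cells
        concepts[concept] = {
            "table": table,
            "key_column": key_column,
            "date_column": date_column,
        }
    return concepts
```

With the template row shown earlier, the result maps `customers` to `HQ.MODEL_ASTRO.ORGANIZATIONS` with key column `ORG_ID` and date column `CREATED_AT`.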
Ask the user:
Would you like to add the Quick Reference table to your CLAUDE.md file?
This ensures the schema mappings are always in context for data queries, improving accuracy from ~25% to ~100% for complex queries.
Options:
- Yes, add to CLAUDE.md (Recommended) - Append Quick Reference section
- No, skip - Use warehouse.md and cache only
If user chooses Yes:
- Check whether .claude/CLAUDE.md or CLAUDE.md exists
- If neither exists, create .claude/CLAUDE.md with just the Quick Reference
Quick Reference section to add:
## Data Warehouse Quick Reference
When querying the warehouse, use these table mappings:
| Concept | Table | Key Column | Date Column |
|---------|-------|------------|-------------|
{rows from warehouse.md Quick Reference}
**Large tables (always filter by date):** {list tables with >100M rows}
> Auto-generated by `/data:init`. Run `/data:init --refresh` to update.
If yes: Append the Quick Reference section to .claude/CLAUDE.md or CLAUDE.md.
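The append-or-create behavior could look roughly like the sketch below. It assumes `.claude/CLAUDE.md` takes priority when both files exist, which the text does not specify, and it does not detect a Quick Reference section added by an earlier run. `add_quick_reference` is a hypothetical helper name.

```python
from pathlib import Path


def add_quick_reference(project_root: str, section: str) -> Path:
    """Append the section to an existing CLAUDE.md, or create .claude/CLAUDE.md."""
    root = Path(project_root)
    candidates = [root / ".claude" / "CLAUDE.md", root / "CLAUDE.md"]
    # Assumption: prefer .claude/CLAUDE.md if both exist; create it if neither does.
    target = next((p for p in candidates if p.is_file()), candidates[0])
    target.parent.mkdir(parents=True, exist_ok=True)
    existing = target.read_text(encoding="utf-8") if target.is_file() else ""
    # Separate the new section from existing content with one blank line.
    separator = "\n\n" if existing.strip() else ""
    target.write_text(
        existing.rstrip("\n") + separator + section.rstrip("\n") + "\n",
        encoding="utf-8",
    )
    return target
```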
Tell the user:
Generated .astro/warehouse.md
Summary:
- {N} databases, {N} schemas, {N} tables
- {N} tables enriched with code descriptions
- {N} concepts cached for instant lookup
Next steps:
1. Edit .astro/warehouse.md to add business context
2. Commit to version control
3. Run /data:init --refresh when schema changes
When --refresh is sp
init fits our agent workflows well: practical, well scoped, and easy to wire into existing repos.
We added init from the explainx registry; install was straightforward and the SKILL.md answered most questions upfront.
init reduced setup friction for our internal harness; good balance of opinion and flexibility.
Useful defaults in init: fewer surprises than typical one-off scripts, and it plays nicely with `npx skills` flows.
init has been reliable in day-to-day use. Documentation quality is above average for community skills.
Registry listing for init matched our evaluation: installs cleanly and behaves as described in the markdown.