How do I install data-engineering-data-pipeline?

Run `npx skills add https://github.com/sickn33/antigravity-awesome-skills --skill data-engineering-data-pipeline` in your terminal. You need to have run `npx skills init` once in your project first.

Which agent frameworks does data-engineering-data-pipeline support?

data-engineering-data-pipeline works with any agent framework supported by the Skills registry, including Claude Code, Cursor, GitHub Copilot, Cline, Codex, and Gemini CLI.

Is data-engineering-data-pipeline free to use?

Yes. data-engineering-data-pipeline is free to install and use. It is published by sickn33 and listed in the open explainx.ai skill registry.

Where can I read ratings and reviews for data-engineering-data-pipeline?

Community ratings and review text appear on this explainx.ai skill page below the description. Reviews use a 1–5 scale and may include short written feedback from signed-in members.

Productivity

data-engineering-data-pipeline

sickn33/antigravity-awesome-skills · updated Apr 8, 2026


$ npx skills add https://github.com/sickn33/antigravity-awesome-skills --skill data-engineering-data-pipeline


summary

You are a data pipeline architecture expert specializing in scalable, reliable, and cost-effective data pipelines for batch and streaming data processing.

skill.md

Data Pipeline Architecture

You are a data pipeline architecture expert specializing in scalable, reliable, and cost-effective data pipelines for batch and streaming data processing.

Use this skill when

Working on data pipeline architecture tasks or workflows
Needing guidance, best practices, or checklists for data pipeline architecture

Do not use this skill when

The task is unrelated to data pipeline architecture
You need a different domain or tool outside this scope

Requirements

$ARGUMENTS

Core Capabilities

Design ETL/ELT, Lambda, Kappa, and Lakehouse architectures
Implement batch and streaming data ingestion
Build workflow orchestration with Airflow/Prefect
Transform data using dbt and Spark
Manage Delta Lake/Iceberg storage with ACID transactions
Implement data quality frameworks (Great Expectations, dbt tests)
Monitor pipelines with CloudWatch/Prometheus/Grafana
Optimize costs through partitioning, lifecycle policies, and compute optimization

Instructions

1. Architecture Design

Assess: sources, volume, latency requirements, targets
Select pattern: ETL (transform before load), ELT (load then transform), Lambda (batch + speed layers), Kappa (stream-only), Lakehouse (unified)
Design flow: sources → ingestion → processing → storage → serving
Add observability touchpoints

2. Ingestion Implementation

Batch

Incremental loading with watermark columns
Retry logic with exponential backoff
Schema validation and dead letter queue for invalid records
Metadata tracking (_extracted_at, _source)
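
The batch items above can be sketched with the standard library alone. This is a minimal illustration, not the skill's own implementation: the `orders` table, its columns, the `orders_db` source label, and the helper names `extract_incremental` and `validate` are all assumptions made for the example, and an in-memory SQLite database stands in for a real source.

```python
import sqlite3
from datetime import datetime, timezone


def extract_incremental(conn, last_watermark):
    """Return rows changed since last_watermark, plus the new watermark."""
    rows = conn.execute(
        "SELECT id, user_id, updated_at FROM orders "
        "WHERE updated_at > ? ORDER BY updated_at",
        (last_watermark,),
    ).fetchall()
    # Advance the watermark only if something new was read.
    new_watermark = rows[-1][2] if rows else last_watermark
    return rows, new_watermark


def validate(rows, source):
    """Split rows into valid records and a dead letter queue, adding metadata."""
    extracted_at = datetime.now(timezone.utc).isoformat()
    valid, dead_letters = [], []
    for id_, user_id, updated_at in rows:
        record = {
            "id": id_,
            "user_id": user_id,
            "updated_at": updated_at,
            "_extracted_at": extracted_at,
            "_source": source,
        }
        if isinstance(id_, int) and user_id is not None:
            valid.append(record)
        else:
            dead_letters.append({"record": record, "error": "missing or invalid field"})
    return valid, dead_letters


# Hypothetical source table used only for this sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, user_id INTEGER, updated_at TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        (1, 10, "2024-01-01T00:00:00"),
        (2, None, "2024-01-02T00:00:00"),
        (3, 30, "2024-01-03T00:00:00"),
    ],
)

rows, watermark = extract_incremental(conn, "2024-01-01T00:00:00")
valid, dead_letters = validate(rows, source="orders_db")
```

The watermark comparison relies on ISO-8601 strings in a single format sorting chronologically; a real pipeline would persist `watermark` only after the load has succeeded.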

Streaming

Kafka consumers with exactly-once semantics
Manual offset commits within transactions
Windowing for time-based aggregations
Error handling and replay capability
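
Windowing for time-based aggregations can be illustrated without a streaming engine. The sketch below assumes events arrive as `(epoch_seconds, key)` pairs and uses a fixed (tumbling) window; the function name and event shape are hypothetical, and it deliberately ignores late or out-of-order data, which a real stream processor would have to handle.

```python
from collections import defaultdict


def tumbling_window_counts(events, window_seconds):
    """Count events per fixed, non-overlapping window.

    events: iterable of (epoch_seconds, key) pairs.
    Returns a dict mapping (window_start, key) to a count.
    """
    counts = defaultdict(int)
    for ts, key in events:
        # Align the timestamp to the start of its window.
        window_start = ts - (ts % window_seconds)
        counts[(window_start, key)] += 1
    return dict(counts)


events = [(0, "a"), (30, "a"), (59, "b"), (60, "a"), (125, "b")]
counts = tumbling_window_counts(events, window_seconds=60)
```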

3. Orchestration

Airflow

Task groups for logical organization
XCom for inter-task communication
SLA monitoring and email alerts
Incremental execution with execution_date (logical_date or data_interval_start in Airflow 2.2+)
Retry with exponential backoff
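
To make the retry behaviour concrete, here is a framework-independent sketch of retry with exponential backoff. It is not Airflow's implementation; the helper names, the doubling factor, and the `sleep` parameter (injected so the example does not actually wait) are assumptions for illustration.

```python
import time


def backoff_delays(base_delay, max_retries, max_delay):
    """Delay before each retry: base, 2x base, 4x base, ... capped at max_delay."""
    return [min(base_delay * 2 ** attempt, max_delay) for attempt in range(max_retries)]


def run_with_retries(func, base_delay=1.0, max_retries=3, max_delay=60.0, sleep=time.sleep):
    """Call func, retrying on any exception up to max_retries times."""
    delays = backoff_delays(base_delay, max_retries, max_delay)
    for attempt in range(max_retries + 1):
        try:
            return func()
        except Exception:
            if attempt == max_retries:
                raise  # retries exhausted, surface the last error
            sleep(delays[attempt])


calls = {"n": 0}


def flaky():
    """Hypothetical task that fails twice, then succeeds."""
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient failure")
    return "ok"


slept = []
result = run_with_retries(flaky, base_delay=1.0, max_retries=3, sleep=slept.append)
```

In Airflow itself the equivalent behaviour is configured on the operator (`retries`, `retry_delay`, `retry_exponential_backoff`, `max_retry_delay`) rather than hand-written.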

Prefect

Task caching for idempotency
Parallel execution with .submit()
Artifacts for visibility
Automatic retries with configurable delays
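
The idea behind task caching for idempotency is that re-running a task with the same inputs returns the stored result instead of repeating the work. The sketch below shows that idea with an in-memory dictionary; it does not use Prefect's API, and `cached_task` and `load_partition` are hypothetical names.

```python
import hashlib
import json

_cache = {}


def cached_task(func):
    """Cache results keyed by the task name and a hash of its keyword inputs."""
    def wrapper(**kwargs):
        key_source = json.dumps({"task": func.__name__, "inputs": kwargs}, sort_keys=True)
        key = hashlib.sha256(key_source.encode()).hexdigest()
        if key not in _cache:
            _cache[key] = func(**kwargs)
        return _cache[key]
    return wrapper


runs = []  # records which inputs actually triggered work


@cached_task
def load_partition(date):
    runs.append(date)
    return f"loaded {date}"


first = load_partition(date="2024-01-01")
second = load_partition(date="2024-01-01")  # served from the cache
third = load_partition(date="2024-01-02")
```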

4. Transformation with dbt

Staging layer: incremental materialization, deduplication, late-arriving data handling
Marts layer: dimensional models, aggregations, business logic
Tests: unique, not_null, relationships, accepted_values, custom data quality tests
Sources: freshness checks, loaded_at_field tracking
Incremental strategy: merge or delete+insert
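
In dbt these steps are written in SQL, but the semantics of deduplication followed by a merge can be sketched in Python. The row shape, the `id` key, and the `updated_at` ordering column are assumptions for the example; the guard in `merge` is one possible way to keep a late-arriving, older row from overwriting newer data.

```python
def deduplicate_latest(rows, key, order_by):
    """Keep only the most recent row per key within a batch."""
    latest = {}
    for row in rows:
        current = latest.get(row[key])
        if current is None or row[order_by] >= current[order_by]:
            latest[row[key]] = row
    return list(latest.values())


def merge(target, batch, key, order_by):
    """Upsert batch rows into target, ignoring rows older than what is stored."""
    merged = {row[key]: row for row in target}
    for row in batch:
        existing = merged.get(row[key])
        if existing is None or row[order_by] >= existing[order_by]:
            merged[row[key]] = row
    return sorted(merged.values(), key=lambda r: r[key])


target = [
    {"id": 1, "status": "new", "updated_at": "2024-01-01"},
    {"id": 2, "status": "new", "updated_at": "2024-01-01"},
]
batch = [
    {"id": 1, "status": "stale", "updated_at": "2023-12-31"},  # late-arriving, older
    {"id": 2, "status": "paid", "updated_at": "2024-01-02"},
    {"id": 2, "status": "shipped", "updated_at": "2024-01-03"},
    {"id": 3, "status": "new", "updated_at": "2024-01-03"},
]

result = merge(target, deduplicate_latest(batch, "id", "updated_at"), "id", "updated_at")
```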

5. Data Quality Framework

Great Expectations

Table-level: row count, column count
Column-level: uniqueness, nullability, type validation, value sets, ranges
Checkpoints for validation execution
Data docs for documentation
Failure notifications
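
The table-level and column-level checks listed above can be approximated in plain Python to show what a validation suite evaluates. This is not the Great Expectations API; the check functions, the suite layout, and the sample rows are hypothetical.

```python
def check_not_null(rows, column):
    return all(row.get(column) is not None for row in rows)


def check_unique(rows, column):
    values = [row.get(column) for row in rows]
    return len(values) == len(set(values))


def check_in_set(rows, column, allowed):
    return all(row.get(column) in allowed for row in rows)


def check_between(rows, column, low, high):
    return all(low <= row[column] <= high for row in rows)


def run_suite(rows, checks):
    """Run every named check and report overall success."""
    results = {name: check(rows) for name, check in checks.items()}
    return {"success": all(results.values()), "results": results}


rows = [
    {"id": 1, "status": "paid", "amount": 20.0},
    {"id": 2, "status": "refunded", "amount": -5.0},
]

suite = {
    "row_count_at_least_1": lambda r: len(r) >= 1,
    "id_not_null": lambda r: check_not_null(r, "id"),
    "id_unique": lambda r: check_unique(r, "id"),
    "status_in_set": lambda r: check_in_set(r, "status", {"new", "paid", "shipped"}),
    "amount_in_range": lambda r: check_between(r, "amount", 0, 1_000_000),
}

report = run_suite(rows, suite)
```

A failed `report` is the point at which a pipeline would send a failure notification or halt the load.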

dbt Tests

Schema tests in YAML
Custom data quality tests with dbt-expectations
Test results tracked in metadata

6. Storage Strategy

Delta Lake

ACID transactions with append/overwrite/merge modes
Upsert with predicate-based matching
Time travel for historical queries
Optimize: compact small files, Z-order clustering
Vacuum to remove old files

Apache Iceberg

Partitioning and sort order optimization
MERGE INTO for upserts
Snapshot isolation and time travel
File compaction with binpack strategy
Snapshot expiration for cleanup
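
Bin-packing compaction groups small files so that each rewritten file approaches a target size. The sketch below uses a first-fit-decreasing heuristic to plan such groups; it is an illustration of the idea only, not Iceberg's planner, and the file sizes and 512 MB target are made-up inputs.

```python
def plan_compaction(file_sizes_mb, target_mb):
    """Group files into bins of at most target_mb (first-fit decreasing).

    A file larger than target_mb ends up alone in its own bin.
    """
    bins = []
    for size in sorted(file_sizes_mb, reverse=True):
        for group in bins:
            if sum(group) + size <= target_mb:
                group.append(size)
                break
        else:
            # No existing bin has room, so start a new one.
            bins.append([size])
    return bins


plan = plan_compaction([300, 200, 200, 100, 64, 64, 32], target_mb=512)
```

Here seven input files are planned into two output files, which is the kind of reduction compaction is meant to achieve.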

7. Monitoring & Cost Optimization

Monitoring

Track: records processed/failed, data size, execution time, success/failure rates
CloudWatch metrics and custom namespaces
SNS alerts for critical/warning/info events
Data freshness checks
Performance trend analysis
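
A small sketch of the tracked metrics and a freshness check, using only the standard library. The `RunMetrics` fields, the convention that `records_processed` counts successful records, and the one-hour freshness threshold are assumptions; publishing these values to CloudWatch or another backend is left out.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class RunMetrics:
    records_processed: int  # records that succeeded
    records_failed: int
    duration_seconds: float

    @property
    def success_rate(self):
        total = self.records_processed + self.records_failed
        return 1.0 if total == 0 else self.records_processed / total


def is_fresh(last_loaded_at, max_age, now=None):
    """True if the most recent load is no older than max_age."""
    now = now or datetime.now(timezone.utc)
    return now - last_loaded_at <= max_age


metrics = RunMetrics(records_processed=9_950, records_failed=50, duration_seconds=42.0)

now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
fresh = is_fresh(datetime(2024, 1, 1, 11, 30, tzinfo=timezone.utc), timedelta(hours=1), now=now)
stale = is_fresh(datetime(2024, 1, 1, 9, 0, tzinfo=timezone.utc), timedelta(hours=1), now=now)
```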

Cost Optimization

Partitioning: date/entity-based; avoid over-partitioning (keep each partition larger than about 1 GB)
File sizes: 512 MB to 1 GB for Parquet
Lifecycle policies: hot (Standard) → warm (IA) → cold (Glacier)
Compute: spot instances for batch, on-demand for streaming, serverless for ad hoc queries
Query optimization: partition pruning, clustering, predicate pushdown
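
The hot, warm, and cold lifecycle tiers amount to a rule on object age. The sketch below expresses that rule in Python; the 30-day and 90-day thresholds are placeholder assumptions, not values from the skill, and in practice the rule would live in the storage service's lifecycle configuration rather than in application code.

```python
from datetime import date


def storage_tier(last_modified, today, warm_after_days=30, cold_after_days=90):
    """Classify an object as hot, warm, or cold by its age in days."""
    age_days = (today - last_modified).days
    if age_days >= cold_after_days:
        return "cold"
    if age_days >= warm_after_days:
        return "warm"
    return "hot"


today = date(2024, 6, 1)
tiers = [
    storage_tier(d, today)
    for d in (date(2024, 5, 20), date(2024, 4, 15), date(2024, 1, 1))
]
```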

Example: Minimal Batch Pipeline

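# Batch ingestion with validation
# NOTE: batch_ingestion, storage.delta_lake_manager, and data_quality.expectations_suite
# are assumed to be project-local modules, not published packages.
from batch_ingestion import BatchDataIngester
from storage.delta_lake_manager import DeltaLakeManager
from data_quality.expectations_suite import DataQualityFramework

ingester = BatchDataIngester(config={})

# Watermark of the previous successful run (placeholder value; load it from pipeline state)
last_run_timestamp = '2024-01-01T00:00:00'

# Extract with incremental loading
df = ingester.extract_from_database(
    connection_string='postgresql://host:5432/db',
    query='SELECT * FROM orders',
    watermark_column='updated_at',
    last_watermark=last_run_timestamp
)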

# Validate
schema = {'required_fields': ['id', 'user_id'], 'dtypes': {'id': 'int64'}}
df = ingester.validate_and_clean(df, schema)

# Data quality checks
dq = DataQualityFramework()
result = dq.validate_dataframe(df, suite_name='orders_suite', data_asset_name='orders')

# Write to Delta Lake
delta_mgr = DeltaLakeManager(storage_path='s3://lake')
delta_mgr.create_or_update_table(
    df=df,
    table_name='orders',
    partition_columns=['order_date'],
    mode='append'
)

# Save failed records
ingester.save_dead_letter_queue('s3://lake/dlq/orders')

Output Deliverables

1. Architecture Documentation

Architecture diagram with data flow
Technology stack with justification
Scalability analysis and growth patterns
Failure modes and recovery strategies

2. Implementation Code

Ingestion: batch/streaming with error handling
Transformation: dbt models (staging → marts) or Spark jobs
Orchestration: Airflow/Prefect DAGs with dependencies
Storage: Delta/Iceberg table management
Data quality: Great Expectations suites and dbt tests

3. Configuration Files

Orchestration: DAG definitions, schedules, retry policies
dbt: models, sources, tests, project config
Infrastructure: Docker Compose, K8s manifests, Terraform
Environment: dev/staging/prod configs

4. Monitoring & Observability

Metrics: execution time, records processed, quality scores
Alerts: failures, performance degradation, data freshness
Dashboards: Grafana/CloudWatch for pipeline health
Logging: structured logs with correlation IDs
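
Structured logs with correlation IDs can be produced with the standard `logging` module. In this sketch each pipeline run gets one UUID that is attached to every log line through a `LoggerAdapter`; the `JsonFormatter` class, the logger name, and the JSON field names are choices made for the example, and output goes to an in-memory buffer so the lines can be inspected.

```python
import io
import json
import logging
import uuid


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object per line."""

    def format(self, record):
        return json.dumps({
            "level": record.levelname,
            "message": record.getMessage(),
            "correlation_id": getattr(record, "correlation_id", None),
        })


stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(JsonFormatter())

logger = logging.getLogger("pipeline_sketch")
logger.setLevel(logging.INFO)
logger.addHandler(handler)
logger.propagate = False  # keep the example's output out of the root logger

# One correlation ID per pipeline run, attached to every message.
correlation_id = str(uuid.uuid4())
run_logger = logging.LoggerAdapter(logger, {"correlation_id": correlation_id})
run_logger.info("extract started")
run_logger.info("load finished")

lines = [json.loads(line) for line in stream.getvalue().splitlines()]
```

Filtering logs on a single `correlation_id` then yields every step of one run, which is what makes failures traceable across tasks.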

5. Operations Guide

Deployment procedures and rollback strategy
Troubleshooting guide for common issues
Scaling guide for increased volume
Cost optimization strategies and savings
Disaster recovery and backup procedures

Success Criteria

Pipeline meets defined SLA (latency, throughput)
Data quality checks pass with >99% success rate
Automatic retry and alerting on failures
Comprehensive monitoring shows health and performance
Documentation enables team maintenance
Cost optimization reduces infrastructure costs by 30-50%
Schema evolution without downtime
End-to-end data lineage tracked

how to use data-engineering-data-pipeline

How to use data-engineering-data-pipeline on Cursor

AI-first code editor with Composer

Prerequisites

Before installing skills in Cursor, ensure your development environment meets these requirements:

› Cursor installed and configured on your development machine
› Node.js version 16.0+ with npm package manager (verify with `node --version`)
› Active project directory or workspace where you want to add data-engineering-data-pipeline

Execute installation command

Execute the skills CLI command in your project's root directory to begin installation:

$ npx skills add https://github.com/sickn33/antigravity-awesome-skills --skill data-engineering-data-pipeline

The skills CLI fetches data-engineering-data-pipeline from GitHub repository sickn33/antigravity-awesome-skills and configures it for Cursor.

Select Cursor when prompted

The CLI will show a list of available agents. Use arrow keys to navigate and space to select Cursor:

◆ Which agents do you want to install to?
│
│ ── Universal (.agents/skills) ── always included ────
│ • Amp
│ • Antigravity
│ • Cline
│ • Codex
│ ● Cursor (selected)
│ • Windsurf

Verify installation

Confirm successful installation by checking the skill directory location:

.cursor/skills/data-engineering-data-pipeline

Reload or restart Cursor to activate data-engineering-data-pipeline. Access the skill through slash commands (e.g., /data-engineering-data-pipeline) or your agent's skill management interface.

⚠ Security & Verification Notice

We perform automated surface-level scans (Gen AI Scanner, Socket, Snyk) during installation. These checks detect common vulnerabilities but do not guarantee complete security.

Skills execute code in your development environment. Always review the skill's source code and recent commits, verify the publisher's identity and reputation, and test in an isolated environment before production deployment.



Use Cases

User Story & Requirements Generation

Create detailed user stories, acceptance criteria, and feature specs

Example

Generate user stories for 'password reset feature' with acceptance criteria, edge cases, and test scenarios

✓ Reduce spec writing time by 50%, ensure comprehensive coverage

Competitive Analysis

Research competitors, compare features, identify gaps

Example

Analyze 5 competitor products, create feature comparison matrix, suggest differentiation opportunities

✓ Complete competitive research in 2 hours instead of 2 days

Roadmap Prioritization

Evaluate features using frameworks (RICE, ICE, Kano) and create prioritized backlogs

Example

Score 20 feature ideas using RICE framework, generate prioritized roadmap with rationale

✓ Make data-driven prioritization decisions faster

Stakeholder Communication

Draft PRDs, status updates, and stakeholder presentations

Example

Create executive summary of Q3 roadmap, monthly progress report, feature launch announcement

✓ Save 3-5 hours/week on communication overhead

Implementation Guide

Prerequisites

› Claude Desktop or compatible AI client
› Access to product documentation and roadmap tools (Jira, Notion, etc.)
› Understanding of product management frameworks (RICE, Jobs-to-be-Done, etc.)
› Stakeholder contact information and communication channels

Time Estimate

30-60 minutes to see productivity improvements

Installation Steps

1. Install product management skill
2. Start with user story generation for known feature
3. Progress to competitive analysis: research 2-3 competitors
4. Use for roadmap prioritization: apply RICE/ICE scoring
5. Draft stakeholder communications and refine based on feedback
6. Build template library for recurring PM tasks
7. Share effective prompts with product team

Common Pitfalls

⚠ Not validating competitive research; verify facts before sharing
⚠ Accepting user stories without involving engineering team
⚠ Over-relying on frameworks without qualitative judgment
⚠ Not customizing outputs to company culture and communication style
⚠ Skipping stakeholder validation of generated requirements

Best Practices

✓ Do

+ Validate research and competitive analysis with real data
+ Collaborate with engineering when generating technical requirements
+ Customize frameworks and templates to your company context
+ Use skill for first drafts, refine with stakeholder input
+ Document successful prompt patterns for PM tasks
+ Combine AI efficiency with human judgment and intuition

✗ Don't

− Don't publish competitive analysis without fact-checking
− Don't finalize user stories without engineering review
− Don't make prioritization decisions solely on AI scoring
− Don't skip customer validation of generated requirements
− Don't ignore company-specific context and culture

💡 Pro Tips

★ Provide context: company goals, constraints, customer feedback
★ Ask for alternatives: 'Show 3 ways to prioritize this roadmap'
★ Request stakeholder-specific formatting: 'Executive summary vs. engineering spec'
★ Use skill for 70% generation + 30% customization to company needs

When to Use This

✓ Use When

Use for user story writing, competitive research, roadmap prioritization, stakeholder communication, and PRD drafting. Best for reducing repetitive documentation and research work.

✗ Avoid When

Avoid for strategic product vision (requires deep customer empathy), pricing decisions (needs market and financial expertise), or when face-to-face customer discovery is more valuable than speed.

Learning Path

1. Basic: user stories, feature specs, status updates
2. Intermediate: competitive analysis, prioritization frameworks, PRDs
3. Advanced: product strategy, go-to-market planning, OKR setting
4. Expert: product vision, market positioning, business model innovation


general reviews

Ratings

4.6 ★★★★★ · 56 reviews

★★★★★ Shikha Mishra · Dec 28, 2024
Keeps context tight: data-engineering-data-pipeline is the kind of skill you can hand to a new teammate without a long onboarding doc.
★★★★★ Isabella Huang · Dec 28, 2024
data-engineering-data-pipeline is among the better-maintained entries we tried; worth keeping pinned for repeat workflows.
★★★★★ Ganesh Mohane · Dec 24, 2024
Useful defaults in data-engineering-data-pipeline — fewer surprises than typical one-off scripts, and it plays nicely with `npx skills` flows.
★★★★★ Luis Iyer · Dec 12, 2024
data-engineering-data-pipeline is among the better-maintained entries we tried; worth keeping pinned for repeat workflows.
★★★★★ Sophia Ramirez · Dec 12, 2024
I recommend data-engineering-data-pipeline for anyone iterating fast on agent tooling; clear intent and a small, reviewable surface area.
★★★★★ Yusuf Ghosh · Dec 8, 2024
Keeps context tight: data-engineering-data-pipeline is the kind of skill you can hand to a new teammate without a long onboarding doc.
★★★★★ Noor Martinez · Nov 27, 2024
Registry listing for data-engineering-data-pipeline matched our evaluation — installs cleanly and behaves as described in the markdown.
★★★★★ Yash Thakker · Nov 19, 2024
Registry listing for data-engineering-data-pipeline matched our evaluation — installs cleanly and behaves as described in the markdown.
★★★★★ Omar Johnson · Nov 19, 2024
Useful defaults in data-engineering-data-pipeline — fewer surprises than typical one-off scripts, and it plays nicely with `npx skills` flows.
★★★★★ Sophia Robinson · Nov 3, 2024
data-engineering-data-pipeline reduced setup friction for our internal harness; good balance of opinion and flexibility.
