senior-data-engineer▌
alirezarezvani/claude-skills · updated Apr 8, 2026
Production-grade data engineering skill for building scalable, reliable data systems.
Senior Data Engineer
Production-grade data engineering skill for building scalable, reliable data systems.
Table of Contents
- Trigger Phrases
- Quick Start
- Workflows
- Architecture Decision Framework
- Tech Stack
- Reference Documentation
- Troubleshooting
Trigger Phrases
Activate this skill when you see:
Pipeline Design:
- "Design a data pipeline for..."
- "Build an ETL/ELT process..."
- "How should I ingest data from..."
- "Set up data extraction from..."
Architecture:
- "Should I use batch or streaming?"
- "Lambda vs Kappa architecture"
- "How to handle late-arriving data"
- "Design a data lakehouse"
Data Modeling:
- "Create a dimensional model..."
- "Star schema vs snowflake"
- "Implement slowly changing dimensions"
- "Design a data vault"
Data Quality:
- "Add data validation to..."
- "Set up data quality checks"
- "Monitor data freshness"
- "Implement data contracts"
Performance:
- "Optimize this Spark job"
- "Query is running slow"
- "Reduce pipeline execution time"
- "Tune Airflow DAG"
Quick Start
Core Tools
# Generate pipeline orchestration config
python scripts/pipeline_orchestrator.py generate \
--type airflow \
--source postgres \
--destination snowflake \
--schedule "0 5 * * *"
# Validate data quality
python scripts/data_quality_validator.py validate \
--input data/sales.parquet \
--schema schemas/sales.json \
--checks freshness,completeness,uniqueness
# Optimize ETL performance
python scripts/etl_performance_optimizer.py analyze \
--query queries/daily_aggregation.sql \
--engine spark \
--recommend
Workflows
→ See references/workflows.md for details
Architecture Decision Framework
Use this framework to choose the right approach for your data pipeline.
Batch vs Streaming
| Criteria | Batch | Streaming |
|---|---|---|
| Latency requirement | Hours to days | Seconds to minutes |
| Data volume | Large historical datasets | Continuous event streams |
| Processing complexity | Complex transformations, ML | Simple aggregations, filtering |
| Cost sensitivity | More cost-effective | Higher infrastructure cost |
| Error handling | Easier to reprocess | Requires careful design |
Decision Tree:
Is real-time insight required?
├── Yes → Use streaming
│ └── Is exactly-once semantics needed?
│ ├── Yes → Kafka + Flink/Spark Structured Streaming
│ └── No → Kafka + consumer groups
└── No → Use batch
└── Is data volume > 1TB daily?
├── Yes → Spark/Databricks
└── No → dbt + warehouse compute
Lambda vs Kappa Architecture
| Aspect | Lambda | Kappa |
|---|---|---|
| Complexity | Two codebases (batch + stream) | Single codebase |
| Maintenance | Higher (sync batch/stream logic) | Lower |
| Reprocessing | Native batch layer | Replay from source |
| Use case | ML training + real-time serving | Pure event-driven |
When to choose Lambda:
- Need to train ML models on historical data
- Complex batch transformations not feasible in streaming
- Existing batch infrastructure
When to choose Kappa:
- Event-sourced architecture
- All processing can be expressed as stream operations
- Starting fresh without legacy systems
Data Warehouse vs Data Lakehouse
| Feature | Warehouse (Snowflake/BigQuery) | Lakehouse (Delta/Iceberg) |
|---|---|---|
| Best for | BI, SQL analytics | ML, unstructured data |
| Storage cost | Higher (proprietary format) | Lower (open formats) |
| Flexibility | Schema-on-write | Schema-on-read |
| Performance | Excellent for SQL | Good, improving |
| Ecosystem | Mature BI tools | Growing ML tooling |
Tech Stack
| Category | Technologies |
|---|---|
| Languages | Python, SQL, Scala |
| Orchestration | Airflow, Prefect, Dagster |
| Transformation | dbt, Spark, Flink |
| Streaming | Kafka, Kinesis, Pub/Sub |
| Storage | S3, GCS, Delta Lake, Iceberg |
| Warehouses | Snowflake, BigQuery, Redshift, Databricks |
| Quality | Great Expectations, dbt tests, Monte Carlo |
| Monitoring | Prometheus, Grafana, Datadog |
Reference Documentation
1. Data Pipeline Architecture
See references/data_pipeline_architecture.md for:
- Lambda vs Kappa architecture patterns
- Batch processing with Spark and Airflow
- Stream processing with Kafka and Flink
- Exactly-once semantics implementation
- Error handling and dead letter queues
2. Data Modeling Patterns
See references/data_modeling_patterns.md for:
- Dimensional modeling (Star/Snowflake)
- Slowly Changing Dimensions (SCD Types 1-6)
- Data Vault modeling
- dbt best practices
- Partitioning and clustering
3. DataOps Best Practices
See references/dataops_best_practices.md for:
- Data testing frameworks
- Data contracts and schema validation
- CI/CD for data pipelines
- Observability and lineage
- Incident response
Troubleshooting
→ See references/troubleshooting.md for details
Discussion
Product Hunt–style comments (not star reviews)- No comments yet — start the thread.
Ratings
4.5★★★★★74 reviews- ★★★★★Maya Rao· Dec 20, 2024
senior-data-engineer reduced setup friction for our internal harness; good balance of opinion and flexibility.
- ★★★★★Alexander Kapoor· Dec 20, 2024
I recommend senior-data-engineer for anyone iterating fast on agent tooling; clear intent and a small, reviewable surface area.
- ★★★★★Dhruvi Jain· Dec 16, 2024
We added senior-data-engineer from the explainx registry; install was straightforward and the SKILL.md answered most questions upfront.
- ★★★★★Isabella Torres· Dec 12, 2024
senior-data-engineer has been reliable in day-to-day use. Documentation quality is above average for community skills.
- ★★★★★Kiara Taylor· Dec 8, 2024
Registry listing for senior-data-engineer matched our evaluation — installs cleanly and behaves as described in the markdown.
- ★★★★★Kabir Li· Dec 8, 2024
senior-data-engineer fits our agent workflows well — practical, well scoped, and easy to wire into existing repos.
- ★★★★★Alexander Shah· Dec 4, 2024
senior-data-engineer fits our agent workflows well — practical, well scoped, and easy to wire into existing repos.
- ★★★★★Maya Rahman· Dec 4, 2024
senior-data-engineer is among the better-maintained entries we tried; worth keeping pinned for repeat workflows.
- ★★★★★Noor Wang· Nov 27, 2024
senior-data-engineer reduced setup friction for our internal harness; good balance of opinion and flexibility.
- ★★★★★Maya Mehta· Nov 23, 2024
Keeps context tight: senior-data-engineer is the kind of skill you can hand to a new teammate without a long onboarding doc.
showing 1-10 of 74