spark
5 indexed skills · max 10 per page
spark-engineer
jeffallan/claude-skills · Productivity
Expert Apache Spark engineer for distributed data processing, ETL pipeline optimization, and production-grade big data applications.
Covers the DataFrame API, Spark SQL, RDD operations, and Structured Streaming, with explicit schema definitions and lazy evaluation patterns
Provides partitioning strategies, broadcast join optimization, data skew handling via salting, and caching best practices for large-scale workloads
Includes performance tuning guidance: shuffle partition configuration,
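Salting for skewed joins, mentioned above, can be shown without a cluster. The sketch below is an independent pure-Python simulation, not code from the skill and not Spark's API. Each row on the large, skewed side gets a random salt. Each row on the small side is copied once per salt value. The two sides are then joined on (key, salt), so one hot key is split into several smaller groups. The names `salt_large_side`, `explode_small_side`, and `join_on_salted_key` are hypothetical.

```python
import random
from collections import Counter

# Pure-Python simulation of key salting; this is not Spark code.


def salt_large_side(rows, num_salts, rng):
    """Pair each row of the large, skewed side with a random salt in [0, num_salts)."""
    return [((key, rng.randrange(num_salts)), value) for key, value in rows]


def explode_small_side(rows, num_salts):
    """Copy each small-side row once per salt so every salted key still finds its match."""
    return [((key, salt), value) for key, value in rows for salt in range(num_salts)]


def join_on_salted_key(left, right):
    """In-memory hash join on the composite (key, salt); emits (key, left_value, right_value)."""
    index = {}
    for salted_key, value in right:
        index.setdefault(salted_key, []).append(value)
    return [
        (salted_key[0], left_value, right_value)
        for salted_key, left_value in left
        for right_value in index.get(salted_key, [])
    ]


if __name__ == "__main__":
    large = [("hot", i) for i in range(1000)] + [("cold", i) for i in range(10)]
    small = [("hot", "H"), ("cold", "C")]
    salted = salt_large_side(large, num_salts=8, rng=random.Random(0))
    joined = join_on_salted_key(salted, explode_small_side(small, num_salts=8))
    # The "hot" key is now spread over up to 8 (key, salt) groups instead of one.
    print(Counter(salt for (key, salt), _ in salted if key == "hot"))
    print(len(joined))  # same row count as an unsalted join
```

In PySpark, the same idea is usually written as a random salt column on the large side and `explode` over an array of salt values on the small side. Salting trades extra copies of the small side for a more even spread of work across tasks.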
spark-optimization
sickn33/antigravity-awesome-skills · Productivity
Production patterns for optimizing Apache Spark jobs including partitioning strategies, memory management, shuffle optimization, and performance tuning.
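One part of shuffle tuning is choosing how many shuffle partitions to use. The sketch below is a rough estimate, not taken from the skill. It assumes a target of about 128 MiB per partition, which is a common rule of thumb and not a Spark default. It can also round up to a multiple of the cluster's total cores. The function name `estimate_shuffle_partitions` is hypothetical.

```python
import math

MIB = 1024 ** 2
GIB = 1024 ** 3


def estimate_shuffle_partitions(shuffle_bytes, target_partition_bytes=128 * MIB, total_cores=None):
    """Estimate a value for spark.sql.shuffle.partitions from the expected shuffle size.

    target_partition_bytes is an assumed rule-of-thumb size, not a Spark default.
    """
    if shuffle_bytes <= 0:
        return 1
    partitions = math.ceil(shuffle_bytes / target_partition_bytes)
    if total_cores:
        # Round up to whole task waves so the last wave fills every core
        # (assuming tasks of similar duration).
        partitions = math.ceil(partitions / total_cores) * total_cores
    return partitions


if __name__ == "__main__":
    print(estimate_shuffle_partitions(50 * GIB))                  # 400
    print(estimate_shuffle_partitions(50 * GIB, total_cores=96))  # 480
```

In PySpark, the result could be applied with `spark.conf.set("spark.sql.shuffle.partitions", n)`; Spark's default is 200. With adaptive query execution enabled, Spark 3.x can also coalesce small shuffle partitions at runtime. That makes a static estimate like this one a starting point rather than a final value.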
super-swarm-spark
am-will/codex-skills · Productivity
Orchestrates parallel task execution across up to 12 concurrent Sparky subagents using a rolling pool scheduler.
Parses Markdown plan files, extracts task definitions, and launches subagents continuously as slots open, without waiting for batch completion
Maintains canonical file paths and naming constraints across parallel tasks to prevent filename drift and cross-task conflicts
Validates each subagent result, updates the plan file with completion logs, and immediately schedules the
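A rolling pool scheduler differs from batch scheduling in one way: a freed slot is refilled right away instead of waiting for the whole batch to finish. The sketch below shows that idea with `concurrent.futures`. It is an assumption-based illustration, not the skill's implementation. `worker` stands in for launching a Sparky subagent, and `run_rolling_pool` is a hypothetical name.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait

_DONE = object()  # sentinel so tasks themselves may be any hashable value, including None


def run_rolling_pool(tasks, worker, max_concurrent=12):
    """Run worker(task) for every task, keeping up to max_concurrent in flight.

    A new task starts as soon as any running one finishes, rather than
    waiting for a whole batch to drain.
    """
    queue = iter(tasks)
    results = {}
    with ThreadPoolExecutor(max_workers=max_concurrent) as pool:
        in_flight = {}

        def launch_next():
            task = next(queue, _DONE)
            if task is not _DONE:
                in_flight[pool.submit(worker, task)] = task

        # Fill every slot once up front.
        for _ in range(max_concurrent):
            launch_next()
        while in_flight:
            done, _ = wait(set(in_flight), return_when=FIRST_COMPLETED)
            for future in done:
                task = in_flight.pop(future)
                results[task] = future.result()  # re-raises a worker's exception
                launch_next()  # refill the freed slot immediately
    return results
```

Threads fit here because a scheduler like this mostly waits on external work. For CPU-heavy workers, a `ProcessPoolExecutor` could be used instead. Validation and plan-file updates, as described above, would go where each result is collected.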
parallel-task-spark
am-will/codex-skills · Productivity
Orchestrates parallel development tasks with dependency management and test-driven validation.
Parses Markdown plan files to extract task definitions, dependencies, and acceptance criteria, then launches unblocked tasks in parallel waves using Sparky subagents
Enforces test-driven development (RED phase first) for testable tasks, with fallback to documented non-testable verification (manual, static, or runtime checks)
Manages task dependencies automatically, blocking tasks until their
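Launching unblocked tasks in parallel waves amounts to a level-by-level topological sort. The sketch below is one way to compute such waves from a dependency map, assuming each plan task lists the task ids it depends on. It is a standard Kahn-style grouping, not the skill's own parser. `dependency_waves` and the sample plan are hypothetical.

```python
def dependency_waves(deps):
    """Group tasks into waves that can each run fully in parallel.

    deps maps each task id to the task ids it depends on. A task joins the
    first wave in which all of its dependencies have already completed.
    """
    remaining = {task: set(needs) for task, needs in deps.items()}
    unknown = set().union(*remaining.values()) - set(remaining)
    if unknown:
        raise ValueError(f"unknown dependencies: {sorted(unknown)}")
    done, waves = set(), []
    while remaining:
        ready = sorted(task for task, needs in remaining.items() if needs <= done)
        if not ready:
            # Nothing is unblocked but tasks remain: the graph has a cycle.
            raise ValueError(f"dependency cycle among: {sorted(remaining)}")
        waves.append(ready)
        done.update(ready)
        for task in ready:
            del remaining[task]
    return waves


if __name__ == "__main__":
    plan = {
        "schema": [],
        "docs": [],
        "api": ["schema"],
        "ui": ["api"],
        "tests": ["schema", "api"],
    }
    print(dependency_waves(plan))
    # [['docs', 'schema'], ['api'], ['tests', 'ui']]
```

Strict waves are simple to reason about, but a slow task holds back the whole next wave. A finer-grained alternative starts each task as soon as its own dependencies finish.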
spark-optimization
wshobson/agents · Productivity
Apache Spark job optimization through partitioning, memory tuning, shuffle reduction, and join strategies.
Covers partitioning strategies, broadcast joins, bucketed joins, and skew handling with salting techniques to minimize shuffle overhead
Includes caching and persistence patterns with storage level selection, checkpointing for complex lineages, and a memory configuration breakdown
Provides data format optimization for Parquet and Delta Lake, column pruning, predicate pushdown, and
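The memory configuration breakdown mentioned above can be estimated as simple arithmetic over Spark's unified memory model. The sketch below follows that model:

- 300 MiB of the executor heap is reserved.
- `spark.memory.fraction` (default 0.6) of the remainder is shared by execution and storage.
- `spark.memory.storageFraction` (default 0.5) of that shared region is storage memory immune to eviction.
- The rest of the usable heap is left for user data structures.

It also adds the usual executor memory overhead rule: the larger of 384 MiB and 10% of executor memory, requested on top of the heap. These defaults are assumptions to check against your Spark version and cluster manager. `executor_memory_breakdown` is a hypothetical helper, not part of the skill.

```python
RESERVED_MIB = 300  # fixed reserved heap in Spark's unified memory manager


def executor_memory_breakdown(executor_memory_mib, memory_fraction=0.6,
                              storage_fraction=0.5, overhead_factor=0.10,
                              min_overhead_mib=384):
    """Approximate how one executor's memory is divided, in MiB.

    Defaults mirror spark.memory.fraction, spark.memory.storageFraction and the
    usual executor memory overhead rule; verify them for your Spark version.
    """
    usable = executor_memory_mib - RESERVED_MIB
    if usable <= 0:
        raise ValueError("executor heap must exceed the reserved region")
    unified = usable * memory_fraction  # shared by execution and storage
    return {
        # Non-heap container memory requested on top of the heap (YARN/Kubernetes).
        "overhead": max(min_overhead_mib, executor_memory_mib * overhead_factor),
        "reserved": RESERVED_MIB,
        "user": usable * (1 - memory_fraction),  # user data structures, UDF objects
        "unified": unified,
        "storage_protected": unified * storage_fraction,  # cached blocks immune to eviction
    }


if __name__ == "__main__":
    for region, mib in executor_memory_breakdown(8 * 1024).items():
        print(f"{region:>18}: {mib:8.1f} MiB")
```

Execution and storage borrow from each other inside the unified region. The storage share is therefore a floor that execution cannot evict below, not a hard partition.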