dummy-dataset

phuryn/pm-skills · updated Apr 8, 2026

$ npx skills add https://github.com/phuryn/pm-skills --skill dummy-dataset
summary

Generate realistic dummy datasets for testing with customizable columns, constraints, and output formats (CSV, JSON, SQL, Python script). Creates executable scripts or direct data files for immediate use.

skill.md

Dummy Dataset Generation

Use when: Creating test data, generating sample datasets, building realistic mock data for development, or populating test environments.

Arguments:

  • $PRODUCT: The product or system name
  • $DATASET_TYPE: Type of data (e.g., customer feedback, transactions, user profiles)
  • $ROWS: Number of rows to generate (default: 100)
  • $COLUMNS: Specific columns or fields to include
  • $FORMAT: Output format (CSV, JSON, SQL, Python script)
  • $CONSTRAINTS: Additional constraints or business rules
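
One way to carry these arguments through a generator script is a small dataclass. This is only a sketch: `DatasetSpec`, its field names, and the CSV default for the format are assumptions made for illustration, and only the default of 100 rows comes from the list above.

```python
from dataclasses import dataclass, field

# Mirrors the $FORMAT options listed above
ALLOWED_FORMATS = {"csv", "json", "sql", "python"}


@dataclass
class DatasetSpec:
    """Hypothetical container for the skill's arguments (not part of the skill)."""
    product: str
    dataset_type: str
    rows: int = 100                      # documented default for $ROWS
    columns: list[str] = field(default_factory=list)
    output_format: str = "csv"           # assumed default; the skill does not state one
    constraints: list[str] = field(default_factory=list)

    def __post_init__(self):
        # Reject values that could not produce a usable dataset
        if self.rows < 0:
            raise ValueError("rows must be non-negative")
        self.output_format = self.output_format.lower()
        if self.output_format not in ALLOWED_FORMATS:
            raise ValueError(f"unsupported format: {self.output_format}")


spec = DatasetSpec(
    product="Acme Shop",
    dataset_type="customer feedback",
    columns=["feedback_id", "rating"],
    output_format="JSON",
)
print(spec.rows, spec.output_format)
```

Validating the arguments once, up front, keeps the later generation steps free of format and range checks.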

Step-by-Step Process

  1. Identify dataset type - Understand the data domain
  2. Define column specifications - Names, data types, and value ranges
  3. Determine row count - How many sample records are needed
  4. Select output format - CSV, JSON, SQL INSERT, or Python script
  5. Apply realistic patterns - Ensure data looks authentic and valid
  6. Add business constraints - Respect business logic and relationships
  7. Generate or script data - Create executable output
  8. Validate output - Ensure data quality and completeness
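
Step 8 can be as light as a handful of checks over the generated rows. The sketch below assumes rows are plain dictionaries; `validate_rows`, its parameters, and the particular checks (required fields, unique key, loose email format) are illustrative choices, not something the skill prescribes.

```python
import re

# Loose format check only; it does not prove the address exists
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def validate_rows(rows, required, unique_key="id"):
    """Return a list of human-readable problems; an empty list means the data passed."""
    problems = []
    seen = set()
    for n, row in enumerate(rows, start=1):
        # Completeness: every required column must be present and non-empty
        for col in required:
            if row.get(col) in (None, ""):
                problems.append(f"row {n}: missing {col}")
        # Uniqueness: the key column must not repeat
        key = row.get(unique_key)
        if key in seen:
            problems.append(f"row {n}: duplicate {unique_key} {key!r}")
        seen.add(key)
        # Format: only checked when an email value is present
        email = row.get("email")
        if email and not EMAIL_RE.match(email):
            problems.append(f"row {n}: malformed email {email!r}")
    return problems


sample = [
    {"id": "U000001", "name": "Ana Diaz", "email": "ana.diaz@example.com"},
    {"id": "U000001", "name": "", "email": "not-an-email"},
]
for problem in validate_rows(sample, required=["id", "name", "email"]):
    print(problem)
```

Returning a list of problems rather than raising on the first one makes it easier to see every violation in a single run.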

Template: Python Script Output

import csv
import json
from datetime import datetime, timedelta
import random

# Configuration ($ROWS and $DATASET_TYPE are placeholders; substitute real values before running)
ROWS = $ROWS
FILENAME = "$DATASET_TYPE.csv"

# Column definitions with realistic value generators
columns = {
    "id": "auto-increment",
    "name": "first_last_name",
    "email": "email",
    "created_at": "timestamp",
    # Add more columns...
}

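# Illustrative sample pools; extend these or swap in your own generators
FIRST_NAMES = ["Alice", "Bruno", "Chen", "Divya", "Emma"]
LAST_NAMES = ["Garcia", "Khan", "Nguyen", "Smith", "Weber"]

def generate_dataset():
    """Generate realistic dummy dataset (one dict per row)"""
    data = []
    now = datetime.now()
    for i in range(1, ROWS + 1):
        first, last = random.choice(FIRST_NAMES), random.choice(LAST_NAMES)
        created = now - timedelta(days=random.randint(0, 90))  # example window
        record = {
            "id": f"U{i:06d}",
            "name": f"{first} {last}",
            "email": f"{first}.{last}{i}@example.com".lower(),
            "created_at": created.isoformat(timespec="seconds"),
            # Add more columns...
        }
        data.append(record)
    return data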

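def save_as_csv(data, filename):
    """Save dataset as CSV (writes nothing if data is empty)"""
    if not data:
        return  # data[0] below would raise IndexError on an empty list
    with open(filename, 'w', newline='', encoding='utf-8') as f:
        writer = csv.DictWriter(f, fieldnames=list(data[0].keys()))
        writer.writeheader()
        writer.writerows(data)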

if __name__ == "__main__":
    dataset = generate_dataset()
    save_as_csv(dataset, FILENAME)
    print(f"Generated {len(dataset)} records in {FILENAME}")

Example Dataset Specification

Dataset Type: Customer Feedback

Columns:

  • feedback_id (auto-increment, U001, U002...)
  • customer_name (realistic names)
  • email (valid email format)
  • feedback_date (dates within the last 90 days)
  • rating (1-5 stars)
  • category (Bug, Feature Request, Complaint, Praise)
  • text (realistic feedback)
  • product (electronics, clothing, home)

Constraints:

  • Ratings skewed: 40% 5-star, 30% 4-star, 20% 3-star, 10% 1-2 star
  • Bug category only with ratings 1-3
  • Feature requests only with ratings 3-5
  • Email domains realistic (gmail, yahoo, company.com)
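
The rating skew and the two category rules above could be implemented roughly as follows. This is a sketch under stated assumptions: the 10% share for 1-2 stars is split evenly between the two ratings, Complaint and Praise are treated as unrestricted because the specification places no rule on them, and `generate_feedback` and `allowed_categories` are hypothetical helper names.

```python
import random

RATINGS = [5, 4, 3, 2, 1]
WEIGHTS = [40, 30, 20, 5, 5]  # assumption: the 10% for 1-2 stars is split evenly

CATEGORIES = ["Bug", "Feature Request", "Complaint", "Praise"]


def allowed_categories(rating):
    """Categories permitted for a rating under the two stated rules."""
    allowed = []
    for category in CATEGORIES:
        if category == "Bug" and rating > 3:
            continue  # Bug only with ratings 1-3
        if category == "Feature Request" and rating < 3:
            continue  # Feature requests only with ratings 3-5
        allowed.append(category)
    return allowed


def generate_feedback(n, seed=None):
    """Generate n feedback rows with a skewed rating and a compatible category."""
    rng = random.Random(seed)  # seeded for reproducible test data
    rows = []
    for i in range(1, n + 1):
        rating = rng.choices(RATINGS, weights=WEIGHTS, k=1)[0]
        category = rng.choice(allowed_categories(rating))
        rows.append({"feedback_id": f"U{i:03d}", "rating": rating, "category": category})
    return rows


rows = generate_feedback(1000, seed=42)
share_5 = sum(r["rating"] == 5 for r in rows) / len(rows)
print(f"5-star share: {share_5:.2f}")
```

Sampling the rating first and then choosing a compatible category keeps the rating distribution at the specified weights; choosing the category first would distort it.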

Output Deliverables

  • Ready-to-execute Python script OR direct data file
  • CSV file with proper headers and formatting
  • JSON file with valid structure and types
  • SQL INSERT statements for database population
  • Data validation and constraint compliance
  • Realistic, business-appropriate values
  • Documentation of data generation logic
  • Quick-start instructions for using the dataset

Output Formats

CSV: Flat tabular format, easy to import into spreadsheets and databases

JSON: Nested structure, ideal for APIs and NoSQL databases

SQL: INSERT statements, directly executable on relational databases

Python Script: Executable generator for custom or large datasets
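
To illustrate how the same records map onto two of these formats, here is a minimal sketch that serializes a list of dictionaries as JSON and as SQL INSERT statements. The helper names (`to_json`, `sql_literal`, `to_sql_inserts`) and the `feedback` table are hypothetical. The SQL quoting is deliberately simple: it doubles single quotes in values but does not escape table or column names, and boolean literals vary by database dialect.

```python
import json


def to_json(rows):
    """Serialize rows as a JSON array of objects."""
    return json.dumps(rows, indent=2)


def sql_literal(value):
    """Render a Python value as a SQL literal (strings quoted, quotes doubled)."""
    if value is None:
        return "NULL"
    if isinstance(value, bool):
        return "TRUE" if value else "FALSE"  # dialect-dependent
    if isinstance(value, (int, float)):
        return str(value)
    return "'" + str(value).replace("'", "''") + "'"


def to_sql_inserts(rows, table):
    """Build one INSERT statement per row; identifiers are used as given."""
    statements = []
    for row in rows:
        cols = ", ".join(row.keys())
        vals = ", ".join(sql_literal(v) for v in row.values())
        statements.append(f"INSERT INTO {table} ({cols}) VALUES ({vals});")
    return "\n".join(statements)


rows = [
    {"feedback_id": "U001", "rating": 5, "text": "Works great"},
    {"feedback_id": "U002", "rating": 2, "text": "Doesn't sync"},
]
print(to_json(rows))
print(to_sql_inserts(rows, "feedback"))
```

Generated INSERT text like this suits seeding a test database from a file; when loading rows from a program, parameterized queries are the safer route.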

Ratings

4.6 · 39 reviews
  • Ren Taylor · Dec 28, 2024

    Keeps context tight: dummy-dataset is the kind of skill you can hand to a new teammate without a long onboarding doc.

  • Dhruvi Jain · Dec 24, 2024

    Registry listing for dummy-dataset matched our evaluation — installs cleanly and behaves as described in the markdown.

  • Maya Harris · Dec 24, 2024

    I recommend dummy-dataset for anyone iterating fast on agent tooling; clear intent and a small, reviewable surface area.

  • Maya Gonzalez · Dec 4, 2024

    dummy-dataset fits our agent workflows well — practical, well scoped, and easy to wire into existing repos.

  • Hana Kim · Nov 19, 2024

    dummy-dataset is among the better-maintained entries we tried; worth keeping pinned for repeat workflows.

  • Oshnikdeep · Nov 15, 2024

    Solid pick for teams standardizing on skills: dummy-dataset is focused, and the summary matches what you get after install.

  • Omar Rao · Nov 15, 2024

    Useful defaults in dummy-dataset — fewer surprises than typical one-off scripts, and it plays nicely with `npx skills` flows.

  • Sophia Tandon · Oct 10, 2024

    dummy-dataset reduced setup friction for our internal harness; good balance of opinion and flexibility.

  • Ganesh Mohane · Oct 6, 2024

    I recommend dummy-dataset for anyone iterating fast on agent tooling; clear intent and a small, reviewable surface area.

  • Yuki Rao · Oct 6, 2024

    Registry listing for dummy-dataset matched our evaluation — installs cleanly and behaves as described in the markdown.
