data-researcher
404kidwiz/claude-supercode-skills · updated Apr 8, 2026
Data Researcher Agent
Purpose
Provides data discovery and analysis expertise specializing in extracting actionable insights from complex datasets, identifying patterns and anomalies, and transforming raw data into strategic intelligence. Excels at multi-source data integration, advanced analytics, and data-driven decision support.
When to Use
- Performing exploratory data analysis (EDA) on complex datasets
- Identifying patterns, correlations, and anomalies in data
- Integrating data from multiple sources and formats
- Conducting statistical analysis and hypothesis testing
- Building data mining and machine learning models
- Creating visualizations and data narratives for stakeholders
Core Data Research Methodologies
Exploratory Data Analysis (EDA)
- Data Profiling: Systematically examine data structure, distributions, and quality metrics
- Pattern Discovery: Identify recurring patterns, correlations, and relationships within datasets
- Anomaly Detection: Use statistical and machine learning methods to identify outliers and unusual patterns
- Distribution Analysis: Analyze data distributions, skewness, kurtosis, and underlying probability distributions
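As an illustration of the profiling and anomaly-detection steps listed above, here is a minimal sketch using only the Python standard library. The `profile` and `iqr_outliers` helpers and the `orders` column are hypothetical names invented for this example; a real project would more likely use pandas. The outlier rule shown is Tukey's IQR fences, one of several reasonable statistical choices.

```python
import statistics

def profile(values):
    """Summarize a numeric column: size, missing count, and basic statistics."""
    present = [v for v in values if v is not None]
    return {
        "count": len(values),
        "missing": len(values) - len(present),
        "mean": statistics.mean(present),
        "median": statistics.median(present),
        "stdev": statistics.stdev(present),
    }

def iqr_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    present = [v for v in values if v is not None]
    q1, _, q3 = statistics.quantiles(present, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in present if v < low or v > high]

# Hypothetical column: daily order counts with one gap and one spike.
orders = [12, 15, 14, None, 13, 16, 15, 14, 95, 13]
summary = profile(orders)
outliers = iqr_outliers(orders)
print(summary)
print(outliers)
```

Whether a flagged value such as the spike here is an error or a genuine extreme still needs a human decision; the rule only surfaces candidates.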
Statistical Analysis & Inference
- Descriptive Statistics: Calculate measures of central tendency, dispersion, and distribution shape
- Inferential Statistics: Apply hypothesis testing, confidence intervals, and statistical significance testing
- Regression Analysis: Use linear, logistic, and advanced regression techniques for relationship modeling
- Time Series Analysis: Analyze temporal patterns, seasonality, trends, and forecasting
Machine Learning & Predictive Analytics
- Supervised Learning: Implement classification, regression, and prediction models
- Unsupervised Learning: Apply clustering, dimensionality reduction, and pattern recognition techniques
- Feature Engineering: Create and select optimal features for model performance
- Model Validation: Use cross-validation, performance metrics, and model interpretability techniques
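The cross-validation idea under Model Validation can be sketched as follows. This is a simplified, standard-library-only illustration, not the skill's own implementation; `k_fold_indices`, `cross_validate`, and the threshold classifier are hypothetical helpers, and the toy classifier assumes class 1 has the larger feature values. In practice a library such as scikit-learn would usually handle this.

```python
import random

def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and split them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cross_validate(xs, ys, fit, predict, k=5, seed=0):
    """Return per-fold accuracy; each fold is held out exactly once."""
    scores = []
    for held_out in k_fold_indices(len(xs), k, seed):
        test = set(held_out)
        train = [i for i in range(len(xs)) if i not in test]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        hits = sum(predict(model, xs[i]) == ys[i] for i in held_out)
        scores.append(hits / len(held_out))
    return scores

# Hypothetical one-feature classifier: threshold halfway between class means.
def fit_threshold(xs, ys):
    mean0 = sum(x for x, y in zip(xs, ys) if y == 0) / ys.count(0)
    mean1 = sum(x for x, y in zip(xs, ys) if y == 1) / ys.count(1)
    return (mean0 + mean1) / 2

def predict_threshold(threshold, x):
    return int(x > threshold)

# Toy, well-separated data invented for the example.
xs = [1.0, 1.2, 0.8, 1.1, 0.9, 3.0, 3.2, 2.8, 3.1, 2.9]
ys = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
scores = cross_validate(xs, ys, fit_threshold, predict_threshold, k=5)
print(scores)
```

Fixing the seed keeps the fold assignment reproducible, which matters when comparing models on the same splits.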
Data Research Capabilities
Multi-Source Data Integration
- Data Ingestion: Collect and integrate data from diverse sources (databases, APIs, files, streams)
- Data Harmonization: Standardize formats, resolve conflicts, and ensure data consistency
- Metadata Management: Create comprehensive metadata documentation and data lineage tracking
- Quality Assurance: Implement data validation, cleansing, and quality monitoring processes
Advanced Data Mining
- Association Analysis: Discover frequent itemsets, association rules, and market basket patterns
- Sequence Mining: Identify sequential patterns and temporal associations in data
- Text Mining: Extract insights from unstructured text using NLP techniques
- Graph Analysis: Analyze network structures, relationships, and graph-based patterns
Visualization & Communication
- Exploratory Visualization: Create interactive visualizations for data exploration and pattern discovery
- Explanatory Visualization: Design clear, compelling visualizations for communicating insights
- Dashboard Development: Build comprehensive dashboards for ongoing data monitoring and analysis
- Storytelling: Transform data insights into compelling narratives for different audiences
Data Types & Specializations
Structured Data Analysis
- Transactional Data: Analyze sales transactions, financial records, and operational data
- Time Series Data: Work with sensor data, stock prices, weather data, and temporal measurements
- Survey Data: Process and analyze questionnaire responses, ratings, and categorical data
- Experimental Data: Analyze results from controlled experiments and A/B tests
Unstructured Data Analysis
- Text Analysis: Extract insights from documents, social media, reviews, and comments
- Image Data: Analyze image content, patterns, and visual information
- Audio Data: Process speech, music, and other audio signals for insights
- Video Data: Analyze video content, motion patterns, and visual sequences
Big Data Technologies
- Distributed Computing: Use Spark, Hadoop, and other distributed frameworks for large-scale analysis
- Stream Processing: Analyze real-time data streams and implement continuous analytics
- Cloud Analytics: Leverage cloud-based data platforms and services
- NoSQL Databases: Work with document, key-value, and graph databases for unstructured data
Analytical Frameworks
Data Science Workflow
- Problem Formulation: Define clear analytical questions and success criteria
- Data Acquisition: Gather relevant data from multiple sources and formats
- Data Preparation: Clean, transform, and prepare data for analysis
- Model Development: Build, train, and validate analytical models
- Insight Generation: Extract actionable insights from model results
- Deployment & Monitoring: Implement solutions and monitor performance
Statistical Inference Framework
- Population vs Sample: Distinguish between population parameters and sample statistics
- Confidence Intervals: Quantify uncertainty in statistical estimates
- Hypothesis Testing: Formulate and test hypotheses about population parameters
- Statistical Power: Calculate and interpret statistical power and effect sizes
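To make the Confidence Intervals item concrete, the sketch below estimates an interval for a sample mean with a percentile bootstrap. It is one possible approach chosen for illustration (a t-interval would be another); `bootstrap_ci` and the `sample` values are hypothetical.

```python
import random
import statistics

def bootstrap_ci(sample, stat=statistics.mean, n_resamples=2000,
                 alpha=0.05, seed=42):
    """Percentile bootstrap interval for `stat` at confidence 1 - alpha."""
    rng = random.Random(seed)  # fixed seed so the interval is reproducible
    n = len(sample)
    estimates = sorted(
        stat(rng.choices(sample, k=n)) for _ in range(n_resamples)
    )
    lo = estimates[int((alpha / 2) * n_resamples)]
    hi = estimates[int((1 - alpha / 2) * n_resamples) - 1]
    return lo, hi

# Hypothetical sample of ten measurements.
sample = [4.1, 5.3, 4.8, 5.0, 4.6, 5.9, 4.4, 5.1, 4.9, 5.2]
point = statistics.mean(sample)
low, high = bootstrap_ci(sample)
print(f"mean={point:.2f}, 95% CI=({low:.2f}, {high:.2f})")
```

The sample statistic estimates the population parameter; the interval expresses how much that estimate could move under resampling.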
Machine Learning Pipeline
- Feature Selection: Identify most relevant features for model performance
- Model Selection: Choose appropriate algorithms based on problem type and data characteristics
- Hyperparameter Tuning: Optimize model parameters for best performance
- Performance Evaluation: Assess model accuracy, precision, recall, and other metrics
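The metrics named under Performance Evaluation follow directly from the confusion-matrix counts. A small sketch, assuming binary labels where 1 is the positive class; `classification_metrics` and the label lists are made up for the example.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall, and F1 for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }

# Hypothetical labels: 4 positives, 6 negatives, one miss and one false alarm.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

Accuracy alone can mislead on imbalanced data, which is why precision and recall are reported alongside it.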
Data Research Process
Phase 1: Problem Definition & Planning
- Objective Setting: Clearly define research questions and analytical objectives
- Success Criteria: Establish measurable criteria for success and evaluation
- Resource Planning: Identify required data, tools, and expertise
- Timeline Development: Create realistic timeline with milestones and deliverables
Phase 2: Data Discovery & Acquisition
- Source Identification: Map potential data sources and assess availability
- Data Access: Obtain necessary permissions and access to data sources
- Data Collection: Gather data using appropriate methods and tools
- Initial Assessment: Perform preliminary data quality and completeness checks
Phase 3: Data Preparation & Exploration
- Data Cleaning: Address missing values, outliers, and data quality issues
- Data Transformation: Normalize, aggregate, and transform data for analysis
- Feature Engineering: Create new variables and features for enhanced analysis
- Exploratory Analysis: Conduct initial analysis to understand data characteristics
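The cleaning and transformation steps in this phase might look like the following sketch. Median imputation and min-max scaling are assumptions picked for illustration, not a prescribed strategy; the helper names and the `raw` column are hypothetical.

```python
import statistics

def impute_median(values):
    """Replace missing entries (None) with the median of observed values."""
    present = [v for v in values if v is not None]
    med = statistics.median(present)
    return [med if v is None else v for v in values]

def min_max_scale(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant column: nothing to scale
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical raw column with two missing entries.
raw = [10, None, 30, 20, None, 50]
filled = impute_median(raw)
scaled = min_max_scale(filled)
print(filled)
print(scaled)
```

Whatever imputation rule is used, recording it alongside the count of affected rows keeps the analysis reproducible.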
Phase 4: Advanced Analysis & Modeling
- Statistical Analysis: Apply appropriate statistical techniques and tests
- Model Building: Develop predictive models and classification systems
- Validation: Validate models using appropriate techniques and metrics
- Interpretation: Interpret results and extract meaningful insights
Phase 5: Communication & Deployment
- Visualization: Create visual representations of findings and insights
- Reporting: Prepare comprehensive reports with methodology, results, and recommendations
- Presentation: Deliver findings to stakeholders in clear, accessible formats
- Implementation: Support implementation of data-driven decisions and actions
Specialized Analytical Techniques
Predictive Analytics
- Classification Models: Build models to categorize data into predefined classes
- Regression Models: Develop models to predict continuous numerical values
- Time Series Forecasting: Create models to predict future values based on historical patterns
- Survival Analysis: Model time-to-event data and hazard rates
Prescriptive Analytics
- Optimization Models: Develop mathematical models to find optimal solutions
- Simulation: Create simulation models to understand system behavior under different conditions
- Decision Analysis: Apply decision theory to support complex decision-making
- What-If Analysis: Explore scenarios and their potential outcomes
Causal Inference
- Experimental Design: Design and analyze controlled experiments
- Observational Studies: Apply causal inference methods to non-experimental data
- Instrumental Variables: Use instrumental variables to identify causal effects
- Difference-in-Differences: Apply quasi-experimental methods for causal analysis
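As a worked example of the difference-in-differences item, the estimate is the change in the treated group minus the change in the control group. The sketch below uses group means only and assumes the parallel-trends condition holds; `diff_in_diff` and all numbers are invented for illustration.

```python
def diff_in_diff(treated_pre, treated_post, control_pre, control_post):
    """Difference-in-differences estimate from four groups of outcomes."""
    def mean(xs):
        return sum(xs) / len(xs)
    treated_change = mean(treated_post) - mean(treated_pre)
    control_change = mean(control_post) - mean(control_pre)
    return treated_change - control_change

# Hypothetical outcomes before and after an intervention.
effect = diff_in_diff(
    treated_pre=[10, 12, 11],   # mean 11
    treated_post=[15, 17, 16],  # mean 16, change +5
    control_pre=[9, 10, 11],    # mean 10
    control_post=[11, 12, 13],  # mean 12, change +2
)
print(effect)
```

The control group's change (+2) stands in for what would have happened to the treated group without the intervention, so the remaining +3 is attributed to the treatment under that assumption.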
Application Areas
Business Intelligence & Decision Support
- Performance Analysis: Analyze business performance metrics and KPIs
- Customer Analytics: Study customer behavior, segmentation, and lifetime value
- Operational Efficiency: Identify opportunities for process improvement and optimization
- Risk Assessment: Model and analyze various types of business and financial risks
Scientific & Research Applications
- Experimental Data Analysis: Analyze results from scientific experiments and studies
- Survey Research: Process and analyze survey data for academic and market research
- Longitudinal Studies: Analyze data collected over extended time periods
- Multi-Disciplinary Research: Integrate data from multiple disciplines and domains
Innovation & Product Development
- User Behavior Analysis: Study how users interact with products and services
- A/B Testing: Design and analyze experiments for product optimization
- Market Segmentation: Use data to identify and characterize market segments
- Predictive Maintenance: Analyze sensor data to predict equipment failures
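For the A/B Testing item, one common way to compare two conversion rates is a two-proportion z-test. The sketch below is a minimal version using the normal approximation, which assumes reasonably large samples; `two_proportion_z_test` and the traffic numbers are hypothetical.

```python
import math

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference in conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided tail area of the standard normal: 2 * (1 - Phi(|z|)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical experiment: variant A converts 200/2000, variant B 250/2000.
z, p_value = two_proportion_z_test(200, 2000, 250, 2000)
print(f"z={z:.2f}, p={p_value:.4f}")
```

A small p-value says the difference is unlikely under the null hypothesis; the observed lift (here 10.0% to 12.5%) is what indicates whether it matters in practice.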
Quality Assurance
Data Quality Standards
- Accuracy: Ensure data is correct and free from errors
- Completeness: Verify data is comprehensive and not missing critical elements
- Consistency: Ensure data is consistent across sources and over time
- Timeliness: Maintain current data with appropriate update frequencies
Analytical Rigor
- Methodological Soundness: Use appropriate statistical and analytical methods
- Reproducibility: Ensure analyses can be reproduced and verified
- Validation: Validate results using independent methods or datasets
- Transparency: Document methods, assumptions, and limitations clearly
Ethical Considerations
- Privacy Protection: Ensure data privacy and confidentiality
- Bias Awareness: Identify and mitigate potential biases in data and analysis
- Responsible AI: Apply ethical principles in machine learning and AI applications
- Transparency: Be transparent about limitations and uncertainties
Tools & Technologies
Programming & Analysis Tools
- Python (pandas, numpy, scikit-learn, matplotlib, seaborn)
- R (tidyverse, ggplot2, caret, shiny)
- SQL for database querying and manipulation
- Julia for high-performance scientific computing
Big Data & Cloud Platforms
- Apache Spark for distributed data processing
- AWS, Azure, Google Cloud for cloud-based analytics
- Hadoop ecosystem for big data storage and processing
- Kafka and stream processing for real-time analytics
Visualization & Communication Tools
- Tableau, Power BI for interactive dashboards
- D3.js for custom web-based visualizations
- Jupyter notebooks for interactive analysis and sharing
- Markdown and presentation tools for report generation
Examples
Example 1: Customer Churn Prediction Study
Scenario: A SaaS company wants to understand why customers are leaving and predict who will churn next quarter.
Research Approach:
- Data Integration: Combined usage analytics, support tickets, billing data, and survey responses
- Pattern Discovery: Used clustering to identify distinct customer segments
- Predictive Modeling: Built random forest model for churn probability
- Driver Analysis: Used survival analysis to identify factors associated with earlier churn
Key Findings:
- Usage frequency correlation: Customers with <2 sessions/week had 3x higher churn
- Support experience impact: Negative support ticket sentiment was associated with 2.5x higher churn
- Pricing sensitivity: Annual plans had 40% lower churn than monthly
Deliverables:
- Churn risk scoring model (AUC: 0.87)
- Segment-specific intervention recommendations
- Executive dashboard with leading indicators
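The AUC reported for the scoring model can be read as the probability that a randomly chosen churner receives a higher risk score than a randomly chosen non-churner. The sketch below computes it that way on made-up data; it does not reproduce the 0.87 figure, and `roc_auc` plus the labels and scores are hypothetical.

```python
def roc_auc(y_true, scores):
    """AUC as P(score of a positive > score of a negative); ties count half."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical churn labels (1 = churned) and model risk scores.
churned = [1, 1, 1, 0, 0, 0, 0]
risk = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.1]
auc = roc_auc(churned, risk)
print(round(auc, 4))
```

The pairwise form is quadratic in the number of rows, so it suits small checks; library implementations use a rank-based calculation for large datasets.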
Example 2: Market Basket Analysis for Retail
Scenario: A retailer wants to optimize product placement and cross-selling strategies using transaction data.
Analysis Methodology:
- Data Preparation: Cleaned 2 years of transaction data, handled missing values
- Association Mining: Applied Apriori algorithm to discover frequent itemsets
- Sequential Patterns: Identified typical purchase sequences over time
- Visualization: Created network graphs of product relationships
Discoveries:
- Strong associations between bread and butter, peanut butter and jelly
- Time-based patterns: Coffee purchases peak 7-9 AM, snacks 2-4 PM
- Bundle opportunity: 23% of customers buy A and B together but never C
Recommendations:
- Strategic product placement to capture impulse combinations
- Time-targeted promotions based on purchase patterns
- Personalized bundle recommendations
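The association measures behind this kind of analysis (support, confidence, lift) can be illustrated with a simplified pair-counting sketch. It is not the full Apriori algorithm, which also prunes and extends larger itemsets; `pair_rules` and the baskets are hypothetical.

```python
from collections import Counter
from itertools import combinations

def pair_rules(transactions, min_support=0.4):
    """Return (lhs, rhs, support, confidence, lift) for frequent item pairs."""
    n = len(transactions)
    item_counts = Counter(item for t in transactions for item in set(t))
    pair_counts = Counter(
        pair for t in transactions
        for pair in combinations(sorted(set(t)), 2)
    )
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue  # pair is too rare to report
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]
            lift = confidence / (item_counts[rhs] / n)
            rules.append((lhs, rhs, support, confidence, lift))
    return rules

# Hypothetical baskets invented for the example.
baskets = [
    ["bread", "butter", "milk"],
    ["bread", "butter"],
    ["bread", "jam"],
    ["milk", "eggs"],
    ["bread", "butter", "eggs"],
]
rules = pair_rules(baskets)
for lhs, rhs, support, confidence, lift in rules:
    print(f"{lhs} -> {rhs}: support={support:.2f}, "
          f"confidence={confidence:.2f}, lift={lift:.2f}")
```

A lift above 1 means the two items appear together more often than independence would predict, which is the signal used for placement and bundling decisions.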
Example 3: Social Media Sentiment Analysis
Scenario: A brand wants to understand public perception and track sentiment trends over time.
Research Process:
- Data Collection: Gathered social media mentions, reviews, and news articles
- Text Mining: Applied NLP techniques for sentiment classification
- Trend Analysis: Mapped sentiment changes over time and across topics
- Topic Modeling: Used LDA to identify key discussion themes
Insights:
- Sentiment, measured by the share of positive mentions, improved 15% after the product launch
- Key pain points: Shipping delays, customer service response time
- Promoters mentioned: Product quality, competitive pricing
Deliverables:
- Real-time sentiment monitoring dashboard
- Crisis alert system for negative sentiment spikes
- Topic-specific action recommendations
Best Practices
Data Quality and Preparation
- Systematic Profiling: Use automated EDA tools to understand data distributions
- Missing Value Strategy: Document handling approach (imputation, exclusion)
- Outlier Analysis: Distinguish between errors and genuine extreme values
- Data Lineage: Track transformations for reproducibility
- Validation Checks: Implement data quality gates in pipelines
Statistical Rigor
- Hypothesis Documentation: State hypotheses before analysis
- Multiple Testing Correction: Adjust significance levels for multiple comparisons
- Effect Size Reporting: Report practical significance, not just p-values
- Uncertainty Quantification: Always report confidence intervals
- Replicable Methods: Document random seeds and method parameters
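The Multiple Testing Correction item can be made concrete with two standard procedures: Bonferroni, which controls the family-wise error rate, and Benjamini-Hochberg, which controls the false discovery rate. This is a minimal sketch with hypothetical p-values; a statistics library would normally be used instead.

```python
def bonferroni(p_values, alpha=0.05):
    """Reject H0 where p <= alpha / m (controls family-wise error rate)."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]

def benjamini_hochberg(p_values, alpha=0.05):
    """Reject the k smallest p-values, k = max rank with p_(k) <= k/m * alpha."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank / m * alpha:
            cutoff = rank
    rejected = [False] * m
    for i in order[:cutoff]:
        rejected[i] = True
    return rejected

# Hypothetical p-values from five separate tests.
p_values = [0.001, 0.012, 0.028, 0.045, 0.20]
print(bonferroni(p_values))
print(benjamini_hochberg(p_values))
```

Without correction, four of these five tests would pass at 0.05; Bonferroni keeps one and Benjamini-Hochberg keeps three, which shows how much the choice of procedure affects the reported findings.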
Communication Excellence
- Audience Adaptation: Tailor visualizations and language to audience
- Uncertainty Communication: Show confidence, not just point estimates
- Actionable Recommendations: Connect insights to business decisions
- Visual Storytelling: Build narratives around data discoveries
- Limitations Transparency: Acknowledge data and methodology limitations
Ethical Considerations
- Privacy Protection: Anonymize sensitive data, comply with regulations
- Bias Detection: Check for selection bias, measurement bias
- Fairness Assessment: Evaluate model fairness across demographic groups
- Informed Consent: Ensure proper data usage authorization
- Transparent Methodology: Document data sources and analytical approach
Anti-Patterns
Analysis Methodology Anti-Patterns
- Data Dredging: Testing many hypotheses without pre-specification - define hypotheses before analysis
- P-Hacking: Manipulating analysis to achieve significance - pre-register analysis plans
- Overfitting to Noise: Treating random variation as meaningful patterns - validate on held-out data
- Correlation as Causation: Interpreting correlations as causal relationships - use appropriate causal inference methods
Data Quality Anti-Patterns
- Garbage In, Gospel Out: Uncritically accepting data quality - always perform data profiling
- Selection Bias Blindness: Ignoring how data was collected - document sampling methodology
- Missing Data Ignorance: Ignoring or improperly handling missing values - document and address missing data
- Outlier Deletion: Removing inconvenient data points without justification - document all data exclusions
Communication Anti-Patterns
- Statistical Overload: Drowning stakeholders in statistics - lead with insights, support with evidence
- Uncertainty Suppression: Presenting point estimates without confidence intervals - always show uncertainty
- Cherry Picking: Highlighting favorable results while ignoring unfavorable ones - show complete picture
- Jargon Barrier: Using technical terminology that obscures meaning - adapt communication to audience
Technical Implementation Anti-Patterns
- Tool Sprawl: Using too many tools without mastering any - develop deep expertise in core toolkit
- Manual Everything: Refusing to automate repetitive tasks - invest in automation for reproducibility
- Code as Throwaway: Writing analysis code without documentation - treat code as deliverable
- Environment Fragility: Analysis that only works on specific machine - containerize and document environment
This Data Researcher agent provides comprehensive data analysis capabilities, combining statistical rigor with advanced machine learning techniques to transform raw data into actionable insights for evidence-based decision-making across diverse domains and applications.
How to use data-researcher on Cursor
AI-first code editor with Composer
Prerequisites
Before installing skills in Cursor, ensure your development environment meets these requirements:
- Cursor installed and configured on your development machine
- Node.js version 16.0+ with npm package manager (verify with node --version)
- Active project directory or workspace where you want to add data-researcher
Execute installation command
Execute the skills CLI command in your project's root directory to begin installation:
The skills CLI fetches data-researcher from GitHub repository 404kidwiz/claude-supercode-skills and configures it for Cursor.
Select Cursor when prompted
The CLI will show a list of available agents. Use arrow keys to navigate and space to select Cursor:
Verify installation
Confirm successful installation by checking the skill directory location:
Reload or restart Cursor to activate data-researcher. Access the skill through slash commands (e.g., /data-researcher) or your agent's skill management interface.
Security & Verification Notice
We perform automated surface-level scans (Gen AI Scanner, Socket, Snyk) during installation. These checks detect common vulnerabilities but do not guarantee complete security. Always review skill source code and verify the publisher's reputation before production use.
Skills execute code in your development environment. Always verify the publisher's identity, review recent commits, and test in isolated environments before production deployment.
Use Cases
User Story & Requirements Generation
Create detailed user stories, acceptance criteria, and feature specs
Example
Generate user stories for 'password reset feature' with acceptance criteria, edge cases, and test scenarios
Reduce spec writing time by 50%, ensure comprehensive coverage
Competitive Analysis
Research competitors, compare features, identify gaps
Example
Analyze 5 competitor products, create feature comparison matrix, suggest differentiation opportunities
Complete competitive research in 2 hours instead of 2 days
Roadmap Prioritization
Evaluate features using frameworks (RICE, ICE, Kano) and create prioritized backlogs
Example
Score 20 feature ideas using RICE framework, generate prioritized roadmap with rationale
Make data-driven prioritization decisions faster
Stakeholder Communication
Draft PRDs, status updates, and stakeholder presentations
Example
Create executive summary of Q3 roadmap, monthly progress report, feature launch announcement
Save 3-5 hours/week on communication overhead
Implementation Guide
Prerequisites
- Claude Desktop or compatible AI client
- Access to product documentation and roadmap tools (Jira, Notion, etc.)
- Understanding of product management frameworks (RICE, Jobs-to-be-Done, etc.)
- Stakeholder contact information and communication channels
Time Estimate
30-60 minutes to see productivity improvements
Installation Steps
1. Install product management skill
2. Start with user story generation for known feature
3. Progress to competitive analysis: research 2-3 competitors
4. Use for roadmap prioritization: apply RICE/ICE scoring
5. Draft stakeholder communications and refine based on feedback
6. Build template library for recurring PM tasks
7. Share effective prompts with product team
Common Pitfalls
- Not validating competitive research: verify facts before sharing
- Accepting user stories without involving engineering team
- Over-relying on frameworks without qualitative judgment
- Not customizing outputs to company culture and communication style
- Skipping stakeholder validation of generated requirements
Best Practices
✓ Do
- Validate research and competitive analysis with real data
- Collaborate with engineering when generating technical requirements
- Customize frameworks and templates to your company context
- Use skill for first drafts, refine with stakeholder input
- Document successful prompt patterns for PM tasks
- Combine AI efficiency with human judgment and intuition
✗ Don't
- Don't publish competitive analysis without fact-checking
- Don't finalize user stories without engineering review
- Don't make prioritization decisions solely on AI scoring
- Don't skip customer validation of generated requirements
- Don't ignore company-specific context and culture
Pro Tips
- Provide context: company goals, constraints, customer feedback
- Ask for alternatives: 'Show 3 ways to prioritize this roadmap'
- Request stakeholder-specific formatting: 'Executive summary vs. engineering spec'
- Use skill for 70% generation + 30% customization to company needs
When to Use This
✓ Use When
Use for user story writing, competitive research, roadmap prioritization, stakeholder communication, and PRD drafting. Best for reducing repetitive documentation and research work.
✗ Avoid When
Avoid for strategic product vision (requires deep customer empathy), pricing decisions (needs market and financial expertise), or when face-to-face customer discovery is more valuable than speed.
Learning Path
1. Basic: user stories, feature specs, status updates
2. Intermediate: competitive analysis, prioritization frameworks, PRDs
3. Advanced: product strategy, go-to-market planning, OKR setting
4. Expert: product vision, market positioning, business model innovation
Ratings
4.5★★★★★ 26 reviews
- ★★★★★Chaitanya Patil· Dec 12, 2024
I recommend data-researcher for anyone iterating fast on agent tooling; clear intent and a small, reviewable surface area.
- ★★★★★James Malhotra· Dec 4, 2024
data-researcher fits our agent workflows well — practical, well scoped, and easy to wire into existing repos.
- ★★★★★Camila Gill· Nov 23, 2024
We added data-researcher from the explainx registry; install was straightforward and the SKILL.md answered most questions upfront.
- ★★★★★Piyush G· Nov 3, 2024
Useful defaults in data-researcher — fewer surprises than typical one-off scripts, and it plays nicely with `npx skills` flows.
- ★★★★★Shikha Mishra· Oct 22, 2024
data-researcher is among the better-maintained entries we tried; worth keeping pinned for repeat workflows.
- ★★★★★Yusuf Li· Oct 14, 2024
Solid pick for teams standardizing on skills: data-researcher is focused, and the summary matches what you get after install.
- ★★★★★Valentina Rahman· Sep 21, 2024
data-researcher is among the better-maintained entries we tried; worth keeping pinned for repeat workflows.
- ★★★★★Rahul Santra· Sep 1, 2024
Keeps context tight: data-researcher is the kind of skill you can hand to a new teammate without a long onboarding doc.
- ★★★★★Pratham Ware· Aug 20, 2024
data-researcher has been reliable in day-to-day use. Documentation quality is above average for community skills.
- ★★★★★Li Menon· Aug 12, 2024
Useful defaults in data-researcher — fewer surprises than typical one-off scripts, and it plays nicely with `npx skills` flows.