ab-test-analysis
phuryn/pm-skills · updated Apr 8, 2026
Evaluate A/B test results with statistical rigor and translate findings into clear product decisions.
A/B Test Analysis
Context
You are analyzing A/B test results for $ARGUMENTS.
If the user provides data files (CSV, Excel, or analytics exports), read and analyze them directly. Generate Python scripts for statistical calculations when needed.
Instructions
- Understand the experiment:
  - What was the hypothesis?
  - What was changed (the variant)?
  - What is the primary metric? Any guardrail metrics?
  - How long did the test run?
  - What is the traffic split?
- Validate the test setup:
  - Sample size: Is the sample large enough for the expected effect size?
    - Use the formula (per variant): n = ((Zα/2 + Zβ)² × 2 × p × (1-p)) / MDE², where p is the baseline conversion rate, MDE is the absolute minimum detectable effect, Zα/2 ≈ 1.96 for a two-sided 5% significance level, and Zβ ≈ 0.84 for 80% power
    - Flag if the test is underpowered (<80% power)
  - Duration: Did the test run for at least 1-2 full business cycles?
  - Randomization: Any evidence of sample ratio mismatch (SRM)?
  - Novelty/primacy effects: Was there enough time to wash out initial behavior changes?
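The sample-size and SRM checks in this step can be sketched with the Python standard library alone. This is a rough illustration, not the skill's own implementation: the function names `required_sample_size` and `srm_p_value` are hypothetical, the sample-size calculation includes a power term so the result targets 80% power, the MDE is assumed to be an absolute difference in conversion rate, and the SRM check assumes a simple two-arm test with a known intended split.

```python
import math
from statistics import NormalDist


def required_sample_size(baseline_rate, mde_abs, alpha=0.05, power=0.80):
    """Approximate per-variant sample size for a two-sided two-proportion test.

    Assumes the variance in both arms is close to p * (1 - p) at the baseline
    rate, and that mde_abs is an absolute difference (0.01 = 1 percentage point).
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # about 0.84 for 80% power
    variance = 2 * baseline_rate * (1 - baseline_rate)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / mde_abs ** 2)


def srm_p_value(n_control, n_variant, expected_control_share=0.5):
    """Chi-squared goodness-of-fit p-value (1 degree of freedom) for the split."""
    total = n_control + n_variant
    exp_control = total * expected_control_share
    exp_variant = total - exp_control
    chi2 = ((n_control - exp_control) ** 2 / exp_control
            + (n_variant - exp_variant) ** 2 / exp_variant)
    # With 1 degree of freedom, chi2 is the square of a standard normal,
    # so P(X > chi2) = 2 * (1 - Phi(sqrt(chi2))).
    return 2 * (1 - NormalDist().cdf(math.sqrt(chi2)))


if __name__ == "__main__":
    # Illustrative numbers only: 10% baseline, 1 percentage point MDE.
    print("users needed per variant:", required_sample_size(0.10, 0.01))
    print("SRM p-value for 5000 vs 5300:", round(srm_p_value(5000, 5300), 4))
```

With a 10% baseline and a 1 percentage point MDE, the first function asks for on the order of 14,000 users per variant. A very small SRM p-value suggests the observed split is unlikely under the intended allocation and that the randomization should be investigated before the results are trusted.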
- Calculate statistical significance:
  - Conversion rate for control and variant
  - Relative lift: (variant - control) / control × 100
  - p-value: Using a two-tailed z-test or chi-squared test
  - Confidence interval: 95% CI for the difference
  - Statistical significance: Is p < 0.05?
  - Practical significance: Is the lift meaningful for the business?

  If the user provides raw data, generate and run a Python script to calculate these.
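A script for the calculations listed in this step might look like the sketch below. It assumes the raw data has already been reduced to conversion counts and sample sizes per arm, and that the control conversion rate is nonzero. The function name `analyze_conversion` and its return keys are hypothetical. It uses a pooled standard error for the two-tailed z-test and an unpooled standard error for the confidence interval, which is one common convention rather than the only valid one.

```python
import math
from statistics import NormalDist


def analyze_conversion(conv_c, n_c, conv_v, n_v, alpha=0.05):
    """Two-tailed two-proportion z-test with a CI for the absolute difference."""
    p_c = conv_c / n_c          # control conversion rate
    p_v = conv_v / n_v          # variant conversion rate
    diff = p_v - p_c
    relative_lift = diff / p_c * 100  # percent; assumes p_c > 0

    # Test statistic: pooled standard error under the null of equal rates.
    p_pool = (conv_c + conv_v) / (n_c + n_v)
    se_pooled = math.sqrt(p_pool * (1 - p_pool) * (1 / n_c + 1 / n_v))
    z = diff / se_pooled
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))

    # Confidence interval: unpooled standard error of the difference.
    se_unpooled = math.sqrt(p_c * (1 - p_c) / n_c + p_v * (1 - p_v) / n_v)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    ci = (diff - z_crit * se_unpooled, diff + z_crit * se_unpooled)

    return {
        "control_rate": p_c,
        "variant_rate": p_v,
        "relative_lift_pct": relative_lift,
        "z": z,
        "p_value": p_value,
        "ci": ci,
        "significant": p_value < alpha,
    }


if __name__ == "__main__":
    # Illustrative counts only, not real experiment data.
    result = analyze_conversion(conv_c=500, n_c=10_000, conv_v=560, n_v=10_000)
    for key, value in result.items():
        print(key, value)
```

In the illustrative call, the variant shows a 12% relative lift, yet the p-value lands just above 0.05 and the interval for the difference includes zero, which is the "not significant, positive trend" situation.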
- Check guardrail metrics:
  - Did any guardrail metrics (revenue, engagement, page load time) degrade?
  - A winning primary metric with degraded guardrails may not be a true win
- Interpret results:

  | Outcome | Recommendation |
  |---|---|
  | Significant positive lift, no guardrail issues | Ship it: roll out to 100% |
  | Significant positive lift, guardrail concerns | Investigate: understand trade-offs before shipping |
  | Not significant, positive trend | Extend the test: need more data or larger effect |
  | Not significant, flat | Stop the test: no meaningful difference detected |
  | Significant negative lift | Don't ship: revert to control, analyze why |

- Provide the analysis summary:
## A/B Test Results: [Test Name]

**Hypothesis**: [What we expected]
**Duration**: [X days] | **Sample**: [N control / M variant]

| Metric | Control | Variant | Lift | p-value | Significant? |
|---|---|---|---|---|---|
| [Primary] | X% | Y% | +Z% | 0.0X | Yes/No |
| [Guardrail] | ... | ... | ... | ... | ... |

**Recommendation**: [Ship / Extend / Stop / Investigate]
**Reasoning**: [Why]
**Next steps**: [What to do]
Think step by step. Save as markdown. Generate Python scripts for calculations if raw data is provided.
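The outcome-to-recommendation mapping from the "Interpret results" step can be expressed as a small function. This is only a sketch under stated assumptions: the name `recommend`, the `flat_band_pct` threshold that separates a "positive trend" from "flat", and the handling of a non-significant negative trend (mapped to "Stop", since the mapping does not cover that case) are all hypothetical choices, not part of the skill.

```python
def recommend(p_value, relative_lift_pct, guardrails_ok,
              alpha=0.05, flat_band_pct=1.0):
    """Map test results to a recommendation label.

    flat_band_pct is an assumed cutoff: a non-significant lift at or below
    this many percent is treated as flat rather than as a positive trend.
    """
    significant = p_value < alpha

    if significant and relative_lift_pct > 0:
        # A winning primary metric only ships if guardrails held up.
        return "Ship" if guardrails_ok else "Investigate"
    if significant and relative_lift_pct < 0:
        return "Don't ship"

    # Not significant from here on.
    if relative_lift_pct > flat_band_pct:
        return "Extend"
    # Flat result, or a non-significant negative trend (assumption: stop).
    return "Stop"


if __name__ == "__main__":
    print(recommend(p_value=0.058, relative_lift_pct=12.0, guardrails_ok=True))
```

Practical significance still needs human judgment: a statistically significant lift that is too small to matter for the business may not justify shipping, which this sketch does not attempt to encode.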