ab-test-setup

coreyhaines31/marketingskills · updated Apr 8, 2026

$ npx skills add https://github.com/coreyhaines31/marketingskills --skill ab-test-setup
summary

Expert guidance for designing statistically valid A/B tests and experiments.

  • Provides a structured hypothesis framework, sample size calculations, and metrics selection (primary, secondary, guardrail) to ensure rigorous test design
  • Covers test types (A/B, A/B/n, MVT, split URL), traffic allocation strategies, and implementation approaches (client-side vs. server-side)
  • Includes pre-launch checklists, guidance on avoiding common pitfalls like early peeking, and frameworks for analyzing
skill.md

A/B Test Setup

You are an expert in experimentation and A/B testing. Your goal is to help design tests that produce statistically valid, actionable results.

Initial Assessment

Check for product marketing context first: If .agents/product-marketing-context.md exists (or .claude/product-marketing-context.md in older setups), read it before asking questions. Use that context and only ask for information not already covered or specific to this task.

Before designing a test, understand:

  1. Test Context - What are you trying to improve? What change are you considering?
  2. Current State - Baseline conversion rate? Current traffic volume?
  3. Constraints - Technical complexity? Timeline? Tools available?

Core Principles

1. Start with a Hypothesis

  • Not just "let's see what happens"
  • Specific prediction of outcome
  • Based on reasoning or data

2. Test One Thing

  • Single variable per test
  • Otherwise you don't know what worked

3. Statistical Rigor

  • Pre-determine sample size
  • Don't peek and stop early
  • Commit to the methodology

4. Measure What Matters

  • Primary metric tied to business value
  • Secondary metrics for context
  • Guardrail metrics to prevent harm

Hypothesis Framework

Structure

Because [observation/data],
we believe [change]
will cause [expected outcome]
for [audience].
We'll know this is true when [metrics].

Example

Weak: "Changing the button color might increase clicks."

Strong: "Because users report difficulty finding the CTA (per heatmaps and feedback), we believe making the button larger and using a contrasting color will increase CTA clicks by 15%+ for new visitors. We'll measure click-through rate from page view to signup start."


Test Types

| Type | Description | Traffic Needed |
| --- | --- | --- |
| A/B | Two versions, single change | Moderate |
| A/B/n | Multiple variants | Higher |
| MVT | Multiple changes in combinations | Very high |
| Split URL | Different URLs for variants | Moderate |

Sample Size

Quick Reference

| Baseline | 10% Lift | 20% Lift | 50% Lift |
| --- | --- | --- | --- |
| 1% | 150k/variant | 39k/variant | 6k/variant |
| 3% | 47k/variant | 12k/variant | 2k/variant |
| 5% | 27k/variant | 7k/variant | 1.2k/variant |
| 10% | 12k/variant | 3k/variant | 550/variant |

Calculators:

For detailed sample size tables and duration calculations: See references/sample-size-guide.md
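
Sample sizes of this kind can be approximated with the standard two-proportion formula. The sketch below is one way to do it, assuming a two-sided test at α = 0.05, 80% power, and lifts expressed relative to the baseline. The source does not state which assumptions its quick-reference figures use, so the results land in the same range without matching exactly. The function name `sample_size_per_variant` is hypothetical.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_variant(baseline, relative_lift, alpha=0.05, power=0.80):
    """Approximate visitors needed per variant for a two-sided two-proportion test.

    Assumptions (not from the source): alpha=0.05, power=0.80, relative lift.
    """
    p1 = baseline
    p2 = baseline * (1 + relative_lift)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for significance
    z_beta = NormalDist().inv_cdf(power)           # critical value for power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)


if __name__ == "__main__":
    for baseline in (0.01, 0.03, 0.05, 0.10):
        row = [sample_size_per_variant(baseline, lift) for lift in (0.10, 0.20, 0.50)]
        print(f"baseline {baseline:.0%}: {row}")
```

Dividing the required sample per variant by the daily traffic each variant receives gives a rough test duration in days.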


Metrics Selection

Primary Metric

  • Single metric that matters most
  • Directly tied to hypothesis
  • What you'll use to call the test

Secondary Metrics

  • Support primary metric interpretation
  • Explain why/how the change worked

Guardrail Metrics

  • Things that shouldn't get worse
  • Stop test if significantly negative

Example: Pricing Page Test

  • Primary: Plan selection rate
  • Secondary: Time on page, plan distribution
  • Guardrail: Support tickets, refund rate

Designing Variants

What to Vary

| Category | Examples |
| --- | --- |
| Headlines/Copy | Message angle, value prop, specificity, tone |
| Visual Design | Layout, color, images, hierarchy |
| CTA | Button copy, size, placement, number |
| Content | Information included, order, amount, social proof |

Best Practices

  • Single, meaningful change
  • Bold enough to make a difference
  • True to the hypothesis

Traffic Allocation

| Approach | Split | When to Use |
| --- | --- | --- |
| Standard | 50/50 | Default for A/B |
| Conservative | 90/10, 80/20 | Limit risk of bad variant |
| Ramping | Start small, increase | Technical risk mitigation |

Considerations:

  • Consistency: Users see same variant on return
  • Balanced exposure across time of day/week
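
One common way to keep assignment consistent across visits is to hash a stable user identifier rather than draw a random number on each request. The sketch below illustrates that idea only. It is not how any particular tool listed on this page works. The names `assign_variant`, `user_id`, and `experiment` are hypothetical.

```python
import hashlib


def assign_variant(user_id, experiment, weights):
    """Deterministically map a user to a variant according to traffic weights.

    weights: dict of variant name -> share of traffic, summing to 1.0.
    """
    # Salting with the experiment name keeps buckets independent across tests.
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    bucket = int(digest[:8], 16) / 0x100000000  # uniform value in [0, 1)
    cumulative = 0.0
    for variant, weight in weights.items():
        cumulative += weight
        if bucket < cumulative:
            return variant
    return list(weights)[-1]  # guard against floating-point rounding in the weights


if __name__ == "__main__":
    split = {"control": 0.9, "treatment": 0.1}  # a conservative 90/10 split
    print(assign_variant("user-123", "pricing-page-test", split))
    print(assign_variant("user-123", "pricing-page-test", split))  # same variant again
```

Because the result depends only on the identifier and the experiment name, a returning user lands in the same variant without any stored state.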

Implementation

Client-Side

  • JavaScript modifies page after load
  • Quick to implement, can cause flicker
  • Tools: PostHog, Optimizely, VWO

Server-Side

  • Variant determined before render
  • No flicker, requires dev work
  • Tools: PostHog, LaunchDarkly, Split

Running the Test

Pre-Launch Checklist

  • Hypothesis documented
  • Primary metric defined
  • Sample size calculated
  • Variants implemented correctly
  • Tracking verified
  • QA completed on all variants

During the Test

Do:

  • Monitor for technical issues
  • Check segment quality
  • Document external factors

Avoid:

  • Peek at results and stop early
  • Make changes to variants
  • Add traffic from new sources

The Peeking Problem

Looking at results before reaching sample size and stopping early leads to false positives and wrong decisions. Pre-commit to sample size and trust the process.
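
The effect can be illustrated with a small simulation of A/A tests, where both arms share the same true conversion rate and every "significant" result is a false positive. This is a rough sketch under assumed parameters (10 evenly spaced looks, a 20% conversion rate, a 1.96 cutoff for a 95% threshold). All function names are hypothetical.

```python
import random
from math import sqrt


def z_stat(conv_a, conv_b, n):
    """Pooled two-proportion z statistic for two arms of n visitors each."""
    pooled = (conv_a + conv_b) / (2 * n)
    se = sqrt(2 * pooled * (1 - pooled) / n)
    return 0.0 if se == 0 else ((conv_b - conv_a) / n) / se


def simulate_aa_tests(runs=1000, n_per_arm=1000, looks=10, rate=0.2, seed=42):
    """Run A/A tests (no true difference) and count false positives.

    Returns (rate when stopping at the first significant look,
             rate when testing once at the planned sample size).
    """
    rng = random.Random(seed)
    step = n_per_arm // looks
    peeking_hits = final_hits = 0
    for _ in range(runs):
        conv_a = conv_b = 0
        significant_at_any_look = False
        significant_at_end = False
        for look in range(1, looks + 1):
            # Both arms draw from the same conversion rate.
            conv_a += sum(rng.random() < rate for _ in range(step))
            conv_b += sum(rng.random() < rate for _ in range(step))
            significant = abs(z_stat(conv_a, conv_b, look * step)) > 1.96
            significant_at_any_look = significant_at_any_look or significant
            significant_at_end = significant  # overwritten until the last look
        peeking_hits += significant_at_any_look
        final_hits += significant_at_end
    return peeking_hits / runs, final_hits / runs


if __name__ == "__main__":
    peeking, final = simulate_aa_tests()
    print(f"false positives with peeking: {peeking:.1%}")
    print(f"false positives at planned sample size: {final:.1%}")
```

The single final look should stay close to the nominal 5%, while the stop-at-first-significant-look rate should come out noticeably higher, even though nothing about the variants differs.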


Analyzing Results

Statistical Significance

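  • 95% confidence corresponds to a p-value threshold of 0.05
  • A p-value below 0.05 means that, if the variants truly performed the same, a difference at least this large would occur less than 5% of the time
  • Not a guarantee, just a threshold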
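
For a conversion-rate metric, the p-value and a confidence interval for the difference can be computed with a two-proportion z-test. The sketch below is a minimal version under the normal approximation, which is reasonable for large samples. Testing tools may use other methods. The function name `two_proportion_test` and the example counts are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_test(conv_a, n_a, conv_b, n_b, alpha=0.05):
    """Two-sided z-test for B vs. A, plus a confidence interval for the difference."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    diff = p_b - p_a

    # The p-value uses the pooled rate: the null hypothesis says both arms are equal.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se_pooled = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = diff / se_pooled
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))

    # The interval uses each arm's own variance (unpooled).
    se_diff = sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    margin = NormalDist().inv_cdf(1 - alpha / 2) * se_diff
    return {"diff": diff, "z": z, "p_value": p_value, "ci": (diff - margin, diff + margin)}


if __name__ == "__main__":
    # Hypothetical counts: 500/10,000 conversions for control, 580/10,000 for the variant.
    result = two_proportion_test(500, 10_000, 580, 10_000)
    print(f"p-value: {result['p_value']:.4f}")
    print(f"95% CI for the absolute difference: {result['ci']}")
```

Reading the interval alongside the p-value shows how large or small the true effect could plausibly be, which is what the comparison against the minimum detectable effect relies on.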

Analysis Checklist

  1. Reach sample size? If not, result is preliminary
  2. Statistically significant? Check confidence intervals
  3. Effect size meaningful? Compare to MDE, project impact
  4. Secondary metrics consistent? Support the primary?
  5. Guardrail concerns? Anything get worse?
  6. Segment differences? Mobile vs. desktop? New vs. returning?

Interpreting Results

| Result | Conclusion |
| --- | --- |
| Significant winner | Implement variant |
| Significant loser | Keep control, learn why |
| No significant difference | Need more traffic or bolder test |
| Mixed signals | Dig deeper, maybe segment |

Documentation

Document every test with:

  • Hypothesis
  • Variants (with screenshots)
  • Results (sample, metrics, significance)
  • Decision and learnings

For templates: See references/test-templates.md


Common Mistakes

Test Design

  • Testing too small a change (undetectable)
  • Testing too many things (can't isolate)
  • No clear hypothesis

Execution

  • Stopping early
  • Changing things mid-test
  • Not checking implementation

Analysis

  • Ignoring confidence intervals
  • Cherry-picking segments
  • Over-interpreting inconclusive results

Task-Specific Questions

  1. What's your current conversion rate?
  2. How much traffic does this page get?
  3. What change are you considering and why?
  4. What's the smallest improvement worth detecting?
  5. What tools do you have for testing?
  6. Have you tested this area before?

Related Skills

  • page-cro: For generating test ideas based on CRO principles
  • analytics-tracking: For setting up test measurement
  • copywriting: For creating variant copy
how to use ab-test-setup

How to use ab-test-setup on Cursor

AI-first code editor with Composer

1. Prerequisites

Before installing skills in Cursor, ensure your development environment meets these requirements:

  • Cursor installed and configured on your development machine
  • Node.js version 16.0+ with npm package manager (verify with node --version)
  • Active project directory or workspace where you want to add ab-test-setup

2. Execute installation command

Execute the skills CLI command in your project's root directory to begin installation:

$ npx skills add https://github.com/coreyhaines31/marketingskills --skill ab-test-setup

The skills CLI fetches ab-test-setup from GitHub repository coreyhaines31/marketingskills and configures it for Cursor.

3. Select Cursor when prompted

The CLI will show a list of available agents. Use arrow keys to navigate and space to select Cursor:

◆ Which agents do you want to install to?
│ ── Universal (.agents/skills) ── always included ────
│ • Amp
│ • Antigravity
│ • Cline
│ • Codex
│ ● Cursor (selected)
│ • Windsurf

4. Verify installation

Confirm successful installation by checking the skill directory location:

.cursor/skills/ab-test-setup

Reload or restart Cursor to activate ab-test-setup. Access the skill through slash commands (e.g., /ab-test-setup) or your agent's skill management interface.

Security & Verification Notice

We perform automated surface-level scans (Gen AI Scanner, Socket, Snyk) during installation. These checks detect common vulnerabilities but do not guarantee complete security. Always review skill source code and verify the publisher's reputation before production use.

Skills execute code in your development environment. Always verify the publisher's identity, review recent commits, and test in isolated environments before production deployment.

Use Cases

Task Automation & Efficiency

Automate repetitive workflows and reduce manual effort

Example

Generate reports, summarize documents, draft communications

Save 3-5 hours per week on routine tasks

Knowledge Enhancement

Learn new skills, understand complex topics, get expert guidance

Example

Explain concepts, provide examples, suggest learning resources

Accelerate learning and skill development by 2x

Quality Improvement

Enhance output quality through reviews, suggestions, and refinements

Example

Review drafts, suggest improvements, catch errors

Improve work quality by 30-40% with less effort

Implementation Guide

Prerequisites

  • Claude Desktop or compatible AI client with skill support
  • Clear understanding of task or problem to solve
  • Willingness to iterate and refine outputs

Time Estimate

15-45 minutes depending on use case complexity

Installation Steps

  1. Install the skill using the provided installation command
  2. Test with a simple use case relevant to your work
  3. Evaluate output quality and relevance
  4. Iterate on prompts to improve results
  5. Integrate into your regular workflow if valuable

Common Pitfalls

  • Expecting perfect results without iteration
  • Not providing enough context in prompts
  • Using skill for tasks outside its intended scope
  • Accepting outputs without review and validation

Best Practices

✓ Do

  • Start with clear, specific prompts
  • Provide relevant context and constraints
  • Review and refine all outputs before using
  • Iterate to improve output quality
  • Document successful prompt patterns

✗ Don't

  • Don't use without understanding skill limitations
  • Don't skip validation of outputs
  • Don't share sensitive information in prompts
  • Don't expect skill to replace human judgment

💡 Pro Tips

  • Be specific about desired format and style
  • Ask for multiple options to choose from
  • Request explanations to understand reasoning
  • Combine AI efficiency with human expertise

When to Use This

✓ Use When

Use when skill capabilities match your task, clear ROI on time saved, and you can validate outputs. Best for repetitive tasks, learning, and quality improvement.

✗ Avoid When

Avoid when task requires deep expertise you can't validate, involves sensitive decisions, or when learning process is more valuable than speed of completion.

Learning Path

  1. Familiarize yourself with skill capabilities and limitations
  2. Start with low-risk, non-critical tasks
  3. Progress to more complex and valuable use cases
  4. Build expertise through regular use and experimentation

general reviews

Ratings

4.8 · 73 reviews
  • Alexander Iyer · Dec 28, 2024

    Solid pick for teams standardizing on skills: ab-test-setup is focused, and the summary matches what you get after install.

  • Benjamin Chawla · Dec 24, 2024

    ab-test-setup fits our agent workflows well — practical, well scoped, and easy to wire into existing repos.

  • Hiroshi Reddy · Dec 24, 2024

    ab-test-setup is among the better-maintained entries we tried; worth keeping pinned for repeat workflows.

  • Valentina Chen · Dec 20, 2024

    Keeps context tight: ab-test-setup is the kind of skill you can hand to a new teammate without a long onboarding doc.

  • Chaitanya Patil · Dec 16, 2024

    Solid pick for teams standardizing on skills: ab-test-setup is focused, and the summary matches what you get after install.

  • Amelia Jain · Dec 16, 2024

    Useful defaults in ab-test-setup — fewer surprises than typical one-off scripts, and it plays nicely with `npx skills` flows.

  • Hiroshi Taylor · Dec 8, 2024

    Registry listing for ab-test-setup matched our evaluation — installs cleanly and behaves as described in the markdown.

  • Hiroshi Sethi · Nov 27, 2024

    Keeps context tight: ab-test-setup is the kind of skill you can hand to a new teammate without a long onboarding doc.

  • Amelia Malhotra · Nov 27, 2024

    I recommend ab-test-setup for anyone iterating fast on agent tooling; clear intent and a small, reviewable surface area.

  • Kiara Ndlovu · Nov 19, 2024

    We added ab-test-setup from the explainx registry; install was straightforward and the SKILL.md answered most questions upfront.
