tooluniverse-polygenic-risk-score

mims-harvard/tooluniverse · updated Apr 8, 2026


$ npx skills add https://github.com/mims-harvard/tooluniverse --skill tooluniverse-polygenic-risk-score
summary

Build and interpret polygenic risk scores for complex diseases using genome-wide association study (GWAS) data.

skill.md

Polygenic Risk Score (PRS) Builder

Build and interpret polygenic risk scores for complex diseases using genome-wide association study (GWAS) data.

Reasoning Strategy

A polygenic risk score predicts genetic risk, not disease. A high PRS means elevated risk relative to the population; it does not mean the person will develop the condition, and a low PRS does not confer immunity. PRS performance varies dramatically across ancestries: a European-derived PRS applied to a West African population can lose 50–70% of its predictive power because the underlying GWAS reflects European allele frequencies and LD patterns. Effect sizes from discovery GWAS are subject to winner's curse (overestimation in single studies); always prefer weights from large meta-analyses or validated PGS Catalog models. PRS should always be interpreted in the context of non-genetic risk factors: for most complex diseases, environmental factors contribute as much as or more than genetics.

LOOK UP DON'T GUESS: Do not assume effect sizes, allele frequencies, or which SNPs are genome-wide significant for a trait — always query GWAS Catalog (gwas_get_associations_for_trait) for actual data. Do not assume a validated PRS model exists for a trait; check PGS Catalog via PubMed search.

Overview

Use Cases:

  • "Calculate my genetic risk for type 2 diabetes"
  • "Build a polygenic risk score for coronary artery disease"
  • "What's my genetic predisposition to Alzheimer's disease?"
  • "Interpret my PRS percentile for breast cancer risk"

What This Skill Does:

  • Extracts genome-wide significant variants (p < 5e-8) from GWAS Catalog
  • Builds weighted PRS models using effect sizes (beta coefficients)
  • Calculates individual risk scores from genotype data
  • Interprets PRS as population percentiles and risk categories

What This Skill Does NOT Do:

  • Diagnose disease (PRS is probabilistic, not deterministic)
  • Replace clinical assessment or genetic counseling
  • Account for non-genetic factors (lifestyle, environment)
  • Provide treatment recommendations

Methodology

PRS Calculation Formula

A polygenic risk score is calculated as a weighted sum across genetic variants:

PRS = Σ (dosage_i × effect_size_i)

Where:

  • dosage_i: Number of effect alleles at SNP i (0, 1, or 2)
  • effect_size_i: Beta coefficient or log(odds ratio) from GWAS
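
The weighted sum above can be sketched in a few lines of Python. This is a minimal illustration, not the skill's own implementation: the function name `raw_prs`, the SNP labels, and the weights are hypothetical placeholders, and SNPs missing from either input are simply skipped.

```python
from typing import Dict


def raw_prs(dosages: Dict[str, float], weights: Dict[str, float]) -> float:
    """Sum dosage_i * effect_size_i over SNPs present in both inputs."""
    shared = dosages.keys() & weights.keys()
    return sum(dosages[snp] * weights[snp] for snp in shared)


# Illustrative values only (hypothetical SNP labels, not real GWAS estimates).
weights = {"snp_a": 0.10, "snp_b": -0.05, "snp_c": 0.20}
dosages = {"snp_a": 2, "snp_b": 1, "snp_c": 0}

score = raw_prs(dosages, weights)
print(round(score, 2))  # 0.15
```

A negative weight (here `snp_b`) lowers the score, which is how protective effect alleles enter the sum.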

Standardization

Raw PRS is standardized to z-scores for interpretation:

z-score = (PRS - population_mean) / population_std

This allows comparison to population distribution and percentile calculation.
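
As a rough sketch of the standardization step, the snippet below converts a raw score to a z-score and then to a percentile using the standard normal CDF. The function names and the reference mean and standard deviation are hypothetical; in practice those two numbers must come from an ancestry-matched reference population.

```python
from statistics import NormalDist


def prs_z_score(prs: float, population_mean: float, population_std: float) -> float:
    """Standardize a raw PRS against a reference population."""
    if population_std <= 0:
        raise ValueError("population_std must be positive")
    return (prs - population_mean) / population_std


def z_to_percentile(z: float) -> float:
    """Percentile under a standard normal distribution (an assumption)."""
    return 100.0 * NormalDist().cdf(z)


# Hypothetical reference values, for illustration only.
z = prs_z_score(0.15, population_mean=0.05, population_std=0.10)
percentile = z_to_percentile(z)
print(f"z = {z:.2f}, percentile = {percentile:.1f}")
```

A z-score of about 1 lands near the 84th percentile, meaning the score is higher than roughly 84% of the reference population.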

Significance Thresholds

  • Genome-wide significance: p < 5×10⁻⁸ (default threshold)
  • This corrects for ~1 million independent tests across the genome
  • Relaxed thresholds (e.g., p < 1×10⁻⁵) can include more SNPs but may add noise
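
The genome-wide threshold follows from a Bonferroni correction of a 0.05 family-wise error rate over about one million independent tests. The short sketch below reproduces that arithmetic and shows how a relaxed threshold admits more variants; the function name and the p-values are hypothetical.

```python
import math


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test threshold that keeps the family-wise error rate at alpha."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


genome_wide = bonferroni_threshold(0.05, 1_000_000)
assert math.isclose(genome_wide, 5e-8)

# Hypothetical p-values for four variants.
p_values = {"snp_a": 3e-12, "snp_b": 2e-8, "snp_c": 4e-7, "snp_d": 1e-3}

strict = sorted(s for s, p in p_values.items() if p < genome_wide)
relaxed = sorted(s for s, p in p_values.items() if p < 1e-5)
print(strict)   # ['snp_a', 'snp_b']
print(relaxed)  # ['snp_a', 'snp_b', 'snp_c']
```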

Effect Size Handling

  • Continuous traits (e.g., height, BMI): Beta coefficient (units of trait per allele)
  • Binary traits (e.g., disease): Odds ratio converted to log-odds (beta = ln(OR))
  • Missing effect sizes or non-significant SNPs are excluded
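
One way to express these rules is a small helper that returns a per-allele weight or `None` when no usable effect size is reported. The name `effect_weight` and its signature are assumptions made for illustration; they are not part of the skill or of the GWAS Catalog API.

```python
import math
from typing import Optional


def effect_weight(beta: Optional[float] = None,
                  odds_ratio: Optional[float] = None) -> Optional[float]:
    """Return a per-allele weight, or None if the SNP should be excluded.

    A reported beta is used directly; an odds ratio is converted to
    log-odds with beta = ln(OR).
    """
    if beta is not None:
        return beta
    if odds_ratio is not None and odds_ratio > 0:
        return math.log(odds_ratio)
    return None


print(round(effect_weight(odds_ratio=1.5), 4))  # 0.4055
print(effect_weight(beta=0.5))                  # 0.5
print(effect_weight())                          # None
```

An odds ratio below 1 yields a negative weight, so protective alleles reduce the score.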

Data Sources

This skill uses ToolUniverse GWAS tools to query:

  1. GWAS Catalog (EMBL-EBI)

    • Curated GWAS associations, 5000+ studies
    • Tools: gwas_search_associations (param: disease_trait, size; also gwas_get_associations_for_trait), gwas_get_snps_for_gene (param: gene_symbol), dbsnp_get_variant_by_rsid
    • Note: disease_trait search returns associations where the trait is one of potentially several linked EFO traits. For precise filtering, use EFO IDs via efo_trait param.
  2. Open Targets Genetics

    • Integrated genetics platform with fine-mapped credible sets
    • Tools: OpenTargets_search_gwas_studies_by_disease, EnsemblVEP_annotate_hgvs (for variant consequence/frequency)
  3. Variant Annotation

    • gnomad_search_variants + gnomad_get_variant — population allele frequencies (ancestry-specific via VEP colocated_variants)
    • MyVariant_query_variants — CADD, SIFT, PolyPhen, ClinVar, gnomAD in one call
    • gnomad_get_gene_constraints — gene constraint metrics (pLI, oe_lof) for target prioritization

Key Concepts

Polygenic Risk Scores (PRS)

Polygenic risk scores aggregate the effects of many genetic variants to estimate an individual's genetic predisposition to a trait or disease. Unlike Mendelian diseases caused by single mutations, complex diseases involve hundreds to thousands of variants, each with small effects.

Key Properties:

  • Continuous distribution: PRS forms a bell curve in populations
  • Relative risk: Compares individual to population average
  • Probabilistic: High PRS doesn't guarantee disease, low PRS doesn't guarantee protection
  • Ancestry-specific: PRS accuracy depends on matching GWAS and target ancestry

GWAS (Genome-Wide Association Studies)

GWAS compare allele frequencies between cases and controls (or correlate with trait values) across millions of SNPs to identify disease-associated variants.

Study Design:

  • Discovery cohort: Initial identification of associations
  • Replication cohort: Validation in independent samples
  • Sample size: Larger studies detect smaller effects (standard errors shrink roughly as 1/√N)
  • Multiple testing correction: Bonferroni-type correction for ~1M tests

Effect Sizes and Odds Ratios

  • Beta (β): Change in trait per copy of effect allele
    • Example: β = 0.5 kg/m² means each allele increases BMI by 0.5 units
  • Odds Ratio (OR): Multiplicative change in disease odds
    • OR = 1.5 means 50% increased odds per allele
    • Convert to beta: β = ln(OR)

Linkage Disequilibrium (LD) and Clumping

Nearby variants are often inherited together (LD). To avoid double-counting:

  • LD clumping: Select independent variants (r² < 0.1 within 1 Mb windows)
  • Fine-mapping: Statistical methods to identify causal variants
  • This skill uses raw associations; production PRS should include LD pruning
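
A greedy clumping pass can be sketched as follows, assuming pairwise r² values are already available from a reference panel. Everything here is illustrative: the `Snp` record, the `clump` function, and the treatment of missing pairs as r² = 0 are assumptions, and production pipelines typically rely on dedicated tools rather than a hand-rolled loop.

```python
from typing import Dict, FrozenSet, List, NamedTuple


class Snp(NamedTuple):
    rsid: str
    chrom: str
    pos: int
    p_value: float


def clump(snps: List[Snp],
          r2: Dict[FrozenSet[str], float],
          r2_threshold: float = 0.1,
          window_bp: int = 1_000_000) -> List[Snp]:
    """Keep the most significant SNP per LD block (greedy clumping)."""
    kept: List[Snp] = []
    for cand in sorted(snps, key=lambda s: s.p_value):
        # Drop the candidate if it is in LD with an already-kept lead SNP.
        in_ld = any(
            cand.chrom == lead.chrom
            and abs(cand.pos - lead.pos) <= window_bp
            and r2.get(frozenset((cand.rsid, lead.rsid)), 0.0) >= r2_threshold
            for lead in kept
        )
        if not in_ld:
            kept.append(cand)
    return kept


# Hypothetical variants and LD values.
snps = [
    Snp("snp_a", "1", 1_000, 1e-12),
    Snp("snp_b", "1", 1_500, 1e-9),
    Snp("snp_c", "1", 5_000_000, 1e-10),
    Snp("snp_d", "2", 1_200, 1e-8),
]
r2 = {frozenset(("snp_a", "snp_b")): 0.8}

kept = clump(snps, r2)
print([s.rsid for s in kept])  # ['snp_a', 'snp_c', 'snp_d']
```

Here `snp_b` is dropped because it sits 500 bp from the more significant `snp_a` and is strongly correlated with it, so counting both would double-count one signal.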

Population Stratification

GWAS and PRS are most accurate when ancestries match:

  • Population structure: Different ancestries have different allele frequencies
  • Transferability: European-trained PRS perform worse in non-European populations
  • Solution: Train PRS on diverse cohorts or use ancestry-matched references

Applications

Clinical Risk Assessment

PRS can stratify individuals for:

  • Screening programs: Target high-risk individuals (e.g., mammography, colonoscopy)
  • Prevention strategies: Lifestyle interventions for high genetic risk
  • Drug response: Pharmacogenomics based on metabolism genes

Example: Khera et al. (2018) reported that a genome-wide PRS placed about 8% of the population at ≥3-fold risk of coronary artery disease, roughly 20 times as many people as carry rare monogenic mutations conferring comparable risk.

Research Applications

  • Gene discovery: PRS-based phenome-wide association studies (PheWAS)
  • Genetic correlation: Compare PRS across traits
  • Causal inference: Mendelian randomization using PRS as instruments
  • Simulation studies: Model polygenic architecture

Personal Genomics

Consumer genetic testing (23andMe, Ancestry DNA) provides raw genotypes. Users can:

  • Calculate PRS for traits not reported
  • Compare to published PRS models
  • Understand genetic contribution vs. lifestyle factors

Caution: Personal PRS should not replace medical advice. Results may cause anxiety if not properly contextualized.

Limitations and Considerations

  • Heritability gap: PRS explains only a fraction of genetic heritability (T2D: ~50% heritable, PRS explains ~10–20%). Rare variants, epistasis, and gene-environment interactions are not captured.
  • Ancestry bias: European-derived PRS performance drops substantially in non-European populations. Use multi-ancestry GWAS weights when available.
  • Winner's curse: Discovery effect sizes are overestimated; use meta-analysis weights or PGS Catalog validated models.
  • Not diagnostic: High PRS does not guarantee disease; low PRS does not guarantee protection. Environmental factors contribute equally or more for most complex diseases.
  • Actionability varies: Alzheimer's PRS has limited actionable interventions; cardiovascular PRS can guide statin or lifestyle decisions. Always consider what the person can do with the information.
  • Ethical: Genetic data is permanent and familial. GINA protects employment/health insurance in the US, but not life insurance. Provide genetic counseling context.

Workflow

1. Trait Selection

Identify the disease or trait of interest:

  • Use standard terminology (e.g., "type 2 diabetes" not "T2D")
  • Check GWAS Catalog for availability
  • Verify that sufficiently large GWAS exist (n > 10,000 samples is ideal)

2. Association Collection

Query GWAS databases for genome-wide significant associations:

prs = build_polygenic_risk_score(
    trait="coronary artery disease",
    p_threshold=5e-8,  # Genome-wide significance
    max_snps=1000
)

Considerations:

  • P-value threshold: 5e-8 is conservative, 1e-5 includes more variants
  • LD clumping: Production systems should prune correlated SNPs
  • Study quality: Prefer large meta-analyses over small studies

3. Effect Size Extraction

Extract beta coefficients or odds ratios:

  • Beta for continuous traits (direct use)
  • OR for binary traits (convert to log-odds)
  • Handle missing values (exclude or impute from meta-analysis)

4. SNP Filtering

Quality control filters:

  • MAF filter: Exclude rare variants (MAF < 0.01) for robustness
  • Genotype QC: Remove SNPs with high missingness (> 10%)
  • Hardy-Weinberg: Exclude SNPs violating HWE (p < 1e-6)
  • Ambiguous SNPs: Remove A/T and G/C SNPs (strand ambiguity)
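
The four filters can be combined into a single predicate, sketched below with the thresholds listed above. The `VariantQC` record and its field names are hypothetical; real pipelines compute MAF, missingness, and HWE p-values from the genotype data first.

```python
from typing import NamedTuple


class VariantQC(NamedTuple):
    rsid: str
    effect_allele: str
    other_allele: str
    maf: float           # minor allele frequency
    missing_rate: float  # fraction of samples without a genotype call
    hwe_p: float         # Hardy-Weinberg equilibrium test p-value


# Palindromic allele pairs whose strand cannot be resolved from alleles alone.
AMBIGUOUS_PAIRS = {frozenset("AT"), frozenset("CG")}


def passes_qc(v: VariantQC) -> bool:
    """Apply the MAF, missingness, HWE, and strand-ambiguity filters."""
    alleles = frozenset((v.effect_allele.upper(), v.other_allele.upper()))
    return (
        v.maf >= 0.01
        and v.missing_rate <= 0.10
        and v.hwe_p >= 1e-6
        and alleles not in AMBIGUOUS_PAIRS
    )


# Hypothetical variants.
variants = [
    VariantQC("snp_a", "A", "G", 0.25, 0.01, 0.50),   # passes
    VariantQC("snp_b", "A", "G", 0.005, 0.01, 0.50),  # too rare
    VariantQC("snp_c", "A", "T", 0.30, 0.01, 0.50),   # strand-ambiguous
    VariantQC("snp_d", "C", "T", 0.30, 0.20, 0.50),   # high missingness
    VariantQC("snp_e", "C", "T", 0.30, 0.01, 1e-8),   # HWE violation
]
passed = [v.rsid for v in variants if passes_qc(v)]
print(passed)  # ['snp_a']
```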

5. Score Calculation

Calculate weighted sum of genotype dosages:

result = calculate_personal_prs(
    prs_weights=prs,
    genotypes=my_genotypes,
    population_mean=0.0,
    population_std=1.0
)

Genotype Sources:

  • 23andMe raw data export
  • Ancestry DNA raw data
  • Whole genome sequencing (VCF files)
  • SNP array data (Illumina, Affymetrix)
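
For text-based raw exports, genotype calls have to be turned into effect-allele dosages before scoring. The sketch below assumes a whitespace-separated layout with `#` comment lines and four columns (rsid, chromosome, position, genotype), with `--` marking a no-call; other vendors and VCF files use different layouts. The function names are hypothetical, and the code assumes calls are reported on the same strand as the GWAS effect allele.

```python
import io
from typing import Dict, Optional, TextIO


def read_genotypes(handle: TextIO) -> Dict[str, str]:
    """Map rsid -> genotype call, skipping comments and malformed lines."""
    genotypes: Dict[str, str] = {}
    for line in handle:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.split()
        if len(fields) < 4:
            continue
        rsid, _chrom, _pos, call = fields[:4]
        genotypes[rsid] = call
    return genotypes


def effect_allele_dosage(call: str, effect_allele: str) -> Optional[int]:
    """Count effect alleles (0, 1, or 2); None for no-calls or haploid calls."""
    if len(call) != 2 or "-" in call:
        return None
    return call.count(effect_allele)


# Hypothetical file contents.
sample = (
    "# rsid\tchromosome\tposition\tgenotype\n"
    "snp_a\t1\t1000\tAG\n"
    "snp_b\t1\t2000\t--\n"
    "snp_c\t2\t3000\tGG\n"
)
calls = read_genotypes(io.StringIO(sample))
print(effect_allele_dosage(calls["snp_a"], "G"))  # 1
print(effect_allele_dosage(calls["snp_b"], "G"))  # None
print(effect_allele_dosage(calls["snp_c"], "G"))  # 2
```

Returning `None` for no-calls lets the caller decide whether to skip the SNP or impute it, rather than silently treating a missing genotype as zero effect alleles.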

6. Risk Interpretation

Convert to percentiles and risk categories:

result = interpret_prs_percentile(result)
print(f"Percentile: {result.percentile:.1f}%")
print(f"Risk: {result.risk_category}")

Risk Categories:

  • Low risk: < 20th percentile (below-average genetic risk, not protection from disease)
  • Average risk: 20-80th percentile (typical genetic predisposition)
  • Elevated risk: 80-95th percentile (moderately increased risk)
  • High risk: > 95th percentile (substantially increased risk)
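
The category cut-offs can be written as a simple lookup. This is an illustrative sketch, not the skill's `interpret_prs_percentile`; the function name is hypothetical, and assigning exactly the 80th percentile to "Elevated risk" is an assumption, since the ranges above share that boundary.

```python
def risk_category(percentile: float) -> str:
    """Map a PRS percentile (0-100) to the risk categories listed above."""
    if not 0 <= percentile <= 100:
        raise ValueError("percentile must be between 0 and 100")
    if percentile < 20:
        return "Low risk"
    if percentile < 80:
        return "Average risk"
    if percentile <= 95:
        return "Elevated risk"
    return "High risk"


for pct in (10, 50, 84.1, 97):
    print(pct, risk_category(pct))
```

These labels describe position within a reference distribution only; they are relative, not absolute, risk.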

Clinical Interpretation:

  • Percentiles assume normal distribution
  • Relative risk vs. average (not absolute risk)
  • Combine with family history, clinical risk factors
  • PRS is NOT diagnostic - many high-risk individuals never develop disease

Best Practices

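  • Use published PRS from PGS Catalog when available (scoring files document variant selection and weights, and many report evaluation metrics by ancestry); confirm that the chosen score was validated in a cohort resembling your target population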
  • Match ancestries between GWAS and target population; use multi-ancestry GWAS when available
  • For highly polygenic traits (height, education), relaxed p-value thresholds capture more signal; for oligogenic traits (IBD, T1D), strict thresholds are better
  • Combine PRS with clinical risk scores (Framingham, QRISK) for integrated prediction
  • In research: document SNP selection criteria, LD clumping parameters, and ancestry of GWAS; validate in held-out cohorts; report R² or AUC stratified by ancestry

Disclaimer

This skill is for educational and research purposes only.

  • Not for clinical diagnosis or treatment decisions
  • Not validated for clinical use - use PGS Catalog models for clinical-grade PRS
  • Requires genetic counseling - interpretation requires expertise
  • Does not account for family history, environment, or lifestyle factors
  • Ancestry-specific - accuracy depends on matching GWAS ancestry

For clinical genetic testing, consult:

  • Genetic counselors (certified by ABGC)
  • Medical geneticists (certified by ABMGG)
  • Healthcare providers with genomics training

PRS is a rapidly evolving field. Guidelines and best practices will continue to change as research progresses.

Regulatory Status:

  • FDA does not currently regulate PRS (as of 2024)
  • Some countries restrict direct-to-consumer genetic risk reporting
  • Check local regulations before clinical implementation

How to use tooluniverse-polygenic-risk-score on Cursor

1. Prerequisites

Before installing skills in Cursor, ensure your development environment meets these requirements:

  • Cursor installed and configured on your development machine
  • Node.js version 16.0+ with npm package manager (verify with node --version)
  • Active project directory or workspace where you want to add tooluniverse-polygenic-risk-score

2. Execute installation command

Execute the skills CLI command in your project's root directory to begin installation:

$ npx skills add https://github.com/mims-harvard/tooluniverse --skill tooluniverse-polygenic-risk-score

The skills CLI fetches tooluniverse-polygenic-risk-score from GitHub repository mims-harvard/tooluniverse and configures it for Cursor.

3. Select Cursor when prompted

The CLI will show a list of available agents. Use arrow keys to navigate and space to select Cursor:

◆ Which agents do you want to install to?
│ ── Universal (.agents/skills) ── always included ────
│ • Amp
│ • Antigravity
│ • Cline
│ • Codex
│ ● Cursor (selected)
│ • Windsurf

4. Verify installation

Confirm successful installation by checking the skill directory location:

.cursor/skills/tooluniverse-polygenic-risk-score

Reload or restart Cursor to activate tooluniverse-polygenic-risk-score. Access the skill through slash commands (e.g., /tooluniverse-polygenic-risk-score) or your agent's skill management interface.

Security & Verification Notice

We perform automated surface-level scans (Gen AI Scanner, Socket, Snyk) during installation. These checks detect common vulnerabilities but do not guarantee complete security. Always review skill source code and verify the publisher's reputation before production use.

Skills execute code in your development environment. Always verify the publisher's identity, review recent commits, and test in isolated environments before production deployment.


