tooluniverse-polygenic-risk-score
mims-harvard/tooluniverse · updated Apr 8, 2026
Build and interpret polygenic risk scores for complex diseases using genome-wide association study (GWAS) data.
Polygenic Risk Score (PRS) Builder
Reasoning Strategy
A polygenic risk score predicts genetic risk, not disease. A high PRS means elevated risk relative to the population; it does not mean the person will develop the condition, and a low PRS does not confer immunity. PRS performance varies dramatically across ancestries: a European-derived PRS applied to a West African population can lose 50–70% of its predictive power because the underlying GWAS was trained on European allele frequencies and LD patterns. Effect sizes from discovery GWAS are subject to winner's curse (overestimation in single studies); always prefer weights from large meta-analyses or validated PGS Catalog models. PRS should always be interpreted in the context of non-genetic risk factors: for most complex diseases, environmental factors contribute as much as or more than genetics.
LOOK UP DON'T GUESS: Do not assume effect sizes, allele frequencies, or which SNPs are genome-wide significant for a trait — always query GWAS Catalog (gwas_get_associations_for_trait) for actual data. Do not assume a validated PRS model exists for a trait; check PGS Catalog via PubMed search.
Overview
Use Cases:
- "Calculate my genetic risk for type 2 diabetes"
- "Build a polygenic risk score for coronary artery disease"
- "What's my genetic predisposition to Alzheimer's disease?"
- "Interpret my PRS percentile for breast cancer risk"
What This Skill Does:
- Extracts genome-wide significant variants (p < 5e-8) from GWAS Catalog
- Builds weighted PRS models using effect sizes (beta coefficients)
- Calculates individual risk scores from genotype data
- Interprets PRS as population percentiles and risk categories
What This Skill Does NOT Do:
- Diagnose disease (PRS is probabilistic, not deterministic)
- Replace clinical assessment or genetic counseling
- Account for non-genetic factors (lifestyle, environment)
- Provide treatment recommendations
Methodology
PRS Calculation Formula
A polygenic risk score is calculated as a weighted sum across genetic variants:
PRS = Σ (dosage_i × effect_size_i)
Where:
- dosage_i: Number of effect alleles at SNP i (0, 1, or 2)
- effect_size_i: Beta coefficient or log(odds ratio) from GWAS
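As a minimal sketch of the weighted sum above, assuming dosages and weights are held in plain dictionaries keyed by rsID. The function name `weighted_prs`, the rsIDs, and the numbers are illustrative and not part of the skill's API:

```python
def weighted_prs(dosages, effect_sizes):
    """Sum dosage_i * effect_size_i over SNPs present in both inputs."""
    # SNPs missing from either side are skipped rather than imputed.
    shared = dosages.keys() & effect_sizes.keys()
    return sum(dosages[rsid] * effect_sizes[rsid] for rsid in shared)


# Hypothetical weights and genotypes (not real GWAS estimates)
effect_sizes = {"rs0001": 0.10, "rs0002": -0.05, "rs0003": 0.20}
dosages = {"rs0001": 2, "rs0002": 1, "rs0003": 0, "rs9999": 1}

score = weighted_prs(dosages, effect_sizes)
print(round(score, 4))
```

Here `rs9999` has a genotype but no weight, so it contributes nothing to the score.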
Standardization
Raw PRS is standardized to z-scores for interpretation:
z-score = (PRS - population_mean) / population_std
This allows comparison to population distribution and percentile calculation.
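The standardization and percentile step might look like the following sketch, which assumes the reference population's PRS is approximately normal. The helper names and the input numbers are hypothetical; `NormalDist` comes from the Python standard library:

```python
from statistics import NormalDist


def prs_z_score(raw_prs, population_mean, population_std):
    """Standardize a raw PRS against a reference population."""
    if population_std <= 0:
        raise ValueError("population_std must be positive")
    return (raw_prs - population_mean) / population_std


def z_to_percentile(z):
    """Percentile under a standard normal (an assumption, not a guarantee)."""
    return 100.0 * NormalDist().cdf(z)


# Hypothetical reference distribution and individual score
z = prs_z_score(raw_prs=1.3, population_mean=0.9, population_std=0.25)
pct = z_to_percentile(z)
print(f"z = {z:.2f}, percentile = {pct:.1f}")
```

The population mean and standard deviation should come from a reference cohort of matching ancestry; the values here are placeholders.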
Significance Thresholds
- Genome-wide significance: p < 5×10⁻⁸ (default threshold)
- This corrects for ~1 million independent tests across the genome
- Relaxed thresholds (e.g., p < 1×10⁻⁵) can include more SNPs but may add noise
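The default threshold follows from a Bonferroni-style division of a 0.05 error rate by roughly one million tests. A small sketch of that arithmetic and of threshold filtering, using made-up association records with a hypothetical `p_value` field:

```python
FAMILY_WISE_ALPHA = 0.05
INDEPENDENT_TESTS = 1_000_000

# 0.05 / 1,000,000 = 5e-8
threshold = FAMILY_WISE_ALPHA / INDEPENDENT_TESTS


def select_significant(associations, p_threshold=threshold):
    """Keep associations whose p-value is below the threshold."""
    return [a for a in associations if a["p_value"] < p_threshold]


# Made-up records for illustration only
associations = [
    {"rsid": "rs0001", "p_value": 3e-12},
    {"rsid": "rs0002", "p_value": 4e-8},
    {"rsid": "rs0003", "p_value": 2e-6},
]

strict = select_significant(associations)
relaxed = select_significant(associations, p_threshold=1e-5)
print(len(strict), len(relaxed))
```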
Effect Size Handling
- Continuous traits (e.g., height, BMI): Beta coefficient (units of trait per allele)
- Binary traits (e.g., disease): Odds ratio converted to log-odds (beta = ln(OR))
- Missing effect sizes or non-significant SNPs are excluded
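A sketch of these handling rules, assuming each association is a dictionary with optional `beta` and `odds_ratio` keys. The record layout and the function name `to_beta` are assumptions for illustration:

```python
import math


def to_beta(record, trait_type):
    """Return the per-allele weight for one association, or None to exclude it."""
    if trait_type == "continuous":
        return record.get("beta")
    if trait_type == "binary":
        odds_ratio = record.get("odds_ratio")
        if odds_ratio is None or odds_ratio <= 0:
            return None  # missing or invalid OR: exclude the SNP
        return math.log(odds_ratio)
    raise ValueError(f"unknown trait_type: {trait_type!r}")


# Made-up records: the third has no effect size and is dropped
records = [
    {"rsid": "rs0001", "odds_ratio": 1.5},
    {"rsid": "rs0002", "odds_ratio": 0.8},
    {"rsid": "rs0003"},
]

weights = {}
for record in records:
    beta = to_beta(record, "binary")
    if beta is not None:
        weights[record["rsid"]] = beta
print(weights)
```

Note that a protective allele (OR below 1) yields a negative weight, so it lowers the score.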
Data Sources
This skill uses ToolUniverse GWAS tools to query:
- GWAS Catalog (EMBL-EBI)
  - Curated GWAS associations, 5000+ studies
  - Tools: gwas_search_associations (params: disease_trait, size; also gwas_get_associations_for_trait), gwas_get_snps_for_gene (param: gene_symbol), dbsnp_get_variant_by_rsid
  - Note: a disease_trait search returns associations where the trait is one of potentially several linked EFO traits. For precise filtering, use EFO IDs via the efo_trait param.
- Open Targets Genetics
  - Integrated genetics platform with fine-mapped credible sets
  - Tools: OpenTargets_search_gwas_studies_by_disease, EnsemblVEP_annotate_hgvs (for variant consequence/frequency)
- Variant Annotation
  - gnomad_search_variants + gnomad_get_variant: population allele frequencies (ancestry-specific via VEP colocated_variants)
  - MyVariant_query_variants: CADD, SIFT, PolyPhen, ClinVar, gnomAD in one call
  - gnomad_get_gene_constraints: gene constraint metrics (pLI, oe_lof) for target prioritization
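To make the tool names and parameters above concrete, the sketch below only packages them as plain dictionaries. The `{"name": ..., "arguments": ...}` shape and the helper `build_tool_call` are assumptions for illustration; check the ToolUniverse documentation for the actual invocation interface before relying on this layout.

```python
def build_tool_call(tool_name, **arguments):
    """Package a tool name and its arguments as a plain dictionary.

    The dictionary layout is an assumption, not a confirmed ToolUniverse schema.
    """
    return {"name": tool_name, "arguments": arguments}


# Tool and parameter names are taken from the list above;
# the argument values are example inputs.
calls = [
    build_tool_call("gwas_search_associations",
                    disease_trait="type 2 diabetes", size=50),
    build_tool_call("gwas_get_snps_for_gene", gene_symbol="TCF7L2"),
]

for call in calls:
    print(call["name"], call["arguments"])
```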
Key Concepts
Polygenic Risk Scores (PRS)
Polygenic risk scores aggregate the effects of many genetic variants to estimate an individual's genetic predisposition to a trait or disease. Unlike Mendelian diseases caused by single mutations, complex diseases involve hundreds to thousands of variants, each with small effects.
Key Properties:
- Continuous distribution: PRS forms a bell curve in populations
- Relative risk: Compares individual to population average
- Probabilistic: High PRS doesn't guarantee disease, low PRS doesn't guarantee protection
- Ancestry-specific: PRS accuracy depends on matching GWAS and target ancestry
GWAS (Genome-Wide Association Studies)
GWAS compare allele frequencies between cases and controls (or correlate with trait values) across millions of SNPs to identify disease-associated variants.
Study Design:
- Discovery cohort: Initial identification of associations
- Replication cohort: Validation in independent samples
- Sample size: Larger studies detect smaller effects (standard errors of effect estimates shrink roughly as 1/√N)
- Multiple testing correction: Bonferroni-type correction for ~1M tests
Effect Sizes and Odds Ratios
- Beta (β): Change in trait per copy of effect allele
- Example: β = 0.5 kg/m² means each allele increases BMI by 0.5 units
- Odds Ratio (OR): Multiplicative change in disease odds
- OR = 1.5 means 50% increased odds per allele
- Convert to beta: β = ln(OR)
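Because the score sums log-odds, per-allele odds ratios multiply across copies of the effect allele. A short worked example under the usual log-additive assumption (the function name is illustrative):

```python
import math


def combined_odds_ratio(per_allele_or, dosage):
    """Odds ratio versus zero copies, assuming a log-additive model."""
    beta = math.log(per_allele_or)   # per-allele log-odds
    return math.exp(beta * dosage)   # additive in log-odds, multiplicative in odds


for dosage in (0, 1, 2):
    print(dosage, round(combined_odds_ratio(1.5, dosage), 2))
```

With OR = 1.5, two copies correspond to odds 2.25 times those of a non-carrier. This is a ratio of odds, not a probability of disease.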
Linkage Disequilibrium (LD) and Clumping
Nearby variants are often inherited together (LD). To avoid double-counting:
- LD clumping: Select independent variants (r² < 0.1 within 1 Mb windows)
- Fine-mapping: Statistical methods to identify causal variants
- This skill uses raw associations; production PRS should include LD pruning
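A simplified greedy clumping sketch, for illustration only. It assumes pairwise r² values are already available in a lookup keyed by rsID pairs (in practice they come from an ancestry-matched reference panel and a dedicated tool), and all names and values are hypothetical:

```python
def ld_clump(variants, r2_lookup, r2_threshold=0.1, window_bp=1_000_000):
    """Keep the most significant variant per LD block (greedy clumping)."""
    kept = []
    # Visit variants from most to least significant.
    for candidate in sorted(variants, key=lambda v: v["p_value"]):
        in_ld_with_index = any(
            candidate["chrom"] == index["chrom"]
            and abs(candidate["pos"] - index["pos"]) <= window_bp
            and r2_lookup.get(
                frozenset((candidate["rsid"], index["rsid"])), 0.0
            ) >= r2_threshold
            for index in kept
        )
        if not in_ld_with_index:
            kept.append(candidate)
    return kept


# Made-up variants: rs0002 is in strong LD with the stronger hit rs0001
variants = [
    {"rsid": "rs0001", "chrom": "1", "pos": 1_000, "p_value": 1e-20},
    {"rsid": "rs0002", "chrom": "1", "pos": 5_000, "p_value": 1e-10},
    {"rsid": "rs0003", "chrom": "1", "pos": 2_500_000, "p_value": 1e-9},
]
r2_lookup = {frozenset(("rs0001", "rs0002")): 0.8}

independent = ld_clump(variants, r2_lookup)
print([v["rsid"] for v in independent])
```

Pairs absent from the lookup are treated as uncorrelated, which is a simplification; a real pipeline would compute r² for every pair inside the window.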
Population Stratification
GWAS and PRS are most accurate when ancestries match:
- Population structure: Different ancestries have different allele frequencies
- Transferability: European-trained PRS perform worse in non-European populations
- Solution: Train PRS on diverse cohorts or use ancestry-matched references
Applications
Clinical Risk Assessment
PRS can stratify individuals for:
- Screening programs: Target high-risk individuals (e.g., mammography, colonoscopy)
- Prevention strategies: Lifestyle interventions for high genetic risk
- Drug response: Pharmacogenomics based on metabolism genes
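Example: Khera et al. (2018) reported that a genome-wide PRS identifies about 8% of the population at more than 3-fold increased coronary artery disease risk, roughly 20 times as many people as carry rare monogenic mutations conferring comparable risk.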
Research Applications
- Gene discovery: PRS-based phenome-wide association studies (PheWAS)
- Genetic correlation: Compare PRS across traits
- Causal inference: Mendelian randomization using PRS as instruments
- Simulation studies: Model polygenic architecture
Personal Genomics
Consumer genetic testing (23andMe, Ancestry DNA) provides raw genotypes. Users can:
- Calculate PRS for traits not reported
- Compare to published PRS models
- Understand genetic contribution vs. lifestyle factors
Caution: Personal PRS should not replace medical advice. Results may cause anxiety if not properly contextualized.
Limitations and Considerations
- Heritability gap: PRS explains only a fraction of genetic heritability (T2D: ~50% heritable, PRS explains ~10–20%). Rare variants, epistasis, and gene-environment interactions are not captured.
- Ancestry bias: European-derived PRS performance drops substantially in non-European populations. Use multi-ancestry GWAS weights when available.
- Winner's curse: Discovery effect sizes are overestimated; use meta-analysis weights or PGS Catalog validated models.
- Not diagnostic: High PRS does not guarantee disease; low PRS does not guarantee protection. Environmental factors contribute equally or more for most complex diseases.
- Actionability varies: Alzheimer's PRS has limited actionable interventions; cardiovascular PRS can guide statin or lifestyle decisions. Always consider what the person can do with the information.
- Ethical: Genetic data is permanent and familial. GINA protects employment/health insurance in the US, but not life insurance. Provide genetic counseling context.
Workflow
1. Trait Selection
Identify the disease or trait of interest:
- Use standard terminology (e.g., "type 2 diabetes" not "T2D")
- Check GWAS Catalog for availability
- Verify that sufficiently large GWAS exist (ideally n > 10,000 samples)
2. Association Collection
Query GWAS databases for genome-wide significant associations:
prs = build_polygenic_risk_score(
    trait="coronary artery disease",
    p_threshold=5e-8,  # Genome-wide significance
    max_snps=1000
)
Considerations:
- P-value threshold: 5e-8 is conservative, 1e-5 includes more variants
- LD clumping: Production systems should prune correlated SNPs
- Study quality: Prefer large meta-analyses over small studies
3. Effect Size Extraction
Extract beta coefficients or odds ratios:
- Beta for continuous traits (direct use)
- OR for binary traits (convert to log-odds)
- Handle missing values (exclude or impute from meta-analysis)
4. SNP Filtering
Quality control filters:
- MAF filter: Exclude rare variants (MAF < 0.01) for robustness
- Genotype QC: Remove SNPs with high missingness (> 10%)
- Hardy-Weinberg: Exclude SNPs violating HWE (p < 1e-6)
- Ambiguous SNPs: Remove A/T and G/C SNPs (strand ambiguity)
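The four filters above could be combined into a single predicate, sketched here under the assumption that each SNP record carries `maf`, `missing_rate`, `hwe_p`, `effect_allele`, and `other_allele` fields (hypothetical names):

```python
# Palindromic allele pairs whose strand cannot be resolved from alleles alone
AMBIGUOUS_PAIRS = {frozenset("AT"), frozenset("CG")}


def passes_qc(snp, min_maf=0.01, max_missing=0.10, min_hwe_p=1e-6):
    """Apply MAF, missingness, HWE, and strand-ambiguity filters."""
    alleles = frozenset((snp["effect_allele"], snp["other_allele"]))
    return (
        snp["maf"] >= min_maf
        and snp["missing_rate"] <= max_missing
        and snp["hwe_p"] >= min_hwe_p
        and alleles not in AMBIGUOUS_PAIRS
    )


# Made-up records: only the first passes every filter
snps = [
    {"rsid": "rs0001", "maf": 0.25, "missing_rate": 0.01, "hwe_p": 0.4,
     "effect_allele": "A", "other_allele": "G"},
    {"rsid": "rs0002", "maf": 0.005, "missing_rate": 0.01, "hwe_p": 0.4,
     "effect_allele": "A", "other_allele": "G"},
    {"rsid": "rs0003", "maf": 0.30, "missing_rate": 0.02, "hwe_p": 0.2,
     "effect_allele": "A", "other_allele": "T"},
]

clean = [s["rsid"] for s in snps if passes_qc(s)]
print(clean)
```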
5. Score Calculation
Calculate weighted sum of genotype dosages:
result = calculate_personal_prs(
    prs_weights=prs,
    genotypes=my_genotypes,
    population_mean=0.0,
    population_std=1.0
)
Genotype Sources:
- 23andMe raw data export
- Ancestry DNA raw data
- Whole genome sequencing (VCF files)
- SNP array data (Illumina, Affymetrix)
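For the first source, a sketch of turning a 23andMe-style raw export into effect-allele dosages. It assumes the common layout of `#` comment lines followed by tab-separated rsid, chromosome, position, and genotype columns; other vendors use different columns, and strand or genome-build mismatches are not handled here. The helper names and sample rows are hypothetical:

```python
import io


def parse_23andme(handle):
    """Read rsid -> genotype string, skipping comments and no-calls."""
    genotypes = {}
    for line in handle:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        rsid, _chrom, _pos, genotype = line.split("\t")[:4]
        if "-" in genotype:  # "--" marks a no-call
            continue
        genotypes[rsid] = genotype
    return genotypes


def effect_allele_dosage(genotype, effect_allele):
    """Count copies of the effect allele (0, 1, or 2 for a diploid call)."""
    return genotype.count(effect_allele)


# Made-up rows in the assumed layout
sample = io.StringIO(
    "# rsid\tchromosome\tposition\tgenotype\n"
    "rs0001\t1\t1000\tAG\n"
    "rs0002\t1\t2000\tCC\n"
    "rs0003\t2\t3000\t--\n"
)
genotypes = parse_23andme(sample)
print(genotypes, effect_allele_dosage(genotypes["rs0001"], "G"))
```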
6. Risk Interpretation
Convert to percentiles and risk categories:
result = interpret_prs_percentile(result)
print(f"Percentile: {result.percentile:.1f}%")
print(f"Risk: {result.risk_category}")
Risk Categories:
- Low risk: < 20th percentile (below-average genetic predisposition, not immunity)
- Average risk: 20-80th percentile (typical genetic predisposition)
- Elevated risk: 80-95th percentile (moderately increased risk)
- High risk: > 95th percentile (substantially increased risk)
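The category cut-offs can be expressed as a small lookup. The listed ranges overlap at the 80th percentile, so this sketch assumes that exactly 80 falls in "Elevated risk" and exactly 95 stays in "Elevated risk"; that boundary choice and the function name are assumptions:

```python
def risk_category(percentile):
    """Map a PRS percentile (0-100) to the categories listed above."""
    if not 0 <= percentile <= 100:
        raise ValueError("percentile must be between 0 and 100")
    if percentile < 20:
        return "Low risk"
    if percentile < 80:
        return "Average risk"
    if percentile <= 95:
        return "Elevated risk"
    return "High risk"


for p in (5, 50, 80, 95, 99):
    print(p, risk_category(p))
```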
Clinical Interpretation:
- Percentiles assume normal distribution
- Relative risk vs. average (not absolute risk)
- Combine with family history, clinical risk factors
- PRS is NOT diagnostic - many high-risk individuals never develop disease
Best Practices
- Use validated PRS from PGS Catalog when available (externally validated, includes LD clumping and ancestry-specific weights)
- Match ancestries between GWAS and target population; use multi-ancestry GWAS when available
- For highly polygenic traits (height, education), relaxed p-value thresholds capture more signal; for oligogenic traits (IBD, T1D), strict thresholds are better
- Combine PRS with clinical risk scores (Framingham, QRISK) for integrated prediction
- In research: document SNP selection criteria, LD clumping parameters, and ancestry of GWAS; validate in held-out cohorts; report R² or AUC stratified by ancestry
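For the last point, one way to report discrimination per ancestry group is the rank-based AUC: the probability that a randomly chosen case has a higher PRS than a randomly chosen control. The sketch below uses toy scores and placeholder group labels; none of the numbers are real results:

```python
def auc(case_scores, control_scores):
    """Rank-based AUC: fraction of case/control pairs ranked correctly."""
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5  # ties count as half
    return wins / (len(case_scores) * len(control_scores))


# Toy held-out cohort, stratified by placeholder ancestry labels
cohort = {
    "ancestry_A": {"cases": [1.2, 0.8, 0.3], "controls": [0.5, -0.1, 0.9, 0.0]},
    "ancestry_B": {"cases": [0.4, 0.1], "controls": [0.3, 0.2]},
}

auc_by_group = {
    group: auc(data["cases"], data["controls"]) for group, data in cohort.items()
}
print(auc_by_group)
```

The pairwise loop is quadratic in cohort size, which is fine for a sketch; larger cohorts would use a sort-based implementation.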
Disclaimer
This skill is for educational and research purposes only.
- Not for clinical diagnosis or treatment decisions
- Not validated for clinical use - use PGS Catalog models for clinical-grade PRS
- Requires genetic counseling - interpretation requires expertise
- Does not account for family history, environment, or lifestyle factors
- Ancestry-specific - accuracy depends on matching GWAS ancestry
For clinical genetic testing, consult:
- Genetic counselors (certified by ABGC)
- Medical geneticists (certified by ABMGG)
- Healthcare providers with genomics training
PRS is a rapidly evolving field. Guidelines and best practices will continue to change as research progresses.
Regulatory Status:
- FDA does not currently regulate PRS (as of 2024)
- Some countries restrict direct-to-consumer genetic risk reporting
- Check local regulations before clinical implementation
How to use tooluniverse-polygenic-risk-score on Cursor
Prerequisites
Before installing skills in Cursor, ensure your development environment meets these requirements:
- Cursor installed and configured on your development machine
- Node.js version 16.0+ with npm package manager (verify with node --version)
- Active project directory or workspace where you want to add tooluniverse-polygenic-risk-score
Execute installation command
Execute the skills CLI command in your project's root directory to begin installation:
The skills CLI fetches tooluniverse-polygenic-risk-score from GitHub repository mims-harvard/tooluniverse and configures it for Cursor.
Select Cursor when prompted
The CLI will show a list of available agents. Use arrow keys to navigate and space to select Cursor:
Verify installation
Confirm successful installation by checking the skill directory location:
Reload or restart Cursor to activate tooluniverse-polygenic-risk-score. Access the skill through slash commands (e.g., /tooluniverse-polygenic-risk-score) or your agent's skill management interface.