pathway-enrichment▌
K-Dense-AI/scientific-agent-skills · updated Jun 4, 2026
MDX-style export adds YAML metadata + attribution linking explainx.ai and this canonical listing URL.
### Pathway Enrichment
- ›name: "pathway-enrichment"
- ›description: "Run pathway and gene-set enrichment analysis on gene lists or ranked gene data, then interpret the results. Use whenever the user has a set of genes (differentially expressed genes from PyDESeq2/Scanp..."
| name | pathway-enrichment |
| description | Run pathway and gene-set enrichment analysis on gene lists or ranked gene data, then interpret the results. Use whenever the user has a set of genes (differentially expressed genes from PyDESeq2/Scanpy, CRISPR-screen hits, cluster marker genes, proteomics hits) and wants to know which biological pathways, GO terms, or gene sets are over-represented or enriched. Covers over-representation analysis (ORA / Enrichr / Fisher / hypergeometric), ranked Gene Set Enrichment Analysis (GSEA / preranked), single-sample scoring (ssGSEA/GSVA), and functional profiling via gseapy, g:Profiler, Enrichr libraries, MSigDB, GO, KEGG, Reactome, and WikiPathways — plus gene-ID mapping, choosing the right background universe, multiple-testing correction, redundancy reduction, dotplots/enrichment maps, and publication-ready tables. Use this for "pathway analysis", "enrichment analysis", "GO enrichment", "KEGG/Reactome pathways", "GSEA", "over-representation", "functional annotation", or "what pathways are my genes in". |
| license | MIT |
| metadata | version: "1.0" skill-author: K-Dense Inc. |
Pathway Enrichment
Overview
Enrichment analysis answers "what biology is over-represented in my genes?" It is the standard last step after differential expression, a screen, or clustering. There are two core methods, and choosing correctly is the single most important decision:
- ORA (over-representation analysis) — take a thresholded gene list (e.g., padj < 0.05) and test which gene sets it overlaps more than chance, using Fisher's exact / hypergeometric tests. Tools: Enrichr, g:Profiler.
- GSEA (gene set enrichment analysis) — take the whole ranked list of genes (no threshold) and test whether each gene set is concentrated toward the top or bottom. Preranked GSEA uses a per-gene score (e.g., the DESeq2
stat). Better when effects are broad and subtle.
This skill orchestrates these analyses, the gene-set databases behind them, and the interpretation pitfalls that make results wrong or unpublishable.
When to Use This Skill
Use this skill when the user wants to:
- Find enriched GO terms / KEGG / Reactome / WikiPathways / MSigDB Hallmark sets in a gene list.
- Run GSEA / preranked GSEA on DESeq2, edgeR, limma, or Scanpy
rank_genes_groupsoutput. - Score pathway activity per sample/cell (ssGSEA, GSVA).
- Interpret, deduplicate, and visualize enrichment results, or build a publication table/figure.
- Decide between ORA and GSEA, pick gene-set libraries, choose a background, or fix gene-ID problems.
For quick one-off Enrichr lookups the gget skill (gget enrichr) is lighter weight; for raw pathway/interaction APIs (Reactome, KEGG, STRING) see the database-lookup skill. Use this skill for full, defensible enrichment workflows.
Choosing the Right Method
| Situation | Method | Tool / entry point |
|---|---|---|
| You have a discrete hit list (DE genes, screen hits, cluster markers) | ORA | gp.enrichr(...) or g:Profiler |
| You have a full ranked list (every tested gene + a score) | Preranked GSEA | gp.prerank(...) |
| You have an expression matrix + class labels | GSEA | gp.gsea(...) |
| You want a pathway score per sample/cell | ssGSEA / GSVA | gp.ssgsea(...), gp.gsva(...) |
| You need a custom background or 500+ organisms | ORA with custom domain | g:Profiler (domain_scope='custom') |
| You want TF / signaling activity (PROGENy, DoRothEA) | activity inference | see references/databases-and-gene-sets.md (decoupler) |
When in doubt: a thresholded list → ORA; a ranked table with scores → GSEA. Never threshold a list and then feed it to GSEA — that discards the ranking GSEA depends on.
Setup
uv pip install gseapy gprofiler-official
# gseapy pulls pandas, numpy, scipy, matplotlib. Network access is needed for
# Enrichr, g:Profiler, and MSigDB downloads. For fully offline ORA, use a local
# GMT file with gp.enrich() (see references/gseapy.md).
Verify and list available gene-set libraries (names change over time — never hardcode blindly):
import gseapy as gp
names = gp.get_library_name(organism="human") # 200+ Enrichr libraries
print([n for n in names if "Reactome" in n or "KEGG" in n or "Hallmark" in n])
Quick Start
ORA on a hit list (gseapy + Enrichr)
import gseapy as gp
# Enrichr libraries expect HGNC gene SYMBOLS (human: UPPERCASE). Map IDs first if needed.
genes = [g.strip() for g in open("deg_symbols.txt") if g.strip()]
enr = gp.enrichr(
gene_list=genes,
gene_sets=["MSigDB_Hallmark_2020", "GO_Biological_Process_2023",
"KEGG_2021_Human", "Reactome_2022"],
organism="human",
outdir=None, # in-memory; set a path to also write tables/plots
)
res = enr.results
sig = res[res["Adjusted P-value"] < 0.05].sort_values("Adjusted P-value")
print(sig[["Gene_set", "Term", "Overlap", "Adjusted P-value", "Combined Score", "Genes"]].head(20))
Preranked GSEA from DESeq2 results
import gseapy as gp
import pandas as pd
res = pd.read_csv("deseq2_results.csv", index_col=0) # index = gene symbols
# Rank by the test statistic (sign = direction, magnitude = evidence). This is
# more stable than ranking by log2FoldChange, which is noisy for low-count genes.
rnk = res["stat"].dropna().sort_values(ascending=False)
rnk.index = rnk.index.str.upper()
rnk = rnk[~rnk.index.duplicated(keep="first")]
pre = gp.prerank(
rnk=rnk,
gene_sets=["MSigDB_Hallmark_2020", "GO_Biological_Process_2023"],
min_size=15, max_size=500, # drop tiny/huge sets (noisy or generic)
permutation_num=1000, seed=123, # seed = reproducible p-values
threads=4, outdir=None,
)
out = pre.res2d.sort_values("FDR q-val")
print(out[["Term", "ES", "NES", "NOM p-val", "FDR q-val", "Lead_genes"]].head(20))
If you have no stat column, build the rank from sign(log2FoldChange) * -log10(pvalue).
Core Workflow
For a defensible analysis, work through these steps. The middle steps (ID type, background) are where results most often silently go wrong.
Step 1 — Pin down inputs and pick the method
Confirm: which genes, what organism, is there a per-gene score (→ GSEA) or just a list (→ ORA), and what comparison they represent (direction matters for interpretation).
Step 2 — Get gene IDs into the right namespace
Enrichr/MSigDB libraries are keyed by gene symbols (human UPPERCASE, mouse Title-case). If you have Ensembl/Entrez IDs, convert first. See references/databases-and-gene-sets.md for gp.Biomart, g:Profiler g:Convert, and mygene. A silent ID mismatch is the #1 cause of "nothing is significant".
Step 3 — Choose gene-set libraries to match the question
Hallmark (broad themes) → GO:BP (mechanism) → KEGG/Reactome/WikiPathways (curated pathways) → C7 (immune), etc. Don't run 50 libraries; pick 2–4 that fit the biology. Catalog and selection guidance: references/databases-and-gene-sets.md.
Step 4 — Set the background universe (ORA only)
The background must be the genes that could have been detected in your assay (e.g., all expressed/tested genes), not the whole genome. The wrong background inflates significance. Enrichr uses a fixed background; when background matters, use g:Profiler with domain_scope='custom' + your background, or gp.enrich() with an explicit background. Rationale in references/interpretation.md.
Step 5 — Run the analysis
Use the Quick Start patterns or the bundled scripts/run_enrichment.py. For GSEA always set a seed and report permutation_num.
Step 6 — Filter on adjusted p-values
Use Adjusted P-value (ORA, Benjamini–Hochberg) or FDR q-val (GSEA), not raw p-values. Typical cutoff 0.05; also check the overlap/gene count so a "hit" isn't 1 gene out of a 2000-gene set.
Step 7 — Visualize
Dotplots, bar plots, enrichment maps, and GSEA running-score plots are built into gseapy (gp.dotplot, gp.barplot, gp.enrichment_map, gp.gseaplot). See references/gseapy.md.
Step 8 — Reduce redundancy and interpret
GO especially returns many near-duplicate terms. Collapse with an enrichment map (term–term similarity), leading-edge overlap, or parent terms, and report representative terms. Interpretation framework and a publication-table format are in references/interpretation.md.
Helper Script
scripts/run_enrichment.py runs ORA or GSEA end-to-end and writes a results table plus a dotplot, handling the boilerplate (symbol cleanup, dedup, NA removal, rank construction from a DESeq2 table, per-library FDR filtering).
# ORA from a hit list (one gene symbol per line)
python scripts/run_enrichment.py ora \
--genes deg_symbols.txt \
--libraries MSigDB_Hallmark_2020 GO_Biological_Process_2023 KEGG_2021_Human \
--organism human --outdir results/
# Preranked GSEA from a DESeq2 results CSV (auto-builds the rank from `stat`)
python scripts/run_enrichment.py gsea \
--deseq2 deseq2_results.csv \
--libraries MSigDB_Hallmark_2020 GO_Biological_Process_2023 \
--organism human --outdir results/ --seed 123
# Preranked GSEA from an explicit 2-column rank file (gene,score)
python scripts/run_enrichment.py gsea --rnk ranked_genes.csv --outdir results/
Run python scripts/run_enrichment.py --help for all options (background file, FDR cutoff, min/max set size, permutations).
Common Pitfalls
These cause most wrong or irreproducible results:
- Gene-ID / organism mismatch — symbols vs Ensembl, human vs mouse casing. Map IDs and set
organismcorrectly, or matches silently drop to ~zero. - Wrong background (ORA) — using the whole genome instead of the tested/expressed gene set inflates p-values. Set a custom background when it matters.
- Thresholding before GSEA — GSEA needs the full ranked list; only ORA uses a cut list.
- Ranking GSEA by log2FoldChange alone — unstable for low-count genes; prefer
statorsign(LFC) * -log10(p). - Multiple-testing across libraries — FDR is computed within a library; running many libraries multiplies tests. Report per-library FDR and stay conservative.
- Redundant GO terms — don't report 40 variants of the same term; collapse and show representatives.
- Significance ≠ relevance — check the overlap count and gene-set size; tiny sets reach significance trivially.
- List too short/long for ORA — <10 genes is underpowered; >2000 loses specificity (consider GSEA instead).
- No reproducibility metadata — Enrichr/GO libraries are versioned and drift over time. Record library names+date and set a GSEA
seed.
Integration with Other Skills
- Upstream (where genes come from):
pydeseq2(DE genes +statfor GSEA),scanpy(rank_genes_groupsmarkers / scores),depmap/pytdc(screen hits), proteomics skills (pyopenms,matchms). - Databases / IDs:
database-lookup(Reactome, KEGG, STRING, Gene Ontology APIs),gget(gget enrichrquick path,gget infofor ID mapping),bioservices. - Downstream:
scientific-visualization(custom figures),networkx(enrichment-map graphs),scientific-writing/literature-review(interpret and cite),statistical-analysis(multiple-testing details).
Reference Files
Read the relevant file when you need depth:
references/gseapy.md— full gseapy API:enrichr, offlineenrich,prerank,gsea,ssgsea,gsva,Msigdb,Biomart,get_library_name/read_gmt, every plot, result-column meanings, GMT/offline usage, and troubleshooting (rate limits, empty results).references/databases-and-gene-sets.md— GO, KEGG, Reactome, WikiPathways, MSigDB collections, Enrichr library naming, g:Profiler sources, organism handling, gene-ID conversion, library selection by question, and pointers to Reactome/STRING APIs and decoupler activity inference.references/interpretation.md— ORA vs GSEA statistics, background-universe choice, multiple-testing methods (BH vs g:SCS vs Bonferroni), leading-edge genes, redundancy reduction, effect vs significance, a publication-table template, and reproducibility checklist.
Resources
- gseapy docs: https://gseapy.readthedocs.io/ · repo: https://github.com/zqfang/GSEApy
- g:Profiler: https://biit.cs.ut.ee/gprofiler/ · Python client: https://pypi.org/project/gprofiler-official/
- Enrichr: https://maayanlab.cloud/Enrichr/ · MSigDB: https://www.gsea-msigdb.org/gsea/msigdb/
- GSEA method: Subramanian et al. (2005) PNAS, DOI: 10.1073/pnas.0506580102
How to use pathway-enrichment on Cursor
AI-first code editor with Composer
Prerequisites
Before installing skills in Cursor, ensure your development environment meets these requirements:
- ›Cursor installed and configured on your development machine
- ›Node.js version 16.0+ with npm package manager (verify with
node --version) - ›Active project directory or workspace where you want to add pathway-enrichment
Execute installation command
Execute the skills CLI command in your project's root directory to begin installation:
The skills CLI fetches pathway-enrichment from GitHub repository K-Dense-AI/scientific-agent-skills and configures it for Cursor.
Select Cursor when prompted
The CLI will show a list of available agents. Use arrow keys to navigate and space to select Cursor:
Verify installation
Confirm successful installation by checking the skill directory location:
Reload or restart Cursor to activate pathway-enrichment. Access the skill through slash commands (e.g., /pathway-enrichment) or your agent's skill management interface.
Security & Verification Notice
We perform automated surface-level scans (Gen AI Scanner, Socket, Snyk) during installation. These checks detect common vulnerabilities but do not guarantee complete security. Always review skill source code and verify the publisher's reputation before production use.
Skills execute code in your development environment. Always verify the publisher's identity, review recent commits, and test in isolated environments before production deployment.
List & Monetize Your Skill
Submit your Claude Code skill and start earning
Use Cases▌
Task Automation & Efficiency
Automate repetitive workflows and reduce manual effort
Example
Generate reports, summarize documents, draft communications
Save 3-5 hours per week on routine tasks
Knowledge Enhancement
Learn new skills, understand complex topics, get expert guidance
Example
Explain concepts, provide examples, suggest learning resources
Accelerate learning and skill development by 2x
Quality Improvement
Enhance output quality through reviews, suggestions, and refinements
Example
Review drafts, suggest improvements, catch errors
Improve work quality by 30-40% with less effort
Implementation Guide▌
Prerequisites
- ›Claude Desktop or compatible AI client with skill support
- ›Clear understanding of task or problem to solve
- ›Willingness to iterate and refine outputs
Time Estimate
15-45 minutes depending on use case complexity
Installation Steps
- 1.Install skill using provided installation command
- 2.Test with simple use case relevant to your work
- 3.Evaluate output quality and relevance
- 4.Iterate on prompts to improve results
- 5.Integrate into regular workflow if valuable
Common Pitfalls
- ⚠Expecting perfect results without iteration
- ⚠Not providing enough context in prompts
- ⚠Using skill for tasks outside its intended scope
- ⚠Accepting outputs without review and validation
Best Practices▌
✓ Do
- +Start with clear, specific prompts
- +Provide relevant context and constraints
- +Review and refine all outputs before using
- +Iterate to improve output quality
- +Document successful prompt patterns
✗ Don't
- −Don't use without understanding skill limitations
- −Don't skip validation of outputs
- −Don't share sensitive information in prompts
- −Don't expect skill to replace human judgment
💡 Pro Tips
- ★Be specific about desired format and style
- ★Ask for multiple options to choose from
- ★Request explanations to understand reasoning
- ★Combine AI efficiency with human expertise
When to Use This▌
✓ Use When
Use when skill capabilities match your task, clear ROI on time saved, and you can validate outputs. Best for repetitive tasks, learning, and quality improvement.
✗ Avoid When
Avoid when task requires deep expertise you can't validate, involves sensitive decisions, or when learning process is more valuable than speed of completion.
Learning Path▌
- 1Familiarize yourself with skill capabilities and limitations
- 2Start with low-risk, non-critical tasks
- 3Progress to more complex and valuable use cases
- 4Build expertise through regular use and experimentation
Discussion
Product Hunt–style comments (not star reviews)- No comments yet — start the thread.
Ratings
4.7★★★★★53 reviews- ★★★★★Anaya Khan· Dec 28, 2024
Useful defaults in pathway-enrichment — fewer surprises than typical one-off scripts, and it plays nicely with `npx skills` flows.
- ★★★★★Anika Haddad· Dec 4, 2024
Registry listing for pathway-enrichment matched our evaluation — installs cleanly and behaves as described in the markdown.
- ★★★★★Noah Sethi· Nov 23, 2024
pathway-enrichment reduced setup friction for our internal harness; good balance of opinion and flexibility.
- ★★★★★Anaya Haddad· Nov 23, 2024
I recommend pathway-enrichment for anyone iterating fast on agent tooling; clear intent and a small, reviewable surface area.
- ★★★★★Rahul Santra· Nov 19, 2024
Keeps context tight: pathway-enrichment is the kind of skill you can hand to a new teammate without a long onboarding doc.
- ★★★★★Nia Verma· Nov 3, 2024
Keeps context tight: pathway-enrichment is the kind of skill you can hand to a new teammate without a long onboarding doc.
- ★★★★★Anaya Torres· Oct 22, 2024
I recommend pathway-enrichment for anyone iterating fast on agent tooling; clear intent and a small, reviewable surface area.
- ★★★★★Noah Flores· Oct 14, 2024
We added pathway-enrichment from the explainx registry; install was straightforward and the SKILL.md answered most questions upfront.
- ★★★★★Harper Tandon· Oct 14, 2024
Keeps context tight: pathway-enrichment is the kind of skill you can hand to a new teammate without a long onboarding doc.
- ★★★★★Pratham Ware· Oct 10, 2024
I recommend pathway-enrichment for anyone iterating fast on agent tooling; clear intent and a small, reviewable surface area.
showing 1-10 of 53