tooluniverse-disease-research

mims-harvard/tooluniverse · updated Apr 8, 2026

$ npx skills add https://github.com/mims-harvard/tooluniverse --skill tooluniverse-disease-research
summary

Generate a comprehensive disease research report with full source citations. The report is created as a markdown file and progressively updated during research.

skill.md

ToolUniverse Disease Research

IMPORTANT: Always use English disease names and search terms in tool calls. Respond in the user's language.


LOOK UP, DON'T GUESS

When asked about a disease, query Orphanet/OMIM/DisGeNET FIRST. Don't rely on memory for prevalence, genetics, or treatment — these change over time. When you're not sure about a fact, your first instinct should be to SEARCH for it using tools, not to reason harder from memory.


When to Use

  • User asks about any disease, syndrome, or medical condition
  • Needs comprehensive disease intelligence or a detailed research report
  • Asks "what do we know about [disease]?"

Core Workflow: Report-First Approach

DO NOT show the search process to the user. Instead:

  1. Create report file first - Initialize {disease_name}_research_report.md
  2. Research each dimension - Use all relevant tools
  3. Update report progressively - Write findings after each dimension
  4. Include citations - Every fact must reference its source tool
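
A minimal sketch of step 1, assuming the report is a local markdown file. The function name `init_report`, the lowercase underscore slug for `{disease_name}`, and the `(to be filled)` placeholder under every heading are illustrative assumptions, not part of the skill itself; the section titles are copied from the report template in this document.

```python
from datetime import date
from pathlib import Path

# Section titles copied from the report template in this document.
SECTION_TITLES = [
    "Executive Summary",
    "1. Disease Identity & Classification",
    "2. Clinical Presentation",
    "3. Genetic & Molecular Basis",
    "4. Treatment Landscape",
    "5. Biological Pathways & Mechanisms",
    "6. Epidemiology & Risk Factors",
    "7. Literature & Research Activity",
    "8. Similar Diseases & Comorbidities",
    "9. Cancer-Specific Information (if applicable)",
    "10. Drug Safety & Adverse Events",
    "References",
]
PLACEHOLDER = "(to be filled)"  # assumed placeholder text


def init_report(disease_name: str, out_dir: str = ".") -> Path:
    """Create the report skeleton before any research is done (step 1)."""
    # Assumption: the file name uses a lowercase, underscore-separated slug.
    slug = "_".join(disease_name.lower().split())
    path = Path(out_dir) / f"{slug}_research_report.md"
    lines = [
        f"# Disease Research Report: {disease_name}",
        "",
        f"**Report Generated**: {date.today().isoformat()}",
        f"**Disease Identifiers**: {PLACEHOLDER}",
        "",
    ]
    for title in SECTION_TITLES:
        lines += ["---", "", f"## {title}", PLACEHOLDER, ""]
    path.write_text("\n".join(lines), encoding="utf-8")
    return path
```

Creating the skeleton first means a partially completed run still leaves a readable file, with the unfinished sections clearly marked.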

Disease Mechanism Reasoning

When synthesizing disease etiology, trace the full pathogenic cascade:

  1. Genetic basis - Which variants (rare or common) confer risk, and in which genes?
  2. Molecular mechanism - How do those variants alter protein function, expression, or regulation?
  3. Cellular effect - What downstream cellular processes are disrupted (signaling, metabolism, stress response)?
  4. Tissue/organ manifestation - How does cellular dysfunction present as organ-level pathology?

This chain structures the Genetic & Molecular Basis (Section 3) and Biological Pathways (Section 5) sections.


10 Research Dimensions

| Dim | Section | Key Tools |
|---|---|---|
| 1 | Identity & Classification | OSL_get_efo_id_by_disease_name, ols_search_efo_terms, ols_get_efo_term, umls_search_concepts, icd_search_codes, snomed_search_concepts |
| 2 | Clinical Presentation | OpenTargets phenotypes, HPO lookup, MedlinePlus |
| 3 | Genetic & Molecular Basis | OpenTargets targets, ClinVar variants, GWAS associations, gnomAD |
| 4 | Treatment Landscape | OpenTargets drugs, clinical trials, GtoPdb |
| 5 | Biological Pathways | Reactome pathways, humanbase_ppi_analysis, GTEx expression, HPA |
| 6 | Epidemiology & Literature | PubMed, OpenAlex, Europe PMC, Semantic Scholar |
| 7 | Similar Diseases | OpenTargets similar entities |
| 8 | Cancer-Specific (if applicable) | CIViC genes/variants/therapies |
| 9 | Pharmacology | GtoPdb targets/interactions/ligands |
| 10 | Drug Safety | OpenTargets warnings, clinical trial AEs, FAERS |

See: tool_usage_details.md for complete tool calls per section.


Report Template

Create this file structure at the start:

# Disease Research Report: {Disease Name}

**Report Generated**: {date}
**Disease Identifiers**: (to be filled)

---

## Executive Summary
(Brief 3-5 sentence overview - fill after all research complete)

---

## 1. Disease Identity & Classification
### Ontology Identifiers
| System | ID | Source |

### Synonyms & Alternative Names
### Disease Hierarchy

---

## 2. Clinical Presentation
### Phenotypes (HPO)
| HPO ID | Phenotype | Description | Source |

### Symptoms & Signs
### Diagnostic Criteria

---

## 3. Genetic & Molecular Basis
### Associated Genes
| Gene | Score | Ensembl ID | Evidence | Source |

### GWAS Associations
| SNP | P-value | Odds Ratio | Study | Source |

### Pathogenic Variants (ClinVar)

---

## 4. Treatment Landscape
### Approved Drugs
| Drug | ChEMBL ID | Mechanism | Phase | Target | Source |

### Clinical Trials
| NCT ID | Title | Phase | Status | Source |

---

## 5. Biological Pathways & Mechanisms

## 6. Epidemiology & Risk Factors

## 7. Literature & Research Activity

## 8. Similar Diseases & Comorbidities

## 9. Cancer-Specific Information (if applicable)

## 10. Drug Safety & Adverse Events

---

## References
### Tools Used
| # | Tool | Parameters | Section | Items Retrieved |

Citation Format

Every piece of data MUST include its source:

  • In tables: Add a Source column with tool name
  • In lists: - Finding [Source: tool_name]
  • In prose: (Source: tool_name, query: "...")
  • References section: Complete tool usage log with parameters
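
The four citation forms can be produced by small string helpers. This is a sketch only: the helper names are hypothetical, and the `parameters` dictionary (for example a `query` key) is an assumed shape for logged tool arguments rather than a documented ToolUniverse structure.

```python
from typing import Mapping, Sequence


def cite_table_row(cells: Sequence[str], tool_name: str) -> str:
    """Markdown table row whose last column is the source tool."""
    return "| " + " | ".join([*cells, tool_name]) + " |"


def cite_list_item(finding: str, tool_name: str) -> str:
    """List entry in the form: - Finding [Source: tool_name]."""
    return f"- {finding} [Source: {tool_name}]"


def cite_prose(tool_name: str, query: str) -> str:
    """Inline citation in the form: (Source: tool_name, query: "...")."""
    return f'(Source: {tool_name}, query: "{query}")'


def reference_row(index: int, tool_name: str, parameters: Mapping[str, object],
                  section: str, items_retrieved: int) -> str:
    """Row for the 'Tools Used' table in the References section."""
    params = ", ".join(f"{key}={value!r}" for key, value in parameters.items())
    return f"| {index} | {tool_name} | {params} | {section} | {items_retrieved} |"
```

Routing every written fact through one of these helpers makes it harder to emit a data point without a source.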


Progressive Update Pattern

# After each dimension's research:
# 1. Read current report
# 2. Replace placeholder with formatted content
# 3. Write back immediately
# 4. Continue to next dimension
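
One way to implement this read, replace, write loop is sketched below. It assumes each unfinished section holds the literal placeholder `(to be filled)` on the line directly under its heading; `update_section` and that placeholder convention are illustrative assumptions.

```python
from pathlib import Path

PLACEHOLDER = "(to be filled)"  # assumed placeholder text


def update_section(report_path: str, heading: str, content: str) -> None:
    """Fill one section of the report and persist it immediately."""
    path = Path(report_path)
    text = path.read_text(encoding="utf-8")                   # 1. read current report
    marker = f"{heading}\n{PLACEHOLDER}"
    if marker not in text:
        raise ValueError(f"No placeholder found under {heading!r}")
    text = text.replace(marker, f"{heading}\n{content}", 1)   # 2. replace placeholder
    path.write_text(text, encoding="utf-8")                   # 3. write back immediately
    # 4. the caller then continues with the next dimension
```

Raising when the placeholder is missing guards against silently overwriting a section that was already filled.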

Evidence Grading & Interpretation

Every finding in the report should be graded:

| Grade | Criteria | Example |
|---|---|---|
| T1 (Strong) | Replicated genetic evidence (GWAS, rare variants), FDA-approved therapy | BRCA1 → breast cancer; trastuzumab for HER2+ |
| T2 (Moderate) | Single genetic study, phase II+ trial data, strong biological evidence | FOXO3 → longevity (centenarian studies) |
| T3 (Association) | Observational data, gene expression changes, pathway membership | IL-6 elevated in Alzheimer's CSF |
| T4 (Computational) | Network proximity, text mining, predicted associations | DisGeNET text-mined gene-disease link |
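
The grading table can be applied mechanically once each finding is tagged with the kinds of evidence behind it. The sketch below assumes a hypothetical set of evidence-type tags (they are not ToolUniverse field names) and returns the strongest grade any tag supports.

```python
from enum import IntEnum
from typing import Iterable


class Grade(IntEnum):
    T1 = 1  # Strong
    T2 = 2  # Moderate
    T3 = 3  # Association
    T4 = 4  # Computational


# Hypothetical evidence-type tags, grouped by the criteria in the table above.
EVIDENCE_GRADES = {
    "replicated_genetic": Grade.T1,
    "fda_approved_therapy": Grade.T1,
    "single_genetic_study": Grade.T2,
    "phase2_plus_trial": Grade.T2,
    "strong_biological": Grade.T2,
    "observational": Grade.T3,
    "expression_change": Grade.T3,
    "pathway_membership": Grade.T3,
    "network_proximity": Grade.T4,
    "text_mining": Grade.T4,
    "predicted": Grade.T4,
}


def grade_finding(evidence_types: Iterable[str]) -> Grade:
    """Return the strongest grade supported by any recognised evidence type."""
    grades = [EVIDENCE_GRADES[t] for t in evidence_types if t in EVIDENCE_GRADES]
    if not grades:
        raise ValueError("no recognised evidence type")
    return min(grades)  # lower number means stronger evidence
```

Taking the strongest supporting grade is a design choice; a stricter policy could require two independent evidence types before assigning T1.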

Synthesis Questions (answer in Executive Summary)

After collecting data from all 10 dimensions, the report MUST answer:

  1. What causes this disease? Summarize the genetic architecture (monogenic vs polygenic, key loci, penetrance)
  2. What are the therapeutic options? Ranked by evidence level and approval status
  3. What biomarkers exist? For diagnosis, prognosis, and treatment selection
  4. What's the unmet need? What aspects lack effective treatment or understanding?
  5. What are the active research frontiers? Based on clinical trials and recent publications

Interpreting Cross-Database Concordance

When multiple databases provide different data for the same disease:

  • OpenTargets + DisGeNET + OMIM agree on a gene: T1 evidence — high confidence
  • Only OpenTargets reports an association: Check the datasource scores — genetic_association > literature > animal_model
  • DisGeNET score > 0.5 but not in OpenTargets: May be text-mined; verify with PubMed
  • Gene in GWAS but not OMIM: Likely a complex disease susceptibility locus, not Mendelian
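
These rules can be encoded as a small function that returns interpretation notes for one gene. This is a sketch under assumptions: the boolean flags and `disgenet_score` argument are a hypothetical summary of what the database lookups returned, with `None` meaning the gene is absent from DisGeNET.

```python
from typing import List, Optional


def interpret_concordance(in_opentargets: bool, in_omim: bool, in_gwas: bool,
                          disgenet_score: Optional[float] = None) -> List[str]:
    """Apply the cross-database concordance rules to a single gene."""
    in_disgenet = disgenet_score is not None
    notes: List[str] = []
    if in_opentargets and in_disgenet and in_omim:
        notes.append("T1: OpenTargets, DisGeNET and OMIM agree; high confidence")
    elif in_opentargets and not (in_disgenet or in_omim):
        notes.append("OpenTargets only: check datasource scores "
                     "(genetic_association > literature > animal_model)")
    if in_disgenet and disgenet_score > 0.5 and not in_opentargets:
        notes.append("DisGeNET score > 0.5 without OpenTargets support: "
                     "may be text-mined; verify with PubMed")
    if in_gwas and not in_omim:
        notes.append("GWAS hit absent from OMIM: likely a complex disease "
                     "susceptibility locus, not Mendelian")
    return notes
```

An empty list means none of the four rules applied, so the finding should be graded on its individual evidence instead.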

Handling Conflicting Data

| Conflict | Resolution |
|---|---|
| Different prevalence estimates across sources | Report range; note the most recent/largest study |
| Drug approved in one country but not another | Note regulatory status per region |
| Gene-disease association in one DB but absent in another | Grade by evidence type; text-mining alone is T4 |
| Clinical trial results contradict label indications | The trial result is newer evidence; note both |
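
As an illustration of the first rule, the sketch below turns several prevalence estimates into one cited range. The `PrevalenceEstimate` fields and the per-100,000 unit are assumptions made for the example, and the source names would be whatever tools actually returned the figures.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class PrevalenceEstimate:
    per_100k: float   # assumed unit: cases per 100,000
    year: int         # publication or data year
    sample_size: int  # study population size
    source: str       # tool that returned the estimate


def summarize_prevalence(estimates: Sequence[PrevalenceEstimate]) -> str:
    """Report the range and flag the most recent and the largest study."""
    if not estimates:
        raise ValueError("at least one estimate is required")
    low = min(e.per_100k for e in estimates)
    high = max(e.per_100k for e in estimates)
    newest = max(estimates, key=lambda e: e.year)
    largest = max(estimates, key=lambda e: e.sample_size)
    return (
        f"{low:g}-{high:g} per 100,000 "
        f"(most recent: {newest.per_100k:g}, {newest.year} [Source: {newest.source}]; "
        f"largest: {largest.per_100k:g}, n={largest.sample_size} [Source: {largest.source}])"
    )
```

Keeping both the newest and the largest study visible lets the reader judge which estimate to trust rather than hiding the disagreement.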

Final Report Quality Checklist

  • All 10 sections have content (or are marked "No data available")
  • Every data point has a source citation
  • Executive summary reflects key findings
  • References section lists all tools used
  • Tables properly formatted
  • No placeholder text remains
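
Part of this checklist can be automated. The sketch below covers only the structural items (numbered sections present and non-empty, Executive Summary and References present, no leftover placeholder text); the placeholder strings are assumptions taken from the report template, and citation coverage and table formatting still need a manual pass.

```python
import re
from typing import List

# Placeholder fragments assumed from the report template.
PLACEHOLDERS = ("(to be filled)", "(Brief 3-5 sentence overview")


def check_report(markdown: str) -> List[str]:
    """Return a list of checklist problems; an empty list means the checks passed."""
    problems: List[str] = []
    for n in range(1, 11):
        heading = re.search(rf"^## {n}\. .*$", markdown, flags=re.MULTILINE)
        if heading is None:
            problems.append(f"Section {n} is missing")
            continue
        # The section body runs until the next level-2 heading or end of file.
        rest = markdown[heading.end():]
        nxt = re.search(r"^## ", rest, flags=re.MULTILINE)
        body = rest[:nxt.start()] if nxt else rest
        if not body.replace("---", "").strip():
            problems.append(f"Section {n} has no content")
    for required in ("## Executive Summary", "## References"):
        if required not in markdown:
            problems.append(f"{required} is missing")
    for placeholder in PLACEHOLDERS:
        if placeholder in markdown:
            problems.append(f"Placeholder text remains: {placeholder}")
    return problems
```

A section containing only "No data available" counts as filled, matching the first checklist item.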

Expected Output Scale

For a well-studied disease (e.g., Alzheimer's), the final report should include:

  • 5+ ontology IDs, 10+ synonyms, disease hierarchy
  • 20+ phenotypes with HPO IDs
  • 50+ genes, 30+ GWAS associations, 100+ ClinVar variants
  • 20+ drugs, 50+ clinical trials
  • 10+ pathways, PPI network, expression data
  • 100+ publications
  • 15+ similar diseases
  • Drug warnings and adverse events

Total: 500+ individual data points, each with source citation.
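
The targets above can be compared against actual counts to spot thin sections. In this sketch the dictionary keys are hypothetical labels for each category; the minimums are the figures listed above, which apply to well-studied diseases, so shortfalls for a rare disease are expected rather than errors.

```python
from typing import Dict, Mapping, Tuple

# Minimum counts taken from the list above; key names are illustrative.
EXPECTED_MINIMUMS = {
    "ontology_ids": 5,
    "synonyms": 10,
    "phenotypes": 20,
    "genes": 50,
    "gwas_associations": 30,
    "clinvar_variants": 100,
    "drugs": 20,
    "clinical_trials": 50,
    "pathways": 10,
    "publications": 100,
    "similar_diseases": 15,
}


def scale_shortfalls(counts: Mapping[str, int]) -> Dict[str, Tuple[int, int]]:
    """Map each under-target category to (actual count, expected minimum)."""
    return {
        key: (counts.get(key, 0), minimum)
        for key, minimum in EXPECTED_MINIMUMS.items()
        if counts.get(key, 0) < minimum
    }
```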


Cross-Skill References

For rare disease differential diagnosis, run: python3 skills/tooluniverse-rare-disease-diagnosis/scripts/clinical_patterns.py --type differential --symptoms 'symptom1,symptom2'


Reference Files