tooluniverse-disease-research
mims-harvard/tooluniverse · updated Apr 8, 2026
Generate a comprehensive disease research report with full source citations. The report is created as a markdown file and progressively updated during research.
ToolUniverse Disease Research
IMPORTANT: Always use English disease names and search terms in tool calls. Respond in the user's language.
LOOK UP, DON'T GUESS
When asked about a disease, query Orphanet/OMIM/DisGeNET FIRST. Don't rely on memory for prevalence, genetics, or treatment — these change over time. When you're not sure about a fact, your first instinct should be to SEARCH for it using tools, not to reason harder from memory.
When to Use
- User asks about any disease, syndrome, or medical condition
- User needs comprehensive disease intelligence or a detailed research report
- User asks "what do we know about [disease]?"
Core Workflow: Report-First Approach
DO NOT show the search process to the user. Instead:
- Create report file first - Initialize {disease_name}_research_report.md
- Research each dimension - Use all relevant tools
- Update report progressively - Write findings after each dimension
- Include citations - Every fact must reference its source tool
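The first step of this workflow could be sketched in Python roughly as follows. This is a minimal sketch, not the skill's own code: `report_path`, `render_skeleton`, and `init_report` are hypothetical helper names, the space-to-underscore file naming is an assumption, and the section titles are copied from the Report Template in this document.

```python
from datetime import date
from pathlib import Path

PLACEHOLDER = "(to be filled)"

# Section titles copied from the Report Template in this document.
SECTIONS = [
    "1. Disease Identity & Classification",
    "2. Clinical Presentation",
    "3. Genetic & Molecular Basis",
    "4. Treatment Landscape",
    "5. Biological Pathways & Mechanisms",
    "6. Epidemiology & Risk Factors",
    "7. Literature & Research Activity",
    "8. Similar Diseases & Comorbidities",
    "9. Cancer-Specific Information (if applicable)",
    "10. Drug Safety & Adverse Events",
]


def report_path(disease_name, out_dir="."):
    # Assumption: whitespace in the disease name becomes underscores.
    slug = "_".join(disease_name.split())
    return Path(out_dir) / f"{slug}_research_report.md"


def render_skeleton(disease_name, generated=None):
    """Build the empty report: one placeholder per section."""
    generated = generated or date.today().isoformat()
    lines = [
        f"# Disease Research Report: {disease_name}",
        f"**Report Generated**: {generated}",
        f"**Disease Identifiers**: {PLACEHOLDER}",
        "---",
        "## Executive Summary",
        PLACEHOLDER,
        "---",
    ]
    for title in SECTIONS:
        lines += [f"## {title}", PLACEHOLDER, "---"]
    lines += [
        "## References",
        "### Tools Used",
        "| # | Tool | Parameters | Section | Items Retrieved |",
        "|---|---|---|---|---|",
    ]
    return "\n".join(lines) + "\n"


def init_report(disease_name, out_dir="."):
    """Create the report file before any research starts."""
    path = report_path(disease_name, out_dir)
    path.write_text(render_skeleton(disease_name), encoding="utf-8")
    return path
```

Keeping `render_skeleton` separate from the file write makes the skeleton easy to inspect before anything touches disk.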
Disease Mechanism Reasoning
When synthesizing disease etiology, trace the full pathogenic cascade:
- Genetic basis - Which variants (rare or common) confer risk, and in which genes?
- Molecular mechanism - How do those variants alter protein function, expression, or regulation?
- Cellular effect - What downstream cellular processes are disrupted (signaling, metabolism, stress response)?
- Tissue/organ manifestation - How does cellular dysfunction present as organ-level pathology?
This chain structures Section 3 (Genetic & Molecular Basis) and Section 5 (Biological Pathways) of the report.
10 Research Dimensions
| Dim | Section | Key Tools |
|---|---|---|
| 1 | Identity & Classification | OSL_get_efo_id_by_disease_name, ols_search_efo_terms, ols_get_efo_term, umls_search_concepts, icd_search_codes, snomed_search_concepts |
| 2 | Clinical Presentation | OpenTargets phenotypes, HPO lookup, MedlinePlus |
| 3 | Genetic & Molecular Basis | OpenTargets targets, ClinVar variants, GWAS associations, gnomAD |
| 4 | Treatment Landscape | OpenTargets drugs, clinical trials, GtoPdb |
| 5 | Biological Pathways | Reactome pathways, humanbase_ppi_analysis, GTEx expression, HPA |
| 6 | Epidemiology & Literature | PubMed, OpenAlex, Europe PMC, Semantic Scholar |
| 7 | Similar Diseases | OpenTargets similar entities |
| 8 | Cancer-Specific (if applicable) | CIViC genes/variants/therapies |
| 9 | Pharmacology | GtoPdb targets/interactions/ligands |
| 10 | Drug Safety | OpenTargets warnings, clinical trial AEs, FAERS |
See: tool_usage_details.md for complete tool calls per section.
Report Template
Create this file structure at the start:
# Disease Research Report: {Disease Name}
**Report Generated**: {date}
**Disease Identifiers**: (to be filled)
---
## Executive Summary
(Brief 3-5 sentence overview - fill after all research complete)
---
## 1. Disease Identity & Classification
### Ontology Identifiers
| System | ID | Source |
### Synonyms & Alternative Names
### Disease Hierarchy
---
## 2. Clinical Presentation
### Phenotypes (HPO)
| HPO ID | Phenotype | Description | Source |
### Symptoms & Signs
### Diagnostic Criteria
---
## 3. Genetic & Molecular Basis
### Associated Genes
| Gene | Score | Ensembl ID | Evidence | Source |
### GWAS Associations
| SNP | P-value | Odds Ratio | Study | Source |
### Pathogenic Variants (ClinVar)
---
## 4. Treatment Landscape
### Approved Drugs
| Drug | ChEMBL ID | Mechanism | Phase | Target | Source |
### Clinical Trials
| NCT ID | Title | Phase | Status | Source |
---
## 5. Biological Pathways & Mechanisms
## 6. Epidemiology & Risk Factors
## 7. Literature & Research Activity
## 8. Similar Diseases & Comorbidities
## 9. Cancer-Specific Information (if applicable)
## 10. Drug Safety & Adverse Events
---
## References
### Tools Used
| # | Tool | Parameters | Section | Items Retrieved |
Citation Format
Every piece of data MUST include its source:
- In tables: Add a Source column with the tool name
- In lists: - Finding [Source: tool_name]
- In prose: (Source: tool_name, query: "...")
- References section: Complete tool usage log with parameters
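The three inline citation formats can be produced by small helpers so they stay consistent across sections. The sketch below is illustrative only: the function names are hypothetical, and cell values are assumed to contain no pipe characters.

```python
def cite_list_item(finding, tool_name):
    """List format: - Finding [Source: tool_name]"""
    return f"- {finding} [Source: {tool_name}]"


def cite_prose(tool_name, query):
    """Prose format: (Source: tool_name, query: "...")"""
    return f'(Source: {tool_name}, query: "{query}")'


def table_with_source(headers, rows, tool_name):
    """Render a markdown table, appending a Source column to every row."""
    head = list(headers) + ["Source"]
    out = ["| " + " | ".join(head) + " |", "|" + "---|" * len(head)]
    for row in rows:
        cells = [str(cell) for cell in row] + [tool_name]
        out.append("| " + " | ".join(cells) + " |")
    return "\n".join(out)
```

For example, `cite_prose("ols_search_efo_terms", "example disease")` yields `(Source: ols_search_efo_terms, query: "example disease")`.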
Progressive Update Pattern
# After each dimension's research:
# 1. Read current report
# 2. Replace placeholder with formatted content
# 3. Write back immediately
# 4. Continue to next dimension
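One way to implement this read, replace, write-back loop is sketched below. It assumes each unfinished section holds a literal `(to be filled)` line under its `## ` heading, as in the Disease Identifiers line of the template; `fill_section` and `update_report` are hypothetical names, not part of ToolUniverse.

```python
from pathlib import Path

PLACEHOLDER = "(to be filled)"


def fill_section(text, heading, content, placeholder=PLACEHOLDER):
    """Return text with the first placeholder under `heading` replaced.

    Raises ValueError if the heading or its placeholder is missing.
    """
    lines = text.splitlines()
    start = lines.index(heading)  # ValueError if the heading is absent
    for i in range(start + 1, len(lines)):
        if lines[i].startswith("## "):
            break  # reached the next top-level section
        if lines[i].strip() == placeholder:
            lines[i] = content
            return "\n".join(lines) + "\n"
    raise ValueError(f"no placeholder under {heading!r}")


def update_report(path, heading, content):
    path = Path(path)
    text = path.read_text(encoding="utf-8")         # 1. read current report
    updated = fill_section(text, heading, content)  # 2. replace placeholder
    path.write_text(updated, encoding="utf-8")      # 3. write back immediately
    return path                                     # 4. caller moves on
```

Raising when no placeholder is found makes an accidental second write to the same section visible instead of silently doing nothing.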
Evidence Grading & Interpretation
Every finding in the report should be graded:
| Grade | Criteria | Example |
|---|---|---|
| T1 (Strong) | Replicated genetic evidence (GWAS, rare variants), FDA-approved therapy | BRCA1 → breast cancer; trastuzumab for HER2+ |
| T2 (Moderate) | Single genetic study, phase II+ trial data, strong biological evidence | FOXO3 → longevity (centenarian studies) |
| T3 (Association) | Observational data, gene expression changes, pathway membership | IL-6 elevated in Alzheimer's CSF |
| T4 (Computational) | Network proximity, text mining, predicted associations | DisGeNET text-mined gene-disease link |
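The grading table can be read as a lookup that returns the strongest tier any piece of evidence qualifies for. In the sketch below the evidence-type labels (`replicated_genetic`, `text_mining`, and so on) are invented shorthand for the criteria column, not fields returned by any tool.

```python
from typing import Iterable, Optional

# Tiers in descending strength; labels are shorthand for the criteria above.
GRADE_RULES = [
    ("T1", {"replicated_genetic", "fda_approved_therapy"}),
    ("T2", {"single_genetic_study", "phase2_plus_trial", "strong_biological"}),
    ("T3", {"observational", "expression_change", "pathway_membership"}),
    ("T4", {"network_proximity", "text_mining", "predicted_association"}),
]


def grade_evidence(evidence_types: Iterable[str]) -> Optional[str]:
    """Return the strongest grade supported, or None if nothing matches."""
    present = set(evidence_types)
    for grade, kinds in GRADE_RULES:
        if present & kinds:
            return grade
    return None
```

A finding backed by both text mining and replicated genetic evidence grades as T1, because the rules are checked from strongest to weakest.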
Synthesis Questions (answer in Executive Summary)
After collecting data from all 10 dimensions, the report MUST answer:
- What causes this disease? Summarize the genetic architecture (monogenic vs polygenic, key loci, penetrance)
- What are the therapeutic options? Ranked by evidence level and approval status
- What biomarkers exist? For diagnosis, prognosis, and treatment selection
- What's the unmet need? What aspects lack effective treatment or understanding?
- What are the active research frontiers? Based on clinical trials and recent publications
Interpreting Cross-Database Concordance
When multiple databases provide different data for the same disease:
- OpenTargets + DisGeNET + OMIM agree on a gene: T1 evidence — high confidence
- Only OpenTargets reports an association: Check the datasource scores — genetic_association > literature > animal_model
- DisGeNET score > 0.5 but not in OpenTargets: May be text-mined; verify with PubMed
- Gene in GWAS but not OMIM: Likely a complex disease susceptibility locus, not Mendelian
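These rules can overlap (a gene may be OpenTargets-only and also a GWAS hit absent from OMIM), so a sketch that collects every applicable note is safer than a single if/else chain. The function names and boolean inputs below are hypothetical, and `strongest_datasource` is one possible reading of the datasource ranking, taking the highest-ranked source with a non-zero score.

```python
# Ranking stated above: genetic_association > literature > animal_model.
DATASOURCE_RANK = ["genetic_association", "literature", "animal_model"]


def concordance_notes(in_opentargets, in_disgenet, in_omim,
                      in_gwas=False, disgenet_score=None):
    """Return every interpretation rule that applies to one gene."""
    notes = []
    if in_opentargets and in_disgenet and in_omim:
        notes.append("T1: OpenTargets, DisGeNET and OMIM agree")
    if in_opentargets and not (in_disgenet or in_omim):
        notes.append("OpenTargets only: check datasource scores")
    if disgenet_score is not None and disgenet_score > 0.5 and not in_opentargets:
        notes.append("DisGeNET only: may be text-mined, verify with PubMed")
    if in_gwas and not in_omim:
        notes.append("GWAS without OMIM: likely complex susceptibility locus")
    return notes


def strongest_datasource(scores):
    """Assumed reading: highest-ranked datasource with a non-zero score."""
    for name in DATASOURCE_RANK:
        if scores.get(name, 0) > 0:
            return name
    return None
```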
Handling Conflicting Data
| Conflict | Resolution |
|---|---|
| Different prevalence estimates across sources | Report range; note the most recent/largest study |
| Drug approved in one country but not another | Note regulatory status per region |
| Gene-disease association in one DB but absent in another | Grade by evidence type; text-mining alone is T4 |
| Clinical trial results contradict label indications | The trial result is newer evidence; note both |
Final Report Quality Checklist
- All 10 sections have content (or marked "No data available")
- Every data point has a source citation
- Executive summary reflects key findings
- References section lists all tools used
- Tables properly formatted
- No placeholder text remains
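Several checklist items are mechanical and can be verified before the report is delivered. The sketch below covers four of them; it assumes numbered `## N.` section headings and a `(to be filled)` placeholder, and treats a table as well formed when every row has as many pipe characters as its header. Citation coverage still needs a manual read.

```python
import re


def check_report(text, placeholder="(to be filled)"):
    """Return a list of checklist problems; an empty list means none found."""
    problems = []
    found = {int(n) for n in re.findall(r"^## (\d+)\.", text, flags=re.MULTILINE)}
    missing = [n for n in range(1, 11) if n not in found]
    if missing:
        problems.append(f"missing sections: {missing}")
    if placeholder in text:
        problems.append("placeholder text remains")
    if "## Executive Summary" not in text:
        problems.append("no executive summary")
    if "## References" not in text:
        problems.append("no references section")
    # Consecutive lines starting with "|" form one table.
    width = None
    for number, line in enumerate(text.splitlines(), start=1):
        if line.startswith("|"):
            pipes = line.count("|")
            if width is None:
                width = pipes
            elif pipes != width:
                problems.append(f"line {number}: table row width differs from header")
        else:
            width = None
    return problems
```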
Expected Output Scale
For a well-studied disease (e.g., Alzheimer's), the final report should include:
- 5+ ontology IDs, 10+ synonyms, disease hierarchy
- 20+ phenotypes with HPO IDs
- 50+ genes, 30+ GWAS associations, 100+ ClinVar variants
- 20+ drugs, 50+ clinical trials
- 10+ pathways, PPI network, expression data
- 100+ publications
- 15+ similar diseases
- Drug warnings and adverse events
Total: 500+ individual data points, each with source citation.
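As a rough self-check, the minimums above can be compared against what a finished report actually contains. The key names in this sketch are invented labels for the list items, and the thresholds only make sense for well-studied diseases; a rare disease will legitimately fall short.

```python
# Minimum counts taken from the list above (well-studied diseases only).
EXPECTED_MINIMUMS = {
    "ontology_ids": 5,
    "synonyms": 10,
    "phenotypes": 20,
    "genes": 50,
    "gwas_associations": 30,
    "clinvar_variants": 100,
    "drugs": 20,
    "clinical_trials": 50,
    "pathways": 10,
    "publications": 100,
    "similar_diseases": 15,
}


def shortfalls(counts):
    """Map each under-filled category to (actual, expected minimum)."""
    return {
        key: (counts.get(key, 0), minimum)
        for key, minimum in EXPECTED_MINIMUMS.items()
        if counts.get(key, 0) < minimum
    }
```

A non-empty result is a prompt to re-run the relevant dimension's tools or to note the gap in the report, not an error in itself.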
Cross-Skill References
For rare disease differential diagnosis, run: python3 skills/tooluniverse-rare-disease-diagnosis/scripts/clinical_patterns.py --type differential --symptoms 'symptom1,symptom2'
Reference Files
- REPORT_TEMPLATE.md - Full report markdown template and citation format guide
- RESEARCH_PROTOCOL.md - Step-by-step code procedures, progressive update pattern, quality checklist
- tool_usage_details.md - Complete tool calls for each research dimension
- TOOLS_REFERENCE.md - Complete tool documentation
- EXAMPLES.md - Sample disease research reports