ToolUniverse Disease Research
Generate a comprehensive disease research report with full source citations. The report is created as a markdown file and progressively updated during research.
IMPORTANT: Always use English disease names and search terms in tool calls. Respond in the user's language.
LOOK UP, DON'T GUESS
When asked about a disease, query Orphanet/OMIM/DisGeNET FIRST. Don't rely on memory for prevalence, genetics, or treatment β these change over time. When you're not sure about a fact, your first instinct should be to SEARCH for it using tools, not to reason harder from memory.
When to Use
- User asks about any disease, syndrome, or medical condition
- Needs comprehensive disease intelligence or a detailed research report
- Asks "what do we know about [disease]?"
Core Workflow: Report-First Approach
DO NOT show the search process to the user. Instead:
- Create report file first - Initialize
{disease_name}_research_report.md
- Research each dimension - Use all relevant tools
- Update report progressively - Write findings after each dimension
- Include citations - Every fact must reference its source tool
Disease Mechanism Reasoning
When synthesizing disease etiology, trace the full pathogenic cascade:
- Genetic basis - Which variants (rare or common) confer risk, and in which genes?
- Molecular mechanism - How do those variants alter protein function, expression, or regulation?
- Cellular effect - What downstream cellular processes are disrupted (signaling, metabolism, stress response)?
- Tissue/organ manifestation - How does cellular dysfunction present as organ-level pathology?
This chain structures the Genetic & Molecular Basis (Section 3) and Biological Pathways (Section 5) sections.
10 Research Dimensions
| Dim |
Section |
Key Tools |
| 1 |
Identity & Classification |
OSL_get_efo_id_by_disease_name, ols_search_efo_terms, ols_get_efo_term, umls_search_concepts, icd_search_codes, snomed_search_concepts |
| 2 |
Clinical Presentation |
OpenTargets phenotypes, HPO lookup, MedlinePlus |
| 3 |
Genetic & Molecular Basis |
OpenTargets targets, ClinVar variants, GWAS associations, gnomAD |
| 4 |
Treatment Landscape |
OpenTargets drugs, clinical trials, GtoPdb |
| 5 |
Biological Pathways |
Reactome pathways, humanbase_ppi_analysis, GTEx expression, HPA |
| 6 |
Epidemiology & Literature |
PubMed, OpenAlex, Europe PMC, Semantic Scholar |
| 7 |
Similar Diseases |
OpenTargets similar entities |
| 8 |
Cancer-Specific (if applicable) |
CIViC genes/variants/therapies |
| 9 |
Pharmacology |
GtoPdb targets/interactions/ligands |
| 10 |
Drug Safety |
OpenTargets warnings, clinical trial AEs, FAERS |
See: tool_usage_details.md for complete tool calls per section.
Report Template
Create this file structure at the start:
# Disease Research Report: {Disease Name}
**Report Generated**: {date}
**Disease Identifiers**: (to be filled)
---
## Executive Summary
(Brief 3-5 sentence overview - fill after all research complete)
---
## 1. Disease Identity & Classification
### Ontology Identifiers
| System | ID | Source |
### Synonyms & Alternative Names
### Disease Hierarchy
---
## 2. Clinical Presentation
### Phenotypes (HPO)
| HPO ID | Phenotype | Description | Source |
### Symptoms & Signs
### Diagnostic Criteria
---
## 3. Genetic & Molecular Basis
### Associated Genes
| Gene | Score | Ensembl ID | Evidence | Source |
### GWAS Associations
| SNP | P-value | Odds Ratio | Study | Source |
### Pathogenic Variants (ClinVar)
---
## 4. Treatment Landscape
### Approved Drugs
| Drug | ChEMBL ID | Mechanism | Phase | Target | Source |
### Clinical Trials
| NCT ID | Title | Phase | Status | Source |
---
## 5. Biological Pathways & Mechanisms
## 6. Epidemiology & Risk Factors
## 7. Literature & Research Activity
## 8. Similar Diseases & Comorbidities
## 9. Cancer-Specific Information (if applicable)
## 10. Drug Safety & Adverse Events
---
## References
### Tools Used
| # | Tool | Parameters | Section | Items Retrieved |
Citation Format
Every piece of data MUST include its source:
In tables: Add a Source column with tool name
In lists: - Finding [Source: tool_name]
In prose: (Source: tool_name, query: "...")
References section: Complete tool usage log with parameters
Progressive Update Pattern
Evidence Grading & Interpretation
Every finding in the report should be graded:
| Grade |
Criteria |
Example |
| T1 (Strong) |
Replicated genetic evidence (GWAS, rare variants), FDA-approved therapy |
BRCA1 β breast cancer; trastuzumab for HER2+ |
| T2 (Moderate) |
Single genetic study, phase II+ trial data, strong biological evidence |
FOXO3 β longevity (centenarian studies) |
| T3 (Association) |
Observational data, gene expression changes, pathway membership |
IL-6 elevated in Alzheimer's CSF |
| T4 (Computational) |
Network proximity, text mining, predicted associations |
DisGeNET text-mined gene-disease link |
Synthesis Questions (answer in Executive Summary)
After collecting data from all 10 dimensions, the report MUST answer:
- What causes this disease? Summarize the genetic architecture (monogenic vs polygenic, key loci, penetrance)
- What are the therapeutic options? Ranked by evidence level and approval status
- What biomarkers exist? For diagnosis, prognosis, and treatment selection
- What's the unmet need? What aspects lack effective treatment or understanding?
- What are the active research frontiers? Based on clinical trials and recent publications
Interpreting Cross-Database Concordance
When multiple databases provide different data for the same disease:
- OpenTargets + DisGeNET + OMIM agree on a gene: T1 evidence β high confidence
- Only OpenTargets reports an association: Check the datasource scores β genetic_association > literature > animal_model
- DisGeNET score > 0.5 but not in OpenTargets: May be text-mined; verify with PubMed
- Gene in GWAS but not OMIM: Likely a complex disease susceptibility locus, not Mendelian
Handling Conflicting Data
| Conflict |
Resolution |
| Different prevalence estimates across sources |
Report range; note the most recent/largest study |
| Drug approved in one country but not another |
Note regulatory status per region |
| Gene-disease association in one DB but absent in another |
Grade by evidence type; text-mining alone is T4 |
| Clinical trial results contradict label indications |
The trial result is newer evidence; note both |
Final Report Quality Checklist
Expected Output Scale
For a well-studied disease (e.g., Alzheimer's), the final report should include:
- 5+ ontology IDs, 10+ synonyms, disease hierarchy
- 20+ phenotypes with HPO IDs
- 50+ genes, 30+ GWAS associations, 100+ ClinVar variants
- 20+ drugs, 50+ clinical trials
- 10+ pathways, PPI network, expression data
- 100+ publications
- 15+ similar diseases
- Drug warnings and adverse events
Total: 500+ individual data points, each with source citation.
Cross-Skill References
For rare disease differential diagnosis, run: python3 skills/tooluniverse-rare-disease-diagnosis/scripts/clinical_patterns.py --type differential --symptoms 'symptom1,symptom2'
Reference Files