tooluniverse-multiomic-disease-characterization
mims-harvard/tooluniverse · updated Apr 8, 2026
Characterize diseases across multiple molecular layers (genomics, transcriptomics, proteomics, pathways) to provide systems-level understanding of disease mechanisms, identify therapeutic opportunities, and discover biomarker candidates.
Multi-Omics Disease Characterization Pipeline
KEY PRINCIPLES:
- Report-first approach - Create report file FIRST, then populate progressively
- Disease disambiguation FIRST - Resolve all identifiers before omics analysis
- Layer-by-layer analysis - Systematically cover all omics layers
- Cross-layer integration - Identify genes/targets appearing in multiple layers
- Evidence grading - Grade all evidence as T1 (human/clinical) to T4 (computational)
- Tissue context - Emphasize disease-relevant tissues/organs
- Quantitative scoring - Multi-Omics Confidence Score (0-100)
- Druggable focus - Prioritize targets with therapeutic potential
- Biomarker identification - Highlight diagnostic/prognostic markers
- Mechanistic synthesis - Generate testable hypotheses
- Source references - Every statement must cite its source tool/database
- Completeness checklist - Mandatory section showing analysis coverage
- English-first queries - Always use English terms in tool calls; respond in the user's language
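The report-first principle can be sketched as below. This assumes a plain markdown report whose section titles mirror the pipeline phases. The real layout is defined in report-template.md, and `create_report` / `fill_section` are hypothetical helpers, not ToolUniverse functions.

```python
from pathlib import Path

# Assumed section titles mirroring the pipeline phases; report-template.md defines the real ones.
SECTIONS = [
    "Phase 0: Disease Disambiguation",
    "Phase 1: Genomics Layer",
    "Phase 2: Transcriptomics Layer",
    "Phase 3: Proteomics & Interaction Layer",
    "Phase 4: Pathway & Network Layer",
    "Phase 5: Gene Ontology & Functional Annotation",
    "Phase 6: Therapeutic Landscape",
    "Phase 7: Multi-Omics Integration",
    "Phase 8: Report Finalization",
]
PENDING = "_Pending_"


def create_report(path: Path, disease: str) -> Path:
    """Write the report skeleton before any analysis runs."""
    lines = [f"# Multi-Omics Characterization: {disease}", ""]
    for section in SECTIONS:
        lines += [f"## {section}", "", PENDING, ""]
    path.write_text("\n".join(lines), encoding="utf-8")
    return path


def fill_section(path: Path, section: str, body: str) -> None:
    """Replace one section's placeholder as soon as that phase has results."""
    text = path.read_text(encoding="utf-8")
    placeholder = f"## {section}\n\n{PENDING}\n"
    if placeholder not in text:
        raise ValueError(f"section missing or already filled: {section}")
    filled = f"## {section}\n\n{body}\n"
    path.write_text(text.replace(placeholder, filled, 1), encoding="utf-8")
```

Filling sections one at a time means a partial run still leaves a readable report. Every unfinished phase stays visibly marked as pending.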
Multi-omics disease characterization asks: which molecular layers are dysregulated? Genomic variants → transcriptomic changes → proteomic effects → metabolomic consequences. Concordance across layers strengthens a finding; discordance can reveal regulatory complexity.
LOOK UP, DON'T GUESS
When uncertain about any scientific fact, SEARCH databases first rather than reasoning from memory. A database-verified answer is always more reliable than a guess.
COMPUTE, DON'T DESCRIBE
When analysis requires computation (statistics, data processing, scoring, enrichment), write and run Python code via Bash. Don't describe what you would do — execute it and report actual results. Use ToolUniverse tools to retrieve data, then Python (pandas, scipy, statsmodels, matplotlib) to analyze it.
When to Use This Skill
Apply when users:
- Ask about disease mechanisms across omics layers
- Need multi-omics characterization of a disease
- Want to understand disease at the systems biology level
- Ask "What pathways/genes/proteins are involved in [disease]?"
- Need biomarker discovery for a disease
- Want to identify druggable targets from disease profiling
- Ask for integrated genomics + transcriptomics + proteomics analysis
- Need cross-layer concordance analysis
- Ask about disease network biology / hub genes
NOT for (use other skills instead):
- Single gene/target validation -> Use `tooluniverse-drug-target-validation`
- Drug safety profiling -> Use `tooluniverse-adverse-event-detection`
- General disease overview -> Use `tooluniverse-disease-research`
- Variant interpretation -> Use `tooluniverse-variant-interpretation`
- GWAS-specific analysis -> Use `tooluniverse-gwas-*` skills
- Pathway-only analysis -> Use `tooluniverse-systems-biology`
Input Parameters
| Parameter | Required | Description | Example |
|---|---|---|---|
| disease | Yes | Disease name, OMIM ID, EFO ID, or MONDO ID | Alzheimer disease, MONDO_0004975 |
| tissue | No | Tissue/organ of interest | brain, liver, blood |
| focus_layers | No | Specific omics layers to emphasize | genomics, transcriptomics, pathways |
Pipeline Overview
The pipeline runs 9 phases sequentially. Each phase uses specific tools documented in detail in tool-reference.md.
Phase 0: Disease Disambiguation (ALWAYS FIRST)
Resolve disease to standard identifiers (MONDO/EFO) for all downstream queries.
- Primary tool: `OpenTargets_get_disease_id_description_by_name` - Get description, synonyms, therapeutic areas, disease hierarchy, cross-references
- CRITICAL: Disease IDs use underscore format (e.g., `MONDO_0004975`), NOT colon
- If ambiguous, present the top 3-5 options and ask the user to select
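Users often supply an ID with a colon. A minimal sketch of the underscore rule follows. `normalize_disease_id` is a hypothetical helper. It recognizes only MONDO and EFO prefixes and returns `None` for anything else, which signals that the input is probably a disease name to resolve with the OpenTargets tool.

```python
import re
from typing import Optional

# MONDO or EFO ID written with a colon or an underscore, e.g. "MONDO:0004975".
_DISEASE_ID = re.compile(r"^\s*(MONDO|EFO)[:_](\d+)\s*$", re.IGNORECASE)


def normalize_disease_id(raw: str) -> Optional[str]:
    """Return the underscore form OpenTargets expects, or None if `raw` is not a MONDO/EFO ID.

    None usually means `raw` is a disease name that still needs resolving
    with OpenTargets_get_disease_id_description_by_name.
    """
    match = _DISEASE_ID.match(raw)
    if match is None:
        return None
    prefix, number = match.groups()
    return f"{prefix.upper()}_{number}"
```

For example, `normalize_disease_id("MONDO:0004975")` returns `"MONDO_0004975"`.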
Phase 1: Genomics Layer
Identify genetic variants, GWAS associations, and genetically implicated genes.
- Tools: `gwas_search_associations` (use `efo_id` for precision, not free-text `disease_trait`), `gwas_get_snps_for_gene`, ClinVar, OpenTargets associated targets
- `gnomad_get_gene_constraints`: gene constraint metrics (pLI, oe_lof) to interpret whether LoF variants are tolerated vs. haploinsufficient
- Get the top 10-15 genes with genetic evidence scores; track Ensembl IDs for downstream phases
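The Phase 1 ranking step can be sketched as follows. The sketch assumes per-gene evidence has already been flattened into simple records. The `GeneEvidence` fields are hypothetical and do not reflect any tool's response schema. The pLI cutoff of 0.9 is a common convention for calling a gene LoF-intolerant and is used here as an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional

PLI_INTOLERANT = 0.9  # common convention for LoF intolerance; an assumption, not a tool default


@dataclass
class GeneEvidence:
    """Hypothetical flattened record; not the schema returned by any tool."""
    symbol: str
    ensembl_id: str              # keep for downstream phases
    genetic_score: float         # e.g. an OpenTargets genetic association score
    pli: Optional[float] = None  # from gnomad_get_gene_constraints, if available


def top_genetic_genes(genes: List[GeneEvidence], n: int = 15) -> List[GeneEvidence]:
    """Keep the n genes with the strongest genetic evidence (the skill suggests 10-15)."""
    return sorted(genes, key=lambda g: g.genetic_score, reverse=True)[:n]


def lof_interpretation(gene: GeneEvidence) -> str:
    """Label LoF tolerance from pLI; high pLI suggests haploinsufficiency."""
    if gene.pli is None:
        return "no constraint data"
    if gene.pli >= PLI_INTOLERANT:
        return "LoF-intolerant (possible haploinsufficiency)"
    return "LoF-tolerant"
```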
Phase 2: Transcriptomics Layer
Identify differentially expressed genes, tissue-specific expression, and expression-based biomarkers.
- `GTEx_get_expression_summary`: baseline expression across 54 tissues (accepts `gene_symbol` directly)
- Tools: Expression Atlas, HPA (tissue expression), EuropePMC scores
- Check expression in disease-relevant tissues for the top genes from Phase 1
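One simple way to quantify expression in a disease-relevant tissue is a fold ratio against the gene's median expression across all tissues. This sketch assumes the values have already been extracted into a tissue-to-TPM dict; the tissue names and the pseudocount are assumptions. It is not the tissue-specificity metric used by GTEx or HPA.

```python
from statistics import median
from typing import Dict


def tissue_enrichment(tpm_by_tissue: Dict[str, float], tissue: str, pseudocount: float = 1.0) -> float:
    """Ratio of expression in `tissue` to the gene's median across all tissues.

    The pseudocount keeps barely expressed genes from producing huge or undefined ratios.
    """
    if tissue not in tpm_by_tissue:
        raise KeyError(f"tissue not in expression profile: {tissue}")
    baseline = median(tpm_by_tissue.values())
    return (tpm_by_tissue[tissue] + pseudocount) / (baseline + pseudocount)
```

A ratio well above 1 flags genes whose expression is concentrated in the tissue of interest.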
Phase 3: Proteomics & Interaction Layer
Map protein-protein interactions, identify hub genes, and characterize interaction networks.
- `UniProt_get_function_by_accession`: protein function narrative (essential for mechanistic context)
- Tools: `STRING_get_network` (param: `identifiers`, `species=9606`), `intact_get_interactions`, HumanBase
- Build a PPI network from the top 15-20 genes; identify hub genes by degree centrality
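Hub identification by degree centrality (degree divided by n - 1) needs only an edge list. This sketch assumes interaction pairs have already been extracted from the STRING or IntAct output as gene-symbol tuples. Genes with no interactions do not appear in the graph and so do not count toward n.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def degree_centrality(edges: Iterable[Tuple[str, str]]) -> Dict[str, float]:
    """Degree / (n - 1) on an undirected graph; duplicate and reversed pairs count once."""
    neighbors: Dict[str, Set[str]] = defaultdict(set)
    for a, b in edges:
        if a == b:
            continue  # ignore self-loops
        neighbors[a].add(b)
        neighbors[b].add(a)
    n = len(neighbors)
    if n < 2:
        return {gene: 0.0 for gene in neighbors}
    return {gene: len(nbrs) / (n - 1) for gene, nbrs in neighbors.items()}


def hub_genes(edges: Iterable[Tuple[str, str]], top: int = 5) -> List[str]:
    """Genes with the highest degree centrality; ties broken alphabetically."""
    scores = degree_centrality(edges)
    return sorted(scores, key=lambda g: (-scores[g], g))[:top]
```

For example, `hub_genes([("APOE", "APP"), ("APP", "PSEN1"), ("APP", "MAPT")], top=1)` returns `["APP"]`.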
Phase 4: Pathway & Network Layer
Identify enriched biological pathways and cross-pathway connections.
- `ReactomeAnalysis_pathway_enrichment`: identifiers are newline-separated (`\n`), NOT space-separated
- `enrichr_gene_enrichment_analysis`: params `gene_list` (array), `libs` (array). NOTE: the `data` field is a JSON string that needs parsing
- `kegg_search_pathway`: pathway keyword search
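The two formatting pitfalls above can be handled by small helpers. The only assumption about the Enrichr result is that it is a dict with a `data` key. The structure of the decoded payload is not specified here; response-formats.md covers it.

```python
import json
from typing import Any, Dict, List


def reactome_identifiers(genes: List[str]) -> str:
    """Join gene identifiers with newlines, as ReactomeAnalysis_pathway_enrichment expects."""
    return "\n".join(g.strip() for g in genes if g.strip())


def parse_enrichr_data(response: Dict[str, Any]) -> Any:
    """Decode the `data` field of an enrichr_gene_enrichment_analysis result.

    The field arrives as a JSON string; if it is already decoded, return it unchanged.
    """
    data = response["data"]
    return json.loads(data) if isinstance(data, str) else data
```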
Phase 5: Gene Ontology & Functional Annotation
Characterize biological processes, molecular functions, and cellular components.
- Tools: Enrichr (GO libraries), QuickGO, GO annotations, OpenTargets GO
- Run GO enrichment for all 3 aspects (BP, MF, CC)
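Sometimes Enrichr results need a sanity check, or the analysis needs a custom background gene set. In those cases, the one-sided hypergeometric test and Benjamini-Hochberg correction can be computed directly. This stdlib-only sketch uses `math.comb`; `scipy.stats.hypergeom.sf(k - 1, M, n, N)` gives the same tail probability. Values may not match Enrichr exactly, because its background and correction choices may differ.

```python
from math import comb
from typing import List


def hypergeom_enrichment_p(k: int, M: int, n: int, N: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(M background genes, n annotated to the term, N query genes).

    k is the number of query genes annotated to the GO term.
    """
    upper = min(n, N)
    hits = sum(comb(n, i) * comb(M - n, N - i) for i in range(k, upper + 1))
    return hits / comb(M, N)


def benjamini_hochberg(pvals: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

The correction can be applied across the terms from all three aspects together or within each aspect separately. The report should say which was done.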
Phase 6: Therapeutic Landscape
Map approved drugs, druggable targets, repurposing opportunities, and clinical trials.
- `DGIdb_get_drug_gene_interactions`: drug interactions by gene (param: `genes` as array). Often more comprehensive than OpenTargets for drug-gene data.
- OpenTargets drugs/tractability: use EFO IDs such as `EFO_0000384` for Crohn's disease, not MONDO (MONDO IDs may return null for drug queries)
- `search_clinical_trials`: `query_term` is REQUIRED
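Selecting the EFO identifier for drug queries can be automated. This sketch assumes the candidate IDs come from the Phase 0 cross-references. Their exact format is an assumption, so both colon and underscore forms are accepted. `pick_efo_id` is a hypothetical helper.

```python
from typing import Iterable, Optional


def pick_efo_id(candidate_ids: Iterable[str]) -> Optional[str]:
    """Return the first EFO ID (underscore form) for OpenTargets drug queries, else None.

    MONDO IDs are skipped because they may return null for drug/tractability queries.
    """
    for raw in candidate_ids:
        cid = raw.strip().replace(":", "_")
        if cid.upper().startswith("EFO_"):
            return "EFO_" + cid[4:]
    return None
```

A `None` result means no EFO cross-reference was found. Passing the MONDO ID instead may then yield empty drug results.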
Phase 7: Multi-Omics Integration
Integrate findings across all layers. See integration-scoring.md for full details.
- Cross-layer gene concordance: count layers per gene, score multi-layer hub genes
- Direction concordance: genetics + expression agreement
- Biomarker identification: diagnostic, prognostic, predictive
- Mechanistic hypothesis generation
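The layer-count step of cross-layer concordance can be sketched as follows, assuming each phase produced a plain list of gene symbols. The sketch counts layers only. The weighting and the 0-100 score are defined in integration-scoring.md and are not reproduced here.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def layer_counts(layers: Dict[str, Iterable[str]]) -> Counter:
    """Count in how many omics layers each gene appears (at most once per layer)."""
    counts: Counter = Counter()
    for genes in layers.values():
        counts.update(set(genes))
    return counts


def multi_layer_genes(layers: Dict[str, Iterable[str]], min_layers: int = 2) -> List[Tuple[str, int]]:
    """Genes supported by at least `min_layers` layers, most-supported first."""
    counts = layer_counts(layers)
    hits = [(gene, c) for gene, c in counts.items() if c >= min_layers]
    return sorted(hits, key=lambda item: (-item[1], item[0]))
```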
Phase 8: Report Finalization
Write executive summary, calculate confidence score, verify completeness.
- See `integration-scoring.md` for the quality checklist and scoring formula
Key Tool Parameter Notes
These are the most common parameter pitfalls:
- `OpenTargets` disease IDs: underscore format (`MONDO_0004975`), NOT colon
- `STRING` `protein_ids`: must be an array (`['APOE']`), not a string
- `enrichr` `libs`: must be an array (`['KEGG_2021_Human']`)
- `HPA_get_rna_expression_by_source`: ALL 3 params required (`gene_name`, `source_type`, `source_name`)
- `humanbase_ppi_analysis`: ALL params required (`gene_list`, `tissue`, `max_node`, `interaction`, `string_mode`)
- `expression_atlas_disease_target_score`: `pageSize` is REQUIRED
- `search_clinical_trials`: `query_term` is REQUIRED even if `condition` is provided
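These rules can be enforced before each call with a small pre-flight check. The tables transcribe only the tools whose full names appear in this document. `check_args` is a hypothetical helper, not part of ToolUniverse, and it knows nothing about any other validation the tools perform.

```python
import re
from typing import Any, Dict, List

# Required parameters transcribed from the notes above.
REQUIRED_PARAMS: Dict[str, List[str]] = {
    "HPA_get_rna_expression_by_source": ["gene_name", "source_type", "source_name"],
    "humanbase_ppi_analysis": ["gene_list", "tissue", "max_node", "interaction", "string_mode"],
    "expression_atlas_disease_target_score": ["pageSize"],
    "search_clinical_trials": ["query_term"],
}

# Parameters that must be passed as lists, not strings.
ARRAY_PARAMS: Dict[str, List[str]] = {
    "enrichr_gene_enrichment_analysis": ["gene_list", "libs"],
    "DGIdb_get_drug_gene_interactions": ["genes"],
}

# Colon-form disease IDs; OpenTargets tools expect the underscore form.
_COLON_ID = re.compile(r"^(MONDO|EFO):\d+$")


def check_args(tool: str, args: Dict[str, Any]) -> List[str]:
    """Return the problems found in `args`; an empty list means no known pitfall."""
    problems = [
        f"missing required param: {p}"
        for p in REQUIRED_PARAMS.get(tool, [])
        if p not in args
    ]
    for p in ARRAY_PARAMS.get(tool, []):
        if p in args and not isinstance(args[p], list):
            problems.append(f"{p} must be an array, not {type(args[p]).__name__}")
    if tool.startswith("OpenTargets"):
        for value in args.values():
            if isinstance(value, str) and _COLON_ID.match(value):
                problems.append(f"disease ID {value!r} should use underscore format")
    return problems
```

For example, `check_args("search_clinical_trials", {"condition": "Crohn disease"})` reports the missing `query_term`.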
For full tool parameters and per-phase workflows, see tool-reference.md.
Reference Files
All detailed content is in reference files in this directory:
| File | Contents |
|---|---|
| `tool-reference.md` | Full tool parameters, inputs/outputs, per-phase workflows, quick reference table |
| `report-template.md` | Complete report markdown template with all sections and checklists |
| `integration-scoring.md` | Confidence score formula (0-100), evidence grading (T1-T4), integration procedures, quality checklist |
| `response-formats.md` | Verified JSON response structures for key tools |
| `use-patterns.md` | Common use patterns, edge case handling, fallback strategies |