Multi-Omics Integration
Coordinate and integrate multiple omics datasets for comprehensive systems biology analysis. Orchestrates specialized ToolUniverse skills to perform cross-omics correlation, multi-omics clustering, pathway-level integration, and unified interpretation.
Domain Reasoning
Multi-omics integration asks whether different molecular layers tell a concordant story. If a gene is upregulated in RNA-seq AND its protein is elevated in proteomics, that is concordant evidence of true biological change. Discordance (high mRNA but low protein, or elevated protein without matching mRNA) may indicate post-transcriptional regulation (miRNA silencing, protein degradation, translational control) and is itself a meaningful finding worth reporting. Not every discordance is noise; some are the most interesting biology.
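The concordance reasoning above can be sketched as a per-gene classification. This is a minimal illustration, assuming per-gene log2 fold changes are already available for both layers; the function name `classify_concordance`, the status labels, and the threshold of 1.0 are hypothetical choices and not part of this skill.

```python
import pandas as pd

def classify_concordance(rna_lfc, protein_lfc, threshold=1.0):
    """Label each shared gene by agreement between RNA and protein log2 fold changes.

    threshold is an illustrative cutoff (|log2FC| >= 1), not a recommended value.
    """
    # Align the two layers on gene ID and drop genes missing from either one
    df = pd.concat({"rna": rna_lfc, "protein": protein_lfc}, axis=1).dropna()

    def label(row):
        rna_up, rna_down = row["rna"] >= threshold, row["rna"] <= -threshold
        prot_up, prot_down = row["protein"] >= threshold, row["protein"] <= -threshold
        if (rna_up and prot_up) or (rna_down and prot_down):
            return "concordant"
        if (rna_up and prot_down) or (rna_down and prot_up):
            return "discordant_opposite"
        if rna_up or rna_down:
            return "rna_only"        # candidate post-transcriptional regulation
        if prot_up or prot_down:
            return "protein_only"    # candidate protein-level regulation
        return "unchanged"

    df["status"] = df.apply(label, axis=1)
    return df

# Toy values, made up for illustration only
rna = pd.Series({"GENE_A": 2.1, "GENE_B": 1.8, "GENE_C": 0.1, "GENE_D": -1.5})
protein = pd.Series({"GENE_A": 1.6, "GENE_B": 0.2, "GENE_C": 1.4, "GENE_D": 1.2})
result = classify_concordance(rna, protein)
print(result)
```

Genes labeled `rna_only`, `protein_only`, or `discordant_opposite` are candidates for follow-up rather than automatic exclusions.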
LOOK UP DON'T GUESS
- Expected RNA-protein correlation ranges: compute Spearman r from the actual data; the typical range (0.4-0.6) is a guide, not a guarantee.
- Pathway enrichment results: run ReactomeAnalysis_pathway_enrichment or gseapy on the actual gene lists; never list enriched pathways from memory.
- eQTL associations: query GTEx or eQTL databases for the specific variant and tissue; do not assume regulatory relationships.
- Methylation-expression directionality at specific loci: retrieve experimental data; promoter repression is the canonical model but exceptions exist.
When to Use This Skill
- User has multiple omics datasets (RNA-seq + proteomics, methylation + expression, etc.)
- Cross-omics correlation queries (e.g., "How does methylation affect expression?")
- Multi-omics biomarker discovery or patient subtyping
- Systems biology questions requiring multiple molecular layers
- Precision medicine applications with multi-omics patient data
Workflow Overview
Phase 1: Data Loading & QC
- Load each omics type, format-specific QC, normalize
- Supported: RNA-seq, proteomics, methylation, CNV/SNV, metabolomics
Phase 2: Sample Matching
- Harmonize sample IDs, find common samples, handle missing omics
Phase 3: Feature Mapping
- Map features to common gene-level identifiers
- CpG->gene (promoter), CNV->gene, metabolite->enzyme
Phase 4: Cross-Omics Correlation
- RNA vs Protein (translation efficiency)
- Methylation vs Expression (epigenetic regulation)
- CNV vs Expression (dosage effect)
- eQTL variants vs Expression (genetic regulation)
Phase 5: Multi-Omics Clustering
- MOFA+, NMF, SNF for patient subtyping
Phase 6: Pathway-Level Integration
- Aggregate omics evidence at pathway level
- Score pathway dysregulation with combined evidence
Phase 7: Biomarker Discovery
- Feature selection across omics, multi-omics classification
Phase 8: Integrated Report
- Summary, correlations, clusters, pathways, biomarkers
See: phase_details.md for complete code and implementation details.
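As one illustration of Phase 3, the CpG->gene step could be sketched as averaging promoter probe beta values per gene. This is only a sketch under stated assumptions: the function name `cpg_to_gene` is hypothetical, the probe-to-gene mapping is assumed to come from an array annotation that has already been restricted to promoter probes, and averaging is one possible aggregation among several.

```python
import pandas as pd

def cpg_to_gene(beta, probe_to_gene):
    """Average promoter CpG beta values per gene.

    beta: probes x samples DataFrame of beta values.
    probe_to_gene: Series mapping probe ID -> gene symbol (promoter probes only).
    """
    # Keep only probes that have both data and an annotation
    shared = beta.index.intersection(probe_to_gene.index)
    genes = probe_to_gene.loc[shared]
    # Group probe rows by their gene and take the mean per sample
    return beta.loc[shared].groupby(genes.values).mean()

# Toy values, made up for illustration only
beta = pd.DataFrame(
    {"S1": [0.2, 0.4, 0.9], "S2": [0.1, 0.3, 0.8]},
    index=["cg01", "cg02", "cg03"],
)
mapping = pd.Series({"cg01": "GENE_A", "cg02": "GENE_A", "cg03": "GENE_B"})
gene_meth = cpg_to_gene(beta, mapping)
print(gene_meth)
```

The result is a genes x samples matrix that can be aligned with expression data on gene symbol.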
Supported Data Types
| Omics | Formats | QC Focus |
|---|---|---|
| Transcriptomics | CSV/TSV, HDF5, h5ad | Low-count filter, normalize (TPM/DESeq2), log-transform |
| Proteomics | MaxQuant, Spectronaut, DIA-NN | Missing value imputation, median/quantile normalization |
| Methylation | IDAT, beta matrices | Failed probes, batch correction, cross-reactive filter |
| Genomics | VCF, SEG (CNV) | Variant QC, CNV segmentation |
| Metabolomics | Peak tables | Missing values, normalization |
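Two of the QC steps in the table can be sketched in a few lines. The example below is a hedged illustration, not the skill's implementation: the function names and default thresholds are hypothetical, and counts-per-million is used only as a simple stand-in for TPM or DESeq2 normalization.

```python
import numpy as np
import pandas as pd

def filter_low_counts(counts, min_count=10, min_samples=2):
    """Keep genes with at least min_count reads in at least min_samples samples."""
    keep = (counts >= min_count).sum(axis=1) >= min_samples
    return counts.loc[keep]

def log_cpm(counts):
    """Counts-per-million scaling followed by log2(x + 1).

    CPM is a simplification here; it is not TPM or DESeq2 normalization.
    """
    cpm = counts / counts.sum(axis=0) * 1e6
    return np.log2(cpm + 1)

def median_center(log_intensity):
    """Subtract each sample's median (NaN ignored) from log-scale intensities."""
    return log_intensity - log_intensity.median(axis=0)

# Toy genes x samples count matrix, made up for illustration only
counts = pd.DataFrame(
    {"S1": [100, 0, 50], "S2": [120, 3, 40], "S3": [90, 1, 60]},
    index=["G1", "G2", "G3"],
)
filtered = filter_low_counts(counts)
normalized = log_cpm(filtered)

# Toy log2 protein intensities with one missing value
prot = pd.DataFrame(
    {"S1": [20.0, 22.0, 24.0], "S2": [21.0, 23.0, np.nan]},
    index=["P1", "P2", "P3"],
)
centered = median_center(prot)
```

Missing protein values are left as NaN in this sketch; imputation is a separate decision that depends on why the values are missing.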
Core Operations
Sample Matching
def match_samples_across_omics(omics_data_dict):
    """Match samples across multiple omics datasets."""
    sample_ids = {k: set(df.columns) for k, df in omics_data_dict.items()}
    common_samples = set.intersection(*sample_ids.values())
    matched_data = {k: df[sorted(common_samples)] for k, df in omics_data_dict.items()}
    return sorted(common_samples), matched_data
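Set intersection only works when the same sample carries the same ID in every dataset, so IDs usually need harmonizing first. The sketch below assumes a hypothetical naming convention in which each assay appends a suffix such as `_RNA` or `-PROT`; the function name `harmonize_sample_ids` and the regular expression are illustrative and would need adapting to the real ID scheme.

```python
import re
import pandas as pd

def harmonize_sample_ids(df, suffix_pattern=r"[_\-.](RNA|PROT|METH)$"):
    """Return a copy with column names trimmed, upper-cased, and assay suffixes removed.

    suffix_pattern is a hypothetical convention, not a standard.
    """
    def clean(name):
        name = str(name).strip().upper()
        return re.sub(suffix_pattern, "", name)

    out = df.copy()
    out.columns = [clean(c) for c in df.columns]
    # Two raw IDs collapsing to one harmonized ID needs manual review
    if out.columns.duplicated().any():
        raise ValueError("Sample IDs collide after harmonization")
    return out

# Toy data with inconsistent IDs, made up for illustration only
rna = pd.DataFrame([[1.0, 2.0, 3.0]], index=["G1"],
                   columns=["pt01_rna", " PT02_RNA", "PT03_RNA"])
prot = pd.DataFrame([[4.0, 5.0, 6.0]], index=["G1"],
                    columns=["PT01-PROT", "PT02-PROT", "PT04-PROT"])

rna_h = harmonize_sample_ids(rna)
prot_h = harmonize_sample_ids(prot)
common = sorted(set(rna_h.columns) & set(prot_h.columns))
print(common)
```

After harmonization the datasets can be passed to a sample-matching step that intersects column names.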
Cross-Omics Correlation
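from scipy.stats import spearmanr, pearsonr

# rna and protein are genes x samples DataFrames with matched sample columns
common_genes = rna.index.intersection(protein.index)
for gene in common_genes:
    r, p = spearmanr(rna.loc[gene], protein.loc[gene])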
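A fuller version of the per-gene loop might collect results into a table and correct for multiple testing. The sketch below assumes genes x samples DataFrames; the function names, the `min_samples` default, and the choice of Benjamini-Hochberg correction are assumptions added for illustration rather than something this skill prescribes.

```python
import numpy as np
import pandas as pd
from scipy.stats import spearmanr

def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values (FDR)."""
    p = np.asarray(p_values, dtype=float)
    n = p.size
    order = np.argsort(p)
    scaled = p[order] * n / np.arange(1, n + 1)
    # Enforce monotonicity, working from the largest p-value downwards
    adjusted = np.minimum.accumulate(scaled[::-1])[::-1]
    out = np.empty(n)
    out[order] = np.clip(adjusted, 0.0, 1.0)
    return out

def rna_protein_correlation(rna, protein, min_samples=10):
    """Per-gene Spearman correlation between RNA and protein across shared samples."""
    genes = rna.index.intersection(protein.index)
    samples = rna.columns.intersection(protein.columns)
    rows = []
    for gene in genes:
        x = rna.loc[gene, samples].astype(float)
        y = protein.loc[gene, samples].astype(float)
        ok = x.notna() & y.notna()
        # Skip genes with too few paired values or no variation in either layer
        if ok.sum() < min_samples or x[ok].nunique() < 2 or y[ok].nunique() < 2:
            continue
        r, p = spearmanr(x[ok], y[ok])
        rows.append({"gene": gene, "spearman_r": r, "p_value": p, "n": int(ok.sum())})
    cols = ["gene", "spearman_r", "p_value", "n"]
    res = pd.DataFrame(rows, columns=cols).set_index("gene")
    res["fdr"] = benjamini_hochberg(res["p_value"].to_numpy())
    return res

# Toy data, made up for illustration only
samples = [f"S{i}" for i in range(12)]
rna = pd.DataFrame([np.arange(12.0)] * 3, index=["G1", "G2", "G3"], columns=samples)
protein = pd.DataFrame([np.arange(12.0) * 2, np.arange(12.0)[::-1]],
                       index=["G1", "G2"], columns=samples)
res = rna_protein_correlation(rna, protein)
print(res)
```

Genes with a strongly negative or near-zero correlation are the discordant cases worth examining individually.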
Pathway Integration
pathway_score = mean(abs(rna_fc) + abs(protein_fc) + abs(meth_diff) + abs(cnv))
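One way to read the formula above is: for each gene, sum the absolute effect sizes across layers, then average over the genes in a pathway. The sketch below follows that reading. The function name `pathway_scores` and the decision to treat a missing layer as zero evidence are assumptions, and the layers are summed on their raw scales exactly as the formula is written.

```python
import pandas as pd

def pathway_scores(layers, pathways):
    """Mean per-gene summed absolute evidence for each pathway.

    layers: dict of layer name -> Series of per-gene effect sizes.
    pathways: dict of pathway name -> list of member genes.
    """
    # Missing measurements count as zero evidence (an assumption of this sketch)
    evidence = pd.concat(layers, axis=1).abs().fillna(0.0)
    per_gene = evidence.sum(axis=1)
    scores = {}
    for name, genes in pathways.items():
        members = per_gene.index.intersection(genes)
        if len(members) > 0:  # pathways with no measured genes are skipped
            scores[name] = per_gene.loc[members].mean()
    return pd.Series(scores, dtype=float).sort_values(ascending=False)

# Toy values, made up for illustration only
layers = {
    "rna_fc": pd.Series({"A": 2.0, "B": -1.0, "C": 0.5}),
    "protein_fc": pd.Series({"A": 1.0, "B": -0.5}),
    "meth_diff": pd.Series({"A": -0.3, "C": 0.1}),
    "cnv": pd.Series({"A": 1.0}),
}
pathways = {"P1": ["A", "B"], "P2": ["C", "D"], "P3": ["Z"]}
scores = pathway_scores(layers, pathways)
print(scores)
```

Because fold changes, methylation differences, and copy-number values live on different scales, standardizing each layer before summing may be worth considering.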
See: phase_details.md for full implementations of each operation.
Multi-Omics Clustering Methods
| Method | Description | Best For |
|---|---|---|
| MOFA+ | Latent factors explaining cross-omics variation | Identifying shared/omics-specific drivers |
| Joint NMF | Shared decomposition across omics | Patient subtype discovery |
| SNF | Similarity network fusion | Integrating heterogeneous data types |
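To make the "shared decomposition" idea behind joint NMF concrete, the sketch below factorizes column-concatenated omics matrices with one shared sample factor, using plain multiplicative updates in NumPy. It is a teaching sketch under stated assumptions, not MOFA+ or a production NMF: the function name `joint_nmf` is hypothetical, inputs must be non-negative and arranged samples x features (the transpose of a genes x samples matrix), and assigning each sample to its largest factor is only one simple way to derive subtypes.

```python
import numpy as np

def joint_nmf(matrices, k=2, n_iter=500, seed=0, eps=1e-9):
    """Factorize [X1 | X2 | ...] ~ W @ [H1 | H2 | ...] with a shared sample factor W.

    matrices: list of non-negative arrays, each samples x features, same sample order.
    Returns W (samples x k) and a list of per-omics H matrices (k x features).
    """
    X = np.hstack([np.asarray(m, dtype=float) for m in matrices])
    if (X < 0).any():
        raise ValueError("NMF requires non-negative input")
    rng = np.random.default_rng(seed)
    W = rng.random((X.shape[0], k)) + eps
    H = rng.random((k, X.shape[1])) + eps
    for _ in range(n_iter):
        # Multiplicative updates for the squared-error objective
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    # Split H back into one block per omics layer
    splits = np.cumsum([np.asarray(m).shape[1] for m in matrices])[:-1]
    return W, np.hsplit(H, splits)

# Toy data with two obvious sample groups, made up for illustration only
scale = np.array([1.0, 1.2, 0.8])[:, None]
rna_like = np.vstack([scale * [5, 4, 0, 0], scale * [0, 0, 6, 2]])
prot_like = np.vstack([scale * [3, 0, 0], scale * [0, 4, 5]])

W, Hs = joint_nmf([rna_like, prot_like], k=2)
labels = W.argmax(axis=1)  # assign each sample to its dominant factor
print(labels)
```

In practice each layer is usually scaled so that no single omics type dominates the fit, and the number of factors is chosen by stability or reconstruction error.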
ToolUniverse Skills Coordination
| Skill | Used For | Phase |
|---|---|---|
| tooluniverse-rnaseq-deseq2 | RNA-seq analysis | 1, 4 |
| tooluniverse-epigenomics | Methylation, ChIP-seq | 1, 4 |
| tooluniverse-variant-analysis | CNV/SNV processing | 1, 3, 4 |
| tooluniverse-protein-interactions | Protein network context | 6 |
| tooluniverse-gene-enrichment | Pathway enrichment | 6 |
| tooluniverse-expression-data-retrieval | Public data retrieval | 1 |
| tooluniverse-target-research | Gene/protein annotation | 3, 8 |
Use Cases
Cancer Multi-Omics
Integrate TCGA RNA-seq + proteomics + methylation + CNV to identify patient subtypes, cross-omics driver genes, and multi-omics biomarkers.
eQTL + Expression + Methylation
Identify SNP -> methylation -> expression regulatory chains (mediation analysis).
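The SNP -> methylation -> expression chain can be illustrated with a basic product-of-coefficients mediation estimate. This is a minimal sketch on simulated data, assuming linear effects and no confounders; the function names are hypothetical, and a real analysis would add covariates and a significance test for the indirect effect (for example by bootstrapping).

```python
import numpy as np

def ols_coefs(y, *predictors):
    """Least-squares coefficients for y ~ intercept + predictors (intercept first)."""
    X = np.column_stack([np.ones(len(y)), *predictors])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

def mediation_effects(snp, methylation, expression):
    """Product-of-coefficients mediation estimate for SNP -> methylation -> expression."""
    a = ols_coefs(methylation, snp)[1]           # SNP effect on methylation
    total = ols_coefs(expression, snp)[1]        # total SNP effect on expression
    # Direct SNP effect and methylation effect, each adjusted for the other
    _, direct, b = ols_coefs(expression, snp, methylation)
    return {"a": a, "b": b, "indirect": a * b, "direct": direct, "total": total}

# Simulated data, for illustration only
rng = np.random.default_rng(0)
snp = rng.integers(0, 3, size=200).astype(float)             # genotype dosage 0/1/2
meth = 0.5 - 0.1 * snp + rng.normal(0, 0.05, size=200)       # SNP lowers methylation
expr = 10.0 - 4.0 * meth + rng.normal(0, 0.1, size=200)      # methylation represses expression

effects = mediation_effects(snp, meth, expr)
print(effects)
```

In this simulation the SNP acts on expression only through methylation, so the estimated direct effect should be near zero and the indirect effect should carry nearly all of the total.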
Drug Response Multi-Omics
Predict drug response using baseline multi-omics profiles; identify resistance/sensitivity pathways.
See: phase_details.md "Use Cases" for detailed step-by-step workflows.
Quantified Minimums
| Component | Requirement |
|---|---|
| Omics types | At least 2 datasets |
| Common samples | At least 10 across omics |
| Cross-correlation | Pearson/Spearman computed |
| Clustering | At least one method (MOFA+, NMF, or SNF) |
| Pathway integration | Enrichment with multi-omics evidence scores |
| Report | Summary, correlations, clusters, pathways, biomarkers |
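The first two minimums in the table are easy to check programmatically before any analysis runs. The sketch below is one hypothetical way to do it; the function name `check_minimums` and the message wording are illustrative, and it assumes each dataset is a features x samples DataFrame.

```python
import pandas as pd

def check_minimums(omics_data_dict, min_omics=2, min_common_samples=10):
    """Return a list of problems; an empty list means both minimums are met."""
    problems = []
    if len(omics_data_dict) < min_omics:
        problems.append(
            f"Only {len(omics_data_dict)} omics dataset(s); need at least {min_omics}"
        )
    # Samples present in every dataset (columns are sample IDs)
    sample_sets = [set(df.columns) for df in omics_data_dict.values()]
    common = set.intersection(*sample_sets) if sample_sets else set()
    if len(common) < min_common_samples:
        problems.append(
            f"Only {len(common)} common sample(s); need at least {min_common_samples}"
        )
    return problems

# Toy data, made up for illustration only
rna = pd.DataFrame(0.0, index=["G1"], columns=[f"S{i}" for i in range(12)])
prot = pd.DataFrame(0.0, index=["P1"], columns=[f"S{i}" for i in range(4, 16)])

too_few_samples = check_minimums({"rna": rna, "protein": prot})
too_few_omics = check_minimums({"rna": rna})
passing = check_minimums({"rna": rna, "rna_copy": rna})
print(too_few_samples)
```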
Limitations
- Sample size: n >= 20 matched samples recommended for integration (10 is the minimum listed under Quantified Minimums)
- Missing data: fall back to pairwise integration when not every sample has every omics type
- Batch effects: Different platforms require careful normalization
- Computational: Large datasets may require significant memory
- Interpretation: Results require domain expertise for validation
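For the missing-data case above, a quick way to see which pairwise integrations are feasible is to tabulate the samples shared by each pair of datasets. The sketch below is illustrative only; the function name `pairwise_common_samples` is hypothetical and each dataset is assumed to be a features x samples DataFrame.

```python
from itertools import combinations
import pandas as pd

def pairwise_common_samples(omics_data_dict):
    """Map each pair of omics names to the sorted list of samples they share."""
    pairs = {}
    for a, b in combinations(sorted(omics_data_dict), 2):
        shared = set(omics_data_dict[a].columns) & set(omics_data_dict[b].columns)
        pairs[(a, b)] = sorted(shared)
    return pairs

# Toy data in which no sample set is complete, made up for illustration only
data = {
    "rna": pd.DataFrame(0.0, index=["G1"], columns=["S1", "S2", "S3"]),
    "protein": pd.DataFrame(0.0, index=["P1"], columns=["S2", "S3", "S4"]),
    "methylation": pd.DataFrame(0.0, index=["cg01"], columns=["S3", "S4", "S5"]),
}
overlap = pairwise_common_samples(data)
for pair, shared in overlap.items():
    print(pair, len(shared))
```

Pairs with too few shared samples can then be excluded from correlation analysis while the remaining pairs proceed.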
References
Detailed Reference
- phase_details.md - Complete code for all phases, correlation functions, clustering, pathway integration, biomarker discovery, report template, and detailed use cases