scvi-tools Deep Learning Skill
This skill provides guidance for deep learning-based single-cell analysis with scvi-tools, a widely used framework for probabilistic models in single-cell genomics.
How to Use This Skill
- Identify the appropriate workflow from the model/workflow tables below
- Read the corresponding reference file for detailed steps and code
- Use scripts in scripts/ to avoid rewriting common code
- For installation or GPU issues, consult references/environment_setup.md
- For debugging, consult references/troubleshooting.md
When to Use This Skill
- When scvi-tools, scVI, scANVI, or related models are mentioned
- When deep learning-based batch correction or integration is needed
- When working with multi-modal data (CITE-seq, multiome)
- When reference mapping or label transfer is required
- When analyzing ATAC-seq or spatial transcriptomics data
- When learning latent representations of single-cell data
Model Selection Guide
| Data Type | Model | Primary Use Case |
| --- | --- | --- |
| scRNA-seq | scVI | Unsupervised integration, DE, imputation |
| scRNA-seq + labels | scANVI | Label transfer, semi-supervised integration |
| CITE-seq (RNA+protein) | totalVI | Multi-modal integration, protein denoising |
| scATAC-seq | PeakVI | Chromatin accessibility analysis |
| Multiome (RNA+ATAC) | MultiVI | Joint modality analysis |
| Spatial + scRNA reference | DestVI | Cell type deconvolution |
| RNA velocity | veloVI | Transcriptional dynamics |
| Cross-technology | sysVI | System-level batch correction |
Workflow Reference Files
| Workflow | Reference File | Description |
| --- | --- | --- |
| Environment Setup | references/environment_setup.md | Installation, GPU, version info |
| Data Preparation | references/data_preparation.md | Formatting data for any model |
| scRNA Integration | references/scrna_integration.md | scVI/scANVI batch correction |
| ATAC-seq Analysis | references/atac_peakvi.md | PeakVI for accessibility |
| CITE-seq Analysis | references/citeseq_totalvi.md | totalVI for protein+RNA |
| Multiome Analysis | references/multiome_multivi.md | MultiVI for RNA+ATAC |
| Spatial Deconvolution | references/spatial_deconvolution.md | DestVI spatial analysis |
| Label Transfer | references/label_transfer.md | scANVI reference mapping |
| scArches Mapping | references/scarches_mapping.md | Query-to-reference mapping |
| Batch Correction | references/batch_correction_sysvi.md | Advanced batch methods |
| RNA Velocity | references/rna_velocity_velovi.md | veloVI dynamics |
| Troubleshooting | references/troubleshooting.md | Common issues and solutions |
CLI Scripts
Modular scripts for common workflows. Chain together or modify as needed.
Pipeline Scripts
| Script | Purpose | Usage |
| --- | --- | --- |
| prepare_data.py | QC, filter, HVG selection | python scripts/prepare_data.py raw.h5ad prepared.h5ad --batch-key batch |
| train_model.py | Train any scvi-tools model | python scripts/train_model.py prepared.h5ad results/ --model scvi |
| cluster_embed.py | Neighbors, UMAP, Leiden | python scripts/cluster_embed.py adata.h5ad results/ |
| differential_expression.py | DE analysis | python scripts/differential_expression.py model/ adata.h5ad de.csv --groupby leiden |
| transfer_labels.py | Label transfer with scANVI | python scripts/transfer_labels.py ref_model/ query.h5ad results/ |
| integrate_datasets.py | Multi-dataset integration | python scripts/integrate_datasets.py results/ data1.h5ad data2.h5ad |
| validate_adata.py | Check data compatibility | python scripts/validate_adata.py data.h5ad --batch-key batch |
Example Workflow
python scripts/validate_adata.py raw.h5ad --batch-key batch --suggest
python scripts/prepare_data.py raw.h5ad prepared.h5ad --batch-key batch --n-hvgs 2000
python scripts/train_model.py prepared.h5ad results/ --model scvi --batch-key batch
python scripts/cluster_embed.py results/adata_trained.h5ad results/ --resolution 0.8
python scripts/differential_expression.py results/model results/adata_clustered.h5ad results/de.csv --groupby leiden
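The same five steps can be chained from Python. The sketch below only assembles the argument lists shown above and runs them in order with `subprocess.run`. The helper names `build_pipeline` and `run_pipeline` are hypothetical and not part of the skill's scripts, and the sketch assumes the scripts write the intermediate files named in the example (`adata_trained.h5ad`, `adata_clustered.h5ad`, `model`).

```python
import subprocess
import sys
from pathlib import Path


def build_pipeline(raw, out_dir, prepared="prepared.h5ad",
                   batch_key="batch", n_hvgs=2000, resolution=0.8):
    """Return the example workflow as a list of argv lists (nothing is executed)."""
    out = Path(out_dir)
    py = sys.executable  # run the scripts with the current interpreter
    trained = (out / "adata_trained.h5ad").as_posix()      # assumed output name
    clustered = (out / "adata_clustered.h5ad").as_posix()  # assumed output name
    return [
        [py, "scripts/validate_adata.py", raw, "--batch-key", batch_key, "--suggest"],
        [py, "scripts/prepare_data.py", raw, prepared,
         "--batch-key", batch_key, "--n-hvgs", str(n_hvgs)],
        [py, "scripts/train_model.py", prepared, out.as_posix(),
         "--model", "scvi", "--batch-key", batch_key],
        [py, "scripts/cluster_embed.py", trained, out.as_posix(),
         "--resolution", str(resolution)],
        [py, "scripts/differential_expression.py", (out / "model").as_posix(),
         clustered, (out / "de.csv").as_posix(), "--groupby", "leiden"],
    ]


def run_pipeline(raw, out_dir, **kwargs):
    """Run each step in order and stop at the first failing command."""
    for cmd in build_pipeline(raw, out_dir, **kwargs):
        subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
```

Calling `run_pipeline("raw.h5ad", "results")` from the skill's root directory would execute the steps. Separating command construction from execution makes it easy to print or inspect the commands first.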
Python Utilities
The scripts/model_utils.py module provides importable functions for custom workflows:
| Function | Purpose |
| --- | --- |
| prepare_adata() | Data preparation (QC, HVG, layer setup) |
| train_scvi() | Train scVI or scANVI |
| evaluate_integration() | Compute integration metrics |
| get_marker_genes() | Extract DE markers |
| save_results() | Save model, data, plots |
| auto_select_model() | Suggest best model |
| quick_clustering() | Neighbors + UMAP + Leiden |
Critical Requirements
- Raw counts required: scvi-tools models require integer count data
    adata.layers["counts"] = adata.X.copy()
    scvi.model.SCVI.setup_anndata(adata, layer="counts")
- HVG selection: Use 2000-4000 highly variable genes
    sc.pp.highly_variable_genes(adata, n_top_genes=2000, batch_key="batch", layer="counts", flavor="seurat_v3")
    adata = adata[:, adata.var['highly_variable']].copy()
- Batch information: Specify batch_key for integration
    scvi.model.SCVI.setup_anndata(adata, layer="counts", batch_key="batch")
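The three requirements above can be combined into one short routine. This is a minimal sketch, not the skill's own `prepare_adata()` or `train_scvi()`: the helper names `check_integer_counts` and `fit_scvi` are hypothetical. It assumes that `adata.X` still holds raw counts, that `adata.obs` has a batch column, and that scanpy and scvi-tools are installed (the `seurat_v3` flavor additionally needs scikit-misc). The heavy imports are placed inside the functions so the module can be loaded without them.

```python
def check_integer_counts(matrix):
    """Return True if all stored values are non-negative whole numbers."""
    import numpy as np

    # SciPy sparse matrices expose stored values via .data; dense input is used directly.
    values = matrix.data if hasattr(matrix, "tocsr") else matrix
    values = np.asarray(values)
    if values.size == 0:
        return True
    return bool(np.all(values >= 0) and np.all(np.mod(values, 1) == 0))


def fit_scvi(adata, batch_key="batch", n_top_genes=2000):
    """Store counts, select HVGs, register the batch key, and train scVI."""
    import scanpy as sc
    import scvi

    if not check_integer_counts(adata.X):
        raise ValueError("adata.X does not look like raw integer counts")

    # 1. Keep the raw counts in a dedicated layer.
    adata.layers["counts"] = adata.X.copy()

    # 2. Batch-aware HVG selection on the counts layer, then subset.
    sc.pp.highly_variable_genes(
        adata, n_top_genes=n_top_genes, batch_key=batch_key,
        layer="counts", flavor="seurat_v3",
    )
    adata = adata[:, adata.var["highly_variable"]].copy()

    # 3. Register the counts layer and batch key, then train with default settings.
    scvi.model.SCVI.setup_anndata(adata, layer="counts", batch_key=batch_key)
    model = scvi.model.SCVI(adata)
    model.train()

    # Store the latent representation for downstream neighbors/UMAP.
    adata.obsm["X_scVI"] = model.get_latent_representation()
    return model, adata
```

Running the integer check before training catches the common mistake of passing normalized or log-transformed values. The `X_scVI` key is a naming choice made here for illustration.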
Quick Decision Tree
Need to integrate scRNA-seq data?
├── Have cell type labels? → scANVI (references/label_transfer.md)
└── No labels? → scVI (references/scrna_integration.md)
Have multi-modal data?
├── CITE-seq (RNA + protein)? → totalVI (references/citeseq_totalvi.md)
├── Multiome (RNA + ATAC)? → MultiVI (references/multiome_multivi.md)
└── scATAC-seq only? → PeakVI (references/atac_peakvi.md)
Have spatial data?
└── Need cell type deconvolution? → DestVI (references/spatial_deconvolution.md)
Have pre-trained reference model?
└── Map query to reference? → scArches (references/scarches_mapping.md)
Need RNA velocity?
└── veloVI (references/rna_velocity_velovi.md)
Strong cross-technology batch effects?
└── sysVI (references/batch_correction_sysvi.md)
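The decision tree can be expressed as a small lookup function. The sketch below is illustrative only: `suggest_model` and the `Suggestion` tuple are hypothetical names and are not the `auto_select_model()` helper from scripts/model_utils.py. The order in which the questions are checked (reference model first, then modality, then cross-technology effects, then labels) is an assumption, since the tree lists the questions independently.

```python
from typing import NamedTuple


class Suggestion(NamedTuple):
    model: str
    reference: str


# Modalities that map to a single model regardless of the other answers.
_BY_MODALITY = {
    "citeseq": Suggestion("totalVI", "references/citeseq_totalvi.md"),
    "multiome": Suggestion("MultiVI", "references/multiome_multivi.md"),
    "atac": Suggestion("PeakVI", "references/atac_peakvi.md"),
    "spatial": Suggestion("DestVI", "references/spatial_deconvolution.md"),
    "velocity": Suggestion("veloVI", "references/rna_velocity_velovi.md"),
}


def suggest_model(modality, *, has_labels=False, has_reference_model=False,
                  cross_technology=False):
    """Map answers to the decision-tree questions onto a model and reference file.

    modality: 'scrna', 'citeseq', 'multiome', 'atac', 'spatial', or 'velocity'.
    """
    if has_reference_model:
        return Suggestion("scArches", "references/scarches_mapping.md")
    if modality in _BY_MODALITY:
        return _BY_MODALITY[modality]
    if modality == "scrna":
        if cross_technology:
            return Suggestion("sysVI", "references/batch_correction_sysvi.md")
        if has_labels:
            return Suggestion("scANVI", "references/label_transfer.md")
        return Suggestion("scVI", "references/scrna_integration.md")
    raise ValueError(f"Unknown modality: {modality!r}")
```

For example, `suggest_model("scrna", has_labels=True)` yields the scANVI entry together with the path of the reference file to read next.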
Key Resources