tooluniverse-immune-repertoire-analysis
mims-harvard/tooluniverse · updated Apr 8, 2026
Comprehensive skill for analyzing T-cell receptor (TCR) and B-cell receptor (BCR) repertoire sequencing data to characterize adaptive immune responses, clonal expansion, and antigen specificity.
ToolUniverse Immune Repertoire Analysis
Domain Reasoning
Repertoire diversity reflects immune history. High clonality — a few clones dominating — indicates antigen-driven expansion, as seen in active infection, tumor-infiltrating lymphocytes, or chronic stimulation. Low diversity points to immunodeficiency or treatment-induced lymphopenia. Always compare observed metrics against healthy donor reference distributions before drawing conclusions; a Shannon entropy of 7 is unremarkable in a healthy adult but alarming post-chemotherapy.
LOOK UP DON'T GUESS
- Clonotype frequency thresholds, CDR3 length ranges, and convergence ratios: query IEDB or VDJdb; do not assume values from memory.
- Epitope specificities for expanded clones: search `iedb_search_tcell_assays` and `BVBRC_search_epitopes`; never infer antigen identity from CDR3 alone.
- V gene family usage biases in healthy donors: retrieve published reference data or query ImmPort; do not assume baseline distributions are uniform.
- Sequencing depth adequacy: compute rarefaction curves from the actual data; do not guess whether depth is sufficient.
Overview
Adaptive immune receptor repertoire sequencing (AIRR-seq) enables comprehensive profiling of T-cell and B-cell populations through high-throughput sequencing of TCR and BCR variable regions. This skill provides an 8-phase workflow for:
- Clonotype identification and tracking
- Diversity and clonality assessment
- V(D)J gene usage analysis
- CDR3 sequence characterization
- Clonal expansion and convergence detection
- Epitope specificity prediction
- Integration with single-cell phenotyping
- Longitudinal repertoire tracking
Core Workflow
Phase 1: Data Import & Clonotype Definition
Load AIRR-seq data from common formats (MiXCR, immunoSEQ, AIRR standard, 10x Genomics VDJ). Standardize columns to: cloneId, count, frequency, cdr3aa, cdr3nt, v_gene, j_gene, chain. Define clonotypes using one of three methods:
- cdr3aa: Amino acid CDR3 sequence only
- cdr3nt: Nucleotide CDR3 sequence
- vj_cdr3: V gene + J gene + CDR3aa (most common, recommended)
Aggregate by clonotype, sort by count, assign ranks.
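A minimal pandas sketch of Phase 1, assuming the input is already in the AIRR Rearrangement TSV layout (fields `clone_id`, `duplicate_count`, `cdr3_aa`, `cdr3`, `v_call`, `j_call`, `locus`). The mapping dictionary, the function names `standardize_airr` and `define_clonotypes`, and the choice to keep only the first V/J call and drop allele suffixes are illustrative assumptions, not the skill's own implementation.

```python
import pandas as pd

# Assumed mapping from AIRR Rearrangement fields to this skill's standard
# column names. MiXCR, immunoSEQ and 10x VDJ exports need their own mappings.
AIRR_TO_STANDARD = {
    "clone_id": "cloneId",
    "duplicate_count": "count",
    "cdr3_aa": "cdr3aa",
    "cdr3": "cdr3nt",
    "v_call": "v_gene",
    "j_call": "j_gene",
    "locus": "chain",
}

# Columns that identify a clonotype under each Phase 1 definition.
CLONOTYPE_KEYS = {
    "cdr3aa": ["cdr3aa"],
    "cdr3nt": ["cdr3nt"],
    "vj_cdr3": ["v_gene", "j_gene", "cdr3aa"],
}


def standardize_airr(df: pd.DataFrame) -> pd.DataFrame:
    """Rename AIRR columns, keep the first gene call and drop the allele suffix."""
    out = df.rename(columns=AIRR_TO_STANDARD)
    for col in ("v_gene", "j_gene"):
        # e.g. "TRBV20-1*01,TRBV20-1*02" -> "TRBV20-1"
        out[col] = out[col].astype(str).str.split(",").str[0].str.split("*").str[0]
    out["frequency"] = out["count"] / out["count"].sum()
    return out


def define_clonotypes(df: pd.DataFrame, method: str = "vj_cdr3") -> pd.DataFrame:
    """Aggregate rows by clonotype, sort by count and assign ranks (1 = largest)."""
    keys = CLONOTYPE_KEYS[method]
    clones = df.groupby(keys, as_index=False)["count"].sum()
    clones["frequency"] = clones["count"] / clones["count"].sum()
    clones["clonotype"] = clones[keys].astype(str).apply("|".join, axis=1)
    clones = clones.sort_values("count", ascending=False, ignore_index=True)
    clones["rank"] = clones.index + 1
    return clones
```

Collapsing alleles (`*01`, `*02`) before grouping keeps `vj_cdr3` clonotypes from splitting on ambiguous allele calls; keep the full call if allele-level resolution matters for the question at hand.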
Phase 2: Diversity & Clonality Analysis
Calculate diversity metrics for the repertoire:
- Shannon entropy: Overall diversity (higher = more diverse)
- Simpson index: Probability that two sequences drawn at random belong to the same clonotype
- Inverse Simpson: Effective number of clonotypes
- Gini coefficient: Inequality in clonotype distribution
- Clonality: 1 - Pielou's evenness (higher = more clonal)
- Richness: Number of unique clonotypes
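The metrics above can be computed directly from per-clonotype counts. This is a hedged NumPy sketch: it uses the with-replacement Simpson sum of squared frequencies, the sorted-rank formula for the Gini coefficient, and natural logs by default. The `base` parameter is an assumption; whichever base is used should be reported, since Shannon entropy in bits is about 1.44 times the value in nats. A single-clonotype repertoire is assigned clonality 1 by convention.

```python
import numpy as np


def calculate_diversity(counts, base=np.e):
    """Diversity metrics from per-clonotype counts (zero counts are ignored)."""
    c = np.asarray(counts, dtype=float)
    c = c[c > 0]
    p = c / c.sum()
    richness = len(c)
    shannon = -np.sum(p * np.log(p)) / np.log(base)
    simpson = np.sum(p ** 2)          # P(two random draws share a clonotype)
    inverse_simpson = 1.0 / simpson   # effective number of clonotypes
    # Pielou's evenness = H / log(richness); undefined for one clonotype
    if richness > 1:
        evenness = shannon / (np.log(richness) / np.log(base))
    else:
        evenness = 0.0
    # Gini coefficient from ascending counts: 0 = perfectly even
    s = np.sort(c)
    n = richness
    gini = 2 * np.sum(np.arange(1, n + 1) * s) / (n * s.sum()) - (n + 1) / n
    return {
        "richness": richness,
        "shannon_entropy": float(shannon),
        "simpson_index": float(simpson),
        "inverse_simpson": float(inverse_simpson),
        "gini": float(gini),
        "clonality": float(1 - evenness),
    }
```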
Generate rarefaction curves to assess whether sequencing depth is sufficient.
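Rarefaction can be sketched by repeatedly subsampling reads without replacement and counting the clonotypes recovered at each depth. The function names, the number of iterations, and the 1% plateau tolerance in `is_plateaued` are assumptions for illustration; NumPy's `Generator.multivariate_hypergeometric` performs the without-replacement draw.

```python
import numpy as np


def rarefaction_curve(counts, depths, n_iter=10, seed=0):
    """Mean richness after subsampling `depth` reads, for each requested depth."""
    rng = np.random.default_rng(seed)
    counts = np.asarray(counts, dtype=np.int64)
    total = int(counts.sum())
    curve = []
    for depth in depths:
        depth = min(int(depth), total)
        richness = []
        for _ in range(n_iter):
            # Drawing `depth` reads without replacement from the clonotype counts
            sample = rng.multivariate_hypergeometric(counts, depth)
            richness.append(np.count_nonzero(sample))
        curve.append((depth, float(np.mean(richness))))
    return curve


def is_plateaued(curve, tol=0.01):
    """Heuristic: richness gained over the last step is below `tol` (a fraction)."""
    (_, r_prev), (_, r_last) = curve[-2], curve[-1]
    return r_last == 0 or (r_last - r_prev) / r_last < tol
```

Depths should be given in increasing order; a curve that is still rising at the full read count means the richness and diversity estimates are lower bounds.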
Phase 3: V(D)J Gene Usage Analysis
Analyze V and J gene usage patterns weighted by clonotype count:
- V gene family usage frequencies
- J gene family usage frequencies
- V-J pairing frequencies
- Statistical testing for biased usage (chi-square test against a healthy-donor reference distribution, with a uniform expectation only as a fallback)
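A sketch of count-weighted V/J usage, V-J pairing, and a chi-square bias test using `scipy.stats.chisquare`. The function names are hypothetical; `reference` is meant to hold healthy-donor usage fractions retrieved as described under LOOK UP DON'T GUESS, with a uniform expectation only as a fallback. Dropping genes absent from the reference before testing is an assumed convention.

```python
import pandas as pd
from scipy.stats import chisquare


def gene_usage(df, gene_col="v_gene", weight_col="count"):
    """Count-weighted usage frequency per gene (V or J)."""
    usage = df.groupby(gene_col)[weight_col].sum()
    return usage / usage.sum()


def vj_pairing(df, weight_col="count"):
    """Count-weighted V-J pairing frequencies (rows = V genes, columns = J genes)."""
    table = df.pivot_table(index="v_gene", columns="j_gene", values=weight_col,
                           aggfunc="sum", fill_value=0)
    return table / table.to_numpy().sum()


def usage_bias_chisquare(df, reference=None, gene_col="v_gene", weight_col="count"):
    """Chi-square test of observed gene usage against a reference distribution.

    `reference` maps gene -> expected fraction (e.g. healthy-donor usage);
    if omitted, a uniform expectation is used as a fallback.
    """
    observed = df.groupby(gene_col)[weight_col].sum()
    if reference is None:
        expected = pd.Series(1.0, index=observed.index)
    else:
        expected = pd.Series(reference, dtype=float)
        shared = observed.index.intersection(expected.index)  # genes in both
        observed, expected = observed[shared], expected[shared]
    # chisquare requires observed and expected totals to match
    expected = expected / expected.sum() * observed.sum()
    stat, p = chisquare(observed.to_numpy(dtype=float), f_exp=expected.to_numpy())
    return float(stat), float(p)
```

Weighting by read or cell count treats every read as an independent observation, so with deep sequencing almost any deviation becomes significant. Running the same test with one count per clonotype (for example a column of ones as `weight_col`) gives a more conservative check.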
Phase 4: CDR3 Sequence Analysis
Characterize CDR3 sequences:
- Length distribution: Typical TCR CDR3 = 12-18 aa; BCR CDR3 = 10-20 aa
- Amino acid composition: Weighted by clonotype frequency
- Flag unusual length distributions (may indicate PCR bias)
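A pandas sketch of the count-weighted CDR3 length distribution, amino acid composition, and a simple outlier flag. The function names are hypothetical; `expected_range` should come from a looked-up reference rather than from the approximate ranges above, and the 5% `max_outside` cutoff is an arbitrary placeholder.

```python
from collections import Counter

import pandas as pd


def cdr3_length_distribution(df, seq_col="cdr3aa", weight_col="count"):
    """Count-weighted fraction of sequences at each CDR3 length (in residues)."""
    lengths = df[seq_col].str.len()
    dist = df[weight_col].groupby(lengths).sum()
    return dist / dist.sum()


def aa_composition(df, seq_col="cdr3aa", weight_col="count"):
    """Count-weighted amino acid frequencies across all CDR3 residues."""
    totals = Counter()
    for seq, weight in zip(df[seq_col], df[weight_col]):
        for aa in seq:
            totals[aa] += weight
    comp = pd.Series(totals, dtype=float).sort_index()
    return comp / comp.sum()


def flag_length_outliers(dist, expected_range, max_outside=0.05):
    """Weighted fraction outside `expected_range` and whether it exceeds `max_outside`."""
    lo, hi = expected_range
    outside = float(dist[(dist.index < lo) | (dist.index > hi)].sum())
    return outside, outside > max_outside
```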
Phase 5: Clonal Expansion Detection
Identify expanded clonotypes above a frequency threshold (default: 95th percentile). Track clonotypes longitudinally across multiple timepoints to measure persistence, mean/max frequency, and fold changes.
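One possible implementation of the percentile threshold and the longitudinal summary, returning the `n_expanded` and `expanded_clonotypes` keys that the Quick Start reads. The pseudocount used for fold changes and the output column names are assumptions; frequencies are expected to be computed within each sample before tracking.

```python
import numpy as np
import pandas as pd


def detect_expanded_clones(clones, percentile=95):
    """Clonotypes whose frequency exceeds the given percentile of all frequencies."""
    threshold = float(clones["frequency"].quantile(percentile / 100))
    expanded = clones[clones["frequency"] > threshold]
    return {"threshold": threshold, "n_expanded": len(expanded),
            "expanded_clonotypes": expanded}


def track_clonotypes(samples, pseudocount=1e-6):
    """Per-clonotype persistence, mean/max frequency and log2 fold change.

    `samples` maps timepoint -> DataFrame with unique 'clonotype' and
    'frequency' columns, in chronological order. Absent clonotypes count as 0.
    """
    wide = pd.concat(
        {tp: df.set_index("clonotype")["frequency"] for tp, df in samples.items()},
        axis=1,
    ).fillna(0.0)
    first, last = wide.columns[0], wide.columns[-1]
    return pd.DataFrame({
        "n_timepoints": (wide > 0).sum(axis=1),
        "mean_freq": wide.mean(axis=1),
        "max_freq": wide.max(axis=1),
        # pseudocount avoids division by zero for clones absent at one timepoint
        "log2_fc_last_vs_first": np.log2((wide[last] + pseudocount)
                                         / (wide[first] + pseudocount)),
    })
```

The strict `>` comparison keeps ties at the threshold out of the expanded set, which matters when many singletons share the same frequency.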
Phase 6: Convergence & Public Clonotypes
- Convergent recombination: Same CDR3 amino acid from different nucleotide sequences (evidence of antigen-driven selection)
- Public clonotypes: Shared across multiple samples/individuals (may indicate common antigen responses)
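Both signals reduce to group-and-count operations. In this sketch (hypothetical function names), the convergence ratio is the number of distinct nucleotide CDR3s per amino acid CDR3, the quantity the interpretation guidance below compares against 2, and a clonotype is called public when it appears in at least `min_samples` samples.

```python
import pandas as pd


def convergence_ratio(df, aa_col="cdr3aa", nt_col="cdr3nt"):
    """Distinct nucleotide CDR3 sequences encoding each amino acid CDR3."""
    return df.groupby(aa_col)[nt_col].nunique().sort_values(ascending=False)


def public_clonotypes(samples, key="cdr3aa", min_samples=2):
    """Clonotypes present in at least `min_samples` samples.

    `samples` maps sample_id -> DataFrame containing the `key` column.
    Returns clonotype -> number of samples in which it was observed.
    """
    presence = pd.Series(
        [k for df in samples.values() for k in df[key].dropna().unique()]
    ).value_counts()
    return presence[presence >= min_samples]
```

Public calls depend on the clonotype definition: keying on `cdr3aa` alone is more permissive than V gene + J gene + CDR3aa.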
Phase 7: Epitope Prediction & Specificity
Query epitope databases for known TCR-epitope associations:
- IEDB (`iedb_search_tcell_assays`): Search T-cell assay records by sequence or MHC class; use `iedb_search_epitopes` with `sequence_contains` for motif search
- BVBRC (`BVBRC_search_epitopes`): Best for organism-based epitope discovery (e.g., `taxon_id="2697049"` for SARS-CoV-2); returns epitope sequences with T-cell/B-cell assay counts
- VDJdb (manual): https://vdjdb.cdr3.net/search
- PubMed literature (`PubMed_search_articles`): Search for CDR3 + epitope/antigen/specificity
- IEDB detail tools: `iedb_get_epitope_antigens` (link epitope→antigen), `iedb_get_epitope_mhc` (MHC restriction)
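Database hits are easier to triage when CDR3s are also matched locally against a downloaded reference table (for example a VDJdb export). The sketch below assumes placeholder column names `cdr3` and `epitope` (rename them to match the export actually used) and treats a Hamming distance of 1 between equal-length CDR3s as a "partial" match; that is a simple heuristic, not TCRdist or GLIPH clustering. A hit is supporting evidence only, since antigen identity should not be inferred from CDR3 alone.

```python
import pandas as pd


def hamming(a, b):
    """Hamming distance for equal-length strings; None if the lengths differ."""
    if len(a) != len(b):
        return None
    return sum(x != y for x, y in zip(a, b))


def match_cdr3_to_reference(queries, reference, cdr3_col="cdr3",
                            epitope_col="epitope", max_dist=1):
    """Exact (distance 0) and near (distance <= max_dist) CDR3 matches."""
    ref = reference[[cdr3_col, epitope_col]].dropna().drop_duplicates()
    hits = []
    for query in queries:
        for ref_cdr3, epitope in zip(ref[cdr3_col], ref[epitope_col]):
            dist = hamming(query, ref_cdr3)
            if dist is not None and dist <= max_dist:
                hits.append({"query": query, "ref_cdr3": ref_cdr3,
                             "epitope": epitope, "distance": dist,
                             "exact": dist == 0})
    return pd.DataFrame(hits, columns=["query", "ref_cdr3", "epitope",
                                       "distance", "exact"])
```

The nested loop is quadratic, which is fine for a short list of expanded clones against a filtered reference (one species, one chain); larger searches would need indexing.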
Phase 8: Integration with Single-Cell Data
Link TCR/BCR clonotypes to cell phenotypes from paired single-cell RNA-seq:
- Map clonotypes to cell barcodes
- Identify expanded clonotype phenotypes on UMAP
- Analyze clonotype-cluster associations (cross-tabulation)
- Find cluster-specific clonotypes (>80% of cells in one cluster)
- Differential gene expression: expanded vs. non-expanded cells
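A pandas sketch of the cross-tabulation, cluster-specificity, and expanded-cell labeling steps, assuming a per-cell table with hypothetical `clonotype` and `cluster` columns (for example an AnnData `obs` frame). The `min_cells=3` floor is an added assumption so that singleton clonotypes, which are trivially 100% in one cluster, are not reported.

```python
import pandas as pd


def clonotype_cluster_table(cells, clonotype_col="clonotype", cluster_col="cluster"):
    """Number of cells per (clonotype, cluster) pair."""
    return pd.crosstab(cells[clonotype_col], cells[cluster_col])


def cluster_specific_clonotypes(cells, min_fraction=0.8, min_cells=3,
                                clonotype_col="clonotype", cluster_col="cluster"):
    """Clonotypes with more than `min_fraction` of their cells in one cluster."""
    table = clonotype_cluster_table(cells, clonotype_col, cluster_col)
    n_cells = table.sum(axis=1)
    top_fraction = table.max(axis=1) / n_cells
    keep = (top_fraction > min_fraction) & (n_cells >= min_cells)
    return pd.DataFrame({
        "top_cluster": table.idxmax(axis=1)[keep],
        "fraction": top_fraction[keep],
        "n_cells": n_cells[keep],
    })


def label_expanded(cells, min_cells=2, clonotype_col="clonotype"):
    """True for cells whose clonotype is observed in at least `min_cells` cells."""
    sizes = cells[clonotype_col].map(cells[clonotype_col].value_counts())
    return sizes >= min_cells
```

The boolean from `label_expanded` can be converted to a categorical `obs` column and used as the grouping for differential expression, for example with scanpy's `sc.tl.rank_genes_groups`.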
ToolUniverse Tool Integration
Key Tools Used:
- `iedb_search_tcell_assays` - T-cell assay records (sequence, MHC class filters)
- `iedb_search_bcell` - B-cell assay records
- `iedb_search_epitopes` - Epitope motif search via `sequence_contains`
- `BVBRC_search_epitopes` - Organism-based epitope discovery (best for pathogen-specific queries)
- `NCBI_SRA_search_runs` - Find public TCR/BCR-seq datasets (use `strategy="AMPLICON"`)
- `ImmPort_search_studies` - NIAID immunology studies (vaccine trials, flow cytometry)
- `PubMed_search_articles` - Literature on TCR/BCR specificity
- `UniProt_get_entry_by_accession` - Antigen protein information
Integration with Other Skills:
- `tooluniverse-single-cell` - Single-cell transcriptomics
- `tooluniverse-rnaseq-deseq2` - Bulk RNA-seq analysis
- `tooluniverse-variant-analysis` - Somatic hypermutation analysis (BCR)
Quick Start
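```python
from tooluniverse import ToolUniverse  # presumably used by query_epitope_database

# Assumption: the helpers called below (load_airr_data, define_clonotypes,
# calculate_diversity, detect_expanded_clones, analyze_vdj_usage,
# query_epitope_database) are the phase snippets from ANALYSIS_DETAILS.md;
# they are not imported by this block.

# 1. Load data (MiXCR export)
tcr_data = load_airr_data("clonotypes.txt", format='mixcr')

# 2. Define clonotypes as V gene + J gene + CDR3aa
clonotypes = define_clonotypes(tcr_data, method='vj_cdr3')

# 3. Calculate diversity
diversity = calculate_diversity(clonotypes['count'])
print(f"Shannon entropy: {diversity['shannon_entropy']:.2f}")

# 4. Detect expanded clones (default threshold: 95th frequency percentile)
expansion = detect_expanded_clones(clonotypes)
print(f"Expanded clonotypes: {expansion['n_expanded']}")

# 5. Analyze V(D)J usage
vdj_usage = analyze_vdj_usage(tcr_data)

# 6. Query epitope databases for the top 10 expanded clonotypes
top_clones = expansion['expanded_clonotypes']['clonotype'].head(10)
epitopes = query_epitope_database(top_clones)
```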
Reasoning Framework for Result Interpretation
Evidence Grading
| Grade | Criteria | Example |
|---|---|---|
| Strong | Clonal expansion > 1% frequency, convergent recombination confirmed, epitope match in IEDB/VDJdb | CDR3 at 5% frequency with 3 nucleotide variants encoding same amino acid, IEDB hit |
| Moderate | Expanded clone (0.1-1%), V(D)J bias significant (chi-sq p < 0.01), partial epitope match | Clone at 0.5% with TRBV20-1 bias, similar CDR3 motif in VDJdb |
| Weak | Low-frequency expansion (0.01-0.1%), single timepoint only, no epitope database match | Moderately expanded clone without convergence or known specificity |
| Insufficient | Below detection threshold, sequencing depth < 10,000 clonotypes, no replication | Singleton clonotypes that may be PCR/sequencing artifacts |
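The grading table can be encoded as a small decision function. This is one possible reading of the criteria, not an official scoring scheme: Strong is taken to require all three Strong criteria, Moderate requires at least 0.1% frequency plus one supporting signal (V(D)J bias or any epitope match), and depth adequacy is passed in as a flag computed elsewhere (for example from rarefaction). The function and parameter names are hypothetical.

```python
def grade_clone(frequency, convergent=False, epitope_match=None,
                vdj_bias_significant=False, depth_adequate=True):
    """Map one clonotype's evidence onto the grading table (one possible reading).

    frequency: clonotype frequency as a fraction (0.01 == 1%).
    epitope_match: None, "partial" or "exact" (result of an IEDB/VDJdb lookup).
    """
    if not depth_adequate or frequency < 0.0001:  # below 0.01%
        return "Insufficient"
    if frequency > 0.01 and convergent and epitope_match == "exact":
        return "Strong"
    if frequency >= 0.001 and (vdj_bias_significant or epitope_match is not None):
        return "Moderate"
    return "Weak"
```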
Interpretation Guidance
- Clonality metrics: Shannon diversity measures overall repertoire complexity (higher = more diverse, typical range 5-12 for healthy blood). Gini coefficient ranges from 0 (perfectly even) to 1 (single dominant clone); values > 0.3 suggest clonal expansion. Clonality (1 - Pielou's evenness) > 0.2 indicates moderate clonal dominance; > 0.5 suggests strong oligoclonal expansion (common in active infection or tumor-infiltrating lymphocytes).
- V(D)J usage significance: Biased V or J gene usage (chi-square p < 0.01 vs expected uniform distribution) may indicate antigen-driven selection. However, baseline V gene usage is not uniform even in healthy repertoires due to genomic proximity and recombination efficiency. Compare against healthy donor reference distributions rather than uniform expectation when possible.
- CDR3 convergence meaning: Convergent recombination (same CDR3 amino acid from different nucleotide sequences) is strong evidence of antigen-driven selection because independent recombination events converged on the same receptor. Public clonotypes (shared across individuals) further strengthen this inference. A convergence ratio > 2 (nucleotide variants per amino acid sequence) for expanded clones is noteworthy.
- Sequencing depth: Rarefaction curves that plateau indicate sufficient depth. If the curve is still rising, richness and diversity estimates are underestimates. Minimum recommended depth: 50,000-100,000 total reads for bulk TCR-seq.
- Longitudinal tracking: Persistent clones across timepoints with stable or increasing frequency indicate antigen-driven maintenance. Transient expansions that disappear may reflect acute responses.
Synthesis Questions
- Does the observed clonal expansion pattern (Gini coefficient, top-clone frequency) match the expected immune context (e.g., post-vaccination expansion, tumor-infiltrating lymphocyte oligoclonality)?
- Are convergent CDR3 sequences found across multiple individuals in the cohort, suggesting a public response to a shared antigen?
- Do expanded clonotypes show biased V gene usage consistent with known antigen-specific repertoire features (e.g., TRBV20-1 enrichment in CMV-specific responses)?
- Is the sequencing depth sufficient (rarefaction plateau reached) to reliably estimate diversity metrics and detect low-frequency expanded clones?
- For longitudinal data, do clonal dynamics (expansion, contraction, persistence) correlate with clinical outcomes or treatment response?
See Also
- `ANALYSIS_DETAILS.md` - Detailed code snippets for all 8 phases
- `USE_CASES.md` - Complete use cases (immunotherapy, vaccine, autoimmune, single-cell integration) and best practices