tooluniverse-cancer-variant-interpretation
mims-harvard/tooluniverse · updated Apr 8, 2026
Comprehensive clinical interpretation of somatic mutations in cancer. Transforms a gene + variant input into an actionable precision oncology report covering clinical evidence, therapeutic options, resistance mechanisms, clinical trials, and prognostic implications.
Cancer Variant Interpretation for Precision Oncology
KEY PRINCIPLES:
- Report-first approach - Create report file FIRST, then populate progressively
- Evidence-graded - Every recommendation has an evidence tier (T1-T4)
- Actionable output - Prioritized treatment options, not data dumps
- Clinical focus - Answer "what should we treat with?" not "what databases exist?"
- Resistance-aware - Always check for known resistance mechanisms
- Cancer-type specific - Tailor all recommendations to the patient's cancer type when provided
- Source-referenced - Every statement must cite the tool/database source
- English-first queries - Always use English terms in tool calls (gene names, drug names, cancer types), even if the user writes in another language. Respond in the user's language.
LOOK UP, DON'T GUESS
When uncertain about any scientific fact, SEARCH databases first (PubMed, UniProt, ChEMBL, ClinVar, etc.) rather than reasoning from memory. A database-verified answer is always more reliable than a guess.
COMPUTE, DON'T DESCRIBE
When analysis requires computation (statistics, data processing, scoring, enrichment), write and run Python code via Bash. Don't describe what you would do — execute it and report actual results. Use ToolUniverse tools to retrieve data, then Python (pandas, scipy, statsmodels, matplotlib) to analyze it.
When to Use
Apply when user asks:
- "What treatments exist for EGFR L858R in lung cancer?"
- "Patient has BRAF V600E melanoma - what are the options?"
- "Is KRAS G12C targetable?"
- "Patient progressed on osimertinib - what's next?"
- "What clinical trials are available for PIK3CA E545K?"
- "Interpret this somatic mutation: TP53 R273H"
Input Parsing
- Required: Gene symbol + variant notation (e.g., "EGFR L858R", "BRAF p.V600E", "EML4-ALK fusion", "HER2 amplification")
- Optional: Cancer type (improves specificity)
Parse the gene symbol and variant separately. For fusions, use the kinase partner as the primary gene. For amplifications/deletions, use the gene name directly. Normalize common aliases: HER2 -> ERBB2, PD-L1 -> CD274, VEGF -> VEGFA.
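The parsing rules above can be sketched as a small helper. This is a minimal illustration, not part of the skill itself: `parse_variant_input`, `KINASE_GENES`, and the fall-back to the last fusion partner are assumptions made for the example, and the kinase list is deliberately incomplete.

```python
import re

# Alias map copied from the normalization rule above.
GENE_ALIASES = {"HER2": "ERBB2", "PD-L1": "CD274", "VEGF": "VEGFA"}

# Illustrative, incomplete list of kinase genes (an assumption of this sketch),
# used only to pick the primary gene of a fusion.
KINASE_GENES = {"ALK", "ROS1", "RET", "NTRK1", "NTRK2", "NTRK3", "ABL1"}

STRUCTURAL_TYPES = {"amplification", "deletion"}


def normalize_gene(symbol):
    """Upper-case a gene symbol and map common aliases to official symbols."""
    symbol = symbol.strip().upper()
    return GENE_ALIASES.get(symbol, symbol)


def parse_variant_input(text):
    """Split '<gene> <variant>' input into gene, variant, and alteration type."""
    tokens = text.strip().split()
    if len(tokens) < 2:
        raise ValueError("expected '<gene> <variant>', e.g. 'EGFR L858R'")
    head, rest = tokens[0], " ".join(tokens[1:])

    if rest.lower() == "fusion":
        partners = [normalize_gene(p) for p in re.split(r"-|::", head)]
        kinases = [p for p in partners if p in KINASE_GENES]
        # Assumption: fall back to the last partner when no listed kinase is present.
        gene = kinases[0] if kinases else partners[-1]
        return {"gene": gene, "variant": "-".join(partners) + " fusion", "type": "fusion"}

    if rest.lower() in STRUCTURAL_TYPES:
        return {"gene": normalize_gene(head), "variant": rest.lower(), "type": rest.lower()}

    # Point mutation: drop an optional HGVS-style "p." prefix.
    variant = rest[2:] if rest.startswith("p.") else rest
    return {"gene": normalize_gene(head), "variant": variant, "type": "mutation"}
```

For example, `parse_variant_input("EML4-ALK fusion")` selects `ALK` as the primary gene, and `parse_variant_input("HER2 amplification")` resolves the gene to `ERBB2`.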
Phase 0: Tool Parameter Verification (CRITICAL)
BEFORE calling ANY tool for the first time, verify its parameters.
| Tool | WRONG Parameter | CORRECT Parameter |
|---|---|---|
| OpenTargets_get_associated_drugs_by_target_ensemblID | ensemblID | ensemblId (camelCase) |
| OpenTargets_get_drug_chembId_by_generic_name | genericName | drugName |
| OpenTargets_target_disease_evidence | ensemblID | ensemblId + efoId |
| MyGene_query_genes | q | query |
| search_clinical_trials | disease, biomarker | condition, query_term (required) |
| civic_get_variants_by_gene | gene_symbol | gene_id (CIViC numeric ID) |
| drugbank_* | any 3 params | ALL 4 required: query, case_sensitive, exact_match, limit |
| ChEMBL_get_drug_mechanisms | chembl_id | drug_chembl_id__exact |
| ensembl_lookup_gene | no species | species='homo_sapiens' is REQUIRED |
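One way to apply the table before a call is a small pre-flight check that renames known-wrong parameter names and reports missing required ones. The sketch below is hypothetical: `check_params` is not a ToolUniverse function, and mapping `disease` to `condition` and `biomarker` to `query_term` is an assumption based on the order of the table cells.

```python
# Wrong -> correct parameter names, transcribed from the table above.
PARAM_RENAMES = {
    "OpenTargets_get_associated_drugs_by_target_ensemblID": {"ensemblID": "ensemblId"},
    "OpenTargets_get_drug_chembId_by_generic_name": {"genericName": "drugName"},
    "OpenTargets_target_disease_evidence": {"ensemblID": "ensemblId"},
    "MyGene_query_genes": {"q": "query"},
    # Assumption: positional reading of "disease, biomarker" -> "condition, query_term".
    "search_clinical_trials": {"disease": "condition", "biomarker": "query_term"},
    "ChEMBL_get_drug_mechanisms": {"chembl_id": "drug_chembl_id__exact"},
}

# Parameters the table marks as required.
REQUIRED_PARAMS = {
    "OpenTargets_target_disease_evidence": ["ensemblId", "efoId"],
    "search_clinical_trials": ["query_term"],
    "civic_get_variants_by_gene": ["gene_id"],  # CIViC numeric ID, not a gene symbol
    "ensembl_lookup_gene": ["species"],
}
DRUGBANK_REQUIRED = ["query", "case_sensitive", "exact_match", "limit"]


def check_params(tool_name, params):
    """Return (params with known-wrong names renamed, list of missing required names)."""
    renames = PARAM_RENAMES.get(tool_name, {})
    fixed = {renames.get(key, key): value for key, value in params.items()}
    required = REQUIRED_PARAMS.get(tool_name, [])
    if tool_name.startswith("drugbank_"):
        required = DRUGBANK_REQUIRED
    missing = [name for name in required if name not in fixed]
    return fixed, missing
```

A non-empty `missing` list is a signal to stop and fix the call rather than send it; a `gene_symbol` passed to `civic_get_variants_by_gene` cannot be renamed automatically because the numeric CIViC ID has to be looked up first.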
Workflow Overview
Input: Gene symbol + Variant notation + Optional cancer type
Phase 1: Gene Disambiguation & ID Resolution
- Resolve gene to Ensembl ID, UniProt accession, Entrez ID
- Get gene function, pathways, protein domains
- Identify cancer type EFO ID (if cancer type provided)
Phase 2: Clinical Variant Evidence (CIViC)
- Find gene in CIViC (via Entrez ID matching)
- Get all variants for the gene, match specific variant
- Retrieve evidence items (predictive, prognostic, diagnostic)
Phase 3: Mutation Prevalence (cBioPortal)
- Frequency across cancer studies
- Co-occurring mutations, cancer type distribution
Phase 4: Therapeutic Associations (OpenTargets + ChEMBL + FDA + DrugBank)
- FDA-approved targeted therapies
- Clinical trial drugs (phase 2-3), drug mechanisms
- Combination therapies
Phase 5: Resistance Mechanisms
- Known resistance variants (CIViC, literature)
- Bypass pathway analysis (Reactome)
Phase 6: Clinical Trials
- Active trials recruiting for this mutation
- Trial phase, status, eligibility
Phase 7: Prognostic Impact & Pathway Context
- Survival associations (literature)
- Pathway context (Reactome), Expression data (GTEx)
Phase 8: Report Synthesis
- Executive summary, clinical actionability score
- Treatment recommendations (prioritized), completeness checklist
For detailed code snippets and API call patterns for each phase, see ANALYSIS_DETAILS.md.
Clinical Reasoning Strategies
Driver vs Passenger Reasoning
Not every mutation in a tumor is driving the cancer. Before querying databases, form a hypothesis:
- Is this gene a known oncogene or tumor suppressor? Genes like EGFR, BRAF, KRAS, TP53, PIK3CA are well-established cancer drivers. A mutation in one of these warrants deep investigation. A mutation in a gene with no known cancer role is likely a passenger.
- Is this specific mutation recurrent across tumors (hotspot)? Use cBioPortal to check. A mutation seen in hundreds of independent tumors (e.g., BRAF V600E) is almost certainly a driver. A unique, never-before-seen missense in the same gene is less certain.
- What is the predicted functional impact? Truncating mutations (nonsense, frameshift) in tumor suppressors are likely loss-of-function drivers. Missense mutations in oncogenes at known hotspot residues are likely gain-of-function drivers.
- Conclusion pattern: A recurrent mutation in a known driver gene is likely actionable. A unique mutation in a gene not associated with cancer is likely a passenger. State your assessment and the reasoning behind it.
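The reasoning above can be written down as a rough heuristic. The sketch below is an assumption-laden illustration, not a validated classifier: the gene table contains only the genes named above, and the `hotspot_threshold` of 100 tumors is an arbitrary stand-in for "seen in hundreds of independent tumors".

```python
# Only the genes named in the text above; a real lookup would be far larger.
KNOWN_CANCER_GENES = {
    "EGFR": "oncogene",
    "BRAF": "oncogene",
    "KRAS": "oncogene",
    "PIK3CA": "oncogene",
    "TP53": "tumor suppressor",
}
TRUNCATING = {"nonsense", "frameshift"}


def assess_driver(gene, consequence, recurrence_count, hotspot_threshold=100):
    """Return (assessment, reasons) following the driver vs passenger questions."""
    role = KNOWN_CANCER_GENES.get(gene.upper())
    recurrent = recurrence_count >= hotspot_threshold

    if role is None:
        if recurrent:
            return "uncertain", ["recurrent variant, but gene has no known cancer role"]
        return "likely passenger", ["gene has no known cancer role and variant is not recurrent"]

    reasons = [f"{gene} is a known {role}"]
    if recurrent:
        reasons.append(f"variant observed in {recurrence_count} tumors (hotspot)")
        return "likely driver", reasons
    if role == "tumor suppressor" and consequence in TRUNCATING:
        reasons.append(f"{consequence} variant is a likely loss-of-function event")
        return "likely driver", reasons

    reasons.append("variant is not recurrent and has no clear functional signal")
    return "uncertain", reasons
```

Returning the reasons alongside the label keeps the report in line with the instruction to state both the assessment and the reasoning behind it.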
Actionability Reasoning
Actionable means a therapy exists that targets this alteration. Think in tiers based on evidence strength:
- Tier 1: FDA-approved drug for this mutation in this cancer type. The standard of care — recommend confidently. Example reasoning: "CIViC returns Level A evidence, FDA label confirms indication."
- Tier 2: FDA-approved for this mutation in a different cancer type, or strong clinical trial evidence (phase 2-3) in this cancer type. Reasonable to consider, especially under tumor-agnostic approvals or with molecular tumor board discussion.
- Tier 3: Preclinical or anecdotal evidence only (cell line data, animal models, or case reports). May justify clinical trial enrollment but not off-label use.
- Tier 4: Biological rationale but no direct evidence — the mutation is in a druggable pathway, or a structurally similar mutation responds to therapy. Hypothesis-generating only.
When synthesizing, state the tier and explain WHY you assigned it based on the evidence you found, not just which database returned a hit.
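A minimal sketch of the tier assignment follows, assuming the evidence gathered in earlier phases has already been reduced to a few boolean flags. The `Evidence` class and `assign_tier` function are hypothetical names introduced for this example; the highest-strength evidence wins.

```python
from dataclasses import dataclass


@dataclass
class Evidence:
    """Boolean summary of what the database queries found (hypothetical structure)."""
    fda_approved_same_cancer: bool = False
    fda_approved_other_cancer: bool = False
    phase2_3_trial_same_cancer: bool = False
    preclinical_or_case_report: bool = False
    biological_rationale: bool = False


def assign_tier(ev):
    """Return (tier, rationale), checking the strongest evidence first."""
    if ev.fda_approved_same_cancer:
        return "T1", "FDA-approved drug for this mutation in this cancer type"
    if ev.fda_approved_other_cancer:
        return "T2", "FDA-approved for this mutation in a different cancer type"
    if ev.phase2_3_trial_same_cancer:
        return "T2", "phase 2-3 clinical trial evidence in this cancer type"
    if ev.preclinical_or_case_report:
        return "T3", "preclinical data or case reports only"
    if ev.biological_rationale:
        return "T4", "biological rationale without direct evidence"
    return None, "no supporting evidence found"
```

The rationale string is only a starting point; the report should still explain which specific findings set each flag.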
Resistance Reasoning
If the patient has already been treated, ask: could this mutation be a resistance mechanism?
- On-target resistance: Mutations in the drug target gene itself that restore signaling by impairing drug binding or inhibition. These typically emerge at or near the drug-binding site (e.g., EGFR T790M after erlotinib, EGFR C797S after osimertinib, ABL T315I after imatinib).
- Bypass pathway activation: Alterations in parallel signaling pathways that render the target irrelevant (e.g., MET amplification bypassing EGFR inhibition, BRAF activation bypassing MEK inhibition).
- Phenotypic transformation: Lineage changes (e.g., small cell transformation in EGFR-mutant lung cancer) that eliminate dependence on the original driver.
- Timing matters: If the mutation was detected AFTER treatment, it is more likely a resistance mechanism than if it was present at diagnosis.
When to Use Which Tool
Form your clinical hypothesis FIRST based on gene function and mutation type, THEN use tools to validate:
- CIViC (civic_search_genes, civic_get_variants_by_gene): Your primary source for clinical evidence. Returns curated evidence items with evidence levels, clinical significance, and associated therapies. Start here for any variant with potential clinical relevance.
- cBioPortal (cBioPortal_get_mutations): Use to assess mutation prevalence: is this a hotspot? How common is it across cancer types? This informs your driver vs passenger assessment.
- OpenTargets (OpenTargets_get_associated_drugs_by_target_ensemblID): Use for actionability: what drugs target this gene? Cross-reference with CIViC evidence to assign tiers.
- PubMed (PubMed_search_articles): Use when CIViC lacks entries for your variant, or to find resistance mechanism reports and recent clinical trial results.
- ClinicalTrials.gov (search_clinical_trials): Use after establishing the variant is potentially actionable, to find enrollment opportunities.
Tool Reference (Verified Parameters)
Gene Resolution
| Tool | Key Parameters | Response Key Fields |
|---|---|---|
| MyGene_query_genes | query, species | hits[].ensembl.gene, .entrezgene, .symbol |
| UniProt_search | query, organism, limit | results[].accession |
| OpenTargets_get_target_id_description_by_name | targetName | data.search.hits[].id |
| ensembl_lookup_gene | gene_id, species (REQUIRED) | data.id, .version |
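As an illustration of reading the MyGene response fields listed above, the sketch below pulls the Ensembl and Entrez IDs for an exact symbol match. The response is a hand-written mock, not real tool output, and `extract_gene_ids` is a hypothetical helper. Handling `ensembl` as either a dict or a list is a defensive assumption, since a gene can map to more than one Ensembl record.

```python
def extract_gene_ids(response, symbol):
    """Return IDs for the first hit whose symbol matches exactly, else None."""
    for hit in response.get("hits", []):
        if hit.get("symbol", "").upper() != symbol.upper():
            continue  # skip partial matches such as antisense or related genes
        ensembl = hit.get("ensembl")
        if isinstance(ensembl, list):
            # Assumption: take the first record when several are returned.
            ensembl = ensembl[0] if ensembl else None
        return {
            "symbol": hit["symbol"],
            "entrez_id": hit.get("entrezgene"),
            "ensembl_id": ensembl.get("gene") if ensembl else None,
        }
    return None


# Hand-written mock shaped like hits[].ensembl.gene / .entrezgene / .symbol.
mock_response = {
    "hits": [
        {"symbol": "EGFR-AS1"},
        {"symbol": "EGFR", "entrezgene": "1956", "ensembl": {"gene": "ENSG00000146648"}},
    ]
}
ids = extract_gene_ids(mock_response, "EGFR")
```

Matching on the exact symbol rather than taking the first hit avoids resolving to a similarly named gene.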
Clinical Evidence
| Tool | Key Parameters | Response Key Fields |
|---|---|---|
| civic_search_genes | query, limit | data.genes.nodes[].id, .entrezId |
| civic_get_variants_by_gene | gene_id (CIViC numeric) | data.gene.variants.nodes[] |
| civic_get_variant | variant_id | data.variant |
Drug Information
| Tool | Key Parameters | Response Key Fields |
|---|---|---|
| OpenTargets_get_associated_drugs_by_target_ensemblID | ensemblId, size | data.target.drugAndClinicalCandidates.rows[] |
| FDA_get_indications_by_drug_name | drug_name, limit | results[].indications_and_usage |
| drugbank_get_drug_basic_info_by_drug_name_or_id | query, case_sensitive, exact_match, limit (ALL required) | results[] |
Mutation Prevalence
| Tool | Key Parameters | Response Key Fields |
|---|---|---|
| cBioPortal_get_mutations | study_id, gene_list | data[].proteinChange |
| cBioPortal_get_cancer_studies | limit | [].studyId, .cancerTypeId |
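In keeping with the "compute, don't describe" principle, a prevalence figure can be derived directly from the `data[].proteinChange` field. The sketch below uses a hand-written mock response, and `variant_prevalence` is a hypothetical helper. Note that it reports the share of the gene's mutation records carrying the variant, not the share of all samples in the study.

```python
from collections import Counter


def variant_prevalence(response, variant):
    """Return (hits, total, fraction) of mutation records matching `variant`."""
    records = response.get("data", [])
    counts = Counter(record.get("proteinChange") for record in records)
    total = len(records)
    hits = counts.get(variant, 0)
    fraction = hits / total if total else 0.0
    return hits, total, fraction


# Hand-written mock shaped like data[].proteinChange; not real study data.
mock_mutations = {
    "data": [
        {"proteinChange": "V600E"},
        {"proteinChange": "V600E"},
        {"proteinChange": "V600K"},
        {"proteinChange": "V600E"},
    ]
}
hits, total, fraction = variant_prevalence(mock_mutations, "V600E")
```

A high fraction in many independent studies supports a hotspot call; computing a per-sample frequency would additionally require the number of profiled samples in each study.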
Clinical Trials & Literature
| Tool | Key Parameters | Response Key Fields |
|---|---|---|
| search_clinical_trials | query_term (required), condition | studies[] |
| PubMed_search_articles | query, limit, include_abstract | Returns list of dicts (NOT wrapped) |
| Reactome_map_uniprot_to_pathways | id (UniProt accession) | Pathway mappings |
| GTEx_get_median_gene_expression | gencode_id, operation="median" | Expression by tissue |
Fallback Strategy
When a primary tool returns no results, fall back rather than reporting "no data found":
- CIViC empty -> search PubMed for "[gene] [variant] clinical evidence"
- OpenTargets no drugs -> try ChEMBL drug search by target
- cBioPortal specific study empty -> try pan-cancer study (msk_impact_2017 or similar)
- Reactome no pathways -> use UniProt function annotation for pathway context
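The fallback pattern can be expressed as an ordered chain of sources that stops at the first non-empty result. This is a generic sketch: `first_non_empty` is a hypothetical helper, and the two lambdas stand in for real tool calls, which are not shown here.

```python
def first_non_empty(steps):
    """Try (label, fetch) pairs in order; return (label, result, attempted labels).

    A step counts as empty when it returns a falsy value or raises an exception.
    """
    attempted = []
    for label, fetch in steps:
        attempted.append(label)
        try:
            result = fetch()
        except Exception:
            continue  # treat a failed call like an empty result and move on
        if result:
            return label, result, attempted
    return None, None, attempted


# Stand-in callables; in practice these would wrap the CIViC and PubMed tools.
source, result, attempted = first_non_empty([
    ("CIViC", lambda: []),
    ("PubMed", lambda: [{"title": "placeholder article"}]),
])
```

Keeping the list of attempted sources makes it possible to cite in the report which databases were checked before the fallback produced a result.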