pytdc
davila7/claude-code-templates · updated Apr 8, 2026
PyTDC (Therapeutics Data Commons)
Overview
PyTDC is an open-science platform providing AI-ready datasets and benchmarks for drug discovery and development. Access curated datasets spanning the entire therapeutics pipeline with standardized evaluation metrics and meaningful data splits, organized into three categories: single-instance prediction (molecular/protein properties), multi-instance prediction (drug-target interactions, DDI), and generation (molecule generation, retrosynthesis).
When to Use This Skill
This skill should be used when:
- Working with drug discovery or therapeutic ML datasets
- Benchmarking machine learning models on standardized pharmaceutical tasks
- Predicting molecular properties (ADME, toxicity, bioactivity)
- Predicting drug-target or drug-drug interactions
- Generating novel molecules with desired properties
- Accessing curated datasets with proper train/test splits (scaffold, cold-split)
- Using molecular oracles for property optimization
Installation & Setup
Install PyTDC with pip (the commands below invoke pip through uv):
uv pip install PyTDC
To upgrade to the latest version:
uv pip install PyTDC --upgrade
Core dependencies (automatically installed):
- numpy, pandas, tqdm, seaborn, scikit-learn, fuzzywuzzy
Additional packages are installed automatically as needed for specific features.
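To confirm the install worked before loading any data, a minimal check can look up the module without importing it. This is a sketch using only the standard library; the helper name `is_installed` is hypothetical, and the import name `tdc` is taken from the import statements used throughout this document.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if `module_name` can be found in the current environment."""
    return importlib.util.find_spec(module_name) is not None


# The package is installed as `PyTDC` but imported as `tdc`.
print("tdc available:", is_installed("tdc"))
```

If this prints `False`, the install command most likely ran against a different environment than the one executing the script.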
Quick Start
The basic pattern for accessing any TDC dataset follows this structure:
from tdc.<problem> import <Task>
data = <Task>(name='<Dataset>')
split = data.get_split(method='scaffold', seed=1, frac=[0.7, 0.1, 0.2])
df = data.get_data(format='df')
Where:
- <problem>: One of single_pred, multi_pred, or generation
- <Task>: Specific task category (e.g., ADME, DTI, MolGen)
- <Dataset>: Dataset name within that task
Example - Loading ADME data:
from tdc.single_pred import ADME
data = ADME(name='Caco2_Wang')
split = data.get_split(method='scaffold')
# Returns dict with 'train', 'valid', 'test' DataFrames
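A quick sanity check on a split is to confirm that no identifier appears in more than one partition and that the partition sizes match the requested fractions. The sketch below uses plain lists of IDs as a stand-in for the ID column of each DataFrame; `split_report` and the toy IDs are hypothetical and not part of PyTDC.

```python
from itertools import combinations


def split_report(split: dict[str, list[str]]) -> dict[str, float]:
    """Check that partitions share no IDs and return each partition's fraction."""
    for (name_a, ids_a), (name_b, ids_b) in combinations(split.items(), 2):
        overlap = set(ids_a) & set(ids_b)
        if overlap:
            raise ValueError(f"{name_a} and {name_b} share IDs: {sorted(overlap)}")
    total = sum(len(ids) for ids in split.values())
    return {name: len(ids) / total for name, ids in split.items()}


# Stand-in for the ID column of each partition in a real split.
toy_split = {
    "train": [f"mol{i}" for i in range(7)],
    "valid": ["mol7"],
    "test": ["mol8", "mol9"],
}
report = split_report(toy_split)
print(report)
```

With a real split, the same function could be fed the ID column of each DataFrame converted to a list, assuming the dataset exposes one.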
Single-Instance Prediction Tasks
Single-instance prediction involves forecasting properties of individual biomedical entities (molecules, proteins, etc.).
Available Task Categories
1. ADME (Absorption, Distribution, Metabolism, Excretion)
Predict pharmacokinetic properties of drug molecules.
from tdc.single_pred import ADME
data = ADME(name='Caco2_Wang') # Intestinal permeability
# Other datasets: HIA_Hou, Bioavailability_Ma, Lipophilicity_AstraZeneca, etc.
Common ADME datasets:
- Caco2 - Intestinal permeability
- HIA - Human intestinal absorption
- Bioavailability - Oral bioavailability
- Lipophilicity - Octanol-water partition coefficient
- Solubility - Aqueous solubility
- BBB - Blood-brain barrier penetration
- CYP - Cytochrome P450 metabolism
2. Toxicity (Tox)
Predict toxicity and adverse effects of compounds.
from tdc.single_pred import Tox
data = Tox(name='hERG') # Cardiotoxicity
# Other datasets: AMES, DILI, Carcinogens_Lagunin, etc.
Common toxicity datasets:
- hERG - Cardiac toxicity
- AMES - Mutagenicity
- DILI - Drug-induced liver injury
- Carcinogens - Carcinogenicity
- ClinTox - Clinical trial toxicity
3. HTS (High-Throughput Screening)
Bioactivity predictions from screening data.
from tdc.single_pred import HTS
data = HTS(name='SARSCoV2_Vitro_Touret')
4. QM (Quantum Mechanics)
Quantum mechanical properties of molecules.
from tdc.single_pred import QM
data = QM(name='QM7')
5. Other Single Prediction Tasks
- Yields: Chemical reaction yield prediction
- Epitope: Epitope prediction for biologics
- Develop: Development-stage predictions
- CRISPROutcome: Gene editing outcome prediction
Data Format
Single prediction datasets typically return DataFrames with columns:
- Drug_ID or Compound_ID: Unique identifier
- Drug or X: SMILES string or molecular representation
- Y: Target label (continuous or binary)
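Because Y can be either continuous or binary, it helps to inspect the label column before choosing a loss and a metric. The following is a minimal sketch on made-up rows that mirror the column layout above; `label_kind` is a hypothetical helper, and the rule it applies (only 0/1 values means binary) is an assumption rather than something PyTDC reports.

```python
def label_kind(labels: list[float]) -> str:
    """Classify a label column as 'binary' (only 0/1 values) or 'continuous'."""
    if not labels:
        raise ValueError("label column is empty")
    return "binary" if set(labels) <= {0, 1} else "continuous"


# Toy rows in the Drug_ID / Drug / Y layout; the values are made up.
rows = [
    {"Drug_ID": "d1", "Drug": "CCO", "Y": -4.9},
    {"Drug_ID": "d2", "Drug": "c1ccccc1", "Y": -5.3},
]
kind = label_kind([row["Y"] for row in rows])
print(kind)  # continuous
```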
Multi-Instance Prediction Tasks
Multi-instance prediction involves forecasting properties of interactions between multiple biomedical entities.
Available Task Categories
1. DTI (Drug-Target Interaction)
Predict binding affinity between drugs and protein targets.
from tdc.multi_pred import DTI
data = DTI(name='BindingDB_Kd')
split = data.get_split()
Available datasets:
- BindingDB_Kd - Dissociation constant (52,284 pairs)
- BindingDB_IC50 - Half-maximal inhibitory concentration (991,486 pairs)
- BindingDB_Ki - Inhibition constant (375,032 pairs)
- DAVIS, KIBA - Kinase binding datasets
Data format: Drug_ID, Target_ID, Drug (SMILES), Target (sequence), Y (binding affinity)
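Binding affinities such as Kd span several orders of magnitude, so models are often trained on a log-transformed label. The sketch below shows the usual pKd conversion under the assumption that Y is reported in nanomolar. The source does not state the unit, so check the dataset before applying it. The function name is hypothetical.

```python
import math


def kd_nm_to_pkd(kd_nm: float) -> float:
    """Convert Kd in nanomolar to pKd = -log10(Kd in molar) = 9 - log10(Kd in nM)."""
    if kd_nm <= 0:
        raise ValueError("Kd must be positive")
    return 9.0 - math.log10(kd_nm)


# Assumed unit: nanomolar. A tighter binder (smaller Kd) gets a larger pKd.
for kd in (1.0, 1000.0):
    print(kd, kd_nm_to_pkd(kd))
```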
2. DDI (Drug-Drug Interaction)
Predict interactions between drug pairs.
from tdc.multi_pred import DDI
data = DDI(name='DrugBank')
split = data.get_split()
Multi-class classification task predicting interaction types. Dataset contains 191,808 DDI pairs with 1,706 drugs.
3. PPI (Protein-Protein Interaction)
Predict protein-protein interactions.
from tdc.multi_pred import PPI
data = PPI(name='HuRI')
4. Other Multi-Prediction Tasks
- GDA: Gene-disease associations
- DrugRes: Drug resistance prediction
- DrugSyn: Drug synergy prediction
- PeptideMHC: Peptide-MHC binding
- AntibodyAff: Antibody affinity prediction
- MTI: miRNA-target interactions
- Catalyst: Catalyst prediction
- TrialOutcome: Clinical trial outcome prediction
Generation Tasks
Generation tasks involve creating novel biomedical entities with desired properties.
1. Molecular Generation (MolGen)
Generate diverse, novel molecules with desirable chemical properties.
from tdc.generation import MolGen
data = MolGen(name='ChEMBL_V29')
split = data.get_split()
Use with oracles to optimize for specific properties:
from tdc import Oracle
oracle = Oracle(name='GSK3B')
score = oracle('CC(C)Cc1ccc(cc1)C(C)C(O)=O') # Evaluate SMILES
See references/oracles.md for all available oracle functions.
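The oracle call above maps a SMILES string to a score, so the simplest optimization step is to score a pool of candidates and keep the best ones. The sketch below illustrates that selection step with a toy scoring function in place of a real oracle; `top_k` and `toy_oracle` are hypothetical and have no chemical meaning.

```python
from typing import Callable


def top_k(candidates: list[str], oracle: Callable[[str], float], k: int) -> list[tuple[str, float]]:
    """Score each SMILES string with `oracle` and return the k best, highest first."""
    scored = [(smiles, oracle(smiles)) for smiles in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


def toy_oracle(smiles: str) -> float:
    """Hypothetical stand-in score: fraction of characters that are aromatic carbons ('c')."""
    return smiles.count("c") / len(smiles)


pool = ["CCO", "c1ccccc1", "CC(C)Cc1ccc(cc1)C(C)C(O)=O"]
best = top_k(pool, toy_oracle, k=2)
print(best)
```

Any callable with the same shape, including an oracle object like the one created above, could be passed in place of `toy_oracle`.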
2. Retrosynthesis (RetroSyn)
Predict reactants needed to synthesize a target molecule.
from tdc.generation import RetroSyn
data = RetroSyn(name='USPTO')
split = data.get_split()
Dataset contains 1,939,253 reactions from USPTO database.
3. Paired Molecule Generation
Generate molecule pairs (e.g., prodrug-drug pairs).
from tdc.generation import PairMolGen
data = PairMolGen(name='Prodrug')
For detailed oracle documentation and molecular generation workflows, refer to references/oracles.md and scripts/molecular_generation.py.
Benchmark Groups
Benchmark groups provide curated collections of related datasets for systematic model evaluation.
ADMET Benchmark Group
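from tdc.benchmark_group import admet_group
group = admet_group(path='data/')
predictions_list = []
for seed in [1, 2, 3, 4, 5]:
    # Get the benchmark dataset: a dict with 'name', 'train_val', and 'test'
    benchmark = group.get('Caco2_Wang')
    name = benchmark['name']
    train, valid = group.get_train_valid_split(benchmark=name, split_type='default', seed=seed)
    # Train model here on train/valid (`model` is your own estimator)
    predictions_list.append({name: model.predict(benchmark['test'])})
# Evaluate across the required 5 seeds
results = group.evaluate_many(predictions_list)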
ADMET Group includes 22 datasets covering absorption, distribution, metabolism, excretion, and toxicity.
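The five-seed protocol above exists so that results can be reported as a mean with a spread instead of a single number. The sketch below shows that aggregation on made-up scores. The helper name is hypothetical, and the use of the population standard deviation and three-decimal rounding are assumptions, not a statement of how the benchmark group computes its summary.

```python
from statistics import fmean, pstdev


def summarize_runs(per_seed_scores: dict[int, float]) -> tuple[float, float]:
    """Return (mean, population standard deviation) of one metric across seeds."""
    scores = list(per_seed_scores.values())
    return round(fmean(scores), 3), round(pstdev(scores), 3)


# Made-up MAE values for five seeds; these are not real benchmark results.
mae_by_seed = {1: 0.40, 2: 0.42, 3: 0.38, 4: 0.41, 5: 0.39}
mean_mae, std_mae = summarize_runs(mae_by_seed)
print(f"MAE: {mean_mae} +/- {std_mae}")
```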
Other Benchmark Groups
Available benchmark groups include collections for:
- ADMET properties
- Drug-target interactions
- Drug combination prediction
- And more specialized therapeutic tasks
For benchmark evaluation workflows, see scripts/benchmark_evaluation.py.
Data Functions
TDC provides comprehensive data processing utilities organized into four categories.
1. Dataset Splits
Retrieve train/validation/test partitions with various strategies:
# Scaffold split (keeps molecules that share a scaffold in the same partition)
split = data.get_split(method='scaffold', seed=1, frac=[0.7, 0.1, 0.2])
# Random split
split = data.get_split(method='random', seed=42, frac=[0.8, 0.1, 0.1])
# Cold split (for DTI/DDI tasks)
split = data.get_split(method='cold_drug', seed=1) # Unseen drugs in test
split = data.get_split(method='cold_target', seed=1) # Unseen targets in test
Available split strategies:
- random: Random shuffling
- scaffold: Scaffold-based (for chemical diversity)
- cold_drug, cold_target, cold_drug_target: For DTI tasks
- temporal: Time-based splits for temporal datasets
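Scaffold and cold splits share one idea: items are grouped by a key (a scaffold, a drug, or a target), and each group is assigned to a single partition so the test set contains only unseen groups. The sketch below illustrates that idea in plain Python, assuming the group key for each item is already known. `group_split` and the toy data are hypothetical, and the assignment order is not necessarily the one PyTDC uses.

```python
import random
from collections import defaultdict


def group_split(groups: dict[str, str], frac=(0.7, 0.1, 0.2), seed=1) -> dict[str, list[str]]:
    """Split item IDs so that every group lands in exactly one partition.

    `groups` maps item ID -> group key (a scaffold for a scaffold split,
    a drug or target ID for a cold split).
    """
    members = defaultdict(list)
    for item_id, key in groups.items():
        members[key].append(item_id)

    keys = sorted(members)  # fixed order first, so the seed fully determines the shuffle
    random.Random(seed).shuffle(keys)

    train_cap = frac[0] * len(groups)
    valid_cap = (frac[0] + frac[1]) * len(groups)
    split = {"train": [], "valid": [], "test": []}
    assigned = 0
    for key in keys:
        # Fill train first, then valid, then test; a group is never divided.
        if assigned < train_cap:
            part = "train"
        elif assigned < valid_cap:
            part = "valid"
        else:
            part = "test"
        split[part].extend(members[key])
        assigned += len(members[key])
    return split


# Toy mapping from molecule ID to a scaffold label; the labels are made up.
toy_groups = {
    "mol0": "benzene", "mol1": "benzene", "mol2": "benzene",
    "mol3": "pyridine", "mol4": "pyridine",
    "mol5": "indole", "mol6": "indole",
    "mol7": "furan", "mol8": "thiophene", "mol9": "imidazole",
}
split = group_split(toy_groups)
print({name: len(ids) for name, ids in split.items()})
```

Because whole groups are moved at once, the resulting partition sizes only approximate the requested fractions, and a partition can end up empty when there are very few groups.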
2. Model Evaluation
Use standardized metrics for evaluation:
from tdc import Evaluator
# For binary classification
evaluator = Evaluator(name='ROC-AUC')
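For intuition about the ROC-AUC metric named in the evaluator call, the sketch below computes it directly from its definition: the probability that a randomly chosen positive example is scored above a randomly chosen negative one. This is a plain-Python illustration, not PyTDC's implementation, and `roc_auc` is a hypothetical name.

```python
def roc_auc(y_true: list[int], y_score: list[float]) -> float:
    """ROC-AUC as the fraction of (positive, negative) pairs ranked correctly.

    A tie between a positive and a negative counts as half a correct pair.
    """
    positives = [s for t, s in zip(y_true, y_score) if t == 1]
    negatives = [s for t, s in zip(y_true, y_score) if t == 0]
    if not positives or not negatives:
        raise ValueError("ROC-AUC needs at least one positive and one negative label")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
auc = roc_auc(labels, scores)
print(auc)  # 0.75
```

The pairwise form is quadratic in the number of examples, so it is suited to checking small cases by hand, not to scoring full datasets.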