scikit-bio

davila7/claude-code-templates · updated Apr 8, 2026


$ npx skills add https://github.com/davila7/claude-code-templates --skill scikit-bio

Overview

scikit-bio is a comprehensive Python library for working with biological data. Apply this skill for bioinformatics analyses spanning sequence manipulation, alignment, phylogenetics, microbial ecology, and multivariate statistics.

When to Use This Skill

This skill should be used when the user:

  • Works with biological sequences (DNA, RNA, protein)
  • Needs to read/write biological file formats (FASTA, FASTQ, GenBank, Newick, BIOM, etc.)
  • Performs sequence alignments or searches for motifs
  • Constructs or analyzes phylogenetic trees
  • Calculates diversity metrics (alpha/beta diversity, UniFrac distances)
  • Performs ordination analysis (PCoA, CCA, RDA)
  • Runs statistical tests on biological/ecological data (PERMANOVA, ANOSIM, Mantel)
  • Analyzes microbiome or community ecology data
  • Works with protein embeddings from language models
  • Needs to manipulate biological data tables

Core Capabilities

1. Sequence Manipulation

Work with biological sequences using specialized classes for DNA, RNA, and protein data.

Key operations:

  • Read/write sequences from FASTA, FASTQ, GenBank, EMBL formats
  • Sequence slicing, concatenation, and searching
  • Reverse complement, transcription (DNA→RNA), and translation (RNA→protein)
  • Find motifs and patterns using regex
  • Calculate distances (Hamming, k-mer based)
  • Handle sequence quality scores and metadata

Common patterns:

import skbio

# Read sequences from file
seq = skbio.DNA.read('input.fasta')

# Sequence operations
rc = seq.reverse_complement()
rna = seq.transcribe()
protein = rna.translate()

# Find motifs (find_with_regex yields slice objects; wrap it in list() to keep them)
motif_positions = list(seq.find_with_regex('ATG[ACGT]{3}'))

# Check for properties
has_degens = seq.has_degenerates()
seq_no_gaps = seq.degap()

Important notes:

  • Use the DNA, RNA, and Protein classes for grammared sequences, whose characters are validated against an alphabet
  • Use the Sequence class for generic sequences without alphabet restrictions
  • Quality scores from FASTQ files are loaded automatically into positional metadata
  • Metadata types: sequence-level (ID, description), positional (per-base), interval (regions/features)

2. Sequence Alignment

Perform pairwise and multiple sequence alignments using dynamic programming algorithms.

Key capabilities:

  • Global alignment (Needleman-Wunsch with semi-global variant)
  • Local alignment (Smith-Waterman)
  • Configurable scoring schemes (match/mismatch, gap penalties, substitution matrices)
  • CIGAR string conversion
  • Multiple sequence alignment storage and manipulation with TabularMSA
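To make the CIGAR idea concrete, here is a small pure-Python sketch; it is not scikit-bio's own converter, and the function name `gapped_to_cigar` is hypothetical. It follows the SAM convention: M for an aligned column (match or mismatch), I for a base present only in the query, and D for a base present only in the reference.

```python
import itertools


def gapped_to_cigar(ref: str, query: str, gap: str = '-') -> str:
    """Convert a gapped reference/query pair into a CIGAR string (hypothetical helper)."""
    if len(ref) != len(query):
        raise ValueError('aligned sequences must have equal length')
    ops = []
    for r, q in zip(ref, query):
        if r == gap and q == gap:
            continue  # a column gapped in both rows carries no CIGAR operation
        if r == gap:
            ops.append('I')  # base present only in the query
        elif q == gap:
            ops.append('D')  # base present only in the reference
        else:
            ops.append('M')  # aligned column (match or mismatch)
    # Run-length encode consecutive identical operations
    return ''.join(f'{sum(1 for _ in run)}{op}' for op, run in itertools.groupby(ops))


print(gapped_to_cigar('ACGT-A', 'AC-TTA'))
```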

Common patterns:

import skbio
from skbio.alignment import local_pairwise_align_ssw, TabularMSA

# Pairwise local alignment of two DNA sequences (seq1, seq2);
# returns a (TabularMSA, score, start/end positions) tuple
aligned_msa, score, start_end_positions = local_pairwise_align_ssw(seq1, seq2)

# Read multiple alignment from file
msa = TabularMSA.read('alignment.fasta', constructor=skbio.DNA)

# Calculate consensus
consensus = msa.consensus()

Important notes:

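  • Use local_pairwise_align_ssw for fast local alignments (Striped Smith-Waterman); it returns a (TabularMSA, score, start/end positions) tuple
  • Use the StripedSmithWaterman class directly to align one query against many targets; pass protein=True for protein sequences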
  • Affine gap penalties recommended for biological sequences
  • Can convert between scikit-bio, BioPython, and Biotite alignment formats
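For intuition about what a global aligner computes, the following is a plain-Python Needleman-Wunsch sketch with a linear gap penalty. It illustrates the dynamic-programming recurrence only and is not scikit-bio's implementation; the scoring values are arbitrary assumptions, and affine gaps would need additional DP matrices (Gotoh's algorithm).

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2):
    """Global alignment with a linear gap penalty; returns (aligned_a, aligned_b, score)."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning the prefixes a[:i] and b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def sub(x: str, y: str) -> int:
        return match if x == y else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,                           # gap in b
                score[i][j - 1] + gap,                           # gap in a
            )

    # Traceback from the bottom-right cell, preferring diagonal, then up, then left
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append('-'); i -= 1
        else:
            out_a.append('-'); out_b.append(b[j - 1]); j -= 1
    return ''.join(reversed(out_a)), ''.join(reversed(out_b)), score[n][m]


print(needleman_wunsch('ACGT', 'AGT'))
```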

3. Phylogenetic Trees

Construct, manipulate, and analyze phylogenetic trees representing evolutionary relationships.

Key capabilities:

  • Tree construction from distance matrices (UPGMA, WPGMA, Neighbor Joining, GME, BME)
  • Tree manipulation (pruning, rerooting, traversal)
  • Distance calculations (patristic, cophenetic, Robinson-Foulds)
  • ASCII visualization
  • Newick format I/O

Common patterns:

from skbio import TreeNode
from skbio.tree import nj

# Read tree from file
tree = TreeNode.read('tree.nwk')

# Construct tree from distance matrix
tree = nj(distance_matrix)

# Tree operations
subtree = tree.shear(['taxon1', 'taxon2', 'taxon3'])
tips = [node for node in tree.tips()]
lca = tree.lowest_common_ancestor(['taxon1', 'taxon2'])

# Calculate distances
patristic_dist = tree.find('taxon1').distance(tree.find('taxon2'))
tip_dm = tree.tip_tip_distances()  # DistanceMatrix of tip-to-tip (patristic) distances

# Compare trees (Robinson-Foulds distance)
rf_distance = tree.compare_rfd(other_tree)

Important notes:

  • Use nj() for neighbor joining (classic phylogenetic method)
  • Use upgma() for UPGMA (assumes molecular clock)
  • GME and BME are highly scalable for large trees
  • Trees can be rooted or unrooted; some metrics require specific rooting

4. Diversity Analysis

Calculate alpha and beta diversity metrics for microbial ecology and community analysis.

Key capabilities:

  • Alpha diversity: richness, Shannon entropy, Simpson index, Faith's PD, Pielou's evenness
  • Beta diversity: Bray-Curtis, Jaccard, weighted/unweighted UniFrac, Euclidean distances
  • Phylogenetic diversity metrics (require tree input)
  • Rarefaction and subsampling
  • Integration with ordination and statistical tests

Common patterns:

from skbio.diversity import alpha_diversity, beta_diversity
import skbio

# Alpha diversity
alpha = alpha_diversity('shannon', counts_matrix, ids=sample_ids)
faith_pd = alpha_diversity('faith_pd', counts_matrix, ids=sample_ids,
                          tree=tree, otu_ids=feature_ids)

# Beta diversity
bc_dm = beta_diversity('braycurtis', counts_matrix, ids=sample_ids)
unifrac_dm = beta_diversity('unweighted_unifrac', counts_matrix,
                           ids=sample_ids, tree=tree, otu_ids=feature_ids)

# Get available metrics
from skbio.diversity import get_alpha_diversity_metrics
print(get_alpha_diversity_metrics())

Important notes:

  • Counts must be integers representing abundances, not relative frequencies
  • Phylogenetic metrics (Faith's PD, UniFrac) require tree and OTU ID mapping
  • Use partial_beta_diversity() for computing specific sample pairs only
  • Alpha diversity returns Series, beta diversity returns DistanceMatrix

5. Ordination Methods

Reduce high-dimensional biological data to visualizable lower-dimensional spaces.

Key capabilities:

  • PCoA (Principal Coordinate Analysis) from distance matrices
  • CA (Correspondence Analysis) for contingency tables
  • CCA (Canonical Correspondence Analysis) with environmental constraints
  • RDA (Redundancy Analysis) for linear relationships
  • Biplot projection for feature interpretation

Common patterns:

import skbio
from skbio.stats.ordination import pcoa, cca

# PCoA from distance matrix
pcoa_results = pcoa(distance_matrix)
pc1 = pcoa_results.samples['PC1']
pc2 = pcoa_results.samples['PC2']

# CCA with environmental variables
cca_results = cca(species_matrix, environmental_matrix)

# Save/load ordination results
pcoa_results.write('ordination.txt')
results = skbio.OrdinationResults.read('ordination.txt')

Important notes:

  • PCoA works with any distance/dissimilarity matrix
  • CCA reveals environmental drivers of community composition
  • Ordination results include eigenvalues, proportion explained, and sample/feature coordinates
  • Results integrate with plotting libraries (matplotlib, seaborn, plotly)

6. Statistical Testing

Perform hypothesis tests specific to ecological and biological data.

Key capabilities:

  • PERMANOVA: test group differences using distance matrices
  • ANOSIM: alternative test for group differences
  • PERMDISP: test homogeneity of group dispersions
  • Mantel test: correlation between distance matrices
  • Bioenv: find environmental variables correlated with distances

Common patterns:

from skbio.stats.distance import permanova, anosim, mantel

# Test if groups differ significantly
permanova_results = permanova(distance_matrix, grouping, permutations=999)
print(f"p-value: {permanova_results['p-value']}")

# ANOSIM test
anosim_results = anosim(distance_matrix, grouping, permutations=999)

# Mantel test between two distance matrices; returns (statistic, p-value, number of objects)
r, p_value, n = mantel(dm1, dm2, method='pearson', permutations=999)
print(f"Correlation: {r}, p-value: {p_value}")

Important notes:

  • Permutation tests provide non-parametric significance testing
  • Use 999+ permutations for robust p-values
  • PERMANOVA sensitive to dispersion differences; pair with PERMDISP
  • Mantel tests assess matrix correlation (e.g., geographic vs genetic distance)

7. File I/O and Format Conversion

Read and write 19+ biological file formats with automatic format detection.

Supported formats:

  • Sequences: FASTA, FASTQ, GenBank, EMBL, QSeq
  • Alignments: Clustal, PHYLIP, Stockholm
  • Trees: Newick
  • Tables: BIOM (HDF5 and JSON)
  • Distances: delimited square matrices
  • Analysis: BLAST+6/7, GFF3, Ordination results
  • Metadata: TSV/CSV with validation

Common patterns:

import skbio

# Read with automatic format detection
seq = skbio

How to use scikit-bio on Cursor


1. Prerequisites

Before installing skills in Cursor, ensure your development environment meets these requirements:

  • Cursor installed and configured on your development machine
  • Node.js version 16.0+ with npm package manager (verify with node --version)
  • Active project directory or workspace where you want to add scikit-bio

2. Execute installation command

Execute the skills CLI command in your project's root directory to begin installation:

$ npx skills add https://github.com/davila7/claude-code-templates --skill scikit-bio

The skills CLI fetches scikit-bio from the GitHub repository davila7/claude-code-templates and configures it for Cursor.

3. Select Cursor when prompted

The CLI will show a list of available agents. Use arrow keys to navigate and space to select Cursor:

◆ Which agents do you want to install to?
│ ── Universal (.agents/skills) ── always included ────
│ • Amp
│ • Antigravity
│ • Cline
│ • Codex
│ ● Cursor (selected)
│ • Windsurf

4. Verify installation

Confirm successful installation by checking the skill directory location:

.cursor/skills/scikit-bio

Reload or restart Cursor to activate scikit-bio. Access the skill through slash commands (e.g., /scikit-bio) or your agent's skill management interface.

Security & Verification Notice

We perform automated surface-level scans (Gen AI Scanner, Socket, Snyk) during installation. These checks detect common vulnerabilities but do not guarantee complete security. Always review skill source code and verify the publisher's reputation before production use.

Because skills execute code in your development environment, also review recent commits and test in an isolated environment before production deployment.


