pysam
K-Dense-AI/scientific-agent-skills · updated Jun 4, 2026
### Pysam
| Field | Value |
| --- | --- |
| name | pysam |
| description | Genomic file toolkit. Read/write SAM/BAM/CRAM alignments, VCF/BCF variants, FASTA/FASTQ sequences, extract regions, calculate coverage, for NGS data processing pipelines. |
| license | MIT license |
| metadata | version: "1.0" skill-author: K-Dense Inc. |
Pysam
Overview
Pysam is a Python module for reading, manipulating, and writing genomic datasets. Read/write SAM/BAM/CRAM alignment files, VCF/BCF variant files, and FASTA/FASTQ sequences with a Pythonic interface to htslib. Query tabix-indexed files, perform pileup analysis for coverage, and execute samtools/bcftools commands.
When to Use This Skill
This skill should be used when:
- Working with sequencing alignment files (BAM/CRAM)
- Analyzing genetic variants (VCF/BCF)
- Extracting reference sequences or gene regions
- Processing raw sequencing data (FASTQ)
- Calculating coverage or read depth
- Implementing bioinformatics analysis pipelines
- Quality control of sequencing data
- Variant calling and annotation workflows
Quick Start
Installation
uv pip install pysam
Basic Examples
Read alignment file:
import pysam
# Open BAM file and fetch reads in region
samfile = pysam.AlignmentFile("example.bam", "rb")
for read in samfile.fetch("chr1", 1000, 2000):
    print(f"{read.query_name}: {read.reference_start}")
samfile.close()
Read variant file:
# Open VCF file and iterate variants
vcf = pysam.VariantFile("variants.vcf")
for variant in vcf:
    print(f"{variant.chrom}:{variant.pos} {variant.ref}>{variant.alts}")
vcf.close()
Query reference sequence:
# Open FASTA and extract sequence
fasta = pysam.FastaFile("reference.fasta")
sequence = fasta.fetch("chr1", 1000, 2000)
print(sequence)
fasta.close()
Core Capabilities
1. Alignment File Operations (SAM/BAM/CRAM)
Use the AlignmentFile class to work with aligned sequencing reads. This is appropriate for analyzing mapping results, calculating coverage, extracting reads, or quality control.
Common operations:
- Open and read BAM/SAM/CRAM files
- Fetch reads from specific genomic regions
- Filter reads by mapping quality, flags, or other criteria
- Write filtered or modified alignments
- Calculate coverage statistics
- Perform pileup analysis (base-by-base coverage)
- Access read sequences, quality scores, and alignment information
Reference: See references/alignment_files.md for detailed documentation on:
- Opening and reading alignment files
- AlignedSegment attributes and methods
- Region-based fetching with fetch()
- Pileup analysis for coverage
- Writing and creating BAM files
- Coordinate systems and indexing
- Performance optimization tips
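As a minimal sketch of the "filter reads, then write them out" operations listed above: the flag bits come from the SAM specification, while the function names, the default MAPQ cutoff of 20, and the choice of which flags to exclude are illustrative assumptions rather than anything this skill prescribes.

```python
# SAM flag bits, as defined in the SAM specification
UNMAPPED = 0x4
SECONDARY = 0x100
DUPLICATE = 0x400
SUPPLEMENTARY = 0x800
EXCLUDE_MASK = UNMAPPED | SECONDARY | DUPLICATE | SUPPLEMENTARY


def keep_read(flag, mapq, min_mapq=20):
    """Return True for mapped, primary, non-duplicate reads with MAPQ >= min_mapq."""
    return (flag & EXCLUDE_MASK) == 0 and mapq >= min_mapq


def write_filtered(in_bam, out_bam, min_mapq=20):
    """Copy reads that pass keep_read() into out_bam and return how many were kept."""
    import pysam  # imported here so keep_read() can be reused without pysam installed

    kept = 0
    with pysam.AlignmentFile(in_bam, "rb") as src:
        # template=src reuses the input header for the output file
        with pysam.AlignmentFile(out_bam, "wb", template=src) as dst:
            # until_eof=True reads sequentially, so no index is needed
            for read in src.fetch(until_eof=True):
                if keep_read(read.flag, read.mapping_quality, min_mapq):
                    dst.write(read)
                    kept += 1
    return kept
```

Keeping the predicate separate from the I/O loop makes the filter easy to unit-test and to swap for a stricter or looser rule.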
2. Variant File Operations (VCF/BCF)
Use the VariantFile class to work with genetic variants from variant calling pipelines. This is appropriate for variant analysis, filtering, annotation, or population genetics.
Common operations:
- Read and write VCF/BCF files
- Query variants in specific regions
- Access variant information (position, alleles, quality)
- Extract genotype data for samples
- Filter variants by quality, allele frequency, or other criteria
- Annotate variants with additional information
- Subset samples or regions
Reference: See references/variant_files.md for detailed documentation on:
- Opening and reading variant files
- VariantRecord attributes and methods
- Accessing INFO and FORMAT fields
- Working with genotypes and samples
- Creating and writing VCF files
- Filtering and subsetting variants
- Multi-sample VCF operations
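The quality-based filtering mentioned above could look roughly like the sketch below. The helper names and the QUAL cutoff of 30 are assumptions chosen for illustration; records with a missing QUAL are dropped here, which may not suit every pipeline.

```python
def passes(qual, filters, min_qual=30.0):
    """True if QUAL is present and >= min_qual, and FILTER is empty or PASS only."""
    if qual is None or qual < min_qual:
        return False
    return all(name == "PASS" for name in filters)


def filter_vcf(in_vcf, out_vcf, min_qual=30.0):
    """Write records that satisfy passes() to out_vcf and return how many were kept."""
    import pysam  # imported here so passes() can be reused without pysam installed

    kept = 0
    with pysam.VariantFile(in_vcf) as src:
        # Reuse the input header so INFO/FORMAT definitions stay valid
        with pysam.VariantFile(out_vcf, "w", header=src.header) as dst:
            for rec in src:
                # rec.filter is mapping-like; its keys are the FILTER names
                if passes(rec.qual, list(rec.filter.keys()), min_qual):
                    dst.write(rec)
                    kept += 1
    return kept
```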
3. Sequence File Operations (FASTA/FASTQ)
Use FastaFile for random access to reference sequences and FastxFile for reading raw sequencing data. This is appropriate for extracting gene sequences, validating variants against reference, or processing raw reads.
Common operations:
- Query reference sequences by genomic coordinates
- Extract sequences for genes or regions of interest
- Read FASTQ files with quality scores
- Validate variant reference alleles
- Calculate sequence statistics
- Filter reads by quality or length
- Convert between FASTA and FASTQ formats
Reference: See references/sequence_files.md for detailed documentation on:
- FASTA file access and indexing
- Extracting sequences by region
- Handling reverse complement for genes
- Reading FASTQ files sequentially
- Quality score conversion and filtering
- Working with tabix-indexed files (BED, GTF, GFF)
- Common sequence processing patterns
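Two of the operations above, reverse-complementing a sequence and filtering FASTQ reads by quality, can be sketched as follows. The helper names, the Phred+33 offset, and the mean-quality cutoff of 20 are assumptions for illustration; the complement table covers only A, C, G, T and N.

```python
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A/C/G/T/N only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def mean_phred(quality, offset=33):
    """Mean Phred score of an ASCII-encoded quality string (Phred+33 by default)."""
    if not quality:
        return 0.0
    return sum(ord(ch) - offset for ch in quality) / len(quality)


def high_quality_reads(fastq_path, min_mean_q=20.0):
    """Yield (name, sequence) for FASTQ entries whose mean quality meets the cutoff."""
    import pysam  # imported here so the pure helpers work without pysam installed

    with pysam.FastxFile(fastq_path) as fh:
        for entry in fh:
            # entry.quality is None for FASTA input, which carries no qualities
            if entry.quality is not None and mean_phred(entry.quality) >= min_mean_q:
                yield entry.name, entry.sequence
```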
4. Integrated Bioinformatics Workflows
Pysam excels at integrating multiple file types for comprehensive genomic analyses. Common workflows combine alignment files, variant files, and reference sequences.
Common workflows:
- Calculate coverage statistics for specific regions
- Validate variants against aligned reads
- Annotate variants with coverage information
- Extract sequences around variant positions
- Filter alignments or variants based on multiple criteria
- Generate coverage tracks for visualization
- Quality control across multiple data types
Reference: See references/common_workflows.md for detailed examples of:
- Quality control workflows (BAM statistics, reference consistency)
- Coverage analysis (per-base coverage, low coverage detection)
- Variant analysis (annotation, filtering by read support)
- Sequence extraction (variant contexts, gene sequences)
- Read filtering and subsetting
- Integration patterns (BAM+VCF, VCF+BED, etc.)
- Performance optimization for complex workflows
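As one possible sketch of low-coverage detection: per-base depth is taken from AlignmentFile.count_coverage(), then runs below a threshold are collapsed into intervals. The function names and the threshold are hypothetical. Note that count_coverage() applies a base-quality threshold (15 by default) and counts only A/C/G/T, so the depth here can be lower than the raw read depth.

```python
def low_coverage_intervals(depths, start, min_depth):
    """Return 0-based, half-open (start, stop) runs where depth < min_depth."""
    intervals = []
    run_start = None
    for offset, depth in enumerate(depths):
        if depth < min_depth:
            if run_start is None:
                run_start = start + offset  # a new low-coverage run begins
        elif run_start is not None:
            intervals.append((run_start, start + offset))
            run_start = None
    if run_start is not None:
        # The region ended while still inside a low-coverage run
        intervals.append((run_start, start + len(depths)))
    return intervals


def region_depths(bam_path, contig, start, stop):
    """Per-base depth over [start, stop), summed across the four nucleotide counts."""
    import pysam  # imported here so low_coverage_intervals() works without pysam

    with pysam.AlignmentFile(bam_path, "rb") as bam:
        counts_a, counts_c, counts_g, counts_t = bam.count_coverage(contig, start, stop)
    return [sum(column) for column in zip(counts_a, counts_c, counts_g, counts_t)]
```

A call such as low_coverage_intervals(region_depths("sample.bam", "chr1", 1000, 2000), 1000, 10) would then report the stretches of that window covered by fewer than 10 bases; the file name is a placeholder.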
Key Concepts
Coordinate Systems
Critical: Pysam uses 0-based, half-open coordinates (Python convention):
- Start positions are 0-based (first base is position 0)
- End positions are exclusive (not included in the range)
- Region 1000-2000 includes bases 1000-1999 (1000 bases total)
Exception: Region strings in fetch() follow samtools convention (1-based, inclusive). Both calls below select the same bases:
samfile.fetch("chr1", 999, 2000) # 0-based, half-open: positions 999-1999
samfile.fetch(region="chr1:1000-2000") # 1-based, inclusive string: positions 1000-2000
VCF files: Use 1-based coordinates in the file format, but VariantRecord.start is 0-based.
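The conventions above reduce to a small amount of arithmetic. The helpers below are a hypothetical sketch (none of these names are part of pysam) for moving between samtools-style region strings, VCF positions, and 0-based half-open coordinates.

```python
from typing import Tuple


def region_to_zero_based(region: str) -> Tuple[str, int, int]:
    """Convert a 1-based inclusive 'contig:start-end' string to 0-based half-open."""
    # rpartition keeps contig names that themselves contain ':' intact
    contig, _, span = region.rpartition(":")
    start_text, _, end_text = span.partition("-")
    start = int(start_text.replace(",", ""))
    end = int(end_text.replace(",", ""))
    # Only the start shifts: the inclusive 1-based end equals the exclusive 0-based stop
    return contig, start - 1, end


def zero_based_to_region(contig: str, start: int, stop: int) -> str:
    """Convert 0-based half-open coordinates to a 1-based inclusive region string."""
    return f"{contig}:{start + 1}-{stop}"


def vcf_pos_to_start(pos: int) -> int:
    """Convert a 1-based VCF POS to the 0-based value reported as VariantRecord.start."""
    return pos - 1
```

For example, region_to_zero_based("chr1:1000-2000") gives ("chr1", 999, 2000), which matches the pair of fetch() calls shown earlier in this section.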
Indexing Requirements
Random access to specific genomic regions requires index files:
- BAM files: Require .bai index (create with pysam.index())
- CRAM files: Require .crai index
- FASTA files: Require .fai index (create with pysam.faidx())
- VCF.gz files: Require .tbi tabix index (create with pysam.tabix_index())
- BCF files: Require .csi index
Without an index, use fetch(until_eof=True) for sequential reading.
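A small sketch of the "create the index first" step for BAM files. It assumes the index sits next to the BAM as <file>.bai, which is where pysam.index() writes it by default, and that the BAM is already coordinate-sorted. The helper names are hypothetical.

```python
import os


def bam_index_path(bam_path):
    """Default location of the BAI index for a BAM file: <file>.bai."""
    return bam_path + ".bai"


def ensure_bam_index(bam_path):
    """Create the .bai index if it is missing and return its path."""
    import pysam  # imported here so bam_index_path() works without pysam installed

    index_path = bam_index_path(bam_path)
    if not os.path.exists(index_path):
        # Indexing fails on unsorted input; sort by coordinate first if needed
        pysam.index(bam_path)
    return index_path
```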
File Modes
Specify format when opening files:
- "rb" - Read BAM (binary)
- "r" - Read SAM (text)
- "rc" - Read CRAM
- "wb" - Write BAM
- "w" - Write SAM
- "wc" - Write CRAM
Performance Considerations
- Always use indexed files for random access operations
- Use pileup() for column-wise analysis instead of repeated fetch operations
- Use count() for counting instead of iterating and counting manually
- Process regions in parallel when analyzing independent genomic regions
- Close files explicitly to free resources
- Use until_eof=True for sequential processing without index
- Avoid multiple iterators unless necessary (use multiple_iterators=True if needed)
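One way to combine the count() and parallel-region tips above is sketched below. The chunk size, worker count, and function names are assumptions. Each worker opens its own file handle rather than sharing one across processes, and a read that spans a chunk boundary is counted in both chunks, so the per-chunk counts are not a count of unique reads.

```python
from concurrent.futures import ProcessPoolExecutor


def split_region(start, stop, size):
    """Split [start, stop) into consecutive 0-based half-open chunks of at most size."""
    return [(s, min(s + size, stop)) for s in range(start, stop, size)]


def count_chunk(task):
    """Count reads overlapping one chunk, using a file handle private to this worker."""
    import pysam  # imported here so split_region() works without pysam installed

    bam_path, contig, start, stop = task
    with pysam.AlignmentFile(bam_path, "rb") as bam:
        return bam.count(contig, start, stop)


def parallel_counts(bam_path, contig, start, stop, size=1_000_000, workers=4):
    """Return one overlapping-read count per chunk, computed in parallel."""
    tasks = [(bam_path, contig, s, e) for s, e in split_region(start, stop, size)]
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(count_chunk, tasks))
```

On platforms that spawn worker processes, call parallel_counts() from inside an `if __name__ == "__main__":` guard.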
Common Pitfalls
- Coordinate confusion: Remember 0-based vs 1-based systems in different contexts
- Missing indices: Many operations require index files—create them first
- Partial overlaps: fetch() returns reads overlapping region boundaries, not just those fully contained
- Iterator scope: Keep pileup iterator references alive to avoid "PileupProxy accessed after iterator finished" errors
- Quality score editing: Assigning query_sequence invalidates query_qualities, so copy the qualities first and reassign them afterward
- Stream limitations: Only stdin/stdout are supported for streaming, not arbitrary Python file objects
- Thread safety: While the GIL is released during I/O, thread safety has not been fully validated
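The quality score pitfall above is easiest to see in code. The sketch below trims a read to its first few bases; trim_read is a hypothetical helper, and it deliberately leaves the CIGAR string untouched, so an aligned read would also need its CIGAR updated to match the new length.

```python
def trim_read(read, length):
    """Keep only the first `length` bases of a read, preserving their qualities."""
    # Grab the qualities BEFORE touching the sequence: in pysam, assigning
    # query_sequence discards the quality scores stored on the read.
    qualities = read.query_qualities
    sequence = read.query_sequence
    read.query_sequence = sequence[:length]
    # Reassign the matching slice of the saved qualities
    read.query_qualities = qualities[:length]
    return read
```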
Command-Line Tools
Pysam provides access to samtools and bcftools commands:
# Sort BAM file
pysam.samtools.sort("-o", "sorted.bam", "input.bam")
# Index BAM
pysam.samtools.index("sorted.bam")
# View specific region
pysam.samtools.view("-b", "-o", "region.bam", "input.bam", "chr1:1000-2000")
# BCF tools
pysam.bcftools.view("-O", "z", "-o", "output.vcf.gz", "input.vcf")
Error handling:
try:
    pysam.samtools.sort("-o", "output.bam", "input.bam")
except pysam.SamtoolsError as e:
    print(f"Error: {e}")
Resources
references/
Detailed documentation for each major capability:
- alignment_files.md - Complete guide to SAM/BAM/CRAM operations, including AlignmentFile class, AlignedSegment attributes, fetch operations, pileup analysis, and writing alignments
- variant_files.md - Complete guide to VCF/BCF operations, including VariantFile class, VariantRecord attributes, genotype handling, INFO/FORMAT fields, and multi-sample operations
- sequence_files.md - Complete guide to FASTA/FASTQ operations, including FastaFile and FastxFile classes, sequence extraction, quality score handling, and tabix-indexed file access
- common_workflows.md - Practical examples of integrated bioinformatics workflows combining multiple file types, including quality control, coverage analysis, variant validation, and sequence extraction
Getting Help
For detailed information on specific operations, refer to the appropriate reference document:
- Working with BAM files or calculating coverage → alignment_files.md
- Analyzing variants or genotypes → variant_files.md
- Extracting sequences or processing FASTQ → sequence_files.md
- Complex workflows integrating multiple file types → common_workflows.md
Official documentation: https://pysam.readthedocs.io/