anndata

K-Dense-AI/scientific-agent-skills · updated Jun 4, 2026

MDX-style export adds YAML metadata + attribution linking explainx.ai and this canonical listing URL.

$npx skills add https://github.com/K-Dense-AI/scientific-agent-skills --skill anndata
0 commentsdiscussion
summary

### Anndata

  • name: "anndata"
  • description: "Data structure for annotated matrices in single-cell analysis. Use when working with .h5ad files or integrating with the scverse ecosystem. This is the data format skill—for analysis workflows use sca..."
skill.md
name
anndata
description
Data structure for annotated matrices in single-cell analysis. Use when working with .h5ad files or integrating with the scverse ecosystem. This is the data format skill—for analysis workflows use scanpy; for probabilistic models use scvi-tools; for population-scale queries use cellxgene-census.
license
BSD-3-Clause license
metadata
version: "1.0" skill-author: K-Dense Inc.

AnnData

Overview

AnnData is a Python package for handling annotated data matrices, storing experimental measurements (X) alongside observation metadata (obs), variable metadata (var), and multi-dimensional annotations (obsm, varm, obsp, varp, uns). Originally designed for single-cell genomics through Scanpy, it now serves as a general-purpose framework for any annotated data requiring efficient storage, manipulation, and analysis.

When to Use This Skill

Use this skill when:

  • Creating, reading, or writing AnnData objects
  • Working with h5ad, zarr, or other genomics data formats
  • Performing single-cell RNA-seq analysis
  • Managing large datasets with sparse matrices or backed mode
  • Concatenating multiple datasets or experimental batches
  • Subsetting, filtering, or transforming annotated data
  • Integrating with scanpy, scvi-tools, or other scverse ecosystem tools

Installation

Requires Python 3.11+ (anndata 0.11+ dropped 3.9). Current stable release: 0.12.x.

uv pip install anndata

# Lazy I/O and dask-backed operations
uv pip install "anndata[dask,lazy]"

# Development / docs (contributors)
uv pip install "anndata[dev,test,doc]"

Quick Start

Creating an AnnData object

import anndata as ad
import numpy as np
import pandas as pd

# Minimal creation
X = np.random.rand(100, 2000)  # 100 cells × 2000 genes
adata = ad.AnnData(X)

# With metadata
obs = pd.DataFrame({
    'cell_type': ['T cell', 'B cell'] * 50,
    'sample': ['A', 'B'] * 50
}, index=[f'cell_{i}' for i in range(100)])

var = pd.DataFrame({
    'gene_name': [f'Gene_{i}' for i in range(2000)]
}, index=[f'ENSG{i:05d}' for i in range(2000)])

adata = ad.AnnData(X=X, obs=obs, var=var)

Reading data

# Native formats (read_h5ad/read_zarr remain at top-level)
adata = ad.read_h5ad('data.h5ad')
adata = ad.read_h5ad('large_data.h5ad', backed='r')  # lazy load for large files
adata = ad.read_zarr('data.zarr')

# Other formats: prefer anndata.io (top-level imports are deprecated)
from anndata.io import read_csv, read_loom, read_mtx

adata = read_csv('data.csv')
adata = read_loom('data.loom')

# 10X Genomics: use scanpy (not anndata) — see scanpy skill
import scanpy as sc
adata = sc.read_10x_h5('filtered_feature_bc_matrix.h5')
adata = sc.read_10x_mtx('filtered_feature_bc_matrix/')

Writing data

# Write h5ad file
adata.write_h5ad('output.h5ad')

# Write with compression
adata.write_h5ad('output.h5ad', compression='gzip')

# Write other formats
adata.write_zarr('output.zarr')
adata.write_csvs('output_dir/')

Basic operations

# Subset by conditions
t_cells = adata[adata.obs['cell_type'] == 'T cell']

# Subset by indices
subset = adata[0:50, 0:100]

# Add metadata
adata.obs['quality_score'] = np.random.rand(adata.n_obs)
adata.var['highly_variable'] = np.random.rand(adata.n_vars) > 0.8

# Access dimensions
print(f"{adata.n_obs} observations × {adata.n_vars} variables")

Core Capabilities

1. Data Structure

Understand the AnnData object structure including X, obs, var, layers, obsm, varm, obsp, varp, uns, and raw components.

See: references/data_structure.md for comprehensive information on:

  • Core components (X, obs, var, layers, obsm, varm, obsp, varp, uns, raw)
  • Creating AnnData objects from various sources
  • Accessing and manipulating data components
  • Memory-efficient practices

2. Input/Output Operations

Read and write data in various formats with support for compression, backed mode, and cloud storage.

See: references/io_operations.md for details on:

  • Native formats (h5ad, zarr)
  • Alternative formats (CSV, MTX, Loom, 10X, Excel)
  • Backed mode for large datasets
  • Remote data access
  • Format conversion
  • Performance optimization

Common commands:

from anndata.io import read_mtx

# Read/write h5ad
adata = ad.read_h5ad('data.h5ad', backed='r')
adata.write_h5ad('output.h5ad', compression='gzip')

# 10X Genomics (via scanpy)
import scanpy as sc
adata = sc.read_10x_h5('filtered_feature_bc_matrix.h5')

# Read MTX format
adata = read_mtx('matrix.mtx').T

3. Concatenation

Combine multiple AnnData objects along observations or variables with flexible join strategies.

See: references/concatenation.md for comprehensive coverage of:

  • Basic concatenation (axis=0 for observations, axis=1 for variables)
  • Join types (inner, outer)
  • Merge strategies (same, unique, first, only)
  • Tracking data sources with labels
  • Lazy concatenation (AnnCollection)
  • On-disk concatenation for large datasets

Common commands:

# Concatenate observations (combine samples)
adata = ad.concat(
    [adata1, adata2, adata3],
    axis=0,
    join='inner',
    label='batch',
    keys=['batch1', 'batch2', 'batch3']
)

# Concatenate variables (combine modalities)
adata = ad.concat([adata_rna, adata_protein], axis=1)

# Lazy concatenation
from anndata.experimental import AnnCollection
collection = AnnCollection(
    ['data1.h5ad', 'data2.h5ad'],
    join_obs='outer',
    label='dataset'
)

4. Data Manipulation

Transform, subset, filter, and reorganize data efficiently.

See: references/manipulation.md for detailed guidance on:

  • Subsetting (by indices, names, boolean masks, metadata conditions)
  • Transposition
  • Copying (full copies vs views)
  • Renaming (observations, variables, categories)
  • Type conversions (strings to categoricals, sparse/dense)
  • Adding/removing data components
  • Reordering
  • Quality control filtering

Common commands:

# Subset by metadata
filtered = adata[adata.obs['quality_score'] > 0.8]
hv_genes = adata[:, adata.var['highly_variable']]

# Transpose
adata_T = adata.T

# Copy vs view
view = adata[0:100, :]  # View (lightweight reference)
copy = adata[0:100, :].copy()  # Independent copy

# Convert strings to categoricals
adata.strings_to_categoricals()

5. Best Practices

Follow recommended patterns for memory efficiency, performance, and reproducibility.

See: references/best_practices.md for guidelines on:

  • Memory management (sparse matrices, categoricals, backed mode)
  • Views vs copies
  • Data storage optimization
  • Performance optimization
  • Working with raw data
  • Metadata management
  • Reproducibility
  • Error handling
  • Integration with other tools
  • Common pitfalls and solutions

Key recommendations:

# Use sparse matrices for sparse data
from scipy.sparse import csr_matrix
adata.X = csr_matrix(adata.X)

# Convert strings to categoricals
adata.strings_to_categoricals()

# Use backed mode for large files
adata = ad.read_h5ad('large.h5ad', backed='r')

# Store raw before filtering
adata.raw = adata.copy()
adata = adata[:, adata.var['highly_variable']]

Integration with Scverse Ecosystem

AnnData serves as the foundational data structure for the scverse ecosystem:

Scanpy (Single-cell analysis)

import scanpy as sc

# Preprocessing
sc.pp.filter_cells(adata, min_genes=200)
sc.pp.normalize_total(adata, target_sum=1e4)
sc.pp.log1p(adata)
sc.pp.highly_variable_genes(adata, n_top_genes=2000)

# Dimensionality reduction
sc.pp.pca(adata, n_comps=50)
sc.pp.neighbors(adata, n_neighbors=15)
sc.tl.umap(adata)
sc.tl.leiden(adata)

# Visualization
sc.pl.umap(adata, color=['cell_type', 'leiden'])

Muon (Multimodal data)

import muon as mu

# Combine RNA and protein data
mdata = mu.MuData({'rna': adata_rna, 'protein': adata_protein})

PyTorch integration

from anndata.experimental import AnnLoader

# Create DataLoader for deep learning
dataloader = AnnLoader(adata, batch_size=128, shuffle=True)

for batch in dataloader:
    X = batch.X
    # Train model

Common Workflows

Single-cell RNA-seq analysis

import anndata as ad
import scanpy as sc

# 1. Load data (10X via scanpy; anndata handles h5ad/zarr natively)
adata = sc.read_10x_h5('filtered_feature_bc_matrix.h5')

# 2. Quality control
adata.obs['n_genes'] = (adata.X > 0).sum(axis=1)
adata.obs['n_counts'] = adata.X.sum(axis=1)
adata = adata[adata.obs['n_genes'] > 200]
adata = adata[adata.obs['n_counts'] < 50000]

# 3. Store raw
adata.raw = adata.copy()

# 4. Normalize and filter
sc.pp.normalize_total(adata, target_sum=1e4)
sc.pp.log1p(adata)
sc.pp.highly_variable_genes(adata, n_top_genes=2000)
adata = adata[:, adata.var['highly_variable']]

# 5. Save processed data
adata.write_h5ad('processed.h5ad')

Batch integration

# Load multiple batches
adata1 = ad.read_h5ad('batch1.h5ad')
adata2 = ad.read_h5ad('batch2.h5ad')
adata3 = ad.read_h5ad('batch3.h5ad')

# Concatenate with batch labels
adata = ad.concat(
    [adata1, adata2, adata3],
    label='batch',
    keys=['batch1', 'batch2', 'batch3'],
    join='inner'
)

# Apply batch correction
import scanpy as sc
sc.pp.combat(adata, key='batch')

# Continue analysis
sc.pp.pca(adata)
sc.pp.neighbors(adata)
sc.tl.umap(adata)

Working with large datasets

# Open in backed mode
adata = ad.read_h5ad('100GB_dataset.h5ad', backed='r')

# Filter based on metadata (no data loading)
high_quality = adata[adata.obs['quality_score'] > 0.8]

# Load filtered subset
adata_subset = high_quality.to_memory()

# Process subset
process(adata_subset)

# Or process in chunks
chunk_size = 1000
for i in range(0, adata.n_obs, chunk_size):
    chunk = adata[i:i+chunk_size, :].to_memory()
    process(chunk)

Troubleshooting

Out of memory errors

Use backed mode or convert to sparse matrices:

# Backed mode
adata = ad.read_h5ad('file.h5ad', backed='r')

# Sparse matrices
from scipy.sparse import csr_matrix
adata.X = csr_matrix(adata.X)

Slow file reading

Use compression and appropriate formats:

# Optimize for storage
adata.strings_to_categoricals()
adata.write_h5ad('file.h5ad', compression='gzip')

# Use Zarr for cloud storage (v3 optional since anndata 0.12)
import anndata
anndata.settings.zarr_write_format = 3  # default is 2
adata.write_zarr('file.zarr', chunks=(1000, 1000))

Index alignment issues

Always align external data on index:

# Wrong
adata.obs['new_col'] = external_data['values']

# Correct
adata.obs['new_col'] = external_data.set_index('cell_id').loc[adata.obs_names, 'values']

Additional Resources

how to use anndata

How to use anndata on Cursor

AI-first code editor with Composer

1

Prerequisites

Before installing skills in Cursor, ensure your development environment meets these requirements:

  • Cursor installed and configured on your development machine
  • Node.js version 16.0+ with npm package manager (verify with node --version)
  • Active project directory or workspace where you want to add anndata
2

Execute installation command

Execute the skills CLI command in your project's root directory to begin installation:

$npx skills add https://github.com/K-Dense-AI/scientific-agent-skills --skill anndata

The skills CLI fetches anndata from GitHub repository K-Dense-AI/scientific-agent-skills and configures it for Cursor.

3

Select Cursor when prompted

The CLI will show a list of available agents. Use arrow keys to navigate and space to select Cursor:

◆ Which agents do you want to install to?
│ ── Universal (.agents/skills) ── always included ────
│ • Amp
│ • Antigravity
│ • Cline
│ • Codex
│ ●Cursor(selected)
│ • Cursor
│ • Windsurf
4

Verify installation

Confirm successful installation by checking the skill directory location:

.cursor/skills/anndata

Reload or restart Cursor to activate anndata. Access the skill through slash commands (e.g., /anndata) or your agent's skill management interface.

Security & Verification Notice

We perform automated surface-level scans (Gen AI Scanner, Socket, Snyk) during installation. These checks detect common vulnerabilities but do not guarantee complete security. Always review skill source code and verify the publisher's reputation before production use.

Skills execute code in your development environment. Always verify the publisher's identity, review recent commits, and test in isolated environments before production deployment.

List & Monetize Your Skill

Submit your Claude Code skill and start earning

GET_STARTED →

Use Cases

Task Automation & Efficiency

Automate repetitive workflows and reduce manual effort

Example

Generate reports, summarize documents, draft communications

Save 3-5 hours per week on routine tasks

Knowledge Enhancement

Learn new skills, understand complex topics, get expert guidance

Example

Explain concepts, provide examples, suggest learning resources

Accelerate learning and skill development by 2x

Quality Improvement

Enhance output quality through reviews, suggestions, and refinements

Example

Review drafts, suggest improvements, catch errors

Improve work quality by 30-40% with less effort

Implementation Guide

Prerequisites

  • Claude Desktop or compatible AI client with skill support
  • Clear understanding of task or problem to solve
  • Willingness to iterate and refine outputs

Time Estimate

15-45 minutes depending on use case complexity

Installation Steps

  1. 1.Install skill using provided installation command
  2. 2.Test with simple use case relevant to your work
  3. 3.Evaluate output quality and relevance
  4. 4.Iterate on prompts to improve results
  5. 5.Integrate into regular workflow if valuable

Common Pitfalls

  • Expecting perfect results without iteration
  • Not providing enough context in prompts
  • Using skill for tasks outside its intended scope
  • Accepting outputs without review and validation

Best Practices

✓ Do

  • +Start with clear, specific prompts
  • +Provide relevant context and constraints
  • +Review and refine all outputs before using
  • +Iterate to improve output quality
  • +Document successful prompt patterns

✗ Don't

  • Don't use without understanding skill limitations
  • Don't skip validation of outputs
  • Don't share sensitive information in prompts
  • Don't expect skill to replace human judgment

💡 Pro Tips

  • Be specific about desired format and style
  • Ask for multiple options to choose from
  • Request explanations to understand reasoning
  • Combine AI efficiency with human expertise

When to Use This

✓ Use When

Use when skill capabilities match your task, clear ROI on time saved, and you can validate outputs. Best for repetitive tasks, learning, and quality improvement.

✗ Avoid When

Avoid when task requires deep expertise you can't validate, involves sensitive decisions, or when learning process is more valuable than speed of completion.

Learning Path

  1. 1Familiarize yourself with skill capabilities and limitations
  2. 2Start with low-risk, non-critical tasks
  3. 3Progress to more complex and valuable use cases
  4. 4Build expertise through regular use and experimentation

Discussion

Product Hunt–style comments (not star reviews)
  • No comments yet — start the thread.
general reviews

Ratings

4.652 reviews
  • Fatima Martin· Dec 28, 2024

    anndata fits our agent workflows well — practical, well scoped, and easy to wire into existing repos.

  • James Yang· Dec 28, 2024

    Useful defaults in anndata — fewer surprises than typical one-off scripts, and it plays nicely with `npx skills` flows.

  • Camila White· Dec 24, 2024

    We added anndata from the explainx registry; install was straightforward and the SKILL.md answered most questions upfront.

  • Fatima Harris· Dec 20, 2024

    anndata reduced setup friction for our internal harness; good balance of opinion and flexibility.

  • Emma Robinson· Nov 19, 2024

    anndata is among the better-maintained entries we tried; worth keeping pinned for repeat workflows.

  • Tariq Sharma· Nov 15, 2024

    Useful defaults in anndata — fewer surprises than typical one-off scripts, and it plays nicely with `npx skills` flows.

  • Emma Ghosh· Nov 11, 2024

    Registry listing for anndata matched our evaluation — installs cleanly and behaves as described in the markdown.

  • Yash Thakker· Nov 3, 2024

    Keeps context tight: anndata is the kind of skill you can hand to a new teammate without a long onboarding doc.

  • Dhruvi Jain· Oct 22, 2024

    I recommend anndata for anyone iterating fast on agent tooling; clear intent and a small, reviewable surface area.

  • Fatima Sethi· Oct 10, 2024

    Solid pick for teams standardizing on skills: anndata is focused, and the summary matches what you get after install.

showing 1-10 of 52

1 / 6