pufferlib

K-Dense-AI/scientific-agent-skills · updated Jun 4, 2026

$ npx skills add https://github.com/K-Dense-AI/scientific-agent-skills --skill pufferlib

skill.md

name: pufferlib
description: High-performance reinforcement learning framework optimized for speed and scale. Use when you need fast parallel training, vectorized environments, multi-agent systems, or integration with game environments (Atari, Procgen, NetHack). Achieves 2-10x speedups over standard implementations. For quick prototyping or standard algorithm implementations with extensive documentation, use stable-baselines3 instead.
license: MIT license
metadata:
  version: "1.0"
  skill-author: K-Dense Inc.

PufferLib - High-Performance Reinforcement Learning

Overview

PufferLib is a high-performance reinforcement learning library designed for fast parallel environment simulation and training. It trains at millions of steps per second through optimized vectorization, native multi-agent support, and an efficient PPO implementation (PuffeRL). The library provides the Ocean suite of 20+ environments and integrates with Gymnasium, PettingZoo, and specialized RL frameworks.

When to Use This Skill

Use this skill when:

  • Training RL agents with PPO on any environment (single or multi-agent)
  • Creating custom environments using the PufferEnv API
  • Optimizing performance for parallel environment simulation (vectorization)
  • Integrating existing environments from Gymnasium, PettingZoo, Atari, Procgen, etc.
  • Developing policies with CNN, LSTM, or custom architectures
  • Scaling RL to millions of steps per second for faster experimentation
  • Multi-agent RL with native multi-agent environment support

Core Capabilities

1. High-Performance Training (PuffeRL)

PuffeRL is PufferLib's optimized PPO+LSTM training algorithm, with a reported throughput of 1M-4M steps per second.

Quick start training:

# CLI training
puffer train procgen-coinrun --train.device cuda --train.learning-rate 3e-4

# Distributed training
torchrun --nproc_per_node=4 train.py

Python training loop:

import pufferlib
from pufferlib import PuffeRL

# Create vectorized environment
env = pufferlib.make('procgen-coinrun', num_envs=256)

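# Create trainer
# my_policy is a torch.nn.Module built for this environment
# (see "4. Policy Development")
trainer = PuffeRL(
    env=env,
    policy=my_policy,
    device='cuda',
    learning_rate=3e-4,
    batch_size=32768
)

# Training loop
num_iterations = 1000  # example value; set this to your training budget
for iteration in range(num_iterations):
    trainer.evaluate()      # Collect rollouts
    trainer.train()         # Train on batch
    trainer.mean_and_log()  # Log results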

For comprehensive training guidance, see references/training.md, which covers:

  • Complete training workflow and CLI options
  • Hyperparameter tuning with Protein
  • Distributed multi-GPU/multi-node training
  • Logger integration (Weights & Biases, Neptune)
  • Checkpointing and resume training
  • Performance optimization tips
  • Curriculum learning patterns
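PuffeRL is described above as a PPO-based trainer. For readers new to PPO, the following sketch computes the standard clipped surrogate loss for a small batch in plain Python. It illustrates the textbook objective only and is not PufferLib's implementation; the function name `ppo_clip_loss` and the default `clip_coef=0.2` are assumptions chosen for illustration.

```python
import math


def ppo_clip_loss(new_logprobs, old_logprobs, advantages, clip_coef=0.2):
    """Mean clipped-surrogate policy loss (a quantity to minimize).

    Hypothetical helper for illustration; not part of PufferLib.
    """
    losses = []
    for new_lp, old_lp, adv in zip(new_logprobs, old_logprobs, advantages):
        # Probability ratio between the updated policy and the rollout policy
        ratio = math.exp(new_lp - old_lp)
        # Keep the ratio inside [1 - clip_coef, 1 + clip_coef]
        clipped = max(1.0 - clip_coef, min(ratio, 1.0 + clip_coef))
        # PPO takes the pessimistic (smaller) of the two surrogate terms
        losses.append(-min(ratio * adv, clipped * adv))
    return sum(losses) / len(losses)
```

When the new and old log-probabilities are equal, the ratio is 1 and the loss reduces to the negative mean advantage; clipping only takes effect once the policy has moved away from the one that collected the rollout.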

2. Environment Development (PufferEnv)

Create custom high-performance environments with the PufferEnv API.

Basic environment structure:

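import gymnasium
import numpy as np
from pufferlib import PufferEnv

# Pattern based on PufferLib's documented native-environment example;
# check the details against your installed version.
class MyEnvironment(PufferEnv):
    def __init__(self, buf=None):
        # Define per-agent spaces and the agent count before calling
        # super().__init__(), which sets up the shared buffers
        # (observations, rewards, terminals, truncations)
        self.single_observation_space = gymnasium.spaces.Box(
            low=-1.0, high=1.0, shape=(4,), dtype=np.float32)
        self.single_action_space = gymnasium.spaces.Discrete(4)
        self.num_agents = 1
        super().__init__(buf)

    def reset(self, seed=None):
        # Reset state, write the initial observation in place
        self.observations[:] = 0
        return self.observations, []

    def step(self, action):
        # Execute action, compute reward, check done (all written in place)
        self.observations[:] = self._get_observation()
        self.rewards[:] = self._compute_reward()
        self.terminals[:] = self._is_done()
        infos = []
        return (self.observations, self.rewards,
                self.terminals, self.truncations, infos)

    # Placeholder helpers: replace with task-specific logic
    def _get_observation(self):
        return np.zeros(4, dtype=np.float32)

    def _compute_reward(self):
        return 0.0

    def _is_done(self):
        return False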

Use the template script scripts/env_template.py, which provides complete single-agent and multi-agent environment templates with examples of:

  • Different observation space types (vector, image, dict)
  • Action space variations (discrete, continuous, multi-discrete)
  • Multi-agent environment structure
  • Testing utilities

For complete environment development, see references/environments.md, which covers:

  • PufferEnv API details and in-place operation patterns
  • Observation and action space definitions
  • Multi-agent environment creation
  • Ocean suite (20+ pre-built environments)
  • Performance optimization (Python to C workflow)
  • Environment wrappers and best practices
  • Debugging and validation techniques
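The "in-place operation patterns" mentioned in the list above can be pictured with a small stdlib-only sketch: buffers are allocated once and every step overwrites them instead of building new objects. `CounterEnvs` is a hypothetical toy batch of environments, not a PufferLib class, and it uses `array.array` where a real environment would normally use NumPy arrays.

```python
from array import array


class CounterEnvs:
    """Hypothetical batch of counter environments that updates buffers in place."""

    def __init__(self, num_envs, limit=3):
        self.limit = limit
        # Buffers are allocated once and reused on every step
        self.observations = array('f', [0.0] * num_envs)
        self.rewards = array('f', [0.0] * num_envs)
        self.terminals = array('b', [0] * num_envs)

    def reset(self):
        for i in range(len(self.observations)):
            self.observations[i] = 0.0
        return self.observations

    def step(self, actions):
        for i, action in enumerate(actions):
            self.observations[i] += action
            done = self.observations[i] >= self.limit
            self.rewards[i] = 1.0 if done else 0.0
            self.terminals[i] = int(done)
            if done:
                # Auto-reset the finished environment inside the same buffer
                self.observations[i] = 0.0
        return self.observations, self.rewards, self.terminals
```

Because `step` returns the same buffer objects on every call, a caller that needs to keep an earlier observation has to copy it explicitly.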

3. Vectorization and Performance

Achieve maximum throughput with optimized parallel simulation.

Vectorization setup:

import pufferlib

# Automatic vectorization
env = pufferlib.make('environment_name', num_envs=256, num_workers=8)

# Performance benchmarks:
# - Pure Python envs: 100k-500k SPS
# - C-based envs: 100M+ SPS
# - With training: 400k-4M total SPS

Key optimizations:

  • Shared memory buffers for zero-copy observation passing
  • Busy-wait flags instead of pipes/queues
  • Surplus environments for async returns
  • Multiple environments per worker
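As a rough illustration of the first optimization in the list above, the following sketch uses Python's standard `multiprocessing.shared_memory` module to pass an observation through a shared buffer rather than a pipe or queue. It is not PufferLib's implementation: the slot layout, `write_obs`, `read_obs`, and `demo` are assumptions made for this example, and both handles are opened in one process to keep it short.

```python
import struct
from multiprocessing import shared_memory

NUM_ENVS, OBS_DIM = 4, 3
OBS_FORMAT = f"{OBS_DIM}f"               # one observation = OBS_DIM float32 values
OBS_BYTES = struct.calcsize(OBS_FORMAT)  # bytes per observation slot


def write_obs(buf, env_index, values):
    """Worker side: write one environment's observation into its slot."""
    struct.pack_into(OBS_FORMAT, buf, env_index * OBS_BYTES, *values)


def read_obs(buf, env_index):
    """Trainer side: read the slot directly from the shared buffer."""
    return struct.unpack_from(OBS_FORMAT, buf, env_index * OBS_BYTES)


def demo():
    owner = shared_memory.SharedMemory(create=True, size=NUM_ENVS * OBS_BYTES)
    try:
        # A real worker process would attach by name; here we attach in-process
        worker = shared_memory.SharedMemory(name=owner.name)
        try:
            write_obs(worker.buf, 2, (1.0, 2.0, 3.0))
            return read_obs(owner.buf, 2)
        finally:
            worker.close()
    finally:
        owner.close()
        owner.unlink()
```

The data crosses between handles without pickling or socket traffic. `struct.unpack_from` still converts the bytes to Python floats, so a NumPy array built on top of the same buffer would be the closer analogue of a zero-copy read.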

For vectorization optimization, see references/vectorization.md, which covers:

  • Architecture and performance characteristics
  • Worker and batch size configuration
  • Serial vs multiprocessing vs async modes
  • Shared memory and zero-copy patterns
  • Hierarchical vectorization for large scale
  • Multi-agent vectorization strategies
  • Performance profiling and troubleshooting

4. Policy Development

Build policies as standard PyTorch modules with optional utilities.

Basic policy structure:

import numpy as np
import torch.nn as nn
from pufferlib.pytorch import layer_init

class Policy(nn.Module):
    def __init__(self, observation_space, action_space):
        super().__init__()

        # Derive layer sizes from the spaces
        # (assumes a flat Box observation space and a Discrete action space)
        obs_dim = int(np.prod(observation_space.shape))
        num_actions = action_space.n

        # Encoder
        self.encoder = nn.Sequential(
            layer_init(nn.Linear(obs_dim, 256)),
            nn.ReLU(),
            layer_init(nn.Linear(256, 256)),
            nn.ReLU()
        )

        # Actor and critic heads
        self.actor = layer_init(nn.Linear(256, num_actions), std=0.01)
        self.critic = layer_init(nn.Linear(256, 1), std=1.0)

    def forward(self, observations):
        # Returns action logits and a state-value estimate
        features = self.encoder(observations)
        return self.actor(features), self.critic(features)

For complete policy development, see references/policies.md, which covers:

  • CNN policies for image observations
  • Recurrent policies with optimized LSTM (3x faster inference)
  • Multi-input policies for complex observations
  • Continuous action policies
  • Multi-agent policies (shared vs independent parameters)
  • Advanced architectures (attention, residual)
  • Observation normalization and gradient clipping
  • Policy debugging and testing
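The actor head in the basic policy returns one logit per discrete action. A minimal stdlib sketch of how such logits are typically turned into a sampled action and its log-probability follows; `softmax` and `sample_action` are hypothetical helper names, and a real training loop would do this with batched tensor operations instead.

```python
import math
import random


def softmax(logits):
    """Convert logits to probabilities, subtracting the max for stability."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_action(logits, rng=random):
    """Sample a discrete action and return it with its log-probability."""
    probs = softmax(logits)
    action = rng.choices(range(len(probs)), weights=probs, k=1)[0]
    return action, math.log(probs[action])
```

The log-probability recorded at sampling time is what a PPO update later compares against the updated policy's log-probability for the same action.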

5. Environment Integration

Seamlessly integrate environments from popular RL frameworks.

Gymnasium integration:

import gymnasium as gym
import pufferlib

# Wrap Gymnasium environment
gym_env = gym.make('CartPole-v1')
env = pufferlib.emulate(gym_env, num_envs=256)

# Or use make directly
env = pufferlib.make('gym-CartPole-v1', num_envs=256)

PettingZoo multi-agent:

# Multi-agent environment
env = pufferlib.make('pettingzoo-knights-archers-zombies', num_envs=128)

Supported frameworks:

  • Gymnasium / OpenAI Gym
  • PettingZoo (parallel and AEC)
  • Atari (ALE)
  • Procgen
  • NetHack / MiniHack
  • Minigrid
  • Neural MMO
  • Crafter
  • GPUDrive
  • MicroRTS
  • Griddly
  • And more...

For integration details, see references/integration.md, which covers:

  • Complete integration examples for each framework
  • Custom wrappers (observation, reward, frame stacking, action repeat)
  • Space flattening and unflattening
  • Environment registration
  • Compatibility patterns
  • Performance considerations
  • Integration debugging
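"Space flattening and unflattening" in the list above refers to packing a structured observation into one flat vector and recovering the structure later. The sketch below shows the idea for a dict of float lists; `flatten`, `unflatten`, and the `layout` argument are assumptions for illustration and do not mirror PufferLib's emulation API.

```python
def flatten(obs, keys):
    """Concatenate dict entries (lists of floats) in a fixed key order."""
    flat = []
    for key in keys:
        flat.extend(obs[key])
    return flat


def unflatten(flat, layout):
    """Rebuild the dict from a flat list; layout is [(key, length), ...]."""
    obs, start = {}, 0
    for key, length in layout:
        obs[key] = flat[start:start + length]
        start += length
    return obs


# Usage: a round trip preserves the original structure
obs = {"position": [1.0, 2.0], "velocity": [0.5]}
layout = [("position", 2), ("velocity", 1)]
flat = flatten(obs, [key for key, _ in layout])
restored = unflatten(flat, layout)
```

A fixed key order matters here: the flat vector carries no field names, so both sides must agree on the layout.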

Quick Start Workflow

For Training Existing Environments

  1. Choose environment from Ocean suite or compatible framework
  2. Use scripts/train_template.py as starting point
  3. Configure hyperparameters for your task
  4. Run training with CLI or Python script
  5. Monitor with Weights & Biases or Neptune
  6. Refer to references/training.md for optimization

For Creating Custom Environments

  1. Start with scripts/env_template.py
  2. Define observation and action spaces
  3. Implement reset() and step() methods
  4. Test environment locally
  5. Vectorize with pufferlib.emulate() or make()
  6. Refer to references/environments.md for advanced patterns
  7. Optimize with references/vectorization.md if needed
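Step 4 ("Test environment locally") can be as simple as driving the environment with random actions and asserting basic invariants. The sketch below assumes a Gymnasium-style single-environment interface (`reset()` returning `(obs, info)` and `step()` returning a 5-tuple); `CoinFlipEnv` and `check_env` are hypothetical names used only for this example.

```python
import random


class CoinFlipEnv:
    """Hypothetical toy env: guess a coin flip; an episode lasts max_steps."""

    def __init__(self, max_steps=5, seed=0):
        self.max_steps = max_steps
        self.rng = random.Random(seed)
        self.t = 0

    def reset(self):
        self.t = 0
        return [0.0], {}

    def step(self, action):
        self.t += 1
        coin = self.rng.randint(0, 1)
        reward = 1.0 if action == coin else 0.0
        terminated = self.t >= self.max_steps
        return [float(coin)], reward, terminated, False, {}


def check_env(env, num_actions, obs_size, num_steps=100, seed=0):
    """Run random actions, assert invariants, return the episodes completed."""
    rng = random.Random(seed)
    obs, info = env.reset()
    assert len(obs) == obs_size
    episodes = 0
    for _ in range(num_steps):
        action = rng.randrange(num_actions)
        obs, reward, terminated, truncated, info = env.step(action)
        assert len(obs) == obs_size, "observation size changed"
        assert isinstance(reward, float), "reward must be a float"
        if terminated or truncated:
            episodes += 1
            obs, info = env.reset()
    return episodes
```

Running a check like this before vectorizing makes failures easier to trace, since errors surface in a single process with a plain stack trace.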

For Policy Development

  1. Choose architecture based on observations:
    • Vector observations → MLP policy
    • Image observations → CNN policy
    • Sequential tasks → LSTM policy
    • Complex observations → Multi-input policy
  2. Use layer_init for proper weight initialization
  3. Follow patterns in references/policies.md
  4. Test with environment before full training

For Performance Optimization

  1. Profile current throughput (steps per second)
  2. Check vectorization configuration (num_envs, num_workers)
  3. Optimize environment code (in-place ops, numpy vectorization)
  4. Consider C implementation for critical paths
  5. Use references/vectorization.md for systematic optimization
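Step 1 ("Profile current throughput") needs nothing more than a timer around the step call. A minimal sketch, assuming `step_fn` is any callable that advances `num_envs` environments by one step; `measure_sps` is a hypothetical helper name.

```python
import time


def measure_sps(step_fn, num_envs, num_steps=1000):
    """Return environment steps per second for a batched step function."""
    start = time.perf_counter()
    for _ in range(num_steps):
        step_fn()
    elapsed = time.perf_counter() - start
    # Each call to step_fn advances num_envs environments by one step
    return num_envs * num_steps / elapsed
```

Measuring the environment alone first, and then again with the policy forward pass included, shows whether simulation or inference is the bottleneck.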

Resources

scripts/

train_template.py - Complete training script template with:

  • Environment creation and configuration
  • Policy initialization
  • Logger integration (WandB, Neptune)
  • Training loop with checkpointing
  • Command-line argument parsing
  • Multi-GPU distributed training setup

env_template.py - Environment implementation templates:

  • Single-agent PufferEnv example (grid world)
  • Multi-agent PufferEnv example (cooperative navigation)
  • Multiple observation/action space patterns
  • Testing utilities

references/

training.md - Comprehensive training guide:

  • Training workflow and CLI options
  • Hyperparameter configuration
  • Distributed training (multi-GPU, multi-node)
  • Monitoring and logging
  • Checkpointing
  • Protein hyperparameter tuning
  • Performance optimization
  • Common training patterns
  • Troubleshooting

environments.md - Environment development guide:

  • PufferEnv API and characteristics
  • Observation and action spaces
  • Multi-agent environments
  • Ocean suite environments
  • Custom environment development workflow
  • Python to C optimization path
  • Third-party environment integration
  • Wrappers and best practices
  • Debugging

vectorization.md - Vectorization optimization:

  • Architecture and key optimizations
  • Vectorization modes (serial, multiprocessing, async)
  • Worker and batch configuration
  • Shared memory and zero-copy patterns
  • Advanced vectorization (hierarchical, custom)
  • Multi-agent vectorization
  • Performance monitoring and profiling
  • Troubleshooting and best practices

policies.md - Policy architecture guide:

  • Basic policy structure
  • CNN policies for images
  • LSTM policies with optimization
  • Multi-input policies
  • Continuous action policies
  • Multi-agent policies
  • Advanced architectures (attention, residual)
  • Observation processing and unflattening
  • Initialization and normalization
  • Debugging and testing

integration.md - Framework integration guide:

  • Gymnasium integration
  • PettingZoo integration (parallel and AEC)
  • Third-party environments (Procgen, NetHack, Minigrid, etc.)
  • Custom wrappers (observation, reward, frame stacking, etc.)
  • Space conversion and unflattening
  • Environment registration
  • Compatibility patterns
  • Performance considerations
  • Debugging integration

Tips for Success

  1. Start simple: Begin with Ocean environments or Gymnasium integration before creating custom environments

  2. Profile early: Measure steps per second from the start to identify bottlenecks

  3. Use templates: scripts/train_template.py and scripts/env_template.py provide solid starting points

  4. Read references as needed: Each reference file is self-contained and focused on a specific capability

  5. Optimize progressively: Start with Python, profile, then optimize critical paths with C if needed

  6. Leverage vectorization: PufferLib's vectorization is key to achieving high throughput

  7. Monitor training: Use WandB or Neptune to track experiments and identify issues early

  8. Test environments: Validate environment logic before scaling up training

  9. Check existing environments: Ocean suite provides 20+ pre-built environments

  10. Use proper initialization: Always use layer_init from pufferlib.pytorch for policies

Common Use Cases

Training on Standard Benchmarks

# Atari
env = pufferlib.make('atari-pong', num_envs=256)

# Procgen
env = pufferlib.make('procgen-coinrun', num_envs=256)

# Minigrid
env = pufferlib.make('minigrid-empty-8x8', num_envs=256)

Multi-Agent Learning

# PettingZoo
env = pufferlib.make('pettingzoo-pistonball', num_envs=128)

# Shared policy for all agents
# (create_policy is a user-defined helper that returns an nn.Module)
policy = create_policy(env.observation_space, env.action_space)
trainer = PuffeRL(env=env, policy=policy)

Custom Task Development

# Create custom environment
class MyTask(PufferEnv):
    ...  # implement environment (see "2. Environment Development")

# Vectorize and train
env = pufferlib.emulate(MyTask, num_envs=256)
trainer = PuffeRL(env=env, policy=my_policy)

High-Performance Optimization

# Maximize throughput
env = pufferlib.make(
    'my-env',
    num_envs=1024,      # Large batch
    num_workers=16,     # Many workers
    envs_per_worker=64  # Optimize per worker
)
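The three numbers in the configuration above are related: 16 workers with 64 environments each gives 1024 environments. A small sketch that checks this arithmetic before launching a run, assuming environments are split evenly across workers; `envs_per_worker` here is a hypothetical helper function, not a PufferLib call.

```python
def envs_per_worker(num_envs, num_workers):
    """Return how many environments each worker runs under an even split."""
    if num_envs % num_workers != 0:
        raise ValueError(
            f"num_envs={num_envs} is not divisible by num_workers={num_workers}"
        )
    return num_envs // num_workers


# Usage: the configuration shown above
per_worker = envs_per_worker(1024, 16)
```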

Installation

uv pip install pufferlib

Documentation

How to use pufferlib on Cursor

1. Prerequisites

Before installing skills in Cursor, ensure your development environment meets these requirements:

  • Cursor installed and configured on your development machine
  • Node.js version 16.0+ with npm package manager (verify with node --version)
  • Active project directory or workspace where you want to add pufferlib

2. Execute installation command

Execute the skills CLI command in your project's root directory to begin installation:

$ npx skills add https://github.com/K-Dense-AI/scientific-agent-skills --skill pufferlib

The skills CLI fetches pufferlib from the GitHub repository K-Dense-AI/scientific-agent-skills and configures it for Cursor.

3. Select Cursor when prompted

The CLI will show a list of available agents. Use arrow keys to navigate and space to select Cursor:

◆ Which agents do you want to install to?
│ ── Universal (.agents/skills) ── always included ────
│ • Amp
│ • Antigravity
│ • Cline
│ • Codex
│ ● Cursor (selected)
│ • Cursor
│ • Windsurf

4. Verify installation

Confirm successful installation by checking the skill directory location:

.cursor/skills/pufferlib

Reload or restart Cursor to activate pufferlib. Access the skill through slash commands (e.g., /pufferlib) or your agent's skill management interface.

Security & Verification Notice

We perform automated surface-level scans (Gen AI Scanner, Socket, Snyk) during installation. These checks detect common vulnerabilities but do not guarantee complete security. Always review skill source code and verify the publisher's reputation before production use.

Skills execute code in your development environment. Always verify the publisher's identity, review recent commits, and test in isolated environments before production deployment.

Use Cases

Task Automation & Efficiency

Automate repetitive workflows and reduce manual effort

Example

Generate reports, summarize documents, draft communications

Save 3-5 hours per week on routine tasks

Knowledge Enhancement

Learn new skills, understand complex topics, get expert guidance

Example

Explain concepts, provide examples, suggest learning resources

Accelerate learning and skill development by 2x

Quality Improvement

Enhance output quality through reviews, suggestions, and refinements

Example

Review drafts, suggest improvements, catch errors

Improve work quality by 30-40% with less effort

Implementation Guide

Prerequisites

  • Claude Desktop or compatible AI client with skill support
  • Clear understanding of task or problem to solve
  • Willingness to iterate and refine outputs

Time Estimate

15-45 minutes depending on use case complexity

Installation Steps

  1. Install the skill using the provided installation command
  2. Test with a simple use case relevant to your work
  3. Evaluate output quality and relevance
  4. Iterate on prompts to improve results
  5. Integrate into your regular workflow if valuable

Common Pitfalls

  • Expecting perfect results without iteration
  • Not providing enough context in prompts
  • Using skill for tasks outside its intended scope
  • Accepting outputs without review and validation

Best Practices

✓ Do

  • Start with clear, specific prompts
  • Provide relevant context and constraints
  • Review and refine all outputs before using
  • Iterate to improve output quality
  • Document successful prompt patterns

✗ Don't

  • Don't use without understanding skill limitations
  • Don't skip validation of outputs
  • Don't share sensitive information in prompts
  • Don't expect skill to replace human judgment

💡 Pro Tips

  • Be specific about desired format and style
  • Ask for multiple options to choose from
  • Request explanations to understand reasoning
  • Combine AI efficiency with human expertise

When to Use This

✓ Use When

Use when the skill's capabilities match your task, there is a clear return on the time saved, and you can validate the outputs. Best for repetitive tasks, learning, and quality improvement.

✗ Avoid When

Avoid when the task requires deep expertise you can't validate, involves sensitive decisions, or when the learning process is more valuable than speed of completion.

Learning Path

  1. Familiarize yourself with skill capabilities and limitations
  2. Start with low-risk, non-critical tasks
  3. Progress to more complex and valuable use cases
  4. Build expertise through regular use and experimentation

Ratings

4.6 · 27 reviews
  • Chaitanya Patil · Dec 28, 2024

    Registry listing for pufferlib matched our evaluation — installs cleanly and behaves as described in the markdown.

  • Michael Diallo · Dec 16, 2024

    pufferlib has been reliable in day-to-day use. Documentation quality is above average for community skills.

  • Chinedu Chawla · Dec 8, 2024

    Keeps context tight: pufferlib is the kind of skill you can hand to a new teammate without a long onboarding doc.

  • Yuki Thompson · Nov 27, 2024

    We added pufferlib from the explainx registry; install was straightforward and the SKILL.md answered most questions upfront.

  • Piyush G · Nov 19, 2024

    pufferlib reduced setup friction for our internal harness; good balance of opinion and flexibility.

  • Ren Zhang · Nov 7, 2024

    Solid pick for teams standardizing on skills: pufferlib is focused, and the summary matches what you get after install.

  • Diya Bansal · Nov 3, 2024

    pufferlib is among the better-maintained entries we tried; worth keeping pinned for repeat workflows.

  • Ren Smith · Oct 26, 2024

    We added pufferlib from the explainx registry; install was straightforward and the SKILL.md answered most questions upfront.

  • Dev Abebe · Oct 22, 2024

    Useful defaults in pufferlib — fewer surprises than typical one-off scripts, and it plays nicely with `npx skills` flows.

  • Yuki Agarwal · Oct 18, 2024

    Solid pick for teams standardizing on skills: pufferlib is focused, and the summary matches what you get after install.
