💬 Pufferlib
高速かつ大規模な強化学習を効率的に実行するための
📺 まず動画で見る(YouTube)
▶ 【最新版】Claude(クロード)完全解説!20以上の便利機能をこの動画1本で全て解説 ↗
※ jpskill.com 編集部が参考用に選んだ動画です。動画の内容と Skill の挙動は厳密には一致しないことがあります。
📜 元の英語説明(参考)
High-performance reinforcement learning framework optimized for speed and scale. Use when you need fast parallel training, vectorized environments, multi-agent systems, or integration with game environments (Atari, Procgen, NetHack). Achieves 2-10x speedups over standard implementations. For quick prototyping or standard algorithm implementations with extensive documentation, use stable-baselines3 instead.
🇯🇵 日本人クリエイター向け解説
高速かつ大規模な強化学習を効率的に実行するための
※ jpskill.com 編集部が日本のビジネス現場向けに補足した解説です。Skill本体の挙動とは独立した参考情報です。
下記のコマンドをコピーしてターミナル(Mac/Linux)または PowerShell(Windows)に貼り付けてください。 ダウンロード → 解凍 → 配置まで全自動。
mkdir -p ~/.claude/skills && cd ~/.claude/skills && curl -L -o pufferlib.zip https://jpskill.com/download/4211.zip && unzip -o pufferlib.zip && rm pufferlib.zip
$d = "$env:USERPROFILE\.claude\skills"; ni -Force -ItemType Directory $d | Out-Null; iwr https://jpskill.com/download/4211.zip -OutFile "$d\pufferlib.zip"; Expand-Archive "$d\pufferlib.zip" -DestinationPath $d -Force; ri "$d\pufferlib.zip"
完了後、Claude Code を再起動 → 普通に「動画プロンプト作って」のように話しかけるだけで自動発動します。
💾 手動でダウンロードしたい(コマンドが難しい人向け)
- 1. 下の青いボタンを押して
pufferlib.zipをダウンロード - 2. ZIPファイルをダブルクリックで解凍 →
pufferlibフォルダができる - 3. そのフォルダを
C:\Users\あなたの名前\.claude\skills\(Win)または~/.claude/skills/(Mac)へ移動 - 4. Claude Code を再起動
⚠️ ダウンロード・利用は自己責任でお願いします。当サイトは内容・動作・安全性について責任を負いません。
🎯 このSkillでできること
下記の説明文を読むと、このSkillがあなたに何をしてくれるかが分かります。Claudeにこの分野の依頼をすると、自動で発動します。
📦 インストール方法 (3ステップ)
- 1. 上の「ダウンロード」ボタンを押して .skill ファイルを取得
- 2. ファイル名の拡張子を .skill から .zip に変えて展開(macは自動展開可)
- 3. 展開してできたフォルダを、ホームフォルダの
.claude/skills/に置く- · macOS / Linux:
~/.claude/skills/ - · Windows:
%USERPROFILE%\.claude\skills\
- · macOS / Linux:
Claude Code を再起動すれば完了。「このSkillを使って…」と話しかけなくても、関連する依頼で自動的に呼び出されます。
詳しい使い方ガイドを見る →- 最終更新
- 2026-05-17
- 取得日時
- 2026-05-17
- 同梱ファイル
- 8
💬 こう話しかけるだけ — サンプルプロンプト
- › Pufferlib で、お客様への返信文を作って
- › Pufferlib を使って、社内向けアナウンスを書いて
- › Pufferlib で、メールテンプレートを整備して
これをClaude Code に貼るだけで、このSkillが自動発動します。
📖 Claude が読む原文 SKILL.md(中身を展開)
この本文は AI(Claude)が読むための原文(英語または中国語)です。日本語訳は順次追加中。
PufferLib - High-Performance Reinforcement Learning
Overview
PufferLib is a high-performance reinforcement learning library designed for fast parallel environment simulation and training. It achieves training at millions of steps per second through optimized vectorization, native multi-agent support, and efficient PPO implementation (PuffeRL). The library provides the Ocean suite of 20+ environments and seamless integration with Gymnasium, PettingZoo, and specialized RL frameworks.
When to Use This Skill
Use this skill when:
- Training RL agents with PPO on any environment (single or multi-agent)
- Creating custom environments using the PufferEnv API
- Optimizing performance for parallel environment simulation (vectorization)
- Integrating existing environments from Gymnasium, PettingZoo, Atari, Procgen, etc.
- Developing policies with CNN, LSTM, or custom architectures
- Scaling RL to millions of steps per second for faster experimentation
- Multi-agent RL with native multi-agent environment support
Core Capabilities
1. High-Performance Training (PuffeRL)
PuffeRL is PufferLib's optimized PPO+LSTM training algorithm achieving 1M-4M steps/second.
Quick start training:
# CLI training
puffer train procgen-coinrun --train.device cuda --train.learning-rate 3e-4
# Distributed training
torchrun --nproc_per_node=4 train.py
Python training loop:
import pufferlib
from pufferlib import PuffeRL
# Create vectorized environment
env = pufferlib.make('procgen-coinrun', num_envs=256)
# Create trainer
trainer = PuffeRL(
env=env,
policy=my_policy,
device='cuda',
learning_rate=3e-4,
batch_size=32768
)
# Training loop
for iteration in range(num_iterations):
trainer.evaluate() # Collect rollouts
trainer.train() # Train on batch
trainer.mean_and_log() # Log results
For comprehensive training guidance, read references/training.md for:
- Complete training workflow and CLI options
- Hyperparameter tuning with Protein
- Distributed multi-GPU/multi-node training
- Logger integration (Weights & Biases, Neptune)
- Checkpointing and resume training
- Performance optimization tips
- Curriculum learning patterns
2. Environment Development (PufferEnv)
Create custom high-performance environments with the PufferEnv API.
Basic environment structure:
import numpy as np
from pufferlib import PufferEnv
class MyEnvironment(PufferEnv):
def __init__(self, buf=None):
super().__init__(buf)
# Define spaces
self.observation_space = self.make_space((4,))
self.action_space = self.make_discrete(4)
self.reset()
def reset(self):
# Reset state and return initial observation
return np.zeros(4, dtype=np.float32)
def step(self, action):
# Execute action, compute reward, check done
obs = self._get_observation()
reward = self._compute_reward()
done = self._is_done()
info = {}
return obs, reward, done, info
Use the template script: scripts/env_template.py provides complete single-agent and multi-agent environment templates with examples of:
- Different observation space types (vector, image, dict)
- Action space variations (discrete, continuous, multi-discrete)
- Multi-agent environment structure
- Testing utilities
For complete environment development, read references/environments.md for:
- PufferEnv API details and in-place operation patterns
- Observation and action space definitions
- Multi-agent environment creation
- Ocean suite (20+ pre-built environments)
- Performance optimization (Python to C workflow)
- Environment wrappers and best practices
- Debugging and validation techniques
3. Vectorization and Performance
Achieve maximum throughput with optimized parallel simulation.
Vectorization setup:
import pufferlib
# Automatic vectorization
env = pufferlib.make('environment_name', num_envs=256, num_workers=8)
# Performance benchmarks:
# - Pure Python envs: 100k-500k SPS
# - C-based envs: 100M+ SPS
# - With training: 400k-4M total SPS
Key optimizations:
- Shared memory buffers for zero-copy observation passing
- Busy-wait flags instead of pipes/queues
- Surplus environments for async returns
- Multiple environments per worker
For vectorization optimization, read references/vectorization.md for:
- Architecture and performance characteristics
- Worker and batch size configuration
- Serial vs multiprocessing vs async modes
- Shared memory and zero-copy patterns
- Hierarchical vectorization for large scale
- Multi-agent vectorization strategies
- Performance profiling and troubleshooting
4. Policy Development
Build policies as standard PyTorch modules with optional utilities.
Basic policy structure:
import torch.nn as nn
from pufferlib.pytorch import layer_init
class Policy(nn.Module):
def __init__(self, observation_space, action_space):
super().__init__()
# Encoder
self.encoder = nn.Sequential(
layer_init(nn.Linear(obs_dim, 256)),
nn.ReLU(),
layer_init(nn.Linear(256, 256)),
nn.ReLU()
)
# Actor and critic heads
self.actor = layer_init(nn.Linear(256, num_actions), std=0.01)
self.critic = layer_init(nn.Linear(256, 1), std=1.0)
def forward(self, observations):
features = self.encoder(observations)
return self.actor(features), self.critic(features)
For complete policy development, read references/policies.md for:
- CNN policies for image observations
- Recurrent policies with optimized LSTM (3x faster inference)
- Multi-input policies for complex observations
- Continuous action policies
- Multi-agent policies (shared vs independent parameters)
- Advanced architectures (attention, residual)
- Observation normalization and gradient clipping
- Policy debugging and testing
5. Environment Integration
Seamlessly integrate environments from popular RL frameworks.
Gymnasium integration:
import gymnasium as gym
import pufferlib
# Wrap Gymnasium environment
gym_env = gym.make('CartPole-v1')
env = pufferlib.emulate(gym_env, num_envs=256)
# Or use make directly
env = pufferlib.make('gym-CartPole-v1', num_envs=256)
PettingZoo multi-agent:
# Multi-agent environment
env = pufferlib.make('pettingzoo-knights-archers-zombies', num_envs=128)
Supported frameworks:
- Gymnasium / OpenAI Gym
- PettingZoo (parallel and AEC)
- Atari (ALE)
- Procgen
- NetHack / MiniHack
- Minigrid
- Neural MMO
- Crafter
- GPUDrive
- MicroRTS
- Griddly
- And more...
For integration details, read references/integration.md for:
- Complete integration examples for each framework
- Custom wrappers (observation, reward, frame stacking, action repeat)
- Space flattening and unflattening
- Environment registration
- Compatibility patterns
- Performance considerations
- Integration debugging
Quick Start Workflow
For Training Existing Environments
- Choose environment from Ocean suite or compatible framework
- Use
scripts/train_template.pyas starting point - Configure hyperparameters for your task
- Run training with CLI or Python script
- Monitor with Weights & Biases or Neptune
- Refer to
references/training.mdfor optimization
For Creating Custom Environments
- Start with
scripts/env_template.py - Define observation and action spaces
- Implement
reset()andstep()methods - Test environment locally
- Vectorize with
pufferlib.emulate()ormake() - Refer to
references/environments.mdfor advanced patterns - Optimize with
references/vectorization.mdif needed
For Policy Development
- Choose architecture based on observations:
- Vector observations → MLP policy
- Image observations → CNN policy
- Sequential tasks → LSTM policy
- Complex observations → Multi-input policy
- Use
layer_initfor proper weight initialization - Follow patterns in
references/policies.md - Test with environment before full training
For Performance Optimization
- Profile current throughput (steps per second)
- Check vectorization configuration (num_envs, num_workers)
- Optimize environment code (in-place ops, numpy vectorization)
- Consider C implementation for critical paths
- Use
references/vectorization.mdfor systematic optimization
Resources
scripts/
train_template.py - Complete training script template with:
- Environment creation and configuration
- Policy initialization
- Logger integration (WandB, Neptune)
- Training loop with checkpointing
- Command-line argument parsing
- Multi-GPU distributed training setup
env_template.py - Environment implementation templates:
- Single-agent PufferEnv example (grid world)
- Multi-agent PufferEnv example (cooperative navigation)
- Multiple observation/action space patterns
- Testing utilities
references/
training.md - Comprehensive training guide:
- Training workflow and CLI options
- Hyperparameter configuration
- Distributed training (multi-GPU, multi-node)
- Monitoring and logging
- Checkpointing
- Protein hyperparameter tuning
- Performance optimization
- Common training patterns
- Troubleshooting
environments.md - Environment development guide:
- PufferEnv API and characteristics
- Observation and action spaces
- Multi-agent environments
- Ocean suite environments
- Custom environment development workflow
- Python to C optimization path
- Third-party environment integration
- Wrappers and best practices
- Debugging
vectorization.md - Vectorization optimization:
- Architecture and key optimizations
- Vectorization modes (serial, multiprocessing, async)
- Worker and batch configuration
- Shared memory and zero-copy patterns
- Advanced vectorization (hierarchical, custom)
- Multi-agent vectorization
- Performance monitoring and profiling
- Troubleshooting and best practices
policies.md - Policy architecture guide:
- Basic policy structure
- CNN policies for images
- LSTM policies with optimization
- Multi-input policies
- Continuous action policies
- Multi-agent policies
- Advanced architectures (attention, residual)
- Observation processing and unflattening
- Initialization and normalization
- Debugging and testing
integration.md - Framework integration guide:
- Gymnasium integration
- PettingZoo integration (parallel and AEC)
- Third-party environments (Procgen, NetHack, Minigrid, etc.)
- Custom wrappers (observation, reward, frame stacking, etc.)
- Space conversion and unflattening
- Environment registration
- Compatibility patterns
- Performance considerations
- Debugging integration
Tips for Success
-
Start simple: Begin with Ocean environments or Gymnasium integration before creating custom environments
-
Profile early: Measure steps per second from the start to identify bottlenecks
-
Use templates:
scripts/train_template.pyandscripts/env_template.pyprovide solid starting points -
Read references as needed: Each reference file is self-contained and focused on a specific capability
-
Optimize progressively: Start with Python, profile, then optimize critical paths with C if needed
-
Leverage vectorization: PufferLib's vectorization is key to achieving high throughput
-
Monitor training: Use WandB or Neptune to track experiments and identify issues early
-
Test environments: Validate environment logic before scaling up training
-
Check existing environments: Ocean suite provides 20+ pre-built environments
-
Use proper initialization: Always use
layer_initfrompufferlib.pytorchfor policies
Common Use Cases
Training on Standard Benchmarks
# Atari
env = pufferlib.make('atari-pong', num_envs=256)
# Procgen
env = pufferlib.make('procgen-coinrun', num_envs=256)
# Minigrid
env = pufferlib.make('minigrid-empty-8x8', num_envs=256)
Multi-Agent Learning
# PettingZoo
env = pufferlib.make('pettingzoo-pistonball', num_envs=128)
# Shared policy for all agents
policy = create_policy(env.observation_space, env.action_space)
trainer = PuffeRL(env=env, policy=policy)
Custom Task Development
# Create custom environment
class MyTask(PufferEnv):
# ... implement environment ...
# Vectorize and train
env = pufferlib.emulate(MyTask, num_envs=256)
trainer = PuffeRL(env=env, policy=my_policy)
High-Performance Optimization
# Maximize throughput
env = pufferlib.make(
'my-env',
num_envs=1024, # Large batch
num_workers=16, # Many workers
envs_per_worker=64 # Optimize per worker
)
Installation
uv pip install pufferlib
Documentation
- Official docs: https://puffer.ai/docs.html
- GitHub: https://github.com/PufferAI/PufferLib
- Discord: Community support available
同梱ファイル
※ ZIPに含まれるファイル一覧。`SKILL.md` 本体に加え、参考資料・サンプル・スクリプトが入っている場合があります。
- 📄 SKILL.md (13,501 bytes)
- 📎 references/environments.md (12,489 bytes)
- 📎 references/integration.md (15,165 bytes)
- 📎 references/policies.md (19,183 bytes)
- 📎 references/training.md (8,127 bytes)
- 📎 references/vectorization.md (12,962 bytes)
- 📎 scripts/env_template.py (10,157 bytes)
- 📎 scripts/train_template.py (8,027 bytes)