proteinmpnn

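ProteinMPNN is a Skill for designing protein sequences: it designs or redesigns sequences for structures generated with RFdiffusion, optimizes a design while keeping specified positions fixed, and designs sequences for better expression or stability.

📜 Original English description (for reference)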

Design protein sequences using ProteinMPNN inverse folding. Use this skill when: (1) Designing sequences for RFdiffusion backbones, (2) Redesigning existing protein sequences, (3) Fixing specific residues while designing others, (4) Optimizing sequences for expression or stability, (5) Multi-state or negative design. For backbone generation, use rfdiffusion or bindcraft. For ligand-aware design, use ligandmpnn. For solubility optimization, use solublempnn.

⚡ Recommended: install with a single command (60 seconds)

Copy the command below and paste it into a terminal (Mac/Linux) or PowerShell (Windows). It downloads, unzips, and places the files automatically.

🍎 Mac / 🐧 Linux

mkdir -p ~/.claude/skills && cd ~/.claude/skills && curl -L -o proteinmpnn.zip https://jpskill.com/download/9553.zip && unzip -o proteinmpnn.zip && rm proteinmpnn.zip

🪟 Windows (PowerShell)

$d = "$env:USERPROFILE\.claude\skills"; ni -Force -ItemType Directory $d | Out-Null; iwr https://jpskill.com/download/9553.zip -OutFile "$d\proteinmpnn.zip"; Expand-Archive "$d\proteinmpnn.zip" -DestinationPath $d -Force; ri "$d\proteinmpnn.zip"

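When it finishes, restart Claude Code. The Skill then triggers automatically when you ask in plain language, for example "Design sequences for this RFdiffusion backbone".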

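💾 Manual download (if the command line is difficult)

1. Press the blue button below to download proteinmpnn.zip
2. Double-click the ZIP file to extract it, which creates a proteinmpnn folder
3. Move that folder to C:\Users\<your name>\.claude\skills\ (Windows) or ~/.claude/skills/ (Mac)
4. Restart Claude Code

⬇ Download as .zip (recommended) ⬇ .skill format (for advanced users) Original source ↗

⚠️ Download and use at your own risk. This site accepts no responsibility for the content, behavior, or safety.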

🎯 What this Skill can do

Read the description below to see what this Skill does for you. It triggers automatically when you ask Claude for help in this area.

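📦 How to install (3 steps)

1. Press the "Download" button above to get the .skill file
2. Change the file extension from .skill to .zip and extract it (macOS can extract it automatically)
3. Place the extracted folder in .claude/skills/ under your home folder
- macOS / Linux: ~/.claude/skills/
- Windows: %USERPROFILE%\.claude\skills\

Restart Claude Code to finish. You do not need to say "use this Skill"; it is invoked automatically for relevant requests.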

Last updated: 2026-05-18
Retrieved: 2026-05-18
Bundled files: 1

📜 Original SKILL.md (the text Claude reads)

ProteinMPNN Sequence Design

Prerequisites

Requirement	Minimum	Recommended
Python	3.8+	3.10
CUDA	11.0+	11.7+
GPU VRAM	8GB	16GB (T4)
RAM	8GB	16GB

How to run

First time? See Installation Guide to set up Modal and biomodals.

Option 1: Local installation (recommended)

git clone https://github.com/dauparas/ProteinMPNN.git
cd ProteinMPNN

python protein_mpnn_run.py \
  --pdb_path backbone.pdb \
  --out_folder output/ \
  --num_seq_per_target 16 \
  --sampling_temp "0.1"

GPU: T4 (16GB) sufficient | Time: ~50-100 sequences/minute

Option 2: Modal (via LigandMPNN wrapper)

cd biomodals
modal run modal_ligandmpnn.py \
  --pdb-path backbone.pdb \
  --num-seq-per-target 16

Note: LigandMPNN includes ProteinMPNN functionality.

Config Schema

Core Parameters

Parameter	Default	Range	Description
`--pdb_path`	required	path	Single PDB input
`--pdb_path_chains`	all	A,B	Chains to design (comma-sep)
`--out_folder`	required	path	Output directory
`--num_seq_per_target`	1	1-1000	Sequences per structure
`--sampling_temp`	"0.1"	"0.0001-1.0"	Temperature (string!)
`--seed`	0	int	Random seed
`--batch_size`	1	1-32	Batch size
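
As a rough illustration of how the core parameters above fit together, the sketch below assembles an argument list for protein_mpnn_run.py in Python. The helper name build_mpnn_command and its defaults are assumptions made for this example and are not part of ProteinMPNN; only the flag names and ranges come from the table.

```python
import shlex

def build_mpnn_command(pdb_path, out_folder, num_seq=1, temp="0.1",
                       seed=0, batch_size=1, chains=None):
    """Return an argv list for protein_mpnn_run.py (hypothetical helper)."""
    if not 1 <= num_seq <= 1000:
        raise ValueError("num_seq_per_target should be in 1-1000")
    cmd = [
        "python", "protein_mpnn_run.py",
        "--pdb_path", pdb_path,
        "--out_folder", out_folder,
        "--num_seq_per_target", str(num_seq),
        "--sampling_temp", str(temp),  # always handed over as a string
        "--seed", str(seed),
        "--batch_size", str(batch_size),
    ]
    if chains:
        # Comma-separated with no spaces, following the table above.
        cmd += ["--pdb_path_chains", ",".join(chains)]
    return cmd

cmd = build_mpnn_command("backbone.pdb", "output/", num_seq=16, chains=["A", "B"])
print(shlex.join(cmd))
```

Passing the list to subprocess.run(cmd, check=True) avoids shell quoting questions, because each value reaches the script as its own argument.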

Temperature Guide

0.1  -> Low diversity, high recovery (production; default)
0.2  -> Moderate diversity
0.3  -> Higher diversity (exploration)
0.5+ -> Very diverse, lower quality

IMPORTANT: --sampling_temp is read as a string. Pass a single value such as "0.1", or several values as one quoted, space-separated string such as "0.1 0.2".

Common mistakes

Temperature Parameter

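✅ Correct:

--sampling_temp "0.1"      # Single temperature (the shell strips the quotes, so 0.1 is equivalent)
--sampling_temp "0.1 0.2"  # Several temperatures in one quoted, space-separated string

❌ Wrong:

--sampling_temp 0.1 0.2    # Unquoted: the shell passes 0.2 as a separate, unrecognized argument
--sampling_temp 0.1,0.2    # Comma-separated values do not parse as temperatures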

Fixed Positions JSONL

✅ Correct:

{"A": [1, 2, 3, 10, 11], "B": [5, 6]}

❌ Wrong:

{"A": "1,2,3,10,11"}     # String instead of list
{A: [1, 2, 3]}           # Missing quotes on key
{"A": [1,2,3,]}          # Trailing comma
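
Generating the line with a JSON library instead of typing it by hand sidesteps all three mistakes above. The following is a minimal sketch; fixed_positions_line is a hypothetical helper, and it assumes the simplified chain-to-residue-list shape shown in this section.

```python
import json

def fixed_positions_line(positions):
    """Serialize {chain: [residue numbers]} to one JSON line, checking the shape."""
    for chain, residues in positions.items():
        if not isinstance(chain, str):
            raise TypeError(f"chain ID must be a string, got {chain!r}")
        if not isinstance(residues, list) or not all(
            isinstance(r, int) and not isinstance(r, bool) for r in residues
        ):
            raise TypeError(f"chain {chain!r}: expected a list of integers")
    return json.dumps(positions)

line = fixed_positions_line({"A": [1, 2, 3, 10, 11], "B": [5, 6]})
print(line)  # → {"A": [1, 2, 3, 10, 11], "B": [5, 6]}
```

Writing line plus a trailing newline to a file gives one JSONL record; json.loads on the same text is a quick way to confirm a hand-edited file still parses.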

Chain Selection

✅ Correct:

--pdb_path_chains A,B    # No spaces

❌ Wrong:

--pdb_path_chains A, B   # Space after comma
--pdb_path_chains "A,B"  # Quotes may cause issues

Amino Acid Biases

# Bias toward certain AAs (positive = favor)
--bias_AA_jsonl '{"A": {"A": 1.5, "W": -2.0}}'

# Omit specific AAs globally
--omit_AAs "CM"  # No cysteine or methionine

# Per-position omission
--omit_AA_jsonl '{"A": {"1": "C", "2": "CM"}}'

Multi-Chain Design

# Design chains A and B together
--pdb_path_chains A,B

# Tie chains (same sequence)
--tied_positions_jsonl tied.jsonl

Variants Comparison

Variant	Use Case	Key Difference
ProteinMPNN	General	Original model
SolubleMPNN	Expression	Trained on soluble proteins
LigandMPNN	Small molecules	Ligand-aware context

Output format

output/
├── seqs/
│   └── backbone.fa          # FASTA sequences
└── backbone_pdb/
    └── backbone_0001.pdb    # PDBs with designed sequence

FASTA Header Format

>backbone_0001, score=1.234, global_score=1.234, seq_recovery=0.85
MKTAYIAKQRQISFVKSHFSRQLE...
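
For downstream filtering it can help to turn the header into named values. The sketch below is one way to do that, assuming every field after the name has the key=value form with a numeric value, as in the example above; parse_header is a hypothetical helper, and real headers may carry extra fields that need separate handling.

```python
def parse_header(header):
    """Split '>name, key=value, ...' into a name and a dict of floats."""
    name, *fields = [part.strip() for part in header.lstrip(">").split(",")]
    metrics = {}
    for field in fields:
        key, _, value = field.partition("=")
        metrics[key] = float(value)  # assumes numeric values only
    return name, metrics

name, metrics = parse_header(
    ">backbone_0001, score=1.234, global_score=1.234, seq_recovery=0.85"
)
print(name, metrics["score"], metrics["seq_recovery"])
```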

Common workflows

Binder Sequence Design

python protein_mpnn_run.py \
  --pdb_path binder_backbone.pdb \
  --out_folder output/ \
  --num_seq_per_target 16 \
  --sampling_temp "0.1" \
  --pdb_path_chains B  # Design binder chain only

Interface Redesign

# Fix core, design interface
python protein_mpnn_run.py \
  --pdb_path complex.pdb \
  --out_folder output/ \
  --fixed_positions_jsonl core_positions.jsonl \
  --num_seq_per_target 32

Multi-State Design

# Design for multiple conformations
python protein_mpnn_run.py \
  --pdb_path_multi state1.pdb,state2.pdb \
  --out_folder output/ \
  --num_seq_per_target 16

Sample output

Successful run

$ python protein_mpnn_run.py --pdb_path backbone.pdb --out_folder output/ --num_seq_per_target 8
Loading model weights...
Designing sequences for backbone.pdb
Generated 8 sequences in 2.3 seconds

output/seqs/backbone.fa:
>backbone_0001, score=1.234, global_score=1.189, seq_recovery=0.82
MKTAYIAKQRQISFVKSHFSRQLEERGLTKE...
>backbone_0002, score=1.198, global_score=1.156, seq_recovery=0.79
MKTAYIAKQRQISFVKSQFSRQLDERGLTKE...

What good output looks like:

Score: 1.0-2.0 (lower = more confident)
Seq recovery: 0.3-0.6 for de novo, 0.7-0.9 for redesign
Diverse sequences (not all identical) when temp > 0.1
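
These rules of thumb can be applied mechanically after a run. The sketch below is only an illustration: check_designs and its record layout are assumptions, and the thresholds are the guideline values listed above rather than hard limits.

```python
def check_designs(records, temp=0.1):
    """records: list of (score, seq_recovery, sequence). Returns warning strings."""
    warnings = []
    for i, (score, _recovery, _seq) in enumerate(records, start=1):
        if not 1.0 <= score <= 2.0:
            warnings.append(f"design {i}: score {score} outside 1.0-2.0")
    unique = {seq for _, _, seq in records}
    # Identical sequences are only suspicious when sampling above 0.1.
    if temp > 0.1 and len(records) > 1 and len(unique) == 1:
        warnings.append("all sequences identical despite temp > 0.1")
    return warnings

records = [(1.234, 0.82, "MKTAYIAK"), (1.198, 0.79, "MKTAYIAQ")]
print(check_designs(records, temp=0.2))  # → []
```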

Decision tree

Should I use ProteinMPNN?
│
├─ Have a backbone structure?
│  ├─ Yes → Continue below
│  └─ No → Use RFdiffusion first
│
├─ What's in the binding site?
│  ├─ Nothing / protein only → ProteinMPNN ✓
│  ├─ Small molecule / ligand → Use LigandMPNN
│  └─ Metal / cofactor → Use LigandMPNN
│
├─ Priority?
│  ├─ Solubility/expression → Consider SolubleMPNN
│  ├─ Speed → ProteinMPNN ✓
│  └─ AF2 optimization → Consider ColabDesign
│
└─ Need fixed positions?
   ├─ Yes → Use --fixed_positions_jsonl
   └─ No → ProteinMPNN ✓ (design all)
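
The same tree can be written as a small function, which is convenient when routing many design jobs from a script. This is a sketch only: choose_tool and the string labels for binding_site and priority are invented for the example.

```python
def choose_tool(has_backbone, binding_site="protein", priority="speed"):
    """Follow the decision tree top to bottom and return a tool name."""
    if not has_backbone:
        return "RFdiffusion"   # generate a backbone first
    if binding_site in ("ligand", "metal"):
        return "LigandMPNN"    # small molecule, metal, or cofactor present
    if priority == "solubility":
        return "SolubleMPNN"
    if priority == "af2":
        return "ColabDesign"
    return "ProteinMPNN"       # protein-only site, speed is the priority

print(choose_tool(True, binding_site="ligand"))  # → LigandMPNN
```

Fixed positions do not change the tool choice; they only add --fixed_positions_jsonl to the ProteinMPNN call.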

Typical performance

Campaign Size	Time (T4)	Cost (Modal)	Notes
100 backbones × 8 seq	15-20 min	~$2	Standard
500 backbones × 8 seq	1-1.5h	~$8	Large campaign
1000 backbones × 16 seq	3-4h	~$18	Comprehensive

Throughput: ~50-100 sequences/minute on T4 GPU.
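
The throughput figure gives a quick way to size a campaign before launching it. The sketch below does that arithmetic; estimate_minutes is a hypothetical helper, and it counts sampling time only, so the table's figures may differ where model loading and I/O are included.

```python
def estimate_minutes(backbones, seqs_per_backbone, low=50, high=100):
    """Return (best case, worst case) minutes at low-high sequences per minute."""
    total = backbones * seqs_per_backbone
    return total / high, total / low

fast, slow = estimate_minutes(100, 8)
print(f"{fast:.0f}-{slow:.0f} min")  # → 8-16 min
```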

Verify

grep -c "^>" output/seqs/*.fa        # Per-file counts; each should match num_seq_per_target
cat output/seqs/*.fa | grep -c "^>"  # Total; should match backbone_count × num_seq_per_target
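
The same check can be run from Python when the counts feed into a pipeline. A minimal sketch, assuming the output layout shown earlier (one .fa file per backbone under output/seqs/); count_records is a hypothetical helper.

```python
from pathlib import Path

def count_records(seq_dir):
    """Map each .fa file in seq_dir to its number of '>' header lines."""
    counts = {}
    for fasta in sorted(Path(seq_dir).glob("*.fa")):
        with fasta.open() as handle:
            counts[fasta.name] = sum(1 for line in handle if line.startswith(">"))
    return counts

# Example: flag any backbone whose count differs from the requested number.
# expected = 16
# bad = {name: n for name, n in count_records("output/seqs").items() if n != expected}
```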

Troubleshooting

Low sequence diversity: Increase sampling_temp to 0.2-0.3
Poor recovery: Decrease sampling_temp to 0.1
OOM errors: Reduce batch_size
Unwanted cysteines: Use --omit_AAs "C"

Error interpretation

Error	Cause	Fix
`RuntimeError: CUDA out of memory`	Long protein or large batch	Reduce batch_size or use larger GPU
`KeyError: 'A'`	Chain not in PDB	Check chain IDs in your PDB file
`JSONDecodeError`	Invalid JSONL format	Validate JSON syntax (see Common Mistakes)
`IndexError: list index`	Empty chain or residue list	Check PDB has atoms, not just HEADER
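
Two of the errors above (a chain missing from the PDB, or a file with no atoms) can be caught before launching a run. The sketch below reads chain IDs straight from the ATOM/HETATM records, where the chain identifier sits in column 22 of the PDB format; chains_in_pdb and missing_chains are hypothetical helper names.

```python
def chains_in_pdb(pdb_text):
    """Collect chain IDs from ATOM/HETATM records (chain ID is column 22)."""
    return {
        line[21]
        for line in pdb_text.splitlines()
        if line.startswith(("ATOM", "HETATM")) and len(line) > 21
    }

def missing_chains(pdb_text, requested):
    """Return the requested chain IDs that do not appear in the file, sorted."""
    return sorted(set(requested) - chains_in_pdb(pdb_text))

# Example (assumes backbone.pdb exists in the working directory):
# with open("backbone.pdb") as handle:
#     print(missing_chains(handle.read(), ["A", "B"]))
```

An empty result from chains_in_pdb means the file has no atom records at all, which matches the IndexError row.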

Next: Structure prediction for validation → protein-qc for filtering.