alphafold-database
Access AlphaFold's 200M+ AI-predicted protein structures. Retrieve structures by UniProt ID, download PDB/mmCIF files, analyze confidence metrics (pLDDT, PAE), for drug discovery and structural biology.
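Copy the command for your system below (the first line for a macOS/Linux terminal, the second for Windows PowerShell) and paste it in. Download, extraction, and placement are fully automated.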
mkdir -p ~/.claude/skills && cd ~/.claude/skills && curl -L -o alphafold-database.zip https://jpskill.com/download/18338.zip && unzip -o alphafold-database.zip && rm alphafold-database.zip
$d = "$env:USERPROFILE\.claude\skills"; ni -Force -ItemType Directory $d | Out-Null; iwr https://jpskill.com/download/18338.zip -OutFile "$d\alphafold-database.zip"; Expand-Archive "$d\alphafold-database.zip" -DestinationPath $d -Force; ri "$d\alphafold-database.zip"
When it finishes, restart Claude Code. After that, simply make a related request in plain language and the Skill activates automatically.
💾 Manual download (if the commands are difficult)
- 1. Download alphafold-database.zip
- 2. Double-click the ZIP file to extract it; an alphafold-database folder is created
- 3. Move that folder to C:\Users\<your name>\.claude\skills\ (Windows) or ~/.claude/skills/ (Mac)
- 4. Restart Claude Code
⚠️ Download and use at your own risk. This site accepts no responsibility for the content, behavior, or safety of this Skill.
🎯 What this Skill can do
The description below explains what this Skill does for you. It activates automatically when you ask Claude for help in this area.
📦 Installation (3 steps)
- 1. Download the .skill file
- 2. Change the file extension from .skill to .zip and extract it (macOS can extract it automatically)
- 3. Place the extracted folder in .claude/skills/ under your home folder:
  - · macOS / Linux: ~/.claude/skills/
  - · Windows: %USERPROFILE%\.claude\skills\
Restart Claude Code and you are done. You do not need to say "use this Skill..."; it is invoked automatically for related requests.
- Last updated: 2026-05-18
- Retrieved: 2026-05-18
- Bundled files: 2
📜 Original SKILL.md (the text Claude reads)
AlphaFold Database
Overview
AlphaFold DB is a public repository of AI-predicted 3D protein structures for over 200 million proteins, maintained by DeepMind and EMBL-EBI. Access structure predictions with confidence metrics, download coordinate files, retrieve bulk datasets, and integrate predictions into computational workflows.
When to Use This Skill
This skill should be used when working with AI-predicted protein structures in scenarios such as:
- Retrieving protein structure predictions by UniProt ID or protein name
- Downloading PDB/mmCIF coordinate files for structural analysis
- Analyzing prediction confidence metrics (pLDDT, PAE) to assess reliability
- Accessing bulk proteome datasets via Google Cloud Platform
- Comparing predicted structures with experimental data
- Performing structure-based drug discovery or protein engineering
- Building structural models for proteins lacking experimental structures
- Integrating AlphaFold predictions into computational pipelines
Core Capabilities
1. Searching and Retrieving Predictions
Using Biopython (Recommended):
The Biopython library provides the simplest interface for retrieving AlphaFold structures:
import os
from Bio.PDB import MMCIFParser, alphafold_db
# Get all predictions for a UniProt accession
predictions = list(alphafold_db.get_predictions("P00520"))
# Download structure files (mmCIF format); make sure the target directory exists
os.makedirs("./structures", exist_ok=True)
for prediction in predictions:
    cif_file = alphafold_db.download_cif_for(prediction, directory="./structures")
    print(f"Downloaded: {cif_file}")
# Get parsed Structure objects directly (this downloads the mmCIF files as a side effect)
structures = list(
    alphafold_db.get_structural_models_for("P00520", mmcif_parser=MMCIFParser(QUIET=True))
)
Direct API Access:
Query predictions using REST endpoints:
import requests
# Get prediction metadata for a UniProt accession
uniprot_id = "P00520"
api_url = f"https://alphafold.ebi.ac.uk/api/prediction/{uniprot_id}"
response = requests.get(api_url)
prediction_data = response.json()
# Extract AlphaFold ID
alphafold_id = prediction_data[0]['entryId']
print(f"AlphaFold ID: {alphafold_id}")
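Each record in that response also carries direct file URLs. A pipeline can follow those URLs instead of hardcoding the v4 suffix, so it keeps working with whatever version the database currently serves. The sketch below uses only the standard library. fetch_prediction_records and file_urls_for are hypothetical helper names. The cifUrl, pdbUrl and paeDocUrl field names are assumptions about the current response schema, so check them against the API documentation.

```python
import json
import urllib.request
from typing import Optional

API_URL = "https://alphafold.ebi.ac.uk/api/prediction/{accession}"


def fetch_prediction_records(accession: str, timeout: float = 30.0) -> list:
    """Return the list of prediction records for one UniProt accession."""
    url = API_URL.format(accession=accession)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)


def file_urls_for(records: list, entry_id: Optional[str] = None) -> dict:
    """Pick a record (the first one, or the one whose entryId matches)
    and return its file URLs; missing fields come back as None."""
    if not records:
        raise ValueError("no AlphaFold prediction records")
    if entry_id is None:
        record = records[0]
    else:
        matches = [r for r in records if r.get("entryId") == entry_id]
        if not matches:
            raise KeyError(entry_id)
        record = matches[0]
    # Field names below are assumptions about the API response schema.
    return {
        "entryId": record.get("entryId"),
        "cif": record.get("cifUrl"),
        "pdb": record.get("pdbUrl"),
        "pae": record.get("paeDocUrl"),
    }
```

For example, file_urls_for(fetch_prediction_records("P00520"))["cif"] should give the mmCIF URL of the first prediction for that accession.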
Using UniProt to Find Accessions:
Search UniProt to find protein accessions first. Older examples used the legacy uploadlists endpoint on www.uniprot.org, which has been retired. The current UniProt REST API at rest.uniprot.org offers a search endpoint instead:
import urllib.parse, urllib.request
def get_uniprot_ids(query, size=25):
    """Query the UniProtKB search endpoint and return matching accession IDs"""
    url = 'https://rest.uniprot.org/uniprotkb/search'
    params = {
        'query': query,
        'format': 'list',  # one accession per line
        'size': size,
    }
    full_url = f"{url}?{urllib.parse.urlencode(params)}"
    with urllib.request.urlopen(full_url) as response:
        return response.read().decode('utf-8').splitlines()
# Example: reviewed human entries for the beta-globin gene (HBB).
# Note that "hemoglobin" is a protein name, not a gene symbol.
protein_ids = get_uniprot_ids("gene:HBB AND organism_id:9606 AND reviewed:true")
2. Downloading Structure Files
AlphaFold provides multiple file formats for each prediction:
File Types Available:
- Model coordinates (model_v4.cif): Atomic coordinates in mmCIF/PDBx format
- Confidence scores (confidence_v4.json): Per-residue pLDDT scores (0-100)
- Predicted Aligned Error (predicted_aligned_error_v4.json): PAE matrix for residue pair confidence
Download URLs:
import requests
alphafold_id = "AF-P00520-F1"
version = "v4"
# Model coordinates (mmCIF)
model_url = f"https://alphafold.ebi.ac.uk/files/{alphafold_id}-model_{version}.cif"
response = requests.get(model_url, timeout=60)
response.raise_for_status()
with open(f"{alphafold_id}.cif", "wb") as f:
    f.write(response.content)
# Confidence scores (JSON): a dict of per-residue lists such as 'confidenceScore'
confidence_url = f"https://alphafold.ebi.ac.uk/files/{alphafold_id}-confidence_{version}.json"
response = requests.get(confidence_url, timeout=60)
response.raise_for_status()
confidence_data = response.json()
# Predicted Aligned Error (JSON): a one-element list whose dict holds the matrix
pae_url = f"https://alphafold.ebi.ac.uk/files/{alphafold_id}-predicted_aligned_error_{version}.json"
response = requests.get(pae_url, timeout=60)
response.raise_for_status()
pae_data = response.json()
PDB Format (Alternative):
# Download as PDB format instead of mmCIF (reuses alphafold_id and version from above)
pdb_url = f"https://alphafold.ebi.ac.uk/files/{alphafold_id}-model_{version}.pdb"
response = requests.get(pdb_url, timeout=60)
response.raise_for_status()
with open(f"{alphafold_id}.pdb", "wb") as f:
    f.write(response.content)
3. Working with Confidence Metrics
AlphaFold predictions include confidence estimates critical for interpretation:
pLDDT (per-residue confidence):
import requests
# Load confidence scores
alphafold_id = "AF-P00520-F1"
confidence_url = f"https://alphafold.ebi.ac.uk/files/{alphafold_id}-confidence_v4.json"
confidence = requests.get(confidence_url).json()
# Extract pLDDT scores
plddt_scores = confidence['confidenceScore']
# Interpret confidence levels
# pLDDT > 90: Very high confidence
# pLDDT 70-90: High confidence
# pLDDT 50-70: Low confidence
# pLDDT < 50: Very low confidence
high_confidence_residues = [i for i, score in enumerate(plddt_scores) if score > 90]
print(f"High confidence residues: {len(high_confidence_residues)}/{len(plddt_scores)}")
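The same thresholds can be turned into a per-protein summary, which helps when comparing many predictions. This is a minimal sketch. plddt_band and plddt_band_fractions are hypothetical names. The thresholds above leave the boundaries open, so assigning scores of exactly 90, 70 or 50 to the lower band is an assumption.

```python
from collections import Counter

BANDS = ("very_high", "high", "low", "very_low")


def plddt_band(score: float) -> str:
    """Map one pLDDT score to the bands listed above.
    Scores exactly on a boundary fall into the lower band (an assumption)."""
    if score > 90:
        return "very_high"
    if score > 70:
        return "high"
    if score > 50:
        return "low"
    return "very_low"


def plddt_band_fractions(scores) -> dict:
    """Fraction of residues in each band; bands with no residues get 0.0."""
    scores = list(scores)
    if not scores:
        raise ValueError("empty pLDDT score list")
    counts = Counter(plddt_band(s) for s in scores)
    return {band: counts[band] / len(scores) for band in BANDS}
```

Calling plddt_band_fractions(plddt_scores) on the list loaded above gives one small dict per protein, which is easy to collect into a table.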
PAE (Predicted Aligned Error):
PAE indicates confidence in relative domain positions:
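import numpy as np
import matplotlib.pyplot as plt
import requests
# Load PAE matrix (reuses alphafold_id from the pLDDT example above)
pae_url = f"https://alphafold.ebi.ac.uk/files/{alphafold_id}-predicted_aligned_error_v4.json"
pae = requests.get(pae_url, timeout=60).json()
# Visualize PAE matrix; v4 files hold a one-element list whose dict contains the matrix
pae_matrix = np.array(pae[0]['predicted_aligned_error'])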
plt.figure(figsize=(10, 8))
plt.imshow(pae_matrix, cmap='viridis_r', vmin=0, vmax=30)
plt.colorbar(label='PAE (Å)')
plt.title(f'Predicted Aligned Error: {alphafold_id}')
plt.xlabel('Residue')
plt.ylabel('Residue')
plt.savefig(f'{alphafold_id}_pae.png', dpi=300, bbox_inches='tight')
# Low PAE values (<5 Å) indicate confident relative positioning
# High PAE values (>15 Å) suggest uncertain domain arrangements
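PAE is reported per residue pair. One way to judge whether two domains have a confident relative placement is to average the PAE block between them. The sketch below assumes the v4 file layout, where the matrix is pae[0]['predicted_aligned_error'] stored as a list of rows. mean_pae_between is a hypothetical helper. You supply the domain boundaries yourself (for example, from a domain annotation); the PAE file does not contain them.

```python
def mean_pae_between(pae_matrix, domain_a, domain_b) -> float:
    """Mean PAE (Å) between two residue ranges given as 1-based, inclusive
    (start, end) tuples. PAE is not symmetric, so both off-diagonal
    blocks (a against b, and b against a) are averaged together."""
    rows_a = range(domain_a[0] - 1, domain_a[1])
    rows_b = range(domain_b[0] - 1, domain_b[1])
    values = [pae_matrix[i][j] for i in rows_a for j in rows_b]
    values += [pae_matrix[j][i] for i in rows_a for j in rows_b]
    if not values:
        raise ValueError("empty residue range")
    return sum(values) / len(values)
```

For example, mean_pae_between(pae[0]['predicted_aligned_error'], (1, 120), (130, 400)) uses hypothetical domain boundaries. Read the result against the thresholds in the comments above: below about 5 Å suggests a confident arrangement, and above 15 Å an uncertain one.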
4. Bulk Data Access via Google Cloud
For large-scale analyses, use Google Cloud datasets:
Google Cloud Storage:
# Install gsutil
uv pip install gsutil
# List available data
gsutil ls gs://public-datasets-deepmind-alphafold-v4/
# Download entire proteomes (by taxonomy ID)
gsutil -m cp gs://public-datasets-deepmind-alphafold-v4/proteomes/proteome-tax_id-9606-*.tar .
# Download specific files
gsutil cp gs://public-datasets-deepmind-alphafold-v4/accession_ids.csv .
BigQuery Metadata Access:
from google.cloud import bigquery
# Initialize client
client = bigquery.Client()
# Query metadata
query = """
SELECT
entryId,
uniprotAccession,
organismScientificName,
globalMetricValue,
fractionPlddtVeryHigh
FROM `bigquery-public-data.deepmind_alphafold.metadata`
WHERE organismScientificName = 'Homo sapiens'
AND fractionPlddtVeryHigh > 0.8
LIMIT 100
"""
results = client.query(query).to_dataframe()
print(f"Found {len(results)} high-confidence human proteins")
Download by Species:
import os
import subprocess
def download_proteome(taxonomy_id, output_dir="./proteomes"):
    """Download all AlphaFold predictions for a species"""
    # gsutil cp with several source files needs an existing destination directory
    os.makedirs(output_dir, exist_ok=True)
    pattern = f"gs://public-datasets-deepmind-alphafold-v4/proteomes/proteome-tax_id-{taxonomy_id}-*_v4.tar"
    # Pass arguments as a list so gsutil, not the local shell, expands the wildcard
    subprocess.run(["gsutil", "-m", "cp", pattern, output_dir], check=True)
# Download E. coli K-12 proteome (tax ID: 83333)
download_proteome(83333)
# Download human proteome (tax ID: 9606)
download_proteome(9606)
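Each proteome download is a tar archive with per-entry files inside. The sketch below pulls out only one file type without unpacking everything. extract_members is a hypothetical helper. The default .cif.gz suffix is an assumption about how the members are named, so list an archive with tar -tf first to confirm.

```python
import os
import shutil
import tarfile


def extract_members(tar_path, suffix=".cif.gz", output_dir="./extracted"):
    """Extract only the regular files whose names end with `suffix`.
    Returns the names of the extracted archive members."""
    os.makedirs(output_dir, exist_ok=True)
    extracted = []
    with tarfile.open(tar_path) as archive:
        for member in archive.getmembers():
            if not (member.isfile() and member.name.endswith(suffix)):
                continue
            # Keep only the base name so member paths cannot escape output_dir.
            target = os.path.join(output_dir, os.path.basename(member.name))
            with archive.extractfile(member) as source, open(target, "wb") as out:
                shutil.copyfileobj(source, out)
            extracted.append(member.name)
    return extracted
```

This streams one member at a time, so memory use stays small even for large proteome archives.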
5. Parsing and Analyzing Structures
Work with downloaded AlphaFold structures using Biopython:
from Bio.PDB import MMCIFParser
import numpy as np
from scipy.spatial.distance import pdist, squareform
# Parse mmCIF file
parser = MMCIFParser(QUIET=True)
structure = parser.get_structure("protein", "AF-P00520-F1-model_v4.cif")
# Extract alpha-carbon coordinates
coords = []
for model in structure:
    for chain in model:
        for residue in chain:
            if 'CA' in residue:  # Alpha carbons only
                coords.append(residue['CA'].get_coord())
coords = np.array(coords)
print(f"Structure has {len(coords)} residues")
# Calculate pairwise CA-CA distances
distance_matrix = squareform(pdist(coords))
# Identify contacts (< 8 Å); each pair appears twice in the symmetric matrix
contacts = np.where((distance_matrix > 0) & (distance_matrix < 8))
print(f"Number of contacts: {len(contacts[0]) // 2}")
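Consecutive CA atoms sit about 3.8 Å apart, so the count above also includes every pair of chain neighbours. A common refinement is to ignore pairs that are close in sequence. The sketch below does that in plain Python. count_ca_contacts is a hypothetical helper, and the 3-residue minimum separation is an arbitrary illustrative default; stricter long-range definitions use larger gaps.

```python
import math


def count_ca_contacts(coords, cutoff=8.0, min_separation=3):
    """Count residue pairs whose CA atoms are closer than `cutoff` Å,
    skipping pairs fewer than `min_separation` positions apart in sequence."""
    if min_separation < 1:
        raise ValueError("min_separation must be at least 1")
    n = len(coords)
    contacts = 0
    for i in range(n):
        for j in range(i + min_separation, n):
            if math.dist(coords[i], coords[j]) < cutoff:
                contacts += 1
    return contacts
```

Pass it coords.tolist() from the block above. Because it loops over all pairs in pure Python, the NumPy distance matrix stays the faster option for long proteins.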
Extract B-factors (pLDDT values):
AlphaFold stores pLDDT scores in the B-factor column:
from Bio.PDB import MMCIFParser
parser = MMCIFParser(QUIET=True)
structure = parser.get_structure("protein", "AF-P00520-F1-model_v4.cif")
# Extract pLDDT from B-factors
plddt_scores = []
for model in structure:
    for chain in model:
        for residue in chain:
            if 'CA' in residue:
                plddt_scores.append(residue['CA'].get_bfactor())
# Identify high-confidence residues (1-based positions)
high_conf_regions = [(i, score) for i, score in enumerate(plddt_scores, 1) if score > 90]
print(f"High confidence residues: {len(high_conf_regions)}")
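The list above flags single residues, but contiguous stretches of confident residues are often more useful. The sketch below groups the scores into runs. high_confidence_segments is a hypothetical helper, and the minimum run length of 5 is an arbitrary example value. It also assumes list positions match residue numbers, which holds when the chain has no gaps.

```python
def high_confidence_segments(plddt_scores, threshold=90.0, min_length=5):
    """Return 1-based, inclusive (start, end) ranges in which every score
    is above `threshold` and the run is at least `min_length` residues."""
    scores = list(plddt_scores)
    segments = []
    start = None
    for position, score in enumerate(scores, start=1):
        if score > threshold:
            if start is None:
                start = position  # a new run begins here
        elif start is not None:
            if position - start >= min_length:  # run covered start..position-1
                segments.append((start, position - 1))
            start = None
    # Close a run that reaches the last residue.
    if start is not None and len(scores) - start + 1 >= min_length:
        segments.append((start, len(scores)))
    return segments
```

Calling high_confidence_segments(plddt_scores) on the B-factor list above returns ranges that can be cut out of the model or compared with domain annotations.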
6. Batch Processing Multiple Proteins
Process multiple predictions efficiently:
import os
import numpy as np
import pandas as pd
import requests
from Bio.PDB import alphafold_db
uniprot_ids = ["P00520", "P12931", "P04637"]  # Multiple proteins
results = []
os.makedirs("./batch_structures", exist_ok=True)
for uniprot_id in uniprot_ids:
    try:
        # Get prediction
        predictions = list(alphafold_db.get_predictions(uniprot_id))
        if predictions:
            pred = predictions[0]
            # Download structure
            cif_file = alphafold_db.download_cif_for(pred, directory="./batch_structures")
            # Get confidence data
            alphafold_id = pred['entryId']
            conf_url = f"https://alphafold.ebi.ac.uk/files/{alphafold_id}-confidence_v4.json"
            conf_data = requests.get(conf_url, timeout=60).json()
            # Calculate statistics
            plddt_scores = conf_data['confidenceScore']
            avg_plddt = np.mean(plddt_scores)
            high_conf_fraction = sum(1 for s in plddt_scores if s > 90) / len(plddt_scores)
            results.append({
                'uniprot_id': uniprot_id,
                'alphafold_id': alphafold_id,
                'avg_plddt': avg_plddt,
                'high_conf_fraction': high_conf_fraction,
                'length': len(plddt_scores)
            })
    except Exception as e:
        print(f"Error processing {uniprot_id}: {e}")
# Create summary DataFrame
df = pd.DataFrame(results)
print(df)
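The loop above sends one request per protein and loses any entry whose request fails. For longer ID lists, retrying transient errors with a growing pause drops fewer proteins and is gentler on the service. This sketch wraps any fetch function. fetch_with_retries is a hypothetical helper, and the attempt count and delays are arbitrary defaults, not documented limits of the AlphaFold DB API.

```python
import time


def fetch_with_retries(fetch, url, attempts=3, backoff=1.0):
    """Call fetch(url); on an exception wait backoff * 2**attempt seconds
    and try again, re-raising the last error once attempts run out."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return fetch(url)
        except Exception:
            if attempt == attempts - 1:
                raise
            time.sleep(backoff * 2 ** attempt)
```

Inside the loop you could write conf_data = fetch_with_retries(get_json, conf_url), where get_json is a small function that calls requests.get, then raise_for_status(), then returns .json(). That makes HTTP errors retryable too. In real use, narrow the except clause (for example to requests.RequestException) so programming errors are not retried.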
Installation and Setup
Python Libraries
# Install Biopython for structure access
uv pip install biopython
# Install requests for API access
uv pip install requests
# For visualization and analysis
uv pip install numpy matplotlib pandas scipy
# For Google Cloud access (optional); recent google-cloud-bigquery releases need db-dtypes for to_dataframe()
uv pip install google-cloud-bigquery db-dtypes gsutil
3D-Beacons API Alternative
AlphaFold can also be accessed via the 3D-Beacons federated API:
import requests
# Query via 3D-Beacons
uniprot_id = "P00520"
url = f"https://www.ebi.ac.uk/pdbe/pdbe-kb/3dbeacons/api/uniprot/summary/{uniprot_id}.json"
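response = requests.get(url, timeout=60)
response.raise_for_status()
data = response.json()
# Filter for AlphaFold structures; the v2 API nests each model's fields under 'summary'
af_structures = [s for s in data['structures'] if s.get('summary', s).get('provider') == 'AlphaFold DB']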
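3D-Beacons aggregates models from several providers, so it helps to see at a glance which part of the sequence each model covers. The sketch assumes the v2 summary layout, where each entry of structures nests its fields under summary with keys such as provider, model_identifier, uniprot_start and uniprot_end. These key names are assumptions to verify against a live response, and beacon_models is a hypothetical helper.

```python
def beacon_models(data, provider=None):
    """Flatten a 3D-Beacons summary response into
    (provider, model_identifier, uniprot_start, uniprot_end) tuples,
    optionally keeping only one provider."""
    rows = []
    for entry in data.get("structures", []):
        summary = entry.get("summary", entry)  # tolerate flat records too
        if provider is not None and summary.get("provider") != provider:
            continue
        rows.append((
            summary.get("provider"),
            summary.get("model_identifier"),
            summary.get("uniprot_start"),
            summary.get("uniprot_end"),
        ))
    return rows
```

With data from the request above, beacon_models(data, provider="AlphaFold DB") lists only the AlphaFold models and the residue ranges they cover.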
Common Use Cases
Structural Proteomics
- Download complete proteome predictions for analysis
- Identify high-confidence structural regions across proteins
- Compare predicted structures with experimental data
- Build structural models for protein families
Drug Discovery
- Retrieve target protein structures for docking studies
- Analyze binding site conformations
- Identify druggable pockets in predicted structures
- Compare structures across homologs
Protein Engineering
- Identify stable/unstable regions using pLDDT
- Design mutations in high-confidence regions
- Analyze domain architectures using PAE
- Model protein variants and mutations
Evolutionary Studies
- Compare ortholog structures across species
- Analyze conservation of structural features
- Study domain evolution patterns
- Identify functionally important regions
Key Concepts
UniProt Accession: Primary identifier for proteins (e.g., "P00520"). Required for querying AlphaFold DB.
AlphaFold ID: Internal identifier format: AF-[UniProt accession]-F[fragment number] (e.g., "AF-P00520-F1").
pLDDT (predicted Local Distance Difference Test): Per-residue confidence metric (0-100). Higher values indicate more confident predictions.
PAE (Predicted Aligned Error): Matrix indicating confidence in relative positions between residue pairs. Low values (<5 Å) suggest confident relative positioning.
Database Version: Current version is v4. File URLs include version suffix (e.g., model_v4.cif).
Fragment Number: Large proteins may be split into fragments. Fragment number appears in AlphaFold ID (e.g., F1, F2).
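The AlphaFold ID encodes both the accession and the fragment, so it can be split back apart, for example to join downloaded files to UniProt metadata. This is a minimal sketch. parse_alphafold_id and model_file_name are hypothetical helpers. The accession part is matched loosely (uppercase letters and digits) rather than with UniProt's full accession grammar. The file name pattern follows the download URLs shown earlier.

```python
import re

# AF-<UniProt accession>-F<fragment number>, e.g. "AF-P00520-F1"
AF_ID_PATTERN = re.compile(r"^AF-([A-Z0-9]+)-F(\d+)$")


def parse_alphafold_id(alphafold_id: str) -> tuple:
    """Split an AlphaFold ID into (UniProt accession, fragment number)."""
    match = AF_ID_PATTERN.match(alphafold_id)
    if match is None:
        raise ValueError(f"not an AlphaFold ID: {alphafold_id!r}")
    return match.group(1), int(match.group(2))


def model_file_name(alphafold_id: str, version: str = "v4", extension: str = "cif") -> str:
    """Build a model file name such as AF-P00520-F1-model_v4.cif."""
    parse_alphafold_id(alphafold_id)  # validate the ID first
    return f"{alphafold_id}-model_{version}.{extension}"
```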
Confidence Interpretation Guidelines
pLDDT Thresholds:
- >90: Very high confidence - suitable for detailed analysis
- 70-90: High confidence - generally reliable backbone structure
- 50-70: Low confidence - use with caution, flexible regions
- <50: Very low confidence - likely disordered or unreliable
PAE Guidelines:
- <5 Å: Confident relative positioning of domains
- 5-10 Å: Moderate confidence in arrangement
- >15 Å: Uncertain relative positions, domains may be mobile
Resources
references/api_reference.md
Comprehensive API documentation covering:
- Complete REST API endpoint specifications
- File format details and data schemas
- Google Cloud dataset structure and access patterns
- Advanced query examples and batch processing strategies
- Rate limiting, caching, and best practices
- Troubleshooting common issues
Consult this reference for detailed API information, bulk download strategies, or when working with large-scale datasets.
Important Notes
Data Usage and Attribution
- AlphaFold DB is freely available under CC-BY-4.0 license
- Cite: Jumper et al. (2021) Nature and Varadi et al. (2022) Nucleic Acids Research
- Predictions are computational models, not experimental structures
- Always assess confidence metrics before downstream analysis
Version Management
- Current database version: v4 (as of 2024-2025)
- File URLs include version suffix (e.g., _v4.cif)
- Check for database updates regularly
- Older versions may be deprecated over time
Data Quality Considerations
- High pLDDT doesn't guarantee functional accuracy
- Low confidence regions may be disordered in vivo
- PAE indicates relative domain confidence, not absolute positioning
- Predictions lack ligands, post-translational modifications, and cofactors
- Multi-chain complexes are not predicted (single chains only)
Performance Tips
- Use Biopython for simple single-protein access
- Use Google Cloud for bulk downloads (much faster than individual files)
- Cache downloaded files locally to avoid repeated downloads
- BigQuery free tier: 1 TB processed data per month
- Consider network bandwidth for large-scale downloads
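For the caching tip, a small helper that downloads a file once and reuses the local copy afterwards is often enough. cached_download is a hypothetical name. The cache key is simply the last path component of the URL, which is unique for AlphaFold DB file names but not for URLs in general. The helper writes to a temporary .part file first, so an interrupted download is never mistaken for a cached one.

```python
import os
import shutil
import urllib.request


def cached_download(url, cache_dir="./alphafold_cache", timeout=60):
    """Return a local path for `url`, downloading it only if not yet cached."""
    os.makedirs(cache_dir, exist_ok=True)
    path = os.path.join(cache_dir, url.rstrip("/").rsplit("/", 1)[-1])
    if os.path.exists(path):
        return path  # cache hit: no network access
    tmp_path = path + ".part"
    with urllib.request.urlopen(url, timeout=timeout) as response, open(tmp_path, "wb") as out:
        shutil.copyfileobj(response, out)
    os.replace(tmp_path, path)  # only complete downloads become cache entries
    return path
```

Calling cached_download on the same model URL twice should hit the network only the first time.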
Additional Resources
- AlphaFold DB Website: https://alphafold.ebi.ac.uk/
- API Documentation: https://alphafold.ebi.ac.uk/api-docs
- Google Cloud Dataset: https://cloud.google.com/blog/products/ai-machine-learning/alphafold-protein-structure-database
- 3D-Beacons API: https://www.ebi.ac.uk/pdbe/pdbe-kb/3dbeacons/
- AlphaFold Papers:
- Nature (2021): https://doi.org/10.1038/s41586-021-03819-2
- Nucleic Acids Research (2024): https://doi.org/10.1093/nar/gkad1011
- Biopython Documentation: https://biopython.org/docs/dev/api/Bio.PDB.alphafold_db.html
- GitHub Repository: https://github.com/google-deepmind/alphafold
Bundled files
※ Files included in the ZIP. Besides SKILL.md itself, a Skill may also include reference material, samples, or scripts.
- 📄 SKILL.md (15,806 bytes)
- 📎 references/api_reference.md (12,557 bytes)