📦 Claw Ancestry Pca
A Skill that analyses an individual's ancestry in detail using PCA, based on data from the Simons Genome Diversity Project.
📜 Original English description (for reference)
Ancestry decomposition PCA against the Simons Genome Diversity Project
🇯🇵 Notes for Japanese creators
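※ Supplementary notes added by the jpskill.com editorial team for Japanese business settings. They are reference information, independent of how the Skill itself behaves.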
⚠️ Download and use at your own risk. This site accepts no responsibility for the content, behaviour, or safety of this Skill.
🎯 What this Skill can do
Read the description below to see what this Skill can do for you. It activates automatically whenever you ask Claude for something in this area.
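📦 Installation (3 steps)
- 1. Click the "Download" button above to get the .skill file
- 2. Change the file extension from .skill to .zip and extract it (macOS can extract it automatically)
- 3. Place the extracted folder in .claude/skills/ under your home folder:
  - · macOS / Linux: ~/.claude/skills/
  - · Windows: %USERPROFILE%\.claude\skills\
Restart Claude Code and you're done. You don't need to say "use this Skill..."; it is invoked automatically for relevant requests.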
- Last updated: 2026-05-17
- Retrieved: 2026-05-17
- Bundled files: 1
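💬 Just ask: sample prompts
- › Tell me how to use Claw Ancestry Pca
- › Show me concrete examples of what Claw Ancestry Pca can do
- › Walk me through Claw Ancestry Pca step by step as a first-time user
Paste any of these into Claude Code and this Skill activates automatically.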
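📖 The original SKILL.md that Claude reads
This text is the original (in English or Chinese) written for the AI (Claude) to read. Japanese translations are being added progressively.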
🦖 Ancestry Decomposition PCA
Place your study cohort in global genetic context by computing a joint PCA against the Simons Genome Diversity Project (SGDP) — 345 samples from 164 populations spanning every inhabited continent.
What it does
- Takes your VCF + population map as input
- Finds common variants between your cohort and the SGDP reference panel (bundled)
- Runs PLINK PCA on the merged dataset
- Separates your cohort from SGDP reference samples
- Matches SGDP samples to their population labels (164 populations)
- Generates a publication-quality multi-panel figure:
  - Panel A: PC1 vs PC2, the main population structure of your cohort
  - Panel B: PC3 vs PC2 with regional groupings and confidence ellipses
  - Panel C: PC3 vs PC1 with language/cultural groupings
  - Panel D: Global context, with your samples (circles) vs SGDP (triangles)
- Produces a markdown report with variance explained, population assignments, and a reproducibility bundle
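The variant-matching step above can be pictured with a small Python sketch. It is not the skill's code: the `Variant` record, the helper names, and the rule of keying sites on (contig, position, REF, ALT) are assumptions for illustration. It strips a `chr` prefix so `chr1` and `1` match, keeps only biallelic SNPs, and intersects the two sets. It does not attempt strand-flip or REF/ALT-swap reconciliation, nor contig aliases such as `chrM` vs `MT`.

```python
from typing import Iterable, NamedTuple, Set, Tuple

class Variant(NamedTuple):
    chrom: str
    pos: int
    ref: str
    alt: str  # multiple ALT alleles are comma-separated, as in VCF

VariantKey = Tuple[str, int, str, str]
BASES = set("ACGT")

def normalise_contig(chrom: str) -> str:
    """Drop a leading 'chr' so 'chr1' (UCSC style) and '1' compare equal."""
    return chrom[3:] if chrom.lower().startswith("chr") else chrom

def is_biallelic_snp(v: Variant) -> bool:
    """True for exactly one REF and one ALT base, both in A/C/G/T and different."""
    ref, alt = v.ref.upper(), v.alt.upper()
    return ref in BASES and alt in BASES and ref != alt

def variant_key(v: Variant) -> VariantKey:
    return (normalise_contig(v.chrom), v.pos, v.ref.upper(), v.alt.upper())

def shared_biallelic_snps(cohort: Iterable[Variant],
                          reference: Iterable[Variant]) -> Set[VariantKey]:
    """Keys of biallelic SNPs present (same position and alleles) in both sets."""
    cohort_keys = {variant_key(v) for v in cohort if is_biallelic_snp(v)}
    reference_keys = {variant_key(v) for v in reference if is_biallelic_snp(v)}
    return cohort_keys & reference_keys

# Toy inputs (hypothetical): an indel and a multiallelic site are excluded.
cohort = [Variant("chr1", 100, "A", "G"), Variant("chr1", 200, "AT", "A"),
          Variant("chr2", 300, "C", "T")]
reference = [Variant("1", 100, "A", "G"), Variant("2", 300, "C", "T,G"),
             Variant("2", 400, "G", "A")]
shared = shared_biallelic_snps(cohort, reference)
print(shared)  # {('1', 100, 'A', 'G')}
```

Keying on alleles as well as position means a site with different ALT alleles in the two datasets is dropped rather than silently merged.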
Why this exists
If you ask ChatGPT to "run a PCA against a global reference panel," it will:
- Not know which reference panel to use
- Hallucinate PLINK flags for merging datasets with different variant sets
- Skip IBD removal (related individuals distort PCA)
- Not normalise contig names between your VCF and the reference
- Produce a single scatter plot with no population labels
This skill encodes the correct methodological decisions:
- Uses SGDP (the gold-standard reference for global diversity)
- Handles contig normalisation (chr1 vs 1)
- Filters to common biallelic SNPs shared between datasets
- Removes related individuals via IBD checks
- Produces publication-quality multi-panel figures with confidence ellipses
- Differentiates your samples (circles) from reference (triangles)
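Relatedness filtering usually means dropping one member of each closely related pair. The sketch below is a hypothetical greedy version. It takes pairwise PI_HAT estimates (the kind of value PLINK's `--genome` reports) and repeatedly removes the sample involved in the most pairs above a cutoff. The 0.1875 cutoff, the tie-breaking rule, and `prune_related` are assumptions; the source does not state the skill's threshold or method.

```python
from typing import Dict, Iterable, List, Set, Tuple

# Assumed cutoff: PI_HAT > 0.1875 lies halfway between expected second-degree
# (0.25) and third-degree (0.125) relatedness. Not taken from the skill itself.
RELATEDNESS_CUTOFF = 0.1875

def prune_related(samples: Iterable[str],
                  pairs: Iterable[Tuple[str, str, float]],
                  cutoff: float = RELATEDNESS_CUTOFF) -> Set[str]:
    """Greedily remove samples until no kept pair has PI_HAT above cutoff."""
    related: List[Tuple[str, str]] = [(a, b) for a, b, pi_hat in pairs if pi_hat > cutoff]
    keep = set(samples)
    while True:
        # Count how many still-related pairs each kept sample belongs to.
        degree: Dict[str, int] = {}
        for a, b in related:
            if a in keep and b in keep:
                degree[a] = degree.get(a, 0) + 1
                degree[b] = degree.get(b, 0) + 1
        if not degree:
            return keep
        # Drop the most-connected sample; ties go to the alphabetically first ID.
        worst = max(sorted(degree), key=lambda s: degree[s])
        keep.discard(worst)

pairs = [("S1", "S2", 0.50), ("S1", "S3", 0.26), ("S3", "S4", 0.05)]
print(sorted(prune_related(["S1", "S2", "S3", "S4"], pairs)))  # ['S2', 'S3', 'S4']
```

Removing the most-connected sample first tends to keep more samples than dropping one member of each pair at random, since a single individual related to several others is removed only once.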
Reference Panel
The skill bundles the SGDP v4 dataset (Mallick et al., 2016, Nature):
- 345 samples from 164 populations
- Whole-genome sequencing at high coverage
- MAF > 0.1% filter applied
- Populations span: Africa, Americas, Central/South Asia, East Asia, Europe, Middle East, Oceania
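The MAF > 0.1% filter compares each site's minor allele frequency with 0.001. A minimal NumPy sketch, assuming genotypes are already coded as ALT-allele dosages (0, 1, 2) with NaN for missing calls; the function names are illustrative, not part of the skill.

```python
import numpy as np

def minor_allele_frequency(dosages: np.ndarray) -> np.ndarray:
    """Per-variant MAF from an (n_variants, n_samples) matrix of 0/1/2 ALT dosages.

    Missing genotypes are NaN and are ignored when averaging."""
    alt_freq = np.nanmean(dosages, axis=1) / 2.0
    return np.minimum(alt_freq, 1.0 - alt_freq)

def maf_filter(dosages: np.ndarray, threshold: float = 0.001) -> np.ndarray:
    """Boolean mask of variants whose MAF exceeds threshold (0.001 = 0.1%)."""
    return minor_allele_frequency(dosages) > threshold

dosages = np.array([
    [0, 0, 0, 0],       # monomorphic: MAF 0
    [0, 1, 2, 1],       # ALT freq 0.5
    [2, 2, 2, 1],       # ALT freq 0.875, so MAF 0.125
    [0, np.nan, 0, 0],  # missing call ignored, MAF 0
])
print(maf_filter(dosages).tolist())  # [False, True, True, False]
```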
Usage
python ancestry_pca.py \
--vcf your_cohort.vcf.gz \
--pop-map your_populations.tsv \
--output ancestry_report
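The source does not document the `--pop-map` layout. Assuming it is a two-column tab-separated file (sample ID, then population label), a loader might look like this; the format, the `#` header convention, and `read_pop_map` are all hypothetical.

```python
import csv
import io
from typing import Dict, TextIO

def read_pop_map(handle: TextIO) -> Dict[str, str]:
    """Parse sample_id<TAB>population lines into a dict (hypothetical format)."""
    pop_map: Dict[str, str] = {}
    for lineno, row in enumerate(csv.reader(handle, delimiter="\t"), start=1):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and '#' header/comment lines
        if len(row) < 2:
            raise ValueError(f"line {lineno}: expected sample_id and population")
        sample, population = row[0].strip(), row[1].strip()
        if sample in pop_map:
            raise ValueError(f"line {lineno}: duplicate sample ID {sample!r}")
        pop_map[sample] = population
    return pop_map

example = io.StringIO("#sample\tpopulation\nS1\tHighland\nS2\tCoastal\n\nS3\tAmazonian\n")
pop_map = read_pop_map(example)
print(pop_map)  # {'S1': 'Highland', 'S2': 'Coastal', 'S3': 'Amazonian'}
```

Rejecting duplicate IDs early avoids silently relabelling a sample when the map is later joined to VCF sample names.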
Demo (works out of the box)
python ancestry_pca.py --demo --output demo_report
The demo uses pre-computed PCA results from the Peruvian Genome Project (736 samples, 28 populations) and generates the full 4-panel figure instantly.
Example Output
Ancestry Decomposition PCA
==========================
Cohort: 736 samples, 28 populations
Reference: SGDP (345 samples, 164 populations)
Common variants: 42,831 biallelic SNPs
Variance explained:
PC1: 51.44% PC2: 21.70% PC3: 6.70%
Panel D — Global Context:
Cohort samples cluster between European and East Asian
reference populations, with Amazonian groups showing
distinct positioning from Highland and Coastal groups.
Figures saved to: ancestry_report/
Figure3_PCA_composite.png (300 dpi)
Figure3_PCA_composite.pdf (vector)
Reproducibility:
commands.sh | environment.yml | checksums.sha256
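The variance-explained figures come from the PCA eigenvalues. The skill runs PLINK for this step; the NumPy sketch below swaps in a plain SVD of a standardised genotype matrix (centring each variant and scaling by sqrt(2p(1-p)), a common convention in genotype PCA) to show what a joint PCA means here: cohort and reference rows are decomposed together, then split apart. Names and data are illustrative. The percentages divide by the sum of all eigenvalues, which is an assumption; the skill's exact denominator is not stated, so this will not necessarily reproduce the numbers above.

```python
import numpy as np

def joint_pca(genotypes: np.ndarray, n_components: int = 3):
    """PCA of an (n_samples, n_variants) 0/1/2 dosage matrix without missing data.

    Returns per-sample PC scores and each PC's fraction of total variance."""
    g = genotypes.astype(float)
    p = g.mean(axis=0) / 2.0                 # ALT allele frequency per variant
    polymorphic = (p > 0) & (p < 1)          # monomorphic sites have zero variance
    g, p = g[:, polymorphic], p[polymorphic]
    z = (g - 2.0 * p) / np.sqrt(2.0 * p * (1.0 - p))  # centre and scale each variant
    u, s, _ = np.linalg.svd(z, full_matrices=False)
    eigenvalues = s ** 2                     # proportional to covariance eigenvalues
    scores = u[:, :n_components] * s[:n_components]
    explained = eigenvalues[:n_components] / eigenvalues.sum()
    return scores, explained

rng = np.random.default_rng(0)
cohort = rng.integers(0, 3, size=(10, 200))      # stand-in data, not real genotypes
reference = rng.integers(0, 3, size=(15, 200))
scores, explained = joint_pca(np.vstack([cohort, reference]))
# Rows keep their order, so the first len(cohort) rows are the cohort samples.
cohort_pcs, reference_pcs = scores[:len(cohort)], scores[len(cohort):]
print(np.round(100 * explained, 2))            # % variance for PC1..PC3
```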
Interpretation Guide
- PC1 typically captures the largest axis of global differentiation (often Africa vs non-Africa)
- PC2 separates major continental groups (Europe, East Asia, Americas)
- PC3 often reveals finer substructure within continental groups
- Confidence ellipses show 2.5 standard deviations around each population cluster
- Your samples are shown as circles and SGDP reference samples as triangles
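One common way to draw an "n standard deviation" ellipse is to eigen-decompose the covariance of a population's PC coordinates and scale each principal axis by n times its standard deviation. The source does not say exactly how its 2.5 SD ellipses are built, so treat this as an assumed construction; `sd_ellipse` is a hypothetical helper. Its outputs line up with the `xy`, `width`, `height` and `angle` arguments of `matplotlib.patches.Ellipse`.

```python
import numpy as np

def sd_ellipse(points: np.ndarray, n_sd: float = 2.5):
    """Ellipse spanning n_sd standard deviations along each principal axis.

    points: (n, 2) array of PC coordinates for one population.
    Returns (centre, width, height, angle_degrees); width/height are full axis lengths."""
    centre = points.mean(axis=0)
    cov = np.cov(points, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)              # ascending eigenvalues
    eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]  # largest axis first
    eigvals = np.clip(eigvals, 0.0, None)               # guard tiny negative rounding
    width, height = 2.0 * n_sd * np.sqrt(eigvals)
    # Angle of the major axis measured from the x-axis (PC on the horizontal).
    angle = np.degrees(np.arctan2(eigvecs[1, 0], eigvecs[0, 0]))
    return centre, width, height, angle

pts = np.array([[-1.0, 0.0], [1.0, 0.0], [0.0, -0.5], [0.0, 0.5]])
centre, width, height, angle = sd_ellipse(pts)
```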
Citation
If you use this skill in a publication, please cite:
- Mallick, S. et al. (2016). The Simons Genome Diversity Project. Nature, 538, 201-206.
- Corpas, M. (2026). ClawBio. https://github.com/ClawBio/ClawBio