📦 Wgs Prs
Whole-genome sequencing (WGS)
📜 Original English description (for reference)
End-to-end WGS to polygenic risk score pipeline. Takes paired-end FASTQ files (or a pre-existing VCF) through nf-core/sarek for variant calling, applies VCF QC (normalisation, hard filtering, Ti/Tv and Het/Hom checks), then computes polygenic risk scores via the PGS Catalog. Fills the FASTQ to VCF gap upstream of the gwas-prs skill.
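🇯🇵 Notes for Japanese creators
Whole-genome sequencing (WGS)
Note: this commentary was added by the jpskill.com editorial team for Japanese business settings. It is reference information, independent of how the Skill itself behaves.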
Copy the command for your platform below and paste it into a terminal (Mac/Linux) or PowerShell (Windows). It downloads, unzips, and installs the Skill in one step.
mkdir -p ~/.claude/skills && cd ~/.claude/skills && curl -L -o wgs-prs.zip https://jpskill.com/download/4125.zip && unzip -o wgs-prs.zip && rm wgs-prs.zip
$d = "$env:USERPROFILE\.claude\skills"; ni -Force -ItemType Directory $d | Out-Null; iwr https://jpskill.com/download/4125.zip -OutFile "$d\wgs-prs.zip"; Expand-Archive "$d\wgs-prs.zip" -DestinationPath $d -Force; ri "$d\wgs-prs.zip"
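When it finishes, restart Claude Code. The Skill then triggers automatically when you make a related request in plain language, for example "run WGS analysis".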
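💾 Prefer to download manually? (for those who find the command difficult)
- 1. Press the blue button below to download wgs-prs.zip
- 2. Double-click the ZIP file to extract it → a wgs-prs folder is created
- 3. Move that folder to C:\Users\<your name>\.claude\skills\ (Windows) or ~/.claude/skills/ (Mac)
- 4. Restart Claude Code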
⚠️ Download and use at your own risk. This site accepts no responsibility for the content, behavior, or safety of the Skill.
🎯 What this Skill can do
The description below explains what this Skill does for you. It triggers automatically when you ask Claude for help in this area.
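📦 How to install (3 steps)
- 1. Press the "Download" button above to get the .skill file
- 2. Change the file extension from .skill to .zip and extract it (macOS can extract it automatically)
- 3. Place the extracted folder in .claude/skills/ under your home folder
  - · macOS / Linux: ~/.claude/skills/
  - · Windows: %USERPROFILE%\.claude\skills\
Restart Claude Code and you are done. You do not need to say "use this Skill..."; it is invoked automatically for related requests.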
- Last updated: 2026-05-17
- Retrieved: 2026-05-17
- Bundled files: 1
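💬 Just ask like this: sample prompts
- › Show me how to use Wgs Prs
- › Show me concrete examples of what Wgs Prs can do
- › Walk a first-time user through Wgs Prs step by step
Paste one of these into Claude Code and this Skill triggers automatically.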
📖 Original SKILL.md read by Claude
This is the original text (English or Chinese) that the AI (Claude) reads. Japanese translations are being added progressively.
🧬 WGS-PRS Pipeline
Author: David de Lorenzo (ClawBio Community)
Requires: Python 3.9+, nextflow, docker or singularity, bcftools (recommended)
You are the WGS-PRS skill, an end-to-end pipeline agent for whole-genome sequencing data. Your role is to take a user from raw FASTQ files (or a pre-existing VCF) all the way to polygenic risk scores, with robust QC at every stage.
Trigger
Fire this skill when the user says any of:
- "run WGS analysis"
- "whole genome sequencing"
- "FASTQ to PRS" / "FASTQ to polygenic risk scores"
- "variant calling from raw reads"
- "nf-core sarek" / "run sarek"
- "germline variant calling"
- "GATK HaplotypeCaller"
- "raw sequencing to risk scores"
- "WGS pipeline" / "WGS polygenic risk"
Do NOT fire when:
- The user already has a VCF and wants PRS only: route to `gwas-prs` instead.
- The user wants somatic variant calling (tumour/normal): out of scope, this skill handles germline only.
- The user asks about microarray or SNP chip data: route to `gwas-prs` directly.
- The user wants metagenomics or RNA-seq: wrong pipeline.
Scope
One skill, one task. This skill bridges raw WGS reads to polygenic risk scores via nf-core/sarek, VCF QC, and the ClawBio gwas-prs skill. It does not interpret clinical significance, annotate variants, or produce pharmacogenomics reports. Route those requests to variant-annotation, clinical-variant-reporter, or pharmgx-reporter.
Pipeline Stages
- Variant calling: nf-core/sarek (FASTQ to BAM to VCF via GATK HaplotypeCaller)
- VCF QC: bcftools normalisation, hard filtering, Ti/Tv and Het/Hom evaluation
- PRS scoring: ClawBio `gwas-prs` skill (PGS Catalog, 6 curated + 3,000+ live scores)
- Aggregated report: Markdown + JSON summary of all stages
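For the variant-calling stage, the following is a minimal sketch of how a samplesheet and a sarek invocation could be assembled. The column names (`patient`, `sample`, `lane`, `fastq_1`, `fastq_2`) and the `--input`, `--outdir`, `--genome`, and `--tools` parameters follow the nf-core/sarek documentation; the helper names `samplesheet_csv` and `sarek_command`, the `lane_1` placeholder, and the choice to reuse the sample ID as the patient ID are assumptions made for illustration, not the code in `wgs_prs.py`. Only the paired-end case is covered.

```python
import csv
import io


def samplesheet_csv(sample_id: str, fastq_r1: str, fastq_r2: str) -> str:
    """Return a one-row sarek samplesheet as CSV text (paired-end case only)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["patient", "sample", "lane", "fastq_1", "fastq_2"])
    # Assumption: patient and sample share one ID; "lane_1" is a placeholder.
    writer.writerow([sample_id, sample_id, "lane_1", fastq_r1, fastq_r2])
    return buffer.getvalue()


def sarek_command(samplesheet: str, outdir: str, profile: str = "docker") -> list[str]:
    """Build (but do not run) a nextflow command for germline calling."""
    return [
        "nextflow", "run", "nf-core/sarek",
        "-profile", profile,
        "--input", samplesheet,
        "--outdir", outdir,
        "--genome", "GATK.GRCh38",
        "--tools", "haplotypecaller",
    ]


if __name__ == "__main__":
    print(samplesheet_csv("HG001", "sample_R1.fastq.gz", "sample_R2.fastq.gz"), end="")
    print(" ".join(sarek_command("samplesheet.csv", "results/")))
```

Returning the command as a list rather than executing it keeps the sketch usable for a dry-run style preview; passing the list to `subprocess.run` would be the natural next step.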
Entry Points
Users may enter the pipeline at two points:
- FASTQ entry (full pipeline): provide `--fastq-r1` and optionally `--fastq-r2`
- VCF entry (skip sarek): provide `--input-vcf` with a pre-existing single-sample GRCh38 VCF
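The two entry points can be illustrated with a small argument-parsing sketch. The flag names come from this document; the functions `build_parser` and `select_entry_point` are hypothetical, and treating "FASTQ and VCF supplied together" as an error is an assumption, since the source does not say how `wgs_prs.py` resolves that case.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the input flags named in this document."""
    parser = argparse.ArgumentParser(prog="wgs_prs.py")
    parser.add_argument("--fastq-r1", help="R1 reads (fastq.gz)")
    parser.add_argument("--fastq-r2", help="optional R2 reads (fastq.gz)")
    parser.add_argument("--input-vcf", help="single-sample GRCh38 VCF (vcf.gz)")
    return parser


def select_entry_point(args: argparse.Namespace) -> str:
    """Return 'fastq' or 'vcf'; raise ValueError for missing or ambiguous input."""
    if args.input_vcf and (args.fastq_r1 or args.fastq_r2):
        # Assumption: mixing both entry points is rejected rather than guessed.
        raise ValueError("Provide either FASTQ files or --input-vcf, not both.")
    if args.input_vcf:
        return "vcf"
    if args.fastq_r1:
        return "fastq"
    if args.fastq_r2:
        raise ValueError("--fastq-r2 requires --fastq-r1.")
    raise ValueError("No input given: use --fastq-r1 or --input-vcf.")
```

For example, `select_entry_point(build_parser().parse_args(["--input-vcf", "sample.vcf.gz"]))` selects the VCF entry and would skip sarek.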
Workflow
When the user provides WGS input (FASTQ or VCF):
1. Validate inputs: confirm file paths exist and formats are correct (fastq.gz or vcf.gz). Abort with a clear message if required inputs are missing.
2. Stage 1, variant calling (FASTQ entry only): run nf-core/sarek with GATK HaplotypeCaller. Generate samplesheet CSV, invoke nextflow, confirm VCF output exists.
3. Stage 2, VCF QC: normalise with bcftools (or Python fallback), apply hard filters (QUAL >= 30, DP >= 10), compute Ti/Tv and Het/Hom ratios. Fail fast if thresholds are violated, unless `--no-fail-fast` is set.
4. Stage 3, PRS scoring: pass the canonical VCF to `gwas-prs`. Use the trait or PGS ID specified by the user, or run all curated traits by default.
5. Stage 4, aggregated report: write `bridge_report.md` and `bridge_report.json` combining stage statuses, QC metrics, and PRS summary.
6. Surface results: show the user the report path and key metrics. Offer to chain to `variant-annotation` or `pharmgx-reporter` if the canonical VCF is available.
Freedom level: Steps 1 to 3 are prescriptive (exact CLI flags, exact thresholds). Steps 5 to 6 allow interpretive flexibility in the report narrative.
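The hard filtering and ratio calculations in Stage 2 can be sketched in plain Python, in the spirit of the Python fallback mentioned above. This is an illustration under stated assumptions, not the skill's implementation: the function name `qc_metrics` is hypothetical, DP is read from the INFO column (the source does not say whether INFO or FORMAT depth is used), records with a missing QUAL or DP are counted as filtered, only biallelic SNPs contribute to Ti/Tv, and Het/Hom is taken as heterozygous calls divided by homozygous-alternate calls.

```python
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}
BASES = set("ACGT")


def qc_metrics(vcf_lines, min_qual=30.0, min_dp=10):
    """Apply QUAL/DP hard filters, then compute Ti/Tv and Het/Hom ratios."""
    ti = tv = het = hom_alt = filtered = 0
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        ref, alt, qual, info = fields[3], fields[4], fields[5], fields[7]
        info_map = dict(item.split("=", 1) for item in info.split(";") if "=" in item)
        depth = int(info_map.get("DP", 0))  # assumption: missing DP counts as 0
        if qual == "." or float(qual) < min_qual or depth < min_dp:
            filtered += 1
            continue
        if ref in BASES and alt in BASES:  # biallelic SNPs only
            if (ref, alt) in TRANSITIONS:
                ti += 1
            else:
                tv += 1
        keys = fields[8].split(":")
        values = fields[9].split(":")
        if "GT" not in keys:
            continue
        alleles = values[keys.index("GT")].replace("|", "/").split("/")
        if "." in alleles:
            continue  # no-call genotype
        if len(set(alleles)) > 1:
            het += 1
        elif alleles[0] != "0":
            hom_alt += 1
    return {
        "ti_tv": ti / tv if tv else None,
        "het_hom": het / hom_alt if hom_alt else None,
        "filtered": filtered,
    }
```

The function takes any iterable of VCF text lines, so it works with an open `gzip.open(path, "rt")` handle as well as with a list of strings in a test.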
Usage
# Full pipeline from paired FASTQ
python wgs_prs.py --fastq-r1 sample_R1.fastq.gz --fastq-r2 sample_R2.fastq.gz \
--sample-id HG001 --output-dir results/
# Start from an existing VCF
python wgs_prs.py --input-vcf sample.vcf.gz --output-dir results/
# Dry run: generate samplesheet and preview commands only
python wgs_prs.py --fastq-r1 sample_R1.fastq.gz --dry-run
# Score a specific trait
python wgs_prs.py --input-vcf sample.vcf.gz --trait "type 2 diabetes"
Key Design Decisions
- Reference genome: GRCh38 (GATK.GRCh38 sarek alias). Older GRCh37 VCFs require liftover before PRS scoring.
- Variant caller: GATK HaplotypeCaller (default). DeepVariant available via `--tools deepvariant`.
- VCF QC thresholds: Ti/Tv 1.8 to 2.5, Het/Hom 1.0 to 3.0, QUAL >= 30, DP >= 10.
- Fail-fast: pipeline aborts on QC failure by default. Use `--no-fail-fast` to continue with a warning.
- Canonical VCF contract: the handoff point between stages is a normalised, PASS-filtered, single-sample GRCh38 VCF. This format is consistent with what `gwas-prs`, `variant-annotation`, and `pharmgx-reporter` all accept.
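The threshold and fail-fast decisions above can be pictured as a small gate function. The ranges are the ones stated in this document; the names `check_qc` and `QCFailure` are hypothetical, and treating the range bounds as inclusive is an assumption.

```python
TI_TV_RANGE = (1.8, 2.5)
HET_HOM_RANGE = (1.0, 3.0)


class QCFailure(RuntimeError):
    """Raised when a QC ratio falls outside its range and fail-fast is on."""


def check_qc(ti_tv: float, het_hom: float, fail_fast: bool = True) -> list[str]:
    """Return a list of QC problems; raise QCFailure instead if fail_fast is set."""
    problems = []
    checks = (("Ti/Tv", ti_tv, TI_TV_RANGE), ("Het/Hom", het_hom, HET_HOM_RANGE))
    for name, value, (low, high) in checks:
        if not low <= value <= high:  # assumption: bounds are inclusive
            problems.append(f"{name} {value:.2f} outside {low} to {high}")
    if problems and fail_fast:
        raise QCFailure("; ".join(problems))
    return problems
```

With `fail_fast=False` the caller receives the problem list and can continue with a warning, which mirrors the behaviour described for `--no-fail-fast`.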
Example Output
# ClawBio WGS-PRS Bridge Report
**Sample:** HG001
**Generated:** 2026-05-01T12:00:00+00:00
**Output directory:** `results/`
## Pipeline Stages
| Stage | Status | Duration |
|--------|------------|----------|
| sarek | success | 142.3s |
| vcf_qc | success | 8.1s |
| gwas | success | 23.5s |
| report | success | 0.4s |
## VCF QC Metrics
**QC Status:** PASS
| Metric | Value |
|-------------------|---------|
| Total variants | 4,821 |
| SNPs | 4,103 |
| Indels | 718 |
| Ti/Tv ratio | 2.12 |
| Het/Hom ratio | 1.74 |
| Filtered variants | 203 |
## Polygenic Risk Scores
| Trait | Score | Percentile | Risk Category |
|--------------------|--------|------------|---------------|
| Type 2 diabetes | 0.82 | 73rd | Above average |
| Coronary artery | 0.61 | 54th | Average |
*ClawBio is a research and educational tool. It is not a medical device.*
Chaining with other ClawBio Skills
After WGS-PRS completes, the canonical VCF can be passed to:
- `variant-annotation`: Ensembl VEP, ClinVar, gnomAD
- `pharmgx-reporter`: pharmacogenomics from the same VCF
- `claw-ancestry-pca`: ancestry estimation to validate PRS reference population
- `clinical-variant-reporter`: ACMG/AMP pathogenicity classification
Dependencies
| Tool | Required | Purpose |
|---|---|---|
| nextflow | Yes | Executes nf-core/sarek |
| docker or singularity | Yes | Container runtime for sarek |
| bcftools >= 1.17 | Recommended | VCF normalisation and stats (falls back to Python if absent) |
| python3 >= 3.9 | Yes | Runtime |
Gotchas
- Do not skip VCF QC even when the user provides their own VCF. Users often pass unfiltered or unnormalised VCFs from external pipelines. Always run Stage 2 unless the user explicitly opts out with `--skip-qc`. Skipping QC silently produces unreliable PRS scores.
- Do not route to this skill when the user already has a VCF and wants PRS only. The model will be tempted to use wgs-prs because it mentions PRS. If there is no FASTQ and the user has not asked for variant calling, route directly to `gwas-prs` to avoid unnecessary sarek overhead.
- Do not invent QC thresholds. Ti/Tv and Het/Hom cut-offs are fixed at 1.8 to 2.5 and 1.0 to 3.0 respectively. Do not adjust these based on the user's wishes or apparent sample quality. If thresholds are debated, surface the metrics and let the user decide whether to continue with `--no-fail-fast`.
- GRCh37 VCFs will silently produce wrong PRS scores. The PGS Catalog scores are aligned to GRCh38. If a user provides a GRCh37 VCF, warn them and offer liftover before proceeding.
- Sarek can take hours on a full genome. Set expectations with the user before launching Stage 1. For testing, recommend `--dry-run` first.
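One way to catch the GRCh37 gotcha before scoring is to inspect the `##contig` header lines of the VCF. The sketch below is a heuristic under stated assumptions, not something the source describes: it compares the declared length of chromosome 1 against the published assembly lengths (248,956,422 bp in GRCh38, 249,250,621 bp in GRCh37) and returns `None` when the header carries no usable contig line. The function name `guess_build` is hypothetical.

```python
import re
from typing import Iterable, Optional

# Chromosome 1 lengths in the two human reference assemblies.
CHR1_LENGTHS = {248956422: "GRCh38", 249250621: "GRCh37"}
_ID = re.compile(r"ID=([^,>]+)")
_LENGTH = re.compile(r"length=(\d+)")


def guess_build(header_lines: Iterable[str]) -> Optional[str]:
    """Guess the genome build from a VCF header, or return None if unknown."""
    for line in header_lines:
        if not line.startswith("##contig="):
            continue
        contig_id = _ID.search(line)
        length = _LENGTH.search(line)
        if contig_id and length and contig_id.group(1) in ("1", "chr1"):
            return CHR1_LENGTHS.get(int(length.group(1)))
    return None
```

A result of `"GRCh37"` would be the cue to warn the user and offer liftover; `None` means the header is inconclusive and the user should be asked which build was used.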
Safety
- Local-first: all data is processed locally. No reads or variants leave the user's machine.
- Disclaimer: every report includes the ClawBio medical disclaimer: "ClawBio is a research and educational tool. It is not a medical device and does not provide clinical diagnoses. Consult a healthcare professional before making any medical decisions."
- No hallucinated parameters: all QC thresholds and PGS Catalog identifiers trace to documented sources.
- Audit trail: stage durations, commands, and output paths are logged to `bridge_report.json`.
Agent Boundary
The agent (LLM) dispatches, explains results, and surfaces next steps. The skill (Python) executes all variant calling, QC, and scoring. The agent must not override QC thresholds, invent PGS IDs, or interpret clinical significance beyond what the gwas-prs skill produces.
Integration with Bio Orchestrator
This skill is invoked when:
- The user mentions WGS, whole-genome sequencing, FASTQ files, or raw sequencing data
- The user asks to run the full pipeline "from scratch" or "from reads"
- Keywords: WGS, FASTQ, sarek, variant calling, germline variants, raw reads to PRS
It chains downstream to gwas-prs automatically. For users who already have a VCF,
the bio-orchestrator should route directly to gwas-prs or variant-annotation instead.
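The routing rules in this document can be summarised as a small decision function. This is only an illustration of the stated rules: in practice the agent makes this decision from the conversation, and the function name `route_request`, its parameters, and the `"out-of-scope"` label are hypothetical. Where the text allows either `gwas-prs` or `variant-annotation` for users who already have a VCF, the sketch picks `gwas-prs`, which fits a PRS request.

```python
from typing import Optional

OUT_OF_SCOPE = "out-of-scope"


def route_request(
    has_fastq: bool,
    has_vcf: bool,
    wants_variant_calling: bool = False,
    somatic: bool = False,
    data_type: str = "wgs",
) -> Optional[str]:
    """Return the skill to route to, OUT_OF_SCOPE, or None if undecidable."""
    if somatic or data_type in {"metagenomics", "rna-seq"}:
        return OUT_OF_SCOPE  # germline WGS only
    if data_type == "microarray":
        return "gwas-prs"  # SNP chip data goes straight to PRS scoring
    if has_fastq or wants_variant_calling:
        return "wgs-prs"
    if has_vcf:
        return "gwas-prs"  # VCF in hand, PRS only: avoid sarek overhead
    return None  # not enough information; ask the user
```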