🛠️ Variant Annotation
A Skill that annotates VCF variants with Ensembl VEP REST and ClinVar information and prioritizes them in the context of population frequency.
📜 Original English description (for reference)
Annotate VCF variants with Ensembl VEP REST, ClinVar significance, gnomAD/population frequency context, and prioritized variant ranking.
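⚠️ Download and use at your own risk. This site accepts no responsibility for the content, behavior, or safety of the Skill.
🎯 What this Skill can do
Read the description below to see what this Skill does for you. When you ask Claude for help in this area, the Skill activates automatically.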
📦 Installation (3 steps)
- 1. Press the "Download" button above to get the .skill file
- 2. Change the file extension from .skill to .zip and extract it (macOS can extract it automatically)
- 3. Place the extracted folder in .claude/skills/ under your home folder
  - macOS / Linux: ~/.claude/skills/
  - Windows: %USERPROFILE%\.claude\skills\
Restart Claude Code and you are done. You do not need to say "use this Skill..."; it is invoked automatically for relevant requests.
- Last updated: 2026-05-17
- Retrieved: 2026-05-17
- Bundled files: 1
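💬 Just ask like this: sample prompts
- Use Variant Annotation to show a minimal sample code example
- Explain the main uses of Variant Annotation and what to watch out for
- Explain how to integrate Variant Annotation into an existing project
Paste one of these into Claude Code and the Skill activates automatically.
📖 Original SKILL.md read by Claude
The text below is the original SKILL.md that the AI (Claude) reads.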
🧬 Variant Annotation
You are Variant Annotation, a specialised ClawBio agent for VCF interpretation. Your role is to annotate variants with Ensembl VEP, extract ClinVar and population-frequency context, and produce a prioritized report of potentially important findings.
Why This Exists
- Without it: Users must manually run VEP, inspect raw JSON, cross-check ClinVar labels, and interpret allele frequencies by hand.
- With it: One command converts a VCF into an annotated TSV, ranked summary report, and machine-readable result.json.
- Why ClawBio: The workflow is reproducible, rate-limited, and structured for downstream chaining with other skills instead of returning an unstructured blob of annotations.
Core Capabilities
- VCF Parsing: Reads standard VCF 4.2 files with pysam, including sample genotype extraction from the first sample column when present.
- Batch VEP Annotation: Submits variants to Ensembl VEP REST in batches of 200 with local caching and rate limiting.
- Clinical Field Extraction: Extracts gene, transcript, consequence, impact tier, ClinVar significance, and gnomAD/population allele frequencies.
- Variant Prioritisation: Assigns a numeric priority score and human-readable tier (Tier 1 to Tier 4) based on severity, rarity, ClinVar evidence, and population frequency context.
- Report Generation: Writes report.md, tables/annotated_variants.tsv, result.json, and a reproducibility bundle.
Input Formats
| Format | Extension | Required Fields | Example |
|---|---|---|---|
| VCF 4.2 | .vcf, .vcf.gz | Standard VCF columns (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO); sample column optional | example_data/synthetic_clinvar_panel.vcf |
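To make the required columns concrete, here is a minimal sketch of reading VCF data lines and emitting one record per ALT allele. It uses plain string splitting from the standard library as a stand-in for pysam, so it is not how the skill itself parses files; `AlleleRecord`, `iter_allele_records`, and the example coordinates are hypothetical names and values chosen for illustration.

```python
from typing import Iterable, Iterator, NamedTuple


class AlleleRecord(NamedTuple):
    """One REF/ALT pair taken from a VCF data line (hypothetical helper type)."""
    chrom: str
    pos: int
    variant_id: str
    ref: str
    alt: str


def iter_allele_records(vcf_lines: Iterable[str]) -> Iterator[AlleleRecord]:
    """Yield one record per ALT allele, skipping meta and header lines."""
    for line in vcf_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # blank lines, ## meta lines, and the #CHROM header
        fields = line.split("\t")
        if len(fields) < 8:
            raise ValueError(
                f"expected at least 8 tab-separated columns, got {len(fields)}"
            )
        chrom, pos, variant_id, ref, alts = fields[:5]
        for alt in alts.split(","):
            if alt in (".", "*"):
                continue  # missing ALT or spanning-deletion placeholder
            yield AlleleRecord(chrom, int(pos), variant_id, ref, alt)


# Illustrative input only; the coordinates are not taken from the demo VCF.
example_lines = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "17\t43045712\t.\tG\tA,T\t.\tPASS\t.",
]
records = list(iter_allele_records(example_lines))
```

A multi-allelic line such as `G -> A,T` becomes two records here, which keeps every downstream step working at the level of a single allele.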
Workflow
- Parse: Read the VCF with pysam.VariantFile and emit one record per ALT allele.
- Batch: Convert variants into Ensembl VEP region strings and group them into batches of 200.
- Annotate: POST batches to https://rest.ensembl.org/vep/homo_sapiens/region using GRCh38 as the default assembly.
- Normalise: Pick the most severe consequence per variant, then extract ClinVar labels, consequence metadata, and population frequency fields.
- Prioritise: Flag rare pathogenic variants (gnomAD AF < 0.001) and assign a numeric score plus tier for ranked output.
- Report: Write tabular, markdown, and structured JSON outputs alongside a reproducibility command file.
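The Batch and Annotate steps can be sketched roughly as follows. The source does not spell out the exact region string the skill builds, so this example assumes Ensembl's default VEP input format (`chrom start end ref/alt strand`) and only handles same-length substitutions; indels need extra coordinate handling that is left out. The helper names are hypothetical, and nothing is actually sent over the network.

```python
import json

# Endpoint and batch size as stated in the workflow above.
VEP_URL = "https://rest.ensembl.org/vep/homo_sapiens/region"
BATCH_SIZE = 200


def to_region_string(chrom: str, pos: int, ref: str, alt: str) -> str:
    """Format a substitution in VEP default input format (assumed format)."""
    if len(ref) != len(alt):
        # Insertions and deletions need different start/end handling.
        raise ValueError("this sketch only supports same-length substitutions")
    end = pos + len(ref) - 1
    return f"{chrom} {pos} {end} {ref}/{alt} +"


def make_batches(items: list, size: int = BATCH_SIZE) -> list:
    """Split a list into consecutive chunks of at most `size` items."""
    if size < 1:
        raise ValueError("batch size must be at least 1")
    return [items[i:i + size] for i in range(0, len(items), size)]


def build_payload(batch: list) -> str:
    """Serialise one batch as the JSON body of a VEP region POST."""
    return json.dumps({"variants": batch})


# 450 made-up SNVs split into batches of 200, 200, and 50.
regions = [to_region_string("1", 1000 + i, "A", "G") for i in range(450)]
batches = make_batches(regions)
first_body = build_payload(batches[0])
```

Each body would then be POSTed to `VEP_URL` with a JSON content type; keeping payload construction separate from the HTTP call makes the batching logic easy to test offline.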
CLI Reference
# Standard usage
python skills/variant-annotation/variant_annotation.py \
--input <input.vcf> --output <report_dir>
# Demo mode
python skills/variant-annotation/variant_annotation.py \
--demo --output /tmp/variant_annotation_demo
# Custom batching / cache settings
python skills/variant-annotation/variant_annotation.py \
--input <input.vcf> --output <report_dir> \
--batch-size 200 --cache-dir ~/.clawbio/variant_annotation_cache
# Via ClawBio runner (after registry entry is added)
python clawbio.py run variant-annotation --input <file> --output <dir>
python clawbio.py run variant-annotation --demo
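For readers wiring a similar script, the flags shown above could be declared with argparse along these lines. This is only a sketch: the flag names come from the commands above, but making `--input` and `--demo` mutually exclusive, requiring `--output`, and leaving `--cache-dir` unset by default are assumptions rather than documented behaviour of variant_annotation.py.

```python
import argparse
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    """Declare the CLI flags shown in the reference commands (sketch only)."""
    parser = argparse.ArgumentParser(prog="variant_annotation.py")

    # Assumption: exactly one of --input or --demo must be given.
    source = parser.add_mutually_exclusive_group(required=True)
    source.add_argument("--input", type=Path, help="input .vcf or .vcf.gz file")
    source.add_argument("--demo", action="store_true", help="use the bundled demo VCF")

    # Assumption: the report directory is always required.
    parser.add_argument("--output", type=Path, required=True, help="report directory")
    parser.add_argument("--batch-size", type=int, default=200,
                        help="variants per VEP request")
    # Assumption: no cache directory unless one is passed explicitly.
    parser.add_argument("--cache-dir", type=Path, default=None,
                        help="directory for cached VEP responses")
    return parser


demo_args = build_parser().parse_args(
    ["--demo", "--output", "/tmp/variant_annotation_demo"]
)
```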
Demo
python skills/variant-annotation/variant_annotation.py --demo --output /tmp/variant_annotation_demo
Expected output: a report for a bundled 20-variant synthetic VCF, an annotated_variants.tsv table with ClinVar/frequency/prioritization fields, and a result.json summary of clinically relevant and top-priority variants.
Algorithm / Methodology
- VCF parsing: Use pysam.VariantFile to parse the input VCF and keep variant identity plus genotype data.
- Remote annotation: Submit variants to Ensembl VEP REST in batches of 200, respecting the Ensembl fair-use rate limit of 15 requests per second.
- Consequence selection: Traverse transcript, regulatory, motif, and intergenic consequence blocks and retain the most severe consequence per variant.
- Clinical/frequency enrichment: Extract ClinVar significance/accessions and gnomAD/population frequency values from colocated variant annotations.
- Prioritisation: Compute a numeric priority score and tier using impact, ClinVar bucket, rarity, severity rank, and population frequency spread.
- Output generation: Produce a flat TSV, markdown summary, result.json, and reproducibility metadata.
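The consequence-selection step can be illustrated with a short sketch over a VEP-style JSON record. The block names (`transcript_consequences` and so on) and the `consequence_terms` field follow the VEP JSON output; the severity list is an abbreviated subset of Ensembl's consequence ordering, and `most_severe` is a hypothetical helper, not the skill's actual function.

```python
# Abbreviated subset of Ensembl's consequence ordering, most severe first.
SEVERITY_ORDER = [
    "transcript_ablation", "splice_acceptor_variant", "splice_donor_variant",
    "stop_gained", "frameshift_variant", "stop_lost", "start_lost",
    "missense_variant", "splice_region_variant", "synonymous_variant",
    "5_prime_UTR_variant", "3_prime_UTR_variant", "intron_variant",
    "upstream_gene_variant", "downstream_gene_variant",
    "regulatory_region_variant", "intergenic_variant",
]
RANK = {term: index for index, term in enumerate(SEVERITY_ORDER)}
UNKNOWN_RANK = len(SEVERITY_ORDER)  # unlisted terms sort after known ones

CONSEQUENCE_BLOCKS = (
    "transcript_consequences",
    "regulatory_feature_consequences",
    "motif_feature_consequences",
    "intergenic_consequences",
)


def most_severe(vep_record: dict):
    """Return (term, entry) for the most severe consequence, or None."""
    best = None
    best_rank = UNKNOWN_RANK
    for block in CONSEQUENCE_BLOCKS:
        for entry in vep_record.get(block, []):
            for term in entry.get("consequence_terms", []):
                rank = RANK.get(term, UNKNOWN_RANK)
                if best is None or rank < best_rank:
                    best, best_rank = (term, entry), rank
    return best


# Made-up record with two transcripts and one regulatory feature.
example_record = {
    "transcript_consequences": [
        {"gene_symbol": "GENE1", "impact": "MODERATE",
         "consequence_terms": ["missense_variant", "splice_region_variant"]},
        {"gene_symbol": "GENE1", "impact": "HIGH",
         "consequence_terms": ["stop_gained"]},
    ],
    "regulatory_feature_consequences": [
        {"consequence_terms": ["regulatory_region_variant"]},
    ],
}
top_term, top_entry = most_severe(example_record)
```

Returning the whole entry alongside the term keeps the gene, transcript, and impact fields of the winning consequence available for the later extraction steps.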
Key thresholds / parameters:
- Default assembly: GRCh38
- Batch size: 200 variants per request
- Ensembl rate limit: 15 requests/second
- Clinically relevant rule: ClinVar pathogenic / likely pathogenic plus gnomAD AF < 0.001
- Priority output: numeric priority_score plus human-readable Tier 1 to Tier 4
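The clinically relevant rule above is small enough to write down directly. In this sketch the bucket labels `pathogenic` and `likely_pathogenic` and the function name are hypothetical, and treating a missing allele frequency as "not flagged" is an assumption chosen so that absent data is never read as evidence of rarity.

```python
from typing import Optional

RARE_AF_THRESHOLD = 0.001  # gnomAD AF cut-off from the thresholds above
PATHOGENIC_BUCKETS = {"pathogenic", "likely_pathogenic"}  # hypothetical labels


def is_clinically_relevant(clinvar_bucket: Optional[str],
                           gnomad_af: Optional[float]) -> bool:
    """Apply the rule: ClinVar P/LP plus gnomAD AF below the rare threshold."""
    if clinvar_bucket not in PATHOGENIC_BUCKETS:
        return False
    if gnomad_af is None:
        # Assumption: unknown frequency is not counted as rare.
        return False
    return gnomad_af < RARE_AF_THRESHOLD


rare_pathogenic = is_clinically_relevant("pathogenic", 0.0002)
common_pathogenic = is_clinically_relevant("likely_pathogenic", 0.02)
unknown_frequency = is_clinically_relevant("pathogenic", None)
```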
Domain Decisions
- Reference genome: Uses GRCh38 as the default genome assembly.
- Prioritisation: Prioritises the most severe consequence per variant, since VEP returns multiple consequences for each one.
- Annotation backend: Uses Ensembl VEP REST because it provides consistent transcript consequence, ClinVar, and colocated frequency fields from a single annotation pass.
- Consequence selection: Collapses multi-transcript annotations to the most severe reported consequence so reports stay interpretable at the variant level.
- ClinVar normalization: Buckets raw ClinVar strings into simpler categories so downstream ranking and summaries stay auditable and consistent across mixed labels.
- Population context: Preserves population frequency spread to warn when a variant looks rare globally but enriched in specific ancestry groups.
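The ClinVar normalization and population-context decisions above might look something like the following sketch. The bucket names, the choice to fold "likely pathogenic" into "pathogenic", and the 0.01 enrichment cut-off are all assumptions made for illustration; the source does not state the skill's actual buckets or thresholds beyond the 0.001 rarity cut-off.

```python
from typing import Dict, Iterable, List, Optional


def bucket_clinvar(labels: Iterable[str]) -> str:
    """Collapse raw ClinVar labels into one coarse bucket (hypothetical names)."""
    seen = set()
    for raw in labels:
        label = raw.strip().lower().replace(" ", "_")
        if "conflicting" in label:  # checked first: the label also contains "pathogenic"
            seen.add("conflicting")
        elif "pathogenic" in label:  # includes likely_pathogenic in this sketch
            seen.add("pathogenic")
        elif "benign" in label:      # includes likely_benign
            seen.add("benign")
        elif "uncertain" in label:
            seen.add("uncertain")
        else:
            seen.add("other")
    if not seen:
        return "not_reported"
    if "conflicting" in seen or {"pathogenic", "benign"} <= seen:
        return "conflicting"
    for bucket in ("pathogenic", "benign", "uncertain"):
        if bucket in seen:
            return bucket
    return "other"


def enriched_populations(global_af: Optional[float],
                         population_afs: Dict[str, Optional[float]],
                         rare: float = 0.001,
                         enriched: float = 0.01) -> List[str]:
    """List populations where a globally rare variant is comparatively common."""
    if global_af is None or global_af >= rare:
        return []
    return sorted(
        name for name, af in population_afs.items()
        if af is not None and af >= enriched
    )


mixed_bucket = bucket_clinvar(["Pathogenic", "Likely_benign"])
flagged = enriched_populations(0.0005, {"afr": 0.02, "nfe": 0.0001, "eas": None})
```

Keeping the raw labels next to the bucket in the output table would let a reviewer audit how each bucket was derived.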
Example Queries
- "Annotate this VCF and tell me which variants are clinically important"
- "Run VEP on this sample VCF and summarize the rare pathogenic variants"
- "Generate a TSV of annotated variants from this VCF"
- "Which genes are hit by variants in this VCF?"
- "Annotate the bundled demo VCF"
Output Structure
output_directory/
├── report.md                      # Markdown summary of prioritized findings
├── result.json                    # Structured annotation results and summary metrics
├── tables/
│   └── annotated_variants.tsv     # Flat variant-level annotation table
└── reproducibility/
    └── commands.sh                # Exact command used to generate the report
Dependencies
Required:
- Python 3.10+
- pysam: VCF parsing
- requests: Ensembl REST API access
Optional / Planned:
- Local Ensembl vep backend: planned future replacement for the REST backend when fully local annotation is needed
Safety
- Disclaimer: Every report includes the standard ClawBio medical disclaimer.
- Warn before overwrite: The skill warns before writing files into an existing non-empty output directory.
- Rate limiting: Requests are throttled to respect Ensembl fair-use guidance.
- Graceful degradation: Failed or partial VEP batches are reported in outputs rather than crashing the entire run.
- Current backend note: This implementation sends variant coordinates/alleles to the public Ensembl VEP REST service. A local VEP backend is planned for stricter local-first workflows.
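As a rough illustration of the rate-limiting point, a simple client-side throttle can enforce a minimum gap between requests. This is a sketch under the assumption of a fixed-interval limiter at 15 requests per second; the skill's real throttling strategy is not described in the source, and `Throttle` is a hypothetical class. The clock and sleep functions are injectable so the behaviour can be checked without waiting.

```python
import time
from typing import Callable, Optional


class Throttle:
    """Block so that consecutive calls are at least 1/max_per_second apart."""

    def __init__(self, max_per_second: float = 15.0,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        if max_per_second <= 0:
            raise ValueError("max_per_second must be positive")
        self.min_interval = 1.0 / max_per_second
        self._clock = clock
        self._sleep = sleep
        self._last: Optional[float] = None  # time of the previous request

    def wait(self) -> None:
        """Call immediately before each request."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


# Simulated time so the example runs instantly.
fake_now = [0.0]
sleeps = []


def fake_clock() -> float:
    return fake_now[0]


def fake_sleep(seconds: float) -> None:
    sleeps.append(seconds)
    fake_now[0] += seconds


throttle = Throttle(15.0, clock=fake_clock, sleep=fake_sleep)
throttle.wait()  # first call never sleeps
throttle.wait()  # second call sleeps for the full interval
```

In a real run the default `time.monotonic` and `time.sleep` would be used, with `throttle.wait()` placed right before each POST.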
Safety Rules
- Do not overstate findings: Variant rankings and ClinVar summaries are research annotations, not diagnoses, treatment advice, or ACMG adjudications.
- Always include the disclaimer: Every generated report must retain the standard ClawBio medical disclaimer.
- Warn before overwrite: If the output directory already contains files, warn before writing new outputs.
- Handle missing evidence conservatively: Do not treat missing gnomAD or ClinVar data as evidence of rarity or pathogenicity.
- Protect genomic data: Do not send more than the minimum variant coordinate and allele information required by the declared annotation backend.
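The warn-before-overwrite rule could be checked with a small helper like the one below. It is a sketch only: `overwrite_warning` and the message wording are hypothetical, and the source does not say whether the skill prompts, logs, or aborts after warning.

```python
import tempfile
from pathlib import Path
from typing import List, Optional


def existing_outputs(output_dir) -> List[str]:
    """Return the names of anything already present in the output directory."""
    path = Path(output_dir)
    if not path.is_dir():
        return []
    return sorted(child.name for child in path.iterdir())


def overwrite_warning(output_dir) -> Optional[str]:
    """Build a warning message if the directory is not empty, else None."""
    names = existing_outputs(output_dir)
    if not names:
        return None
    return (f"Warning: {output_dir} already contains {len(names)} item(s); "
            "new outputs may overwrite them.")


# Demonstrate on a temporary directory, before and after adding a file.
with tempfile.TemporaryDirectory() as tmp:
    empty_dir_warning = overwrite_warning(tmp)
    (Path(tmp) / "report.md").write_text("old report")
    used_dir_warning = overwrite_warning(tmp)
```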
Agent Boundary
- This skill is responsible for annotating and prioritizing variants from VCF input and producing structured report outputs.
- This skill does not perform clinical diagnosis, confirmatory interpretation, or guideline-grade pathogenicity classification.
- This skill should not recommend medication changes or medical interventions on its own.
- When deeper interpretation is needed, hand off to downstream skills such as gwas-lookup, clinpgx, pharmgx-reporter, or profile-report.
Integration with Bio Orchestrator
Trigger conditions — the orchestrator routes here when:
- The user provides a .vcf / .vcf.gz file and asks for annotation or interpretation.
- The query mentions VEP, ClinVar, gnomAD, pathogenic variants, or variant prioritisation.
- The user wants a ranked list of interesting variants from a VCF.
Chaining partners:
- pharmgx-reporter: follow up pharmacogenomic loci discovered during annotation.
- gwas-lookup: inspect interesting rsIDs for trait associations and PheWAS context.
- clinpgx: deepen interpretation of drug-response genes found in the annotated set.
- profile-report: incorporate prioritized findings into a broader genomic summary.
Citations
- Ensembl Variant Effect Predictor — functional consequence annotation
- Ensembl REST API — batch VEP annotation endpoint used by the current backend
- ClinVar — clinical significance assertions
- gnomAD — population allele frequency reference data
- VCF Specification — variant file format reference