🎬 Video AI Community

seedance-v2

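A Skill that uses ByteDance Seedance 2.0 Pro to generate short 4–15 second videos with natural lip-sync and cinematic motion, drawing on multiple reference images, videos, and audio clips.

📜 Original English description (for reference)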

Generate cinematic short-form video with ByteDance Seedance 2.0 Pro on RunComfy. Documents Seedance 2.0 Pro's strengths (multi-modal references — up to 9 images, 3 videos, 3 audio — synchronized in-pass audio with natural lip-sync, cinematic motion refinement), the 4–15s duration schema, and when to route to HappyHorse 1.0 / Wan 2.7 / Kling instead. Calls `runcomfy run bytedance/seedance-v2/pro` through the local RunComfy CLI. Triggers on "seedance", "seedance 2", "seedance v2", "seedance pro", "bytedance video", or any explicit ask to generate video with this model.

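🇯🇵 Notes for Japanese creators

In a nutshell

Note: this commentary was added by the jpskill.com editorial team for Japanese business settings. It is reference information and is independent of how the Skill itself behaves.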

⚡ Recommended: install with a single command (60 seconds)

Copy the command below and paste it into a terminal (Mac/Linux) or PowerShell (Windows). Download, extraction, and placement are fully automatic.

🍎 Mac / 🐧 Linux

mkdir -p ~/.claude/skills && cd ~/.claude/skills && curl -L -o seedance-v2.zip https://jpskill.com/download/10376.zip && unzip -o seedance-v2.zip && rm seedance-v2.zip

🪟 Windows (PowerShell)

$d = "$env:USERPROFILE\.claude\skills"; ni -Force -ItemType Directory $d | Out-Null; iwr https://jpskill.com/download/10376.zip -OutFile "$d\seedance-v2.zip"; Expand-Archive "$d\seedance-v2.zip" -DestinationPath $d -Force; ri "$d\seedance-v2.zip"

When it finishes, restart Claude Code. After that, just ask in plain language, for example "make me a video prompt", and the Skill triggers automatically.
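To confirm the files landed where Claude Code looks for them, a quick check along these lines may help. It is a minimal sketch: it assumes the archive unpacks to a `seedance-v2` folder (the name used in the manual steps on this page) containing a `SKILL.md` file, and `skill_installed` is a hypothetical helper, not part of any tool.

```shell
# Hypothetical helper: succeed if <skills_dir>/<skill_name>/SKILL.md exists.
# Assumes the archive unpacks to a folder named after the skill.
skill_installed() {
  [ -f "$1/$2/SKILL.md" ]
}

if skill_installed "$HOME/.claude/skills" seedance-v2; then
  echo "seedance-v2 is installed"
else
  echo "seedance-v2 not found under ~/.claude/skills"
fi
```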

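💾 Manual download (for those who find the commands difficult)

1. Press the blue button below to download seedance-v2.zip
2. Double-click the ZIP file to extract it; this creates a seedance-v2 folder
3. Move that folder to C:\Users\<your name>\.claude\skills\ (Windows) or ~/.claude/skills/ (Mac)
4. Restart Claude Code

⬇ Download .zip (recommended) ⬇ .skill format (advanced users) Original source ↗

⚠️ Download and use at your own risk. This site accepts no responsibility for the content, behavior, or safety of the Skill.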

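🎯 What this Skill can do

The description below shows what this Skill does for you. When you ask Claude for something in this area, the Skill triggers automatically.

📦 Installation (3 steps)

1. Press the "Download" button above to get the .skill file
2. Change the file extension from .skill to .zip and extract it (macOS can extract it automatically)
3. Put the extracted folder in .claude/skills/ under your home folder
   - macOS / Linux: ~/.claude/skills/
   - Windows: %USERPROFILE%\.claude\skills\

Restart Claude Code and you are done. You do not need to say "use this Skill"; related requests invoke it automatically.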


Last updated: 2026-05-18
Retrieved: 2026-05-18
Bundled files: 1

📜 Original SKILL.md (the English text Claude reads)

Seedance 2.0 Pro — Pro Pack on RunComfy

runcomfy.com · Seedance 2.0 Pro · GitHub

ByteDance Seedance 2.0 Pro — multimodal cinematic video generator with native lip-synced audio — hosted on the RunComfy Model API.

npx skills add agentspace-so/runcomfy-skills --skill seedance-v2 -g

When to pick this model (vs siblings)

Seedance 2.0 Pro's distinct strength is multi-modal cinematic short-form: combine character images + scene videos + reference audio into one coherent shot. Pick it when fidelity to a reference identity / scene matters and you want native lip-sync.

| You want | Use |
| --- | --- |
| Lip-synced spokesperson / dialogue ad | Seedance 2.0 Pro |
| Multi-modal references (image + video + audio) | Seedance 2.0 Pro |
| Brand-consistent multi-language narrative | Seedance 2.0 Pro |
| Currently-#1 blind-vote video quality | HappyHorse 1.0 |
| Audio-driven lip-sync from your own track | Wan 2.7 (`audio_url`) |
| Motion editing on existing footage | Kling Video O1 |
| Ultra-fast iteration | LTX 2 |

If the user said "Seedance" / "Seedance 2" / "ByteDance video" explicitly, route here regardless.

Prerequisites

RunComfy CLI — npm i -g @runcomfy/cli
RunComfy account — runcomfy login opens a browser device-code flow.
CI / containers — set RUNCOMFY_TOKEN=<token> instead of runcomfy login.
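
Before invoking the model, a script may want to confirm these prerequisites. The sketch below only uses what is stated above (the `runcomfy` binary and the `RUNCOMFY_TOKEN` variable); the `auth_mode` helper is hypothetical and does not verify that a token is actually valid.

```shell
# Hypothetical helper: report which auth path applies.
# RUNCOMFY_TOKEN replaces `runcomfy login` in CI / containers.
auth_mode() {
  if [ -n "${RUNCOMFY_TOKEN:-}" ]; then
    echo "env-token"
  else
    echo "login"
  fi
}

# Warn (without exiting) if the CLI is not on PATH.
if ! command -v runcomfy >/dev/null 2>&1; then
  echo "runcomfy not found; install with: npm i -g @runcomfy/cli" >&2
fi

echo "auth mode: $(auth_mode)"
```

When the mode is `login`, an interactive `runcomfy login` is still needed before the first call.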

Endpoints + input schema

`bytedance/seedance-v2/pro`

| Field | Type | Required | Default | Notes |
| --- | --- | --- | --- | --- |
| `prompt` | string | yes | — | CN ≤ 500 chars OR EN ≤ 1000 words. |
| `image_url` | array | no | `[]` | 0–9 references (JPEG/PNG/WebP/BMP/TIFF/GIF). |
| `video_url` | array | no | `[]` | 0–3 clips (MP4/MOV), 2–15s each. |
| `audio_url` | array | no | `[]` | 0–3 audio refs (WAV/MP3), 2–15s, < 15MB each. |
| `aspect_ratio` | enum | no | `adaptive` | `adaptive`, `16:9`, `9:16`, `4:3`, `3:4`, `1:1`, `21:9`. |
| `duration` | int | no | 5 | 4–15 (whole seconds). |
| `resolution` | enum | no | `720p` | `480p` or `720p`. |
| `generate_audio` | bool | no | true | In-pass synchronized speech / SFX / music. |
| `seed` | int | no | — | Reproducibility. |
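
Checking values locally before a call can catch schema mismatches early. The following is a minimal sketch that mirrors the limits in the table above; the function names are hypothetical and the CLI itself remains the authority on what it accepts.

```shell
# Each check returns 0 when the value fits the documented schema.
valid_duration() {
  case "$1" in
    ''|*[!0-9]*) return 1 ;;   # must be a whole number of seconds
  esac
  [ "$1" -ge 4 ] && [ "$1" -le 15 ]
}

valid_resolution() {
  case "$1" in
    480p|720p) return 0 ;;
    *) return 1 ;;
  esac
}

valid_aspect_ratio() {
  case "$1" in
    adaptive|16:9|9:16|4:3|3:4|1:1|21:9) return 0 ;;
    *) return 1 ;;
  esac
}

# Reference counts: at most 9 images, 3 videos, 3 audio clips.
# Arguments: image_count video_count audio_count
valid_ref_counts() {
  [ "$1" -le 9 ] && [ "$2" -le 3 ] && [ "$3" -le 3 ]
}
```

For example, `valid_duration 8 && valid_aspect_ratio 9:16` succeeds, while `valid_duration 16` and `valid_resolution 1080p` both fail.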

How to invoke

Default (text only, 5s, 720p with audio):

runcomfy run bytedance/seedance-v2/pro \
  --input '{"prompt": "<user prompt>"}' \
  --output-dir <absolute/path>

Lip-synced ad with character reference (image-stable, text-evolves):

runcomfy run bytedance/seedance-v2/pro \
  --input '{
    "prompt": "Medium close-up. The woman explains today'\''s special in a warm friendly tone, slow push-in, soft window light, gentle cafe ambience.",
    "image_url": ["https://.../barista-headshot.jpg"],
    "duration": 8,
    "aspect_ratio": "9:16"
  }' \
  --output-dir <absolute/path>

Multi-modal (image + video + audio refs):

runcomfy run bytedance/seedance-v2/pro \
  --input '{
    "prompt": "Subject from image 1 walks through the café from video 1, voice tone matches audio 1.",
    "image_url": ["https://.../subject.jpg"],
    "video_url": ["https://.../cafe-locked-shot.mp4"],
    "audio_url": ["https://.../voice-ref.mp3"]
  }' \
  --output-dir <absolute/path>

The CLI submits the request, polls for completion, fetches the result, and downloads any *.runcomfy.net / *.runcomfy.com URLs into --output-dir.
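
Prompts containing apostrophes need the `'\''` escape when the JSON is written inline, as in the lip-sync example above. One way to sidestep that, sketched below, is to keep the JSON in a file and read it into `--input`. Only the `--input` and `--output-dir` flags from the examples above are used; the `run_seedance` wrapper, the `RUNCOMFY_BIN` override, and the `/tmp/seedance-out` path are hypothetical names introduced for illustration.

```shell
# Hypothetical wrapper: read the JSON body from a file and pass it to --input.
# RUNCOMFY_BIN is an illustrative override (not a CLI feature) so the
# command can be swapped for `echo` in a dry run.
run_seedance() {
  "${RUNCOMFY_BIN:-runcomfy}" run bytedance/seedance-v2/pro \
    --input "$(cat "$1")" \
    --output-dir "$2"
}

# A quoted heredoc ('EOF') keeps apostrophes and $ signs literal.
input_file=$(mktemp)
cat > "$input_file" <<'EOF'
{
  "prompt": "Medium close-up. The woman explains today's special in a warm friendly tone.",
  "duration": 8,
  "aspect_ratio": "9:16"
}
EOF

# Dry run: print the command instead of submitting a job.
RUNCOMFY_BIN=echo run_seedance "$input_file" /tmp/seedance-out
```

Dropping the `RUNCOMFY_BIN=echo` prefix would run the real CLI with the same arguments.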

Prompting — what actually works

Image vs text division. This is the single most important rule. Stable identity (face, costume, brand mark, logo) → put in image_url. Evolving narrative (action, mood, lighting, camera) → put in prompt. Trying to verbally describe a face in detail wastes tokens and produces drift.

Camera + motion in plain language. "Medium close-up", "slow push-in", "handheld follow", "locked-off wide" all work as directives. Combine: "Medium close-up. Slow push-in over 3 seconds. Handheld, slight breathing motion."

Audio direction with generate_audio: true — say the tone: "warm friendly conversational", "calm instructional", "crisp newsroom delivery". For ambient: "gentle cafe chatter, distant traffic, no foreground music".
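
The camera and audio directives above combine naturally into short sentences. As a rough sketch, a hypothetical `build_prompt` helper could assemble them in a fixed order; nothing here is required by the model, it only keeps variants consistent.

```shell
# Hypothetical helper: join non-empty prompt parts into sentences.
# Each argument is one directive (shot, action, camera motion, tone, ambience).
# Note: uses the global variables `out` and `part`.
build_prompt() {
  out=""
  for part in "$@"; do
    [ -n "$part" ] || continue
    # Strip one trailing period, then add exactly one back.
    out="${out}${out:+ }${part%.}."
  done
  printf '%s\n' "$out"
}

build_prompt \
  "Medium close-up" \
  "Slow push-in over 3 seconds" \
  "Warm friendly conversational tone" \
  "Gentle cafe chatter, distant traffic, no foreground music"
```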

Reference media specs. Videos must be 2–15s; audio must be under 15MB and 2–15s. Out-of-range files are rejected. Match the aspect ratio of refs to your output to avoid crops.
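
Since references are passed as URLs, a size check has to happen on the local copy before it is uploaded anywhere. The sketch below covers only the audio size limit, using the `< 15MB` bound from the schema table; treating "MB" as 10^6 bytes is an assumption, and the 2–15s duration limit is not checked because that needs a media probing tool.

```shell
# 15 MB limit from the schema; reading "MB" as 10^6 bytes is an assumption.
MAX_AUDIO_BYTES=15000000

# Hypothetical helper: succeed when the local file exists and is strictly
# under the limit. Uses the global variable `size`.
audio_size_ok() {
  [ -f "$1" ] || return 1
  # tr strips the padding some `wc` implementations add around the count.
  size=$(wc -c < "$1" | tr -d '[:space:]')
  [ "$size" -lt "$MAX_AUDIO_BYTES" ]
}
```

A call such as `audio_size_ok voice-ref.mp3 || echo "audio reference too large or missing"` can gate the upload step.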

Anti-patterns:

Mixing radically different aesthetic refs (watercolor + photoreal) → confuses the model.
Conflicting style cues in the prompt → simplify by removing contradictions.
Describing a stable identity verbally → use image_url instead.
Asking for clips longer than 15s → rejected with a 422; segment into multiple calls.
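
For the last item, one possible way to plan the segments is to split the total length into the fewest pieces that each fit the 4–15s window. This is a sketch under that assumption only; the source does not prescribe a splitting strategy, and `split_duration` is a hypothetical helper that says nothing about keeping continuity between segments.

```shell
# Split a total length (whole seconds) into the fewest segments that each
# fit the 4-15s window, spreading the remainder over the first segments.
# Uses global variables (total, n, base, extra, i, seg, out).
split_duration() {
  total=$1
  if [ "$total" -lt 4 ]; then
    echo "total must be at least 4 seconds" >&2
    return 1
  fi
  n=$(( (total + 14) / 15 ))     # ceil(total / 15)
  base=$(( total / n ))
  extra=$(( total % n ))         # the first `extra` segments get +1s
  i=0
  out=""
  while [ "$i" -lt "$n" ]; do
    seg=$base
    if [ "$i" -lt "$extra" ]; then
      seg=$(( base + 1 ))
    fi
    out="${out}${out:+ }${seg}"
    i=$(( i + 1 ))
  done
  echo "$out"
}

split_duration 40   # prints: 14 13 13
```

Each printed value can then be used as the `duration` of one call.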

Where it shines

| Use case | Why Seedance 2.0 Pro |
| --- | --- |
| Spokesperson / dialogue ads | Native in-pass lip-sync, no separate TTS step |
| Brand-consistent multi-language narratives | Image refs hold identity; text drives translation |
| Cinematic short-form film previs | Camera-shot grammar + multi-modal refs |
| Ad creatives with reference music / VO tone | Audio refs guide voice / mood without locking lip-sync |
| Reproducible variant testing | Seed control + fixed schema |

Sample prompts (verified to produce strong results)

Default playground example:

Golden hour on a quiet cafe terrace: a barista wipes the counter, then
looks up and explains today's special in a friendly tone, natural
lip-sync. Medium close-up, slow push-in; warm side light, soft bokeh
through glass, gentle cafe ambience and subtle film grain.

Multi-modal lip-sync (text + image):

Same person as image 1 in a softly-lit recording booth, leaning into
the mic, says: "We just shipped the biggest update of the year."
Calm conversational tone. Medium close-up, locked tripod, shallow DOF,
warm key light from camera-left.

Limitations

Duration 4–15s — no longer clips on this endpoint.
Resolution ceiling 720p on the playground variant.
Reference media specs — videos / audio must be 2–15s; audio < 15MB.
Lip-sync quality — depends on prompt clarity; not guaranteed perfect under all conditions.
No @-syntax for character binding — relies on image refs + prompt alignment.

Exit codes

| code | meaning |
| --- | --- |
| 0 | success |
| 64 | bad CLI args |
| 65 | bad input JSON / schema mismatch |
| 69 | upstream 5xx |
| 75 | retryable: timeout / 429 |
| 77 | not signed in or token rejected |

Full reference: docs.runcomfy.com/cli/troubleshooting.
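
A wrapper script can act on these codes, for instance by retrying only when the code is 75. The sketch below assumes the table above is complete for scripting purposes; `exit_code_meaning`, `retry_on_75`, and the `RETRY_DELAY` variable are hypothetical names, and the fixed delay is a simplification rather than a recommended backoff policy.

```shell
# Translate the documented exit codes into short labels.
exit_code_meaning() {
  case "$1" in
    0)  echo "success" ;;
    64) echo "bad CLI args" ;;
    65) echo "bad input JSON / schema mismatch" ;;
    69) echo "upstream 5xx" ;;
    75) echo "retryable: timeout / 429" ;;
    77) echo "not signed in or token rejected" ;;
    *)  echo "unknown exit code $1" ;;
  esac
}

# Run a command up to <max_attempts> times, retrying only on exit code 75.
# Usage: retry_on_75 <max_attempts> <command> [args...]
# RETRY_DELAY (seconds, default 5) is a hypothetical knob, not a CLI setting.
retry_on_75() {
  max=$1
  shift
  attempt=1
  while :; do
    code=0
    "$@" || code=$?
    if [ "$code" -eq 0 ]; then
      return 0
    fi
    if [ "$code" -ne 75 ] || [ "$attempt" -ge "$max" ]; then
      echo "giving up after $attempt attempt(s): $(exit_code_meaning "$code")" >&2
      return "$code"
    fi
    attempt=$(( attempt + 1 ))
    sleep "${RETRY_DELAY:-5}"
  done
}
```

A real call would look like `retry_on_75 3 runcomfy run bytedance/seedance-v2/pro --input "$json" --output-dir "$out"`, where `$json` and `$out` are supplied by the caller.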

How it works

The skill invokes runcomfy run bytedance/seedance-v2/pro with a JSON body matching the schema. The CLI POSTs to https://model-api.runcomfy.net/v1/models/bytedance/seedance-v2/pro, polls the request, fetches the result, and downloads any *.runcomfy.net / *.runcomfy.com URL into --output-dir. Ctrl-C cancels the remote request before exit.

Security & Privacy

Token storage: runcomfy login writes the API token to ~/.config/runcomfy/token.json with mode 0600 (owner-only read/write). Set RUNCOMFY_TOKEN env var to bypass the file entirely in CI / containers.
Input boundary: the user prompt is passed as a JSON string to the CLI via --input. The CLI does NOT shell-expand the prompt; it transmits the JSON body directly to the Model API over HTTPS. No shell injection surface from prompt content.
Third-party content: image / video / audio URLs you pass are fetched by the RunComfy model server, not by the CLI on your machine. Treat external URLs as untrusted; image-based prompt injection is a known risk for any image-edit / video-edit model.
Outbound endpoints: only model-api.runcomfy.net (request submission) and *.runcomfy.net / *.runcomfy.com (download whitelist for generated outputs). No telemetry, no callbacks.
Generated-file size cap: the CLI aborts any single download > 2 GiB to prevent disk-fill from a malicious or runaway model output.
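
To illustrate what a download whitelist of this shape means in practice, here is a minimal host check. It is not the CLI's implementation, only a sketch of the `*.runcomfy.net` / `*.runcomfy.com` rule described above; `is_allowed_download_url` is a hypothetical helper, the hostnames in the usage note are made up, and the URL scheme is not checked.

```shell
# Hypothetical helper: succeed only for hosts under *.runcomfy.net or
# *.runcomfy.com. Uses the global variables `rest` and `host`.
is_allowed_download_url() {
  rest=${1#*://}          # drop the scheme
  host=${rest%%/*}        # drop the path
  host=${host%%\?*}       # drop a query string that has no path before it
  host=${host%%"#"*}      # drop a fragment
  host=${host##*@}        # drop any userinfo
  host=${host%%:*}        # drop any port
  case "$host" in
    *.runcomfy.net|*.runcomfy.com) return 0 ;;
    *) return 1 ;;
  esac
}
```

With this check, a URL on a made-up host such as `https://cdn.runcomfy.net/out/a.mp4` passes, while look-alikes such as `https://evil-runcomfy.net/a.mp4` or `https://cdn.runcomfy.net@evil.com/a.mp4` do not.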