kling-3-prompting

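A Skill that helps you write, improve, and refine prompts for Kling 3.0, an AI video generation tool, when generating video from text or images, multi-shot sequences, or dialogue scenes.

📜 Original English description (for reference)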

Write better prompts for Kling 3.0 AI video generation. Use when the user wants to create, write, improve, or refine prompts — text-to-video, image-to-video, keyframes, multi-shot sequences, or dialogue scenes.

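🇯🇵 Notes for Japanese creators

In a nutshell

Note: this commentary was added by the jpskill.com editorial team for Japanese business settings. It is reference information and is independent of how the Skill itself behaves.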

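⚡ Recommended: install with a one-line command (60 seconds)

Copy the command below and paste it into a terminal (Mac/Linux) or PowerShell (Windows). It downloads the archive, extracts it, and puts the Skill in place automatically.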

🍎 Mac / 🐧 Linux

mkdir -p ~/.claude/skills && cd ~/.claude/skills && curl -L -o kling-3-prompting.zip https://jpskill.com/download/9762.zip && unzip -o kling-3-prompting.zip && rm kling-3-prompting.zip

🪟 Windows (PowerShell)

$d = "$env:USERPROFILE\.claude\skills"; ni -Force -ItemType Directory $d | Out-Null; iwr https://jpskill.com/download/9762.zip -OutFile "$d\kling-3-prompting.zip"; Expand-Archive "$d\kling-3-prompting.zip" -DestinationPath $d -Force; ri "$d\kling-3-prompting.zip"

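When it finishes, restart Claude Code. The Skill then triggers automatically when you ask in plain language, for example "write me a video prompt".

💾 Manual download (if you would rather not use the command line)

1. Download kling-3-prompting.zip using the download button on this page
2. Double-click the ZIP file to extract it; this creates a kling-3-prompting folder
3. Move that folder to C:\Users\<your name>\.claude\skills\ (Windows) or ~/.claude/skills/ (Mac)
4. Restart Claude Code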


⚠️ Download and use at your own risk. This site accepts no responsibility for the content, behavior, or safety of the Skill.

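🎯 What this Skill can do

The description below explains what this Skill does for you. It triggers automatically when you ask Claude for something in this area.

📦 How to install (3 steps)

1. Download the .skill file from this page
2. Change the file extension from .skill to .zip and extract it (macOS can extract it automatically)
3. Put the extracted folder in .claude/skills/ under your home folder
- macOS / Linux: ~/.claude/skills/
- Windows: %USERPROFILE%\.claude\skills\

Restart Claude Code and you are done. You do not need to say "use this Skill..."; it is invoked automatically for related requests.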

Last updated: 2026-05-18
Retrieved: 2026-05-18
Bundled files: 1

📜 Original SKILL.md (the English/Chinese text that Claude reads)

Overview

Kling 3.0 is a unified multimodal video model. It understands cinematic direction, not keyword lists. Write prompts like a director — describe what the audience sees, hears, and feels over time.

Core shift: Description → Direction. Think "direct a scene" not "describe an image."

Interactive Builder Workflow

When invoked, guide the user through these steps using AskUserQuestion:

digraph builder {
  "1. Generation mode?" [shape=diamond];
  "Text-to-Video" [shape=box];
  "Image-to-Video" [shape=box];
  "Multi-Shot Sequence" [shape=box];
  "Keyframe Transition" [shape=box];
  "2. Gather scene details" [shape=box];
  "3. Assemble prompt" [shape=box];
  "4. Present & refine" [shape=box];

  "1. Generation mode?" -> "Text-to-Video";
  "1. Generation mode?" -> "Image-to-Video";
  "1. Generation mode?" -> "Multi-Shot Sequence";
  "1. Generation mode?" -> "Keyframe Transition";
  "Text-to-Video" -> "2. Gather scene details";
  "Image-to-Video" -> "2. Gather scene details";
  "Multi-Shot Sequence" -> "2. Gather scene details";
  "Keyframe Transition" -> "2. Gather scene details";
  "2. Gather scene details" -> "3. Assemble prompt";
  "3. Assemble prompt" -> "4. Present & refine";
}
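
To make the flow in the diagram concrete, here is a minimal Python sketch that encodes the same graph as a dictionary and walks it for a chosen mode. The names `BUILDER_GRAPH` and `walk` are hypothetical and are not part of the Skill; the node labels are copied from the diagram.

```python
# Hypothetical encoding of the builder graph; labels mirror the diagram above.
BUILDER_GRAPH = {
    "1. Generation mode?": [
        "Text-to-Video",
        "Image-to-Video",
        "Multi-Shot Sequence",
        "Keyframe Transition",
    ],
    "Text-to-Video": ["2. Gather scene details"],
    "Image-to-Video": ["2. Gather scene details"],
    "Multi-Shot Sequence": ["2. Gather scene details"],
    "Keyframe Transition": ["2. Gather scene details"],
    "2. Gather scene details": ["3. Assemble prompt"],
    "3. Assemble prompt": ["4. Present & refine"],
    "4. Present & refine": [],
}


def walk(mode: str) -> list[str]:
    """Return the ordered steps visited for the chosen generation mode."""
    start = "1. Generation mode?"
    if mode not in BUILDER_GRAPH[start]:
        raise ValueError(f"unknown mode: {mode!r}")
    path = [start, mode]
    # After the mode branch, every node has at most one successor.
    while BUILDER_GRAPH[path[-1]]:
        path.append(BUILDER_GRAPH[path[-1]][0])
    return path


print(" -> ".join(walk("Image-to-Video")))
```

Every mode converges on the same three follow-up steps, which is why the walk only needs to branch once.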

Step 1: Determine Generation Mode

Ask the user which mode:

Text-to-Video — prompt from scratch
Image-to-Video — animate a reference image
Multi-Shot Sequence — 2-6 shot storyboard (up to 15s)
Keyframe Transition — start frame → end frame with interpolated motion

Step 2: Gather Scene Details

Ask about each element (adapt questions to mode):

Element	Question	Why it matters
Subject	Who/what is the focus? Specific appearance details?	Anchors consistency — define distinguishing traits early
Action	What happens? Describe the timeline (first → then → finally)	Kling 3.0 excels at sequential action over 15s arcs
Environment	Where? Be specific (not "a street" but "narrow Tokyo alley, steam from grates")	Grounds the scene physically
Camera	Shot type and movement? (See camera reference below)	Cinematic language produces far better results
Lighting	What light sources? Name them specifically	"Flickering neon" beats "dramatic lighting"
Mood/Emotion	What should the audience feel?	Drives color grade, pacing, music
Audio	Dialogue? Ambient sound? Music?	Kling 3.0 generates native audio + lip-sync
Duration	How long? (3-15s)	Longer = describe progression over time
Aspect Ratio	16:9 / 9:16 / 1:1 / 21:9?	16:9 cinematic, 9:16 social, 21:9 ultra-wide

Image-to-Video: Focus on how the scene evolves from the image — movement, camera motion, environmental change. The model preserves identity/layout from the source.

Keyframes: Ask for start and end frame descriptions. Frames should match in color, style, and lighting. Prompt sparingly — Kling infers motion well.

Multi-Shot: Define each shot separately with its own framing, subject, action, and duration. Label shots explicitly.
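
One way to keep the Step 2 answers together is a small record with a sanity check. The sketch below is an assumption for illustration: the `SceneDetails` class and its field names are hypothetical, and only the 3-15 second range and the four aspect ratios come from the table above.

```python
from dataclasses import dataclass

# Aspect ratios listed in the Step 2 table.
ASPECT_RATIOS = {"16:9", "9:16", "1:1", "21:9"}


@dataclass
class SceneDetails:
    """Hypothetical container for the Step 2 answers (not part of the Skill)."""

    subject: str
    action: str
    environment: str
    camera: str
    lighting: str
    mood: str
    audio: str = ""
    duration_s: int = 5
    aspect_ratio: str = "16:9"

    def problems(self) -> list[str]:
        """List anything missing or outside the ranges given in the table."""
        issues = []
        if not 3 <= self.duration_s <= 15:
            issues.append("duration must be 3-15 seconds")
        if self.aspect_ratio not in ASPECT_RATIOS:
            issues.append(f"aspect ratio not in the listed set: {self.aspect_ratio}")
        for name in ("subject", "action", "environment", "camera", "lighting", "mood"):
            if not getattr(self, name).strip():
                issues.append(f"missing {name}")
        return issues


details = SceneDetails(
    subject="Woman in red dress",
    action="She turns slowly, then disappears around the corner",
    environment="Narrow Tokyo alley, steam from grates",
    camera="Handheld shoulder-cam with subtle sway",
    lighting="Flickering neon",
    mood="Uneasy",
    duration_s=20,
)
print(details.problems())
```

An empty list from `problems()` means every required element was answered and the technical values fall inside the ranges the table names.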

Step 3: Assemble the Prompt

Use the Master Formula:

[Scene/Environment] + [Subject & Appearance] + [Action Timeline] + [Camera Movement] + [Audio & Atmosphere] + [Technical Specs]
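
As a rough illustration of the formula, the sketch below joins the six slots in the order given. The function name `assemble_prompt`, the choice of a period as separator, and the audio and specs values are assumptions made for this example; the formula itself only fixes the order of the parts.

```python
def assemble_prompt(scene, subject, action, camera, audio, specs):
    """Join the six Master Formula slots, in order, into one prompt string.

    Empty slots are skipped. Joining with ". " is an illustrative choice,
    not something the Skill prescribes.
    """
    parts = [scene, subject, action, camera, audio, specs]
    cleaned = [p.strip().rstrip(".") for p in parts if p and p.strip()]
    if not cleaned:
        raise ValueError("at least one slot must be filled in")
    return ". ".join(cleaned) + "."


prompt = assemble_prompt(
    scene="Narrow Tokyo alley, steam from grates, glowing vending machines",
    subject="Woman in red dress, heels clicking wet cobblestone",
    action="She turns slowly, hair catches light, disappears around corner",
    camera="Handheld shoulder-cam drifts behind subject with subtle sway",
    audio="Rain ambience, distant traffic",  # illustrative value
    specs="Shot on 35mm film, 16:9, 10s",  # illustrative value
)
print(prompt)
```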

Writing rules:

Use cinematic motion verbs: dolly push, whip-pan, crash zoom, rack focus, tracking shot — NOT "moves" or "goes"
Name real light sources: neon signs, candlelight, golden hour, LED panels — NOT "dramatic lighting"
Include texture for credibility: grain, lens flares, condensation, fabric sheen, smoke, sweat
Describe temporal flow: beginning → middle → end
Keep to 1-3 rich sentences per shot (specificity > length)
For dialogue: use character labels, assign voice tone/emotion, use transitional words ("Immediately," "Pause")
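
Some of the writing rules above can be checked mechanically. The following is a minimal sketch, assuming a hypothetical `lint_shot` helper; it only flags the two vague verbs and the one generic lighting phrase named in the rules, and its sentence count is a naive split on punctuation (a decimal such as "3.0" would be counted as a sentence break).

```python
import re

VAGUE_VERBS = ("moves", "goes")  # verbs the rules say to avoid
GENERIC_LIGHTING = ("dramatic lighting",)  # phrase the rules say to avoid
MAX_SENTENCES = 3  # "1-3 rich sentences per shot"


def lint_shot(text: str) -> list[str]:
    """Return warnings for a single shot description."""
    warnings = []
    lowered = text.lower()
    for verb in VAGUE_VERBS:
        if re.search(rf"\b{verb}\b", lowered):
            warnings.append(f'vague motion verb "{verb}"')
    for phrase in GENERIC_LIGHTING:
        if phrase in lowered:
            warnings.append(f'generic lighting "{phrase}"')
    # Naive sentence split on ., ! and ?; good enough for a quick check.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if len(sentences) > MAX_SENTENCES:
        warnings.append(f"{len(sentences)} sentences; aim for 1-3 per shot")
    return warnings


print(lint_shot("The camera moves. Dramatic lighting."))
print(lint_shot("Slow dolly push-in toward her face, candlelight warming skin tones."))
```

The first call reports both problems; the second returns an empty list because it names a camera move and a real light source.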

Step 4: Present & Refine

Present the assembled prompt. Ask if they want to:

Adjust any element
Add a negative prompt
Generate variations (different duration, different camera, different mood)

Quick Reference

Camera Movements

Movement	Effect	Example phrase
Dolly push-in	Builds intimacy/tension	"slow dolly push-in toward her face"
Dolly zoom	Vertigo/dramatic reveal	"dolly zoom creating disorienting depth shift"
Tracking shot	Follows subject laterally	"camera tracks alongside as she walks"
Whip-pan	Energy/surprise	"whip-pan to reveal the door"
Crash zoom	Shock/emphasis	"sudden crash zoom on the object"
Rack focus	Shift attention	"rack focus from foreground hand to background figure"
Handheld/shoulder-cam	Raw/documentary feel	"handheld shoulder-cam with subtle sway"
Static tripod	Composed/observational	"locked-off static tripod, wide shot"
FPV drone	High-energy immersion	"dynamic FPV drone shot chasing through corridor"
Low-angle tracking	Heroic/imposing	"low-angle tracking shot, subject towers above"
Truck left/right	Lateral reveal	"camera trucks right revealing the cityscape"
Tilt up/down	Vertical reveal	"slow tilt up from boots to face"

Lens & Film Stock

Phrase	Effect
"Shot on 35mm film"	Warm grain, organic texture
"Macro 85mm lens"	Tight detail, shallow depth of field
"Wide-angle steadicam"	Smooth, immersive, spatial
"Handheld camcorder"	Raw VHS energy, nostalgic
"Anamorphic lens flare"	Cinematic horizontal streaks

Lighting

Use specific sources, not adjectives:

"Golden hour sun cutting through dusty warehouse windows"
"Flickering neon casting magenta/cyan across wet pavement"
"Single bare bulb swinging, casting moving shadows"
"Cool blue LED panels reflecting off glass surfaces"
"Candlelight warming skin tones, deep shadows beyond"

Color & Grade

"Desaturated teal grade, crushed blacks"
"Amber nightclub strobe cutting through smoke"
"Cool blue haze filling the corridor"
"Magenta neon reflecting off wet asphalt"
"Overexposed highlights, blown-out whites"

Multi-Character Dialogue

Rule	Do	Don't
Name characters	`[Character A: Silver-haired CEO]`	`[Man] says...`
Anchor to action	Agent slams table. [Agent, angrily]: "Where is it?"	Just dialogue without visual action
Assign voice tone	`[CEO, deep authoritative gravelly voice]`	Generic "says"
Control timing	"Immediately," "Pause," "After a beat"	Back-to-back dialogue without transitions
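
The four dialogue rules can be combined into one formatting helper. This is a sketch under assumptions: the `dialogue_line` function is hypothetical, and the exact ordering (action, then timing word, then the bracketed label) is inferred from the "Do" column rather than taken from any documented Kling syntax.

```python
def dialogue_line(name: str, tone: str, line: str, action: str = "", timing: str = "") -> str:
    """Format one spoken line with a speaker label, tone, and optional cues."""
    if not name.strip() or not tone.strip():
        raise ValueError("every speaker needs a name and a voice tone")
    pieces = []
    if action:
        # Anchor the line to a visible action first.
        pieces.append(action.rstrip(".") + ".")
    if timing:
        # Transitional word such as "Immediately" or "After a beat".
        pieces.append(timing.rstrip(",") + ",")
    pieces.append(f'[{name}, {tone}]: "{line}"')
    return " ".join(pieces)


print(dialogue_line("Agent", "angrily", "Where is it?", action="Agent slams table"))
print(dialogue_line("CEO", "deep authoritative gravelly voice", "Sit down.", timing="After a beat"))
```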

Multi-Shot Structure

Shot 1 (0-5s): [Wide establishing shot description]
Shot 2 (5-10s): [Medium/close-up with action progression]
Shot 3 (10-15s): [Resolution/reaction with camera payoff]

Atmosphere: [Overall mood, color grade]
Audio: [Sound design, music, dialogue]

Label every shot. Assign durations. Describe framing + subject + motion per shot.
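
The template above can be filled in programmatically. The sketch below assumes a hypothetical `multi_shot_prompt` helper that takes (duration, description) pairs, computes the running timestamps, and enforces the 2-6 shot and 15 second limits stated in Step 1.

```python
def multi_shot_prompt(shots, atmosphere, audio):
    """Build a labeled multi-shot prompt.

    shots: list of (duration_seconds, description) tuples.
    """
    if not 2 <= len(shots) <= 6:
        raise ValueError("a multi-shot sequence uses 2-6 shots")
    if any(duration <= 0 for duration, _ in shots):
        raise ValueError("each shot needs a positive duration")
    total = sum(duration for duration, _ in shots)
    if total > 15:
        raise ValueError(f"total duration {total}s exceeds 15s")
    lines = []
    start = 0
    for number, (duration, description) in enumerate(shots, start=1):
        end = start + duration
        lines.append(f"Shot {number} ({start}-{end}s): {description}")
        start = end
    lines += ["", f"Atmosphere: {atmosphere}", f"Audio: {audio}"]
    return "\n".join(lines)


print(multi_shot_prompt(
    [
        (5, "Locked-off static tripod, wide shot of the alley"),
        (5, "Slow dolly push-in toward her face"),
        (5, "Whip-pan to reveal the door"),
    ],
    atmosphere="Desaturated teal grade, crushed blacks",
    audio="Rain ambience",  # illustrative value
))
```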

Start & End Frame Tips

Frames should match in color palette, style, and lighting
Identical start/end frames = seamless loop
Prompt sparingly — Kling infers motion between frames well
Simple camera directions: zoom in/out, pan left/right, tilt up/down
5s for dynamic transitions, 10s for complex transformations
Start frame aspect ratio drives the whole clip
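
Three of these tips reduce to simple decisions, sketched below. The `keyframe_plan` function is hypothetical: it treats two frames as identical when their identifiers (for example file paths) are equal, which is an assumption, and it reports the start frame's aspect ratio in reduced form, so a 2520x1080 frame comes out as 7:3 rather than 21:9.

```python
from fractions import Fraction


def keyframe_plan(start_frame, end_frame, start_size, complex_transformation=False):
    """Summarize keyframe settings implied by the tips above.

    start_frame / end_frame: frame identifiers such as file paths (assumption).
    start_size: (width, height) of the start frame in pixels.
    """
    width, height = start_size
    ratio = Fraction(width, height)  # reduces, e.g. 1920/1080 to 16/9
    return {
        # Identical start and end frames give a seamless loop.
        "seamless_loop": start_frame == end_frame,
        # 5s for dynamic transitions, 10s for complex transformations.
        "duration_s": 10 if complex_transformation else 5,
        # The start frame's aspect ratio drives the whole clip.
        "aspect_ratio": f"{ratio.numerator}:{ratio.denominator}",
    }


print(keyframe_plan("start.png", "start.png", (1920, 1080)))
```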

Negative Prompts

Use to prevent common AI defaults:

smiling, laughing, cartoonish, bright saturated colors, low resolution,
morphing, blurry text, disfigured hands, extra fingers, static pose,
frozen expression, stock photo aesthetic

Customize based on scene — remove items that conflict with your intent.
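
Customizing the list can be as simple as filtering it. The sketch below assumes a hypothetical `negative_prompt` helper: `allow` names the defaults that conflict with your intent and should be dropped, and `extra` adds scene-specific terms. The default terms are copied from the list above.

```python
DEFAULT_NEGATIVES = [
    "smiling", "laughing", "cartoonish", "bright saturated colors",
    "low resolution", "morphing", "blurry text", "disfigured hands",
    "extra fingers", "static pose", "frozen expression", "stock photo aesthetic",
]


def negative_prompt(allow=(), extra=()):
    """Return the default negatives minus allowed items, plus any extras."""
    allowed = {item.lower() for item in allow}
    kept = [item for item in DEFAULT_NEGATIVES if item not in allowed]
    for item in extra:
        if item not in kept:
            kept.append(item)
    return ", ".join(kept)


# A cheerful scene should be allowed to smile and laugh.
print(negative_prompt(allow=["smiling", "laughing"], extra=["watermark"]))
```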

Weak → Strong

Element	Weak	Strong
Camera	"Camera follows person"	"Handheld shoulder-cam drifts behind subject with subtle sway"
Subject	"A woman walking"	"Woman in red dress, heels clicking wet cobblestone"
Environment	"In a city"	"Narrow Tokyo alley, steam from grates, glowing vending machines"
Lighting	"Dramatic lighting"	"Flickering neon casting magenta/cyan across wet pavement"
Texture	"It looks realistic"	"Rain beading on leather jacket, condensation on glass, visible breath"
Motion	"She walks away"	"She turns slowly, hair catches light, disappears around corner"

Common Mistakes

Mistake	Fix
Keyword lists instead of scene direction	Write like directing a shot: subject + action + camera + environment
Vague motion ("moves," "goes")	Use cinematic verbs: dolly, track, whip-pan, crash zoom
Generic lighting ("dramatic")	Name the source: neon, candle, golden hour, LED panel
Overlong prompts	1-3 rich sentences per shot; specificity > length
No temporal progression	Describe beginning → middle → end of the shot
Mismatched keyframes	Match color, lighting, and style between start/end frames
Unattributed dialogue	Label every speaker with name, tone, and emotion
Cramming multi-shot into one paragraph	Separate and label each shot with duration