
Media Generation

Handle image generation, image editing, and short video generation through one workflow: choose the right modality, pass caller intent through to the provider, save outputs under tmp/images/ or tmp/videos/, and prefer the bundled helpers over ad-hoc one-off API calls.

Workflow decision

If the user wants a brand-new still image, use an image-generation model.
If the user supplies an image or wants a specific existing image changed, use an image-edit workflow.
If the user wants motion / a clip / a short video, use a video-generation model.
If the request includes one or more reference images, use the helper that supports reference-image transport.
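
The decision rules above can be sketched as a small routing function. This is only an illustration: the function name choose_helper and its boolean parameters are hypothetical, and the precedence (reference images first, then video, then edit) is an assumption based on the reference-image helper delegating to the image or video helper.

```python
def choose_helper(wants_video: bool, has_source_image: bool,
                  has_reference_images: bool) -> str:
    """Return the bundled helper script that fits a request (hypothetical router)."""
    if has_reference_images:
        # Assumption: reference images take precedence because that helper
        # delegates to the image or video helper afterwards.
        return "scripts/generate_consistent_media.py"
    if wants_video:
        return "scripts/generate_video.py"
    if has_source_image:
        return "scripts/edit_image.py"
    # Default case: a brand-new still image.
    return "scripts/generate_image.py"
```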

Standard workflow

Determine whether the task is image generation, image editing, or video generation.
Ask the user for clarification only when it is required to execute the request correctly.
Prefer scripts/generate_image.py for still-image generation.
Prefer scripts/edit_image.py for direct image edits.
Prefer scripts/mask_inpaint.py for localized edits with masks or generated regions.
Prefer scripts/outpaint_image.py for canvas expansion / outpainting.
Prefer scripts/generate_consistent_media.py when reference images need to be passed through.
Prefer scripts/generate_video.py for video generation, especially when the provider may return async job payloads.
Prefer scripts/generate_batch_media.py for repeatable batch jobs, templated variations, or auditable manifests.
Prefer scripts/object_select_edit.py for simple object-vs-background edits on transparent assets or clean backdrops.
If the provider returns a URL, path, HTML snippet, markdown snippet, data: URL, or b64_json, use scripts/fetch_generated_media.py to extract the media and save it locally.
Save outputs under:
- images → tmp/images/
- videos → tmp/videos/
If the user wants files sent in chat, prefer sending the local downloaded file.
Keep the original remote reference as fallback when local retrieval fails.

Prompt handling

Default to prompt pass-through.

Pass the caller's prompt through unchanged.
Use optional request fields only when the caller provides them.
Keep prompt semantics under caller control.

Use the scripts mainly as functional helpers:

normalize arguments
map fields to provider-specific JSON
upload files
poll async jobs
download returned media
save outputs under tmp/images/ or tmp/videos/
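
As a rough illustration of prompt pass-through, the sketch below builds a request payload that keeps the prompt untouched and includes optional fields only when the caller supplied them. The function name build_payload and the field names in the usage note are hypothetical; the bundled scripts may structure their payloads differently.

```python
from typing import Any, Dict, Optional


def build_payload(prompt: str, **optional: Optional[Any]) -> Dict[str, Any]:
    """Build a provider payload without rewriting the caller's prompt."""
    # The prompt is copied verbatim: no trimming, rephrasing, or style suffixes.
    payload: Dict[str, Any] = {"prompt": prompt}
    for key, value in optional.items():
        # Only forward fields the caller actually provided.
        if value is not None:
            payload[key] = value
    return payload
```

For example, build_payload("a red fox", size="1024x1024", seed=None) yields a payload with only prompt and size, because seed was not provided.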

Delivery rules

Save generated or edited images in tmp/images/.
Save generated videos in tmp/videos/.
Never scatter generated files in the workspace root.
If message delivery blocks remote URLs, download locally first and then send the local file.
If a remote file cannot be fetched locally but the raw link may still help, provide the original link clearly.
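
The local-first delivery rule with a remote fallback might look like the following sketch. The deliver function, its return shape, and the injected fetch callable are all hypothetical; in practice the download step would be handled by scripts/fetch_generated_media.py.

```python
from pathlib import Path
from typing import Callable, Dict, Optional


def deliver(remote_ref: str,
            fetch: Callable[[str], Optional[Path]]) -> Dict[str, Optional[str]]:
    """Prefer the locally downloaded file; keep the remote link as a fallback."""
    try:
        local = fetch(remote_ref)  # hypothetical downloader, returns a local Path
    except OSError:
        local = None  # network or filesystem failure: fall back to the link
    if local is not None:
        return {"send": local.as_posix(), "fallback": remote_ref}
    # Local retrieval failed, so the original link is the only thing to offer.
    return {"send": remote_ref, "fallback": None}
```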

Image generation helper

Use scripts/generate_image.py for direct still-image generation.

Example:

python3 skills/media-generation/scripts/generate_image.py \
  --prompt 'person' \
  --size '1024x1024' \
  --out-dir 'tmp/images' \
  --prefix 'generated'

The helper:

reads provider credentials from OpenClaw config (~/.openclaw/openclaw.json by default, or --config / $OPENCLAW_CONFIG)
calls /images/generations by default
supports size, quality, style, background, n, seed, extra-json, and extra-json-file
downloads the returned image into tmp/images/ by default
handles providers that reply with URL/path, data: URL, or b64_json

Image edit helper

Use scripts/edit_image.py for direct image-edit calls.

Example:

python3 skills/media-generation/scripts/edit_image.py \
  --image 'tmp/images/source.jpg' \
  --prompt 'replace the background' \
  --out-dir 'tmp/images' \
  --prefix 'edited'

The helper:

reads provider credentials from OpenClaw config
calls /images/edits by default
supports optional --mask input for localized edits
downloads the returned image into tmp/images/ by default
handles URL/path, data: URL, or b64_json

Mask inpaint helper

Use scripts/mask_inpaint.py for localized repainting tasks.

Example:

python3 skills/media-generation/scripts/mask_inpaint.py \
  --image 'tmp/images/source.jpg' \
  --x 120 --y 80 --width 220 --height 180 \
  --prompt 'replace the masked area' \
  --out-dir 'tmp/images' \
  --prefix 'mask-result'

The helper:

accepts either an existing --mask image or generated regions
supports rectangle / ellipse regions and repeatable --region specs
supports percentage-based regions like rect-pct / ellipse-pct
supports --expand / --shrink before feathering
supports --mask-only for local preparation / testing without a live API call
forwards --config, --provider, --model, and --endpoint to scripts/edit_image.py
reuses scripts/edit_image.py for the final edit call
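
To make the percentage-based regions and the expand / shrink step more concrete, here is a minimal sketch of the arithmetic involved. The function names and the (x, y, width, height) box layout are assumptions made for illustration; the actual --region syntax and the script's internal rounding are not shown in this document.

```python
def pct_region_to_pixels(image_w, image_h, x_pct, y_pct, w_pct, h_pct):
    """Convert a percentage-based rectangle to a pixel box (x, y, width, height)."""
    return (
        round(image_w * x_pct / 100),
        round(image_h * y_pct / 100),
        round(image_w * w_pct / 100),
        round(image_h * h_pct / 100),
    )


def grow_box(box, pixels, image_w, image_h):
    """Expand (positive pixels) or shrink (negative pixels) a box, clamped to the image."""
    x, y, w, h = box
    x0 = max(0, x - pixels)
    y0 = max(0, y - pixels)
    x1 = min(image_w, x + w + pixels)
    y1 = min(image_h, y + h + pixels)
    # A heavily shrunk box collapses to zero size instead of going negative.
    return (x0, y0, max(0, x1 - x0), max(0, y1 - y0))
```

On a 1000x500 image, a region of 10%, 20%, 50%, 40% maps to the pixel box (100, 100, 500, 200); growing it by 10 pixels gives (90, 90, 520, 220).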

Outpaint helper

Use scripts/outpaint_image.py for extension / canvas expansion tasks.

Example:

python3 skills/media-generation/scripts/outpaint_image.py \
  --image 'tmp/images/source.jpg' \
  --left 512 --right 512 --top 128 --bottom 128 \
  --mode blur \
  --prompt 'extend outward' \
  --out-dir 'tmp/images' \
  --prefix 'outpaint-result'

The helper:

expands the canvas locally before calling the model
supports directional expansion on each side
supports transparent, blur, and solid initialization modes
forwards --config, --provider, --model, and --endpoint to scripts/edit_image.py
reuses scripts/edit_image.py for the final edit call
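
The local canvas expansion comes down to simple arithmetic, sketched below. The function name expanded_canvas is hypothetical, and the 1024x1024 source size in the usage note is an assumption, since the example above does not state the size of source.jpg.

```python
def expanded_canvas(width, height, left=0, right=0, top=0, bottom=0):
    """Return the new canvas size and the offset where the source image is pasted."""
    if min(left, right, top, bottom) < 0:
        raise ValueError("padding values must be non-negative")
    new_size = (width + left + right, height + top + bottom)
    # The original image sits `left` pixels in and `top` pixels down.
    paste_offset = (left, top)
    return new_size, paste_offset
```

With the padding from the example (512 left and right, 128 top and bottom) and an assumed 1024x1024 source, the expanded canvas is 2048x1280 and the source is pasted at offset (512, 128).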

Reference-image helper

Use scripts/generate_consistent_media.py when one or more reference images need to be passed through to the provider.

Note: the script name is historical; its current role is reference-image transport and delegation.

Example:

python3 skills/media-generation/scripts/generate_consistent_media.py \
  --mode image \
  --reference-image 'tmp/images/reference.png' \
  --prompt 'character' \
  --size '1024x1024' \
  --out-dir 'tmp/images' \
  --prefix 'reference-output'

The helper:

can pass encoded reference images in provider JSON (default key: reference_images)
can retry without provider-json references when transport is auto
delegates to scripts/generate_image.py or scripts/generate_video.py

Batch generation helper

Use scripts/generate_batch_media.py when the user wants several related outputs, repeatable batch rendering, or a manifest-driven workflow.

Example:

python3 skills/media-generation/scripts/generate_batch_media.py \
  --manifest 'tmp/images/media-batch.jsonl' \
  --vars-json '{"subject":"item"}' \
  --summary-out 'tmp/images/media-batch-summary.json' \
  --continue-on-error \
  --print-json

The helper supports:

JSON array or JSONL manifests
image generation, video generation, and reference-image generation
shared templating vars via --vars-json or --vars-file
item-local vars objects for per-item string rendering such as {index}
--summary-out to persist the resolved batch result JSON
--dry-run to validate a manifest before spending live generation calls

Object-select edit helper

Use scripts/object_select_edit.py when the source has a transparent background or a simple clean backdrop and the user wants a one-step object or background edit workflow.

Example:

python3 skills/media-generation/scripts/object_select_edit.py \
  --image 'tmp/images/product.png' \
  --selection-mode alpha \
  --edit-target background \
  --prompt 'replace the background' \
  --out-dir 'tmp/images' \
  --prefix 'product-bg-edit'

The helper:

prepares an object/background mask with prepare_object_mask.py
flips the mask automatically when editing the background instead of the object
passes the prepared mask into mask_inpaint.py
supports --prepare-only for local inspection/testing without a live edit call
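
The mask preparation and flip can be illustrated on a plain grid of alpha values. This is a sketch under assumptions: the function names are hypothetical, the value "object" for the edit target is assumed by analogy with "background", and the convention that 255 marks the area to edit is a guess rather than the documented behaviour of prepare_object_mask.py.

```python
def object_mask_from_alpha(alpha_rows, threshold=0):
    """Mark pixels with alpha above the threshold as object (255), the rest as 0."""
    return [[255 if a > threshold else 0 for a in row] for row in alpha_rows]


def mask_for_target(object_mask, edit_target):
    """Return the mask of the area to edit, flipping it for background edits."""
    if edit_target == "object":
        return object_mask
    if edit_target == "background":
        # Invert: everything that is not the object becomes editable.
        return [[255 - v for v in row] for row in object_mask]
    raise ValueError(f"unknown edit target: {edit_target}")
```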

Video generation helper

Use scripts/generate_video.py for direct video-generation calls.

Example:

python3 skills/media-generation/scripts/generate_video.py \
  --prompt 'motion clip' \
  --size '720x1280' \
  --seconds 6 \
  --out-dir 'tmp/videos' \
  --prefix 'generated-video'

The helper:

reads provider credentials from OpenClaw config
calls /videos by default
supports size, seconds / duration, fps, seed, optional input image, extra-json, and extra-json-file
handles both immediate results and async job responses, polling when the provider returns job metadata instead of the final media
downloads the returned video into tmp/videos/ by default
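
Async job handling generally follows a poll-until-done loop like the sketch below. The status values ("succeeded", "failed"), the error field, and the injected get_status callable are hypothetical; real providers use their own job schemas and polling paths, which generate_video.py is meant to absorb.

```python
import time
from typing import Callable


def poll_job(get_status: Callable[[], dict], interval: float = 2.0,
             timeout: float = 300.0,
             sleep: Callable[[float], None] = time.sleep) -> dict:
    """Poll a job until it succeeds, fails, or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        job = get_status()  # hypothetical call that fetches current job metadata
        state = job.get("status")
        if state == "succeeded":
            return job
        if state == "failed":
            raise RuntimeError(f"video job failed: {job.get('error')}")
        if time.monotonic() >= deadline:
            raise TimeoutError("video job did not finish before the timeout")
        sleep(interval)
```

Injecting sleep keeps the loop testable without real waiting, and a hard timeout avoids hanging on jobs that never leave the queued state.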

Retrieval helper

Use scripts/fetch_generated_media.py for both images and videos. It can extract downloadable refs from markdown / HTML / JSON, and can also persist data: URLs or b64_json payloads directly to local files.
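
For orientation, the two kinds of work described above (pulling links out of text and decoding inline payloads) can be sketched with the standard library. The function names and the URL pattern are illustrative assumptions, not the helper's actual implementation.

```python
import base64
import re
from urllib.parse import unquote_to_bytes

# Deliberately simple pattern: stops at whitespace, quotes, and closing brackets.
URL_PATTERN = re.compile(r"https?://[^\s\"'<>)\]]+")


def extract_urls(text: str) -> list:
    """Find http(s) links in markdown, HTML, or JSON text."""
    return URL_PATTERN.findall(text)


def decode_inline_media(payload: str) -> bytes:
    """Decode a data: URL or a bare b64_json string into raw bytes."""
    if payload.startswith("data:"):
        header, _, data = payload.partition(",")
        if header.endswith(";base64"):
            return base64.b64decode(data)
        # Non-base64 data: URLs carry percent-encoded content.
        return unquote_to_bytes(data)
    return base64.b64decode(payload)
```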

Quick compatibility checklist

Before blaming the skill, check these first:

config exists and is valid JSON
config.models.providers.<provider> exists
the selected provider has both baseUrl and apiKey
the chosen endpoint actually exists on that provider
the chosen model name is valid for that endpoint
any provider-specific fields passed through --extra-json or --extra-json-file match that provider's schema
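
The first three checklist items can be automated with a short check like the one below. It is a sketch that assumes the config file's top-level object contains models.providers.<provider> with baseUrl and apiKey keys, as the checklist suggests; the function name check_provider is hypothetical.

```python
import json


def check_provider(config: dict, provider: str) -> list:
    """Return a list of human-readable problems for one provider entry."""
    entry = config.get("models", {}).get("providers", {}).get(provider)
    if entry is None:
        return [f"config.models.providers.{provider} is missing"]
    # Both fields are required before any request can be sent.
    return [f"provider {provider} has no {key}"
            for key in ("baseUrl", "apiKey") if not entry.get(key)]


def check_config_text(text: str, provider: str) -> list:
    """Validate raw config text: JSON syntax first, then the provider entry."""
    try:
        config = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"config is not valid JSON: {exc.msg}"]
    return check_provider(config, provider)
```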

Defaults used by the bundled scripts:

config path: --config or $OPENCLAW_CONFIG if given, otherwise ~/.openclaw/openclaw.json
default provider: $OPENCLAW_MEDIA_PROVIDER, otherwise the first provider found in config
default model names: placeholders unless overridden by env vars or --model
- image → $OPENCLAW_MEDIA_IMAGE_MODEL or image-model
- edit → $OPENCLAW_MEDIA_EDIT_MODEL or image-edit-model
- video → $OPENCLAW_MEDIA_VIDEO_MODEL or video-model
output root: $MEDIA_GENERATION_OUTPUT_ROOT if set, otherwise tmp/
output paths are resolved relative to the current working directory unless you pass an absolute --out-dir
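
The default resolution described above could be sketched as follows. The environment variable names and placeholder model names come from this section; the function names are hypothetical, and the assumption that an explicit --model or --provider wins over the environment variable is not confirmed by the text.

```python
import os

PLACEHOLDER_MODELS = {
    "image": "image-model",
    "edit": "image-edit-model",
    "video": "video-model",
}


def resolve_model(kind, cli_model=None, env=None):
    """Pick the model: explicit --model, then env var, then the placeholder."""
    env = os.environ if env is None else env
    if cli_model:
        return cli_model  # assumption: the CLI flag has the highest priority
    return env.get(f"OPENCLAW_MEDIA_{kind.upper()}_MODEL") or PLACEHOLDER_MODELS[kind]


def resolve_provider(config, cli_provider=None, env=None):
    """Pick the provider: explicit --provider, then env var, then the first in config."""
    env = os.environ if env is None else env
    if cli_provider:
        return cli_provider
    if env.get("OPENCLAW_MEDIA_PROVIDER"):
        return env["OPENCLAW_MEDIA_PROVIDER"]
    providers = config.get("models", {}).get("providers", {})
    return next(iter(providers), None)
```

A result equal to one of the placeholder names is a signal that no real model was configured, which matches the placeholder model warning listed under troubleshooting.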

Quick troubleshooting

Common failure patterns:

provider not found → pass --provider explicitly or set $OPENCLAW_MEDIA_PROVIDER
placeholder model warning (image-model / image-edit-model / video-model) → pass --model explicitly or set the matching $OPENCLAW_MEDIA_*_MODEL env var
config not found / invalid JSON → pass --config explicitly or fix the OpenClaw config file
HTTP 404 → check --endpoint and video polling paths
HTTP 400 → check model name and provider-specific payload fields in --extra-json / --extra-json-file
HTTP 401/403 → check the provider apiKey
request failed before HTTP response → check base URL, proxy/TLS, or network reachability
video accepted then failed later → check request payload, provider logs, or switch provider/model

Use --print-json when debugging so the response body, resolved endpoint, and failure hints stay visible.

References

Batch workflow reference: references/batch-workflows.md
Model capability matrix: references/model-capabilities.md
Reference-image workflow: references/reference-image-workflow.md
Image generation helper: scripts/generate_image.py
Reference-image helper: scripts/generate_consistent_media.py
Image edit helper: scripts/edit_image.py
Mask inpaint helper: scripts/mask_inpaint.py
Outpaint helper: scripts/outpaint_image.py
Video generation helper: scripts/generate_video.py
Batch generation helper: scripts/generate_batch_media.py
Object-select edit helper: scripts/object_select_edit.py
Object mask prep helper: scripts/prepare_object_mask.py
Shared request utility: scripts/media_request_common.py
Smoke tests: scripts/smoke_test.py
Unified fetch helper: scripts/fetch_generated_media.py
