
ai-infrastructure-replicate

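A Skill for implementing the building blocks of AI infrastructure with the Replicate SDK in TypeScript/Node.js: running models, streaming, webhook setup, file management, versioning, deployments, and training.

📜 Original English description (for reference)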

Replicate SDK patterns for TypeScript/Node.js -- client setup, predictions, streaming, webhooks, file handling, model versioning, deployments, and training

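🇯🇵 Notes for Japanese creators

In a nutshell

Note: This commentary was added by the jpskill.com editorial team for Japanese business settings. It is reference information independent of the Skill's own behavior.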

⚡ Recommended: one-line install (60 seconds)

Copy the command below and paste it into a terminal (Mac/Linux) or PowerShell (Windows). Download, extraction, and placement are fully automatic.

🍎 Mac / 🐧 Linux

mkdir -p ~/.claude/skills && cd ~/.claude/skills && curl -L -o ai-infrastructure-replicate.zip https://jpskill.com/download/10209.zip && unzip -o ai-infrastructure-replicate.zip && rm ai-infrastructure-replicate.zip

🪟 Windows (PowerShell)

$d = "$env:USERPROFILE\.claude\skills"; ni -Force -ItemType Directory $d | Out-Null; iwr https://jpskill.com/download/10209.zip -OutFile "$d\ai-infrastructure-replicate.zip"; Expand-Archive "$d\ai-infrastructure-replicate.zip" -DestinationPath $d -Force; ri "$d\ai-infrastructure-replicate.zip"

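When it finishes, restart Claude Code. After that, a plain-language request in this area is enough to activate the Skill automatically (the page's generic example is "make me a video prompt").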

💾 Manual download (for those who find the command difficult)

1. Use the page's download button to get ai-infrastructure-replicate.zip
2. Double-click the ZIP file to extract it; this creates an ai-infrastructure-replicate folder
3. Move that folder to C:\Users\<your name>\.claude\skills\ (Windows) or ~/.claude/skills/ (Mac)
4. Restart Claude Code


⚠️ Download and use at your own risk. This site accepts no responsibility for the content, behavior, or safety of the Skill.

🎯 What this Skill can do

The description below explains what this Skill does for you. When you ask Claude for something in this area, the Skill activates automatically.

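📦 Installation (3 steps)

1. Use the Download button to get the .skill file
2. Change the file extension from .skill to .zip and extract it (macOS can extract it automatically)
3. Place the extracted folder in .claude/skills/ under your home folder
- macOS / Linux: ~/.claude/skills/
- Windows: %USERPROFILE%\.claude\skills\

Restart Claude Code and you are done. You do not need to say "use this Skill"; it is invoked automatically for related requests.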


Last updated: 2026-05-18
Retrieved: 2026-05-18
Bundled files: 1


Replicate SDK Patterns

Quick Guide: Use the replicate npm package to run open-source ML models on serverless GPUs. Use replicate.run() for synchronous execution that returns output directly, replicate.stream() for SSE-based streaming, or replicate.predictions.create() for async background jobs with webhook notifications. Models are referenced as owner/model (uses latest version) or owner/model:version (pinned). File outputs are FileOutput objects implementing ReadableStream. Cold starts are expected for infrequently-used models -- use deployments with min_instances to keep models warm.

<critical_requirements>

CRITICAL: Before Using This Skill

All code must follow project conventions in CLAUDE.md (kebab-case, named exports, import ordering, import type, named constants)

(You MUST never hardcode API tokens -- always use environment variables via process.env.REPLICATE_API_TOKEN)

(You MUST handle FileOutput objects for models that return files -- do not assume outputs are plain strings or URLs)

(You MUST validate webhooks using validateWebhook() from the replicate package -- never trust unverified webhook payloads)

(You MUST account for cold starts when running infrequently-used models -- use deployments for latency-sensitive applications)

(You MUST specify model versions (owner/model:version) in production to ensure reproducible results -- unversioned references use the latest, which can change)

</critical_requirements>

Auto-detection: Replicate, replicate, replicate.run, replicate.stream, replicate.predictions, replicate.deployments, replicate.trainings, replicate.models, FileOutput, validateWebhook, REPLICATE_API_TOKEN, serverless GPU, cold start, webhook_events_filter

When to use:

Running open-source ML models (Llama, Stable Diffusion, Whisper, etc.) without managing GPU infrastructure
Generating images, transcribing audio, running LLMs, or any ML inference via API
Streaming LLM output in real-time with server-sent events
Processing predictions asynchronously with webhook notifications
Fine-tuning models with custom training data
Running models on dedicated hardware with custom scaling via deployments

Key patterns covered:

Client initialization and configuration (auth, user agent, file encoding)
Running predictions (replicate.run(), replicate.predictions.create(), replicate.wait())
Streaming output (replicate.stream() with SSE events)
Model versioning (owner/model vs owner/model:version)
File input/output handling (FileOutput, file uploads, Buffer inputs)
Webhooks (setup, event filtering, signature validation)
Deployments (custom hardware, scaling, keeping models warm)
Training / fine-tuning

When NOT to use:

You need a unified multi-provider LLM SDK (OpenAI, Anthropic, Google) -- use a provider-agnostic SDK
You want to run models locally -- Replicate is a cloud-only serverless platform
You need sub-second latency guarantees without deployments -- cold starts can take minutes

Examples Index

Core: Setup, Predictions & Files -- Client init, run(), predictions.create(), wait(), file I/O, error handling
Streaming & Webhooks -- stream(), SSE events, webhook setup, signature validation
Deployments & Training -- Custom hardware, scaling, fine-tuning, model management
Quick API Reference -- Method signatures, constructor options, error types, model reference format

<philosophy>

Philosophy

Replicate provides serverless GPU infrastructure for running open-source ML models. You send inputs, Replicate allocates GPU hardware, runs the model, and returns outputs. No Docker, no CUDA drivers, no GPU provisioning.

Core principles:

Serverless execution -- Models run on-demand on Replicate's infrastructure. You pay only for compute time. Cold starts are a trade-off for not maintaining always-on GPUs.
Model marketplace -- Thousands of community and official models available at replicate.com/explore. Run any public model with just its identifier.
Version pinning for reproducibility -- Models are versioned with SHA-256 hashes. Pin to a version in production (owner/model:abc123...) to guarantee identical behavior across deploys.
Three execution modes -- replicate.run() for synchronous wait, replicate.stream() for real-time SSE output, replicate.predictions.create() for fire-and-forget with webhooks.
File-first I/O -- Many models accept and produce files (images, audio, video). The SDK handles file uploads automatically and returns FileOutput objects for file outputs.

</philosophy>

<patterns>

Core Patterns

Pattern 1: Client Setup

Initialize the Replicate client. It auto-reads REPLICATE_API_TOKEN from the environment.

// lib/replicate.ts -- basic setup
import Replicate from "replicate";

const replicate = new Replicate();

export { replicate };

// lib/replicate.ts -- explicit auth + custom user agent
import Replicate from "replicate";

const replicate = new Replicate({
  auth: process.env.REPLICATE_API_TOKEN, // Auto-reads from env if omitted
  userAgent: "my-app/1.0.0",
});

export { replicate };

Why good: Minimal setup, env var auto-detected, explicit auth optional but useful for clarity

// BAD: Hardcoded token
const replicate = new Replicate({
  auth: "r8_abc123...",
});

Why bad: Hardcoded API token is a security risk, will leak in version control

See: examples/core.md for full constructor options, error handling patterns
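One way to make the no-hardcoded-token rule fail fast is to resolve the token once at startup. The sketch below is illustrative only: `resolveReplicateToken` and the `Env` type are hypothetical names introduced here, not part of the replicate package. The environment is passed in as an argument so the helper can be tested without touching `process.env`.

```typescript
// Hypothetical helper: resolve the API token from an environment-like object.
type Env = Record<string, string | undefined>;

const TOKEN_ENV_VAR = "REPLICATE_API_TOKEN";

function resolveReplicateToken(env: Env): string {
  // Treat a missing or whitespace-only value as "not configured".
  const token = env[TOKEN_ENV_VAR]?.trim();
  if (!token) {
    throw new Error(`${TOKEN_ENV_VAR} is not set; export it before starting the app.`);
  }
  return token;
}
```

In a Node.js app this could be wired in as `new Replicate({ auth: resolveReplicateToken(process.env) })`, so a missing token surfaces at startup instead of on the first API call.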

Pattern 2: Running Predictions

Use replicate.run() for synchronous execution. Returns the model output directly.

// Run an image generation model
import type { FileOutput } from "replicate";

// run() is typed as returning a generic object, so narrow it before destructuring
const [output] = (await replicate.run("black-forest-labs/flux-schnell", {
  input: {
    prompt: "a serene mountain landscape at sunset",
  },
})) as FileOutput[];

// output is a FileOutput object for image models
console.log(output.url()); // URL of generated image

// Run an LLM -- many language models return an array of string chunks
const output = (await replicate.run("meta/meta-llama-3-70b-instruct", {
  input: {
    prompt: "Explain TypeScript generics in 3 sentences.",
    max_tokens: 512,
  },
})) as string[];

console.log(output.join("")); // Text response

Why good: Simple API, returns output directly, destructuring works for array outputs (images)

// BAD: Not pinning version in production
const output = await replicate.run("community-user/experimental-model", {
  input: { prompt: "hello" },
});

Why bad: Community models without version pinning can change behavior unexpectedly when authors push updates

See: examples/core.md for version pinning, predictions.create() + wait(), and progress callbacks
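Because the shape of a text model's output depends on the model, a small normalizer can keep calling code simple. This is a minimal sketch under one assumption: the output is either a single string or an array of string chunks. `outputToText` is a hypothetical helper, not an SDK function.

```typescript
// Hypothetical helper: turn a text model's output into one string.
// Assumption: the output is a string or an array of string chunks.
function outputToText(output: unknown): string {
  if (typeof output === "string") {
    return output;
  }
  if (Array.isArray(output) && output.every((chunk) => typeof chunk === "string")) {
    return output.join("");
  }
  // Anything else (for example a file output) is not text; fail loudly.
  throw new TypeError("Expected a string or an array of strings from the model");
}
```

Throwing on unexpected shapes is a deliberate choice here: it surfaces a changed output schema immediately, which matters most for unpinned models.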

Pattern 3: Streaming

Use replicate.stream() for real-time SSE output from language models.

const stream = replicate.stream("meta/meta-llama-3-70b-instruct", {
  input: {
    prompt: "Write a short poem about TypeScript.",
    max_tokens: 512,
  },
});

for await (const event of stream) {
  if (event.event === "output") {
    process.stdout.write(event.data);
  }
}

Why good: Progressive output for better UX, event-based with typed event and data fields

// BAD: Using replicate.run() for user-facing LLM output
const output = await replicate.run("meta/meta-llama-3-70b-instruct", {
  input: { prompt: "Write a long essay..." },
});
// User waits for entire generation to complete before seeing anything

Why bad: No progressive feedback -- the user sees nothing until the entire generation finishes, which can take many seconds for long outputs

See: examples/streaming-webhooks.md for event types, error handling, cancellation
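The loop above only handles "output" events. A slightly fuller consumer might also react to "error" and "done", the other event names this document lists. The sketch below uses a structural `SseEvent` interface and a fake generator as stand-ins, both hypothetical, so it runs without the SDK; a real stream would come from `replicate.stream(...)`.

```typescript
// Structural stand-in for the SDK's server-sent event objects (assumed shape).
interface SseEvent {
  event: string; // "output", "error", or "done"
  data: string;
}

// Collect "output" chunks into one string; stop on "done", throw on "error".
async function collectStreamText(stream: AsyncIterable<SseEvent>): Promise<string> {
  let text = "";
  for await (const { event, data } of stream) {
    if (event === "output") {
      text += data;
    } else if (event === "error") {
      throw new Error(`Stream failed: ${data}`);
    } else if (event === "done") {
      break;
    }
  }
  return text;
}

// Fake stream for local testing only.
async function* fakeStream(): AsyncGenerator<SseEvent> {
  yield { event: "output", data: "Hello" };
  yield { event: "output", data: ", world" };
  yield { event: "done", data: "" };
}
```

In a user-facing handler each chunk would normally be forwarded as it arrives; accumulating into a string is used here only to keep the example checkable.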

Pattern 4: Model Versioning

Models are referenced as owner/model (latest version) or owner/model:sha256hash (pinned version).

// Development: use latest version for convenience
const output = await replicate.run("stability-ai/sdxl", {
  input: { prompt: "a cat" },
});

// Production: pin to a specific version for reproducibility
const VERSION_HASH =
  "39ed52f2a78e934b3ba6e2a89f5b1c712de7dfea535525255b1aa35c5565e08b";
const output = await replicate.run(`stability-ai/sdxl:${VERSION_HASH}`, {
  input: { prompt: "a cat" },
});

Why good: Pinned version guarantees identical behavior, hash is immutable

See: examples/core.md for listing model versions, getting version details
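To enforce pinning in production code paths, the reference format can be checked before a request is made. This is a sketch under the assumption that references follow exactly the two forms described above; `parseModelRef` and `assertPinned` are hypothetical helpers, not SDK functions.

```typescript
interface ModelRef {
  owner: string;
  name: string;
  version?: string; // present only for pinned references
}

// Parse "owner/model" or "owner/model:version" into its parts.
function parseModelRef(ref: string): ModelRef {
  const match = /^([^/:\s]+)\/([^/:\s]+)(?::([^/:\s]+))?$/.exec(ref);
  if (!match) {
    throw new Error(`Invalid model reference: ${ref}`);
  }
  const [, owner, name, version] = match;
  return version ? { owner, name, version } : { owner, name };
}

// Guard for production code paths: reject references that are not pinned.
function assertPinned(ref: string): string {
  if (!parseModelRef(ref).version) {
    throw new Error(`Model reference "${ref}" is not pinned to a version`);
  }
  return ref;
}
```

A call such as `replicate.run(assertPinned(ref), { input })` would then refuse unpinned references before any request is sent.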

Pattern 5: File Handling

Models that output files return FileOutput objects implementing ReadableStream.

import { writeFile } from "node:fs/promises";
import type { FileOutput } from "replicate";

// run() is typed as returning a generic object, so narrow it before destructuring
const [output] = (await replicate.run("black-forest-labs/flux-schnell", {
  input: { prompt: "a sunset over mountains" },
})) as FileOutput[];

// FileOutput has .url() and .blob() methods
console.log(output.url()); // Underlying URL

// Save to disk
const blob = await output.blob();
const buffer = Buffer.from(await blob.arrayBuffer());
await writeFile("./output.png", buffer);

// File inputs: pass URLs, Buffers, or ReadStreams
import { readFile } from "node:fs/promises";

const imageBuffer = await readFile("./input.png");

const output = await replicate.run("some-user/image-model", {
  input: {
    image: imageBuffer, // Auto-uploaded (max 100 MiB)
  },
});

Why good: FileOutput is a ReadableStream, works with Node.js stream APIs, .url() for the underlying URL

// BAD: Treating file output as a plain URL string
const [output] = await replicate.run("black-forest-labs/flux-schnell", {
  input: { prompt: "hello" },
});
const url = output; // WRONG: output is a FileOutput object, not a string

Why bad: FileOutput is an object, not a string -- use .url() to get the URL

See: examples/core.md for file uploads, large file handling, encoding strategies
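When a model's output type is not known ahead of time, a runtime check can separate file outputs from plain values. This is a minimal sketch: `FileOutputLike` is a structural stand-in that models only the `.url()` and `.blob()` methods mentioned above, and both helper names are hypothetical.

```typescript
// Structural stand-in for the SDK's FileOutput (only the two methods named above).
interface FileOutputLike {
  url(): URL;
  blob(): Promise<Blob>;
}

// Type guard: true when a value exposes url() and blob() methods.
function isFileOutputLike(value: unknown): value is FileOutputLike {
  if (typeof value !== "object" || value === null) return false;
  const candidate = value as Record<string, unknown>;
  return typeof candidate.url === "function" && typeof candidate.blob === "function";
}

// Read every file output in a run() result into memory as bytes.
async function readFileOutputs(output: unknown): Promise<Uint8Array[]> {
  // Accept both a single output and an array of outputs.
  const items: unknown[] = Array.isArray(output) ? output : [output];
  const files = items.filter(isFileOutputLike);
  return Promise.all(
    files.map(async (file) => new Uint8Array(await (await file.blob()).arrayBuffer())),
  );
}
```

Reading outputs into memory right away also sidesteps temporary output URLs, at the cost of holding every file in RAM at once.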

Pattern 6: Async Predictions with Webhooks

Use replicate.predictions.create() for background jobs with webhook notifications.

const prediction = await replicate.predictions.create({
  model: "owner/model", // OR version: "sha256hash" for pinned version
  input: { prompt: "a painting of a cat" },
  webhook: "https://my.app/webhooks/replicate",
  webhook_events_filter: ["completed"],
});

console.log(prediction.id); // Use to track status
console.log(prediction.status); // "starting"

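// Webhook signature validation (CRITICAL for security)
import { validateWebhook } from "replicate";

async function handleWebhook(request: Request): Promise<Response> {
  const secret = process.env.REPLICATE_WEBHOOK_SIGNING_SECRET;
  if (!secret) {
    // Fail closed: without the signing secret the payload cannot be verified
    return new Response("Webhook secret not configured", { status: 500 });
  }

  // Validate a clone so the original request body can still be read afterwards
  const isValid = await validateWebhook(request.clone(), secret);

  if (!isValid) {
    return new Response("Invalid signature", { status: 401 });
  }

  const prediction = await request.json();
  // Process prediction.output safely
  return new Response("OK", { status: 200 });
}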

Why good: Decoupled processing, secure signature validation, filtered events reduce noise

See: examples/streaming-webhooks.md for webhook event types, polling alternative
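After the signature check passes, the handler still has to decide what to do with the payload. The sketch below shows one possible shape for that step. The `PredictionPayload` fields are an assumed subset of the prediction object, the status strings other than "starting" do not appear in this document, and the duplicate check assumes a webhook may be delivered more than once. All names are hypothetical.

```typescript
// Assumed subset of the prediction payload a webhook delivers.
interface PredictionPayload {
  id: string;
  status: string;
  output?: unknown;
  error?: unknown;
}

type WebhookResult = "stored" | "failed" | "ignored" | "duplicate";

// In-memory record of handled prediction ids; a real app would use a database.
const handledIds = new Set<string>();

function isPredictionPayload(value: unknown): value is PredictionPayload {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return typeof v.id === "string" && typeof v.status === "string";
}

// Decide what to do with an already-validated webhook payload.
function handlePredictionPayload(payload: unknown): WebhookResult {
  if (!isPredictionPayload(payload)) {
    throw new Error("Unexpected webhook payload shape");
  }
  if (handledIds.has(payload.id)) return "duplicate"; // already processed
  if (payload.status === "succeeded") {
    handledIds.add(payload.id);
    return "stored"; // persist payload.output here
  }
  if (payload.status === "failed" || payload.status === "canceled") {
    handledIds.add(payload.id);
    return "failed";
  }
  return "ignored"; // intermediate status such as "starting" or "processing"
}
```

Keeping this step a pure function of the payload makes it easy to unit test separately from signature validation.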

Pattern 7: Deployments

Deployments give you a private, fixed endpoint with custom hardware and scaling.

// Create a prediction on a deployment (no cold start if min_instances > 0)
const prediction = await replicate.deployments.predictions.create(
  "my-org/my-deployment",
  {
    input: { prompt: "hello world" },
  },
);

const result = await replicate.wait(prediction);
console.log(result.output);

Why good: Predictable latency with min_instances, private endpoint, custom hardware selection

See: examples/deployments-training.md for creating/managing deployments, training API
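The `replicate.wait(prediction)` call above polls until the prediction finishes. The sketch below is a conceptual illustration of such a polling loop, not the SDK's implementation: the status names, the `fetchLatest` callback, the interval, and the attempt cap are all assumptions made for this example.

```typescript
type Status = "starting" | "processing" | "succeeded" | "failed" | "canceled";

interface PredictionLike {
  id: string;
  status: Status;
  output?: unknown;
}

const TERMINAL: ReadonlySet<Status> = new Set<Status>(["succeeded", "failed", "canceled"]);

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Poll fetchLatest until the prediction reaches a terminal status or attempts run out.
async function pollUntilDone(
  initial: PredictionLike,
  fetchLatest: (id: string) => Promise<PredictionLike>,
  { intervalMs = 500, maxAttempts = 120 } = {},
): Promise<PredictionLike> {
  let current = initial;
  for (let attempt = 0; attempt < maxAttempts && !TERMINAL.has(current.status); attempt++) {
    await sleep(intervalMs);
    current = await fetchLatest(current.id);
  }
  if (!TERMINAL.has(current.status)) {
    throw new Error(`Prediction ${current.id} did not finish after ${maxAttempts} polls`);
  }
  return current;
}
```

The attempt cap bounds how long a stuck prediction can hold a request open. Each poll is one API call, which is the overhead webhooks avoid.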

Pattern 8: Error Handling

Catch API errors and branch on the HTTP status code. The SDK retries automatically with exponential backoff (5 retries by default): GET requests on 429 and 5xx responses, other requests only on 429.

try {
  const output = await replicate.run("owner/model", {
    input: { prompt: "hello" },
  });
} catch (error) {
  if (error instanceof Error) {
    console.error(`Replicate error: ${error.message}`);

    // API errors carry the HTTP response; read the status code from it
    if ("response" in error) {
      const status = (error as { response: Response }).response.status;
      if (status === 401) {
        throw new Error("Invalid API token. Check REPLICATE_API_TOKEN.");
      }
      if (status === 422) {
        console.error("Invalid input parameters");
      }
      if (status === 429) {
        console.error(
          "Rate limited -- SDK auto-retries (5 attempts) exhausted",
        );
      }
    }
  }
  throw error;
}

Why good: Checks error type, handles specific status codes, re-throws unexpected errors

See: examples/core.md for full error handling example with status code handling
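The status handling above can be factored into small pure functions so it is reusable and testable. This is a sketch: the action names are hypothetical, and `extractStatus` checks both `error.status` and `error.response.status` because the exact error shape is treated as an assumption here.

```typescript
type ErrorAction = "fix-credentials" | "fix-input" | "back-off" | "retry-later" | "unknown";

// Map an HTTP status code from a failed API call to a coarse next step.
function classifyStatus(status: number): ErrorAction {
  if (status === 401) return "fix-credentials"; // invalid or missing API token
  if (status === 422) return "fix-input"; // input failed validation
  if (status === 429) return "back-off"; // rate limited
  if (status >= 500 && status <= 599) return "retry-later"; // server-side failure
  return "unknown";
}

// Pull a status code out of an unknown error value, if one is present.
function extractStatus(error: unknown): number | undefined {
  if (typeof error !== "object" || error === null) return undefined;
  const e = error as { status?: unknown; response?: { status?: unknown } | null };
  if (typeof e.status === "number") return e.status;
  const nested = e.response?.status;
  return typeof nested === "number" ? nested : undefined;
}
```

A catch block can then call `extractStatus(error)` and pass the result to `classifyStatus`, re-throwing when the status is undefined or the action is "unknown".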

</patterns>

<performance>

Performance Optimization

Cold Start Mitigation

Frequently used, varying load       -> Use deployments with min_instances >= 1
One-off batch jobs                  -> Use predictions.create() with webhooks (no waiting)
Popular public models               -> Usually warm, replicate.run() is fine
Custom/niche models                 -> Expect 30s-5min cold start on first run

Key Optimization Patterns

Use deployments for latency-sensitive applications -- set min_instances: 1 to eliminate cold starts
Use webhooks instead of polling for async jobs -- reduces API calls and latency
Batch file inputs as URLs instead of uploading buffers -- avoids 100 MiB upload limit and is faster
Pin model versions in production -- avoids unexpected behavior changes and enables caching
Use replicate.stream() for LLMs -- progressive output feels faster than waiting for full completion
Cancel unneeded predictions with replicate.predictions.cancel() -- stops billing immediately
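The URL-versus-upload guidance above can be expressed as a small pre-flight check. This is a sketch only: `planFileInput` is a hypothetical helper, and the limit constant simply encodes the 100 MiB figure stated in this document.

```typescript
const MAX_UPLOAD_BYTES = 100 * 1024 * 1024; // 100 MiB upload limit noted above

type FileInputPlan = "url" | "upload";

// Prefer a hosted URL when one exists; otherwise upload if the file fits the limit.
function planFileInput(sizeBytes: number, hostedUrl?: string): FileInputPlan {
  if (hostedUrl) {
    return "url";
  }
  if (sizeBytes > MAX_UPLOAD_BYTES) {
    throw new Error(
      `File is ${sizeBytes} bytes, over the ${MAX_UPLOAD_BYTES}-byte upload limit; host it at a URL instead`,
    );
  }
  return "upload";
}
```

Running this check before calling the API turns an oversized upload into a clear local error rather than a failed request.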

</performance>

<decision_framework>

Decision Framework

Which Execution Method to Use

Is this a user-facing LLM response?
+-- YES -> Use replicate.stream() for real-time SSE output
+-- NO -> Do you need the result immediately?
    +-- YES -> Use replicate.run() (blocks until complete)
    +-- NO -> Use replicate.predictions.create() + webhook
        +-- Need to poll instead? -> Use replicate.wait(prediction)
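The tree above translates directly into a small function, which can be handy in code review checklists or tests. The `Workload` fields and the returned labels are hypothetical names chosen for this sketch. `canReceiveWebhooks` stands in for the tree's "Need to poll instead?" branch.

```typescript
type ExecutionMethod =
  | "stream"
  | "run"
  | "predictions.create + webhook"
  | "predictions.create + wait";

interface Workload {
  userFacingLlm: boolean;
  needsResultImmediately: boolean;
  canReceiveWebhooks: boolean; // false means the app has to poll instead
}

// Walk the decision tree from top to bottom.
function chooseExecutionMethod(w: Workload): ExecutionMethod {
  if (w.userFacingLlm) return "stream";
  if (w.needsResultImmediately) return "run";
  return w.canReceiveWebhooks ? "predictions.create + webhook" : "predictions.create + wait";
}
```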

Model Reference Format

Are you in development/prototyping?
+-- YES -> Use owner/model (latest version, convenient)
+-- NO -> Are you in production?
    +-- YES -> Use owner/model:version_hash (pinned, reproducible)
    +-- NO -> Does the model change frequently?
        +-- YES -> Pin version, test updates explicitly
        +-- NO -> Either format works, prefer pinned

Deployments vs Direct API

Do you need consistent low latency?
+-- YES -> Create a deployment with min_instances >= 1
+-- NO -> Do you need custom hardware (A100, H100)?
    +-- YES -> Create a deployment with specific hardware
    +-- NO -> Use replicate.run() / replicate.stream() directly
        (Replicate auto-allocates hardware)

When to Use This SDK vs Other AI SDKs

Are you running open-source models on serverless GPUs?
+-- YES -> Use Replicate SDK
+-- NO -> Are you calling proprietary APIs (OpenAI, Anthropic)?
    +-- YES -> Not this skill's scope -- use provider-specific SDKs
    +-- NO -> Do you need to switch between multiple providers?
        +-- YES -> Not this skill's scope -- use a unified provider SDK
        +-- NO -> Do you want to self-host models?
            +-- YES -> Not this skill's scope -- consider Cog or vLLM
            +-- NO -> Replicate SDK is appropriate

</decision_framework>

<red_flags>

RED FLAGS

High Priority Issues:

Hardcoding REPLICATE_API_TOKEN in source code (security breach risk)
Treating FileOutput as a string (it is a ReadableStream object -- use .url() or .blob())
Not validating webhook signatures with validateWebhook() (allows forged webhook payloads)
Using replicate.run() for long-running models in request handlers (blocks the response, can timeout)

Medium Priority Issues:

Not pinning model versions in production (owner/model uses latest, which can change without notice)
Relying solely on default retry behavior for production (5 retries with exponential backoff may be too aggressive for some use cases)
Uploading large files as Buffer instead of hosting them at a URL (100 MiB limit on uploads)
Ignoring cold start latency for infrequently-used models (first request can take minutes)

Common Mistakes:

Confusing replicate.run() (returns output directly) with replicate.predictions.create() (returns a prediction object with status/id)
Not destructuring image output: const output = await replicate.run(...) instead of const [output] = await replicate.run(...) (image models typically return arrays)
Using replicate.stream() with models that do not support streaming (only language models with SSE support)
Forgetting that replicate.predictions.create() accepts either a version hash or a model string (owner/model) -- use version for pinned reproducibility, model for latest-version convenience
Not consuming the async iterator from replicate.stream() (events are lost)

Gotchas & Edge Cases:

Prediction inputs and outputs are automatically deleted after one hour -- persist outputs via webhooks or download immediately
The SDK auto-retries on 429 (rate limit) and 5xx errors -- 5 retries by default with exponential backoff. GET requests retry on 429 and 5xx; non-GET requests retry only on 429
replicate.stream() returns ServerSentEvent objects with .event ("output", "error", "done") and .data (string) properties
File uploads are limited to 100 MiB -- for larger files, host them at a URL and pass the URL as input
Browser usage is not supported -- the SDK requires a server-side environment (Node.js 18+, Bun, Deno, Cloudflare Workers)
webhook_events_filter accepts ["start", "output", "logs", "completed"] -- use ["completed"] unless you need intermediate status updates
The Prefer: wait header enables sync mode on the HTTP API (up to 60s), but replicate.run() already handles this automatically
Community models may disappear or change without warning -- pin versions and maintain fallbacks for critical workflows
replicate.wait() polls the API until the prediction completes -- use webhooks for production to avoid polling overhead
FileOutput.url() returns the underlying URL, but these URLs are temporary -- download or persist the file before it expires
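The retry rules in the list above (5 retries, GET on 429 and 5xx, other methods on 429 only) can be written down as a predicate. This is a sketch for reasoning about the behavior, not the SDK's code. The 1000 ms base delay in `backoffDelayMs` is an assumption for illustration, since this document does not state the actual delays.

```typescript
const MAX_RETRIES = 5; // default retry count stated above

// GET requests retry on 429 and 5xx; other methods retry only on 429.
// attempt is the number of retries already made (0 for the first failure).
function shouldRetry(method: string, status: number, attempt: number): boolean {
  if (attempt >= MAX_RETRIES) return false;
  if (status === 429) return true;
  const isServerError = status >= 500 && status <= 599;
  return method.toUpperCase() === "GET" && isServerError;
}

// Exponential backoff: the delay doubles with each attempt.
// The base delay is an assumed value, not one taken from the SDK.
function backoffDelayMs(attempt: number, baseMs = 1000): number {
  return baseMs * 2 ** attempt;
}
```

One consequence worth noting: a POST that fails with a 5xx status is not retried under these rules, so calling code has to decide for itself whether to retry a failed create.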

</red_flags>

<critical_reminders>

CRITICAL REMINDERS