ai-provider-anthropic-sdk

Official Anthropic SDK patterns for TypeScript/Node.js: client setup, Messages API, streaming, tool use, vision, extended thinking, structured outputs, prompt caching, batch API, and production best practices.

⚡ Recommended: install with a single command (60 seconds)

Copy the command below and paste it into a terminal (Mac/Linux) or PowerShell (Windows). It downloads, extracts, and installs the Skill automatically.

🍎 Mac / 🐧 Linux

mkdir -p ~/.claude/skills && cd ~/.claude/skills && curl -L -o ai-provider-anthropic-sdk.zip https://jpskill.com/download/10214.zip && unzip -o ai-provider-anthropic-sdk.zip && rm ai-provider-anthropic-sdk.zip

🪟 Windows (PowerShell)

$d = "$env:USERPROFILE\.claude\skills"; ni -Force -ItemType Directory $d | Out-Null; iwr https://jpskill.com/download/10214.zip -OutFile "$d\ai-provider-anthropic-sdk.zip"; Expand-Archive "$d\ai-provider-anthropic-sdk.zip" -DestinationPath $d -Force; ri "$d\ai-provider-anthropic-sdk.zip"

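When it finishes, restart Claude Code. The Skill then triggers automatically when you make a related request in plain language.

💾 Manual download (if the command line is unfamiliar)

1. Click the blue button below to download ai-provider-anthropic-sdk.zip
2. Double-click the ZIP file to extract it; this creates an ai-provider-anthropic-sdk folder
3. Move that folder to C:\Users\<your name>\.claude\skills\ (Windows) or ~/.claude/skills/ (Mac)
4. Restart Claude Code

⬇ Download .zip (recommended) ⬇ .skill format (advanced users) Original source ↗

⚠️ Download and use at your own risk. This site accepts no responsibility for the content, behavior, or safety of the Skill.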

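🎯 What this Skill does

The description below explains what this Skill does for you. It triggers automatically when you ask Claude for help in this area.

📦 Installation (3 steps)

1. Click the "Download" button above to get the .skill file
2. Change the file extension from .skill to .zip and extract it (macOS can extract it automatically)
3. Place the extracted folder in .claude/skills/ under your home folder
- macOS / Linux: ~/.claude/skills/
- Windows: %USERPROFILE%\.claude\skills\

Restart Claude Code to finish. You do not need to say "use this Skill"; it is invoked automatically for related requests.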

Last updated: 2026-05-18
Retrieved: 2026-05-18
Bundled files: 1

📜 Original SKILL.md (the text Claude reads)

Anthropic SDK Patterns

Quick Guide: Use the official @anthropic-ai/sdk package to interact with Claude models directly. Use client.messages.create() for single-turn and multi-turn conversations. Use client.messages.stream() for streaming with event-based consumption. max_tokens is always required. Content blocks are typed unions (text, tool_use, thinking). Use client.messages.parse() with zodOutputFormat() for structured outputs. Tool use requires a tool-result loop -- Claude returns tool_use blocks, you execute the tool and send back tool_result blocks. Extended thinking adds thinking content blocks before the response.

<critical_requirements>

CRITICAL: Before Using This Skill

All code must follow project conventions in CLAUDE.md (kebab-case, named exports, import ordering, import type, named constants)

(You MUST always provide max_tokens in every messages.create() / messages.stream() call -- it is required and has no default)

(You MUST handle the stop_reason field to detect end_turn, max_tokens, tool_use, and stop_sequence -- ignoring it causes silent truncation or broken tool loops)

(You MUST iterate over response.content blocks (not assume a single text block) -- responses can contain text, tool_use, and thinking blocks mixed together)

(You MUST handle errors using Anthropic.APIError and its subclasses -- never use bare catch blocks without error type checking)

(You MUST never hardcode API keys -- always use environment variables via process.env.ANTHROPIC_API_KEY)

</critical_requirements>
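
As a rough illustration of the stop_reason requirement above, the following sketch maps each value to a next action. `nextStepFor` and the `NextStep` labels are hypothetical names invented for this example, not part of the SDK, and only the four stop reasons named above are handled explicitly.

```typescript
// Hypothetical labels for what the caller should do next (not SDK types).
type NextStep = "done" | "run_tools" | "truncated" | "unknown";

// Map a response's stop_reason to the next action.
// Only the four values named in this document are handled explicitly.
function nextStepFor(stopReason: string | null): NextStep {
  switch (stopReason) {
    case "end_turn":
    case "stop_sequence":
      return "done"; // Claude finished normally
    case "tool_use":
      return "run_tools"; // execute the tools, then send tool_result blocks
    case "max_tokens":
      return "truncated"; // output was cut off; do not treat it as complete
    default:
      return "unknown"; // null or a value this sketch does not know about
  }
}
```

A caller would typically pass `response.stop_reason` and branch on the result before reading any text from the response.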

Auto-detection: Anthropic, @anthropic-ai/sdk, client.messages.create, client.messages.stream, client.messages.parse, client.messages.countTokens, client.messages.batches, ANTHROPIC_API_KEY, claude-sonnet, claude-opus, claude-haiku, ContentBlock, ToolUseBlock, tool_use, tool_result, thinking, budget_tokens, adaptive, cache_control, zodOutputFormat, betaZodTool, toolRunner

When to use:

Building applications that call Claude models directly (Opus, Sonnet, Haiku families)
Implementing streaming chat responses with event-based text accumulation
Using tool use / function calling where Claude decides which tools to invoke
Processing images, PDFs, or documents alongside text prompts
Enabling extended thinking for complex reasoning tasks
Extracting structured data from responses with Zod schema validation
Caching large system prompts or conversation prefixes for cost savings
Running batch jobs for high-volume, asynchronous processing
Counting tokens before sending requests for cost estimation

Key patterns covered:

Client initialization and configuration (retries, timeouts, API key)
Messages API (messages.create, system prompts, multi-turn conversations)
Streaming with .stream() helper and stream: true low-level SSE
Tool use / function calling (tools array, tool_use / tool_result content blocks)
Vision (base64 images, URL images, PDFs/documents)
Extended thinking (thinking config, budget_tokens, thinking content blocks)
Structured outputs (zodOutputFormat, messages.parse, output_config)
Prompt caching (cache_control: { type: "ephemeral" })
Batch API (messages.batches.create)
Token counting (messages.countTokens)
Error handling, retries, and production best practices

When NOT to use:

Multi-provider applications where you need to switch between multiple LLM providers -- use a unified provider SDK instead
React-specific chat UI hooks (useChat, useCompletion) -- use a framework-integrated AI SDK
When you need a higher-level agent framework -- consider the Claude Agent SDK (@anthropic-ai/claude-agent-sdk)

Examples Index

Core: Setup & Configuration -- Client init, production config, error handling, token counting
Streaming -- .stream() helper, stream: true SSE, event types, abort
Tool Use / Function Calling -- Tool definitions, tool loops, parallel tool calls, automated tool runner
Vision & Documents -- Base64 images, URL images, PDFs, multi-modal
Extended Thinking -- Thinking config, streaming thinking, thinking with tool use
Quick API Reference -- Model IDs, method signatures, error types, streaming events, content block types

<philosophy>

Philosophy

The official Anthropic SDK provides direct, typed access to the Claude API. It is auto-generated from Anthropic's API specification using Stainless, giving you the exact API surface that Anthropic documents with full TypeScript types.

Core principles:

Content blocks, not strings -- Responses are arrays of typed content blocks (TextBlock, ToolUseBlock, ThinkingBlock), not plain strings. Always iterate over response.content and switch on block.type.
Explicit resource limits -- max_tokens is always required. There is no default. The API will reject requests without it.
Tool use is a conversation loop -- When stop_reason === "tool_use", Claude is requesting you execute a tool. You must send the result back as a tool_result content block to continue the conversation.
Built-in resilience -- The SDK retries 2 times by default on 429, 409, 408, 529, and 5xx errors with exponential backoff.
Streaming as a first-class pattern -- Use .stream() for an event-based API with .on("text", ...), or stream: true for raw SSE iteration.
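
The first principle above (iterate over content and switch on block.type) might look like the sketch below. The block types are simplified local stand-ins for the SDK's TextBlock, ToolUseBlock, and ThinkingBlock so the example runs without the SDK; the real union has more members and fields. `summarizeContent` and `Summary` are hypothetical names.

```typescript
// Simplified stand-ins for the SDK block types (assumption: reduced field sets).
type TextBlock = { type: "text"; text: string };
type ToolUseBlock = { type: "tool_use"; id: string; name: string; input: unknown };
type ThinkingBlock = { type: "thinking"; thinking: string; signature: string };
type ContentBlock = TextBlock | ToolUseBlock | ThinkingBlock;

interface Summary {
  text: string;
  toolCalls: ToolUseBlock[];
  thinking: string[];
}

// Walk every block and sort it by type instead of assuming content[0] is text.
function summarizeContent(content: ContentBlock[]): Summary {
  const summary: Summary = { text: "", toolCalls: [], thinking: [] };
  for (const block of content) {
    switch (block.type) {
      case "text":
        summary.text += block.text;
        break;
      case "tool_use":
        summary.toolCalls.push(block);
        break;
      case "thinking":
        summary.thinking.push(block.thinking);
        break;
    }
  }
  return summary;
}
```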

When to use the Anthropic SDK directly:

You only use Claude models and want the simplest, most direct integration
You need access to Anthropic-specific features (extended thinking, prompt caching, batch API)
You want minimal dependencies and zero abstraction overhead
You need the latest API features on day one

When NOT to use:

You need to switch between multiple LLM providers -- use a unified provider SDK
You want React-specific chat UI hooks -- use a framework-integrated AI SDK
You want a higher-level agent framework -- consider the Claude Agent SDK

</philosophy>

<patterns>

Core Patterns

Pattern 1: Client Setup

Initialize the Anthropic client. It auto-reads ANTHROPIC_API_KEY from the environment.

// lib/anthropic.ts -- basic setup
import Anthropic from "@anthropic-ai/sdk";
const client = new Anthropic();
export { client };

// lib/anthropic.ts -- production configuration
import Anthropic from "@anthropic-ai/sdk";

const TIMEOUT_MS = 30_000;
const MAX_RETRIES = 3;
const client = new Anthropic({ timeout: TIMEOUT_MS, maxRetries: MAX_RETRIES });
export { client };

Why good: Minimal setup, env var auto-detected, named constants for production settings

// BAD: Hardcoded API key
const client = new Anthropic({ apiKey: "sk-ant-api03-..." });

Why bad: Hardcoded keys get committed to version control, causing security breaches

See: examples/core.md for per-request overrides, error handling patterns, token counting

Pattern 2: Messages API

All interactions use client.messages.create(). max_tokens is always required.

const MAX_TOKENS = 1024;

const message = await client.messages.create({
  model: "claude-sonnet-4-6",
  max_tokens: MAX_TOKENS,
  system: "You are a helpful coding assistant.",
  messages: [{ role: "user", content: "Explain TypeScript generics." }],
});

// Response is an array of content blocks -- iterate, don't assume
for (const block of message.content) {
  if (block.type === "text") {
    console.log(block.text);
  }
}

Why good: Named constant for max_tokens, system prompt separated from messages, content blocks iterated

// BAD: Assuming content is a single text string
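const text = message.content[0].text; // Type error; undefined at runtime if the block is tool_use or thinking

Why bad: Content can contain multiple blocks of different types -- indexing without a type check fails to compile in TypeScript and, if forced through, yields undefined for non-text blocks (or throws on an empty array) at runtime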

See: examples/core.md for multi-turn conversations, system prompts, token tracking
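
For the multi-turn case mentioned above, one possible way to extend the history is sketched below. `MessageParam` and `Block` are simplified local stand-ins for the SDK types, and `appendTurn` is a hypothetical helper. The point it illustrates is that the assistant's full content array goes back into the history, not just its text.

```typescript
// Simplified stand-ins for the SDK's message and block types (assumption).
type Block = { type: string; [key: string]: unknown };
type MessageParam = { role: "user" | "assistant"; content: string | Block[] };

// Return a new history with the assistant's full reply and the next user turn.
// The input array is not mutated.
function appendTurn(
  history: MessageParam[],
  assistantContent: Block[],
  nextUserText: string,
): MessageParam[] {
  return [
    ...history,
    { role: "assistant", content: assistantContent }, // all blocks, unmodified
    { role: "user", content: nextUserText },
  ];
}
```

The returned array would then be passed as `messages` in the next `client.messages.create()` call.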

Pattern 3: Streaming

Use .stream() for event-based streaming with text accumulation helpers.

const MAX_TOKENS = 1024;

const stream = client.messages.stream({
  model: "claude-sonnet-4-6",
  max_tokens: MAX_TOKENS,
  messages: [{ role: "user", content: "Explain async/await." }],
});

stream.on("text", (text) => {
  process.stdout.write(text);
});

const finalMessage = await stream.finalMessage();

Why good: Event-based API handles accumulation, finalMessage() gives the complete response object

// BAD: Using stream: true without consuming events
const response = await client.messages.create({
  model: "claude-sonnet-4-6",
  max_tokens: MAX_TOKENS,
  messages: [{ role: "user", content: "Hello" }],
  stream: true,
});
// Response is an async iterable, not a Message -- must iterate

Why bad: stream: true returns an async iterable of raw SSE events, not a Message. Treating it as a Message silently breaks.

See: examples/streaming.md for raw SSE iteration, abort, stream events, streaming with thinking
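
When iterating raw events from stream: true, the text has to be accumulated by hand. The sketch below shows one way to fold events into a running state. The event shapes are a simplified subset of the streaming events as I understand them (content_block_delta with text_delta, message_delta, message_stop); the real union is larger. `applyEvent`, `StreamState`, and `INITIAL_STATE` are hypothetical names.

```typescript
// Simplified stand-ins for a few raw stream event shapes (assumption).
type StreamEvent =
  | {
      type: "content_block_delta";
      index: number;
      delta:
        | { type: "text_delta"; text: string }
        | { type: "input_json_delta"; partial_json: string };
    }
  | { type: "message_delta"; delta: { stop_reason: string | null } }
  | { type: "message_stop" };

interface StreamState {
  text: string;
  stopReason: string | null;
  done: boolean;
}

const INITIAL_STATE: StreamState = { text: "", stopReason: null, done: false };

// Fold one raw event into the accumulated state without mutating it.
function applyEvent(state: StreamState, event: StreamEvent): StreamState {
  switch (event.type) {
    case "content_block_delta":
      // Only text deltas contribute to the visible text.
      return event.delta.type === "text_delta"
        ? { ...state, text: state.text + event.delta.text }
        : state;
    case "message_delta":
      return { ...state, stopReason: event.delta.stop_reason };
    case "message_stop":
      return { ...state, done: true };
    default:
      return state;
  }
}
```

With the real SDK, the same reducer would be applied inside a `for await (const event of stream)` loop; the `.stream()` helper does this accumulation for you.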

Pattern 4: Tool Use / Function Calling

Define tools Claude can invoke. Handle the tool_use -> tool_result conversation loop.

const tools: Anthropic.Messages.Tool[] = [
  {
    name: "get_weather",
    description: "Get current weather for a location",
    input_schema: {
      type: "object" as const,
      properties: {
        location: { type: "string", description: "City name" },
      },
      required: ["location"],
    },
  },
];

const MAX_TOKENS = 1024;

const response = await client.messages.create({
  model: "claude-sonnet-4-6",
  max_tokens: MAX_TOKENS,
  tools,
  messages: [{ role: "user", content: "Weather in Paris?" }],
});

// Check stop_reason to know if Claude wants to call a tool
if (response.stop_reason === "tool_use") {
  const toolBlock = response.content.find(
    (block): block is Anthropic.Messages.ToolUseBlock =>
      block.type === "tool_use",
  );
  if (toolBlock) {
    console.log(`Call ${toolBlock.name} with:`, toolBlock.input);
  }
}

Why good: Typed tool definitions, stop_reason checked, type guard for ToolUseBlock

// BAD: Not checking stop_reason, not sending tool_result back
const response = await client.messages.create({
  /* ... with tools */
});
console.log(response.content[0]); // May be a tool_use block, not text!

Why bad: When Claude wants to call a tool, the first block may be a tool_use block rather than text, and any text that does precede it is not the final answer. You must execute the tool and send back a tool_result to get the final answer.

See: examples/tool-use.md for complete tool loops, parallel tool calls, automated tool runner
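
The tool_use -> tool_result step described above could be sketched as follows. This is a minimal, SDK-free illustration: the block types are simplified stand-ins, handlers are synchronous for brevity (real tools are usually async), and `buildToolResultMessage` and `ToolHandler` are hypothetical names. It answers every tool_use block in the response, which also covers parallel tool calls.

```typescript
// Simplified stand-ins for the SDK block types (assumption).
type ToolUseBlock = { type: "tool_use"; id: string; name: string; input: unknown };
type TextBlock = { type: "text"; text: string };
type ContentBlock = ToolUseBlock | TextBlock;
type ToolResultBlock = {
  type: "tool_result";
  tool_use_id: string;
  content: string;
  is_error?: boolean;
};
type ToolHandler = (input: unknown) => string;

// Build the user message that answers every tool_use block in a response.
function buildToolResultMessage(
  content: ContentBlock[],
  handlers: Record<string, ToolHandler>,
): { role: "user"; content: ToolResultBlock[] } {
  const results: ToolResultBlock[] = [];
  for (const block of content) {
    if (block.type !== "tool_use") continue;
    const handler = handlers[block.name];
    if (!handler) {
      // Report unknown tools back to Claude instead of throwing.
      results.push({
        type: "tool_result",
        tool_use_id: block.id,
        content: `Unknown tool: ${block.name}`,
        is_error: true,
      });
      continue;
    }
    try {
      results.push({ type: "tool_result", tool_use_id: block.id, content: handler(block.input) });
    } catch (error) {
      const message = error instanceof Error ? error.message : String(error);
      results.push({ type: "tool_result", tool_use_id: block.id, content: message, is_error: true });
    }
  }
  return { role: "user", content: results };
}
```

In a full loop, the next request would carry the previous messages, then the assistant's complete content, then this message, repeating while stop_reason is "tool_use".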

Pattern 5: Vision & Documents

Pass images and PDFs as content blocks alongside text.

import { readFileSync } from "node:fs";

const MAX_TOKENS = 1024;
const imageData = readFileSync("photo.jpg").toString("base64");

const message = await client.messages.create({
  model: "claude-sonnet-4-6",
  max_tokens: MAX_TOKENS,
  messages: [
    {
      role: "user",
      content: [
        {
          type: "image",
          source: { type: "base64", media_type: "image/jpeg", data: imageData },
        },
        { type: "text", text: "What's in this image?" },
      ],
    },
  ],
});

Why good: Multi-part content array, explicit media type, text and image combined in one message

See: examples/vision-documents.md for URL images, PDFs, multiple images

Pattern 6: Extended Thinking

Enable extended thinking for complex reasoning. Responses include thinking content blocks. Use adaptive thinking on Opus 4.6 and Sonnet 4.6 (recommended). Use manual budget_tokens on older models.

const MAX_TOKENS = 16_000;

// Adaptive thinking (recommended for 4.6 models)
const response = await client.messages.create({
  model: "claude-sonnet-4-6",
  max_tokens: MAX_TOKENS,
  thinking: { type: "adaptive" },
  messages: [
    { role: "user", content: "Prove there are infinitely many primes." },
  ],
} as unknown as Anthropic.MessageCreateParamsNonStreaming);

for (const block of response.content) {
  if (block.type === "thinking") {
    console.log("Thinking:", block.thinking);
  } else if (block.type === "text") {
    console.log("Answer:", block.text);
  }
}

Why good: Adaptive thinking lets Claude decide how much to reason, iterates content blocks, handles both thinking and text blocks

// Manual thinking (deprecated on 4.6 models, required on older models)
const THINKING_BUDGET = 10_000;

const response = await client.messages.create({
  model: "claude-sonnet-4-5",
  max_tokens: MAX_TOKENS,
  thinking: { type: "enabled", budget_tokens: THINKING_BUDGET },
  messages: [
    { role: "user", content: "Prove there are infinitely many primes." },
  ],
});

Note: The TypeScript SDK does not yet have "adaptive" in its type definitions. The as unknown as Anthropic.MessageCreateParamsNonStreaming assertion is required until the SDK types are updated.

See: examples/extended-thinking.md for streaming thinking, thinking with tools, display options

Pattern 7: Structured Outputs

Use zodOutputFormat() and messages.parse() for type-safe structured responses.

import { zodOutputFormat } from "@anthropic-ai/sdk/helpers/zod";
import { z } from "zod";

const ContactInfo = z.object({
  name: z.string(),
  email: z.string(),
  topics: z.array(z.string()),
});

const MAX_TOKENS = 1024;

const response = await client.messages.parse({
  model: "claude-sonnet-4-6",
  max_tokens: MAX_TOKENS,
  messages: [
    {
      role: "user",
      content:
        "Extract info: John (john@example.com) asked about billing and API limits.",
    },
  ],
  output_config: { format: zodOutputFormat(ContactInfo) },
});

const parsed = response.parsed_output; // Fully typed: { name, email, topics }

Why good: Auto-converts Zod schema, validates output, fully typed result

See: examples/core.md for raw JSON schema, combined with tool use

Pattern 8: Prompt Caching

Cache large system prompts and conversation prefixes for cost savings.

const MAX_TOKENS = 1024;

const response = await client.messages.create({
  model: "claude-sonnet-4-6",
  max_tokens: MAX_TOKENS,
  system: [
    {
      type: "text",
      text: "You are a legal document analyst.",
    },
    {
      type: "text",
      text: largeDocumentText, // 50+ pages of legal text
      cache_control: { type: "ephemeral" },
    },
  ],
  messages: [{ role: "user", content: "What are the key terms?" }],
});

// Check cache performance
console.log("Cache read tokens:", response.usage.cache_read_input_tokens);
console.log("Cache write tokens:", response.usage.cache_creation_input_tokens);

Why good: Cache breakpoint on the large static content, cache metrics tracked

See: reference.md for cache pricing, TTL options, automatic caching
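
To make the cache metrics above easier to act on, a small helper can turn them into a ratio. This is a sketch under the assumption that total input is the sum of input_tokens, cache_read_input_tokens, and cache_creation_input_tokens, and that the cache fields may be null. `Usage` here is a reduced local stand-in and `cacheReadRatio` is a hypothetical name.

```typescript
// Reduced stand-in for the usage object on a response (assumption).
interface Usage {
  input_tokens: number;
  cache_read_input_tokens: number | null;
  cache_creation_input_tokens: number | null;
}

// Fraction of all input tokens that were served from the cache (0 to 1).
function cacheReadRatio(usage: Usage): number {
  const read = usage.cache_read_input_tokens ?? 0;
  const written = usage.cache_creation_input_tokens ?? 0;
  const total = usage.input_tokens + read + written;
  return total === 0 ? 0 : read / total;
}
```

A ratio near zero on repeated requests usually suggests the cached prefix is changing between calls or is too small to be cacheable.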

Pattern 9: Error Handling

Always catch Anthropic.APIError and its subclasses. Re-throw unexpected errors.

try {
  const message = await client.messages.create({
    model: "claude-sonnet-4-6",
    max_tokens: 1024,
    messages: [{ role: "user", content: "Hello" }],
  });
} catch (error) {
  if (error instanceof Anthropic.APIError) {
    console.error(`API Error [${error.status}]: ${error.message}`);

    if (error instanceof Anthropic.RateLimitError) {
      console.error("Rate limited -- the SDK's automatic retries were already exhausted");
    }
    if (error instanceof Anthropic.AuthenticationError) {
      throw new Error("Invalid API key. Check ANTHROPIC_API_KEY.");
    }
  } else {
    throw error; // Re-throw non-API errors
  }
}

Why good: Specific error types, status code access, re-throws unexpected errors

See: examples/core.md for full error hierarchy, stream error handling
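
If you add your own retry layer on top of the SDK (for example after its built-in retries are exhausted), the retry decision might be sketched as below. The status list follows the one given in this document (408, 409, 429, and 5xx, which includes 529). The delay constants are illustrative assumptions, not the SDK's actual values, and production code would normally add jitter. `isRetryableStatus` and `backoffDelayMs` are hypothetical names.

```typescript
// Non-5xx statuses this document lists as retried by default.
const RETRYABLE_STATUSES = new Set([408, 409, 429]);

// Illustrative delays only; not the SDK's internal values.
const BASE_DELAY_MS = 500;
const MAX_DELAY_MS = 8_000;

// True for the listed statuses and for any 5xx (which covers 529 overloaded).
function isRetryableStatus(status: number): boolean {
  return RETRYABLE_STATUSES.has(status) || status >= 500;
}

// Exponential backoff: the delay doubles per attempt and is capped.
function backoffDelayMs(attempt: number): number {
  return Math.min(BASE_DELAY_MS * Math.pow(2, attempt), MAX_DELAY_MS);
}
```

In a catch block, `error.status` from an Anthropic.APIError could be passed to `isRetryableStatus` to decide whether to wait and try again or surface the failure.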

</patterns>

<performance>

Performance Optimization

Model Selection for Cost/Speed

Most capable, complex reasoning -> claude-opus-4-6  (1M context, 128K output)
General purpose, best value      -> claude-sonnet-4-6 (1M context, 64K output)
Fast + cheap, simple tasks       -> claude-haiku-4-5  (200K context, 64K output)
Extended thinking                -> claude-sonnet-4-6 or claude-opus-4-6 (use adaptive thinking)
Vision / multimodal              -> claude-sonnet-4-6 or claude-opus-4-6
Batch processing                 -> Any model at 50% batch discount

Key Optimization Patterns

Track token usage via message.usage for cost visibility (input_tokens, output_tokens)
Check stop_reason === "max_tokens" to detect truncated output
Use prompt caching for large system prompts -- cache reads cost 0.1x base input price
Use messages.countTokens() before sending to estimate costs
Use Batch API for high-volume async jobs at 50% cost reduction
Use AbortController to cancel long-running requests
Set temperature: 0 for near-deterministic output when caching matters (identical output across calls is still not guaranteed)

</performance>

<decision_framework>

Decision Framework

Which Model to Choose

What is your task?
+-- Complex reasoning / analysis    -> claude-opus-4-6
+-- General purpose (best balance)  -> claude-sonnet-4-6
+-- Fast + cheap, high throughput   -> claude-haiku-4-5
+-- Extended thinking needed        -> claude-sonnet-4-6 (or opus-4-6 with adaptive thinking)
+-- Vision / image analysis         -> claude-sonnet-4-6 or claude-opus-4-6
+-- Batch processing                -> Any model (50% discount)

Streaming vs Non-Streaming

Is the response user-facing?
+-- YES -> Use streaming (client.messages.stream())
|   +-- Need event-level control? -> .on("text", ...) + .on("contentBlock", ...)
|   +-- Just want final message?  -> stream.finalMessage() (avoids HTTP timeouts on large responses)
+-- NO -> Use non-streaming (client.messages.create())
    +-- Background processing  -> messages.create()
    +-- Structured output      -> messages.parse()
    +-- High volume            -> Batch API

When to Use Extended Thinking

Does the task require multi-step reasoning?
+-- YES -> Which model?
|   +-- Opus 4.6 or Sonnet 4.6? -> Use adaptive: thinking: { type: "adaptive" }
|   |   +-- Control depth?       -> Add output_config: { effort: "high" | "medium" | "low" }
|   |   +-- Opus only max depth? -> effort: "max"
|   +-- Older models?            -> Manual: thinking: { type: "enabled", budget_tokens: N }
+-- NO -> Standard messages.create() is sufficient (omit thinking param or type: "disabled")
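
The decision tree above can be expressed as a small helper. This sketch assumes model IDs beginning with "claude-opus-4-6" or "claude-sonnet-4-6" are the ones that support adaptive thinking, as stated in this document. `thinkingConfigFor`, `ThinkingConfig`, and `ADAPTIVE_MODEL_PREFIXES` are hypothetical names, not SDK exports.

```typescript
// Local stand-in for the thinking parameter shapes used in this document.
type ThinkingConfig =
  | { type: "adaptive" }
  | { type: "enabled"; budget_tokens: number }
  | { type: "disabled" };

// Assumption: these prefixes identify the models that accept adaptive thinking.
const ADAPTIVE_MODEL_PREFIXES = ["claude-opus-4-6", "claude-sonnet-4-6"];

// Pick a thinking config following the decision tree above.
function thinkingConfigFor(
  model: string,
  needsReasoning: boolean,
  budgetTokens: number,
): ThinkingConfig {
  if (!needsReasoning) return { type: "disabled" };
  if (ADAPTIVE_MODEL_PREFIXES.some((prefix) => model.startsWith(prefix))) {
    return { type: "adaptive" };
  }
  // Older models: manual budget.
  return { type: "enabled", budget_tokens: budgetTokens };
}
```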

</decision_framework>

<red_flags>

RED FLAGS

High Priority Issues:

Not providing max_tokens (request will be rejected -- it has no default)
Hardcoding API keys instead of using environment variables (security breach risk)
Treating response.content as a string instead of iterating content blocks (crashes on tool_use or thinking blocks)
Not checking stop_reason for "tool_use" (breaks function calling flows -- Claude is waiting for tool results)
Using bare catch blocks without checking Anthropic.APIError (hides API-specific error information)

Medium Priority Issues:

Not setting maxRetries / timeout for production deployments (default timeout is 10 minutes, which may be too long)
Ignoring stop_reason === "max_tokens" (response was truncated but you are using it as complete)
Ignoring usage data (no cost visibility or budget tracking)
Not sending thinking blocks back in multi-turn conversations when using extended thinking (Claude loses reasoning context)
Changing thinking parameters between turns in a tool use loop (invalidates message cache, causes errors)

Common Mistakes:

Using system as a message role instead of the top-level system parameter (there is no system role in messages -- use the system parameter)
Assuming response.content has exactly one block (it can have multiple text, tool_use, and thinking blocks)
Not passing tool_result back after a tool_use response (Claude cannot continue without it)
Using max_completion_tokens instead of max_tokens (the Anthropic API uses max_tokens, not max_completion_tokens)
Using response_format instead of output_config for structured outputs (wrong parameter name)
Forgetting that budget_tokens must be less than max_tokens (except with interleaved thinking)
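
The budget_tokens rule in the last item can be checked before sending a request. A minimal sketch follows; the 1,024-token minimum is my understanding of Anthropic's documentation and is not stated in this document, so treat it as an assumption. `checkThinkingBudget` is a hypothetical helper that returns an error message or null.

```typescript
// Assumption: minimum thinking budget, per Anthropic's documentation as I understand it.
const MIN_BUDGET_TOKENS = 1024;

// Return a problem description, or null when the budget looks valid.
function checkThinkingBudget(
  maxTokens: number,
  budgetTokens: number,
  interleaved = false,
): string | null {
  if (budgetTokens < MIN_BUDGET_TOKENS) {
    return `budget_tokens must be at least ${MIN_BUDGET_TOKENS}`;
  }
  // With interleaved thinking the budget may exceed max_tokens.
  if (!interleaved && budgetTokens >= maxTokens) {
    return "budget_tokens must be less than max_tokens";
  }
  return null;
}
```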

Gotchas & Edge Cases:

The SDK auto-retries on 429 (rate limit), 529 (overloaded), 408 (timeout), 409 (conflict), and 5xx errors -- 2 retries by default with exponential backoff. Disable with maxRetries: 0.
client.messages.stream() returns a MessageStream with event helpers. client.messages.create({ stream: true }) returns a raw async iterable of SSE events. They are different APIs.
When using extended thinking with tool use, you must include the thinking blocks unmodified when sending conversation history back. Omitting or modifying them causes errors.
tool_choice: { type: "any" } forces Claude to call a tool but cannot be used with extended thinking. Only "auto" and "none" work with thinking enabled.
Prompt caching requires a minimum of 1024-4096 tokens (model-dependent) to be cacheable. Small prompts will not be cached.
Cache breakpoints on messages are invalidated when thinking parameters change between requests. System prompt cache is preserved.
budget_tokens is deprecated on both Claude Opus 4.6 and Sonnet 4.6 -- use thinking: { type: "adaptive" } instead. budget_tokens still works but will be removed in a future release.
The display field on thinking config controls whether thinking text is returned: "summarized" (default) or "omitted" (only signature, faster streaming).
Adaptive thinking automatically enables interleaved thinking (thinking between tool calls). Manual mode on Sonnet 4.6 requires the interleaved-thinking-2025-05-14 beta header for interleaved thinking.
The effort parameter (output_config: { effort: "high" | "medium" | "low" | "max" }) works with adaptive thinking to control thinking depth. "max" is Opus 4.6 only.
The TypeScript SDK does not yet include "adaptive" in its type definitions -- use a type assertion when passing thinking: { type: "adaptive" }.
Multi-turn conversations require you to include the full assistant response (all content blocks) in the conversation history, not just the text.
Batch API requests have a 24-hour completion window. Use messages.batches.results() to retrieve completed results.

</red_flags>

<critical_reminders>

CRITICAL REMINDERS