ai-provider-anthropic-sdk
Official Anthropic SDK patterns for TypeScript/Node.js: client setup, Messages API, streaming, tool use, vision, extended thinking, structured outputs, prompt caching, batch API, and production best practices.
Note: the commentary on this page was added by the jpskill.com editorial team for Japanese business users. It is reference information, independent of the Skill's own behavior.
Copy the command below and paste it into a terminal (Mac/Linux) or PowerShell (Windows). It downloads, extracts, and installs the Skill automatically.
mkdir -p ~/.claude/skills && cd ~/.claude/skills && curl -L -o ai-provider-anthropic-sdk.zip https://jpskill.com/download/10214.zip && unzip -o ai-provider-anthropic-sdk.zip && rm ai-provider-anthropic-sdk.zip
$d = "$env:USERPROFILE\.claude\skills"; ni -Force -ItemType Directory $d | Out-Null; iwr https://jpskill.com/download/10214.zip -OutFile "$d\ai-provider-anthropic-sdk.zip"; Expand-Archive "$d\ai-provider-anthropic-sdk.zip" -DestinationPath $d -Force; ri "$d\ai-provider-anthropic-sdk.zip"
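When it finishes, restart Claude Code. After that, the Skill activates automatically when you simply talk to Claude as usual (for example, "make me a video prompt").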
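💾 Manual download (for anyone who finds the command difficult)
- 1. Press the blue button below to download ai-provider-anthropic-sdk.zip
- 2. Double-click the ZIP file to extract it, which creates an ai-provider-anthropic-sdk folder
- 3. Move that folder to C:\Users\<your name>\.claude\skills\ (Windows) or ~/.claude/skills/ (Mac)
- 4. Restart Claude Code
⚠️ Download and use at your own risk. This site accepts no responsibility for the content, behavior, or safety of the Skill.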
🎯 What this Skill can do
Read the description below to see what this Skill does for you. When you ask Claude for help in this area, the Skill activates automatically.
📦 Installation (3 steps)
- 1. Press the "Download" button above to get the .skill file
- 2. Change the file extension from .skill to .zip and extract it (macOS can extract it automatically)
- 3. Put the extracted folder in .claude/skills/ under your home folder
  - macOS / Linux: ~/.claude/skills/
  - Windows: %USERPROFILE%\.claude\skills\
Restart Claude Code to finish. You do not need to say "use this Skill..."; it is invoked automatically for related requests.
- Last updated: 2026-05-18
- Retrieved: 2026-05-18
- Bundled files: 1
Anthropic SDK Patterns
Quick Guide: Use the official @anthropic-ai/sdk package to interact with Claude models directly. Use client.messages.create() for single-turn and multi-turn conversations. Use client.messages.stream() for streaming with event-based consumption. max_tokens is always required. Content blocks are typed unions (text, tool_use, thinking). Use client.messages.parse() with zodOutputFormat() for structured outputs. Tool use requires a tool-result loop -- Claude returns tool_use blocks, you execute the tool and send back tool_result blocks. Extended thinking adds thinking content blocks before the response.
<critical_requirements>
CRITICAL: Before Using This Skill
All code must follow project conventions in CLAUDE.md (kebab-case, named exports, import ordering, import type, named constants)
(You MUST always provide max_tokens in every messages.create() / messages.stream() call -- it is required and has no default)
(You MUST handle the stop_reason field to detect end_turn, max_tokens, tool_use, and stop_sequence -- ignoring it causes silent truncation or broken tool loops)
(You MUST iterate over response.content blocks (not assume a single text block) -- responses can contain text, tool_use, and thinking blocks mixed together)
(You MUST handle errors using Anthropic.APIError and its subclasses -- never use bare catch blocks without error type checking)
(You MUST never hardcode API keys -- always use environment variables via process.env.ANTHROPIC_API_KEY)
</critical_requirements>
Auto-detection: Anthropic, @anthropic-ai/sdk, client.messages.create, client.messages.stream, client.messages.parse, client.messages.countTokens, client.messages.batches, ANTHROPIC_API_KEY, claude-sonnet, claude-opus, claude-haiku, ContentBlock, ToolUseBlock, tool_use, tool_result, thinking, budget_tokens, adaptive, cache_control, zodOutputFormat, betaZodTool, toolRunner
When to use:
- Building applications that call Claude models directly (Opus, Sonnet, Haiku families)
- Implementing streaming chat responses with event-based text accumulation
- Using tool use / function calling where Claude decides which tools to invoke
- Processing images, PDFs, or documents alongside text prompts
- Enabling extended thinking for complex reasoning tasks
- Extracting structured data from responses with Zod schema validation
- Caching large system prompts or conversation prefixes for cost savings
- Running batch jobs for high-volume, asynchronous processing
- Counting tokens before sending requests for cost estimation
Key patterns covered:
- Client initialization and configuration (retries, timeouts, API key)
- Messages API (messages.create, system prompts, multi-turn conversations)
- Streaming with .stream() helper and stream: true low-level SSE
- Tool use / function calling (tools array, tool_use / tool_result content blocks)
- Vision (base64 images, URL images, PDFs/documents)
- Extended thinking (thinking config, budget_tokens, thinking content blocks)
- Structured outputs (zodOutputFormat, messages.parse, output_config)
- Prompt caching (cache_control: { type: "ephemeral" })
- Batch API (messages.batches.create)
- Token counting (messages.countTokens)
- Error handling, retries, and production best practices
When NOT to use:
- Multi-provider applications where you need to switch between multiple LLM providers -- use a unified provider SDK instead
- React-specific chat UI hooks (useChat, useCompletion) -- use a framework-integrated AI SDK
- When you need a higher-level agent framework -- consider the Claude Agent SDK (@anthropic-ai/claude-agent-sdk)
Examples Index
- Core: Setup & Configuration -- Client init, production config, error handling, token counting
- Streaming -- .stream() helper, stream: true SSE, event types, abort
- Tool Use / Function Calling -- Tool definitions, tool loops, parallel tool calls, automated tool runner
- Vision & Documents -- Base64 images, URL images, PDFs, multi-modal
- Extended Thinking -- Thinking config, streaming thinking, thinking with tool use
- Quick API Reference -- Model IDs, method signatures, error types, streaming events, content block types
<philosophy>
Philosophy
The official Anthropic SDK provides direct, typed access to the Claude API. It is auto-generated from Anthropic's API specification using Stainless, giving you the exact API surface that Anthropic documents with full TypeScript types.
Core principles:
- Content blocks, not strings -- Responses are arrays of typed content blocks (TextBlock, ToolUseBlock, ThinkingBlock), not plain strings. Always iterate over response.content and switch on block.type.
- Explicit resource limits -- max_tokens is always required. There is no default. The API will reject requests without it.
- Tool use is a conversation loop -- When stop_reason === "tool_use", Claude is requesting you execute a tool. You must send the result back as a tool_result content block to continue the conversation.
- Built-in resilience -- The SDK retries 2 times by default on 429, 409, 408, 529, and 5xx errors with exponential backoff.
- Streaming as a first-class pattern -- Use .stream() for an event-based API with .on("text", ...), or stream: true for raw SSE iteration.
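To make the "built-in resilience" principle concrete, the sketch below models a retry schedule for the status codes listed above. It is an illustration only, not the SDK's implementation: `isRetryableStatus`, `backoffDelayMs`, `retrySchedule`, and the delay constants are hypothetical names and values, and the real SDK's timing (including any jitter) may differ.

```typescript
// Hypothetical constants for illustration; only the default of 2 retries
// comes from the text above.
const DEFAULT_MAX_RETRIES = 2;
const BASE_DELAY_MS = 500;
const MAX_DELAY_MS = 8_000;

// Status codes the text lists as retried: 408, 409, 429, and 5xx (529 included).
function isRetryableStatus(status: number): boolean {
  return status === 408 || status === 409 || status === 429 || status >= 500;
}

// Plain exponential backoff: the delay doubles per attempt, up to a cap.
function backoffDelayMs(attempt: number): number {
  return Math.min(BASE_DELAY_MS * 2 ** attempt, MAX_DELAY_MS);
}

// Delays (in ms) that would be waited before each retry of a failed request.
function retrySchedule(
  status: number,
  maxRetries: number = DEFAULT_MAX_RETRIES,
): number[] {
  if (!isRetryableStatus(status)) return [];
  return Array.from({ length: maxRetries }, (_, attempt) =>
    backoffDelayMs(attempt),
  );
}
```

Under these assumed constants, a 429 yields two waits (500 ms, then 1000 ms), while a 400 yields none because it is not retried.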
When to use the Anthropic SDK directly:
- You only use Claude models and want the simplest, most direct integration
- You need access to Anthropic-specific features (extended thinking, prompt caching, batch API)
- You want minimal dependencies and zero abstraction overhead
- You need the latest API features on day one
When NOT to use:
- You need to switch between multiple LLM providers -- use a unified provider SDK
- You want React-specific chat UI hooks -- use a framework-integrated AI SDK
- You want a higher-level agent framework -- consider the Claude Agent SDK
</philosophy>
<patterns>
Core Patterns
Pattern 1: Client Setup
Initialize the Anthropic client. It auto-reads ANTHROPIC_API_KEY from the environment.
// lib/anthropic.ts -- basic setup
import Anthropic from "@anthropic-ai/sdk";
const client = new Anthropic();
export { client };
// lib/anthropic.ts -- production configuration
const TIMEOUT_MS = 30_000;
const MAX_RETRIES = 3;
const client = new Anthropic({ timeout: TIMEOUT_MS, maxRetries: MAX_RETRIES });
Why good: Minimal setup, env var auto-detected, named constants for production settings
// BAD: Hardcoded API key
const client = new Anthropic({ apiKey: "sk-ant-api03-..." });
Why bad: Hardcoded keys get committed to version control, causing security breaches
See: examples/core.md for per-request overrides, error handling patterns, token counting
Pattern 2: Messages API
All interactions use client.messages.create(). max_tokens is always required.
const MAX_TOKENS = 1024;
const message = await client.messages.create({
  model: "claude-sonnet-4-6",
  max_tokens: MAX_TOKENS,
  system: "You are a helpful coding assistant.",
  messages: [{ role: "user", content: "Explain TypeScript generics." }],
});
// Response is an array of content blocks -- iterate, don't assume
for (const block of message.content) {
  if (block.type === "text") {
    console.log(block.text);
  }
}
Why good: Named constant for max_tokens, system prompt separated from messages, content blocks iterated
// BAD: Assuming content is a single text block
const text = message.content[0].text; // Type error in TS; undefined at runtime if the block is tool_use or thinking
Why bad: Content can contain multiple blocks of different types -- indexing the first block without a type check fails type checking and yields undefined at runtime when that block is not text
See: examples/core.md for multi-turn conversations, system prompts, token tracking
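One way to combine content-block iteration with stop_reason handling is sketched below. The types are local stand-ins that mirror the response shape described in this pattern (real code would use the SDK's own types), and `collectText` and `nextStep` are hypothetical helper names, not SDK functions.

```typescript
// Local, simplified stand-ins for the SDK's response types (assumption).
type ContentBlock =
  | { type: "text"; text: string }
  | { type: "tool_use"; id: string; name: string; input: unknown }
  | { type: "thinking"; thinking: string };

type StopReason = "end_turn" | "max_tokens" | "tool_use" | "stop_sequence";

interface MessageLike {
  content: ContentBlock[];
  stop_reason: StopReason | null;
}

// Concatenate every text block, skipping tool_use and thinking blocks.
function collectText(message: MessageLike): string {
  return message.content
    .filter(
      (block): block is Extract<ContentBlock, { type: "text" }> =>
        block.type === "text",
    )
    .map((block) => block.text)
    .join("");
}

// Hypothetical helper: translate stop_reason into an explicit next step.
function nextStep(message: MessageLike): "done" | "truncated" | "run_tools" {
  switch (message.stop_reason) {
    case "max_tokens":
      return "truncated"; // output was cut off; do not treat it as complete
    case "tool_use":
      return "run_tools"; // Claude is waiting for tool_result blocks
    default:
      return "done"; // end_turn, stop_sequence, or null
  }
}
```

Keeping the two concerns in separate functions lets a caller log truncated output without also re-implementing the block filtering.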
Pattern 3: Streaming
Use .stream() for event-based streaming with text accumulation helpers.
const MAX_TOKENS = 1024;
const stream = client.messages.stream({
  model: "claude-sonnet-4-6",
  max_tokens: MAX_TOKENS,
  messages: [{ role: "user", content: "Explain async/await." }],
});
stream.on("text", (text) => {
  process.stdout.write(text);
});
const finalMessage = await stream.finalMessage();
Why good: Event-based API handles accumulation, finalMessage() gives the complete response object
// BAD: Using stream: true without consuming events
const response = await client.messages.create({
  model: "claude-sonnet-4-6",
  max_tokens: MAX_TOKENS,
  messages: [{ role: "user", content: "Hello" }],
  stream: true,
});
// Response is an async iterable, not a Message -- must iterate
Why bad: stream: true returns an async iterable of raw SSE events, not a Message. Treating it as a Message silently breaks.
See: examples/streaming.md for raw SSE iteration, abort, stream events, streaming with thinking
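For the stream: true path, the events have to be consumed explicitly. The sketch below accumulates text from an async iterable of events. `RawEvent` is a local, reduced stand-in for the SDK's raw stream event types, and `accumulateText` is a hypothetical helper; in real code the iterable would be the value returned by client.messages.create({ ..., stream: true }).

```typescript
// Reduced local model of the raw stream events (assumption: the SDK's own
// event types carry more fields and more variants than shown here).
type RawEvent =
  | {
      type: "content_block_delta";
      index: number;
      delta:
        | { type: "text_delta"; text: string }
        | { type: "input_json_delta"; partial_json: string };
    }
  | { type: "message_delta"; delta: { stop_reason: string | null } }
  | {
      type:
        | "message_start"
        | "content_block_start"
        | "content_block_stop"
        | "message_stop";
    };

// Iterate the events, appending text deltas and remembering the stop reason.
async function accumulateText(
  events: AsyncIterable<RawEvent>,
): Promise<{ text: string; stopReason: string | null }> {
  let text = "";
  let stopReason: string | null = null;
  for await (const event of events) {
    if (
      event.type === "content_block_delta" &&
      event.delta.type === "text_delta"
    ) {
      text += event.delta.text;
    } else if (event.type === "message_delta") {
      stopReason = event.delta.stop_reason;
    }
    // Other event types are ignored in this sketch.
  }
  return { text, stopReason };
}
```

This is roughly the bookkeeping that the .stream() helper performs for you, which is why the helper is usually the simpler choice.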
Pattern 4: Tool Use / Function Calling
Define tools Claude can invoke. Handle the tool_use -> tool_result conversation loop.
const tools: Anthropic.Messages.Tool[] = [
  {
    name: "get_weather",
    description: "Get current weather for a location",
    input_schema: {
      type: "object" as const,
      properties: {
        location: { type: "string", description: "City name" },
      },
      required: ["location"],
    },
  },
];
const MAX_TOKENS = 1024;
const response = await client.messages.create({
  model: "claude-sonnet-4-6",
  max_tokens: MAX_TOKENS,
  tools,
  messages: [{ role: "user", content: "Weather in Paris?" }],
});
// Check stop_reason to know if Claude wants to call a tool
if (response.stop_reason === "tool_use") {
  const toolBlock = response.content.find(
    (block): block is Anthropic.Messages.ToolUseBlock =>
      block.type === "tool_use",
  );
  if (toolBlock) {
    console.log(`Call ${toolBlock.name} with:`, toolBlock.input);
  }
}
Why good: Typed tool definitions, stop_reason checked, type guard for ToolUseBlock
// BAD: Not checking stop_reason, not sending tool_result back
const response = await client.messages.create({
/* ... with tools */
});
console.log(response.content[0]); // May be a tool_use block, not text!
Why bad: When Claude wants to call a tool, the first content block may be a tool_use block rather than text, and the response is not yet the final answer. You must execute the tool and send back a tool_result to get the final answer.
See: examples/tool-use.md for complete tool loops, parallel tool calls, automated tool runner
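The tool_use -> tool_result loop can be sketched independently of the network call. In the sketch below, `callModel` stands in for a wrapper around client.messages.create(), and all type and function names (`runToolLoop`, `CallModel`, `ToolHandler`, `MAX_TOOL_ROUNDS`) are hypothetical. The block shapes mirror the tool_use and tool_result blocks described above.

```typescript
type TextBlock = { type: "text"; text: string };
type ToolUseBlock = { type: "tool_use"; id: string; name: string; input: unknown };
type ToolResultBlock = {
  type: "tool_result";
  tool_use_id: string;
  content: string;
  is_error?: boolean;
};
type AssistantBlock = TextBlock | ToolUseBlock;

interface Turn {
  role: "user" | "assistant";
  content: string | Array<AssistantBlock | ToolResultBlock>;
}
interface ModelResponse {
  content: AssistantBlock[];
  stop_reason: string | null;
}

// Assumption: callModel wraps the real API call and returns its response.
type CallModel = (messages: Turn[]) => Promise<ModelResponse>;
type ToolHandler = (input: unknown) => Promise<string>;

const MAX_TOOL_ROUNDS = 5; // hypothetical safety limit

async function runToolLoop(
  callModel: CallModel,
  handlers: Record<string, ToolHandler>,
  messages: Turn[],
): Promise<ModelResponse> {
  const history = [...messages];
  for (let round = 0; round < MAX_TOOL_ROUNDS; round++) {
    const response = await callModel(history);
    if (response.stop_reason !== "tool_use") return response;

    // Echo the full assistant turn, then answer every tool_use block.
    history.push({ role: "assistant", content: response.content });
    const results: ToolResultBlock[] = [];
    for (const block of response.content) {
      if (block.type !== "tool_use") continue;
      const handler = handlers[block.name];
      try {
        if (!handler) throw new Error(`Unknown tool: ${block.name}`);
        const output = await handler(block.input);
        results.push({ type: "tool_result", tool_use_id: block.id, content: output });
      } catch (error) {
        // Report the failure to the model instead of aborting the loop.
        results.push({
          type: "tool_result",
          tool_use_id: block.id,
          content: String(error),
          is_error: true,
        });
      }
    }
    history.push({ role: "user", content: results });
  }
  throw new Error(`Tool loop exceeded ${MAX_TOOL_ROUNDS} rounds`);
}
```

Injecting `callModel` keeps the loop testable with a fake model, and the round limit guards against a model that keeps requesting tools indefinitely.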
Pattern 5: Vision & Documents
Pass images and PDFs as content blocks alongside text.
import { readFileSync } from "node:fs";
const MAX_TOKENS = 1024;
const imageData = readFileSync("photo.jpg").toString("base64");
const message = await client.messages.create({
  model: "claude-sonnet-4-6",
  max_tokens: MAX_TOKENS,
  messages: [
    {
      role: "user",
      content: [
        {
          type: "image",
          source: { type: "base64", media_type: "image/jpeg", data: imageData },
        },
        { type: "text", text: "What's in this image?" },
      ],
    },
  ],
});
Why good: Multi-part content array, explicit media type, text and image combined in one message
See: examples/vision-documents.md for URL images, PDFs, multiple images
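Building the base64 image block by hand is easy to get wrong when the media type is hardcoded. A minimal sketch of a helper that derives it from the file extension is shown below. `toImageBlock` and `MEDIA_TYPES` are hypothetical names, and the extension map is an assumption of this sketch rather than something the SDK provides.

```typescript
import { extname } from "node:path";

type ImageMediaType = "image/jpeg" | "image/png" | "image/gif" | "image/webp";

// Assumed mapping from file extension to media type.
const MEDIA_TYPES: Record<string, ImageMediaType> = {
  ".jpg": "image/jpeg",
  ".jpeg": "image/jpeg",
  ".png": "image/png",
  ".gif": "image/gif",
  ".webp": "image/webp",
};

// Build an image content block in the base64 shape used in this pattern.
function toImageBlock(filePath: string, bytes: Uint8Array) {
  const mediaType = MEDIA_TYPES[extname(filePath).toLowerCase()];
  if (!mediaType) {
    throw new Error(`Unsupported image type: ${filePath}`);
  }
  return {
    type: "image" as const,
    source: {
      type: "base64" as const,
      media_type: mediaType,
      data: Buffer.from(bytes).toString("base64"),
    },
  };
}
```

A caller would pass the result of readFileSync(path) as `bytes` and place the returned block in the content array next to a text block.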
Pattern 6: Extended Thinking
Enable extended thinking for complex reasoning. Responses include thinking content blocks. Use adaptive thinking on Opus 4.6 and Sonnet 4.6 (recommended). Use manual budget_tokens on older models.
const MAX_TOKENS = 16_000;
// Adaptive thinking (recommended for 4.6 models)
const response = await client.messages.create({
  model: "claude-sonnet-4-6",
  max_tokens: MAX_TOKENS,
  thinking: { type: "adaptive" },
  messages: [
    { role: "user", content: "Prove there are infinitely many primes." },
  ],
} as unknown as Anthropic.MessageCreateParamsNonStreaming);
for (const block of response.content) {
  if (block.type === "thinking") {
    console.log("Thinking:", block.thinking);
  } else if (block.type === "text") {
    console.log("Answer:", block.text);
  }
}
Why good: Adaptive thinking lets Claude decide how much to reason, iterates content blocks, handles both thinking and text blocks
// Manual thinking (deprecated on 4.6 models, required on older models)
const THINKING_BUDGET = 10_000;
const response = await client.messages.create({
  model: "claude-sonnet-4-5",
  max_tokens: MAX_TOKENS,
  thinking: { type: "enabled", budget_tokens: THINKING_BUDGET },
  messages: [
    { role: "user", content: "Prove there are infinitely many primes." },
  ],
});
Note: The TypeScript SDK does not yet have "adaptive" in its type definitions. The as unknown as Anthropic.MessageCreateParamsNonStreaming assertion is required until the SDK types are updated.
See: examples/extended-thinking.md for streaming thinking, thinking with tools, display options
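The choice between adaptive and manual thinking can be captured in a small helper. The sketch below follows the rule stated in this pattern (adaptive on Opus 4.6 and Sonnet 4.6, manual budget_tokens elsewhere) and also enforces that the budget stays below max_tokens. `thinkingFor` and `ADAPTIVE_MODEL_PATTERN` are hypothetical, and matching on the model ID prefix is an assumption of this sketch.

```typescript
type ThinkingConfig =
  | { type: "adaptive" }
  | { type: "enabled"; budget_tokens: number };

// Assumption: the 4.6 models can be recognized by their ID prefix.
const ADAPTIVE_MODEL_PATTERN = /^claude-(opus|sonnet)-4-6/;

function thinkingFor(
  model: string,
  maxTokens: number,
  budgetTokens: number,
): ThinkingConfig {
  if (ADAPTIVE_MODEL_PATTERN.test(model)) {
    return { type: "adaptive" };
  }
  // Manual mode: the thinking budget must leave room for the answer.
  if (budgetTokens >= maxTokens) {
    throw new RangeError(
      `budget_tokens (${budgetTokens}) must be less than max_tokens (${maxTokens})`,
    );
  }
  return { type: "enabled", budget_tokens: budgetTokens };
}
```

The returned object is what would be passed as the thinking parameter; for the adaptive case the type assertion described in the note above may still be needed.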
Pattern 7: Structured Outputs
Use zodOutputFormat() and messages.parse() for type-safe structured responses.
import { zodOutputFormat } from "@anthropic-ai/sdk/helpers/zod";
import { z } from "zod";
const ContactInfo = z.object({
  name: z.string(),
  email: z.string(),
  topics: z.array(z.string()),
});
const MAX_TOKENS = 1024;
const response = await client.messages.parse({
  model: "claude-sonnet-4-6",
  max_tokens: MAX_TOKENS,
  messages: [
    {
      role: "user",
      content:
        "Extract info: John (john@example.com) asked about billing and API limits.",
    },
  ],
  output_config: { format: zodOutputFormat(ContactInfo) },
});
const parsed = response.parsed_output; // Fully typed: { name, email, topics }
Why good: Auto-converts Zod schema, validates output, fully typed result
See: examples/core.md for raw JSON schema, combined with tool use
Pattern 8: Prompt Caching
Cache large system prompts and conversation prefixes for cost savings.
const MAX_TOKENS = 1024;
const response = await client.messages.create({
  model: "claude-sonnet-4-6",
  max_tokens: MAX_TOKENS,
  system: [
    {
      type: "text",
      text: "You are a legal document analyst.",
    },
    {
      type: "text",
      text: largeDocumentText, // 50+ pages of legal text
      cache_control: { type: "ephemeral" },
    },
  ],
  messages: [{ role: "user", content: "What are the key terms?" }],
});
// Check cache performance
console.log("Cache read tokens:", response.usage.cache_read_input_tokens);
console.log("Cache write tokens:", response.usage.cache_creation_input_tokens);
Why good: Cache breakpoint on the large static content, cache metrics tracked
See: reference.md for cache pricing, TTL options, automatic caching
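The two cache counters are easier to act on as a single ratio. The sketch below computes the share of input tokens served from cache. `UsageLike` and `cacheHitRatio` are hypothetical names, and the calculation assumes that input_tokens counts only the tokens that were neither read from nor written to the cache.

```typescript
// Local stand-in for the usage object on a response (assumption).
interface UsageLike {
  input_tokens: number;
  cache_read_input_tokens?: number | null;
  cache_creation_input_tokens?: number | null;
}

// Fraction of all input tokens that were read from the cache (0 to 1).
function cacheHitRatio(usage: UsageLike): number {
  const read = usage.cache_read_input_tokens ?? 0;
  const written = usage.cache_creation_input_tokens ?? 0;
  // Assumption: total input is the sum of the three counters.
  const total = usage.input_tokens + read + written;
  return total === 0 ? 0 : read / total;
}
```

A ratio near 0 on repeated requests with the same prefix suggests the cache breakpoint is misplaced or the prefix is below the minimum cacheable size.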
Pattern 9: Error Handling
Always catch Anthropic.APIError and its subclasses. Re-throw unexpected errors.
const MAX_TOKENS = 1024;
try {
  const message = await client.messages.create({
    model: "claude-sonnet-4-6",
    max_tokens: MAX_TOKENS,
    messages: [{ role: "user", content: "Hello" }],
  });
} catch (error) {
  if (error instanceof Anthropic.APIError) {
    console.error(`API Error [${error.status}]: ${error.message}`);
    if (error instanceof Anthropic.RateLimitError) {
      // The SDK has already used up its automatic retries by the time this runs
      console.error("Rate limited -- SDK retries exhausted");
    }
    if (error instanceof Anthropic.AuthenticationError) {
      throw new Error("Invalid API key. Check ANTHROPIC_API_KEY.");
    }
  } else {
    throw error; // Re-throw non-API errors
  }
}
Why good: Specific error types, status code access, re-throws unexpected errors
See: examples/core.md for full error hierarchy, stream error handling
</patterns>
<performance>
Performance Optimization
Model Selection for Cost/Speed
Most capable, complex reasoning -> claude-opus-4-6 (1M context, 128K output)
General purpose, best value -> claude-sonnet-4-6 (1M context, 64K output)
Fast + cheap, simple tasks -> claude-haiku-4-5 (200K context, 64K output)
Extended thinking -> claude-sonnet-4-6 or claude-opus-4-6 (use adaptive thinking)
Vision / multimodal -> claude-sonnet-4-6 or claude-opus-4-6
Batch processing -> Any model at 50% batch discount
Key Optimization Patterns
- Track token usage via message.usage for cost visibility (input_tokens, output_tokens)
- Check stop_reason === "max_tokens" to detect truncated output
- Use prompt caching for large system prompts -- cache reads cost 0.1x base input price
- Use messages.countTokens() before sending to estimate costs
- Use Batch API for high-volume async jobs at 50% cost reduction
- Use AbortController to cancel long-running requests
- Set temperature: 0 for more repeatable output when caching matters (results are still not guaranteed to be fully deterministic)
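The multipliers quoted above (cache reads at 0.1x the base input price, batch at 50%) can be folded into a rough cost estimate. The sketch below is illustrative only: `estimateCostUsd`, `Pricing`, and the per-million-token prices passed in are hypothetical, cache writes are ignored, and the way the discounts are combined here is an assumption.

```typescript
interface Usage {
  input_tokens: number;
  output_tokens: number;
  cache_read_input_tokens?: number | null;
}

// Prices in USD per million tokens; supplied by the caller (hypothetical).
interface Pricing {
  inputPerMTok: number;
  outputPerMTok: number;
}

const CACHE_READ_MULTIPLIER = 0.1; // from the list above
const BATCH_MULTIPLIER = 0.5; // from the list above
const TOKENS_PER_MTOK = 1_000_000;

function estimateCostUsd(
  usage: Usage,
  pricing: Pricing,
  options: { batch?: boolean } = {},
): number {
  const cacheRead = usage.cache_read_input_tokens ?? 0;
  // Cached reads are billed at a fraction of the base input price.
  const inputCost =
    (usage.input_tokens + cacheRead * CACHE_READ_MULTIPLIER) *
    pricing.inputPerMTok;
  const outputCost = usage.output_tokens * pricing.outputPerMTok;
  const total = (inputCost + outputCost) / TOKENS_PER_MTOK;
  return options.batch ? total * BATCH_MULTIPLIER : total;
}
```

Feeding message.usage from each response into a function like this gives per-request cost visibility; actual billing should be checked against the published price list.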
</performance>
<decision_framework>
Decision Framework
Which Model to Choose
What is your task?
+-- Complex reasoning / analysis -> claude-opus-4-6
+-- General purpose (best balance) -> claude-sonnet-4-6
+-- Fast + cheap, high throughput -> claude-haiku-4-5
+-- Extended thinking needed -> claude-sonnet-4-6 (or opus-4-6 with adaptive thinking)
+-- Vision / image analysis -> claude-sonnet-4-6 or claude-opus-4-6
+-- Batch processing -> Any model (50% discount)
Streaming vs Non-Streaming
Is the response user-facing?
+-- YES -> Use streaming (client.messages.stream())
|   +-- Need event-level control? -> .on("text", ...) + .on("contentBlock", ...)
|   +-- Just want final message? -> stream.finalMessage() (avoids HTTP timeouts on large responses)
+-- NO -> Use non-streaming (client.messages.create())
    +-- Background processing -> messages.create()
    +-- Structured output -> messages.parse()
    +-- High volume -> Batch API
When to Use Extended Thinking
Does the task require multi-step reasoning?
+-- YES -> Which model?
|   +-- Opus 4.6 or Sonnet 4.6? -> Use adaptive: thinking: { type: "adaptive" }
|   |   +-- Control depth? -> Add output_config: { effort: "high" | "medium" | "low" }
|   |   +-- Opus only max depth? -> effort: "max"
|   +-- Older models? -> Manual: thinking: { type: "enabled", budget_tokens: N }
+-- NO -> Standard messages.create() is sufficient (omit thinking param or type: "disabled")
</decision_framework>
<red_flags>
RED FLAGS
High Priority Issues:
- Not providing max_tokens (request will be rejected -- it has no default)
- Hardcoding API keys instead of using environment variables (security breach risk)
- Treating response.content as a string instead of iterating content blocks (crashes on tool_use or thinking blocks)
- Not checking stop_reason for "tool_use" (breaks function calling flows -- Claude is waiting for tool results)
- Using bare catch blocks without checking Anthropic.APIError (hides API-specific error information)
Medium Priority Issues:
- Not setting maxRetries / timeout for production deployments (default timeout is 10 minutes, which may be too long)
- Ignoring stop_reason === "max_tokens" (response was truncated but you are using it as complete)
- Ignoring usage data (no cost visibility or budget tracking)
- Not sending thinking blocks back in multi-turn conversations when using extended thinking (Claude loses reasoning context)
- Changing thinking parameters between turns in a tool use loop (invalidates message cache, causes errors)
Common Mistakes:
- Using system as a message role instead of the top-level system parameter (there is no system role in messages -- use the system parameter)
- Assuming response.content has exactly one block (it can have multiple text, tool_use, and thinking blocks)
- Not passing tool_result back after a tool_use response (Claude cannot continue without it)
- Using max_completion_tokens instead of max_tokens (the Anthropic API uses max_tokens, not max_completion_tokens)
- Using response_format instead of output_config for structured outputs (wrong parameter name)
- Forgetting that budget_tokens must be less than max_tokens (except with interleaved thinking)
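Several of the mistakes above are mechanical enough to check before a request is sent. The sketch below is a hypothetical pre-flight check, not part of the SDK: `findRequestMistakes` is an invented name, and it only covers the parameter-name and role mistakes listed here.

```typescript
// Return a list of human-readable problems found in a request object.
function findRequestMistakes(params: Record<string, unknown>): string[] {
  const problems: string[] = [];

  if (!("max_tokens" in params)) {
    problems.push("max_tokens is required");
  }
  if ("max_completion_tokens" in params) {
    problems.push("use max_tokens, not max_completion_tokens");
  }
  if ("response_format" in params) {
    problems.push("use output_config, not response_format");
  }

  // A "system" entry in messages should be the top-level system parameter.
  const messages: unknown[] = Array.isArray(params["messages"])
    ? params["messages"]
    : [];
  const hasSystemRole = messages.some(
    (message) =>
      typeof message === "object" &&
      message !== null &&
      (message as { role?: unknown }).role === "system",
  );
  if (hasSystemRole) {
    problems.push("move the system message to the top-level system parameter");
  }

  return problems;
}
```

With correctly typed SDK calls, TypeScript already catches most of these at compile time; a runtime check like this is mainly useful when request objects are assembled dynamically or ported from another provider's API.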
Gotchas & Edge Cases:
- The SDK auto-retries on 429 (rate limit), 529 (overloaded), 408 (timeout), 409 (conflict), and 5xx errors -- 2 retries by default with exponential backoff. Disable with maxRetries: 0.
- client.messages.stream() returns a MessageStream with event helpers. client.messages.create({ stream: true }) returns a raw async iterable of SSE events. They are different APIs.
- When using extended thinking with tool use, you must include the thinking blocks unmodified when sending conversation history back. Omitting or modifying them causes errors.
- tool_choice: { type: "any" } forces Claude to call a tool but cannot be used with extended thinking. Only "auto" and "none" work with thinking enabled.
- Prompt caching requires a minimum of 1024-4096 tokens (model-dependent) to be cacheable. Small prompts will not be cached.
- Cache breakpoints on messages are invalidated when thinking parameters change between requests. System prompt cache is preserved.
- budget_tokens is deprecated on both Claude Opus 4.6 and Sonnet 4.6 -- use thinking: { type: "adaptive" } instead. budget_tokens still works but will be removed in a future release.
- The display field on thinking config controls whether thinking text is returned: "summarized" (default) or "omitted" (only signature, faster streaming).
- Adaptive thinking automatically enables interleaved thinking (thinking between tool calls). Manual mode on Sonnet 4.6 requires the interleaved-thinking-2025-05-14 beta header for interleaved thinking.
- The effort parameter (output_config: { effort: "high" | "medium" | "low" | "max" }) works with adaptive thinking to control thinking depth. "max" is Opus 4.6 only.
- The TypeScript SDK does not yet include "adaptive" in its type definitions -- use a type assertion when passing thinking: { type: "adaptive" }.
- Multi-turn conversations require you to include the full assistant response (all content blocks) in the conversation history, not just the text.
- Batch API requests have a 24-hour completion window. Use messages.batches.results() to retrieve completed results.
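The requirement to send back the full assistant response, thinking blocks included, can be sketched as a small history helper. The types below are local stand-ins for the SDK's message types, and `appendAssistantTurn` is a hypothetical name.

```typescript
type HistoryBlock =
  | { type: "text"; text: string }
  | { type: "thinking"; thinking: string; signature: string }
  | { type: "tool_use"; id: string; name: string; input: unknown };

interface HistoryTurn {
  role: "user" | "assistant";
  content: string | HistoryBlock[];
}

// Append the assistant's reply with every block kept, unmodified and in order.
function appendAssistantTurn(
  history: HistoryTurn[],
  responseContent: HistoryBlock[],
): HistoryTurn[] {
  // The blocks are not filtered or rewritten: dropping thinking or tool_use
  // blocks here is the mistake described in the list above.
  return [...history, { role: "assistant", content: [...responseContent] }];
}
```

Returning a new array instead of mutating `history` makes it easier to retry a turn from a known state.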
</red_flags>
<critical_reminders>
CRITICAL REMINDERS
All code must follow project conventions in CLAUDE.md (kebab-case, named exports, import ordering, import type, named constants)
(You MUST always provide max_tokens in every messages.create() / messages.stream() call -- it is required and has no default)
(You MUST handle the stop_reason field to detect end_turn, max_tokens, tool_use, and stop_sequence -- ignoring it causes silent truncation or broken tool loops)
(You MUST iterate over response.content blocks (not assume a single text block) -- responses can contain text, tool_use, and thinking blocks mixed together)
(You MUST handle errors using Anthropic.APIError and its subclasses -- never use bare catch blocks without error type checking)
(You MUST never hardcode API keys -- always use environment variables via process.env.ANTHROPIC_API_KEY)
Failure to follow these rules will produce broken tool loops, silent truncation, security vulnerabilities, or untyped AI integrations.
</critical_reminders>