
phoenix-evals

Build and run evaluators for AI/LLM applications using Phoenix.

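⚡ Recommended: install with a single command (60 seconds)

Copy the command below and paste it into a terminal (Mac/Linux) or PowerShell (Windows). It downloads the archive, unzips it, and puts the files in place automatically.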

🍎 Mac / 🐧 Linux

mkdir -p ~/.claude/skills && cd ~/.claude/skills && curl -L -o phoenix-evals.zip https://jpskill.com/download/23149.zip && unzip -o phoenix-evals.zip && rm phoenix-evals.zip

🪟 Windows (PowerShell)

$d = "$env:USERPROFILE\.claude\skills"; ni -Force -ItemType Directory $d | Out-Null; iwr https://jpskill.com/download/23149.zip -OutFile "$d\phoenix-evals.zip"; Expand-Archive "$d\phoenix-evals.zip" -DestinationPath $d -Force; ri "$d\phoenix-evals.zip"

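When it finishes, restart Claude Code. The skill then activates automatically whenever you make a related request in plain language (for example, "build an evaluator for my LLM app").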

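💾 Manual download (for anyone who finds the commands difficult)

1. Download phoenix-evals.zip from https://jpskill.com/download/23149.zip
2. Double-click the ZIP file to extract it; this creates a phoenix-evals folder
3. Move that folder to C:\Users\<your name>\.claude\skills\ (Windows) or ~/.claude/skills/ (Mac)
4. Restart Claude Code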


⚠️ Download and use at your own risk. This site accepts no responsibility for the content, behavior, or safety of the skill.

🎯 What this Skill can do

The description below explains what this Skill does for you. It activates automatically when you ask Claude for something in this area.

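📦 How to install (3 steps)

1. Download the skill in .skill format
2. Change the file extension from .skill to .zip and extract it (macOS can extract it automatically)
3. Place the extracted folder in .claude/skills/ under your home folder
- macOS / Linux: ~/.claude/skills/
- Windows: %USERPROFILE%\.claude\skills\

Restart Claude Code and you are done. You do not need to say "use this Skill..."; it is invoked automatically for related requests.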
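To confirm that the folder ended up in the right place, a small check like the following may help. It is only a sketch: it assumes the archive unpacks into a `phoenix-evals` folder with `SKILL.md` at its top level, as the steps above and the bundled file list suggest, and the helper name `skill_installed` is hypothetical.

```python
from pathlib import Path


def skill_installed(skills_root: Path, name: str = "phoenix-evals") -> bool:
    """Return True if <skills_root>/<name>/SKILL.md exists as a file.

    Assumption: the skill folder is named "phoenix-evals" and contains
    SKILL.md directly inside it.
    """
    return (skills_root / name / "SKILL.md").is_file()


if __name__ == "__main__":
    # Path.home() resolves to ~ on macOS/Linux and %USERPROFILE% on Windows.
    root = Path.home() / ".claude" / "skills"
    status = "found" if skill_installed(root) else "not found"
    print(f"phoenix-evals: {status} under {root}")
```

If the script reports "not found", the most likely cause is an extra level of nesting created during extraction (for example `phoenix-evals/phoenix-evals/SKILL.md`).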


Last updated: 2026-05-18
Retrieved: 2026-05-18
Bundled files: 35

📖 Original SKILL.md as read by Claude

The text below is the original source that the AI (Claude) reads, in English or Chinese. Japanese translations are being added over time.

Phoenix Evals

Build evaluators for AI/LLM applications. Code first, LLM for nuance, validate against humans.
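As an illustration of the "code first" idea, here is a minimal sketch of a deterministic evaluator written in plain Python. It does not use the Phoenix API; the function name, the required keys, and the shape of the returned dict (`label`, `score`, `explanation`) are all assumptions made for the example.

```python
import json


def has_required_keys(output: str, required: tuple[str, ...] = ("answer", "sources")) -> dict:
    """Deterministic pass/fail check on a model output.

    Passes only if the output parses as a JSON object containing every
    key in `required`. The key names are hypothetical.
    """
    try:
        data = json.loads(output)
    except json.JSONDecodeError:
        return {"label": "fail", "score": 0, "explanation": "output is not valid JSON"}
    if not isinstance(data, dict):
        return {"label": "fail", "score": 0, "explanation": "output is not a JSON object"}
    missing = [key for key in required if key not in data]
    if missing:
        return {"label": "fail", "score": 0, "explanation": f"missing keys: {missing}"}
    return {"label": "pass", "score": 1, "explanation": "all required keys present"}


if __name__ == "__main__":
    print(has_required_keys('{"answer": "42", "sources": []}')["label"])  # pass
    print(has_required_keys("plain text reply")["label"])                 # fail
```

A check like this is cheap and repeatable, which is why it makes sense to exhaust such rules before reaching for an LLM judge on the more nuanced criteria.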

Quick Reference

Task	Files
Setup	setup-python, setup-typescript
Decide what to evaluate	evaluators-overview
Choose a judge model	fundamentals-model-selection
Use pre-built evaluators	evaluators-pre-built
Build code evaluator	evaluators-code-python, evaluators-code-typescript
Build LLM evaluator	evaluators-llm-python, evaluators-llm-typescript, evaluators-custom-templates
Batch evaluate DataFrame	evaluate-dataframe-python
Run experiment	experiments-running-python, experiments-running-typescript
Create dataset	experiments-datasets-python, experiments-datasets-typescript
Generate synthetic data	experiments-synthetic-python, experiments-synthetic-typescript
Validate evaluator accuracy	validation, validation-evaluators-python, validation-evaluators-typescript
Sample traces for review	observe-sampling-python, observe-sampling-typescript
Analyze errors	error-analysis, error-analysis-multi-turn, axial-coding
RAG evals	evaluators-rag
Avoid common mistakes	common-mistakes-python, fundamentals-anti-patterns
Production	production-overview, production-guardrails, production-continuous

Workflows

Starting Fresh: observe-tracing-setup → error-analysis → axial-coding → evaluators-overview

Building Evaluator: fundamentals → common-mistakes-python → evaluators-{code|llm}-{python|typescript} → validation-evaluators-{python|typescript}

RAG Systems: evaluators-rag → evaluators-code-{python|typescript} (retrieval) → evaluators-llm-{python|typescript} (faithfulness)

Production: production-overview → production-guardrails → production-continuous
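The RAG workflow above pairs a code evaluator with the retrieval step. One possible deterministic retrieval check is recall at k, sketched below in plain Python. The choice of metric, the function name, and the document IDs are assumptions for illustration; the bundled references may recommend a different measure.

```python
def recall_at_k(retrieved_ids: list[str], relevant_ids: set[str], k: int) -> float:
    """Fraction of the relevant documents that appear in the top-k retrieved results."""
    if k <= 0:
        raise ValueError("k must be positive")
    if not relevant_ids:
        raise ValueError("relevant_ids must not be empty")
    top_k = set(retrieved_ids[:k])
    return len(top_k & relevant_ids) / len(relevant_ids)


if __name__ == "__main__":
    retrieved = ["d3", "d1", "d7", "d2"]  # ranked retriever output (hypothetical IDs)
    relevant = {"d1", "d2"}               # human-labelled relevant documents
    print(recall_at_k(retrieved, relevant, k=3))  # 0.5: only d1 is in the top 3
    print(recall_at_k(retrieved, relevant, k=4))  # 1.0: both d1 and d2 are in the top 4
```

Faithfulness, the second half of that workflow, depends on whether the answer is supported by the retrieved text, which is harder to capture with a rule and is where an LLM evaluator fits.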

Reference Categories

Prefix	Description
`fundamentals-*`	Types, scores, anti-patterns
`observe-*`	Tracing, sampling
`error-analysis-*`	Finding failures
`axial-coding-*`	Categorizing failures
`evaluators-*`	Code, LLM, RAG evaluators
`experiments-*`	Datasets, running experiments
`validation-*`	Validating evaluator accuracy against human labels
`production-*`	CI/CD, monitoring

Key Principles

Principle	Action
Error analysis first	Can't automate what you haven't observed
Custom > generic	Build from your failures
Code first	Deterministic before LLM
Validate judges	>80% TPR/TNR
Binary > Likert	Pass/fail, not 1-5
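The "Validate judges" row can be made concrete with a short calculation. The sketch below compares binary judge verdicts with human labels and reports the true positive rate (TPR) and true negative rate (TNR). Treating "pass" as the positive class is an assumption, as are the function name and the sample labels.

```python
def tpr_tnr(human: list[bool], judge: list[bool]) -> tuple[float, float]:
    """Return (TPR, TNR) of judge verdicts measured against human labels.

    True means "pass" (assumed to be the positive class).
    """
    if len(human) != len(judge):
        raise ValueError("human and judge must have the same length")
    positives = sum(human)
    negatives = len(human) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one human pass and one human fail")
    true_pos = sum(h and j for h, j in zip(human, judge))
    true_neg = sum((not h) and (not j) for h, j in zip(human, judge))
    return true_pos / positives, true_neg / negatives


if __name__ == "__main__":
    human = [True] * 5 + [False] * 5
    judge = [True, True, True, True, True, False, False, False, False, True]
    tpr, tnr = tpr_tnr(human, judge)
    print(f"TPR={tpr:.2f} TNR={tnr:.2f}")  # TPR=1.00 TNR=0.80
    # The table asks for more than 80% on both, so TNR=0.80 does not clear the bar.
    print("judge OK" if tpr > 0.8 and tnr > 0.8 else "judge needs work")
```

Reporting the two rates separately matters because a judge that passes almost everything can score a high TPR while missing most real failures.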

Bundled files

Note: this is the list of files included in the ZIP. In addition to `SKILL.md` itself, it may contain reference material, samples, and scripts.

📄 SKILL.md (4,523 bytes)
📎 references/axial-coding.md (2,373 bytes)
📎 references/common-mistakes-python.md (7,156 bytes)
📎 references/error-analysis-multi-turn.md (1,371 bytes)
📎 references/error-analysis.md (4,316 bytes)
📎 references/evaluate-dataframe-python.md (4,464 bytes)
📎 references/evaluators-code-python.md (3,045 bytes)
📎 references/evaluators-code-typescript.md (1,207 bytes)
📎 references/evaluators-custom-templates.md (1,179 bytes)
📎 references/evaluators-llm-python.md (2,615 bytes)
📎 references/evaluators-llm-typescript.md (1,420 bytes)
📎 references/evaluators-overview.md (1,132 bytes)
📎 references/evaluators-pre-built.md (2,455 bytes)
📎 references/evaluators-rag.md (3,222 bytes)
📎 references/experiments-datasets-python.md (4,915 bytes)
📎 references/experiments-datasets-typescript.md (2,749 bytes)
📎 references/experiments-overview.md (1,748 bytes)
📎 references/experiments-running-python.md (2,990 bytes)
📎 references/experiments-running-typescript.md (3,205 bytes)
📎 references/experiments-synthetic-python.md (1,816 bytes)
📎 references/experiments-synthetic-typescript.md (2,266 bytes)
📎 references/fundamentals-anti-patterns.md (1,632 bytes)
📎 references/fundamentals-model-selection.md (1,458 bytes)
📎 references/fundamentals.md (1,835 bytes)
📎 references/observe-sampling-python.md (2,415 bytes)
📎 references/observe-sampling-typescript.md (3,954 bytes)
📎 references/observe-tracing-setup.md (3,396 bytes)
📎 references/production-continuous.md (3,642 bytes)
📎 references/production-guardrails.md (1,363 bytes)
📎 references/production-overview.md (2,352 bytes)
📎 references/setup-python.md (1,971 bytes)
📎 references/setup-typescript.md (981 bytes)
📎 references/validation-evaluators-python.md (1,067 bytes)
📎 references/validation-evaluators-typescript.md (4,551 bytes)
📎 references/validation.md (1,764 bytes)