phoenix-pxi-playwright
Write, extend, and debug PXI Playwright E2E tests for Phoenix. Use when adding PXI agent frontend specs, authoring LLM-as-judge rubrics, asserting PXI tool use, persisting PXI test runs as Phoenix experiments, or debugging PXI E2E failures.
Copy the matching command below into a terminal (macOS/Linux, first command) or PowerShell (Windows, second command). It downloads, extracts, and installs the skill automatically.
mkdir -p ~/.claude/skills && cd ~/.claude/skills && curl -L -o phoenix-pxi-playwright.zip https://jpskill.com/download/23156.zip && unzip -o phoenix-pxi-playwright.zip && rm phoenix-pxi-playwright.zip
$d = "$env:USERPROFILE\.claude\skills"; ni -Force -ItemType Directory $d | Out-Null; iwr https://jpskill.com/download/23156.zip -OutFile "$d\phoenix-pxi-playwright.zip"; Expand-Archive "$d\phoenix-pxi-playwright.zip" -DestinationPath $d -Force; ri "$d\phoenix-pxi-playwright.zip"
After it finishes, restart Claude Code. The skill then activates automatically on related requests; you do not need to invoke it by name.
Last updated: 2026-05-18
Retrieved: 2026-05-18
Bundled files: 1
SKILL.md (original text read by Claude)
Phoenix PXI Playwright Tests
Use this skill when authoring or maintaining Playwright specs for PXI, Phoenix's built-in AI assistant. The concrete harness lives in app/tests/pxi/; this skill is the authoring guide for using and extending that harness.
Start Here
- Read the existing example spec first: app/tests/pxi/docs-smoke.spec.ts.
- Reuse the shared fixture and driver from app/tests/pxi/fixtures.ts.
- Reuse shared constants from app/tests/pxi/constants.ts and shared types from app/tests/pxi/types.ts.
- Put pure parsing/API helpers in app/tests/pxi/utils.ts rather than in specs or fixture classes.
- Reuse the generic AI SDK judge from app/tests/pxi/judge.ts.
- Reuse experiment persistence from app/tests/pxi/experimentPersistence.ts.
- Add one entry to PXI_EXPERIMENT_EXAMPLES in app/tests/pxi/experimentPersistence.ts for every new PXI spec scenario. All specs share the same dataset, and the upload (action: "update") treats that registry as the complete desired set of examples.
- Do not create a bespoke PXI driver, duplicate experiment client, or duplicate PXI tool schemas in a spec.
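The registry rule can be pictured with a minimal sketch. The field names below (id, prompt, expectedOutput, experimentNamePrefix, experimentDescription), the EXAMPLES constant, and the findDuplicateIds helper are hypothetical stand-ins for whatever experimentPersistence.ts actually declares; the sketch only shows one keyed entry per scenario with a stable, unique id.

```typescript
// Hypothetical entry shape; the real PXI_EXPERIMENT_EXAMPLES type lives in
// app/tests/pxi/experimentPersistence.ts and may use different field names.
interface PxiExperimentExample {
  id: string; // stable across runs so the "update" upload can match it
  prompt: string;
  expectedOutput: string;
  experimentNamePrefix: string;
  experimentDescription: string;
}

// Hypothetical registry with one entry per spec scenario. Specs import an
// entry by key (e.g. EXAMPLES.someScenario) instead of copying the prompt.
const EXAMPLES = {
  someScenario: {
    id: "pxi-some-scenario",
    prompt: "Placeholder user prompt for this scenario.",
    expectedOutput: "Placeholder expected answer.",
    experimentNamePrefix: "pxi-some-scenario",
    experimentDescription: "Placeholder description of what the scenario checks.",
  },
} satisfies Record<string, PxiExperimentExample>;

// Returns ids used by more than one entry. Two scenarios sharing an id would
// likely collide in the shared dataset, so a check like this is cheap insurance.
function findDuplicateIds(
  registry: Record<string, PxiExperimentExample>,
): string[] {
  const seen = new Set<string>();
  const duplicates = new Set<string>();
  for (const example of Object.values(registry)) {
    if (seen.has(example.id)) duplicates.add(example.id);
    seen.add(example.id);
  }
  return Array.from(duplicates);
}
```

Keying the registry by scenario name keeps spec imports readable, while the id field is what has to stay stable across runs.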
Current Harness
The current harness provides these abstractions:
- test and expect from ./fixtures: PXI-aware Playwright fixture exports.
- constants.ts: default assistant and judge model/project constants.
- types.ts: shared PXI harness types such as PxiTurn.
- utils.ts: pure utilities for API response validation and span/tool parsing.
- pxi.open(): opens PXI for the test session.
- pxi.acknowledgeConsent(): accepts PXI consent for the test session.
- pxi.askAndWait(prompt): sends a user prompt and waits for the assistant turn. It does not require a backend TOOL span; add explicit tool assertions in the spec.
- pxi.expectNoAgentError(): asserts the visible PXI session did not surface an agent error.
- pxi.expectBackendToolSpanCalled(turn): asserts the PXI turn produced at least one persisted backend TOOL span and merges those backend tool names into turn.calledTools. Use this for server/MCP-backed tools such as docs tools, not for purely client-executed external tools.
- pxi.expectDocsToolCalled(turn): asserts the PXI turn used runtime docs tooling via Phoenix-observed tool spans.
- pxi.getMetadata(): collects PXI metadata for persistence.
- judge({ model, system, prompt, assistantText, rubric }): evaluates an assistant answer with AI SDK generateText and structured Output.object.
- evaluatePxiOutcome({ assertions, judgeInput }): runs deterministic assertions and LLM judging while preserving failed post-turn outcomes for experiment persistence.
- assertPxiOutcome(outcome): fails the Playwright test after persistence, preferring the original deterministic assertion failure when one exists.
- persistPxiExperiment({ request, record }): stores the PXI interaction, judge result, and metadata as a Phoenix experiment.
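The merge behavior of pxi.expectBackendToolSpanCalled(turn) matters because the spec later persists turn.calledTools. The sketch below models that contract in isolation. The PxiTurn subset, the SpanSummary shape ({ spanKind, name }), and mergeBackendToolSpans are assumptions for illustration, not the real types.ts definitions; the real driver reads persisted spans from Phoenix.

```typescript
// Hypothetical subset of PxiTurn (app/tests/pxi/types.ts); field names are
// taken from the spec pattern, and the real type may carry more.
interface PxiTurn {
  assistantText: string;
  calledTools: string[];
  durationMs: number;
}

// Hypothetical, heavily reduced span record; real Phoenix spans carry far more.
interface SpanSummary {
  spanKind: string;
  name: string;
}

// Sketch of the documented contract of pxi.expectBackendToolSpanCalled(turn):
// fail when the turn has no TOOL span, otherwise merge the backend tool names
// into turn.calledTools in place, skipping names already recorded.
function mergeBackendToolSpans(turn: PxiTurn, spans: SpanSummary[]): void {
  const toolNames = spans
    .filter((span) => span.spanKind === "TOOL")
    .map((span) => span.name);
  if (toolNames.length === 0) {
    throw new Error("Expected at least one backend TOOL span for this PXI turn");
  }
  for (const name of toolNames) {
    if (!turn.calledTools.includes(name)) turn.calledTools.push(name);
  }
}
```

Because the sketch mutates turn in place, any later read of turn.calledTools, such as the record passed to persistPxiExperiment, sees the merged backend names.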
Authoring Workflow
- Add the scenario prompt and expected output to PXI_EXPERIMENT_EXAMPLES so the shared PXI E2E dataset gets one example per test scenario.
- Define the scenario prompt constant, PXI user instructions, and judge rubric at the top of the spec file so the test reads top-to-bottom. Take the prompt from the PXI_EXPERIMENT_EXAMPLES entry rather than duplicating prompt strings.
- Drive PXI through the real UI with pxi.open, pxi.acknowledgeConsent, and pxi.askAndWait.
- Add deterministic assertions before judge assertions, such as expected text, no agent error, or expected tool use.
- Put all post-turn deterministic assertions inside evaluatePxiOutcome, including pxi.expectBackendToolSpanCalled(turn), so failures after PXI returns an answer still get persisted.
- After pxi.askAndWait returns a turn, persist both passing and failing outcomes. Do not let deterministic assertion failures skip experiment persistence.
- Use evaluatePxiOutcome instead of writing per-spec try/catch blocks. It runs judge even when deterministic Playwright assertions fail, then combines the judge explanation with a sanitized, truncated Playwright assertion message in the persisted failed evaluation.
- Run the targeted spec with isolated ports before reporting success.
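The persist-then-fail flow can be sketched without Playwright. evaluateOutcome, assertOutcome, the JudgeResult shape, and the 500-character limit are hypothetical stand-ins for evaluatePxiOutcome and assertPxiOutcome in outcome.ts; the point is the ordering: capture the assertion error, still run the judge, persist, and only then fail, rethrowing the original assertion error first.

```typescript
// Hypothetical result shape; the real judge result comes from judge.ts.
interface JudgeResult {
  passed: boolean;
  explanation: string;
}

interface Outcome {
  judgeResult: JudgeResult;
  assertionError?: Error;
}

// Hypothetical cap; the harness chooses its own truncation length.
const MAX_ASSERTION_MESSAGE_LENGTH = 500;

// Removes ANSI CSI sequences (colors, cursor moves) and truncates the rest so
// the message fits in a Phoenix experiment table.
function sanitizeAssertionMessage(message: string): string {
  const plain = message.replace(/\u001b\[[0-9;]*[A-Za-z]/g, "");
  return plain.length > MAX_ASSERTION_MESSAGE_LENGTH
    ? `${plain.slice(0, MAX_ASSERTION_MESSAGE_LENGTH)}...`
    : plain;
}

// Runs deterministic assertions, always runs the judge, and folds a failed
// assertion into a failed evaluation instead of throwing immediately.
async function evaluateOutcome(
  assertions: () => Promise<void>,
  runJudge: () => Promise<JudgeResult>,
): Promise<Outcome> {
  let assertionError: Error | undefined;
  try {
    await assertions();
  } catch (error) {
    assertionError = error instanceof Error ? error : new Error(String(error));
  }
  const judged = await runJudge();
  if (assertionError === undefined) {
    return { judgeResult: judged };
  }
  return {
    assertionError,
    judgeResult: {
      passed: false,
      explanation: `${judged.explanation}\n\nPlaywright assertion failed: ${sanitizeAssertionMessage(assertionError.message)}`,
    },
  };
}

// Called only after persistence. Prefers the original assertion error so the
// Playwright report shows the deterministic failure rather than the judge's.
function assertOutcome(outcome: Outcome): void {
  if (outcome.assertionError !== undefined) {
    throw outcome.assertionError;
  }
  if (!outcome.judgeResult.passed) {
    throw new Error(`Judge failed: ${outcome.judgeResult.explanation}`);
  }
}
```

A spec would persist outcome.judgeResult between the two calls, which is why no per-spec try/catch is needed.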
Spec Pattern
import {
  persistPxiExperiment,
  PXI_EXPERIMENT_EXAMPLES,
} from "./experimentPersistence";
import { expect, test } from "./fixtures";
import { getRequiredJudgeApiKeyEnv } from "./judge";
import { assertPxiOutcome, evaluatePxiOutcome } from "./outcome";

// The scenario comes from the shared registry; the spec never copies the prompt.
const EXPERIMENT_EXAMPLE = PXI_EXPERIMENT_EXAMPLES.someScenario;
const USER_PROMPT = EXPERIMENT_EXAMPLE.prompt;
const JUDGE_RUBRIC = [
  "The answer satisfies the user request.",
  "The answer is grounded in the expected Phoenix context.",
  "The answer does not invent unsupported facts.",
];
const JUDGE_API_KEY_ENV = getRequiredJudgeApiKeyEnv();

test.describe("PXI scenario", () => {
  test("handles the scenario", async ({
    browserName,
    page,
    pxi,
    request,
  }, testInfo) => {
    test.skip(
      browserName !== "chromium",
      "PXI real-LLM smoke runs once in chromium."
    );
    test.skip(
      process.env.PXI_E2E !== "true",
      "Set PXI_E2E=true to run PXI E2E tests."
    );
    test.skip(
      !process.env.OPENAI_API_KEY,
      "OPENAI_API_KEY is required for the PXI assistant."
    );
    test.skip(
      !process.env[JUDGE_API_KEY_ENV],
      `${JUDGE_API_KEY_ENV} is required for the PXI E2E judge.`
    );

    await pxi.open();
    await pxi.acknowledgeConsent();
    const turn = await pxi.askAndWait(USER_PROMPT);

    const outcome = await evaluatePxiOutcome({
      assertions: async () => {
        await pxi.expectNoAgentError();
        // For server/MCP-backed tools only. Client-executed external tools can
        // be asserted through visible tool UI or final app state instead.
        await pxi.expectBackendToolSpanCalled(turn);
        expect(turn.assistantText).toContain("deterministic expected text");
      },
      judgeInput: {
        system: "You are judging a Phoenix PXI E2E answer.",
        prompt: USER_PROMPT,
        assistantText: turn.assistantText,
        rubric: JUDGE_RUBRIC,
      },
    });

    // Persist before asserting so failed post-turn outcomes are still recorded.
    await persistPxiExperiment({
      request,
      record: {
        example: EXPERIMENT_EXAMPLE,
        assistantText: turn.assistantText,
        calledTools: turn.calledTools,
        url: page.url(),
        durationMs: turn.durationMs,
        judgeResult: outcome.judgeResult,
        playwrightProject: testInfo.project.name,
        ...pxi.getMetadata(),
      },
    });

    assertPxiOutcome(outcome);
  });
});
Judge Models
- Pass judge models as provider/model strings.
- Default: openai/gpt-4.1, used as the fallback when PXI_E2E_JUDGE_MODEL is not set.
- Supported providers are currently openai and anthropic.
- openai/... requires OPENAI_API_KEY.
- anthropic/... requires ANTHROPIC_API_KEY.
- The judge must use AI SDK 6 structured-output idioms: generateText with Output.object({ schema }). Do not use the deprecated generateObject.
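The provider/model convention can be illustrated with a small parser. resolveJudgeModel and its return shape are hypothetical; in the harness the credential lookup is exposed as getRequiredJudgeApiKeyEnv() from judge.ts. The sketch encodes only the rules listed above, assuming PXI_E2E_JUDGE_MODEL overrides the openai/gpt-4.1 default.

```typescript
type JudgeProvider = "openai" | "anthropic";

const DEFAULT_JUDGE_MODEL = "openai/gpt-4.1";

// Required credential per supported provider.
const API_KEY_ENV: Record<JudgeProvider, string> = {
  openai: "OPENAI_API_KEY",
  anthropic: "ANTHROPIC_API_KEY",
};

interface ResolvedJudgeModel {
  provider: JudgeProvider;
  modelId: string;
  apiKeyEnv: string;
}

function isJudgeProvider(value: string): value is JudgeProvider {
  return value === "openai" || value === "anthropic";
}

// env is passed in explicitly so the function stays pure and testable.
function resolveJudgeModel(
  env: Record<string, string | undefined>,
): ResolvedJudgeModel {
  const spec = env["PXI_E2E_JUDGE_MODEL"] || DEFAULT_JUDGE_MODEL;
  const slash = spec.indexOf("/");
  if (slash <= 0 || slash === spec.length - 1) {
    throw new Error(`Judge model must look like provider/model, got "${spec}"`);
  }
  const provider = spec.slice(0, slash);
  const modelId = spec.slice(slash + 1);
  if (!isJudgeProvider(provider)) {
    throw new Error(`Unsupported judge provider "${provider}"`);
  }
  return { provider, modelId, apiKeyEnv: API_KEY_ENV[provider] };
}
```

Failing fast on an unknown provider keeps a typo in PXI_E2E_JUDGE_MODEL from surfacing later as a confusing missing-key skip.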
Running Specs
Use isolated ports to avoid collisions with a local Phoenix dev server:
pnpm run test:e2e:pxi
Real PXI specs require external model credentials. Keep tests skipped by default unless PXI_E2E=true is set.
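The skip-by-default rule, together with the four test.skip guards in the spec pattern, can be collected into one pure function so the gating order is easy to check. pxiSkipReason is a hypothetical helper, not part of the current harness; judgeApiKeyEnv stands in for the value returned by getRequiredJudgeApiKeyEnv().

```typescript
// Returns the first reason a real-LLM PXI spec should be skipped, or
// undefined when it may run. The order matches the spec pattern's guards.
function pxiSkipReason(
  browserName: string,
  env: Record<string, string | undefined>,
  judgeApiKeyEnv: string,
): string | undefined {
  if (browserName !== "chromium") {
    return "PXI real-LLM smoke runs once in chromium.";
  }
  if (env["PXI_E2E"] !== "true") {
    return "Set PXI_E2E=true to run PXI E2E tests.";
  }
  if (!env["OPENAI_API_KEY"]) {
    return "OPENAI_API_KEY is required for the PXI assistant.";
  }
  if (!env[judgeApiKeyEnv]) {
    return `${judgeApiKeyEnv} is required for the PXI E2E judge.`;
  }
  return undefined;
}
```

Inside a spec, the result could feed a single guard such as test.skip(reason !== undefined, reason), since Playwright's conditional skip takes a boolean and an optional description.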
Experiment Persistence
- Persistence defaults to http://localhost:6006 so real E2E runs can be inspected in the developer's local Phoenix instance.
- Override the target with PXI_E2E_EXPERIMENT_BASE_URL.
- Use PXI_E2E_EXPERIMENT_BEARER_TOKEN only when persisting to an authenticated Phoenix target.
- All PXI specs share the "PXI E2E Agent Tests" dataset. persistPxiExperiment uploads every entry in PXI_EXPERIMENT_EXAMPLES using action: "update", so adding or removing an entry updates the dataset examples declaratively.
- Every new PXI spec must add one PXI_EXPERIMENT_EXAMPLES entry with a stable id, prompt, expected output, experiment name prefix, and experiment description, then pass that entry as record.example.
- Persist the complete test record: prompt, assistant text, called tools, duration, judge result, Playwright project, URL, and PXI metadata.
- Persist failed post-turn outcomes too. If PXI returned an answer, the experiment run should exist even when deterministic assertions or judge checks fail.
- Let evaluatePxiOutcome strip ANSI escape sequences and truncate Playwright assertion messages before they are added to judgeResult.explanation; raw Playwright error messages are terminal-formatted and too noisy for Phoenix experiment tables.
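Two persistence rules lend themselves to a sketch: resolving the target Phoenix instance from the environment, and what treating the registry as the complete desired set of examples implies. resolvePersistenceTarget and planDatasetUpdate are hypothetical helpers. The real upload sends every registry entry with action: "update" and presumably leaves reconciliation to Phoenix, so the diff below only models the expected effect, not how the server computes it.

```typescript
interface PersistenceTarget {
  baseUrl: string;
  headers: Record<string, string>;
}

// Defaults to the local Phoenix instance; a trailing slash is trimmed, and the
// bearer token is attached only when PXI_E2E_EXPERIMENT_BEARER_TOKEN is set.
function resolvePersistenceTarget(
  env: Record<string, string | undefined>,
): PersistenceTarget {
  const rawUrl = env["PXI_E2E_EXPERIMENT_BASE_URL"] || "http://localhost:6006";
  const baseUrl = rawUrl.replace(/\/+$/, "");
  const token = env["PXI_E2E_EXPERIMENT_BEARER_TOKEN"];
  const headers: Record<string, string> = token
    ? { Authorization: `Bearer ${token}` }
    : {};
  return { baseUrl, headers };
}

interface DatasetUpdatePlan {
  added: string[];
  kept: string[];
  removed: string[];
}

// Expected effect of uploading the full registry: new ids are added, shared
// ids are kept, and ids missing from the registry drop out of the dataset.
function planDatasetUpdate(
  existingIds: string[],
  registryIds: string[],
): DatasetUpdatePlan {
  const existing = new Set(existingIds);
  const desired = new Set(registryIds);
  return {
    added: registryIds.filter((id) => !existing.has(id)),
    kept: registryIds.filter((id) => existing.has(id)),
    removed: existingIds.filter((id) => !desired.has(id)),
  };
}
```

The diff shows why a stable id matters: renaming an id reads as one removal plus one addition rather than an edit.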
Tool Assertions
- Use pxi.expectBackendToolSpanCalled(turn) when the scenario requires a server-observed backend TOOL span. Keep the call inside evaluatePxiOutcome.
- For docs scenarios, call pxi.expectDocsToolCalled(turn) after pxi.expectBackendToolSpanCalled(turn), because it checks the backend tool names merged into turn.calledTools.
- Do not require backend TOOL spans for client-executed external tools such as UI control tools. Assert their visible tool chips, tool result text, or the resulting app state instead.
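The ordering rule for docs scenarios follows from the data flow: the docs check reads turn.calledTools, which only contains backend tool names after the backend span assertion has merged them. In this sketch, TurnWithTools, isDocsToolName, and its "docs" substring match are hypothetical; the real check lives in pxi.expectDocsToolCalled(turn), and the actual docs tool names are not listed in this guide.

```typescript
// Hypothetical subset of PxiTurn; only the field the docs check reads.
interface TurnWithTools {
  calledTools: string[];
}

// Hypothetical docs-tool matcher; substitute the tool names the harness
// actually uses for runtime docs lookups.
function isDocsToolName(name: string): boolean {
  return name.toLowerCase().includes("docs");
}

// Sketch of the docs check: it only sees tool names already present in
// turn.calledTools, so backend names must be merged in before it runs.
function expectDocsToolCalled(turn: TurnWithTools): void {
  if (!turn.calledTools.some(isDocsToolName)) {
    throw new Error(
      `Expected a docs tool call, got [${turn.calledTools.join(", ")}]`,
    );
  }
}
```

Running this check before the merge fails even when the docs tool did run, which is why the backend span assertion has to come first.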
Extending The Harness
If fixtures.ts, judge.ts, or experimentPersistence.ts do not satisfy a testing use case, use internal_docs/pxi_playwright_e2e_harness.md as the design guide for extending the harness, utilities, and this skill together. That internal doc describes the intended architecture for richer PXI drivers, seed fixtures, LLM modes, outcome assertions, experiment persistence, and debugging artifacts.
Prefer small harness extensions that make future specs simpler. Do not add one-off logic to an individual spec if it should be reusable across PXI E2E scenarios.
Keep This Skill Fresh
When a session creates new PXI Playwright capabilities, discovers a reliable assertion pattern, changes judge behavior, adds seed fixtures, or learns a debugging technique, update this skill in the same branch. Future developers and agents should not have to rediscover branch-specific harness behavior.