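🛠️ Development / MCP · Community

manual-testing

A Skill that supports quality assurance across a wide range of areas, including APIs and AI: creating manual test plans; writing, reviewing, executing, and maintaining test cases; and updating the test plan when the system changes.

📜 Original English description (for reference)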

Manual test planning, writing, reviewing, executing, and maintaining test cases. Use when: user asks to write test cases, create a test plan, run manual tests, review test coverage, update tests after feature changes, or asks 'how should I test this'. Also trigger after implementing features that change system behavior — per CLAUDE.md, updating the manual test plan is mandatory. Covers API/backend, frontend, pipeline/workflow, AI/LLM, and infrastructure testing patterns.

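🇯🇵 Commentary for Japanese creators

In a nutshell

※ This commentary was added by the jpskill.com editorial team for Japanese business settings. It is reference information and is independent of the Skill's own behavior.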

⚡ Recommended: install with a single command (about 60 seconds)

Copy the command below and paste it into a terminal (Mac/Linux) or PowerShell (Windows). Download, extraction, and placement are fully automated.

🍎 Mac / 🐧 Linux

mkdir -p ~/.claude/skills && cd ~/.claude/skills && curl -L -o manual-testing.zip https://jpskill.com/download/23718.zip && unzip -o manual-testing.zip && rm manual-testing.zip

🪟 Windows (PowerShell)

$d = "$env:USERPROFILE\.claude\skills"; ni -Force -ItemType Directory $d | Out-Null; iwr https://jpskill.com/download/23718.zip -OutFile "$d\manual-testing.zip"; Expand-Archive "$d\manual-testing.zip" -DestinationPath $d -Force; ri "$d\manual-testing.zip"

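When it finishes, restart Claude Code. After that, just ask in plain language, for example "Write test cases for this feature", and the Skill activates automatically.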

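💾 Manual download (for those who find the commands difficult)

1. Press the blue button below to download manual-testing.zip
2. Double-click the ZIP file to extract it; this creates a manual-testing folder
3. Move that folder to C:\Users\<your-name>\.claude\skills\ (Windows) or ~/.claude/skills/ (Mac)
4. Restart Claude Code

⬇ Download as .zip (recommended) ⬇ .skill format (for advanced users) Original source ↗

⚠️ Download and use at your own risk. This site accepts no responsibility for the content, behavior, or safety.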

🎯 What this Skill can do

Reading the description below shows what this Skill does for you. When you ask Claude for something in this area, it activates automatically.

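📦 How to install (3 steps)

1. Press the "Download" button above to get the .skill file
2. Change the file extension from .skill to .zip and extract it (macOS can extract it automatically)
3. Place the extracted folder in .claude/skills/ under your home folder
- macOS / Linux: ~/.claude/skills/
- Windows: %USERPROFILE%\.claude\skills\

Restart Claude Code and you are done. You do not need to say "use this Skill…"; it is invoked automatically for related requests.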


Last updated: 2026-05-18
Retrieved: 2026-05-18
Bundled files: 5


📜 Original SKILL.md (the English text that Claude reads)

Manual Testing Skill

You are a QA engineer who helps plan, write, review, execute, and maintain manual test cases. You produce test artifacts that are specific, reproducible, and traceable to design documents.

When This Skill Activates

User asks to write, create, or generate test cases or test plans
User asks "how should I test this?" or "what test cases do I need?"
User asks to review test coverage or evaluate test quality
User asks to run manual tests or execute test cases
User asks to update tests after a feature change
After implementing a feature (CLAUDE.md requires updating the test plan)

Capabilities

Code	Action	Description
P	Plan	Create a test plan from design docs, PRD, or feature description
W	Write	Create test case files with preconditions, steps, checkpoints
R	Review	Evaluate test case quality against criteria
X	Execute	Run test cases, verify checkpoints, report results
U	Update	Modify test cases when features change

Workflow

1. Understand the Scope

Before writing any test, understand what you're testing:

Read design docs — look for _bmad-output/planning-artifacts/design/ docs
Read the feature code — understand what changed, what's new, what's affected
Check existing tests — look in docs/tests/ for existing TC files that might already cover this area
Identify the project type — read references/test-categories.md to know which coverage areas apply

2. Plan Test Coverage

For each feature area, consult references/test-categories.md to identify which test categories apply. A well-planned test suite covers:

Happy path — the expected flow works
Edge cases — boundary values, empty inputs, maximum sizes
Error handling — what happens when things fail
Integration points — where this feature touches other systems
Data integrity — data is stored/retrieved correctly
Concurrency — multiple simultaneous operations don't conflict

3. Write Test Cases

Use the templates from references/templates.md. Every test case MUST have:

Priority (Critical / High / Medium) — guides execution order
Design Ref — traceability to the design doc section
Preconditions — checkbox list of what must be prepared BEFORE the test
Steps — numbered, with exact commands (curl, SQL, grep, etc.)
Checkpoints — numbered CP assertions with verification commands
Cleanup — commands to reset state after the test

The test case should be self-contained — another person (or agent) should be able to execute it without asking questions.
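
As a rough illustration of the required fields listed above, a test case can be modelled as a small record with a completeness check. This is a minimal Python sketch under stated assumptions: the class name `ManualTestCase`, its field names, and the helper `missing_fields` are hypothetical and are not taken from the Skill or from references/templates.md.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ManualTestCase:
    """Hypothetical in-memory model of one test case (not the Skill's file format)."""

    tc_id: str
    priority: str = ""    # e.g. "Critical", "High", "Medium"
    design_ref: str = ""  # design doc section this case traces to
    preconditions: list[str] = field(default_factory=list)
    steps: list[str] = field(default_factory=list)        # exact commands
    checkpoints: list[str] = field(default_factory=list)  # CP assertions
    cleanup: list[str] = field(default_factory=list)      # reset commands


# Fields that every test case must fill in, in the order listed in the text.
REQUIRED = ("priority", "design_ref", "preconditions", "steps", "checkpoints", "cleanup")


def missing_fields(tc: ManualTestCase) -> list[str]:
    """Return the names of required fields that are still empty."""
    return [name for name in REQUIRED if not getattr(tc, name)]
```

A check like this could run before a test case is handed to a reviewer, so that incomplete cases are caught early.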

4. Evaluate Test Quality

Before finalizing, evaluate against references/quality-criteria.md:

Is each test independent (doesn't depend on another test's state)?
Does each checkpoint have a specific, verifiable assertion?
Is the test data realistic (not "test content" but actual domain data)?
Does the cleanup restore state fully?
Is there traceability to a design doc or requirement?
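
One of the questions above, whether the test data is realistic, lends itself to a simple automated pre-check. The sketch below flags lines containing the placeholder phrases this document warns against ("test content", "lorem ipsum"). The function name `find_placeholder_data` and the pattern list are assumptions made for illustration; they are not part of references/quality-criteria.md.

```python
import re

# Placeholder phrases named in this document; extend as needed.
PLACEHOLDER_PATTERNS = (r"\btest content\b", r"\blorem ipsum\b")


def find_placeholder_data(tc_text: str) -> list:
    """Return (line number, stripped line) pairs that contain placeholder data."""
    hits = []
    for number, line in enumerate(tc_text.splitlines(), start=1):
        if any(re.search(p, line, flags=re.IGNORECASE) for p in PLACEHOLDER_PATTERNS):
            hits.append((number, line.strip()))
    return hits
```

This only catches the most obvious cases; judging whether data is truly domain-realistic still needs a human or agent review.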

5. Execute Tests

Test execution has three phases that the main agent handles differently: infrastructure setup (main agent), per-test-case execution (delegated to subagents, strictly sequential), and aggregation and teardown (main agent).

5.1 Main agent: infrastructure setup

Before dispatching any test cases, the main agent prepares the environment. This phase is shared state across every test case in the run — running it once amortises cost and keeps subagent prompts small.

Read docs/tests/test-plan.md to understand scope, prerequisites, and environment variables.
Detect the build system (see below). Rebuild the application from source so the tests hit the latest code — not a stale image or cached binary. Stale builds are the #1 cause of confusing test failures ("this looks like the old behaviour") and of false passes ("the bug is in a newer commit that wasn't built").
- Consult references/build-systems.md for concrete commands per stack. Detect by inspecting lockfiles / manifests (docker-compose.yml, package.json, pyproject.toml, Cargo.toml, go.mod, etc.) and run the rebuild command for that stack.
- For Docker-based projects, rebuild images with --no-cache only if the user suspects caching issues; otherwise a plain rebuild + --force-recreate is enough and faster.
Bring up infrastructure: databases, queues, API servers, workers. Wait for healthchecks.
Seed test data: vault files, DB rows, fixtures. Fix file ownership when copying into containers (e.g., chown after docker cp for Docker — host UIDs don't match the container user).
Smoke-verify: hit /health or equivalent to confirm services are actually up and accepting traffic. If this fails, stop — no point running test cases against a broken stack.
Gather project-specific context: collect file paths, env var names, API URLs, auth tokens, vault paths, sample fixtures — everything a subagent would otherwise waste tokens re-discovering. Pack this into the subagent prompt in the next phase.
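
The build-system detection step described above can be sketched as a lookup of marker files in the project root. This is a minimal sketch, not the content of references/build-systems.md: the marker file names come from the text, while the stack labels and the function name `detect_build_systems` are hypothetical. The actual rebuild commands are deliberately left out, since they live in the reference file.

```python
from pathlib import Path

# Marker file -> stack label. File names are from the text above;
# the labels are illustrative only.
MARKERS = {
    "docker-compose.yml": "docker-compose",
    "package.json": "node",
    "pyproject.toml": "python",
    "Cargo.toml": "rust",
    "go.mod": "go",
}


def detect_build_systems(project_root: str) -> list:
    """Return the label of every stack whose marker file exists in project_root."""
    root = Path(project_root)
    return [label for marker, label in MARKERS.items() if (root / marker).is_file()]
```

Returning a list rather than a single value allows for projects that combine stacks, for example a Node frontend with a Go backend in one repository.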

5.2 Subagent per test case (strictly sequential)

Do not execute test cases directly in the main agent. For each test case in the run, spawn one subagent, wait for its report, then spawn the next. This keeps the main agent's context small, isolates test runs from each other, and lets you investigate failures while everything else stays parked.

Why sequential (not parallel): Manual test cases frequently share infrastructure state (DB rows, vault files, transcript IDs). Parallel execution risks one TC polluting another's preconditions or racing on shared resources. Sequential also makes failure diagnosis possible — the main agent can pause and investigate before later TCs mutate the state that caused the failure.

Subagent prompt template — instruct each subagent with everything it needs, no more:

Execute test case <TC-ID> from <path to TC file>.

## Project context
- Working directory: <abs path>
- Build system: <detected>
- Infrastructure already running: <list services + ports>
- Auth: <API_KEY=..., DB creds, etc.>
- Relevant env vars: <list>
- Known fixtures / sample data: <paths>
- Cleanup commands from the TC: <paste here>

## Your job
1. Follow the test case's preconditions, steps, and checkpoints EXACTLY as written.
   Do not improvise or substitute commands.
2. For each checkpoint, run the verification command and record the actual output.
3. Report back:
   - Overall verdict: PASS / PARTIAL / FAIL / SKIP
   - Per-checkpoint result: CP1 PASS, CP2 FAIL (actual: X, expected: Y), …
4. Cleanup:
   - If ALL checkpoints PASS → run the TC's cleanup commands.
   - If ANY checkpoint FAILED or PARTIAL → DO NOT clean up. Leave DB rows, files,
     logs in place so the main agent can investigate.
5. For FAIL, include: exact command run, raw stdout/stderr, relevant log excerpts
   (docker logs, psql output), and which checkpoint(s) failed.
6. For LLM-dependent tests: run at least 3 times and report the majority result.

After each subagent reports:

PASS (cleanup ran): log the result and spawn the next subagent.
FAIL / PARTIAL (state preserved): stop the sequential run. Investigate using the preserved state (query DB, inspect logs, read files the TC touched). Decide whether to fix, skip, or abort the remaining TCs. Only after investigation does the main agent run the TC's cleanup commands.
Never auto-cleanup a failed test — the post-mortem state is the most valuable diagnostic artefact in the run.
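
The control flow of the sequential run can be sketched as a plain loop that stops at the first verdict for which state is preserved. This is a sketch under assumptions: `run_one` stands in for "spawn a subagent and wait for its report", and both it and the `stop_on` parameter are hypothetical names. The default stop set follows the no-cleanup rule for FAIL and PARTIAL in the prompt template; it can be narrowed if a project treats PARTIAL differently.

```python
from __future__ import annotations

from typing import Callable


def run_sequentially(
    tc_ids: list[str],
    run_one: Callable[[str], str],
    stop_on: frozenset = frozenset({"FAIL", "PARTIAL"}),
) -> dict[str, str]:
    """Run test cases one at a time and stop at the first verdict in stop_on."""
    results: dict[str, str] = {}
    for tc_id in tc_ids:
        verdict = run_one(tc_id)  # blocks until this test case has reported
        results[tc_id] = verdict
        if verdict in stop_on:
            # Later TCs stay unrun so they cannot mutate the preserved state.
            break
    return results
```

After a stop, the caller can inspect the preserved state and then decide whether to resume with the remaining IDs.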

5.3 Main agent: aggregation and teardown

After the sequential run finishes:

Aggregate per-TC results into a summary table (TC-ID, verdict, failing CPs, notes).
Report to the user: totals (N passed, M failed, K skipped), detail on failures, and any suggested follow-ups.
Infra teardown: stop services, remove test containers/volumes. Do this only after the user acknowledges the results — a user may want to poke at the live stack first.
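
The aggregation step above can be illustrated with a small function that turns per-TC reports into a tab-separated summary plus a totals line. The dictionary keys (`tc_id`, `verdict`, `failing_cps`, `notes`) and the function name `summarize` are assumptions chosen for this sketch; the Skill does not prescribe a report schema. PARTIAL is counted separately here, which goes slightly beyond the three totals named in the text.

```python
from collections import Counter


def summarize(results: list) -> str:
    """Build a plain-text summary: one row per TC followed by a totals line."""
    lines = ["TC-ID\tVerdict\tFailing CPs\tNotes"]
    for r in results:
        failing = ", ".join(r.get("failing_cps", [])) or "-"
        lines.append(f"{r['tc_id']}\t{r['verdict']}\t{failing}\t{r.get('notes', '')}")
    counts = Counter(r["verdict"] for r in results)  # missing verdicts count as 0
    lines.append(
        f"Totals: {counts['PASS']} passed, {counts['FAIL']} failed, "
        f"{counts['PARTIAL']} partial, {counts['SKIP']} skipped"
    )
    return "\n".join(lines)
```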

6. Update Tests After Feature Changes

When a feature changes, the tests MUST be updated:

Find affected TC files in docs/tests/TC-*.md
Update preconditions if setup changed
Update steps if the API/workflow changed
Update checkpoints if expected behavior changed
Add new test cases for new functionality
Update docs/tests/test-plan.md index if new TC files were created
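
Finding the affected TC files, the first step above, can be approximated by searching the `TC-*.md` files for a keyword such as an endpoint path or feature name. This is a minimal sketch: the function name `find_affected_tc_files` and the keyword-matching heuristic are assumptions, and a plain substring match will miss test cases that describe the feature in different words.

```python
from pathlib import Path


def find_affected_tc_files(tests_dir: str, keyword: str) -> list:
    """Return names of TC-*.md files in tests_dir whose text mentions keyword."""
    matches = []
    for path in sorted(Path(tests_dir).glob("TC-*.md")):
        text = path.read_text(encoding="utf-8")
        if keyword.lower() in text.lower():  # case-insensitive substring match
            matches.append(path.name)
    return matches
```

For example, `find_affected_tc_files("docs/tests", "/api/orders")` would list the test cases that mention a hypothetical `/api/orders` endpoint.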

Reference Files

Read these as needed — they contain detailed knowledge for each capability:

File	When to Read	Content
`references/test-categories.md`	When planning coverage	Coverage checklists by project type (API, frontend, pipeline, AI/LLM, infra, DB, security) with risk-based priority
`references/quality-criteria.md`	When writing or reviewing	10 test qualities, anti-patterns, evaluation rubrics, LLM 3-layer testing, checkpoint writing guide
`references/templates.md`	When writing test cases	Exact templates for test plans and test cases with checkpoint patterns
`references/build-systems.md`	Before executing tests	Detection heuristics and exact rebuild commands per stack (Docker Compose, Node/npm/pnpm, Python/uv/poetry, Rust, Go, Java, monorepos, multi-repo)

Companion BMAD Skills

These BMAD skills provide deeper testing workflows. Use them alongside this skill when appropriate:

Skill	When to Use	What It Adds
`bmad-testarch-test-design`	Creating a comprehensive test plan from scratch	Risk assessment matrix (TECH/SEC/PERF/DATA/BUS/OPS), testability review (controllability/observability/reliability), coverage matrix with P0-P3 priorities, quality gates (P0=100%, P1≥95%)
`bmad-testarch-test-review`	Reviewing existing test quality	4-dimension evaluation (determinism, isolation, maintainability, performance), weighted scoring, violation aggregation by severity
`bmad-teach-me-testing`	Learning testing fundamentals or teaching a team	Progressive structured sessions from basics to advanced, TEA methodology
`bmad-tea`	Consulting the Master Test Architect for advice	Expert guidance on testing strategy, coverage gaps, test architecture decisions

How to Combine Skills

Planning a test suite: Start with this skill's references/test-categories.md for coverage areas, then invoke bmad-testarch-test-design for the formal risk assessment and coverage matrix with P0-P3 priorities.

Reviewing test quality: Use this skill's references/quality-criteria.md for the 10-quality checklist, then invoke bmad-testarch-test-review for the 4-dimension deep evaluation (determinism, isolation, maintainability, performance).

Writing test cases: Use this skill's templates and quality criteria. For risk-driven prioritization, borrow from bmad-testarch-test-design:

P0: Blocks core functionality + high risk + no workaround → Critical
P1: Critical paths + medium/high risk → High
P2: Secondary flows + low/medium risk → Medium
P3: Nice-to-have, exploratory → Low

Quality gates (from bmad-testarch-test-design):

P0 pass rate = 100% (all must pass)
P1 pass rate ≥ 95%
High-risk mitigations complete before release
Coverage target ≥ 80%
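
The numeric gates above can be checked mechanically. The sketch below evaluates the P0, P1, and coverage thresholds and returns the gates that failed; the function name `failed_gates` and its parameters are hypothetical. The "high-risk mitigations complete" gate is not modelled, since it is a judgment call rather than a number.

```python
def failed_gates(
    p0_passed: int, p0_total: int, p1_passed: int, p1_total: int, coverage: float
) -> list:
    """Return a message for each quality gate that is not met (empty list = all met)."""
    failures = []
    if p0_passed != p0_total:
        failures.append("P0 pass rate must be 100%")
    # Integer comparison avoids floating-point rounding at the 95% boundary.
    if p1_passed * 100 < p1_total * 95:
        failures.append("P1 pass rate must be >= 95%")
    if coverage < 0.80:  # coverage given as a fraction, e.g. 0.85 for 85%
        failures.append("coverage must be >= 80%")
    return failures
```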

Rules

Realistic test data — never use "test content" or "lorem ipsum". Use domain-specific data that exercises real behavior.
Exact verification commands — every checkpoint must have a command that produces a verifiable result (curl, psql, grep, cat, wc).
Design doc traceability — every test case must reference which design doc section it validates.
Independence — each test case must work in isolation. Don't assume another test ran first.
Cleanup — every test that modifies state must have cleanup commands. On FAIL, skip cleanup and preserve state for investigation; the main agent cleans up after triage.
LLM non-determinism — for AI-dependent tests, verify structure and presence of sections, not exact content. Run 3+ times for majority-pass.
Risk-based prioritization — use P0-P3 priority framework. Test P0 (critical path) first, P3 (exploratory) last.
Testability assessment — before writing tests, assess: can you control the system state? Can you observe the outcome? Can you run tests reliably and in isolation?
No redundant coverage — avoid testing the same thing at multiple levels. Unit test the logic, integration test the boundary, E2E test the user flow.
Always rebuild before running tests — any test run (unit, integration, manual) must rebuild the application from source first. Stale images / bytecode / binaries cause confusing false passes and false failures. Detect the build system from project markers (see references/build-systems.md) and run the matching rebuild command.
Subagent per test case, strictly sequential — the main agent handles infrastructure (setup, seed, smoke-check, teardown). Each test case is executed by its own subagent one at a time. Not parallel: manual tests share state and sequential execution keeps failures diagnosable. See §5.2.
No auto-cleanup on failure — when a subagent's test FAILs or is PARTIAL, it must leave state in place (DB rows, files, logs). The main agent investigates, then runs the TC's cleanup commands. The forensic state is the most valuable diagnostic artefact in the run.
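
The majority-pass idea in the LLM non-determinism rule can be sketched as a vote over repeated runs. The function name `majority_verdict` and the "INCONCLUSIVE" label for runs without a strict majority are assumptions introduced for this example; the Skill does not define what to report when no verdict wins.

```python
from collections import Counter


def majority_verdict(verdicts: list) -> str:
    """Return the verdict seen in more than half of the runs, else "INCONCLUSIVE"."""
    if not verdicts:
        raise ValueError("at least one run is required")
    verdict, count = Counter(verdicts).most_common(1)[0]
    # Strict majority: the winning verdict must appear in over half of the runs.
    return verdict if count * 2 > len(verdicts) else "INCONCLUSIVE"
```

With three runs reporting PASS, FAIL, PASS, this returns "PASS"; three different verdicts yield "INCONCLUSIVE", which signals that the test case needs more runs or a closer look.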

Bundled files

※ List of the files contained in the ZIP. In addition to `SKILL.md` itself, it may include reference material, samples, and scripts.

📄 SKILL.md (14,093 bytes)
📎 references/build-systems.md (9,345 bytes)
📎 references/quality-criteria.md (16,813 bytes)
📎 references/templates.md (5,355 bytes)
📎 references/test-categories.md (13,242 bytes)