🛠️ Development / MCP · Community

multi-ai-testing

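A Skill for test-driven development that generates comprehensive test suites and has them verified by multiple independent agents, closing loopholes in the tests and ensuring sufficient test coverage.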

📜 Original English description (for reference)

Test-driven development with independent verification to prevent test gaming. TDD workflows, test generation, coverage validation (≥80% gate, ≥95% target), property-based testing, edge case discovery. Use when implementing TDD workflows, generating comprehensive test suites, validating test coverage, or preventing test gaming through independent multi-agent verification.

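🇯🇵 Notes for Japanese creators

In a nutshell

Note: This commentary was added by the jpskill.com editorial team for Japanese business settings. It is reference information, independent of the behavior of the Skill itself.

⚡ Recommended: install with a single command (60 seconds)

Copy the command below and paste it into a terminal (Mac/Linux) or PowerShell (Windows). Download, extraction, and placement are fully automated.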

🍎 Mac / 🐧 Linux

mkdir -p ~/.claude/skills && cd ~/.claude/skills && curl -L -o multi-ai-testing.zip https://jpskill.com/download/9450.zip && unzip -o multi-ai-testing.zip && rm multi-ai-testing.zip

🪟 Windows (PowerShell)

$d = "$env:USERPROFILE\.claude\skills"; ni -Force -ItemType Directory $d | Out-Null; iwr https://jpskill.com/download/9450.zip -OutFile "$d\multi-ai-testing.zip"; Expand-Archive "$d\multi-ai-testing.zip" -DestinationPath $d -Force; ri "$d\multi-ai-testing.zip"

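When it finishes, restart Claude Code. The Skill then triggers automatically when you make a related request in plain language, for example "implement this feature with TDD".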

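💾 Manual download (if you would rather not use the command line)

1. Download multi-ai-testing.zip, for example from the URL used in the commands above (https://jpskill.com/download/9450.zip)
2. Double-click the ZIP file to extract it; this creates a multi-ai-testing folder
3. Move that folder to C:\Users\<your name>\.claude\skills\ (Windows) or ~/.claude/skills/ (Mac)
4. Restart Claude Code

⚠️ Download and use at your own risk. This site accepts no responsibility for the content, behavior, or safety of the Skill.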

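🎯 What this Skill can do

The description below explains what this Skill does for you. It triggers automatically when you ask Claude for help in this area.

📦 How to install (3 steps)

1. Download the .skill file from the Skill's page
2. Rename the extension from .skill to .zip and extract it (macOS can extract it automatically)
3. Place the extracted folder in .claude/skills/ under your home folder
- macOS / Linux: ~/.claude/skills/
- Windows: %USERPROFILE%\.claude\skills\

Restart Claude Code and you are done. You do not need to say "use this Skill..."; it is invoked automatically for related requests.

Last updated: 2026-05-18
Retrieved: 2026-05-18
Bundled files: 1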

📜 Original SKILL.md (the English text that Claude reads)

Multi-AI Testing

Overview

multi-ai-testing provides test-driven development workflows with independent verification to prevent test gaming and ensure comprehensive test coverage.

Purpose: Generate high-quality tests through TDD, achieve ≥80% coverage (gate) / ≥95% (target), prevent agents from gaming their own tests

Pattern: Workflow-based (4 core workflows)

Key Innovation: Independent verification through separate test/implementation agents prevents overfitting and test gaming

Core Principles (validated by tri-AI research):

Test-First Development - Write tests BEFORE implementation
Independent Verification - Separate agents for testing vs. implementation
Comprehensive Coverage - ≥80% gate, ≥95% achievable with AI edge case discovery
Non-Deterministic Evaluation - Scoring systems, not binary pass/fail
Self-Healing Tests - Tests adapt to code changes (80% maintenance reduction)

When to Use

Use multi-ai-testing when:

Implementing test-driven development (TDD)
Generating comprehensive test suites (unit, integration, E2E)
Validating test coverage (≥80% minimum)
Preventing test gaming (independent verification)
Discovering edge cases (AI-powered exploration)
Maintaining and evolving tests (self-healing adaptation)

Prerequisites

Required

Test framework installed (Jest, Vitest, pytest, etc.)
Code or specifications to test
Coverage tool available

Recommended

multi-ai-implementation - to implement code after the tests are written
multi-ai-verification - to verify test quality

Understanding

TDD concepts (test-first development)
Coverage metrics (line, branch, function)
Testing frameworks for your language

Testing Workflows

Workflow 1: TDD (Test-Driven Development)

The core test-first development workflow that prevents mock implementations and test gaming.

Purpose: Ensure tests drive implementation, not the reverse

Pattern: Test → Fail → Implement → Pass → Verify (Independent)

Process:

Define Specifications:

# Feature Specification

**Function**: generateToken(user: User): string

**Requirements**:
- Accepts user object with id
- Returns JWT string (format: xxx.yyy.zzz)
- Token includes userId claim
- Token expires in 24 hours
- Throws error for invalid user

**Edge Cases**:
- user is null/undefined
- user.id is missing
- JWT_SECRET not configured

Generate Tests First (Test Agent):

Spawn Independent Test Agent (Task tool):

const testGeneration = await task({
  description: "Generate tests for token generation",
  prompt: `Generate comprehensive tests for token generation function.

  Specifications (read from spec.md):
  - Function: generateToken(user)
  - Requirements: [paste from spec]
  - Edge cases: [list all]

  Generate tests covering:
  1. Happy path (valid user)
  2. All edge cases
  3. Error scenarios
  4. Security considerations

  Use Jest/Vitest framework.
  Write to: tests/auth/generateToken.test.ts

  DO NOT implement the function.
  Tests should FAIL initially (function doesn't exist yet).`
});

Confirm Tests Fail:

# Run generated tests
npm test -- tests/auth/generateToken.test.ts

# Expected output:
# ❌ FAIL tests/auth/generateToken.test.ts
#    ● generateToken is not defined
#    ● [All tests fail as expected]

# If tests pass: ⚠️ Problem! Tests should fail without implementation.
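
The red-phase check described above can also be automated. The following is a minimal sketch, assuming the test results have already been collected into a plain array; the `TestResult` shape and the function names are hypothetical and not part of the Skill.

```typescript
// Hypothetical shape for one test outcome; not defined by the Skill.
interface TestResult {
  name: string;
  passed: boolean;
}

// Names of tests that passed even though no implementation exists yet.
function unexpectedPasses(results: TestResult[]): string[] {
  return results.filter((r) => r.passed).map((r) => r.name);
}

// The red phase is confirmed only if at least one test ran and none passed.
function confirmRedPhase(results: TestResult[]): boolean {
  return results.length > 0 && unexpectedPasses(results).length === 0;
}

// Illustrative data only.
const redRun: TestResult[] = [
  { name: "returns valid JWT", passed: false },
  { name: "throws on invalid user", passed: false },
];
const suspiciousRun: TestResult[] = [
  { name: "returns valid JWT", passed: false },
  { name: "always true", passed: true },
];
```

A test that passes before any implementation exists (like "always true" here) is worth inspecting, since it probably asserts nothing about the function under test.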

Implement to Pass Tests (Implementation Agent):

Spawn Separate Implementation Agent (Task tool):

const implementation = await task({
  description: "Implement generateToken function",
  prompt: `Implement the generateToken function to pass existing tests.

  Tests are in: tests/auth/generateToken.test.ts
  Implement in: src/auth/tokens.ts

  Requirements:
  - Make all tests pass
  - Do NOT modify tests
  - Follow patterns from tests

  Success: All tests in generateToken.test.ts pass`
});

Key: The separate implementation agent cannot see the test agent's reasoning, only the tests themselves.

Verify Tests Pass:

# Run tests again
npm test -- tests/auth/generateToken.test.ts

# Expected:
# ✅ PASS tests/auth/generateToken.test.ts
#    ● generateToken › returns valid JWT ✓
#    ● generateToken › includes userId ✓
#    ● generateToken › expires in 24h ✓
#    ● generateToken › throws on invalid user ✓
#    [All tests pass]

Independent Verification (Verification Agent):

Spawn Independent Verifier (Task tool):

const verification = await task({
  description: "Verify tests and implementation quality",
  prompt: `Review tests (tests/auth/generateToken.test.ts) and implementation (src/auth/tokens.ts).

  Do NOT read implementation conversation.

  Verify:
  1. Tests actually test requirements (not overfitted)
  2. Edge cases adequately covered
  3. Implementation is quality code (not test gaming)
  4. Security considerations addressed

  Score quality (0-100).
  Write report to: tdd-verification.md`
});

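// Read the verification report and extract the score.
// readFile is a placeholder assumed to return the file text;
// the report is assumed to contain a total written as "NN/100".
const report = readFile('tdd-verification.md');
const match = report.match(/(\d+)\s*\/\s*100/);
const score = match ? Number(match[1]) : 0;
if (score >= 90) {
  // ✅ Approved: tests and implementation are high quality
} else {
  // ⚠️ Issues found: address before commit
}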

Outputs:

Tests written first (before implementation)
Tests confirm initial failure
Implementation passes all tests
Independent verification score ≥90/100
No test gaming (verified)

Validation:

[ ] Tests generated before implementation
[ ] Tests confirmed to fail initially
[ ] Separate agent implemented code
[ ] Implementation agent didn't modify tests
[ ] All tests now pass
[ ] Independent verification score ≥90
[ ] No overfitting detected
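
The checklist item "Implementation agent didn't modify tests" can be checked mechanically by comparing the test files before and after the implementation step. This is only a sketch under stated assumptions: the `FileSnapshot` type and `changedTestFiles` function are hypothetical, and the snapshots are assumed to have been captured elsewhere (for example by reading the files under tests/).

```typescript
// Hypothetical snapshot type: file path -> file content.
type FileSnapshot = Record<string, string>;

// Returns the paths of test files that were edited, added, or deleted
// between the two snapshots. An empty result means the tests are untouched.
function changedTestFiles(before: FileSnapshot, after: FileSnapshot): string[] {
  const allPaths = Object.keys(before).concat(Object.keys(after));
  // Drop duplicate paths that appear in both snapshots.
  const unique = allPaths.filter((p, i) => allPaths.indexOf(p) === i);
  // A missing file reads as undefined, so additions and deletions also differ.
  return unique.filter((p) => before[p] !== after[p]).sort();
}

// Illustrative data only.
const beforeImpl: FileSnapshot = {
  "tests/a.test.ts": "expect(f()).toBe(1)",
  "tests/b.test.ts": "expect(g()).toBe(2)",
};
const afterImpl: FileSnapshot = {
  "tests/a.test.ts": "expect(f()).toBe(1)",
  "tests/b.test.ts": "expect(g()).toBe(3)",
  "tests/c.test.ts": "expect(h()).toBe(4)",
};
```

Comparing full contents rather than hashes keeps the sketch free of collisions; a real setup might instead rely on version control to show the same difference.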

Time Estimate: 1-3 hours

Workflow 2: Test Generation

Generate comprehensive test suites covering unit, integration, E2E, property-based, and edge cases.

Purpose: Achieve ≥95% coverage through automated test generation

Pattern: Analyze → Generate (multiple types) → Execute → Validate

Process:

Analyze Target (code or specifications):

# For existing code
grep -rE "export.*(function|class)" --include="*.ts" src/

# Identify all testable units
# Map function signatures
# Note current coverage gaps

Generate Unit Tests (function/class level):

const unitTests = await task({
  description: "Generate unit tests",
  prompt: `Generate comprehensive unit tests for src/auth/tokens.ts.

  For each exported function/class:
  - Happy path tests
  - Edge case tests
  - Error scenario tests

  Use Jest framework.
  Write to: tests/auth/tokens.test.ts

  Target: ≥80% coverage of tokens.ts`
});

Generate Integration Tests (component level):

const integrationTests = await task({
  description: "Generate integration tests",
  prompt: `Generate integration tests for authentication flow.

  Components to integrate:
  - src/auth/tokens.ts (token generation)
  - src/auth/validate.ts (token validation)
  - src/api/auth.ts (API endpoints)

  Test workflows:
  - Token generation → validation (round-trip)
  - API login → token return → validation
  - Token expiry → validation failure

  Write to: tests/integration/auth-flow.test.ts`
});

Generate E2E Tests (complete workflows):

const e2eTests = await task({
  description: "Generate E2E tests",
  prompt: `Generate end-to-end tests for complete auth workflow.

  User journeys:
  1. Register → Email confirm → Login → Access protected resource
  2. Login → Get token → Refresh token → Continue session
  3. Failed login → Rate limiting → Account lockout

  Use Playwright or Cypress.
  Write to: tests/e2e/auth-workflows.test.ts`
});

Generate Property-Based Tests (invariants):

const propertyTests = await task({
  description: "Generate property-based tests",
  prompt: `Generate property-based tests for token generation.

  Invariants to test:
  - All generated tokens are valid JWTs (xxx.yyy.zzz format)
  - All tokens contain userId claim
  - All tokens expire (have exp claim)
  - Token validation is inverse of generation (roundtrip)

  Use fast-check (JS) or Hypothesis (Python).
  Write to: tests/properties/tokens.property.test.ts`
});
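
To make the invariants listed in the prompt above concrete, here is a dependency-free sketch. It deliberately swaps in a hand-rolled random loop for fast-check and a toy three-part token for a real JWT, so every function in it (`generateToyToken`, `validateToyToken`, `findCounterexample`, and the helpers) is hypothetical and the checksum offers no security.

```typescript
// Small deterministic pseudo-random generator so runs are repeatable.
function lcg(seed: number): () => number {
  let state = seed % 4294967296;
  return () => {
    state = (state * 1664525 + 1013904223) % 4294967296;
    return state / 4294967296; // value in [0, 1)
  };
}

// Encode each UTF-16 code unit as four hex digits (and back).
function toHex(text: string): string {
  let out = "";
  for (let i = 0; i < text.length; i++) {
    out += ("0000" + text.charCodeAt(i).toString(16)).slice(-4);
  }
  return out;
}

function fromHex(hex: string): string {
  let out = "";
  for (let i = 0; i < hex.length; i += 4) {
    out += String.fromCharCode(parseInt(hex.slice(i, i + 4), 16));
  }
  return out;
}

// Toy integrity check; NOT a cryptographic signature.
function checksum(text: string): string {
  let sum = 0;
  for (let i = 0; i < text.length; i++) {
    sum = (sum * 31 + text.charCodeAt(i)) % 1000000007;
  }
  return sum.toString(16);
}

// Toy stand-in for generateToken: header.payload.checksum, 24 hour expiry.
function generateToyToken(userId: string, nowSec: number): string {
  const header = toHex("toy");
  const payload = toHex(JSON.stringify({ userId: userId, exp: nowSec + 24 * 60 * 60 }));
  return header + "." + payload + "." + checksum(header + "." + payload);
}

function validateToyToken(token: string): { userId: string; exp: number } | null {
  const parts = token.split(".");
  if (parts.length !== 3) return null;
  if (checksum(parts[0] + "." + parts[1]) !== parts[2]) return null;
  return JSON.parse(fromHex(parts[1]));
}

// Random user ids of 0 to 20 characters, avoiding surrogate code units.
function randomString(next: () => number): string {
  const length = Math.floor(next() * 21);
  let out = "";
  for (let i = 0; i < length; i++) {
    out += String.fromCharCode(32 + Math.floor(next() * 0x2fe0));
  }
  return out;
}

// Checks the invariants on many random inputs; returns a description of the
// first violation, or null if none was found.
function findCounterexample(runs: number, seed: number): string | null {
  const next = lcg(seed);
  for (let i = 0; i < runs; i++) {
    const userId = randomString(next);
    const now = Math.floor(next() * 2000000000);
    const token = generateToyToken(userId, now);
    if (token.split(".").length !== 3) return "format: " + JSON.stringify(userId);
    const claims = validateToyToken(token);
    if (claims === null || claims.userId !== userId) return "roundtrip: " + JSON.stringify(userId);
    if (claims.exp !== now + 86400) return "expiry: " + JSON.stringify(userId);
  }
  return null;
}
```

A library such as fast-check or Hypothesis adds input shrinking and richer generators on top of this basic idea, which is why the prompt recommends them for real suites.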

Generate Edge Case Tests:

const edgeCaseTests = await task({
  description: "Generate edge case tests",
  prompt: `Generate edge case tests for authentication.

  Edge cases (AI-discovered):
  - Empty/null inputs
  - Maximum input sizes
  - Boundary conditions (exactly 24h expiry)
  - Invalid formats
  - Unicode/special characters
  - Concurrent requests
  - Leap year date edge cases
  - Timezone edge cases

  Write to: tests/edge-cases/auth-edge-cases.test.ts`
});

Can Parallelize:

// All test generation in parallel
const [unit, integration, e2e, properties, edges] = await Promise.all([
  task({description: "Unit tests", prompt: "..."}),
  task({description: "Integration tests", prompt: "..."}),
  task({description: "E2E tests", prompt: "..."}),
  task({description: "Property tests", prompt: "..."}),
  task({description: "Edge case tests", prompt: "..."})
]);

Outputs:

Comprehensive test suite (5 test types)
≥95% coverage (AI finds cases humans miss)
All test types generated
Edge cases discovered

Validation:

[ ] Unit tests generated
[ ] Integration tests generated
[ ] E2E tests generated
[ ] Property tests generated
[ ] Edge case tests generated
[ ] All tests executable
[ ] Coverage estimated ≥95%

Time Estimate: 30-90 minutes

Workflow 3: Coverage Validation

Measure test coverage, identify gaps, and generate missing tests until ≥80% (gate) or ≥95% (target) is achieved.

Purpose: Ensure comprehensive test coverage

Pattern: Measure → Identify Gaps → Generate Tests → Re-Measure → Report

Process:

Execute Test Suite:

# Run all tests with coverage
npm test -- --coverage

# Or for Python
pytest --cov=src --cov-report=html

Measure Coverage (multiple dimensions):

# Coverage Report

**Line Coverage**: 78% (target: ≥80%)
**Branch Coverage**: 72% (target: ≥80%)
**Function Coverage**: 85% (target: ≥90%)
**Path Coverage**: 65% (target: ≥70%)

**Status**: Below gate (need ≥80% line coverage)
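
The gate decision in the report above can be expressed as a small check. This sketch assumes the four percentages have already been extracted from the coverage tool's output; the `Coverage` type and function names are hypothetical, while the thresholds are the targets shown in the report.

```typescript
type Dimension = "lines" | "branches" | "functions" | "paths";
type Coverage = Record<Dimension, number>;

// Per-dimension targets taken from the example report.
const targets: Coverage = { lines: 80, branches: 80, functions: 90, paths: 70 };

// Dimensions whose measured coverage is below its target.
function belowTarget(measured: Coverage): Dimension[] {
  const dims: Dimension[] = ["lines", "branches", "functions", "paths"];
  return dims.filter((d) => measured[d] < targets[d]);
}

// The gate itself is defined on line coverage only (>= 80%).
function gatePassed(measured: Coverage): boolean {
  return measured.lines >= targets.lines;
}

// Figures from the example report.
const firstMeasurement: Coverage = { lines: 78, branches: 72, functions: 85, paths: 65 };
```

Keeping the gate (line coverage) separate from the per-dimension targets mirrors the report, where the status line depends only on line coverage.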

Identify Uncovered Code:

# Find uncovered lines/functions
# (coverage:uncovered is assumed to be a project-defined npm script, not a built-in command)
npm run coverage:uncovered

# Or use coverage report
open coverage/index.html

# Note which functions/branches are not covered

Example Output:

# Uncovered Code

**src/auth/tokens.ts**:
- Lines 45-52: Error handling branch (not tested)
- Lines 78-82: Token refresh logic (no tests)
- Function: validateExpiry() - 0% coverage

**src/auth/validate.ts**:
- Lines 23-28: Edge case (expired token by 1 second)

Generate Missing Tests (for gaps):

const gapTests = await task({
  description: "Generate tests for coverage gaps",
  prompt: `Generate tests to cover identified gaps.

  Uncovered code:
  - src/auth/tokens.ts lines 45-52 (error handling)
  - src/auth/tokens.ts lines 78-82 (refresh logic)
  - validateExpiry() function (0% coverage)

  Generate tests that exercise:
  - Error handling branches
  - Token refresh scenarios
  - validateExpiry function with various expiry times

  Write to: tests/auth/coverage-gaps.test.ts`
});

Re-Measure Coverage:

# Run tests again with new tests
npm test -- --coverage

# Check improvement
# Before: 78% → After: 87% ✅ (≥80% gate passed)

Generate Coverage Report:

# Final Coverage Report

**Line Coverage**: 87% ✅ (gate: ≥80%)
**Branch Coverage**: 82% ✅
**Function Coverage**: 92% ✅
**Path Coverage**: 74% ✅

**Status**: GATE PASSED ✅

**Gaps Remaining** (if targeting 95%):
- src/auth/admin.ts: 45% (low priority admin functions)
- Edge cases in error recovery (rare scenarios)

**Recommendation**: Current coverage sufficient for gate.
Target 95% in future iteration if needed.

Outputs:

Coverage measurements (all dimensions)
Gap analysis (uncovered code identified)
Additional tests generated for gaps
Final coverage report
Gate status (≥80% pass/fail)

Validation:

[ ] Coverage measured across all dimensions
[ ] Gaps identified specifically
[ ] Tests generated for critical gaps
[ ] Re-measurement shows improvement
[ ] Gate threshold achieved (≥80%)

Time Estimate: 30-60 minutes

Workflow 4: Independent Verification

Prevent test gaming through multi-agent ensemble verification.

Purpose: Verify tests actually test requirements (not overfitted to implementation)

Pattern: Spawn Independent Verifier → Ensemble Evaluation → Score → Report

Process:

Spawn Independent Verification Agent (Task tool):

Critical: Separate agent that hasn't seen implementation process

const verification = await task({
  description: "Independently verify test quality",
  prompt: `Verify test quality for authentication tests.

  Tests: tests/auth/*.test.ts
  Code: src/auth/*.ts
  Specifications: specs/auth-requirements.md

  DO NOT read previous implementation conversation.

  Verify:
  1. Tests match original specifications (not just what code does)
  2. Edge cases adequately covered (not just happy path)
  3. Tests would catch real bugs (not superficial)
  4. No overfitting to specific implementation details
  5. Property-based tests present (invariants)

  Score each dimension (0-20):
  - Specification alignment: /20
  - Edge case coverage: /20
  - Bug detection capability: /20
  - Implementation independence: /20
  - Property coverage: /20

  Total: /100

  Write detailed report to: independent-verification.md`
});

Multi-Agent Ensemble (for critical features):

Spawn 3 Independent Verifiers (voting ensemble):

// 3 separate agents, no shared context
const [verify1, verify2, verify3] = await Promise.all([
  task({description: "Verifier 1", prompt: verificationPrompt}),
  task({description: "Verifier 2", prompt: verificationPrompt}),
  task({description: "Verifier 3", prompt: verificationPrompt})
]);

// Aggregate scores (the median resists outliers)
const scores = [
  verify1.score, // e.g., 92
  verify2.score, // e.g., 88
  verify3.score  // e.g., 90
];

// median() is a helper you supply; it is not built into JavaScript
const medianScore = median(scores); // 90
// Range (max minus min), used as a simple measure of disagreement
const spread = Math.max(...scores) - Math.min(...scores); // 92-88 = 4

if (spread > 15) {
  // High disagreement → spawn 2 more verifiers
  // Use 5-agent ensemble for final score
}

if (medianScore >= 90) {
  // ✅ Tests are high quality, independent
} else {
  // ⚠️ Tests may be overfitted or incomplete
}
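
The aggregation step above relies on helpers that JavaScript does not provide out of the box. A self-contained sketch follows, assuming each verifier's score has already been extracted as a number; the `EnsembleDecision` shape and the `aggregate` function are hypothetical names, while the thresholds (15 and 90) come from the snippet above.

```typescript
// Median of a non-empty list; averages the two middle values for even lengths.
function median(values: number[]): number {
  if (values.length === 0) throw new Error("median of empty list");
  const sorted = values.slice().sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Range of the scores (max minus min), a simple disagreement measure.
function spread(values: number[]): number {
  return Math.max(...values) - Math.min(...values);
}

interface EnsembleDecision {
  median: number;
  spread: number;
  needsMoreVerifiers: boolean; // disagreement above 15 points
  meetsTarget: boolean;        // median at or above 90
}

function aggregate(scores: number[]): EnsembleDecision {
  const m = median(scores);
  const s = spread(scores);
  return { median: m, spread: s, needsMoreVerifiers: s > 15, meetsTarget: m >= 90 };
}

// Scores from the example above.
const agreed = aggregate([92, 88, 90]);
```

When `needsMoreVerifiers` is true, the final decision would be deferred until the larger ensemble has scored, as the snippet above describes.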

Cross-Verify Against Specifications:

# Specification Verification

For each requirement in specs/auth-requirements.md:
✅ Req 1: "Tokens expire in 24h" → Test: validateExpiry.test.ts:15-20 ✓
✅ Req 2: "Invalid user throws error" → Test: generateToken.test.ts:45-50 ✓
❌ Req 3: "Token refresh supported" → NO TEST FOUND

**Finding**: Missing test for Requirement 3
**Action**: Generate test for token refresh
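
The cross-check above amounts to a requirement-to-test traceability lookup. The sketch below assumes someone (an agent or a person) has already mapped each requirement to the test locations that cover it; the `Requirement` and `Traceability` types and the `untestedRequirements` function are hypothetical.

```typescript
interface Requirement {
  id: string;
  text: string;
}

// Requirement id -> test locations covering it (file:line-range strings).
type Traceability = Record<string, string[]>;

// Requirements that have no test mapped to them.
function untestedRequirements(reqs: Requirement[], trace: Traceability): Requirement[] {
  return reqs.filter((r) => !trace[r.id] || trace[r.id].length === 0);
}

// Data from the example above.
const requirements: Requirement[] = [
  { id: "Req 1", text: "Tokens expire in 24h" },
  { id: "Req 2", text: "Invalid user throws error" },
  { id: "Req 3", text: "Token refresh supported" },
];
const trace: Traceability = {
  "Req 1": ["validateExpiry.test.ts:15-20"],
  "Req 2": ["generateToken.test.ts:45-50"],
};
```

Each entry returned by `untestedRequirements` corresponds to a "NO TEST FOUND" row and becomes an input to the next round of test generation.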

Score Test Quality:

# Test Quality Score (0-100)

## Specification Alignment (/20)
- All requirements have tests: 18/20 (1 missing)

## Edge Case Coverage (/20)
- Boundary conditions tested: 17/20 (good)

## Bug Detection Capability (/20)
- Would catch real bugs: 18/20 (strong)

## Implementation Independence (/20)
- Tests don't assume implementation details: 19/20 (excellent)

## Property Coverage (/20)
- Invariants tested: 15/20 (basic property tests present)

**Total**: 87/100

**Status**: ⚠️ PASS (≥80) but below ideal (≥90)
**Recommendation**: Add test for Req 3, improve property coverage
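
The arithmetic behind the report above is a sum of five dimensions, each scored out of 20, followed by a lookup in the quality bands this document lists in its Quick Reference (≥90, 80-89, 70-79, <70). A minimal sketch, with hypothetical type and function names:

```typescript
// Hypothetical shape for one scored dimension (each is out of 20).
interface DimensionScore {
  name: string;
  score: number;
}

type QualityBand = "excellent" | "good" | "needs work" | "poor";

function totalScore(dimensions: DimensionScore[]): number {
  let total = 0;
  for (const d of dimensions) {
    if (d.score < 0 || d.score > 20) {
      throw new Error("score out of range for " + d.name);
    }
    total += d.score;
  }
  return total;
}

// Name of the lowest-scoring dimension, a natural place to improve first.
function weakestDimension(dimensions: DimensionScore[]): string {
  if (dimensions.length === 0) throw new Error("no dimensions given");
  let weakest = dimensions[0];
  for (const d of dimensions) {
    if (d.score < weakest.score) weakest = d;
  }
  return weakest.name;
}

function qualityBand(total: number): QualityBand {
  if (total >= 90) return "excellent";
  if (total >= 80) return "good";
  if (total >= 70) return "needs work";
  return "poor";
}

// Scores from the example report.
const exampleReport: DimensionScore[] = [
  { name: "Specification alignment", score: 18 },
  { name: "Edge case coverage", score: 17 },
  { name: "Bug detection capability", score: 18 },
  { name: "Implementation independence", score: 19 },
  { name: "Property coverage", score: 15 },
];
```

For the example report this reproduces the total of 87, which falls in the "good" band, and singles out property coverage as the weakest dimension, matching the recommendation above.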

Provide Actionable Feedback:

# Verification Feedback

## Critical Issues (Must Fix)
None

## High Priority (Should Fix)
1. **Missing test for token refresh**
   - **What**: Requirement 3 has no test coverage
   - **Where**: tests/auth/generateToken.test.ts
   - **Why**: Core functionality untested
   - **How**: Add test suite for refresh logic
   - **Priority**: High

## Medium Priority
1. **Property test coverage low**
   - **What**: Only 2 invariants tested
   - **How**: Add tests for: roundtrip (generate→validate), expiry monotonicity

**Estimated Effort**: 30-45 min to reach ≥90 score

Outputs:

Independent verification report
Quality score (0-100)
Specification cross-check
Actionable feedback (What/Where/Why/How/Priority)
Ensemble score (if used)

Validation:

[ ] Independent agent verified (no implementation context)
[ ] Specification alignment checked
[ ] Quality scored (0-100)
[ ] Actionable feedback provided
[ ] Ensemble used for critical features (optional)

Time Estimate: 45-90 minutes (ensemble adds 30 min)

Integration with Other Skills

With multi-ai-implementation

Called During: Step 3 (Incremental Implementation)

Process:

Implementation invokes TDD workflow
Tests generated first
Implementation makes tests pass
Continuous integration

Benefit: Test-driven development enforced

With multi-ai-verification

Called For: Test quality verification

Process:

Tests generated
Verification checks test quality
Independent scoring
Feedback for improvement

Benefit: Ensures tests are high quality and not gamed

Quality Standards

Coverage Targets

Gate (must pass): ≥80% line coverage
Target (desired): ≥95% line coverage
Stretch: 100% with mutation testing

Test Quality Score

Gate: ≥80/100 for basic quality
Target: ≥90/100 for production
Dimensions: Specification alignment, edge cases, bug detection, independence, property coverage

Independence Verification

Always: Use separate test/implementation agents
Critical features: Use 3-5 agent ensemble
Scoring: Median of ensemble (prevent outliers)

Appendix A: Independence Protocol

How Test Independence is Maintained

Technical Isolation:

Test generation agent spawned via Task tool
Prompt does NOT reference implementation approach
Agent sees: Specifications ONLY (no code yet)

During TDD:

// Step 1: Test agent sees specifications only
await task({
  prompt: "Generate tests from specs/auth.md. No code exists yet."
});

// Step 2: Implementation agent sees tests only
await task({
  prompt: "Implement code to pass tests in tests/auth/. Do NOT modify tests."
});

// Step 3: Verification agent sees both, but fresh context
await task({
  prompt: "Verify tests in tests/auth/ match specs/auth.md independently."
});

Bias Prevention:

Specifications written BEFORE test generation
Test agent cannot see implementation decisions
Implementation agent cannot modify tests
Verification agent evaluates independence

Enforcement:

Prompt engineering (explicit independence)
Verification checklist (independence maintained?)
Post-verification audit (manual check)

Validation of Independence:

If verifier finds 0 issues: ⚠️ Suspicious (expected some feedback)
If verifier finds 1-3 issues: ✅ Healthy skepticism
If verifier finds >5 issues: ⚠️ Tests may be low quality
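
The heuristic above maps a verifier's issue count to a signal. In the sketch below, the thresholds are the ones stated in the list; counts of 4 or 5 are not covered by the list, so the sketch reports them as "unspecified" rather than guessing. The type and function names are hypothetical.

```typescript
type IndependenceSignal = "suspicious" | "healthy" | "unspecified" | "low quality";

function classifyIssueCount(issues: number): IndependenceSignal {
  if (issues < 0 || Math.floor(issues) !== issues) {
    throw new Error("issue count must be a non-negative integer");
  }
  if (issues === 0) return "suspicious";  // a real review is expected to find something
  if (issues <= 3) return "healthy";      // 1-3 issues
  if (issues > 5) return "low quality";   // more than 5 issues
  return "unspecified";                   // 4 or 5: not defined in the list above
}
```

How to treat the unspecified range is a project decision; one option is to handle it like the healthy case while tracking the trend across runs.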

Appendix B: Technical Foundation

Test Frameworks Supported

JavaScript/TypeScript:

Jest, Vitest (unit/integration)
Playwright, Cypress (E2E)
fast-check (property-based)

Python:

pytest (unit/integration/E2E)
Hypothesis (property-based)

Coverage Tools:

JS/TS: c8, nyc, istanbul
Python: pytest-cov, coverage.py

CI/CD Integration

# .github/workflows/test.yml
name: Test Suite
on: [push, pull_request]
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - run: npm install
      - run: npm test -- --coverage
      - run: npx c8 check-coverage --lines 80

Cost Controls

Test generation: Use Sonnet (faster, cheaper than Opus)
Verification: Use ensemble (3 agents) only for critical features
Budget cap: $30/month for test generation

Quick Reference

The 4 Workflows

Workflow	Purpose	Time	Output
TDD	Test-first development	1-3h	Tests + implementation (verified)
Generation	Comprehensive test creation	30-90m	5 test types, ≥95% coverage
Coverage	Validate and improve coverage	30-60m	Coverage report (≥80% gate)
Verification	Independent quality check	45-90m	Quality score (0-100)

Coverage Gates

Gate (must pass): ≥80% line coverage
Target (desired): ≥95% line coverage
Stretch: 100% with mutation tests

Quality Scores

≥90: Excellent (production-ready)
80-89: Good (acceptable)
70-79: Needs work (improve before production)
<70: Poor (regenerate tests)

multi-ai-testing ensures comprehensive, independently-verified test coverage through TDD workflows, preventing test gaming and achieving ≥95% coverage with AI-powered edge case discovery.

For TDD examples, see examples/. For independence protocol, see Appendix A.
