superval
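A Skill that checks a build against the plan produced by superbuild or autobuild. It confirms that every feature is correctly implemented and wired together, that each one works end-to-end, and that the build was completed as planned.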
📜 Original English description (for reference)
Use when a plan has been built with superbuild or autobuild and you need to validate that every feature was implemented correctly, wired properly, and actually works end-to-end. Use after superbuild or autobuild completes, or when the user wants proof the build matches the plan.
🇯🇵 Notes for Japanese creators
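※ These notes were added by the jpskill.com editorial team for Japanese business settings. They are reference information only and do not affect how the Skill itself behaves.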
Copy the command below and paste it into a terminal (Mac/Linux) or PowerShell (Windows). It downloads, extracts, and installs the Skill automatically.
mkdir -p ~/.claude/skills && cd ~/.claude/skills && curl -L -o superval.zip https://jpskill.com/download/9312.zip && unzip -o superval.zip && rm superval.zip
$d = "$env:USERPROFILE\.claude\skills"; ni -Force -ItemType Directory $d | Out-Null; iwr https://jpskill.com/download/9312.zip -OutFile "$d\superval.zip"; Expand-Archive "$d\superval.zip" -DestinationPath $d -Force; ri "$d\superval.zip"
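When it finishes, restart Claude Code. From then on, just talk to Claude normally (for example, "make me a video prompt") and the Skill activates automatically.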
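💾 Manual download (if you'd rather not use the command line)
- 1. Click the blue button below to download superval.zip
- 2. Double-click the ZIP file to extract it; this creates a superval folder
- 3. Move that folder to C:\Users\<your name>\.claude\skills\ (Windows) or ~/.claude/skills/ (Mac)
- 4. Restart Claude Code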
⚠️ Download and use at your own risk. This site accepts no responsibility for the content, behavior, or safety of the Skill.
🎯 What this Skill does
The description below explains what this Skill can do for you. It activates automatically whenever you ask Claude for help in this area.
📦 Installation (3 steps)
- 1. Click the "Download" button above to get the .skill file
- 2. Change the file extension from .skill to .zip and extract it (macOS can extract it automatically)
- 3. Place the extracted folder in .claude/skills/ under your home folder:
  · macOS / Linux: ~/.claude/skills/
  · Windows: %USERPROFILE%\.claude\skills\
Restart Claude Code and you're done. You don't need to say "use this Skill..."; it is invoked automatically for related requests.
- Last updated: 2026-05-18
- Retrieved: 2026-05-18
- Bundled files: 1
📜 Original SKILL.md (the English/Chinese source that Claude reads)
Superval - Plan-Driven Validation Loop
Version: 1.0.0 by skulto
Overview
Superval is a plan-driven validation engine that proves a built project matches its plan. It reads the plan, reads all build state, detects the test framework, and validates at three levels (structural, wiring, behavioral). For behavioral verification, it writes outside-in black-box acceptance tests -- independent scripts (often bash or a scripting language) that automate the built application from the outside, never importing source code. It loops until everything passes. It never stops trying.
Core principle: The plan is the specification. The built code is the implementation. Superval is the proof. Acceptance tests treat the app as a black box -- they poke it from the outside, through its public interface, like a real user would.
Position in pipeline:
/superplan (plan) -> /superbuild or /autobuild (build) -> /superval (validate)
When to Use
- After /superbuild or /autobuild completes all phases
- When you need proof that every planned feature exists and works
- When a build failed partway and you need to assess what's missing
- When resuming after context compaction and need to verify state
- Before creating a PR to prove the implementation is correct
When NOT to Use
- Before a plan exists (use /superplan first)
- During active building (use /superbuild or /autobuild)
- For projects without a plan document (nothing to validate against)
Execution Flow
digraph superval {
rankdir=TB;
node [shape=box, style=rounded];
ingest [label="1. INGEST PLAN\nFind and read plan document"];
state [label="2. READ STATE\nLoad .autobuild/ and plan checkboxes"];
detect [label="3. DETECT STACK\nFind test framework and tools"];
no_framework [label="ABORT\nNo test framework found.\nAdvise: /superplan bootstrap\nthe testing pyramid", shape=octagon, style="rounded,filled", fillcolor="#ffcccc"];
extract [label="4. EXTRACT FEATURES\nBuild feature map from plan"];
structural [label="5. STRUCTURAL VERIFICATION\nDo expected files exist?"];
wiring [label="6. WIRING VERIFICATION\nAre modules connected?"];
behavioral [label="7. BEHAVIORAL VERIFICATION\nDo features actually work?"];
report [label="8. TRACEABILITY REPORT\nMap every feature to result"];
all_pass [label="ALL PASS?\nEvery feature verified?", shape=diamond];
done [label="VALIDATION COMPLETE\nReport: PASS", shape=doubleoctagon, style="rounded,filled", fillcolor="#ccffcc"];
feedback [label="9. GENERATE FEEDBACK\nStructured failure diagnostics"];
fix [label="10. FIX FAILURES\nAddress each failure"];
no_plan [label="ABORT\nNo plan found", shape=octagon, style="rounded,filled", fillcolor="#ffcccc"];
ingest -> state [label="plan found"];
ingest -> no_plan [label="no plan"];
state -> detect;
detect -> no_framework [label="no test\nframework"];
detect -> extract [label="framework\ndetected"];
extract -> structural;
structural -> wiring;
wiring -> behavioral;
behavioral -> report;
report -> all_pass;
all_pass -> done [label="yes"];
all_pass -> feedback [label="no"];
feedback -> fix;
fix -> structural [label="re-validate\n(loop forever)"];
}
Phase Reference Index
Read the reference doc BEFORE executing that phase:
| Phase | Reference Document | When to Read |
|---|---|---|
| 1. Ingest Plan | references/PLAN-PARSING.md | Before parsing any plan |
| 2. Read State | references/STATE-FILE-CONTRACTS.md | Before reading .autobuild/ |
| 3. Detect Stack | scripts/detect-test-framework.sh | Run this script |
| 4-7. Verification | references/VALIDATION-PATTERNS.md | Before any verification |
| 5-7. Test Generation | references/CLI-TESTING-PATTERNS.md | Before writing any test |
Phase 1: INGEST PLAN
Find the plan document. Search in this order:
- User-provided path (if given as an argument to /superval)
- docs/*-plan.md or docs/*-plan-*.md
- Root-level *-plan.md
- .autobuild/config.json -> plan_path field
If no plan found: ABORT immediately.
SUPERVAL ABORT: No plan found.
Searched:
- docs/*-plan.md
- docs/*-plan-*.md
- *-plan.md (project root)
- .autobuild/config.json
To create a plan, run: /superplan <feature description>
If plan found: Read the entire plan. Output confirmation:
SUPERVAL: Plan loaded
Plan: docs/autobuild-plan.md
Phases: 6 (0, 1, 2A, 2B, 2C, 3)
Acceptance Criteria: 4
Multi-file plans: If plan is split across files (*-plan-1.md, *-plan-2.md), read ALL parts.
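Here is a minimal Python sketch of the Phase 1 search order. It assumes every plan path is relative to the project directory. The function name find_plan is hypothetical, and it only locates files. How a plan is actually parsed is defined in references/PLAN-PARSING.md. Multi-file plans come back in part order so that every part gets read.

```python
import glob
import json
import os
import re
from typing import List, Optional


def _part_number(path: str) -> int:
    # Sort "*-plan-2.md" after "*-plan-1.md"; unnumbered plans sort first.
    m = re.search(r"-plan-(\d+)\.md$", path)
    return int(m.group(1)) if m else 0


def find_plan(project_dir: str, user_path: Optional[str] = None) -> List[str]:
    """Return plan file(s) in the documented search order, or [] if none are found."""
    # 1. A user-provided path wins if it exists.
    if user_path:
        candidate = os.path.join(project_dir, user_path)
        return [candidate] if os.path.isfile(candidate) else []
    # 2. docs/*-plan.md or docs/*-plan-*.md (multi-file plans: return ALL parts)
    docs = glob.glob(os.path.join(project_dir, "docs", "*-plan.md"))
    docs += glob.glob(os.path.join(project_dir, "docs", "*-plan-*.md"))
    if docs:
        return sorted(set(docs), key=lambda p: (_part_number(p), p))
    # 3. Root-level *-plan.md
    root = glob.glob(os.path.join(project_dir, "*-plan.md"))
    if root:
        return sorted(root)
    # 4. .autobuild/config.json -> plan_path field
    config = os.path.join(project_dir, ".autobuild", "config.json")
    if os.path.isfile(config):
        with open(config, encoding="utf-8") as f:
            plan_path = json.load(f).get("plan_path")
        if plan_path:
            candidate = os.path.join(project_dir, plan_path)
            if os.path.isfile(candidate):
                return [candidate]
    return []  # caller should ABORT: "No plan found"
```

An empty result corresponds to the SUPERVAL ABORT message above.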
Phase 2: READ STATE
Load all available build state to understand what was attempted.
2a. Check for .autobuild/ directory
If .autobuild/ exists (project was built with /autobuild):
- Read .autobuild/config.json -> extract stack, commands, phase counts
- Read each .autobuild/phases/phase-*.json -> extract per-phase status, file lists, quality gate results
- Read .autobuild/logs/execution.log -> understand execution timeline
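The sketch below loads the three .autobuild/ sources as raw data. The JSON schemas are specified in references/STATE-FILE-CONTRACTS.md and are not reproduced here, so the code deliberately assumes no field names. It keys phases by file name (for example "phase-1") and keeps the log as a list of lines. The function name load_autobuild_state is hypothetical.

```python
import glob
import json
import os
from typing import Any, Dict, Optional


def load_autobuild_state(project_dir: str) -> Optional[Dict[str, Any]]:
    """Load .autobuild/ state; returns None when the project was not built with /autobuild."""
    root = os.path.join(project_dir, ".autobuild")
    if not os.path.isdir(root):
        return None
    state: Dict[str, Any] = {"config": {}, "phases": {}, "log": []}
    config_path = os.path.join(root, "config.json")
    if os.path.isfile(config_path):
        with open(config_path, encoding="utf-8") as f:
            state["config"] = json.load(f)  # stack, commands, phase counts
    for path in sorted(glob.glob(os.path.join(root, "phases", "phase-*.json"))):
        name = os.path.splitext(os.path.basename(path))[0]  # e.g. "phase-1"
        with open(path, encoding="utf-8") as f:
            state["phases"][name] = json.load(f)  # status, files, gate results
    log_path = os.path.join(root, "logs", "execution.log")
    if os.path.isfile(log_path):
        with open(log_path, encoding="utf-8") as f:
            state["log"] = f.read().splitlines()  # execution timeline
    return state
```

Everything loaded here is only a claim. Later phases verify it independently.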
2b. Check plan document checkboxes
Read the plan document for superbuild-style state:
- Phase Overview table -> Status column (⬜/✅/🔄)
- Per-phase objectives -> "- [x]" vs "- [ ]" counts
- Per-phase Definition of Done -> "- [x]" vs "- [ ]" counts
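Here is a small sketch of the checkbox count for one section of the plan, such as a phase's Objectives or Definition of Done block. It assumes the caller has already cut out that section, a step that references/PLAN-PARSING.md covers. Only task-list items at the start of a line are counted, so inline text that merely mentions "- [ ]" is ignored. The function name count_checkboxes is hypothetical.

```python
import re
from typing import Dict

# A task-list item at the start of a line: "- [ ]", "- [x]" or "- [X]" (indentation allowed).
CHECKBOX = re.compile(r"^[ \t]*- \[([ xX])\]", re.MULTILINE)


def count_checkboxes(markdown: str) -> Dict[str, int]:
    """Count checked vs unchecked task-list items in one block of plan markdown."""
    marks = CHECKBOX.findall(markdown)
    checked = sum(1 for mark in marks if mark in ("x", "X"))
    return {"checked": checked, "unchecked": len(marks) - checked}
```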
2c. Output state summary
SUPERVAL: State loaded
Source: .autobuild/ + plan checkboxes
Phase Status:
Phase 0: Bootstrap ......... complete (autobuild verified)
Phase 1: Core Services ...... complete (autobuild verified)
Phase 2A: Backend API ....... complete (autobuild verified)
Phase 2B: Frontend .......... complete (autobuild verified)
Phase 2C: Tests ............. complete (autobuild verified)
Phase 3: Integration ........ complete (autobuild verified)
Files expected: 24 created, 8 modified
Quality gates claimed: ALL PASS
NOTE: All claims will be independently verified.
Phase 3: DETECT STACK
Run the detection script or perform manual detection.
Using the script
./scripts/detect-test-framework.sh <project-dir>
Manual detection (if script unavailable)
Check for these files in order:
| File | Stack |
|---|---|
| package.json + tsconfig.json | TypeScript |
| package.json | JavaScript |
| pyproject.toml / requirements.txt | Python |
| go.mod | Go |
| Cargo.toml | Rust |
Then check for the test framework:
| Stack | Config Files to Check |
|---|---|
| TypeScript | vitest.config.ts, jest.config.ts, package.json deps |
| Python | pytest.ini, pyproject.toml [tool.pytest] |
| Go | Built-in (go test) |
| Rust | Built-in (cargo test) |
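For cases where scripts/detect-test-framework.sh is unavailable, here is a rough Python version of the two manual-detection tables above. The tables give no row for plain JavaScript. As an assumption, the sketch checks it the same way as TypeScript but also accepts .js config files. It matches any "[tool.pytest" table in pyproject.toml. A None framework is the hard-stop ABORT case. The function name detect_stack is hypothetical.

```python
import json
import os
from typing import Optional, Tuple


def _exists(project_dir: str, name: str) -> bool:
    return os.path.isfile(os.path.join(project_dir, name))


def _read(project_dir: str, name: str) -> str:
    path = os.path.join(project_dir, name)
    if not os.path.isfile(path):
        return ""
    with open(path, encoding="utf-8") as f:
        return f.read()


def detect_stack(project_dir: str) -> Tuple[Optional[str], Optional[str]]:
    """Return (stack, test_framework); a None framework means ABORT."""
    # Stack table, checked in order: TypeScript before plain JavaScript.
    if _exists(project_dir, "package.json"):
        stack = "typescript" if _exists(project_dir, "tsconfig.json") else "javascript"
        pkg = json.loads(_read(project_dir, "package.json") or "{}")
        deps = {**pkg.get("dependencies", {}), **pkg.get("devDependencies", {})}
        for fw in ("vitest", "jest"):
            has_config = any(_exists(project_dir, f"{fw}.config.{ext}") for ext in ("ts", "js"))
            if has_config or fw in deps:
                return stack, fw
        return stack, None
    if _exists(project_dir, "pyproject.toml") or _exists(project_dir, "requirements.txt"):
        if _exists(project_dir, "pytest.ini") or "[tool.pytest" in _read(project_dir, "pyproject.toml"):
            return "python", "pytest"
        return "python", None
    if _exists(project_dir, "go.mod"):
        return "go", "go test"  # built in
    if _exists(project_dir, "Cargo.toml"):
        return "rust", "cargo test"  # built in
    return None, None
```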
No test framework found: ABORT
SUPERVAL ABORT: No test framework detected.
Stack: typescript
Checked: vitest.config.ts, jest.config.ts, package.json
Cannot validate without a test framework.
To bootstrap testing, run: /superplan bootstrap the testing pyramid for me
This is a hard stop. Do NOT proceed without a test framework.
Framework found: Continue
SUPERVAL: Stack detected
Stack: typescript
Package Manager: npm
Test Framework: vitest
Linter: eslint
Formatter: prettier
Type Checker: tsc
Test Command: npm test
Test Files Found: 12
Phase 4: EXTRACT FEATURES
Parse the plan to build the complete feature map. See references/PLAN-PARSING.md for parsing details.
Extract from plan:
- Phase Overview table -> all phases with names and status
- Per-phase Objectives -> feature checklist per phase
- Per-phase Code Changes -> expected files (CREATE/MODIFY/DELETE)
- Per-phase Tests -> expected test files
- Acceptance Criteria -> high-level feature requirements
- Definition of Done -> quality gate requirements per phase
Build the feature map:
For each phase, create a feature entry:
Feature: Phase 1 - Core Services
Objectives: [config service, logger service, state service]
Files Created: [src/services/config.ts, src/services/logger.ts, src/services/state.ts]
Files Modified: [src/index.ts]
Test Files: [src/__tests__/unit/services/config.test.ts, ...]
DoD: [linter, formatter, typecheck, tests]
Output feature map:
SUPERVAL: Feature map extracted
Total features: 8 phases
Total files expected: 24 created, 8 modified
Total test files expected: 12
Acceptance criteria: 4
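A possible in-memory shape for the feature map is sketched below. The fields mirror the "Feature: Phase 1 - Core Services" example entry above. The class name FeatureEntry and the summarize helper are hypothetical, and filling the entries from the plan text is left to the parsing rules in references/PLAN-PARSING.md.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FeatureEntry:
    phase: str
    name: str
    objectives: List[str] = field(default_factory=list)
    files_created: List[str] = field(default_factory=list)
    files_modified: List[str] = field(default_factory=list)
    test_files: List[str] = field(default_factory=list)
    dod: List[str] = field(default_factory=list)  # Definition of Done gates


def summarize(features: List[FeatureEntry], acceptance_criteria: int) -> str:
    """Render the 'Feature map extracted' summary from the entries."""
    created = sum(len(f.files_created) for f in features)
    modified = sum(len(f.files_modified) for f in features)
    tests = sum(len(f.test_files) for f in features)
    return "\n".join([
        "SUPERVAL: Feature map extracted",
        f"Total features: {len(features)} phases",
        f"Total files expected: {created} created, {modified} modified",
        f"Total test files expected: {tests}",
        f"Acceptance criteria: {acceptance_criteria}",
    ])
```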
Phase 5: STRUCTURAL VERIFICATION (Level 1)
Question: Does the code EXIST?
For every file in the feature map:
5a. Source file existence
Check each files_created and files_modified path:
STRUCTURAL VERIFICATION
=======================
Phase 0: Bootstrap
PASS eslint.config.js
PASS .prettierrc
PASS vitest.config.ts
Phase 1: Core Services
PASS src/services/config.ts
PASS src/services/logger.ts
PASS src/services/state.ts
FAIL src/services/missing.ts <-- STRUCTURAL FAILURE
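The existence check itself is simple. This sketch renders one phase in the layout shown above. It covers only CREATE and MODIFY paths. Files a plan marks DELETE would need the inverse check, which is not shown. The function names are hypothetical.

```python
import os
from typing import Iterable, List, Tuple


def check_files_exist(project_dir: str, paths: Iterable[str]) -> List[Tuple[str, bool]]:
    """Return (path, exists) for every expected file, keeping plan order."""
    return [(p, os.path.isfile(os.path.join(project_dir, p))) for p in paths]


def format_structural(phase: str, results: List[Tuple[str, bool]]) -> str:
    """Render one phase of the STRUCTURAL VERIFICATION output."""
    lines = [phase]
    for path, ok in results:
        suffix = "" if ok else " <-- STRUCTURAL FAILURE"
        lines.append(f"  {'PASS' if ok else 'FAIL'} {path}{suffix}")
    return "\n".join(lines)
```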
5b. Test file existence
For every source file, verify a corresponding test file exists:
TEST FILE VERIFICATION
======================
PASS src/services/config.ts -> src/__tests__/unit/services/config.test.ts
PASS src/services/logger.ts -> src/__tests__/unit/services/logger.test.ts
FAIL src/services/missing.ts -> (no test file found)
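The source-to-test mapping above follows a single convention: src/<dir>/<name>.ts becomes src/__tests__/unit/<dir>/<name>.test.ts. The sketch assumes that convention, inferred from the example paths only. Projects that colocate tests or use a spec suffix need a different mapping. The function names are hypothetical.

```python
import os
from pathlib import PurePosixPath
from typing import Optional


def expected_test_path(source: str, test_root: str = "src/__tests__/unit") -> str:
    """Map src/<dir>/<name>.<ext> to <test_root>/<dir>/<name>.test.<ext>."""
    rel = PurePosixPath(source).relative_to("src")  # e.g. services/config.ts
    return str(PurePosixPath(test_root) / rel.parent / f"{rel.stem}.test{rel.suffix}")


def find_test_file(project_dir: str, source: str) -> Optional[str]:
    """Return the expected test path if it exists, else None (a FAIL line)."""
    candidate = expected_test_path(source)
    return candidate if os.path.isfile(os.path.join(project_dir, candidate)) else None
```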
5c. Dependency verification
Check that declared dependencies are installed:
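# Node.js: npm ls exits non-zero when the installed tree has problems (e.g. a missing or invalid dependency)
npm ls --depth=0 >/dev/null
# Exit 0 = declared dependencies are installed
# Python: pip check exits non-zero and lists any broken requirements
pip check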
Structural failures gate further verification
If a file doesn't exist, skip wiring and behavioral checks for that feature. Record as STRUCTURAL FAIL in the traceability matrix.
Phase 6: WIRING VERIFICATION (Level 2)
Question: Is the code CONNECTED?
For every feature that passed structural verification:
6a. Import chain verification
Verify that entry points reach the feature code:
WIRING VERIFICATION
===================
CLI -> Commands:
PASS src/cli.ts imports src/commands/start.ts
PASS src/cli.ts imports src/commands/run.ts
PASS src/cli.ts imports src/commands/status.ts
PASS src/cli.ts imports src/commands/config.ts
Commands -> Services:
PASS src/commands/start.ts imports src/services/agent-orchestrator.ts
PASS src/commands/status.ts imports src/services/state.ts
FAIL src/commands/run.ts does NOT import src/services/plan-registry.ts
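How to check: Use grep (or the Grep tool) to search for import statements. Files in src/commands/ reach services through a relative ../ path, so the pattern must accept both ./ and ../:
Pattern: "import .* from ['\"]\.\.?/services/config"
File: src/commands/start.ts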
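A line-based grep misses multi-line imports and does not resolve relative paths. The sketch below resolves static ES-module specifiers (import ... from, export ... from, and bare import '...') against the importing file, then compares paths with the extension removed. It is a heuristic, not a TypeScript resolver. It ignores dynamic import(), require(), path aliases, and index-file resolution. The function names are hypothetical.

```python
import posixpath
import re
from typing import List

# Static ES module specifiers: `import ... from '...'`, `export ... from '...'`, `import '...'`
SPECIFIER = re.compile(r"""\b(?:import|export)\s+(?:[^'"]*?\s+from\s+)?['"]([^'"]+)['"]""")
EXTENSION = re.compile(r"\.(ts|tsx|js|mjs)$")


def resolved_imports(importer: str, source_text: str) -> List[str]:
    """Resolve relative specifiers in source_text to project paths without extensions."""
    base = posixpath.dirname(importer)
    out = []
    for spec in SPECIFIER.findall(source_text):
        if spec.startswith("."):  # skip packages such as 'node:fs' or 'react'
            target = posixpath.normpath(posixpath.join(base, spec))
            out.append(EXTENSION.sub("", target))
    return out


def imports_module(importer: str, source_text: str, target: str) -> bool:
    """True if importer statically imports target (e.g. src/services/config.ts)."""
    return EXTENSION.sub("", target) in resolved_imports(importer, source_text)
```

Pass it the file's contents, for example imports_module("src/commands/start.ts", text, "src/services/config.ts"), to produce one PASS/FAIL line of the WIRING VERIFICATION output.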
6b. Export verification
Verify barrel files (index.ts) re-export expected symbols:
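// Dynamic import check (run with a TypeScript-aware runner such as tsx)
const mod = await import('./src/index.ts');
const keys = Object.keys(mod);
// Verify expected exports are present; take the names from the feature map
const expected: string[] = []; // the symbols the plan says index.ts re-exports
const missing = expected.filter((name) => !keys.includes(name));
if (missing.length > 0) throw new Error(`Missing exports: ${missing.join(', ')}`);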
6c. Service instantiation
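Verify that services can be imported without errors. This catches load-time failures, including circular dependencies that read a binding before it is initialized:
const imports = [
import('./src/services/config.ts'),
import('./src/services/logger.ts'),
// ... all services
];
const results = await Promise.allSettled(imports);
// All should be 'fulfilled'; report every rejection with its reason
const failed = results.filter((r): r is PromiseRejectedResult => r.status === 'rejected');
for (const f of failed) console.error('import failed:', f.reason);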
Phase 7: BEHAVIORAL VERIFICATION (Level 3)
Question: Does the code WORK?
For every feature that passed wiring verification:
7a. Smoke test (first gate)
The project must build and start without errors:
# Build
npm run build
# Exits 0? Continue. Exits non-zero? BEHAVIORAL FAIL for ALL features.
# Start (quick check)
node dist/cli.js --help
# Exits 0? Continue. Exits non-zero? BEHAVIORAL FAIL for ALL features.
If smoke test fails, skip all other behavioral checks. Fix the build first.
7b. Quality gates
Run all quality gate commands:
npm run lint # Linter
npm run format # Formatter (the script must run in check mode, e.g. prettier --check)
npm run typecheck # Type checker
npm test # Full test suite
Each must exit 0. Capture output for the traceability report.
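One way to script 7a and 7b is sketched below in Python. Each command runs in turn, its exit code becomes the verdict, and the combined output is kept for the traceability report. If any smoke command fails, the remaining gates are skipped, matching the "fix the build first" rule. The function names and result dictionary keys are hypothetical. A command that cannot be started at all is recorded as a failed gate.

```python
import subprocess
from typing import Dict, List, Optional, Sequence, Tuple

Gate = Tuple[str, List[str]]  # (gate name, argv)


def run_gates(gates: Sequence[Gate], cwd: str = ".") -> Dict[str, dict]:
    """Run each gate; it passes only if it exits 0. Output is kept for the report."""
    results: Dict[str, dict] = {}
    for name, cmd in gates:
        try:
            proc = subprocess.run(cmd, cwd=cwd, capture_output=True, text=True)
        except OSError as exc:  # e.g. the tool is not installed
            results[name] = {"passed": False, "exit_code": None, "output": str(exc)}
            continue
        results[name] = {
            "passed": proc.returncode == 0,
            "exit_code": proc.returncode,
            "output": (proc.stdout + proc.stderr).strip(),
        }
    return results


def run_behavioral_gates(
    smoke: Sequence[Gate], gates: Sequence[Gate], cwd: str = "."
) -> Tuple[Dict[str, dict], Optional[Dict[str, dict]]]:
    """Smoke test first; if it fails, skip every other behavioral check."""
    smoke_results = run_gates(smoke, cwd)
    if not all(r["passed"] for r in smoke_results.values()):
        return smoke_results, None  # BEHAVIORAL FAIL for ALL features
    return smoke_results, run_gates(gates, cwd)
```

For example, the smoke list can hold ("Build", ["npm", "run", "build"]) and ("Start", ["node", "dist/cli.js", "--help"]), and the gate list can hold the four quality-gate commands above.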
7c. Generate outside-in acceptance tests
These are BLACK BOX tests. They treat the built application as an opaque artifact and poke it from the outside -- exactly like a real user or consumer would. They do NOT import source code. They do NOT call internal functions. They automate the app under test through its public interface.
Key principle: The acceptance test is an independent script that could be written in ANY language. A bash script can test a TypeScript CLI. A Python script can test a Go API. The test language does not need to match the project language. Pick whatever is most natural for automating the interface.
What makes these different from the project's own tests:
| | Project's Unit/Integration Tests | Superval Acceptance Tests |
|---|---|---|
| Perspective | Inside the codebase | Outside the app |
| Imports source? | Yes | Never |
| Tests what? | Functions, modules, classes | The built artifact |
| Written in | Same language as project | Any scripting language |
| Runs against | Source code or mocks | The compiled/running application |
| Purpose | Developer confidence | Proof the feature exists in the product |
Acceptance test patterns by project type
Every acceptance test automates the built application through its user-facing interface. The interface determines the automation tool. Here is the complete catalog:
CLI Tools -- Bash script testing the built binary
The user's interface is the terminal. Test exactly what they'd type.
#!/bin/bash
# acceptance-test.sh -- Black box CLI tests
set -euo pipefail
PASS=0; FAIL=0
CLI="node dist/cli.js" # The BUILT artifact, not source
run_test() {
local name="$1"; shift
if "$@" >/dev/null 2>&1; then
echo " PASS $name"; PASS=$((PASS + 1))
else
echo " FAIL $name (exit code: $?)"; FAIL=$((FAIL + 1))
fi
}
assert_output_contains() {
local name="$1"; local pattern="$2"; shift 2
local output; output=$("$@" 2>&1) || true
if grep -q -- "$pattern" <<<"$output"; then
echo " PASS $name"; PASS=$((PASS + 1))
else
echo " FAIL $name (expected '$pattern' in output)"; FAIL=$((FAIL + 1))
fi
}
echo "ACCEPTANCE TESTS (outside-in)"
echo "=============================="
# AC-1: CLI displays version
assert_output_contains "AC-1: displays version" "[0-9]\.[0-9]" $CLI --version
# AC-2: CLI shows help for all commands
assert_output_contains "AC-2: help shows 'start'" "start" $CLI --help
assert_output_contains "AC-2: help shows 'config'" "config" $CLI --help
# AC-3: Each subcommand has --help
for cmd in start run status config; do
run_test "AC-3: $cmd --help exits 0" $CLI $cmd --help
done
echo ""; echo "RESULTS: $PASS passed, $FAIL failed"
[ "$FAIL" -eq 0 ] && exit 0 || exit 1
TUI (Terminal UI) Apps -- expect/pexpect for interactive terminals
TUI apps (ncurses, blessed, ink, bubbletea) don't just print output -- they draw screens and respond to keystrokes. You need a tool that can drive an interactive terminal session.
#!/usr/bin/expect -f
# acceptance-tui.exp -- Drives an interactive TUI app
# Uses expect (TCL-based) to send keystrokes and match screen output
set timeout 10
# Launch the built TUI app
spawn ./dist/my-tui-app
# AC-1: Main menu renders
expect {
"Select an option" { puts " PASS AC-1: main menu renders" }
timeout { puts " FAIL AC-1: main menu did not render"; exit 1 }
}
# AC-2: Arrow keys navigate menu
send "\033\[B" ;# Down arrow (ESC [ B)
expect {
"> Option 2" { puts " PASS AC-2: down arrow selects option 2" }
timeout { puts " FAIL AC-2: navigation broken"; exit 1 }
}
# AC-3: Enter selects item
send "\r"
expect {
"Option 2 selected" { puts " PASS AC-3: enter selects item" }
timeout { puts " FAIL AC-3: selection broken"; exit 1 }
}
# AC-4: q quits
send "q"
expect {
eof { puts " PASS AC-4: q exits cleanly" }
timeout { puts " FAIL AC-4: app did not exit"; exit 1 }
}
Python alternative using pexpect:
#!/usr/bin/env python3
# acceptance-tui.py -- Drives interactive TUI with pexpect
import pexpect
child = pexpect.spawn('./dist/my-tui-app', timeout=10)
# AC-1: Main menu renders
child.expect('Select an option')
print(' PASS AC-1: main menu renders')
# AC-2: Navigate with arrow keys
child.send('\x1b[B') # Down arrow
child.expect('> Option 2')
print(' PASS AC-2: arrow navigation works')
child.sendline('q')
child.expect(pexpect.EOF)
print(' PASS AC-3: clean exit')
Web Applications (React, Vue, Angular, etc.) -- Playwright or Cypress
The user's interface is the browser. Playwright and Cypress automate real browsers against the running app.
// acceptance-web.spec.ts -- Playwright drives RUNNING app in REAL browser
import { test, expect } from '@playwright/test';
// No source imports. Playwright hits the live URL.
test('AC-1: User can create a new item', async ({ page }) => {
await page.goto('http://localhost:3000/items/new');
await page.fill('[data-testid="name"]', 'Test Item');
await page.click('button[type="submit"]');
await expect(page.locator('.success')).toBeVisible();
});
test('AC-2: Navigation shows all sections', async ({ page }) => {
await page.goto('http://localhost:3000');
await expect(page.getByRole('link', { name: 'Dashboard' })).toBeVisible();
await expect(page.getByRole('link', { name: 'Settings' })).toBeVisible();
});
Cypress alternative:
// acceptance-web.cy.js
describe('Acceptance Tests', () => {
it('AC-1: User can create a new item', () => {
cy.visit('http://localhost:3000/items/new');
cy.get('[data-testid="name"]').type('Test Item');
cy.get('button[type="submit"]').click();
cy.get('.success').should('be.visible');
});
});
Backend APIs -- curl/HTTP from outside the process
The user's interface is HTTP. Test via actual HTTP requests to a running server. Never import the app module.
#!/bin/bash
# acceptance-api.sh -- Tests a RUNNING API server from outside
set -euo pipefail
BASE_URL="http://localhost:3000"
PASS=0; FAIL=0
assert_http() {
local name="$1" expected_code="$2"; shift 2
local response http_code body
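response=$(curl -s -w "\n%{http_code}" "$@") || true  # unreachable server: curl exits non-zero but prints 000, so the test records FAIL instead of aborting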
http_code=$(echo "$response" | tail -1)
body=$(echo "$response" | sed '$d')
if [ "$http_code" = "$expected_code" ]; then
echo " PASS $name (HTTP $http_code)"; PASS=$((PASS + 1))
else
echo " FAIL $name (expected $expected_code, got $http_code)"; FAIL=$((FAIL + 1))
fi
}
echo "API ACCEPTANCE TESTS"
echo "===================="
# AC-1: Health endpoint
assert_http "AC-1: GET /health returns 200" "200" "$BASE_URL/health"
# AC-2: Create resource
assert_http "AC-2: POST /api/items returns 201" "201" \
-X POST "$BASE_URL/api/items" \
-H "Content-Type: application/json" \
-d '{"name": "Test"}'
# AC-3: Unauthorized access rejected
assert_http "AC-3: GET /api/secret returns 401" "401" "$BASE_URL/api/secret"
echo ""; echo "RESULTS: $PASS passed, $FAIL failed"
[ "$FAIL" -eq 0 ] && exit 0 || exit 1
iOS Apps -- XCUITest (Xcode UI Testing)
The user's interface is the touch screen. XCUITest drives the app through the accessibility hierarchy.
// AcceptanceTests.swift -- Xcode UI Test target (separate from app target)
import XCTest
class AcceptanceTests: XCTestCase {
let app = XCUIApplication()
override func setUp() {
continueAfterFailure = false
app.launch() // Launches the BUILT .app bundle
}
func testAC1_LoginScreenAppears() {
XCTAssertTrue(app.textFields["Email"].exists)
XCTAssertTrue(app.secureTextFields["Password"].exists)
XCTAssertTrue(app.buttons["Sign In"].exists)
}
func testAC2_UserCanLogin() {
app.textFields["Email"].tap()
app.textFields["Email"].typeText("test@example.com")
app.secureTextFields["Password"].tap()
app.secureTextFields["Password"].typeText("password123")
app.buttons["Sign In"].tap()
XCTAssertTrue(app.staticTexts["Welcome"].waitForExistence(timeout: 5))
}
}
Android Apps -- Espresso or UI Automator
Espresso for single-app testing, UI Automator for cross-app flows.
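// AcceptanceTest.kt -- Android instrumentation test (separate from app code)
import androidx.test.espresso.Espresso.onView
import androidx.test.espresso.action.ViewActions.click
import androidx.test.espresso.action.ViewActions.typeText
import androidx.test.espresso.assertion.ViewAssertions.matches
import androidx.test.espresso.matcher.ViewMatchers.isDisplayed
import androidx.test.espresso.matcher.ViewMatchers.withId
import androidx.test.espresso.matcher.ViewMatchers.withText
import androidx.test.ext.junit.rules.ActivityScenarioRule
import androidx.test.ext.junit.runners.AndroidJUnit4
import org.junit.Rule
import org.junit.Test
import org.junit.runner.RunWith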
@RunWith(AndroidJUnit4::class)
class AcceptanceTest {
@get:Rule
val activityRule = ActivityScenarioRule(MainActivity::class.java)
@Test
fun ac1_loginScreenAppears() {
// Drives the RUNNING app through its view hierarchy
onView(withId(R.id.email_input)).check(matches(isDisplayed()))
onView(withId(R.id.password_input)).check(matches(isDisplayed()))
onView(withId(R.id.sign_in_button)).check(matches(isDisplayed()))
}
@Test
fun ac2_userCanLogin() {
onView(withId(R.id.email_input)).perform(typeText("test@example.com"))
onView(withId(R.id.password_input)).perform(typeText("password123"))
onView(withId(R.id.sign_in_button)).perform(click())
onView(withText("Welcome")).check(matches(isDisplayed()))
}
}
React Native Apps -- Detox
Detox tests the built app on a real device/simulator, not the JS bundle.
// acceptance.e2e.js -- Detox drives the BUILT React Native app
describe('Acceptance Tests', () => {
beforeAll(async () => {
await device.launchApp(); // Launches the BUILT .app/.apk
});
it('AC-1: login screen renders', async () => {
await expect(element(by.id('email-input'))).toBeVisible();
await expect(element(by.id('password-input'))).toBeVisible();
await expect(element(by.id('sign-in-button'))).toBeVisible();
});
it('AC-2: user can login', async () => {
await element(by.id('email-input')).typeText('test@example.com');
await element(by.id('password-input')).typeText('password123');
await element(by.id('sign-in-button')).tap();
await expect(element(by.text('Welcome'))).toBeVisible();
});
});
Desktop Apps (Electron, Tauri, native) -- Accessibility API via bash/script
Desktop apps expose an accessibility tree. On macOS, use AppleScript/osascript. On Windows, use UI Automation via PowerShell. On Linux, use xdotool + AT-SPI.
macOS -- AppleScript via osascript:
#!/bin/bash
# acceptance-desktop-macos.sh -- Drives desktop app via macOS Accessibility API
set -euo pipefail
APP_NAME="MyApp"
APP_PATH="./dist/MyApp.app"
# Launch the built app
open "$APP_PATH"
sleep 3 # Wait for launch
PASS=0; FAIL=0
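assert_ax() {
local name="$1" script="$2" result
# osascript exits 0 even when the expression evaluates to false, so compare its output
result=$(osascript -e "$script" 2>/dev/null) || result="error"
if [ "$result" = "true" ]; then
echo " PASS $name"; PASS=$((PASS + 1))
else
echo " FAIL $name (got: $result)"; FAIL=$((FAIL + 1))
fi
}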
# AC-1: Main window appears
assert_ax "AC-1: main window exists" \
"tell application \"System Events\" to tell process \"$APP_NAME\" to exists window 1"
# AC-2: Menu bar has expected items
assert_ax "AC-2: File menu exists" \
"tell application \"System Events\" to tell process \"$APP_NAME\" to exists menu bar item \"File\" of menu bar 1"
# AC-3: Click a button and verify result
osascript -e "
tell application \"System Events\"
tell process \"$APP_NAME\"
click button \"New Document\" of window 1
end tell
end tell
" 2>/dev/null || true  # a missing button is reported by AC-3 below instead of aborting the script
sleep 1
assert_ax "AC-3: new document created" \
"tell application \"System Events\" to tell process \"$APP_NAME\" to get name of window 1 contains \"Untitled\""
# Cleanup
osascript -e "tell application \"$APP_NAME\" to quit"
echo ""; echo "RESULTS: $PASS passed, $FAIL failed"
[ "$FAIL" -eq 0 ] && exit 0 || exit 1
Electron apps -- Playwright with Electron support:
// acceptance-electron.spec.ts -- Playwright can drive Electron directly
import { test, expect, _electron as electron } from '@playwright/test';
test('AC-1: app launches and shows main window', async () => {
const app = await electron.launch({ args: ['./dist/main.js'] });
const window = await app.firstWindow();
await expect(window.locator('h1')).toContainText('Welcome');
await app.close();
});
Libraries (npm, pip, crate) -- Script that installs and uses the published package
The user's interface is import/require from a package. Test the published artifact, not source.
#!/bin/bash
# acceptance-library.sh -- Install from local tarball and test
set -euo pipefail
WORKDIR=$(mktemp -d)  # don't reassign TMPDIR, which is often exported and read by child processes
trap 'rm -rf "$WORKDIR"' EXIT
# Pack the built library (not source)
npm pack --pack-destination "$WORKDIR"
cd "$WORKDIR"
npm init -y >/dev/null 2>&1
npm install ./mylib-*.tgz >/dev/null 2>&1
# AC-1: Can import the package
node -e "const lib = require('mylib'); console.log('PASS AC-1: import works')" || {
echo "FAIL AC-1: import failed"; exit 1
}
# AC-2: Exported function works
node -e "
const { createThing } = require('mylib');
const result = createThing({ name: 'test' });
if (result.name === 'test') {
console.log('PASS AC-2: createThing works');
} else {
console.log('FAIL AC-2: unexpected result');
process.exit(1);
}
"
Choosing the automation tool
| Project Type | User Interface | Automation Tool | Script Language |
|---|---|---|---|
| CLI tool | Terminal (stdout/stderr/exit code) | Direct invocation | Bash |
| TUI app | Interactive terminal (ncurses, etc.) | expect / pexpect | TCL (expect) or Python (pexpect) |
| Web app (React, Vue, etc.) | Browser | Playwright or Cypress | TypeScript/JavaScript |
| Backend API | HTTP | curl / httpie | Bash |
| iOS app | Touch screen / accessibility tree | XCUITest | Swift |
| Android app | Touch screen / accessibility tree | Espresso or UI Automator | Kotlin/Java |
| React Native | Touch screen (cross-platform) | Detox | JavaScript |
| Desktop app (macOS) | Windows / accessibility tree | osascript (AppleScript) | Bash + AppleScript |
| Desktop app (Electron) | Browser-in-window | Playwright (Electron mode) | TypeScript |
| Desktop app (Windows) | Windows / accessibility tree | PowerShell + UI Automation | PowerShell |
| Desktop app (Linux) | X11/Wayland / AT-SPI | xdotool + AT-SPI | Bash or Python |
| Library/package | import/require from package | Install package, call functions | Bash + consumer language |
The guiding principle: Match the automation tool to the user-facing interface, not the implementation language. A Go CLI is tested with bash. A Rust TUI is tested with expect. A TypeScript web app is tested with Playwright. The test script is always external to the codebase.
7d. Run acceptance tests
Execute the generated acceptance test script:
# CLI / API / Desktop / Library (bash scripts):
bash .superval/acceptance-tests/acceptance-test.sh
# Web app (Playwright):
npx playwright test .superval/acceptance-tests/
# Web app (Cypress):
npx cypress run --spec ".superval/acceptance-tests/**/*.cy.js"
# TUI (expect):
expect .superval/acceptance-tests/acceptance-tui.exp
# iOS (XCUITest):
xcodebuild test -scheme AcceptanceTests -destination 'platform=iOS Simulator,name=iPhone 15'
# Android (Espresso):
./gradlew connectedAndroidTest
# React Native (Detox):
detox test --configuration ios.sim.release
Record results per acceptance criterion. The exit code is the verdict:
- Exit 0: All acceptance tests pass
- Exit non-zero: At least one acceptance test failed
Critical rule: NEVER import source code in acceptance tests
Acceptance tests automate the APP, not the CODE.
These tests must NOT:
import { anything } from '../../src/...' // importing source
require('../src/...') // importing source
from mypackage.internal import ... // importing source
These tests MUST:
Spawn a process (bash, exec, subprocess.run)
Hit a URL (curl, Playwright, Cypress)
Drive a UI (XCUITest, Espresso, Detox, osascript)
Drive an interactive tty (expect, pexpect)
Install and use a package (npm pack + npm install + require)
If you find yourself importing source code, STOP.
You are writing an integration test, not an acceptance test.
Acceptance tests automate the built application from the outside.
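The no-source-imports rule can be partly enforced mechanically. The heuristic sketch below flags only the three forbidden shapes listed above: a relative import from src/, a relative require of src/, and a Python .internal import. It does not prove a test is black-box, and the pattern list would need extending for other layouts. The function names are hypothetical.

```python
import re
from pathlib import Path
from typing import List, Tuple

# The forbidden shapes from the rule above; heuristic, not exhaustive.
FORBIDDEN = [
    re.compile(r"""from\s+['"](?:\.\./)+src/"""),       # import ... from '../../src/...'
    re.compile(r"""require\(\s*['"](?:\.\./)+src/"""),  # require('../src/...')
    re.compile(r"""^[ \t]*from\s+\S+\.internal\s+import\b""", re.MULTILINE),  # from pkg.internal import
]


def find_source_imports(text: str) -> List[str]:
    """Return every forbidden import snippet found in one test file's text."""
    return [m.group(0) for pat in FORBIDDEN for m in pat.finditer(text)]


def lint_acceptance_dir(directory: str) -> List[Tuple[str, str]]:
    """Scan every file under directory (e.g. .superval/acceptance-tests)."""
    hits = []
    for path in sorted(Path(directory).rglob("*")):
        if path.is_file():
            text = path.read_text(encoding="utf-8", errors="ignore")
            hits.extend((str(path), hit) for hit in find_source_imports(text))
    return hits
```

A non-empty result means at least one "acceptance" test is really an integration test and should be rewritten.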
Phase 8: TRACEABILITY REPORT
Map every plan feature to its verification result.
Output format
SUPERVAL TRACEABILITY REPORT
=============================
Plan: docs/autobuild-plan.md
Project: /Users/adamcobb/codes/autobuild
Attempt: 1
Date: 2025-01-25T10:00:00Z
FEATURE VERIFICATION
+--------+---------------------------+-----------+---------+------------+--------+
| Phase  | Feature                   | Struct.   | Wiring  | Behavioral | Status |
+--------+---------------------------+-----------+---------+------------+--------+
| 0      | Bootstrap (eslint)        | PASS      | PASS    | PASS       | PASS   |
| 0      | Bootstrap (prettier)      | PASS      | PASS    | PASS       | PASS   |
| 1      | Config service            | PASS      | PASS    | PASS       | PASS   |
| 1      | Logger service            | PASS      | PASS    | PASS       | PASS   |
| 1      | State service             | PASS      | PASS    | PASS       | PASS   |
| 2      | CLI start command         | PASS      | PASS    | PASS       | PASS   |
| 2      | CLI run command           | PASS      | FAIL    | SKIP       | FAIL   |
+--------+---------------------------+-----------+---------+------------+--------+
QUALITY GATES
+-------------+---------+--------------------------------+
| Gate        | Result  | Output                         |
+-------------+---------+--------------------------------+
| Build       | PASS    | tsc compiled successfully      |
| Lint        | PASS    | 0 errors, 0 warnings           |
| Format      | PASS    | All files formatted            |
| Typecheck   | PASS    | No type errors                 |
| Test        | PASS    | 94 passed, 0 failed            |
+-------------+---------+--------------------------------+
ACCEPTANCE TESTS
+--------+------------------------------------------+---------+
| AC     | Criterion                                | Result  |
+--------+------------------------------------------+---------+
| AC-1   | CLI displays version                     | PASS    |
| AC-2   | CLI shows help for all commands          | PASS    |
| AC-3   | Each command has --help                  | PASS    |
| AC-4   | Config loads from file                   | FAIL    |
+--------+------------------------------------------+---------+
SUMMARY: 6/7 features verified, 3/4 acceptance criteria met
STATUS: FAIL
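The SUMMARY and STATUS lines can be derived mechanically from the three tables. The sketch below is a minimal version, and the type and function names are hypothetical. One assumption is labeled in the code: SKIP counts as not verified, while N/A counts as satisfied. With the report's data, this logic yields 6/7 features and 3/4 criteria.

```typescript
type LevelResult = "PASS" | "FAIL" | "SKIP" | "N/A";

interface FeatureRow {
  feature: string;
  structural: LevelResult;
  wiring: LevelResult;
  behavioral: LevelResult;
}

interface AcceptanceRow { id: string; result: "PASS" | "FAIL"; }

// Assumption: N/A (config files, docs) counts as satisfied.
// SKIP means the level was never verified, so the feature does not pass.
function featureStatus(f: FeatureRow): "PASS" | "FAIL" {
  const levels = [f.structural, f.wiring, f.behavioral];
  return levels.every((l) => l === "PASS" || l === "N/A") ? "PASS" : "FAIL";
}

// Overall PASS requires every feature, every criterion, and every quality gate.
function summarize(features: FeatureRow[], acs: AcceptanceRow[], gatesPassed: boolean) {
  const verified = features.filter((f) => featureStatus(f) === "PASS").length;
  const met = acs.filter((a) => a.result === "PASS").length;
  const allPass = verified === features.length && met === acs.length && gatesPassed;
  return {
    summary: `SUMMARY: ${verified}/${features.length} features verified, ${met}/${acs.length} acceptance criteria met`,
    status: allPass ? "STATUS: PASS" : "STATUS: FAIL",
  };
}
```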
Phase 9: GENERATE FEEDBACK (on failure)
For each failure, produce structured, actionable feedback:
FAILURE REPORT
==============
FAILURE 1:
Feature: CLI run command
Phase: 2
Level: WIRING
Check: Import chain from src/commands/run.ts to src/services/plan-registry.ts
Expected: run.ts should import and use planRegistry
Actual: No import statement found for plan-registry in run.ts
Suggestion: Add `import { planRegistry } from '../services/plan-registry.js';` to run.ts
FAILURE 2:
Feature: AC-4 Config loads from file
Phase: 1
Level: BEHAVIORAL
Check: Config service reads from ~/.autobuild/config.json
Expected: loadConfig() returns parsed config when file exists
Actual: Test threw: "Cannot read properties of undefined (reading 'plansDir')"
Suggestion: Check config.ts loadConfig() error handling for missing fields
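Keeping each failure as a typed record makes the report easy to produce and to feed into the fix phase. The sketch below assumes a hypothetical `Failure` shape whose fields mirror the report fields above, and it renders them in the same layout.

```typescript
type FailureLevel = "STRUCTURAL" | "WIRING" | "BEHAVIORAL";

// Hypothetical record shape; one field per line of the FAILURE REPORT.
interface Failure {
  feature: string;
  phase: number;
  level: FailureLevel;
  check: string;
  expected: string;
  actual: string;
  suggestion: string;
}

// Renders failures in the FAILURE REPORT layout, numbered from 1.
function formatFailureReport(failures: Failure[]): string {
  const lines: string[] = ["FAILURE REPORT", "=============="];
  failures.forEach((f, i) => {
    lines.push(
      `FAILURE ${i + 1}:`,
      `Feature: ${f.feature}`,
      `Phase: ${f.phase}`,
      `Level: ${f.level}`,
      `Check: ${f.check}`,
      `Expected: ${f.expected}`,
      `Actual: ${f.actual}`,
      `Suggestion: ${f.suggestion}`,
    );
  });
  return lines.join("\n");
}
```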
Phase 10: FIX FAILURES
Fix every reported failure. Work through them in order: structural first, then wiring, then behavioral.
Fix strategy
| Failure Level | Fix Action |
|---|---|
| Structural (file missing) | Create the file with content from the plan |
| Structural (test missing) | Create the test file |
| Wiring (import missing) | Add the import statement |
| Wiring (export missing) | Add the export |
| Behavioral (build fails) | Fix compilation errors |
| Behavioral (test fails) | Fix the test or implementation |
| Behavioral (quality gate) | Run the fix command (lint:fix, format:fix) |
| Behavioral (acceptance test) | Fix the feature implementation |
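The required ordering (structural, then wiring, then behavioral) and the table lookup can be sketched as a stable sort followed by a map. `PendingFix`, its `kind` strings, and `planFixes` are hypothetical. The action strings are copied from the table.

```typescript
type FixLevel = "STRUCTURAL" | "WIRING" | "BEHAVIORAL";

// Hypothetical shape: `kind` matches the parenthesized text in the table rows.
interface PendingFix { feature: string; level: FixLevel; kind: string; }

const LEVEL_ORDER: Record<FixLevel, number> = { STRUCTURAL: 0, WIRING: 1, BEHAVIORAL: 2 };

// Actions copied from the fix strategy table, keyed by "LEVEL/kind".
const FIX_ACTION: Record<string, string> = {
  "STRUCTURAL/file missing": "Create the file with content from the plan",
  "STRUCTURAL/test missing": "Create the test file",
  "WIRING/import missing": "Add the import statement",
  "WIRING/export missing": "Add the export",
  "BEHAVIORAL/build fails": "Fix compilation errors",
  "BEHAVIORAL/test fails": "Fix the test or implementation",
  "BEHAVIORAL/quality gate": "Run the fix command (lint:fix, format:fix)",
  "BEHAVIORAL/acceptance test": "Fix the feature implementation",
};

// Array.prototype.sort is stable, so failures within one level keep report order.
function planFixes(failures: PendingFix[]): string[] {
  return [...failures]
    .sort((a, b) => LEVEL_ORDER[a.level] - LEVEL_ORDER[b.level])
    .map((f) => {
      const action = FIX_ACTION[`${f.level}/${f.kind}`] ?? "No mapped action; read the failure report";
      return `${f.level} ${f.feature}: ${action}`;
    });
}
```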
After fixing: RETURN TO PHASE 5
Re-run the entire verification from structural through behavioral. Do not skip levels even if only behavioral tests failed -- a fix may have introduced structural or wiring regressions.
The Validation Loop: NEVER STOP
IRON RULE: Superval loops until ALL features pass ALL levels.
There is no maximum retry count.
There is no "good enough."
There is no "let's move on."
If the plan says it should exist, it must exist.
If the plan says it should work, it must work.
If the plan says it should be tested, it must be tested.
Keep trying. Fix. Verify. Fix. Verify.
Stop only when the traceability report reads: STATUS: PASS
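The control flow of the iron rule is a loop with a single exit. The sketch below uses hypothetical callbacks: `verify` stands for a full Phase 5 to Phase 8 run, and `fix` stands for Phase 10. The function has no retry limit, so a pass is the only way it returns.

```typescript
interface VerificationResult {
  passed: boolean;     // true only when the report would read STATUS: PASS
  failures: string[];  // ids of failing checks, handed to the fix step
}

// Deliberately has no maxAttempts parameter: the loop exits only on a pass.
function validationLoop(
  verify: () => VerificationResult,
  fix: (failures: string[]) => void,
): number {
  let attempt = 0;
  for (;;) {
    attempt += 1;
    // The verify callback is expected to re-run all three levels every time.
    const result = verify();
    if (result.passed) return attempt; // number of attempts it took to reach PASS
    fix(result.failures);
  }
}
```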
Escalation strategy
If the same failure persists after 3 fix attempts:
- Expand context: Read more of the surrounding code to understand the system
- Read the plan more carefully: The fix may require understanding a different phase
- Check dependencies: The failure may be caused by a different feature's incompleteness
- Try a different approach: If the obvious fix isn't working, rethink the implementation
- Ask the user: If truly stuck after multiple diverse attempts, describe the problem and ask for guidance
But do not stop the loop. Even asking the user is a step in the loop, not an exit from it.
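One way to apply the "after 3 fix attempts" rule is to count attempts per failure id. The sketch below assumes, and this is our choice rather than the document's, that the escalation steps are tried in the listed order and then cycled. In that scheme, asking the user is just one more step and the loop continues afterwards. `EscalationTracker` is a hypothetical name.

```typescript
// Escalation steps in the order listed above.
const ESCALATION_STEPS = [
  "expand context",
  "re-read the plan",
  "check dependencies",
  "try a different approach",
  "ask the user",
] as const;

type Strategy = "direct fix" | (typeof ESCALATION_STEPS)[number];

class EscalationTracker {
  private attempts = new Map<string, number>();

  // Returns the strategy to use for the next attempt on this failure.
  next(failureId: string): Strategy {
    const n = (this.attempts.get(failureId) ?? 0) + 1;
    this.attempts.set(failureId, n);
    if (n <= 3) return "direct fix";
    // Assumption: cycle through the steps so the loop never runs out of moves.
    return ESCALATION_STEPS[(n - 4) % ESCALATION_STEPS.length];
  }

  // Call once the failure passes, so a later regression starts fresh.
  resolved(failureId: string): void {
    this.attempts.delete(failureId);
  }
}
```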
Integration with Build State
Reading .autobuild/ state
If .autobuild/ exists, superval can:
- Skip stack detection -- use `config.json` stack info
- Know which files to check -- use `phases/*.json` file lists
- Compare claims -- autobuild's `verification.fresh_verification` vs superval's own results
- Understand failures -- read the `error` field for context on what went wrong
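Comparing autobuild's claims against superval's own results comes down to a diff in which superval always wins. This section does not specify the real schema of `verification.fresh_verification`, so the sketch assumes, hypothetically, that both sides have been reduced to a feature-to-passed map.

```typescript
// Hypothetical: results reduced to feature -> passed. The real .autobuild/ schema may differ.
type ResultMap = Record<string, boolean>;

interface Discrepancy {
  feature: string;
  claimed: boolean | undefined; // undefined: autobuild made no claim for this feature
  actual: boolean;              // superval's own fresh result, which is authoritative
}

// Reports every feature where the claim disagrees with superval's verification.
function compareClaims(claimed: ResultMap, actual: ResultMap): Discrepancy[] {
  return Object.entries(actual)
    .filter(([feature, passed]) => claimed[feature] !== passed)
    .map(([feature, passed]) => ({ feature, claimed: claimed[feature], actual: passed }));
}
```

Discrepancies are context for diagnosis only. They never change superval's verdict.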
Reading superbuild plan updates
If the plan has checked checkboxes (- [x]):
- Know what was claimed complete -- checked objectives
- Know quality gate claims -- checked DoD items
- Verify independently -- superbuild's self-reported status is not evidence
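Checked plan items can be extracted with a small task-list parser. The sketch assumes GitHub-style `- [x]` / `- [ ]` items, and `parseCheckboxes` and `claimsToVerify` are hypothetical names. Checked items feed a to-verify list and never count as a pass on their own.

```typescript
interface Checkbox { checked: boolean; text: string; }

// Matches task list items such as "- [x] text", "- [ ] text", or "* [X] text".
const TASK_ITEM = /^\s*[-*]\s+\[([ xX])\]\s+(.*)$/;

function parseCheckboxes(planMarkdown: string): Checkbox[] {
  const items: Checkbox[] = [];
  for (const line of planMarkdown.split("\n")) {
    const m = TASK_ITEM.exec(line);
    if (m) items.push({ checked: m[1] !== " ", text: (m[2] ?? "").trim() });
  }
  return items;
}

// Checked items are claims: they become things to verify, not evidence.
function claimsToVerify(planMarkdown: string): string[] {
  return parseCheckboxes(planMarkdown)
    .filter((c) => c.checked)
    .map((c) => c.text);
}
```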
Trust hierarchy
Plan document: SOURCE OF TRUTH (what should exist)
.autobuild/ state: EVIDENCE (what was attempted)
Plan checkboxes: CLAIMS (what was self-reported)
Superval verification: PROOF (what actually exists and works)
Superval trusts nothing. It verifies everything.
Output Artifacts
Superval writes its results to .superval/:
.superval/
report.json # Machine-readable traceability report
report.md # Human-readable report (same as terminal output)
acceptance-tests/ # Generated acceptance test files
structural.test.ts # Level 1 checks as test file
wiring.test.ts # Level 2 checks as test file
behavioral.test.ts # Level 3 acceptance tests
These files persist across validation attempts so progress can be tracked.
Quick Reference
Commands
| Action | Command |
|---|---|
| Detect stack | ./scripts/detect-test-framework.sh . |
| Run quality gates | npm run lint && npm run format && npm run typecheck && npm test |
| Run acceptance tests | npx vitest run .superval/acceptance-tests/ (or the interface-specific runner from 7d) |
| Smoke test | npm run build && node dist/cli.js --help |
Status Icons
| Icon | Meaning |
|---|---|
| PASS | Verified and working |
| FAIL | Verification failed (needs fix) |
| SKIP | Skipped (dependency failed or phase skipped) |
| N/A | Not applicable (config files, docs) |
Abort Conditions (only 2)
- No plan found -> Cannot validate without specification
- No test framework -> Cannot run behavioral verification
Everything else is fixable. Keep looping.
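The two abort conditions can be expressed as a single guard. The sketch below is a Node-only stand-in of our own, not the `detect-test-framework.sh` script. It assumes a JavaScript project whose `package.json` has already been parsed, and its framework list is illustrative rather than exhaustive.

```typescript
// Minimal subset of package.json needed for detection (assumed shape).
interface PackageJson {
  dependencies?: Record<string, string>;
  devDependencies?: Record<string, string>;
}

// Illustrative list of test frameworks; not exhaustive.
const KNOWN_FRAMEWORKS = ["vitest", "jest", "mocha", "@playwright/test", "cypress"];

function detectTestFramework(pkg: PackageJson): string | null {
  const deps = { ...pkg.dependencies, ...pkg.devDependencies };
  return KNOWN_FRAMEWORKS.find((name) => name in deps) ?? null;
}

// Returns an abort reason for the only two fatal conditions, otherwise null.
function abortReason(planFound: boolean, pkg: PackageJson): string | null {
  if (!planFound) return "No plan found -> Cannot validate without specification";
  if (detectTestFramework(pkg) === null) return "No test framework -> Cannot run behavioral verification";
  return null; // everything else is fixable: keep looping
}
```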
Common Mistakes
| Mistake | Fix |
|---|---|
| Trusting build state without verifying | Always run fresh verification |
| Skipping structural checks after behavioral fix | Always re-run all 3 levels |
| Stopping after partial pass | Loop until 100% pass |
| Importing source code in acceptance tests | Acceptance tests are BLACK BOX -- spawn process, hit URL, drive UI, never import |
| Picking automation tool based on project language | Match tool to USER INTERFACE: bash for CLI, Playwright for web, XCUITest for iOS, etc. |
| Generating tests that test implementation detail | Test user-visible behavior through the public interface only |
| Running acceptance tests against source (tsx/ts-node) | Run against the BUILT artifact (node dist/cli.js, not npx tsx src/cli.ts) |
| Using unit test patterns for TUI/desktop apps | TUI needs expect/pexpect, desktop needs accessibility API (osascript, UI Automation) |
| Checking only files from state, not from plan | Plan is the source of truth, not state files |
| Accepting "mostly works" | The plan is binary. It either matches or it doesn't. |
Red Flags -- STOP and Reassess
If you find yourself thinking:
- "Close enough" -- No. The plan is the spec. Match it exactly.
- "The tests pass so it's fine" -- No. Tests passing doesn't mean the feature is wired correctly.
- "That feature isn't important" -- No. If it's in the plan, it must be verified.
- "I'll skip this one" -- No. Every feature. Every level. Every time.
- "The user can verify this manually" -- No. Superval's job is automated proof.
These thoughts mean you're about to exit the loop prematurely. Don't.