jpskill.com

codex-review-cycle

Run a bounded 3-cycle interactive review-and-fix workflow against a user-chosen git review target (working-tree diff, current branch vs. auto-detected base, or an explicit commit/tag/branch ref) using the codex plugin. Each cycle invokes codex `review` or `adversarial-review --json`; Claude verifies each finding against a six-item validity checklist, calls `review-scope-guard` for Definition-of-Done triage, and the user picks which findings to fix before the next cycle. Covers both code diffs and markdown planning documents. Hard cap at 3 cycles. Use ONLY when the user explicitly asks to run the codex review cycle on working-tree changes, a committed branch diff, or an explicit base ref. Do NOT trigger for single-shot review requests, auto-hardening, background reviews, plan drafting, or when the chosen target would produce an empty diff.

⚡ Recommended: install with a single command (60 seconds)

Copy the command below and paste it into your terminal (Mac/Linux) or PowerShell (Windows). Downloading, extracting, and placing the files are all automatic.

🍎 Mac / 🐧 Linux
mkdir -p ~/.claude/skills && cd ~/.claude/skills && curl -L -o codex-review-cycle.zip https://jpskill.com/download/9631.zip && unzip -o codex-review-cycle.zip && rm codex-review-cycle.zip
🪟 Windows (PowerShell)
$d = "$env:USERPROFILE\.claude\skills"; ni -Force -ItemType Directory $d | Out-Null; iwr https://jpskill.com/download/9631.zip -OutFile "$d\codex-review-cycle.zip"; Expand-Archive "$d\codex-review-cycle.zip" -DestinationPath $d -Force; ri "$d\codex-review-cycle.zip"

When finished, restart Claude Code. After that, just talk to it as usual (for example, "make a video prompt") and the skill triggers automatically.

💾 Manual download (for anyone who finds the command difficult)
  1. Press the blue button below to download codex-review-cycle.zip
  2. Double-click the ZIP file to extract it → a codex-review-cycle folder is created
  3. Move that folder to C:\Users\<your name>\.claude\skills\ (Win) or ~/.claude/skills/ (Mac)
  4. Restart Claude Code

⚠️ Download and use at your own risk. This site accepts no responsibility for the content, behavior, or safety of the skill.

🎯 What this skill can do

The description below explains what this skill does for you. When you ask Claude for something in this area, the skill triggers automatically.

📦 Installation (3 steps)

  1. Press the "Download" button above to get the .skill file
  2. Change the file extension from .skill to .zip and extract it (macOS can extract it automatically)
  3. Put the extracted folder in .claude/skills/ under your home folder
    • macOS / Linux: ~/.claude/skills/
    • Windows: %USERPROFILE%\.claude\skills\

Restart Claude Code and you are done. Even without saying "use this skill…", it is invoked automatically for related requests.

Last updated: 2026-05-18
Retrieved: 2026-05-18
Bundled files: 1
📖 Original SKILL.md text that Claude reads

This body is the original text (English or Chinese) that the AI (Claude) reads. Japanese translations are being added over time.

Codex Review Cycle

Overview

A simple, user-driven review-and-fix workflow. Every cycle runs one codex review; Claude verifies each finding's validity and presents a summary that quotes codex's titles and recommendations verbatim; the user then picks which findings to address. Claude applies only the chosen fixes and loops. Three cycles is a hard cap: the loop never runs a fourth cycle unless the user starts a new invocation.

The skill is deliberately simple. It does not auto-fix, does not run parallel reviewers, does not compute stall fingerprints, does not manage autonomy bands, and does not delegate to rescue subagents. Claude is the only fix applier; the user is the only arbiter of which findings matter.

Language

All user-facing output is rendered in the user's language (the language the user has been using in the conversation, or as configured in the Claude Code system-level language setting). This section is the authoritative translation contract — any per-language sample reference (e.g. references/summary-samples.ja.md) is illustrative only and MUST NOT contradict these rules.

Translate into the user's language:

  • Section headings and column labels (Claude's note, Recommended action, Scope, Severity, etc. column header text)
  • Free-text fields Claude authors: Claude's note body, Recommended action values, fleet-rate warnings, stop-signal footer prose, termination messages, post-cycle review assessment
  • AskUserQuestion question, header, and option label / description fields

Keep verbatim (do NOT translate), regardless of user language:

  • Codex title field (surfaced in the Title (codex verbatim) column)
  • Codex recommendation field (quoted below the table per §Summary Output Template)
  • Severity values (high / medium / low) — codex output
  • Validity outcome keywords (valid / partially-valid / invalid)
  • Scope category names (must-fix / minimal-hygiene / reject-out-of-scope / reject-noise)
  • Stop-signal Status keywords (ACTIVE / ADVISORY / WARNING / silent)
  • Technical identifiers: file paths, git refs (SHAs, branch names), fingerprint, cluster_id, field names like applied_fixes, not_evaluated_signal_names
  • Cycle indices (cycle N, N/3)

For a Japanese rendering example that applies these rules, see references/summary-samples.ja.md. For German, Korean, or other languages, apply the same rules directly — the Japanese sample is an illustration, not a template to translate.
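The split can be made concrete with a minimal Python sketch. It passes codex-sourced fields and fixed keywords through untouched and sends only Claude-authored text to a translator. The row keys (title, severity, scope, note and so on) and the translate callback are hypothetical, because the skill states this rule in prose rather than code. The verbatim set below is a subset of the list above.

```python
# Minimal sketch of the Language contract: verbatim fields vs. translated fields.
# ASSUMPTION: row keys are hypothetical names chosen for illustration only.
from typing import Callable, Dict

# Fields rendered verbatim regardless of the user's language (subset).
VERBATIM_FIELDS = frozenset({
    "title",           # codex title ("Title (codex verbatim)" column)
    "recommendation",  # codex recommendation, quoted below the table
    "severity",        # high / medium / low
    "validity",        # valid / partially-valid / invalid
    "scope",           # must-fix / minimal-hygiene / reject-out-of-scope / reject-noise
    "file",            # technical identifiers such as file paths
})


def localize_row(row: Dict[str, str], translate: Callable[[str], str]) -> Dict[str, str]:
    """Translate Claude-authored text (e.g. the note); leave codex output untouched."""
    return {
        key: value if key in VERBATIM_FIELDS else translate(value)
        for key, value in row.items()
    }


if __name__ == "__main__":
    row = {"title": "Missing null check", "severity": "high",
           "note": "confirmed by reading cited lines"}
    print(localize_row(row, lambda s: "[ja] " + s))
```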

When to Use

Use this skill ONLY when:

  • The user explicitly asks to run the codex review cycle on one of: the current working-tree diff, the current branch vs. its base branch, or an explicit commit/tag/branch ref, and
  • The current working directory is a git repository, and
  • The chosen review target produces a non-empty diff (working tree has uncommitted changes, or HEAD is ahead of the chosen base ref).

Do NOT use this skill when:

  • The user wants a single codex review pass — use the codex plugin directly.
  • The resolved review target produces an empty diff — stop and tell the user.
  • The user is drafting a plan from scratch — use vibe-planning-guard instead.
  • A review is running as a background or automatic check.
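The trigger rule above comes down to three preconditions that must all hold, plus a few exclusions. The Python sketch below is illustrative only. TriggerContext and should_run_cycle are hypothetical names. In practice, Claude evaluates these conditions from the conversation and from git, not from boolean flags.

```python
# Hypothetical encoding of the "When to Use" rules; all names are illustrative.
from dataclasses import dataclass


@dataclass
class TriggerContext:
    explicit_cycle_request: bool    # user explicitly asked for the codex review cycle
    in_git_repo: bool               # `git rev-parse --is-inside-work-tree` succeeded
    diff_nonempty: bool             # the chosen review target yields a non-empty diff
    single_pass_only: bool = False  # one codex pass is enough: use the plugin directly
    background_check: bool = False  # automatic or background review
    drafting_plan: bool = False     # drafting a plan from scratch


def should_run_cycle(ctx: TriggerContext) -> bool:
    """All three preconditions must hold and no exclusion may apply."""
    if ctx.single_pass_only or ctx.background_check or ctx.drafting_plan:
        return False
    return ctx.explicit_cycle_request and ctx.in_git_repo and ctx.diff_nonempty
```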

Review Target Modes

The skill supports three review targets, chosen once at Phase 0 and fixed for all three cycles:

  • working-tree — uncommitted changes (tracked-modified + staged + untracked). Codex is invoked with --scope working-tree. Claude-side diff commands use git diff HEAD --name-only plus git ls-files --others --exclude-standard for untracked files.
  • branch — HEAD vs. the auto-detected default branch (tries local main/master/trunk first, then origin/*). Codex is invoked with --base <base_sha> (frozen SHA resolved from the detected ref at Phase 0). Claude-side diff commands use git diff --name-only <base_sha>...HEAD (triple-dot: merge-base semantics).
  • base-ref — HEAD vs. an explicit ref the user supplies (any commit SHA, tag, or branch name). Codex is invoked with --base <base_sha> (frozen SHA resolved from the user-supplied ref at Phase 0). Claude-side diff commands use git diff --name-only <base_sha>...HEAD.

The three modes share the same workflow — only the Phase 0 target-resolution step, the codex CLI flags, and the diff command Claude uses for validity checks differ.
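Here is a minimal Python sketch of the per-mode differences. It assumes the caller already holds the frozen base SHA from Phase 0. The helper names are hypothetical; only the git arguments and codex flags come from the text. detect_default_branch takes the repository's full ref names as input (for instance, from git for-each-ref --format='%(refname)'), so it can run without a repository.

```python
# Sketch: review-target mode -> codex flags and Claude-side git commands.
# Function names are hypothetical; git arguments and codex flags follow the text.
from typing import Iterable, List, Optional

DEFAULT_BRANCH_CANDIDATES = ("main", "master", "trunk")


def detect_default_branch(existing_refs: Iterable[str]) -> Optional[str]:
    """Local main/master/trunk first, then origin/<name>; None if nothing matches."""
    refs = set(existing_refs)
    for name in DEFAULT_BRANCH_CANDIDATES:
        if f"refs/heads/{name}" in refs:
            return name
    for name in DEFAULT_BRANCH_CANDIDATES:
        if f"refs/remotes/origin/{name}" in refs:
            return f"origin/{name}"
    return None


def _require_sha(scope: str, base_sha: Optional[str]) -> str:
    if not base_sha:
        raise ValueError(f"scope {scope!r} needs the frozen base SHA from Phase 0")
    return base_sha


def codex_scope_args(scope: str, base_sha: Optional[str] = None) -> List[str]:
    if scope == "working-tree":
        return ["--scope", "working-tree"]
    if scope in ("branch", "base-ref"):
        return ["--base", _require_sha(scope, base_sha)]
    raise ValueError(f"unknown scope: {scope!r}")


def claude_diff_commands(scope: str, base_sha: Optional[str] = None) -> List[List[str]]:
    if scope == "working-tree":
        # Tracked changes (staged + unstaged) vs. HEAD, plus untracked non-ignored files.
        return [["git", "diff", "HEAD", "--name-only"],
                ["git", "ls-files", "--others", "--exclude-standard"]]
    if scope in ("branch", "base-ref"):
        # Triple-dot: compare HEAD against the merge base with the frozen SHA.
        return [["git", "diff", "--name-only", f"{_require_sha(scope, base_sha)}...HEAD"]]
    raise ValueError(f"unknown scope: {scope!r}")
```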

Target Kinds

The skill auto-detects code vs plan from the diff file extensions. See references/focus-text.md for the detection rules. Mixed targets (any code file present) are treated as code; item 6 of references/validity-checklist.md filters out detailed-design findings on markdown files inside a code cycle.
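The extension rules live in references/focus-text.md and are not reproduced here. The sketch below assumes, purely for illustration, that .md and .markdown files are plan files. It demonstrates only the stated mixed-target rule: if any code file is present, the whole target is treated as code.

```python
# ASSUMPTION: the real detection rules are in references/focus-text.md; the
# extension set below is an illustrative guess, not the skill's actual list.
from pathlib import PurePosixPath
from typing import Iterable

PLAN_EXTENSIONS = frozenset({".md", ".markdown"})


def detect_target_kind(diff_files: Iterable[str]) -> str:
    files = list(diff_files)
    if not files:
        raise ValueError("empty diff: nothing to review")
    if all(PurePosixPath(path).suffix.lower() in PLAN_EXTENSIONS for path in files):
        return "plan"
    return "code"  # any non-plan file present -> the whole target is code
```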

Review Variant Selection

At Phase 0, ask the user once which codex review variant to use for all three cycles:

  • review — codex's native review command. Output is free-form text. Claude manually structures each finding section into title / recommendation / body before running the validity check.
  • adversarial-review — codex's adversarial review with --json. Output is structured findings[]. Each element has severity, file, line_start, title, recommendation, body. Adversarial cycles also carry a <review_context> block (see §Review Context Format) so codex keeps the same angle across cycles.

The choice is fixed for the whole loop. If the user wants to switch variants, they must restart the skill.

Recommendation: use adversarial-review unless there is a specific reason not to. The review variant is retained for environments where structured JSON output is unavailable or for users who specifically want free-form codex output, but it operates in a minimum-functionality mode: no <review_context> carry across cycles, no proposal-mode DoD (interview only), no V=0 cycle-N+1 override, no rejected-findings forwarding. Expect each cycle to behave like an independent single-shot review that Claude structures manually, with no adaptation between cycles. For multi-cycle review-and-fix workflows, adversarial-review is strictly the better path.
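The capability gap between the two variants fits in a small table. This Python sketch is a hypothetical encoding of the paragraph above; the class and field names are invented for illustration. can_offer_retry also applies the 3-cycle cap.

```python
# Illustrative capability table for the two review variants (names hypothetical).
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class VariantCapabilities:
    structured_json: bool        # findings[] arrive as JSON
    review_context_carry: bool   # <review_context> block passed every cycle
    dod_modes: Tuple[str, ...]   # DoD collection modes available
    zero_valid_override: bool    # may offer "Run cycle N+1" when V == 0
    rejected_forwarding: bool    # rejected findings forwarded to the next cycle


CAPABILITIES = {
    "adversarial-review": VariantCapabilities(
        True, True, ("interview", "proposal", "quick", "free-text"), True, True),
    "review": VariantCapabilities(False, False, ("interview",), False, False),
}


def can_offer_retry(variant: str, cycle: int) -> bool:
    """The V == 0 override needs adversarial-review and a cycle left under the cap."""
    return CAPABILITIES[variant].zero_valid_override and cycle < 3
```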

Workflow

Phase 0 — Preflight (runs once)

  1. Verify git repository.

    • Bash: git rev-parse --is-inside-work-tree — stop if not inside a git repo.
  2. Resolve the review target. Ask the user once, via AskUserQuestion, which review scope this cycle should cover. Offer three options:

    • working-tree — review the uncommitted diff (tracked-modified + staged + untracked).
    • branch — review HEAD vs. the auto-detected default branch.
    • base-ref — review HEAD vs. an explicit ref the user provides.

    After the scope choice:

    • working-tree: no additional input needed.
    • branch: auto-detect the default branch by trying main, master, trunk as local refs via git show-ref --verify --quiet refs/heads/<name>, then falling back to origin/<name>. If none resolve, stop with Could not auto-detect a base branch. Re-run with scope = base-ref and supply a ref explicitly.
    • base-ref: ask a follow-up free-form AskUserQuestion for the ref string. Validate it with git rev-parse --verify <ref> — if the command fails, stop with Base ref '<ref>' not found in this repository.

    Store the result as review_target. Fully construct the object in Phase 0 so proposal DoD mode in step 7 has the data it needs:

    • scope: one of working-tree, branch, base-ref.
    • base_ref: null for working-tree; the resolved ref string for branch and base-ref (kept as display metadata only).
    • base_sha: null for working-tree; for branch / base-ref, the immutable commit SHA resolved from base_ref at Phase 0 via git rev-parse <base_ref>. All subsequent commands across all 3 cycles (codex --base, diff commands, commit-range enumeration, validity-check scope diff, Part B ownership audit, soft-reset anchor) use base_sha, never base_ref, so a mutable ref (e.g. main, origin/main) that advances mid-run cannot drift the review target. If git rev-parse <base_ref> no longer returns base_sha at any later check (for example, the user manually updated the ref), print a one-line warning Base ref '<base_ref>' moved from <base_sha> during the run; continuing against the frozen SHA. and proceed.
    • diff_command: the exact git diff --name-only … command Claude reuses for target-kind detection and validity checks:
      • working-tree: git diff HEAD --name-only (paired with git ls-files --others --exclude-standard for untracked files)
      • branch / base-ref: git diff --name-only <base_sha>...HEAD (triple-dot: merge-base semantics; uses the frozen SHA, not the mutable base_ref, so target-kind detection and validity checks cannot drift if the named ref advances mid-run)
    • diff_files: the executed output of diff_command. For working-tree scope, this MUST be the union of git diff HEAD --name-only and git ls-files --others --exclude-standard (tracked-modified + staged + untracked); omitting untracked files would undercount the actual review surface. For branch / base-ref it is just the diff_command output.
    • diff_numstat: for branch / base-ref, git diff --numstat <base_sha>...HEAD. For working-tree, git diff --numstat HEAD PLUS a synthesized per-untracked-file line count (e.g. wc -l on each untracked file, emitted in the same <added>\t<deleted>\t<path> shape as numstat so the total LOC calculation is uniform). Without the untracked line counts, an untracked-only working-tree diff (100% new files) would report 0 numstat LOC and silently qualify for proposal-mode DoD with no commit messages or patch to ground intent. Used to size the diff for the proposal-mode threshold (≤ ~100 changed LOC).
    • commit_range: null for working-tree; <base_sha>..HEAD for branch / base-ref (double-dot, using the frozen SHA for commit-delta enumeration). NOTE: diff uses triple-dot (merge-base), while commit enumeration uses double-dot (exactly the commits on HEAD that are not on base). Using base_sha rather than base_ref keeps enumeration stable against mid-run ref movement.
    • commit_messages[]: [] for working-tree; for branch / base-ref, the output of git log --format='%s%n%b' <commit_range>, split and trimmed per commit. Derived from the frozen-SHA commit_range above. Proposal-mode DoD drafting reads these to ground item 1 (intent) and item 4 (out-of-scope) in what the commits actually claim.
    • diff_patch_excerpts: bounded, content-bearing evidence, i.e. a handful of representative files shown mostly in full (small untracked files, key tracked hunks), trimmed with [truncated — <M> more lines] when needed. Keep the total to roughly a few KB so the proposal-mode prompt stays manageable. The goal is "enough for Claude to infer intent and out-of-scope boundaries", not byte-exact compliance.
      • working-tree: always synthesize.
      • branch / base-ref: omit only when the proposal-mode evidence gate is already satisfied by commit messages (≥20-char subject + non-empty body in at least one commit in scope). If the evidence gate fails on messages alone — squashed / templated / vague commits — synthesize excerpts sourced exclusively from the target commit range (git diff <base_sha>...HEAD output), never from local working-tree state or untracked files, using the same bounded-budget shape as working-tree. If the range cannot yield a usable excerpt (binary-only, no textual diff), fall back to interview mode. This preserves the existing invariant that DoD drafting for branch/base-ref never anchors on a short squash-commit title AND never leaks out-of-range evidence into the proposal.

    Proposal-mode evidence gate: even when diff_numstat totals ≤ 100 LOC, proposal mode requires content-bearing evidence.

    • For working-tree scope: if commit_messages[] is empty AND diff_patch_excerpts has no non-blank content (e.g. all untracked files are empty or binary, or all tracked-modified hunks collapsed to no patch), fall back to interview mode — filenames and line counts alone cannot draft six DoD items with enough fidelity.
    • For branch / base-ref scope: commit messages alone are NOT sufficient evidence. Squashed, templated, or vague messages like "fix review comments", "wip", "update tests" can pass the LOC threshold while giving proposal mode no usable intent or out-of-scope signal. Require that commit_messages[] contain at least one commit with a subject of ≥20 characters AND a non-empty body, OR fall back to populating diff_patch_excerpts for branch/base-ref (same budget-based heuristic as working-tree) and passing it forward. If neither evidence path is available — all commit messages are short/empty and no patch excerpts are synthesized — fall back to interview mode. The risk this gate blocks is a DoD drafted from the title of a squash commit, which then anchors reject-out-of-scope decisions for the whole run.

    Every cycle reuses the same review_target so the diff scope stays stable even after fixes are applied.

  3. Verify the target has a non-empty diff.

    • working-tree: git status --porcelain must be non-empty. If empty, stop with No working-tree diff to review. The codex-review-cycle skill requires uncommitted changes when scope is working-tree.
    • branch / base-ref: git diff --name-only <base_sha>...HEAD (use the frozen SHA from step 2, not the mutable base_ref) must be non-empty. If empty, stop with No committed changes between <base_ref> (<base_sha>) and HEAD. The codex-review-cycle skill requires a non-empty diff for branch/base-ref scopes.
  4. Ensure codex is ready. Invoke Skill(codex:setup) once to confirm the codex CLI is configured. Stop if setup reports a blocking failure.

  5. Detect target kind.

    • Run review_target.diff_command. For working-tree, also run git ls-files --others --exclude-standard and union the untracked list with the diff output.
    • Apply the extension rules in references/focus-text.md.
    • Record target_kind as either code or plan.
  6. Ask for review variant (once, via AskUserQuestion). Two options: review and adversarial-review. Store the choice as variant.

  7. Pre-collect DoD (adversarial only) and initialize cycle state. Set rejected_ledger = [], cycle_history = [], dod = null.

    • If variant == adversarial-review, collect the six-item Definition of Done now by invoking the four-mode collection flow in skills/review-scope-guard/references/dod-template.md §Collection Modes, passing the fully-constructed review_target from Phase 0 step 2 (including diff_files, diff_numstat, commit_messages[]) as the proposal-mode input contract. Default to interview; use proposal when review_target.diff_numstat totals ≤ ~100 LOC AND commit-messages or patch excerpts provide content-bearing evidence; use quick when the diff is ≤ ~30 LOC AND the user explicitly said "quick DoD" / "minimal DoD" / similar; use free-text when the user has already pasted a DoD block in the conversation. If review_target is somehow incomplete (defensive check — Phase 0 step 2 should have populated every field), force interview mode per the scope-guard input contract. Cache the result on dod so <review_context cycle="1"> <intent> can be populated from DoD item 1 before step 8 runs. Pass the cached dod (not null) to review-scope-guard at step 10a so the scope-triage skill does not re-ask. This solves the cycle-1 dependency where <review_context> would otherwise need intent that had not yet been collected.
    • If variant == review, leave dod = null here. Native review does not carry <review_context>, so there is no early-intent dependency. Step 10a's first review-scope-guard invocation will collect DoD interactively at that point.

    Also record pre_cycle_1_head = git rev-parse HEAD — this is the anchor for the step 20 soft-reset at termination. For working-tree scope this value is unused.

    Subsequent cycles reuse the cached DoD and pass the running rejected_ledger / cycle_history forward.
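The Phase 0 evidence gate and the step 7 mode choice can be sketched as one pure function over the review_target fields. This sketch rests on stated assumptions:

- The function names are hypothetical.
- Changed LOC is taken as added plus deleted lines.
- Binary numstat rows count as zero.
- The precedence between free-text, quick, and proposal modes is a guess, because the text does not order them.

The ~100 and ~30 LOC thresholds are approximate in the source and are hard-coded here.

```python
# Sketch of DoD collection-mode selection from review_target evidence.
# ASSUMPTIONS: hypothetical names; mode precedence and LOC counting are guesses.
from typing import List, Sequence, Tuple

PROPOSAL_MAX_LOC = 100  # "≤ ~100 changed LOC"
QUICK_MAX_LOC = 30      # "≤ ~30 LOC"


def numstat_total(numstat_lines: Sequence[str]) -> int:
    """Sum added + deleted over `git diff --numstat` rows; binary rows ('-') add 0."""
    total = 0
    for line in numstat_lines:
        parts = line.split("\t")
        if len(parts) < 3:
            continue  # not a numstat row
        total += sum(int(n) for n in parts[:2] if n.isdigit())
    return total


def untracked_numstat_row(path: str, text: str) -> str:
    """Numstat-shaped row for an untracked file: every line counts as added.

    str.splitlines() also counts a final line lacking a trailing newline,
    which `wc -l` would not.
    """
    return f"{len(text.splitlines())}\t0\t{path}"


def has_content_evidence(commit_messages: List[Tuple[str, str]], patch_excerpts: str) -> bool:
    """Gate: non-blank excerpts, or one commit with a >= 20-char subject AND a body.

    Working-tree targets have no commits, so only excerpts can pass there.
    """
    if patch_excerpts.strip():
        return True
    return any(len(subject.strip()) >= 20 and body.strip()
               for subject, body in commit_messages)


def choose_dod_mode(numstat_lines: Sequence[str],
                    commit_messages: List[Tuple[str, str]],
                    patch_excerpts: str,
                    *,
                    user_pasted_dod: bool = False,
                    user_asked_quick: bool = False,
                    target_complete: bool = True) -> str:
    if not target_complete:
        return "interview"  # defensive fallback per the input contract
    if user_pasted_dod:
        return "free-text"
    loc = numstat_total(numstat_lines)
    if user_asked_quick and loc <= QUICK_MAX_LOC:
        return "quick"
    if loc <= PROPOSAL_MAX_LOC and has_content_evidence(commit_messages, patch_excerpts):
        return "proposal"
    return "interview"  # default
```

For example, a 45-line branch diff whose only commit is "fix review comments" with no body stays in interview mode. A commit such as "Add retry budget to uploader" with a non-empty body qualifies for proposal mode.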

Phase 1 — Review Cycle (repeats up to 3 times; counter N = 1..3)

  8. Run the review. Compute codex_scope_args from review_target.scope:

    • working-tree: --scope working-tree
    • branch: --base <review_target.base_sha> (frozen SHA from Phase 0; NOT base_ref, which is mutable)
    • base-ref: --base <review_target.base_sha>

    Cycle-N>1 preflight (branch / base-ref only): before invoking codex on cycle 2 or 3, verify the state between cycles is as expected. Let expected_commit = !cycle_history[N-1].no_fix_cycle (true for normal fix cycles, false for V=0 no-fix retries). Run the following single-pass check:

    • HEAD movement: compare git rev-parse HEAD against cycle_history[N-1].pre_pause_head.
      • If expected_commit is true: HEAD MUST have advanced. If equal, the user never committed — re-issue the step 14 manual-commit instruction.
      • If expected_commit is false (V=0 retry): HEAD MUST equal the stored head. If HEAD moved, the user pulled or committed unrelated work during the override pause; halt with ⚠️ HEAD changed during the V=0 override pause. Retry cycle would review an expanded target. Restart the skill or revert the changes.
    • Working-tree cleanliness:
      • When expected_commit is true: git status --porcelain -- <cycle N-1's touched_files> MUST be empty (path-restricted to the fix set; staged/unstaged remnants of applied fixes block the cycle). Untracked files unrelated to the review_target are exempt.
      • When expected_commit is false (V=0 retry, no touched_files exists): git status --porcelain with no path restriction MUST be empty, excepting untracked files unrelated to the review_target. This is strictly wider than the expected_commit=true check because no commit was made — any change to tracked files during the override pause would expand the review target and invalidate the retry. On failure, halt with ⚠️ Working tree changed during the V=0 override pause. Retry cycle would review an expanded target. Restart the skill or revert the changes.
    • Commit-delta coverage (only when expected_commit is true): git diff --name-only <pre_pause_head>..HEAD -- <touched files> must be non-empty AND must cover every file in cycle N-1's applied_fixes[*].touched_files[] list. Any touched file missing from this delta means the user's commit did not include that file. A legitimate fix that reverts a file back to base is still a valid commit delta even though the file disappears from <base_sha>...HEAD — this variant catches that case because it queries the commit-delta range, not the branch-total range. Skipped entirely for V=0 retries (no commits to audit).
    • Cycle-commit ownership (warn-and-confirm) (only when expected_commit is true): compare the full commit-delta path list against cycle N-1's touched_files. Run git diff --name-only <pre_pause_head>..HEAD (no path restriction) and let committed_paths be that output. Paths in committed_paths that are NOT in cycle N-1's touched_files[] are unrelated — typically lint autofixes, typo repairs, or adjacent cleanups the user bundled into the cycle commit. Rather than abort (previous behavior, which was hostile to git commit -am usage), surface them via a single AskUserQuestion:
      • question: "Cycle N-1 commit includes <K> path(s) that Claude did not touch: <full path list>. These will be preserved by the terminal soft-reset and ship in the final squash. Keep them as part of this review's squash, or abort for amend-drop?"
      • options:
        • Keep (continue to cycle N) — record the extras in cycle_history[N-1].unrelated_commit_paths[] for the step-20 Part B audit to surface again at terminal reset. Proceed to cycle N.
        • Abort to amend — print Amend your cycle N-1 commit to drop the unrelated paths, then reply continue. and pause the skill like the manual-commit gate in step 14.

      Rationale: the hard-abort form of this check rejected normal developer workflows. Warn-and-confirm preserves the signal (the user sees unrelated paths every cycle) without blocking lint-fix-plus-cycle-fix commits. This check is skipped entirely for V=0 retries.

    On any mismatch of the bullets above, do NOT proceed. Print a compact explanation naming the specific check that failed and re-issue the step 14 manual-commit instruction (or the V=0 restart message). Wait for the user to correct the state and reply continue. Do not silently review stale state.

    Then:

    • variant == review:
      node "${CLAUDE_PLUGIN_ROOT}/scripts/codex-companion.mjs" review --wait <codex_scope_args>

      Capture stdout as free-form text.

    • variant == adversarial-review:
      node "${CLAUDE_PLUGIN_ROOT}/scripts/codex-companion.mjs" adversarial-review --wait --json <codex_scope_args> "<focus_text_with_context>"

      <focus_text_with_context> is the target-kind focus text from references/focus-text.md followed by the <review_context> block (see §Review Context Format). Parse stdout as JSON.

    • Parse-retry policy (adversarial only): if JSON parsing fails or any required field is missing (findings[], severity, file, line_start, title, recommendation), retry the exact same call once. A second failure aborts the cycle, surfaces codex's raw stdout verbatim to the user, and ends the skill.
  9. Extract findings and assign IDs F1..Fn.

    • adversarial-review: use findings[] as-is.
    • review: Claude manually slices the free-form output into finding blocks. Each block must have a title (first line of the block, verbatim), a recommendation (the action codex suggests, verbatim), a best-effort file, and line_start (resolve from context, leave null if codex did not cite a location). Findings without at least a title and a recommendation are dropped with a note in the summary.
  10. Run the validity check silently. For every finding, run the six items in references/validity-checklist.md without echoing the per-item trace to the user. Every item still requires Claude to Read the cited file internally — do not trust codex's body alone — but file reads and item-by-item reasoning are internal only. Assign each finding a three-value outcome: valid, partially-valid, or invalid. Record a short Claude's note (≤20 words) for every finding regardless of outcome — for valid findings, note the primary reason the finding is grounded (e.g. "confirmed by reading cited lines", "DoD required feature violation"); for partially-valid/invalid, note the rejection reason. When multiple findings cite the same file, issue a single Read call covering the union of cited ranges and reuse the result for every item-2/item-3/item-4 check — do not re-read the same region per finding.

    External-source rule (warning-only): external reads (dependency crate sources, standard library docs, upstream README) are allowed as background evidence for Claude's internal reasoning, but they MUST NOT flip the validity verdict. The verdict is always determined from the review diff itself plus what the finding claims. If an external read contradicts or confirms the finding, record it as Claude's note: background — <source>: <what it showed> without changing the outcome. The silent-trace rule still holds for validity determined solely from the diff — the background note is only emitted when Claude actually consulted an external source. This rule replaces an earlier "External-source exception" that allowed verdict-flipping with version-pinned sources; in practice Claude cannot reliably pin dependency versions, and the safe constraint is to forbid verdict-flipping entirely.

    No severity-based tiering: item 3 (premise matches artifact) is mandatory for every finding that could become selectable. Read tiering was considered (skip item 3 on medium/low) but rejected: self-consistency between title and recommendation does not prove the artifact actually has the claimed behavior. Skipping item 3 would let invalid medium/low findings reach the user-selection UI, which is exactly the silent-hallucination failure mode the validity check exists to catch. The Read cost (1 Read per unique cited file, shared across findings in that file via the union rule above) is acceptable; tiering's savings do not justify the safety weakening.

  10a. Run scope triage via review-scope-guard. Invoke Skill(review-scope-guard) passing findings[] (with the validity outcomes already attached), the cached dod (pre-collected in step 7 when variant is adversarial; null on cycle 1 for review variant — the skill will collect it interactively then), the running rejected_ledger, cycle_history (for stop-signal evaluation), and review_target (already fully constructed in Phase 0 step 2 — pass it verbatim without re-deriving any field). Phase 0 step 2 guarantees review_target carries the full {scope, base_ref, base_sha, diff_command, diff_files, diff_numstat, commit_range, commit_messages[], diff_patch_excerpts} tuple; step 10a simply forwards it. Do not drop diff_patch_excerpts — scope-guard's proposal-mode evidence gate consumes it for working-tree targets where commit_messages[] is empty. The caller MUST pass review_target so scope-guard's proposal DoD mode has an authoritative source; without it, scope-guard falls back to interview mode (see scope-guard §Inputs). The skill returns a triage verdict per finding (must-fix / minimal-hygiene / reject-out-of-scope / reject-noise), an updated rejected_ledger, a set of active stop signals, and the collected dod (on cycle 1). Cache the DoD for later cycles. Store the triage verdicts alongside each finding for step 11.

    When DoD is missing, the skill still returns classifications inside the 4-category invariant (fall-through lands in minimal-hygiene); render the degraded-mode warning as documented in Failure Modes.

  11. Render the summary. Use the exact table format in §Summary Output Template. Every finding appears in the table, including invalid and reject-* ones. Every finding's recommendation field is quoted verbatim below the table (per §Summary Output Template). The active stop signals footer is rendered when (a) any signal has status ADVISORY/ACTIVE/WARNING, OR (b) any signal is not evaluated: metrics missing. Omit the footer only when every signal is truly silent. When the footer renders solely due to not evaluated rows, print a compact one-line notice — Not evaluated (metrics missing): <comma-separated signal names> — instead of the full signal table.

    Structurally-unevaluable compaction: subtract structurally_unevaluable_signal_names from the not_evaluated_signal_names set before rendering. The structurally-unevaluable names are shown once in cycle 1's footer as _Stop signals unavailable in codex-review-cycle integration: <names> (standalone invocation required for full 5-signal surface)._ and omitted from cycle 2+ footers entirely. This replaces the previous behavior where file-bloat / reactive-testing appeared in every cycle's Not evaluated list.

    Additionally, starting from cycle 2, compare the current-cycle not_evaluated_signal_names (taken from review-scope-guard's return value received in step 10a of the current cycle — NOT from cycle_history[current], which is only appended later in step 15) against cycle_history[N-1].not_evaluated_signal_names (the immediately previous cycle, not cycle 1) using the element-wise-equal semantics in review-scope-guard/references/stop-signals.md §Per-cycle suppression. Comparing against N-1 (not cycle 1) prevents flapping from being masked: a set that differs from cycle 1 → matches cycle 2 → differs from cycle 3 would otherwise be silently suppressed if only the cycle-1 baseline were checked. This ordering is required because step 11 runs before step 15 persists the current cycle's entry; reading cycle_history[current] at step 11 would read stale or empty state. If the two lists are equal, print _Not evaluated: unchanged from cycle N-1 — see cycle N-1 summary for signal list._ instead of re-listing the names. If they differ, re-render the full list AND add _Not evaluated delta vs cycle N-1: added=<names>, removed=<names>._ so the change is visible. The canonical order guarantees ordering-only differences cannot occur; guard for them anyway.

    Validity fleet-rate check (plan targets only, ≥5 findings): if the current cycle has ≥5 findings and 100% are classified valid, print a single-line calibration warning at the bottom of the summary: ⚠️ 100% valid rate with ≥5 findings is unusual for adversarial-review on plan targets. Re-scan for: (1) vague recommendations that should be 'partially-valid: vague', (2) already-handled premise that should be 'invalid: misread', (3) design-intent reversals that should route through scope triage as 'reject-out-of-scope' instead of being accepted as must-fix. This is a soft prompt, not a hard gate — the cycle proceeds normally. Raised threshold (was ≥3) and plan-only scope prevent false alarms on small focused diffs, where 3 valid findings is a normal outcome.

  12. Zero-valid check. Let V be the count of findings whose validity outcome is valid or partially-valid and whose scope category is must-fix or minimal-hygiene. reject-out-of-scope and reject-noise findings are never counted as selectable, even if their validity outcome was valid. If V == 0:

    • If N == 3 (final cycle), jump to Phase 2 Case A unconditionally — the cap has fired.
    • If variant == review (native), jump to Phase 2 Case A unconditionally. The V=0 override is not available for native review because the native review command accepts neither a focus-text argument nor a <review_context> block — there is no channel to deliver an <angle_request> instruction. Re-running the same command against the same diff would be a hidden no-op that still consumes one of the 3 cycles. The override is therefore scoped to variant == adversarial-review.
    • If N < 3 and variant == adversarial-review, issue a single AskUserQuestion before terminating:
      • question: "No selectable findings this cycle. Terminate the review, or run one more cycle with a different angle request?"
      • options (translate to user language per §Language):
        • Terminate now (Case A) — proceed to Phase 2 Case A.
        • Run cycle N+1 — proceed to the no-fix persist step below, then re-enter step 8 with N = N + 1. The next cycle's <review_context> carries a one-line <angle_request> element: <angle_request>Prior cycle produced 0 selectable findings. Try a materially different angle — e.g. a deeper root-cause pass, a different subsystem emphasis, or a scope that cuts across files not yet reviewed.</angle_request> inserted between <previous_fixes> and <rejected_findings>.
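The selectable count V and the V == 0 routing can be sketched as follows. The `Finding` dataclass and the returned labels (`case_a`, `ask_user`) are hypothetical stand-ins for however the skill tracks findings. Validity and scope must both pass before a finding counts toward V.

```python
from dataclasses import dataclass

SELECTABLE_VALIDITY = {"valid", "partially-valid"}
SELECTABLE_SCOPE = {"must-fix", "minimal-hygiene"}


@dataclass
class Finding:
    title: str
    validity: str  # valid | partially-valid | invalid
    scope: str     # must-fix | minimal-hygiene | reject-out-of-scope | reject-noise


def count_selectable(findings: list[Finding]) -> int:
    """V: findings whose validity AND scope both allow selection."""
    return sum(1 for f in findings
               if f.validity in SELECTABLE_VALIDITY and f.scope in SELECTABLE_SCOPE)


def route_zero_valid(cycle: int, variant: str) -> str:
    """Next move when V == 0 in cycle `cycle` (1-based, cap of 3)."""
    if cycle == 3:
        return "case_a"    # the cap fired
    if variant == "review":
        return "case_a"    # native review has no channel for <angle_request>
    return "ask_user"      # adversarial-review: offer terminate vs. run cycle N+1
```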

    No-fix cycle-history persist (before re-entering step 8): because V=0 means no selection and no fix phase, step 15's normal persistence never runs. Without explicit persistence here, the next cycle's <review_context> and its step 11 comparison against cycle_history[N-1] would read stale or empty state. Before returning to step 8, append an entry to cycle_history for the just-completed cycle with the following shape:

    • applied_fixes: [] (empty — no fix phase)
    • user_declined: [] (empty — no selection UI was opened)
    • skipped_for_scope: [] (empty)
    • claude_invalid: [] — populated from the current cycle's validity check (findings whose validity was invalid). This carries forward into the next cycle's <rejected_findings> via the normal union with the ledger.
    • not_evaluated_signal_names: the current-cycle return value from review-scope-guard step 11 (same value used for the step 11 footer comparison)
    • pre_pause_head: null for working-tree; git rev-parse HEAD otherwise (no branch/base-ref pause occurs on V=0 since there were no fixes to commit)
    • no_fix_cycle: true, an explicit marker that this cycle had no fix phase. The cycle-N>1 preflight in step 8 consumes this marker to set expected_commit = false: HEAD must be UNCHANGED (HEAD == pre_pause_head), the full working tree must be clean (no path restriction), and the commit-delta and ownership checks are skipped. See step 8 §Cycle-N>1 preflight for the full unified rule.

    Also persist the current rejected_ledger (which review-scope-guard already updated) for the next cycle's forwarding. This persistence is cheap (all buckets are empty except claude_invalid and not_evaluated_signal_names) but required for <review_context> correctness.

    The override path is bounded by the existing 3-cycle cap: requesting cycle N+1 from a V=0 state still consumes one of the 3 cycles. The user cannot escape the cap this way.
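For reference, the no-fix entry described above might look like this as a plain dictionary. The builder function is hypothetical, and `head_sha` is assumed to be the stdout of `git rev-parse HEAD`, captured by the caller.

```python
from typing import Optional


def no_fix_cycle_entry(claude_invalid: list[dict], not_evaluated_signal_names: list[str],
                       scope: str, head_sha: Optional[str]) -> dict:
    """cycle_history entry for a V=0 cycle that continues via the override."""
    return {
        "applied_fixes": [],        # no fix phase ran
        "user_declined": [],        # no selection UI was opened
        "skipped_for_scope": [],
        "claude_invalid": list(claude_invalid),   # forwarded to <rejected_findings>
        "not_evaluated_signal_names": list(not_evaluated_signal_names),
        # No pause happens on V=0, so this is simply the current HEAD.
        "pre_pause_head": None if scope == "working-tree" else head_sha,
        "no_fix_cycle": True,       # step 8 preflight: expected_commit = false
    }
```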

  13. Ask the user which findings to fix. Use AskUserQuestion(multiSelect: true) per §User Selection UI. Only findings with scope must-fix or minimal-hygiene appear as options (further filtered by validity to exclude invalid). reject-out-of-scope and reject-noise findings are never offered for selection; they live in the summary table for audit trail only. Always append a final None — skip all, end cycle option.

  13.5. Fix-weight precheck (self-discipline gate). Before applying any selected finding, verify that the planned edit matches the finding's scope classification. This check runs silently: it adds no user-visible output unless a mismatch is detected.

    • must-fix allows multi-line edits, new sections, flow changes, and cross-file edits within the review diff.
    • minimal-hygiene allows only 1-line edits, a single short paragraph addition, or a 1-sentence rule insertion. Edits that exceed this envelope indicate the finding should have been classified must-fix, not hygiene, and the rest of the workflow would miscount it.
    • On mismatch (a minimal-hygiene finding whose planned fix exceeds the hygiene envelope): either (a) simplify the edit to hygiene-scope and apply, or (b) raise an AskUserQuestion asking the user whether to re-classify the finding as must-fix before proceeding. Do not silently apply a must-fix-weight edit to a minimal-hygiene finding.
    • reject-* findings must not trigger any edit — skip entirely.
    • Rationale: without this gate, minimal-hygiene findings can receive multi-line structural edits, recreating the over-engineering pattern the skill is designed to prevent. This gate forces the classification and the applied weight to match.
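A rough sketch of the precheck follows. It assumes a planned edit can be summarized by the files it touches and its changed and added line counts. The `PlannedEdit` shape and the 5-line cap for a "short paragraph" are illustrative assumptions, not thresholds the skill defines.

```python
from dataclasses import dataclass

# Illustrative cap for "a single short paragraph addition"; not a spec value.
HYGIENE_MAX_ADDED_LINES = 5


@dataclass
class PlannedEdit:
    files: set[str]        # files the edit would touch
    changed_lines: int     # existing lines modified
    added_lines: int       # new lines inserted (assumed one contiguous block)
    adds_section: bool     # new heading/section or control-flow change


def fix_weight_precheck(scope: str, edit: PlannedEdit) -> str:
    """Return 'apply', 'skip', or 'mismatch' (simplify, or ask to re-classify)."""
    if scope.startswith("reject-"):
        return "skip"                      # reject-* findings never trigger edits
    if scope == "must-fix":
        return "apply"                     # multi-line and cross-file edits allowed
    if scope != "minimal-hygiene":
        raise ValueError(f"unknown scope {scope!r}")
    one_line = edit.changed_lines <= 1 and edit.added_lines == 0
    short_block = edit.changed_lines == 0 and 0 < edit.added_lines <= HYGIENE_MAX_ADDED_LINES
    within = len(edit.files) == 1 and not edit.adds_section and (one_line or short_block)
    return "apply" if within else "mismatch"
```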
  14. Apply fixes. For each selected finding, Claude reads the cited lines, applies the fix via Edit or Write, and reports the resulting git diff for the touched files. No sync-sweep, no rescue delegation.

    Write-scope boundary: Claude edits only files present in review_target.diff_command output (plus untracked files for working-tree scope). If a finding's fix genuinely needs an out-of-diff file, skip the finding with a note Skipped: requires out-of-diff write. Out-of-diff writes are a scope expansion that must go through a separate skill invocation, not through the user-selection UI.
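The write-scope boundary reduces to a set check. This sketch assumes the caller has already listed the review diff's paths (for example with `--name-only`) and run `git ls-files --others --exclude-standard`, then passes their stdout in. The helper names are hypothetical.

```python
from typing import Optional


def allowed_write_set(diff_names: str, untracked: str, scope: str) -> set[str]:
    """Files Claude may edit: review-diff paths, plus untracked files for working-tree."""
    allowed = {line for line in diff_names.splitlines() if line}
    if scope == "working-tree":
        allowed |= {line for line in untracked.splitlines() if line}
    return allowed


def out_of_diff_note(files_needed: set[str], allowed: set[str]) -> Optional[str]:
    """None when the fix stays inside the boundary, else the spec's skip note."""
    if files_needed - allowed:
        return "Skipped: requires out-of-diff write"
    return None
```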

    How the fixes become visible to the next cycle depends on review_target.scope:

    • working-tree: fixes are left in the working tree. Cycle N+1's --scope working-tree review sees the staged + unstaged + untracked state directly. No commit is needed.
    • branch / base-ref: codex's branch diff is computed as <merge-base>..HEAD, so in-place edits are invisible until they land in a commit on HEAD. Claude does not commit on the user's behalf. Before printing the manual-commit instruction, record pre_pause_head = git rev-parse HEAD into cycle_history[current].pre_pause_head — the next cycle's preflight uses this anchor (plus the per-fix touched_files[] list from step 15) to verify the user's actual commit delta, not just worktree cleanliness. Then, after all selected fixes are applied this cycle, print a manual-commit instruction and pause the skill:
      Cycle N fixes applied to working tree. Branch/base-ref scopes require you to commit these changes before cycle N+1 can see them. Recommended commands:
        git add <touched files>
        git commit -m "review cycle N fixes"
      After committing, reply `continue` to proceed to cycle N+1. Reply `stop` to end the skill here.

      The user owns pre-commit hook outcomes, clean-index concerns, and rollback. If the user replies stop, end the skill in Case B-like state (applied fixes remain uncommitted in the working tree; the user can deal with them however they like). If the user replies continue, proceed to step 8's cycle-N>1 preflight which verifies git rev-parse HEAD has moved.
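The next cycle's preflight can be sketched as below. This is a partial illustration for branch / base-ref scope only. It assumes the caller supplies the stdout of `git rev-parse HEAD`, the paths from `git diff --name-only <pre_pause_head>..HEAD`, and the stdout of `git status --porcelain`. The unified rule in step 8 (including worktree cleanliness on fix cycles and the Keep/ownership gate UI) is not reproduced here, and the function name is hypothetical.

```python
def preflight(prev_entry: dict, head_now: str, commit_delta: set[str],
              porcelain: str) -> list[str]:
    """Problems blocking cycle N+1 (an empty list means proceed).

    prev_entry:   cycle_history entry for the cycle that just paused.
    head_now:     stdout of `git rev-parse HEAD`.
    commit_delta: paths from `git diff --name-only <pre_pause_head>..HEAD`.
    porcelain:    stdout of `git status --porcelain` (only used for no-fix cycles here).
    """
    problems = []
    pre = prev_entry["pre_pause_head"]
    if prev_entry.get("no_fix_cycle"):
        # expected_commit = false: HEAD unchanged and the whole tree clean.
        if head_now != pre:
            problems.append("HEAD moved during a no-fix cycle")
        if porcelain.strip():
            problems.append("working tree is not clean")
        return problems
    # expected_commit = true: the user must have committed this cycle's fixes.
    if head_now == pre:
        problems.append("HEAD has not moved: commit the cycle fixes first")
    touched = {path for fix in prev_entry["applied_fixes"] for path in fix["touched_files"]}
    missing = touched - commit_delta
    if missing:
        problems.append("applied-fix files missing from commit: " + ", ".join(sorted(missing)))
    unrelated = commit_delta - touched
    if unrelated:
        problems.append("unrelated paths in commit (ownership gate): " + ", ".join(sorted(unrelated)))
    return problems
```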

    Sibling-doc cascade check: when a fix changes a user-facing contract of the skill (adds a new side effect the skill did not previously have, changes a stated invariant, introduces a step that sibling docs describe as absent), Claude must in the same edit pass grep sibling docs (README.md, other SKILL.md sections, CHANGELOG entries for the current release) for claims describing the OLD behavior, and update every match. Specifically run rg -n '<characteristic phrase from old behavior>' . for at least one phrase, and either edit every hit or leave an explicit NOTE comment explaining why a mismatch is acceptable. Rationale: catching contract-breaking fixes in the same edit pass prevents silent contract breaks that would only surface in a later cycle.

  15. Update cycle history and ledger. Append to cycle_history an entry for this cycle recording:

    • applied_fixes[] — each entry records {fingerprint, title, file, line_start, scope_category, touched_files[]}. fingerprint is the stable {normalized_title, file, line_start, scope_category} tuple used by step 17's residual matcher. touched_files[] is the exact list of files Claude edited while applying the finding — the preflight in step 8 consumes this list to verify those files are visible in cycle N+1's branch diff.
    • user_declined[] — each entry records {fingerprint, title, file, line_start, scope_category} for must-fix/minimal-hygiene findings the user did not select (including the None — skip all case).
    • skipped_for_scope[] — each entry records {fingerprint, title, file, line_start, scope_category, reason} for findings the user selected but Claude skipped because their fix required an out-of-diff write (see step 14 Write-scope boundary). These count as unresolved at termination time — Case A lists them alongside user-declined carry-overs and must not claim clean resolution while the bucket is non-empty.
    • claude_invalid[] — each entry records {fingerprint, title, file, line_start, rejection_reason} for invalid findings from the validity check.
    • not_evaluated_signal_names[] — the ordered string array returned by review-scope-guard step 11. Stored verbatim, no mutation. Used by step 11's footer rendering in cycle N+1 to decide whether to suppress the not evaluated footnote.
    • unrelated_commit_paths[] — optional, populated only when the user chose Keep at the cycle-N>1 ownership gate. Lists paths from the cycle commit that were NOT in applied_fixes[*].touched_files[]. The step-20 Part B terminal audit consumes this list to display the unrelated paths one more time before the final squash, so the user can decide anew whether to include them in the final commit.

    All four buckets carry fingerprints so step 17's residual accounting matches on the stable {normalized_title, file, line_start, scope_category} tuple, not on title alone.

    The rejected_ledger returned by step 10a is already updated with reject-out-of-scope and reject-noise entries; persist it as-is for the next cycle. The next cycle's <review_context> <rejected_findings> block is populated from the union of ledger entries and claude_invalid only — not from user_declined[] or skipped_for_scope[]. Declines and out-of-diff skips are deferrals, not rejections: leaving them out of <rejected_findings> lets codex freely re-raise the same findings next cycle so the user can reconsider them. Termination-time accounting still tracks them as unresolved residuals (see step 17 Case A).
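Fingerprinting and the `<rejected_findings>` union can be sketched as follows. The title normalization here (collapse whitespace, lowercase) is a placeholder assumption; the skill reuses review-scope-guard's ledger rule, which is not shown in this section. Deduplicating on fingerprint is likewise an assumed reading of "union", and all names are hypothetical.

```python
import re
from typing import NamedTuple


class Fingerprint(NamedTuple):
    normalized_title: str
    file: str
    line_start: int
    scope_category: str


def normalize_title(title: str) -> str:
    # Placeholder normalization; the skill reuses review-scope-guard's rule.
    return re.sub(r"\s+", " ", title).strip().lower()


def fingerprint(title: str, file: str, line_start: int, scope_category: str) -> Fingerprint:
    return Fingerprint(normalize_title(title), file, line_start, scope_category)


def rejected_for_next_cycle(ledger: list[dict], claude_invalid: list[dict]) -> list[dict]:
    """Ledger rejections plus claude_invalid, deduplicated on fingerprint.

    user_declined and skipped_for_scope are deliberately NOT inputs: they are
    deferrals, so codex stays free to re-raise them next cycle.
    """
    seen, merged = set(), []
    for entry in ledger + claude_invalid:
        if entry["fingerprint"] not in seen:
            seen.add(entry["fingerprint"])
            merged.append(entry)
    return merged
```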

  16. Loop check.

    • N < 3: set N = N + 1, return to step 8.
    • N == 3: always jump to Phase 2 Case A. The Case A routing internally chooses between the clean-termination variant and the residual-carried-forward variant based on whether cycle_history[*].user_declined[] + cycle_history[*].skipped_for_scope[] leave any unresolved residuals (see step 17). Final-cycle user declines are handled by the residual variant, not by Case B — the user explicitly dispositioned each finding through the selection UI, which is an active close-out, not a cap failure.
    • Case B is reserved for an explicit cap-stop condition where the cycle could not run the user-selection UI to completion (e.g. the user interrupted mid-paging during an overflow batch, or the skill aborted before step 13). Normal 3-cycle completion with some user declines is Case A residual, not Case B.

Phase 2 — Termination

  17. Case A — normal termination. Compute the full residual set: scan cycle_history[*].user_declined[] and cycle_history[*].skipped_for_scope[] across all prior cycles. For each, compute a stable fingerprint {normalized_title, file, line_start, scope_category} (same format as review-scope-guard's ledger fingerprint; reuse that rule). A residual is "carried" if no later cycle's applied_fixes[] contains an entry with a matching fingerprint. Matching on title alone is forbidden because generic adversarial titles collide across unrelated findings and could silently clear a residual. If the carried residual set is empty, print All findings resolved after N cycle(s). (the clean-termination variant). Otherwise print Review cycle terminated after N cycle(s) with residuals carried forward. (never the "resolved" line), followed by the User-declined valid findings carried to termination: and Out-of-diff skipped findings carried to termination: lists, with each entry shown as <title> (<file>:<line_start>, declined in cycle N) or <title> (<file>:<line_start>, skipped in cycle N) so the user can audit. Either way, also print the mandatory ⚠️ No automated verification was run warning and the per-cycle applied fixes summary.
  18. Case B — cap reached. Print the template in §Termination Criteria Case B. Do not automatically start a fourth cycle. Tell the user they can re-invoke the skill to run another 3-cycle pass.
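The Case A residual scan can be sketched as a pure function over `cycle_history`. This assumes each bucket entry already carries a hashable `fingerprint` (for example, the tuple from step 15) and that `cycle_history[i]` holds cycle `i + 1`. The function name is hypothetical. Matching is on the fingerprint only, never on the title.

```python
def carried_residuals(cycle_history: list[dict]) -> dict[str, list[dict]]:
    """Residuals still open at termination, per bucket.

    cycle_history[i] is cycle i + 1. An entry is carried unless a LATER cycle's
    applied_fixes[] contains the same fingerprint (never matched on title).
    """
    carried: dict[str, list[dict]] = {"user_declined": [], "skipped_for_scope": []}
    for i, entry in enumerate(cycle_history):
        later_fixed = {fix["fingerprint"]
                       for later in cycle_history[i + 1:]
                       for fix in later["applied_fixes"]}
        for bucket, items in carried.items():
            for item in entry.get(bucket, []):
                if item["fingerprint"] not in later_fixed:
                    items.append({**item, "cycle": i + 1})
    return carried
```

Both buckets empty selects the clean-termination variant; anything else selects the residual variant.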

Review Context Format

Used only when variant == adversarial-review. The block is appended to the focus text argument with a single blank line between the two sections:

<review_context cycle="N">
  <intent><![CDATA[<one-sentence change intent from Phase 0 step 7 DoD item 1>]]></intent>
  <previous_fixes>
    <fix cycle="N-1" category="must-fix|minimal-hygiene"><![CDATA[<title>: <≤40 word summary>]]></fix>
  </previous_fixes>
  <angle_request><![CDATA[<one sentence; present only when V=0 override fired in the previous cycle>]]></angle_request>
  <rejected_findings>
    <rejected cycle="N-1" reason="invalid: file not in diff"><![CDATA[<finding title>]]></rejected>
    <rejected cycle="N-1" reason="reject-out-of-scope: DoD explicit out-of-scope"><![CDATA[<finding title>]]></rejected>
  </rejected_findings>
</review_context>
  • <angle_request> element (optional): present only when the previous cycle reached the step 12 zero-valid check (V == 0) and the user chose Run cycle N+1. It contains a single sentence asking codex to try a different angle. Omit the element entirely in every other cycle.

Template note: this block never carries user-declined findings. A user decline is a deferral — codex should remain free to re-raise the same finding next cycle so the user can reconsider. If a template reader is tempted to add a <rejected reason="user declined"> element, stop: that would let declined valid findings disappear from subsequent cycles and make Case A falsely claim resolution.

Rules:

  • Cycle 1 carries <intent> (populated from Phase 0 step 7 DoD item 1 pre-collection); <previous_fixes> and <rejected_findings> are empty.
  • <review_context> is preceded by this literal instruction, on its own line: Do not re-report findings in <rejected_findings> unless you have a materially different angle.
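  • Every user-facing string inside a <![CDATA[ ... ]]> section is quoted as-is: no JSON encoding, no HTML entity escaping. The CDATA wrapper keeps any <, >, & in codex output from terminating the block. The one sequence a CDATA section cannot contain is ]]>; if it appears in codex output, split it as ]]]]><![CDATA[> so the section does not close early.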
  • This skill does not use a separate skip ledger. <review_context> is the only cross-cycle carry.
  • <previous_fixes> window: the block carries only the immediately prior cycle (N-1), not a cumulative history. Cycle 3's <review_context> lists only the fixes applied in cycle 2; it does NOT also enumerate cycle 1's fixes. Each <fix> element uses the compact form <fix cycle="N-1" category="must-fix|minimal-hygiene"><![CDATA[<title>: <≤40 word summary>]]></fix>; summaries longer than 40 words are forbidden. Codex only needs the latest ground truth for cross-cycle suppression; older history would inflate the context block without improving review quality. V=0 exception: when cycle_history[N-1].no_fix_cycle == true (the prior cycle was a V=0 override retry and emitted no fixes), cycle N's <previous_fixes> skips the empty cycle N-1 and carries the fixes from cycle N-2 instead. Without this exception, codex would lose sight of cycle 1's applied fixes whenever cycle 2 was a V=0 no-fix cycle, and could re-surface already-fixed findings in cycle 3.
  • <rejected_findings> sources: the block aggregates two kinds of prior-cycle rejections — (1) entries in the rejected_ledger returned by review-scope-guard (scope-triage rejections: reject-out-of-scope / reject-noise), and (2) claude_invalid[] from the prior cycle's validity check. Each rejection renders as its own <rejected> element with the reason attribute carrying the original category and rationale (e.g. reason="reject-out-of-scope: DoD explicit out-of-scope", reason="invalid: file not in diff"). Ledger entries with count >= 2 render with an extra hint: reason="reject-noise: already-rejected (count=N)" so codex sees how persistent the complaint is. User-declined findings are NOT included — a decline is a deferral, not a rejection, and codex is free to re-raise the same finding in the next cycle so the user can reconsider it.
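A sketch of assembling the block, assuming fixes and rejections arrive as simple dictionaries (the field names `cycle`, `category`, `title`, `summary`, `reason` and the helper names are hypothetical). It enforces the 40-word summary cap, omits `<angle_request>` unless one is supplied, escapes quotes in the `reason` attribute, and splits any `]]>` so codex text cannot close a CDATA section early.

```python
from typing import Optional
from xml.sax.saxutils import escape

PREAMBLE = ("Do not re-report findings in <rejected_findings> unless you have "
            "a materially different angle.")


def cdata(text: str) -> str:
    """Wrap text verbatim in CDATA; ']]>' is split across two adjacent sections."""
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"


def build_review_context(n: int, intent: str, fixes: list[dict], rejected: list[dict],
                         angle_request: Optional[str] = None) -> str:
    out = [f'<review_context cycle="{n}">', f"  <intent>{cdata(intent)}</intent>",
           "  <previous_fixes>"]
    for fix in fixes:  # cycle N-1 only (N-2 when N-1 was a no-fix V=0 cycle)
        if len(fix["summary"].split()) > 40:
            raise ValueError(f"summary for {fix['title']!r} exceeds 40 words")
        out.append(f'    <fix cycle="{fix["cycle"]}" category="{fix["category"]}">'
                   f'{cdata(fix["title"] + ": " + fix["summary"])}</fix>')
    out.append("  </previous_fixes>")
    if angle_request:  # omitted entirely unless the V=0 override fired
        out.append(f"  <angle_request>{cdata(angle_request)}</angle_request>")
    out.append("  <rejected_findings>")
    for r in rejected:  # ledger entries + claude_invalid; never user declines
        reason = escape(r["reason"], {'"': "&quot;"})
        out.append(f'    <rejected cycle="{r["cycle"]}" reason="{reason}">'
                   f'{cdata(r["title"])}</rejected>')
    out += ["  </rejected_findings>", "</review_context>"]
    return "\n".join(out)


def compose_focus(focus_text: str, context: str) -> str:
    # One blank line between focus text and the context section; the literal
    # instruction sits on its own line directly before <review_context>.
    return f"{focus_text}\n\n{PREAMBLE}\n{context}"
```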

Validity Check Summary

Full details live in references/validity-checklist.md. The six items are:

  1. File exists in the diff: finding.file appears in the output of review_target.diff_command (plus git ls-files --others --exclude-standard when review_target.scope == working-tree).
  2. Line range exists: finding.line_start is within the current file length; flag shifted ranges as partially-valid.
  3. Premise matches artifact: Claude reads the cited lines and confirms codex's assertion.
  4. Scope: line_start..line_end overlaps a changed hunk in the scope-appropriate diff (git diff HEAD -- <file> for working-tree; git diff <base_sha>...HEAD -- <file> for branch / base-ref, using the frozen Phase-0 SHA), not unchanged code in a touched file.
  5. Recommendation concreteness: a specific failure mode is named, not a vague "consider…".
  6. Target-kind consistency: plan cycles reject detailed-design nitpicks on .md/.markdown/.txt files.

Outcome: valid (all pass), partially-valid (items 2 or 5 returned partially-valid, no invalid), invalid (any of items 1, 3, 4, 6 returned invalid).
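The outcome rule can be expressed as a short aggregation. This sketch assumes each checklist item reports `pass`, `partially-valid`, or `invalid`. It treats the item/result pairings in the outcome sentence as a contract, so it rejects, for example, an `invalid` from item 2 rather than silently ignoring it. The full checklist in references/validity-checklist.md may define this differently, and the function name is hypothetical.

```python
INVALID_CAPABLE = (1, 3, 4, 6)   # items that may return 'invalid'
PARTIAL_CAPABLE = (2, 5)         # items that may return 'partially-valid'


def validity_outcome(results: dict[int, str]) -> str:
    """Aggregate six checklist results into valid / partially-valid / invalid."""
    if set(results) != {1, 2, 3, 4, 5, 6}:
        raise ValueError("all six checklist items must be evaluated")
    for item, result in results.items():
        allowed = {"pass", "invalid"} if item in INVALID_CAPABLE else {"pass", "partially-valid"}
        if result not in allowed:
            raise ValueError(f"item {item} cannot return {result!r}")
    if any(results[i] == "invalid" for i in INVALID_CAPABLE):
        return "invalid"
    if any(results[i] == "partially-valid" for i in PARTIAL_CAPABLE):
        return "partially-valid"
    return "valid"
```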

Summary Output Template

Language reinforcement: the template below uses English for readability of the SKILL.md spec itself. When rendering actual output, translate ALL non-verbatim elements to the user's language per §Language: section headers, column headers (except Title (codex verbatim)), Claude's note content, Recommended action values, the recommendation block heading, stop-signal footer text, and termination messages. Only codex's title and recommendation fields stay in their original language (they are contractually verbatim).

Render after every cycle, before the user selection prompt:

### Cycle N review summary (variant: <review|adversarial-review>, target: <code|plan>)

| ID | Severity | File:Line | Title (codex verbatim) | Validity | Scope | Claude's note | Recommended action |
|----|----------|-----------|------------------------|----------|-------|---------------|--------------------|
| F1 | high     | src/auth/login.ts:42 | Missing null check on userId    | valid            | must-fix            | DoD required features; core correctness        | Apply fix            |
| F2 | medium   | src/api/user.ts:88   | Consider adding retry logic     | partially-valid  | reject-noise        | vague, no concrete failure mode                 | Skip                 |
| F3 | low      | docs/plan.md:15      | Rename process to handler       | invalid          | reject-noise        | detailed-design on plan target                  | Skip                 |
| F4 | medium   | src/curl.rs:130      | --url-query value leaks to URL  | valid            | minimal-hygiene     | value consume + warn; semantics NOT implemented | Apply 1-line hygiene |
| F5 | medium   | src/curl.rs:120      | Implement --json shorthand body | valid            | reject-out-of-scope | DoD explicit out-of-scope: cURL 7.82+ new       | Skip (ledger fwd)    |

**Recommendation (per finding)**:

- **F1**: <codex recommendation verbatim>
- **F2**: <codex recommendation verbatim>
...

Quote every finding's `recommendation` field verbatim below the table. Do not skip quoting even when the title seems to imply the recommendation — the user needs the full recommendation text to make an informed fix/decline decision without reading the raw codex JSON.

**Active stop signals** (footer rendered when ≥1 signal is `ADVISORY`/`ACTIVE`/`WARNING` **or** `not evaluated: metrics missing`; omit entirely only when all signals are truly `silent`. When only `not evaluated` rows exist, replace the full table with a compact one-liner `Not evaluated (metrics missing): <names>`):

| Signal | Status | Evidence |
|--------|--------|----------|
| ...    | ...    | ...      |

Format rules that protect finding intent

  • The Title (codex verbatim) column must contain codex's title field exactly. No paraphrase, no shortening, no translation.
  • The Recommendation (per finding) block must contain each finding's full recommendation field verbatim, regardless of length. Never truncate, summarize, or abbreviate — the user needs the complete remediation text to make an informed fix/decline decision.
  • Claude's interpretation lives only in the Claude's note column and the Recommended action column. Do not edit any other column based on what Claude thinks the finding "really means".
  • If Claude judges a finding invalid, the row still appears in the table with the original title and recommendation. The Claude's note column then carries invalid because <reason>.
  • If review-scope-guard classifies a finding as reject-out-of-scope or reject-noise, the row still appears in the table for audit. The Scope column carries the category and Claude's note carries the triage rationale verbatim from the skill's output.
  • Severity values come from codex. Do not upgrade or downgrade severity based on Claude's validity or scope verdict.

User Selection UI

Language reinforcement: AskUserQuestion question, header, and option label/description fields must be in the user's language per §Language. Codex verbatim titles embedded in labels stay in their original language.

Use AskUserQuestion with multiSelect: true. Only findings whose scope is must-fix or minimal-hygiene AND whose validity is valid or partially-valid appear as options. invalid, reject-out-of-scope, and reject-noise findings are never selectable — the user sees them in the summary table above for audit trail only.

minimal-hygiene options include a (hygiene) marker in the label so the user knows the expected fix stays inside the hygiene envelope (a 1-line edit such as value consume + warn, a single short paragraph, or a 1-sentence rule), not a full implementation.

Base layout. Token rule: each option's description field must carry only the finding's file:line — nothing else. The label already encodes the title, severity, and scope; the summary table above already carries rationale and Claude's note. Repeating any of that in the description is wasted context.

question: "Which findings should I address in cycle N?"
header: "Cycle N fixes"
multiSelect: true
options:
  - { label: "F1: Missing null check on userId (high, must-fix)",            description: "src/auth/login.ts:42" }
  - { label: "F4: --url-query value leaks to URL (medium, hygiene)",         description: "src/curl.rs:130" }
  - { label: "None — skip all, end cycle",                                   description: "End this cycle" }

Overflow handling (more than 3 selectable findings per severity)

AskUserQuestion accepts at most 4 options per question; reserve one for None — skip all, end cycle, leaving 3 finding slots per question. When a severity bucket has more than 3 selectable findings, issue multiple sequential AskUserQuestion calls (3 findings each) in severity order until every selectable finding has been surfaced. No finding may be silently deferred just because it did not fit on a page: the fix phase does not begin until every selectable finding has been shown to the user and either selected or declined.
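Paging can be sketched as a severity sort followed by fixed-size chunks. This flattens all severities into one ordered queue, which is one reading of "in severity order". The severity scale, the finding dictionary fields, and the helper names are assumptions.

```python
# Assumed severity scale; unknown severities sort last.
SEVERITY_ORDER = {"critical": 0, "high": 1, "medium": 2, "low": 3}
NONE_OPTION = {"label": "None \u2014 skip all, end cycle", "description": "End this cycle"}


def selection_pages(findings: list[dict], per_page: int = 3) -> list[list[dict]]:
    """Severity-ordered pages of at most `per_page` findings (stable within a severity)."""
    ordered = sorted(findings, key=lambda f: SEVERITY_ORDER.get(f["severity"], len(SEVERITY_ORDER)))
    return [ordered[i:i + per_page] for i in range(0, len(ordered), per_page)]


def page_options(page: list[dict]) -> list[dict]:
    """AskUserQuestion options for one page: up to 3 findings plus the None option."""
    options = []
    for f in page:
        tag = "hygiene" if f["scope"] == "minimal-hygiene" else f["scope"]
        options.append({
            "label": f'{f["id"]}: {f["title"]} ({f["severity"]}, {tag})',
            "description": f'{f["file"]}:{f["line"]}',  # file:line only, per the token rule
        })
    options.append(dict(NONE_OPTION))
    return options
```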

Termination Criteria

Language reinforcement: the templates below are in English for spec readability. Actual output must be in the user's language per §Language. Translate all headings, messages, and labels; keep codex verbatim titles and technical identifiers (must-fix, minimal-hygiene, file paths) as-is.

Case A — normal termination (V == 0, or the cycle-3 cap reached with every selectable finding dispositioned):

When the residual set (carried user-declined + carried out-of-diff skipped) is empty:

All findings resolved after N cycle(s).

⚠️ No automated verification was run. This skill never executes tests, lints, builds, or any verification command on behalf of the user. The "resolved" claim only means "codex returned zero selectable findings this cycle and no residuals were carried from prior cycles". Before shipping, review the applied diff and run your own verification (test suite, type check, lint, build, manual smoke) as appropriate for the change.

Applied fixes by cycle:
- Cycle 1: <list of finding titles or "none">
- Cycle 2: <list or "none">
- Cycle 3: <list or "none">

When any residuals exist (declined carry-overs, out-of-diff skips, or final-cycle declines), swap the opening line and list the residuals — do NOT print "All findings resolved":

Review cycle terminated after N cycle(s) with residuals carried forward.

⚠️ No automated verification was run. See the clean-termination variant above for rationale.

Applied fixes by cycle:
- Cycle 1: <list of finding titles or "none">
- Cycle 2: <list or "none">
- Cycle 3: <list or "none">

User-declined valid findings carried to termination: <titles from cycle_history[*].user_declined[] that never appear in a later cycle's applied_fixes[], or "none">
Out-of-diff skipped findings carried to termination: <titles from cycle_history[*].skipped_for_scope[] that never appear in a later cycle's applied_fixes[], or "none">

Case B — cap reached before every valid finding was dispositioned (e.g. the selection UI was interrupted; see step 16):

## Review cycle terminated — cap reached

- Cycles run: 3 / 3
- Findings applied: <count>
- Findings still valid and unresolved at cap: <count>

⚠️ No automated verification was run on the applied fixes — see Case A for rationale.

### Unresolved valid findings

<Summary Output Template table, filtered to valid/partially-valid findings that were never applied>

### Next steps

- Re-run `codex-review-cycle` after further work, or
- Address the unresolved findings manually, or
- Explicitly accept them as known residuals.

The skill never advances to a fourth cycle. The user must invoke the skill again to continue.

  19. Review assessment. After printing Case A or Case B output, render a concise review assessment block in the user's language (per §Language) to help the user decide whether to re-invoke the skill or move on:

    ## Review assessment
    
    **Trend**: <1 sentence — e.g. "converging (5 → 4 → 3, severity shift from high to medium)", "stable (structural gaps in each cycle)", "cascading (cycle N fixes created cycle N+1 findings)">
    
    **Character**: <1 sentence — e.g. "mostly state-model gaps", "edge cases and design-philosophy arguments", "doc/wording consistency issues">
    
    **Clusters** (optional — render only when ≥2 **rejected-ledger** entries share a `cluster_id`): `<cluster_id>`: <N> ledger entries across <M> cycle(s) (see ledger entries L<i>, L<j>, ...). Emit at most 3 cluster lines, sorted by finding count descending. If no cluster has ≥2 members, omit the line entirely. **Scope limitation**: cluster accounting is intentionally limited to rejected-ledger entries because only those carry `cluster_id` (see `review-scope-guard` Phase 3 step 9 assignment rule). Applied-fix findings do not participate in cluster summary; extending the carrier to applied fixes is deliberately deferred to avoid inconsistent partial counts.
    
    **Recommendation**: <"address residuals before shipping" | "continue reviewing" | "stop and audit scope" | "move to next work" with 1-sentence rationale. Determined from recorded state only; apply the first rule below that matches:
    - If any `must-fix` or `minimal-hygiene` residual was carried to termination → "address residuals before shipping"
    - If any stop signal is `ACTIVE` or `WARNING` → "stop and audit scope" (aligns with review-scope-guard's stop-signal contract: ACTIVE/WARNING means diminishing returns or scope drift, not a reason to run more cycles)
    - If clean termination (no residuals) AND finding count decreased across cycles AND no stop signal tripped → "move to next work"
    - Otherwise → "continue reviewing" (default-safe)>
    
    **Suggested next action**: <concrete 1-line action — e.g. "squash and merge to main", "run 1-cycle working-tree dogfood on the applied fixes", "address the 2 carried residuals manually before merging">

    This block is advisory — it does not gate any action. Keep each part to one sentence; do not re-list findings or repeat the termination summary.
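The Recommendation rules read as an ordered decision list. In this sketch the inputs are assumed summaries of recorded state. How "finding count decreased" and "no stop signal tripped" are interpreted (strictly decreasing across at least two cycles; ADVISORY counts as tripped) is an assumption, and the function name is hypothetical.

```python
def recommend(carried_residuals: int, signal_statuses: list[str],
              findings_per_cycle: list[int]) -> str:
    """First matching rule wins, mirroring the order of the rules above."""
    if carried_residuals > 0:
        return "address residuals before shipping"
    if any(s in ("ACTIVE", "WARNING") for s in signal_statuses):
        return "stop and audit scope"
    # Assumption: 'decreased' means strictly decreasing cycle over cycle,
    # which needs at least two cycles; ADVISORY counts as a tripped signal.
    decreased = len(findings_per_cycle) >= 2 and all(
        later < earlier for earlier, later in zip(findings_per_cycle, findings_per_cycle[1:]))
    tripped = "ADVISORY" in signal_statuses
    if decreased and not tripped:
        return "move to next work"
    return "continue reviewing"  # default-safe
```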

  20. Soft-reset temporary cycle commits (branch / base-ref only). During the review run, the user created one commit per fix-applying cycle at Claude's request (step 14 manual-commit pause). These are intermediate review-cycle artifacts, not the user's intended final commit. To keep the applied code changes while removing the intermediate commit history:

    • If review_target.scope == working-tree or no cycle commits were created, skip this step silently.

    • Terminal-cycle verification: before resetting, verify the final cycle's applied fixes were actually committed. Run git status --porcelain -- <final cycle's touched_files>. If any of those files have uncommitted changes, print ⚠️ Final cycle has uncommitted applied fixes (<file list>). Soft-reset will NOT stage these — only committed changes become staged after reset. Commit them first, or they will be lost from the staged state. Then skip the reset and fall back to a manual squash: the user commits the remaining files and runs git reset --soft <pre_cycle_1_head> themselves.

    • Retrieve pre_cycle_1_head from Phase 0 step 7 and record the current HEAD as final_head.

    • Dirty-state audit (pre-preview): before preview, confirm no non-cycle-owned state would be staged by the reset. Compute cycle_owned_files = union of cycle_history[*].applied_fixes[*].touched_files[], then:

      • Part A (uncommitted): run git status --porcelain and inspect every entry:

        • Entries that refer to files outside cycle_owned_files are unrelated uncommitted state that would survive git reset --soft.
        • Entries that refer to files inside cycle_owned_files are also blocking unless they are the final cycle's applied-fix files that the Terminal-cycle verification above already cleared. Any staged/unstaged edit on an earlier-cycle cycle-owned file (or on a final-cycle file that the Terminal verification failed on) bypasses git reset --soft — soft-reset preserves the index and working tree but will NOT stage an unstaged edit, so the workflow's "all applied fixes are staged" claim becomes false. Surface every such entry in the abort output below, including staged vs unstaged status, so the user can commit/stash before re-running.
      • Part B (committed-range ownership): run git diff --name-only <pre_cycle_1_head>..<final_head> and compare against cycle_owned_files. Any path in the committed delta that is NOT in cycle_owned_files is an unrelated file the user accidentally included in a cycle commit; git reset --soft will stage it into the final squash without the preview flagging it (the preview only shows --stat, which lists filenames but does not cross-check ownership). Part B catches what Part A cannot: unrelated work already committed into the cycle range. Paths present in cycle_history[*].unrelated_commit_paths[] (user-approved during cycle-N>1 ownership gate) are NOT treated as abort-worthy at Part B — they already got a user decision. Part B surfaces them in the preview output with a (user-approved unrelated) tag so the final squash commit accurately reflects what is being shipped.

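      • If Part A reports any blocking entry (a path outside cycle_owned_files, or a dirty cycle-owned file not cleared by the Terminal-cycle verification) or Part B reports a committed path outside cycle_owned_files that the user did not approve, print the following and abort the soft-reset entirely: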

        ⚠️ State outside the cycle-owned files detected. `git reset --soft <pre_cycle_1_head>` would preserve this into the final staged index, mixing unrelated work into what looks like a "cycle-only" squash.
        
        State leaking into the soft-reset target:
        <list every entry with its source tag: [A-uncommitted] / [A-dirty-owned] / [B-committed] — one per line; if all three categories are empty this entire abort block is not printed>
        
        Resolve before continuing:
         - [A-uncommitted] paths: commit, stash, or clean them.
         - [A-dirty-owned] paths: commit or clean the staged/unstaged remnants on cycle-owned files.
         - [B-committed] paths: amend the offending cycle commit to drop the unrelated file, OR rewrite the commit range to exclude it, OR explicitly include them in a manual post-reset commit by running `git reset --soft <pre_cycle_1_head>` yourself after clearing [A-*].
        Re-invoke the skill after the range is cycle-clean.
      • Stop the skill here (do NOT proceed to preview or reset) until the user fixes the state and re-invokes. The soft-reset preview gate gives false confidence if the preview shows only the cycle diff while the actual post-reset staged index would contain unrelated work (uncommitted OR committed).
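The Part A / Part B classification can be sketched as a pure function over command output. This assumes porcelain v1 output from `git status --porcelain`, the committed paths from `git diff --name-only <pre_cycle_1_head>..<final_head>`, and caller-supplied sets for owned, terminal-cleared, and user-approved paths. Quoted paths with special characters are not unquoted, untracked files count as unstaged, and the function names are hypothetical.

```python
def parse_porcelain(porcelain: str) -> list[tuple[str, str]]:
    """Parse `git status --porcelain` (v1) lines into (XY, path) pairs.

    For renames ("R  old -> new") the new path is reported.
    """
    entries = []
    for line in porcelain.splitlines():
        if len(line) < 4:
            continue  # skip blank or malformed lines
        status, path = line[:2], line[3:]
        if " -> " in path:
            path = path.split(" -> ", 1)[1]
        entries.append((status, path))
    return entries


def audit_soft_reset(porcelain: str, committed_paths: set[str], owned: set[str],
                     final_cleared: set[str], approved: set[str]):
    """Return (blocking, notices). Any blocking entry aborts the soft-reset."""
    blocking, notices = [], []
    # Part A: uncommitted state that git reset --soft would carry along.
    for status, path in parse_porcelain(porcelain):
        if path not in owned:
            blocking.append(f"[A-uncommitted] {path}")
        elif path not in final_cleared:
            state = "staged" if status[0] not in " ?" else "unstaged"
            blocking.append(f"[A-dirty-owned] {path} ({state})")
    # Part B: committed paths in the cycle range that no applied fix owns.
    for path in sorted(committed_paths - owned):
        if path in approved:
            notices.append(f"{path} (user-approved unrelated)")
        else:
            blocking.append(f"[B-committed] {path}")
    return blocking, notices
```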

    • Soft-reset preview (confirmation gate): only reached when the dirty-state audit passes. Before running git reset --soft, show the user what will be squashed. Run git log --oneline <pre_cycle_1_head>..<final_head> to list the cycle commits, and git diff --stat <pre_cycle_1_head>..<final_head> to show the cumulative change. Print:

      About to soft-reset <N> cycle commit(s) onto <pre_cycle_1_head>.
      
      Why squash: the cycle commits (`review cycle 1 fixes`, `review cycle 2 fixes`, ...) are intermediate artifacts of the review loop. Most users want a single final commit that represents the applied change; soft-reset preserves every line of every cycle fix in the staged index, so you can write the final commit message yourself. Declining keeps the cycle commits in place if you prefer their granular history.
      
      After reset, all changes below are staged in the index (working tree unchanged). You create your own commit message.
      
      Commits to be collapsed:
      <output of git log --oneline ...>
      
      Cumulative change (will be staged):
      <output of git diff --stat ...>

      Approved-unrelated paths notice (only when cycle_history[*].unrelated_commit_paths[] is non-empty): before the main reset prompt, display an informational section:

      Approved unrelated paths from earlier cycles (user-tagged during cycle-N>1 ownership gate):
        - <path> (approved in cycle N)
        - ...
      These files are NOT in any applied fix's touched_files. They were bundled into cycle commits and will be preserved by the soft-reset into the final staged index. If you changed your mind about including them, decline the next prompt and amend the cycle commits manually.

      This is a display-only notice, not an AskUserQuestion — the user already tagged these paths as approved during the cycle-N>1 ownership gate. The main reset prompt below is the opportunity to back out.

      Then issue the main reset AskUserQuestion:

      • question: "Collapse these N cycle commit(s) into a staged index, ready for your commit?"
      • options:
        • Yes, soft-reset now — proceed to the next bullet.
        • No, leave cycle commits as-is — skip the reset; print "Cycle commits left in place. Squash manually with git reset --soft <pre_cycle_1_head> if desired." and end the skill.
    • Run git reset --soft <pre_cycle_1_head>. This removes all intermediate cycle commits from HEAD but leaves the accumulated changes staged in the index. The user's working tree is unchanged.

    • Print:

      Soft-reset: <N> temporary cycle commit(s) (<pre_cycle_1_head>..<final_head>) removed.
      All applied fixes are staged in the index. Create your own commit:
        git commit -m "<your message>"
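The Part B ownership audit and the soft-reset sequence above can be sketched in Python. This is a minimal illustration under stated assumptions, not the skill's implementation: the helper names committed_delta, classify_committed_paths, and soft_reset_commands are hypothetical, and the sketch assumes cycle_owned_files and the union of every cycle_history[*].unrelated_commit_paths[] are already available as Python sets. Only the git invocations quoted in the steps are used.

```python
import subprocess
from typing import Iterable, List, Set, Tuple


def committed_delta(pre_cycle_1_head: str, final_head: str) -> List[str]:
    """Part B query: every path changed in <pre_cycle_1_head>..<final_head>."""
    out = subprocess.run(
        ["git", "diff", "--name-only", f"{pre_cycle_1_head}..{final_head}"],
        check=True, capture_output=True, text=True,
    ).stdout
    return [line for line in out.splitlines() if line]


def classify_committed_paths(
    committed: Iterable[str],
    cycle_owned_files: Set[str],
    approved_unrelated: Set[str],
) -> Tuple[List[str], List[str]]:
    """Split the committed delta into abort-worthy [B-committed] paths and
    user-approved unrelated paths (shown in the preview, never abort-worthy)."""
    leaks: List[str] = []
    approved: List[str] = []
    for path in sorted(set(committed)):
        if path in cycle_owned_files:
            continue  # touched by an applied fix: expected in the squash
        if path in approved_unrelated:
            approved.append(path)  # tagged "(user-approved unrelated)" in the preview
        else:
            leaks.append(path)  # reported as [B-committed]; aborts the soft-reset
    return leaks, approved


def soft_reset_commands(pre_cycle_1_head: str, final_head: str) -> List[List[str]]:
    """Preview commands followed by the reset itself, in execution order.
    The last command should run only after the user picks "Yes, soft-reset now"."""
    rng = f"{pre_cycle_1_head}..{final_head}"
    return [
        ["git", "log", "--oneline", rng],
        ["git", "diff", "--stat", rng],
        ["git", "reset", "--soft", pre_cycle_1_head],
    ]
```

A caller would compute `leaks, approved = classify_committed_paths(committed_delta(pre, final), owned, approved_set)`; a non-empty `leaks` (or any Part A entry) means printing the abort block and stopping, while `approved` feeds the display-only notice.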

Preconditions Recap

  • Git CLI available on PATH.
  • Current working directory inside a git repository.
  • The chosen review target produces a non-empty diff: either uncommitted changes exist (working-tree), or HEAD is ahead of the auto-detected default branch (branch), or the user-supplied ref exists and <ref>...HEAD is non-empty (base-ref).
  • Codex plugin installed and Skill(codex:setup) reports a ready state.
  • review-scope-guard skill available (invoked at step 10a for scope triage and DoD collection).
  • Both codex-review-cycle and review-scope-guard are registered with the Claude Code harness. If not (e.g. during local development before marketplace publication), follow the SKILL.md steps manually — every step is self-contained.
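The git-related preconditions can be checked mechanically. The Python sketch below is illustrative only: emptiness_probe and preconditions_hold are hypothetical helper names, `base` is assumed to be the default branch detected in Phase 0 (scope = branch) or the user-supplied ref (scope = base-ref), and treating untracked files as uncommitted changes is an assumption. The codex plugin and review-scope-guard checks are not covered.

```python
import shutil
import subprocess
from typing import List, Optional


def emptiness_probe(scope: str, base: Optional[str] = None) -> List[str]:
    """Return the git argv whose non-empty stdout means the review target has a diff."""
    if scope == "working-tree":
        # Assumption: staged, unstaged, and untracked changes all count as uncommitted.
        return ["git", "status", "--porcelain"]
    if scope in ("branch", "base-ref"):
        if not base:
            raise ValueError(f"scope {scope!r} needs a base ref")
        # Three-dot form: what HEAD changed since its merge base with <base>.
        return ["git", "diff", "--name-only", f"{base}...HEAD"]
    raise ValueError(f"unknown scope: {scope!r}")


def preconditions_hold(scope: str, base: Optional[str] = None) -> bool:
    """Check git on PATH, cwd inside a work tree, and a non-empty target diff."""
    if shutil.which("git") is None:
        return False
    inside = subprocess.run(
        ["git", "rev-parse", "--is-inside-work-tree"],
        capture_output=True, text=True,
    )
    if inside.returncode != 0 or inside.stdout.strip() != "true":
        return False
    probe = subprocess.run(emptiness_probe(scope, base), capture_output=True, text=True)
    return probe.returncode == 0 and bool(probe.stdout.strip())
```

A missing base ref makes git diff exit non-zero, so this sketch returns False; the skill itself distinguishes that case and prints Base ref '<ref>' not found in this repository.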

Failure Modes

  • Codex CLI missing or setup incomplete — stop in Phase 0 step 4. Tell the user to install the codex plugin or run /codex:setup.
  • Default branch not detected (scope = branch) — stop in Phase 0 step 2 with guidance to re-run with scope = base-ref and an explicit ref.
  • User-supplied ref not found (scope = base-ref) — stop in Phase 0 step 2 with Base ref '<ref>' not found in this repository.
  • JSON parse failure (adversarial) — retry once; a second failure aborts the cycle with codex's raw stdout surfaced verbatim.
  • File cited by codex no longer exists — item 1 of the validity check returns invalid: file not in diff. The finding is listed in the summary but not selectable.
  • User has no working-tree diff after a cycle's fixes are applied (scope = working-tree) — continue to the next cycle anyway (the next review will see the committed state). Do not silently skip cycles. For branch / base-ref scopes the diff is against a committed base, so in-cycle fixes never empty the diff.
  • User declines every finding across all 3 cycles — terminate in Case A with the user-declined message, not Case B. The cap did not fire; the user actively closed the loop.
  • User declines the DoD interview in cycle 1 step 7 (adversarial) or step 10a (review): review-scope-guard stays inside the 4-category invariant. Fall-through findings still classify as minimal-hygiene, and ledger/vague findings still classify as reject-noise; no 5th unclassified bucket is created. The summary table footer prints "⚠️ DoD not collected — scope triage degraded. Review each selectable finding manually before applying; the minimal-hygiene fall-through is weaker than a DoD-anchored classification." The user is the last line of defense in this degraded mode.
  • Stop signal ACTIVE or WARNING during cycles 1-2 — print the recommendation in the summary but do not auto-stop. The cycle cap still governs termination.
  • User chooses Run cycle N+1 from a V=0 state but codex again returns 0 selectable findings — the next V=0 offer is still issued per step 12 (adversarial-review variant only); the user can choose to terminate or burn another cycle. The cap still governs. Do not suppress the offer just because it fired before.
  • V=0 fires under variant == review — the override path is unavailable; skip directly to Phase 2 Case A as documented in step 12. The summary row for the cycle still renders, and the final Review assessment should note "V=0 under native review — override disabled, see step 12" so the user understands why no cycle N+1 offer appeared.
  • no_fix_cycle: true entry is internally inconsistent. Corruption is defined by same-entry contradiction, not by comparison with earlier cycles. A valid applied-then-V=0-retry sequence (cycle 1: applied_fixes non-empty → cycle 2: no_fix_cycle=true, applied_fixes=[] → cycle 3: uses the cycle-2 marker to exempt preflight) must be honored; cycle 2 having a no-fix marker while cycle 1 had fixes is NORMAL. Treat the marker as corrupted ONLY when the same entry that carries no_fix_cycle: true also has a non-empty applied_fixes[], user_declined[], or skipped_for_scope[]. In that (truly contradictory) case, print "⚠️ Inconsistent no_fix_cycle marker on cycle N-1 (marker true but the same entry has applied/declined/skipped entries). Running full preflight." and run the full preflight, ignoring the marker. This is defense-in-depth against a corrupted state writer; the normal applied-then-V=0 flow is untouched.
  • Conversation context is lost mid-run (e.g. compaction, tab close, long idle): the skill's state (cycle_history, rejected_ledger, review_target, dod) lives only in the active conversation. If context is truncated or the session resets, the in-flight run CANNOT be resumed automatically. Recovery steps: (1) if any cycle commits exist on branch / base-ref scope, the user may squash them manually with git reset --soft <pre_cycle_1_head>, looking up the <pre_cycle_1_head> commit in git reflog; (2) if applied fixes sit uncommitted on working-tree scope, they stay in place and the user commits normally; (3) restart the skill from Phase 0 on the current state. The new run does NOT know about prior cycles' rejected_ledger, so codex may re-raise findings that the earlier run rejected as noise. State persistence across session breaks is deferred to a separate plan; this bullet documents the current fallback.
  • User wants to cancel the skill mid-cycle — at any AskUserQuestion prompt, the user can type a message indicating cancellation (e.g. "stop", "cancel", "abort"); Claude treats this as an early termination request. The current cycle's state is preserved as-is (no auto-rollback of applied fixes; no auto-commit). Claude prints a short summary: "Skill cancelled at cycle N step M. Applied fixes in this session: <list>. Remaining state: <working-tree dirty | N cycle commits on <branch>>. Manual cleanup may be needed depending on your preferences (git stash, git reset, amend, etc.)." The skill does NOT attempt any destructive cleanup on behalf of the user. Between-prompts cancellation (user Ctrl-C or tab close without an active prompt) falls under the "Conversation context is lost mid-run" bullet.
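Two of the failure modes above have precise enough rules to sketch: the same-entry no_fix_cycle consistency check and the one-retry JSON parse for adversarial-review --json. The Python below is a hypothetical illustration, not the skill's code. It assumes each cycle_history entry is a dict keyed by the field names used in this section, and it reads "retry once" as re-invoking the codex command (run_review is a hypothetical stand-in for that invocation).

```python
import json
from typing import Any, Callable, Dict

# Fields whose non-empty value contradicts no_fix_cycle: true on the SAME entry.
CONTRADICTING_KEYS = ("applied_fixes", "user_declined", "skipped_for_scope")


def marker_is_inconsistent(entry: Dict[str, Any]) -> bool:
    """True when this entry claims no_fix_cycle but also records findings."""
    return bool(entry.get("no_fix_cycle")) and any(entry.get(k) for k in CONTRADICTING_KEYS)


def preflight_exempt(previous_entry: Dict[str, Any]) -> bool:
    """A consistent marker on cycle N-1 exempts cycle N's preflight.
    Earlier cycles are deliberately not consulted."""
    return bool(previous_entry.get("no_fix_cycle")) and not marker_is_inconsistent(previous_entry)


class ReviewParseError(Exception):
    """Raised after the second parse failure; carries codex's raw stdout verbatim."""

    def __init__(self, raw_stdout: str) -> None:
        super().__init__("codex output was not valid JSON on two attempts")
        self.raw_stdout = raw_stdout


def parse_with_one_retry(run_review: Callable[[], str]) -> Any:
    """Parse codex --json stdout, re-running the review once on failure."""
    raw = run_review()
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        pass
    raw = run_review()  # the single retry
    try:
        return json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ReviewParseError(raw) from exc
```

Keeping the consistency check local to one entry is what lets the applied-then-V=0 sequence pass: cycle 1's applied fixes never make cycle 2's marker suspect.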

References

  • references/focus-text.md — target-kind detection and the canonical code/plan focus text.
  • references/validity-checklist.md — full details of the six validity items.
  • references/summary-samples.ja.md — sample summary table, stop-signal footer, and termination messages for rendering in Japanese.
  • skills/review-scope-guard/SKILL.md — scope triage skill invoked at step 10a (DoD collection, 4-category triage, rejected ledger, stop signals).