codex-review-cycle
Run a bounded 3-cycle interactive review-and-fix workflow against a user-chosen git review target (working-tree diff, current branch vs. auto-detected base, or an explicit commit/tag/branch ref) using the codex plugin. Each cycle invokes codex `review` or `adversarial-review --json`; Claude verifies each finding against a six-item validity checklist, calls `review-scope-guard` for Definition-of-Done triage, and the user picks which findings to fix before the next cycle. Covers both code diffs and markdown planning documents. Hard cap at 3 cycles. Use ONLY when the user explicitly asks to run the codex review cycle on working-tree changes, a committed branch diff, or an explicit base ref. Do NOT trigger for single-shot review requests, auto-hardening, background reviews, plan drafting, or when the chosen target would produce an empty diff.
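Copy the matching command below and paste it into a terminal (macOS/Linux, first command) or PowerShell (Windows, second command). It downloads the archive, unzips it, and places the skill automatically.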
mkdir -p ~/.claude/skills && cd ~/.claude/skills && curl -L -o codex-review-cycle.zip https://jpskill.com/download/9631.zip && unzip -o codex-review-cycle.zip && rm codex-review-cycle.zip
$d = "$env:USERPROFILE\.claude\skills"; ni -Force -ItemType Directory $d | Out-Null; iwr https://jpskill.com/download/9631.zip -OutFile "$d\codex-review-cycle.zip"; Expand-Archive "$d\codex-review-cycle.zip" -DestinationPath $d -Force; ri "$d\codex-review-cycle.zip"
When it finishes, restart Claude Code. The skill then activates automatically when you make a matching request in plain language.
Manual installation (3 steps)
- 1. Press the site's "Download" button to get the .skill file.
- 2. Change the file extension from .skill to .zip and extract it (macOS can extract it automatically).
- 3. Place the extracted folder in .claude/skills/ under your home folder:
  - macOS / Linux: ~/.claude/skills/
  - Windows: %USERPROFILE%\.claude\skills\
Restart Claude Code to finish. You do not need to say "use this Skill"; it is invoked automatically for relevant requests.
- Last updated: 2026-05-18
- Retrieved: 2026-05-18
- Bundled files: 1
Codex Review Cycle
Overview
A simple, user-driven review-and-fix workflow. In every cycle, codex runs one review, Claude verifies each finding's validity and presents a verbatim summary, and the user picks which findings to address. Claude then applies only the chosen fixes and loops. Three cycles is a hard cap: the loop never runs a fourth cycle without the user starting a new invocation.
The skill is deliberately simple. It does not auto-fix, does not run parallel reviewers, does not compute stall fingerprints, does not manage autonomy bands, and does not delegate to rescue subagents. Claude is the only fix applier; the user is the only arbiter of which findings matter.
Language
All user-facing output is rendered in the user's language (the language the user has been using in the conversation, or as configured in the Claude Code system-level language setting). This section is the authoritative translation contract — any per-language sample reference (e.g. references/summary-samples.ja.md) is illustrative only and MUST NOT contradict these rules.
Translate into the user's language:
- Section headings and column labels (`Claude's note`, `Recommended action`, `Scope`, `Severity`, etc. column header text)
- Free-text fields Claude authors: `Claude's note` body, `Recommended action` values, fleet-rate warnings, stop-signal footer prose, termination messages, post-cycle review assessment
- `AskUserQuestion` `question`, `header`, and option `label` / `description` fields
Keep verbatim (do NOT translate), regardless of user language:
- Codex `title` field (surfaced in the `Title (codex verbatim)` column)
- Codex `recommendation` field (quoted below the table per §Summary Output Template)
- Severity values (`high` / `medium` / `low`), which are codex output
- Validity outcome keywords (`valid` / `partially-valid` / `invalid`)
- Scope category names (`must-fix` / `minimal-hygiene` / `reject-out-of-scope` / `reject-noise`)
- Stop-signal `Status` keywords (`ACTIVE` / `ADVISORY` / `WARNING` / `silent`)
- Technical identifiers: file paths, git refs (SHAs, branch names), `fingerprint`, `cluster_id`, field names like `applied_fixes`, `not_evaluated_signal_names`
- Cycle indices (`cycle N`, `N/3`)
For a Japanese rendering example that applies these rules, see references/summary-samples.ja.md. For German, Korean, or other languages, apply the same rules directly — the Japanese sample is an illustration, not a template to translate.
When to Use
Use this skill ONLY when:
- The user explicitly asks to run the codex review cycle on one of: the current working-tree diff, the current branch vs. its base branch, or an explicit commit/tag/branch ref, and
- The current working directory is a git repository, and
- The chosen review target produces a non-empty diff (working tree has uncommitted changes, or HEAD is ahead of the chosen base ref).
Do NOT use this skill when:
- The user wants a single codex review pass — use the codex plugin directly.
- The resolved review target produces an empty diff — stop and tell the user.
- The user is drafting a plan from scratch; use `vibe-planning-guard` instead.
- A review is running as a background or automatic check.
Review Target Modes
The skill supports three review targets, chosen once at Phase 0 and fixed for all three cycles:
- `working-tree`: uncommitted changes (tracked-modified + staged + untracked). Codex is invoked with `--scope working-tree`. Claude-side diff commands use `git diff HEAD --name-only` plus `git ls-files --others --exclude-standard` for untracked files.
- `branch`: HEAD vs. the auto-detected default branch (tries local `main` / `master` / `trunk` first, then `origin/*`). Codex is invoked with `--base <base_sha>` (frozen SHA resolved from the detected ref at Phase 0). Claude-side diff commands use `git diff --name-only <base_sha>...HEAD` (triple-dot: merge-base semantics).
- `base-ref`: HEAD vs. an explicit ref the user supplies (any commit SHA, tag, or branch name). Codex is invoked with `--base <base_sha>` (frozen SHA resolved from the user-supplied ref at Phase 0). Claude-side diff commands use `git diff --name-only <base_sha>...HEAD`.
The three modes share the same workflow — only the Phase 0 target-resolution step, the codex CLI flags, and the diff command Claude uses for validity checks differ.
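The per-mode differences described above can be sketched as a small lookup. This is only an illustration under stated assumptions: the function names `codex_scope_args` and `diff_commands` are hypothetical helpers, not part of the skill or the codex plugin; the flags and git commands themselves are the ones listed above.

```python
from typing import List, Optional

BASE_SCOPES = ("branch", "base-ref")


def codex_scope_args(scope: str, base_sha: Optional[str] = None) -> List[str]:
    """Return the codex CLI flags for a review scope (hypothetical helper)."""
    if scope == "working-tree":
        return ["--scope", "working-tree"]
    if scope in BASE_SCOPES:
        if not base_sha:
            raise ValueError(f"scope {scope!r} needs the frozen base_sha")
        # Always the frozen SHA, never the mutable ref name.
        return ["--base", base_sha]
    raise ValueError(f"unknown scope: {scope!r}")


def diff_commands(scope: str, base_sha: Optional[str] = None) -> List[List[str]]:
    """Return the git commands whose combined output lists the files under review."""
    if scope == "working-tree":
        return [
            ["git", "diff", "HEAD", "--name-only"],
            # Untracked files are not part of `git diff`, so list them separately.
            ["git", "ls-files", "--others", "--exclude-standard"],
        ]
    if scope in BASE_SCOPES:
        if not base_sha:
            raise ValueError(f"scope {scope!r} needs the frozen base_sha")
        # Triple-dot: diff against the merge base of base_sha and HEAD.
        return [["git", "diff", "--name-only", f"{base_sha}...HEAD"]]
    raise ValueError(f"unknown scope: {scope!r}")
```

Returning argument lists rather than shell strings keeps the sketch safe to pass to `subprocess.run` without quoting concerns.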
Target Kinds
The skill auto-detects code vs plan from the diff file extensions. See references/focus-text.md for the detection rules. Mixed targets (any code file present) are treated as code; item 6 of references/validity-checklist.md filters out detailed-design findings on markdown files inside a code cycle.
Review Variant Selection
At Phase 0, ask the user once which codex review variant to use for all three cycles:
- `review`: codex's native review command. Output is free-form text. Claude manually structures each finding section into `title` / `recommendation` / `body` before running the validity check.
- `adversarial-review`: codex's adversarial review with `--json`. Output is structured `findings[]`. Each element has `severity`, `file`, `line_start`, `title`, `recommendation`, `body`. Adversarial cycles also carry a `<review_context>` block (see §Review Context Format) so codex keeps the same angle across cycles.
The choice is fixed for the whole loop. If the user wants to switch variants, they must restart the skill.
Recommendation: use adversarial-review unless there is a specific reason not to. The review variant is retained for environments where structured JSON output is unavailable or for users who specifically want free-form codex output, but it operates in a minimum-functionality mode: no <review_context> carry across cycles, no proposal-mode DoD (interview only), no V=0 cycle-N+1 override, no rejected-findings forwarding. Expect to get a single-shot review that Claude structures manually, with no adaptation between cycles. For multi-cycle review-and-fix workflows, adversarial-review is strictly the better path.
Workflow
Phase 0 — Preflight (runs once)
1. Verify git repository.
   Bash: `git rev-parse --is-inside-work-tree`. Stop if not inside a git repo.
2. Resolve the review target. Ask the user once, via `AskUserQuestion`, which review scope this run should cover. Offer three options:
   - `working-tree`: review the uncommitted diff (tracked-modified + staged + untracked).
   - `branch`: review HEAD vs. the auto-detected default branch.
   - `base-ref`: review HEAD vs. an explicit ref the user provides.

   After the scope choice:
   - `working-tree`: no additional input needed.
   - `branch`: auto-detect the default branch by trying `main`, `master`, `trunk` as local refs via `git show-ref --verify --quiet refs/heads/<name>`, then falling back to `origin/<name>`. If none resolve, stop with `Could not auto-detect a base branch. Re-run with scope = base-ref and supply a ref explicitly.`
   - `base-ref`: ask a follow-up free-form `AskUserQuestion` for the ref string. Validate it with `git rev-parse --verify <ref>`; if the command fails, stop with `Base ref '<ref>' not found in this repository.`
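The `branch` auto-detection order above (local candidates first, then their `origin/` counterparts) might look like the following sketch. Assumptions: `detect_base_ref` and `git_ref_exists` are hypothetical names, and checking the remote fallback under `refs/remotes/origin/<name>` is my reading of "falling back to `origin/<name>`", not something the skill spells out.

```python
import subprocess
from typing import Callable, Optional

CANDIDATES = ("main", "master", "trunk")


def git_ref_exists(full_ref: str) -> bool:
    """True when `git show-ref --verify --quiet <full_ref>` exits with status 0."""
    result = subprocess.run(["git", "show-ref", "--verify", "--quiet", full_ref])
    return result.returncode == 0


def detect_base_ref(ref_exists: Callable[[str], bool] = git_ref_exists) -> Optional[str]:
    """Return the first resolvable default-branch name, or None if none resolve."""
    # All local candidates are tried before any remote-tracking candidate.
    for name in CANDIDATES:
        if ref_exists(f"refs/heads/{name}"):
            return name
    # Assumption: the origin fallback is checked as a remote-tracking ref.
    for name in CANDIDATES:
        if ref_exists(f"refs/remotes/origin/{name}"):
            return f"origin/{name}"
    return None
```

Passing `ref_exists` as a parameter keeps the ordering logic testable without a real repository; a `None` result corresponds to the "Could not auto-detect a base branch" stop.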
   Store the result as `review_target`. Fully construct the object in Phase 0 so proposal DoD mode in step 7 has the data it needs:
   - `scope`: one of `working-tree`, `branch`, `base-ref`.
   - `base_ref`: `null` for `working-tree`; the resolved ref string for `branch` and `base-ref` (kept as display metadata only).
   - `base_sha`: `null` for `working-tree`; for `branch` / `base-ref`, the immutable commit SHA resolved from `base_ref` at Phase 0 via `git rev-parse <base_ref>`. All subsequent commands across all 3 cycles (codex `--base`, diff commands, commit-range enumeration, validity-check scope diff, Part B ownership audit, soft-reset anchor) use `base_sha`, never `base_ref`, so a mutable ref (e.g. `main`, `origin/main`) advancing mid-run cannot drift the review target. If `base_ref` no longer resolves to `base_sha` at any later check (for example, the user manually updated the ref), print a one-line warning `Base ref '<base_ref>' moved from <base_sha> during the run; continuing against the frozen SHA.` and proceed.
   - `diff_command`: the exact `git diff --name-only …` command Claude will reuse for target-kind detection and validity checks:
     - `working-tree` → `git diff HEAD --name-only` (paired with `git ls-files --others --exclude-standard` for untracked files)
     - `branch` / `base-ref` → `git diff --name-only <base_sha>...HEAD` (triple-dot: merge-base semantics; uses the frozen SHA, not the mutable `base_ref`, so target-kind detection and validity checks cannot drift if the named ref advances mid-run)
   - `diff_files`: the executed output of `diff_command`. For `working-tree` scope, this MUST be the union of `git diff HEAD --name-only` and `git ls-files --others --exclude-standard` (tracked-modified + staged + untracked); omitting untracked files would undercount the actual review surface. For `branch` / `base-ref` it is just the `diff_command` output.
   - `diff_numstat`: for `branch` / `base-ref`, `git diff --numstat <base_sha>...HEAD`. For `working-tree`, `git diff --numstat HEAD` PLUS a synthesized per-untracked-file line count (e.g. `wc -l` on each untracked file, emitted in the same `<added>\t<deleted>\t<path>` shape as numstat so the total LOC calculation is uniform). Omitting untracked line counts (as an earlier draft did) would let an untracked-only working-tree diff (100% new files) report 0 numstat LOC and silently qualify for proposal-mode DoD with no commit messages or patch to ground intent. Used to size the diff for the proposal-mode threshold (≤ ~100 changed LOC).
   - `commit_range`: `null` for `working-tree`; `<base_sha>..HEAD` (double-dot, using the frozen SHA for commit-delta enumeration) for `branch` / `base-ref`. NOTE: the diff uses triple-dot (merge-base), while commit enumeration uses double-dot (exactly the commits on HEAD that are not on base). Using `base_sha` (not `base_ref`) keeps enumeration stable against mid-run ref movement.
   - `commit_messages[]`: `[]` for `working-tree`; for `branch` / `base-ref`, the output of `git log --format='%s%n%b' <commit_range>`, split and trimmed per commit. It derives from the frozen-SHA `commit_range` above. Proposal-mode DoD drafting reads these to ground item 1 (intent) and item 4 (out-of-scope) in what the commits actually claim.
   - `diff_patch_excerpts`: bounded content-bearing evidence: a handful of representative files shown mostly in full (small untracked files, key tracked hunks), trimmed with `[truncated — <M> more lines]` when needed. Keep the total roughly on the order of a few KB so the proposal-mode prompt stays manageable. The goal is "enough for Claude to infer intent and out-of-scope boundaries", not byte-exact compliance.
     - `working-tree`: always synthesize.
     - `branch` / `base-ref`: omit only when the proposal-mode evidence gate is already satisfied by commit messages (≥20-char subject + non-empty body in at least one commit in scope). If the evidence gate fails on messages alone (squashed, templated, or vague commits), synthesize excerpts sourced exclusively from the target commit range (`git diff <base_sha>...HEAD` output), never from local working-tree state or untracked files, using the same bounded-budget shape as `working-tree`. If the range cannot yield a usable excerpt (binary-only, no textual diff), fall back to `interview` mode. This preserves the existing invariant that DoD drafting for `branch` / `base-ref` never anchors on a short squash-commit title AND never leaks out-of-range evidence into the proposal.
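To make the `diff_numstat` sizing concrete, here is a minimal sketch of synthesizing numstat-shaped lines for untracked files and totalling changed LOC. Assumptions: the helper names are hypothetical, "changed LOC" is taken as added plus deleted lines, and binary entries (which `git diff --numstat` reports as `-`) are counted as zero.

```python
def untracked_numstat_line(path: str, text: str) -> str:
    """Emit an `<added>\\t<deleted>\\t<path>` line for an untracked file.

    Every line of a new file counts as added; nothing is deleted.
    """
    added = len(text.splitlines())
    return f"{added}\t0\t{path}"


def total_changed_loc(numstat: str) -> int:
    """Sum added + deleted counts over numstat-shaped lines (assumed metric)."""
    total = 0
    for line in numstat.splitlines():
        if not line.strip():
            continue
        added, deleted, _path = line.split("\t", 2)
        for field in (added, deleted):
            # git prints "-" instead of a number for binary files.
            if field != "-":
                total += int(field)
    return total


# Usage: tracked changes from `git diff --numstat HEAD` plus one untracked file.
tracked = "3\t1\tsrc/a.py\n-\t-\tlogo.png\n"
combined = tracked + untracked_numstat_line("notes/new.md", "line one\nline two\n")
loc = total_changed_loc(combined)
```

With the untracked line included, an untracked-only diff no longer reports 0 LOC, which is the failure the field description warns about.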
   Proposal-mode evidence gate: even when `diff_numstat` totals ≤ 100 LOC, proposal mode requires content-bearing evidence.
   - For `working-tree` scope: if `commit_messages[]` is empty AND `diff_patch_excerpts` has no non-blank content (e.g. all untracked files are empty or binary, or all tracked-modified hunks collapsed to no patch), fall back to `interview` mode, because filenames and line counts alone cannot draft six DoD items with enough fidelity.
   - For `branch` / `base-ref` scope: commit messages alone are NOT sufficient evidence. Squashed, templated, or vague messages like `"fix review comments"`, `"wip"`, `"update tests"` can pass the LOC threshold while giving proposal mode no usable intent or out-of-scope signal. Require that `commit_messages[]` contain at least one commit with a subject of ≥20 characters AND a non-empty body, OR fall back to populating `diff_patch_excerpts` for `branch` / `base-ref` (same budget-based heuristic as working-tree) and passing it forward. If neither evidence path is available (all commit messages are short or empty and no patch excerpts are synthesized), fall back to `interview` mode. The risk this gate blocks is a DoD drafted from the title of a squash commit, which then anchors `reject-out-of-scope` decisions for the whole run.
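The evidence gate above reduces to a small predicate. The sketch below is illustrative only: `proposal_evidence_ok` and `is_substantive` are hypothetical names, and each commit message is assumed to be one string in the `%s%n%b` shape (subject on the first line, body after it).

```python
from typing import List


def is_substantive(message: str) -> bool:
    """A commit counts as evidence when its subject has >= 20 characters
    and its body is non-empty."""
    subject, _, body = message.partition("\n")
    return len(subject.strip()) >= 20 and bool(body.strip())


def proposal_evidence_ok(scope: str, commit_messages: List[str],
                         diff_patch_excerpts: str) -> bool:
    """Return True when proposal mode has content-bearing evidence."""
    has_excerpts = bool(diff_patch_excerpts.strip())
    if scope == "working-tree":
        # Fall back to interview only when both sources are empty.
        return bool(commit_messages) or has_excerpts
    # branch / base-ref: messages must be substantive, else excerpts must exist.
    return any(is_substantive(m) for m in commit_messages) or has_excerpts
```

A `False` result corresponds to the fallback to `interview` mode; the LOC threshold is checked separately.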
   Every cycle reuses the same `review_target` so the diff scope stays stable even after fixes are applied.
3. Verify the target has a non-empty diff.
   - `working-tree`: `git status --porcelain` must be non-empty. If empty, stop with `No working-tree diff to review. The codex-review-cycle skill requires uncommitted changes when scope is working-tree.`
   - `branch` / `base-ref`: `git diff --name-only <base_sha>...HEAD` (use the frozen SHA from step 2, not the mutable `base_ref`) must be non-empty. If empty, stop with `No committed changes between <base_ref> (<base_sha>) and HEAD. The codex-review-cycle skill requires a non-empty diff for branch/base-ref scopes.`
4. Ensure codex is ready. Invoke `Skill(codex:setup)` once to confirm the codex CLI is configured. Stop if setup reports a blocking failure.
5. Detect target kind.
   - Run `review_target.diff_command`. For `working-tree`, also run `git ls-files --others --exclude-standard` and union the untracked list with the diff output.
   - Apply the extension rules in `references/focus-text.md`.
   - Record `target_kind` as either `code` or `plan`.
6. Ask for review variant (once, via `AskUserQuestion`). Two options: `review` and `adversarial-review`. Store the choice as `variant`.
7. Pre-collect DoD (adversarial only) and initialize cycle state. Set `rejected_ledger = []`, `cycle_history = []`, `dod = null`.
   - If `variant == adversarial-review`, collect the six-item Definition of Done now by invoking the four-mode collection flow in `skills/review-scope-guard/references/dod-template.md` §Collection Modes, passing the fully-constructed `review_target` from Phase 0 step 2 (including `diff_files`, `diff_numstat`, `commit_messages[]`) as the proposal-mode input contract. Default to `interview`; use `proposal` when `review_target.diff_numstat` totals ≤ ~100 LOC AND commit messages or patch excerpts provide content-bearing evidence; use `quick` when the diff is ≤ ~30 LOC AND the user explicitly said "quick DoD" / "minimal DoD" / similar; use `free-text` when the user has already pasted a DoD block in the conversation. If `review_target` is somehow incomplete (a defensive check; Phase 0 step 2 should have populated every field), force `interview` mode per the scope-guard input contract. Cache the result on `dod` so `<review_context cycle="1"><intent>` can be populated from DoD item 1 before step 8 runs. Pass the cached `dod` (not `null`) to `review-scope-guard` at step 10a so the scope-triage skill does not re-ask. This solves the cycle-1 dependency where `<review_context>` would otherwise need intent that had not yet been collected.
   - If `variant == review`, leave `dod = null` here. Native review does not carry `<review_context>`, so there is no early-intent dependency. Step 10a's first `review-scope-guard` invocation will collect DoD interactively at that point.

   Also record `pre_cycle_1_head = git rev-parse HEAD`; this is the anchor for the step 20 soft-reset at termination. For `working-tree` scope this value is unused.

   Subsequent cycles reuse the cached DoD and pass the running `rejected_ledger` / `cycle_history` forward.
Phase 1 — Review Cycle (repeats up to 3 times; counter N = 1..3)
8. Run the review. Compute `codex_scope_args` from `review_target.scope`:
   - `working-tree` → `--scope working-tree`
   - `branch` → `--base <review_target.base_sha>` (frozen SHA from Phase 0; NOT `base_ref`, which is mutable)
   - `base-ref` → `--base <review_target.base_sha>`
   Cycle-N>1 preflight (`branch` / `base-ref` only): before invoking codex on cycle 2 or 3, verify the state between cycles is as expected. Let `expected_commit = !cycle_history[N-1].no_fix_cycle` (true for normal fix cycles, false for V=0 no-fix retries). Run the following single-pass check:
   - HEAD movement: compare `git rev-parse HEAD` against `cycle_history[N-1].pre_pause_head`.
     - If `expected_commit` is true: HEAD MUST have advanced. If equal, the user never committed; re-issue the step 14 manual-commit instruction.
     - If `expected_commit` is false (V=0 retry): HEAD MUST equal the stored head. If HEAD moved, the user pulled or committed unrelated work during the override pause; halt with `⚠️ HEAD changed during the V=0 override pause. Retry cycle would review an expanded target. Restart the skill or revert the changes.`
   - Working-tree cleanliness:
     - When `expected_commit` is true: `git status --porcelain -- <cycle N-1's touched_files>` MUST be empty (path-restricted to the fix set; staged or unstaged remnants of applied fixes block the cycle). Untracked files unrelated to the review_target are exempt.
     - When `expected_commit` is false (V=0 retry, no touched_files exists): `git status --porcelain` with no path restriction MUST be empty, excepting untracked files unrelated to the review_target. This is strictly wider than the `expected_commit` = true check because no commit was made: any change to tracked files during the override pause would expand the review target and invalidate the retry. On failure, halt with `⚠️ Working tree changed during the V=0 override pause. Retry cycle would review an expanded target. Restart the skill or revert the changes.`
   - Commit-delta coverage (only when `expected_commit` is true): `git diff --name-only <pre_pause_head>..HEAD -- <touched files>` must be non-empty AND must cover every file in cycle N-1's `applied_fixes[*].touched_files[]` list. Any touched file missing from this delta means the user's commit did not include that file. A legitimate fix that reverts a file back to base is still a valid commit delta even though the file disappears from `<base_sha>...HEAD`; this check catches that case because it queries the commit-delta range, not the branch-total range. Skipped entirely for V=0 retries (no commits to audit).
   - Cycle-commit ownership (warn-and-confirm) (only when `expected_commit` is true): compare the full commit-delta path list against cycle N-1's `touched_files`. Run `git diff --name-only <pre_pause_head>..HEAD` (no path restriction) and let `committed_paths` be that output. Paths in `committed_paths` that are NOT in cycle N-1's `touched_files[]` are unrelated: typically lint autofixes, typo repairs, or adjacent cleanups the user bundled into the cycle commit. Rather than abort (the previous behavior, which was hostile to `git commit -am` usage), surface them via a single `AskUserQuestion`:
     - `question`: "Cycle N-1 commit includes <K> path(s) that Claude did not touch: <full path list>. These will be preserved by the terminal soft-reset and ship in the final squash. Keep them as part of this review's squash, or abort for amend-drop?"
     - options:
       - `Keep (continue to cycle N)`: record the extras in `cycle_history[N-1].unrelated_commit_paths[]` for the step-20 Part B audit to surface again at terminal reset. Proceed to cycle N.
       - `Abort to amend`: print `Amend your cycle N-1 commit to drop the unrelated paths, then reply continue.` and pause the skill like the manual-commit gate in step 14.

     Rationale: the hard-abort form of this check rejected normal developer workflows. Warn-and-confirm preserves the signal (the user sees unrelated paths per cycle) without blocking lint-fix-plus-cycle-fix commits. Skipped entirely for V=0 retries.
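The HEAD-movement, commit-delta coverage, and ownership checks above can be sketched as pure functions over values Claude has already collected from git. All names and the returned failure labels are hypothetical; only the rules come from the preflight description.

```python
from typing import List, Optional


def check_head_movement(current_head: str, pre_pause_head: str,
                        no_fix_cycle: bool) -> Optional[str]:
    """Return a failure label, or None when HEAD is in the expected state."""
    expected_commit = not no_fix_cycle
    if expected_commit and current_head == pre_pause_head:
        # Normal fix cycle: the user must have committed the fixes.
        return "missing-commit"
    if not expected_commit and current_head != pre_pause_head:
        # V=0 retry: HEAD must not have moved during the pause.
        return "head-moved-during-retry"
    return None


def uncommitted_fix_files(touched_files: List[str],
                          committed_paths: List[str]) -> List[str]:
    """Commit-delta coverage: touched files absent from the cycle commit."""
    committed = set(committed_paths)
    return [path for path in touched_files if path not in committed]


def unrelated_commit_paths(touched_files: List[str],
                           committed_paths: List[str]) -> List[str]:
    """Ownership check: committed paths that Claude did not touch."""
    touched = set(touched_files)
    return [path for path in committed_paths if path not in touched]
```

A non-empty result from `uncommitted_fix_files` blocks the cycle, while a non-empty result from `unrelated_commit_paths` only triggers the warn-and-confirm question.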
   On any mismatch in the preflight checks above, do NOT proceed. Print a compact explanation naming the specific check that failed and re-issue the step 14 manual-commit instruction (or the V=0 restart message). Wait for the user to correct the state and reply `continue`. Do not silently review stale state.

   Then:
   - `variant == review`:
     `node "${CLAUDE_PLUGIN_ROOT}/scripts/codex-companion.mjs" review --wait <codex_scope_args>`
     Capture stdout as free-form text.
   - `variant == adversarial-review`:
     `node "${CLAUDE_PLUGIN_ROOT}/scripts/codex-companion.mjs" adversarial-review --wait --json <codex_scope_args> "<focus_text_with_context>"`
     `<focus_text_with_context>` is the target-kind focus text from `references/focus-text.md` followed by the `<review_context>` block (see §Review Context Format). Parse stdout as JSON.
   - Parse-retry policy (adversarial only): if JSON parsing fails or any required field is missing (`findings[]`, `severity`, `file`, `line_start`, `title`, `recommendation`), retry the exact same call once. A second failure aborts the cycle, surfaces codex's raw stdout verbatim to the user, and ends the skill.
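A minimal sketch of the parse-retry policy follows. Assumptions: the JSON is a top-level object with a `findings` array (the skill only says the output is structured `findings[]`), and `run_adversarial_review`, `parse_findings`, and `ReviewAborted` are hypothetical names. The `invoke` callable stands in for the codex-companion call and returns its stdout.

```python
import json
from typing import Callable, Dict, List

REQUIRED_FIELDS = ("severity", "file", "line_start", "title", "recommendation")


class ReviewAborted(Exception):
    """Raised after the second failure; carries codex's raw stdout verbatim."""

    def __init__(self, raw_stdout: str):
        super().__init__("codex output could not be parsed after one retry")
        self.raw_stdout = raw_stdout


def parse_findings(stdout: str) -> List[Dict]:
    """Parse stdout and verify every required field is present."""
    data = json.loads(stdout)  # json.JSONDecodeError is a ValueError subclass
    findings = data.get("findings") if isinstance(data, dict) else None
    if not isinstance(findings, list):
        raise ValueError("missing findings[]")
    for finding in findings:
        if not isinstance(finding, dict):
            raise ValueError("finding is not an object")
        missing = [key for key in REQUIRED_FIELDS if key not in finding]
        if missing:
            raise ValueError(f"finding is missing fields: {missing}")
    return findings


def run_adversarial_review(invoke: Callable[[], str]) -> List[Dict]:
    """Call codex, retrying the exact same call once on a parse failure."""
    raw = ""
    for _attempt in range(2):
        raw = invoke()
        try:
            return parse_findings(raw)
        except ValueError:
            continue
    raise ReviewAborted(raw)
```

Keeping the raw stdout on the exception lets the caller surface it to the user unmodified before ending the skill.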
9. Extract findings and assign IDs `F1..Fn`.
   - `adversarial-review`: use `findings[]` as-is.
   - `review`: Claude manually slices the free-form output into finding blocks. Each block must have a `title` (first line of the block, verbatim), a `recommendation` (the action codex suggests, verbatim), a best-effort `file`, and `line_start` (resolve from context, leave null if codex did not cite a location). Findings without at least a `title` and a `recommendation` are dropped with a note in the summary.
10. Run the validity check silently. For every finding, run the six items in `references/validity-checklist.md` without echoing the per-item trace to the user. Every item still requires Claude to Read the cited file internally (do not trust codex's body alone), but file reads and item-by-item reasoning are internal only. Assign each finding a three-value outcome: `valid`, `partially-valid`, or `invalid`. Record a short `Claude's note` (≤20 words) for every finding regardless of outcome: for `valid` findings, note the primary reason the finding is grounded (e.g. "confirmed by reading cited lines", "DoD required feature violation"); for `partially-valid` / `invalid`, note the rejection reason. When multiple findings cite the same file, issue a single `Read` call covering the union of cited ranges and reuse the result for every item-2/item-3/item-4 check; do not re-read the same region per finding.

    External-source rule (warning-only): external reads (dependency crate sources, standard library docs, upstream README) are allowed as background evidence for Claude's internal reasoning, but they MUST NOT flip the validity verdict. The verdict is always determined from the review diff itself plus what the finding claims. If an external read contradicts or confirms the finding, record it as `Claude's note: background — <source>: <what it showed>` without changing the outcome. The silent-trace rule still holds for validity determined solely from the diff; the `background` note is only emitted when Claude actually consulted an external source. This rule replaces an earlier "External-source exception" that allowed verdict-flipping with version-pinned sources; in practice Claude cannot reliably pin dependency versions, and the safe constraint is to forbid verdict-flipping entirely.

    No severity-based tiering: item 3 (premise matches artifact) is mandatory for every finding that could become selectable. Read tiering was considered (skip item 3 on medium/low) but rejected: self-consistency between title and recommendation does not prove the artifact actually has the claimed behavior. Skipping item 3 would let invalid medium/low findings reach the user-selection UI, which is exactly the silent-hallucination failure mode the validity check exists to catch. The Read cost (1 Read per unique cited file, shared across findings in that file via the union rule above) is acceptable; tiering's savings do not justify the safety weakening.

10a. Run scope triage via `review-scope-guard`. Invoke `Skill(review-scope-guard)` passing `findings[]` (with the validity outcomes already attached), the cached `dod` (pre-collected in step 7 when the variant is adversarial; null on cycle 1 for the review variant, in which case the skill will collect it interactively), the running `rejected_ledger`, `cycle_history` (for stop-signal evaluation), and `review_target` (already fully constructed in Phase 0 step 2; pass it verbatim without re-deriving any field). Phase 0 step 2 guarantees `review_target` carries the full `{scope, base_ref, base_sha, diff_command, diff_files, diff_numstat, commit_range, commit_messages[], diff_patch_excerpts}` tuple; step 10a simply forwards it. Do not drop `diff_patch_excerpts`: scope-guard's proposal-mode evidence gate consumes it for working-tree targets where `commit_messages[]` is empty. The caller MUST pass `review_target` so scope-guard's `proposal` DoD mode has an authoritative source; without it, scope-guard falls back to `interview` mode (see scope-guard §Inputs).

    The skill returns a triage verdict per finding (`must-fix` / `minimal-hygiene` / `reject-out-of-scope` / `reject-noise`), an updated `rejected_ledger`, a set of active stop signals, and the collected `dod` (on cycle 1). Cache the DoD for later cycles. Store the triage verdicts alongside each finding for step 11. When DoD is missing, the skill still returns classifications inside the 4-category invariant (fall-through lands in `minimal-hygiene`); render the degraded-mode warning as documented in Failure Modes.
11. Render the summary. Use the exact table format in §Summary Output Template. Every finding appears in the table, including `invalid` and `reject-*` ones. Every finding's `recommendation` field is quoted verbatim below the table (per §Summary Output Template). The active stop signals footer is rendered when (a) any signal has status `ADVISORY` / `ACTIVE` / `WARNING`, OR (b) any signal is `not evaluated: metrics missing`. Omit the footer only when every signal is truly `silent`. When the footer renders solely due to `not evaluated` rows, print a compact one-line notice, `Not evaluated (metrics missing): <comma-separated signal names>`, instead of the full signal table.

    Structurally-unevaluable compaction: subtract `structurally_unevaluable_signal_names` from the `not_evaluated_signal_names` set before rendering. The structurally-unevaluable names are shown once in cycle 1's footer as `_Stop signals unavailable in codex-review-cycle integration: <names> (standalone invocation required for full 5-signal surface)._` and omitted from cycle 2+ footers entirely. This replaces the previous behavior where `file-bloat` / `reactive-testing` appeared in every cycle's `Not evaluated` list.

    Additionally, starting from cycle 2, compare the current-cycle `not_evaluated_signal_names` (taken from `review-scope-guard`'s return value received in step 10a of the current cycle, NOT from `cycle_history[current]`, which is only appended later in step 15) against `cycle_history[N-1].not_evaluated_signal_names` (the immediately previous cycle, not cycle 1) using the element-wise-equal semantics in `review-scope-guard/references/stop-signals.md` §Per-cycle suppression. Comparing against N-1 (not cycle 1) prevents flapping from being masked: a set that differs from cycle 1 → matches cycle 2 → differs from cycle 3 would otherwise be silently suppressed if only the cycle-1 baseline were checked. This ordering is required because step 11 runs before step 15 persists the current cycle's entry; reading `cycle_history[current]` at step 11 would read stale or empty state. If the two lists are equal, print `_Not evaluated: unchanged from cycle N-1 — see cycle N-1 summary for signal list._` instead of re-listing the names. If they differ, re-render the full list AND add `_Not evaluated delta vs cycle N-1: added=<names>, removed=<names>._` so the change is visible. The canonical order guarantees ordering-only differences cannot occur; guard for them anyway.

    Validity fleet-rate check (plan targets only, ≥5 findings): if the current cycle has ≥5 findings and 100% are classified `valid`, print a single-line calibration warning at the bottom of the summary: `⚠️ 100% valid rate with ≥5 findings is unusual for adversarial-review on plan targets. Re-scan for: (1) vague recommendations that should be 'partially-valid: vague', (2) already-handled premise that should be 'invalid: misread', (3) design-intent reversals that should route through scope triage as 'reject-out-of-scope' instead of being accepted as must-fix.` This is a soft prompt, not a hard gate; the cycle proceeds normally. The raised threshold (was ≥3) and plan-only scope prevent false alarms on small focused diffs, where 3 valid findings is a normal outcome.
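The cycle-over-cycle comparison of `not_evaluated_signal_names` can be sketched as a small delta function. Assumptions: `not_evaluated_delta` is a hypothetical name, the result is returned as data rather than as the rendered footer text, and "guard for ordering-only differences" is read here as treating two lists with the same names in a different order as unchanged.

```python
from typing import Dict, List


def not_evaluated_delta(current: List[str], previous: List[str]) -> Dict[str, object]:
    """Compare this cycle's not-evaluated signals with cycle N-1's list."""
    # Sorting guards against ordering-only differences (assumed reading).
    if sorted(current) == sorted(previous):
        return {"unchanged": True, "added": [], "removed": []}
    return {
        "unchanged": False,
        # Names that appear only in the current cycle.
        "added": [name for name in current if name not in previous],
        # Names that were listed in cycle N-1 but are gone now.
        "removed": [name for name in previous if name not in current],
    }
```

When `unchanged` is true the footer prints the short "unchanged from cycle N-1" line; otherwise the full list is re-rendered together with the added and removed names.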
12. Zero-valid check. Let `V` be the count of findings whose validity outcome is `valid` or `partially-valid` and whose scope category is `must-fix` or `minimal-hygiene`. `reject-out-of-scope` and `reject-noise` findings are never counted as selectable, even if their validity outcome was `valid`. If `V == 0`:
    - If `N == 3` (final cycle), jump to Phase 2 Case A unconditionally; the cap has fired.
    - If `variant == review` (native), jump to Phase 2 Case A unconditionally. The V=0 override is not available for native review because the native `review` command accepts neither a focus-text argument nor a `<review_context>` block, so there is no channel to deliver an `<angle_request>` instruction. Re-running the same command against the same diff would be a hidden no-op that still consumes one of the 3 cycles. The override is therefore scoped to `variant == adversarial-review`.
    - If `N < 3` and `variant == adversarial-review`, issue a single `AskUserQuestion` before terminating:
      - `question`: "No selectable findings this cycle. Terminate the review, or run one more cycle with a different angle request?"
      - options (translate to user language per §Language):
        - `Terminate now (Case A)`: proceed to Phase 2 Case A.
        - `Run cycle N+1`: proceed to the no-fix persist step below, then re-enter step 8 with `N = N + 1`. The next cycle's `<review_context>` carries a one-line `<angle_request>` element, `<angle_request>Prior cycle produced 0 selectable findings. Try a materially different angle — e.g. a deeper root-cause pass, a different subsystem emphasis, or a scope that cuts across files not yet reviewed.</angle_request>`, inserted between `<previous_fixes>` and `<rejected_findings>`.

    No-fix cycle-history persist (before re-entering step 8): because V=0 means no selection and no fix phase, step 15's normal persistence never runs. Without explicit persistence here, the next cycle's `<review_context>` and `cycle_history[1]` reference would be stale or empty. Before returning to step 8, append an entry to `cycle_history` for the just-completed cycle with the following shape:
    - `applied_fixes: []` (empty; no fix phase)
    - `user_declined: []` (empty; no selection UI was opened)
    - `skipped_for_scope: []` (empty)
    - `claude_invalid: []`, populated from the current cycle's validity check (findings whose validity was `invalid`). This carries forward into the next cycle's `<rejected_findings>` via the normal union with the ledger.
    - `not_evaluated_signal_names`: the current-cycle return value from `review-scope-guard`, received in step 10a (the same value used for the step 11 footer comparison)
    - `pre_pause_head`: `null` for `working-tree`; `git rev-parse HEAD` otherwise (no branch/base-ref pause occurs on V=0 since there were no fixes to commit)
    - `no_fix_cycle: true`, an explicit marker that this cycle had no fix phase. The cycle-N>1 preflight in step 8 consumes this marker to set `expected_commit = false`: HEAD is required to be UNCHANGED (`HEAD == pre_pause_head`), the full working tree is required to be clean (no path restriction), and the commit-delta and ownership checks are skipped. See step 8 §Cycle-N>1 preflight for the full unified rule.

    Also persist the current `rejected_ledger` (which `review-scope-guard` already updated) for the next cycle's forwarding. This persistence is cheap (all buckets are empty except `claude_invalid` and `not_evaluated_signal_names`) but required for `<review_context>` correctness.

    The override path is bounded by the existing 3-cycle cap: requesting cycle N+1 from a V=0 state still consumes one of the 3 cycles. The user cannot escape the cap this way.
13. Ask the user which findings to fix. Use `AskUserQuestion(multiSelect: true)` per §User Selection UI. Only findings with scope `must-fix` or `minimal-hygiene` appear as options (further filtered by validity to exclude `invalid`). `reject-out-of-scope` and `reject-noise` findings are never offered for selection; they live in the summary table for audit trail only. Always append a final `None — skip all, end cycle` option.

13.5. Fix-weight precheck (self-discipline gate). Before applying any selected finding, verify that the planned edit matches the finding's scope classification. This check runs silently: it adds no user-visible output unless a mismatch is detected.
    - `must-fix` allows multi-line edits, new sections, flow changes, and cross-file edits within the review diff.
    - `minimal-hygiene` allows only 1-line edits, a single short paragraph addition, or a 1-sentence rule insertion. Edits that exceed this envelope indicate the finding should have been classified `must-fix`, not hygiene, and the rest of the workflow would miscount it.
    - On mismatch (a `minimal-hygiene` finding whose planned fix exceeds the hygiene envelope): either (a) simplify the edit to hygiene-scope and apply, or (b) raise an `AskUserQuestion` asking the user whether to re-classify the finding as `must-fix` before proceeding. Do not silently apply a must-fix-weight edit to a minimal-hygiene finding.
    - `reject-*` findings must not trigger any edit; skip them entirely.
    - Rationale: without this gate, `minimal-hygiene` findings can receive multi-line structural edits, recreating the over-engineering pattern the skill is designed to prevent. This gate forces the classification and the applied weight to match.
14. Apply fixes. For each selected finding, Claude reads the cited lines, applies the fix via `Edit` or `Write`, and reports the resulting `git diff` for the touched files. No sync-sweep, no rescue delegation.

    Write-scope boundary: Claude edits only files present in `review_target.diff_command` output (plus untracked files for `working-tree` scope). If a finding's fix genuinely needs an out-of-diff file, skip the finding with a note `Skipped: requires out-of-diff write`. Out-of-diff writes are a scope expansion that must go through a separate skill invocation, not through the user-selection UI.

    How the fixes become visible to the next cycle depends on `review_target.scope`:
    - `working-tree`: fixes are left in the working tree. Cycle N+1's `--scope working-tree` review sees the staged + unstaged + untracked state directly. No commit is needed.
    - `branch` / `base-ref`: codex's branch diff is computed as `<merge-base>..HEAD`, so in-place edits are invisible until they land in a commit on HEAD. Claude does not commit on the user's behalf. Before printing the manual-commit instruction, record `pre_pause_head = git rev-parse HEAD` into `cycle_history[current].pre_pause_head`. The next cycle's preflight uses this anchor (plus the per-fix `touched_files[]` list from step 15) to verify the user's actual commit delta, not just worktree cleanliness. Then, after all selected fixes are applied this cycle, print a manual-commit instruction and pause the skill:

          Cycle N fixes applied to working tree. Branch/base-ref scopes require you to commit these changes before cycle N+1 can see them.
          Recommended commands:
            git add <touched files>
            git commit -m "review cycle N fixes"
          After committing, reply `continue` to proceed to cycle N+1.
          Reply `stop` to end the skill here.

      The user owns pre-commit hook outcomes, clean-index concerns, and rollback. If the user replies `stop`, end the skill in Case B-like state (applied fixes remain uncommitted in the working tree; the user can deal with them however they like). If the user replies `continue`, proceed to step 8's cycle-N>1 preflight, which verifies that `git rev-parse HEAD` has moved.

    Sibling-doc cascade check: when a fix changes a user-facing contract of the skill (adds a new side effect the skill did not previously have, changes a stated invariant, introduces a step that sibling docs describe as absent), Claude must, in the same edit pass, grep sibling docs (`README.md`, other SKILL.md sections, CHANGELOG entries for the current release) for claims describing the OLD behavior, and update every match. Specifically, run `rg -n '<characteristic phrase from old behavior>' .` for at least one phrase, and either edit every hit or leave an explicit NOTE comment explaining why a mismatch is acceptable. Rationale: catching contract-breaking fixes in the same edit pass prevents silent contract breaks that would only surface in a later cycle.
15. Update cycle history and ledger. Append to `cycle_history` an entry for this cycle recording:
    - `applied_fixes[]`: each entry records `{fingerprint, title, file, line_start, scope_category, touched_files[]}`. `fingerprint` is the stable `{normalized_title, file, line_start, scope_category}` tuple used by step 17's residual matcher. `touched_files[]` is the exact list of files Claude edited while applying the finding; the preflight in step 8 consumes this list to verify those files are visible in cycle N+1's branch diff.
    - `user_declined[]`: each entry records `{fingerprint, title, file, line_start, scope_category}` for `must-fix` / `minimal-hygiene` findings the user did not select (including the `None — skip all` case).
    - `skipped_for_scope[]`: each entry records `{fingerprint, title, file, line_start, scope_category, reason}` for findings the user selected but Claude skipped because their fix required an out-of-diff write (see step 14 Write-scope boundary). These count as unresolved at termination time: Case A lists them alongside user-declined carry-overs and must not claim clean resolution while the bucket is non-empty.
    - `claude_invalid[]`: each entry records `{fingerprint, title, file, line_start, rejection_reason}` for `invalid` findings from the validity check.
    - `not_evaluated_signal_names[]`: the ordered string array returned by `review-scope-guard` step 11. Stored verbatim, no mutation. Used by step 11's footer rendering in cycle N+1 to decide whether to suppress the `not evaluated` footnote.
    - `unrelated_commit_paths[]`: optional, populated only when the user chose `Keep` at the cycle-N>1 ownership gate. Lists paths from the cycle commit that were NOT in `applied_fixes[*].touched_files[]`. The step-20 Part B terminal audit consumes this list to display the unrelated paths one more time before the final squash, so the user can decide anew whether to include them in the final commit.

    All four finding buckets carry fingerprints so step 17's residual accounting matches on the stable `{normalized_title, file, line_start, scope_category}` tuple, not on title alone.

    The `rejected_ledger` returned by step 10a is already updated with `reject-out-of-scope` and `reject-noise` entries; persist it as-is for the next cycle. The next cycle's `<review_context>` `<rejected_findings>` block is populated from the union of ledger entries and `claude_invalid` only, not from `user_declined[]` or `skipped_for_scope[]`. Declines and out-of-diff skips are deferrals, not rejections: leaving them out of `<rejected_findings>` lets codex freely re-raise the same findings next cycle so the user can reconsider them. Termination-time accounting still tracks them as unresolved residuals (see step 17 Case A).
16. Loop check.
    - `N < 3`: set `N = N + 1`, return to step 8.
    - `N == 3`: always jump to Phase 2 Case A. The Case A routing internally chooses between the clean-termination variant and the residual-carried-forward variant based on whether `cycle_history[*].user_declined[]` + `cycle_history[*].skipped_for_scope[]` leave any unresolved residuals (see step 17). Final-cycle user declines are handled by the residual variant, not by Case B: the user explicitly dispositioned each finding through the selection UI, which is an active close-out, not a cap failure.
    - Case B is reserved for an explicit cap-stop condition where the cycle could not run the user-selection UI to completion (e.g. the user interrupted mid-paging during an overflow batch, or the skill aborted before step 13). Normal 3-cycle completion with some user declines is Case A residual, not Case B.
Phase 2 — Termination
17. Case A: normal termination. Compute the full residual set: scan `cycle_history[*].user_declined[]` and `cycle_history[*].skipped_for_scope[]` across all prior cycles. For each, compute a stable fingerprint `{normalized_title, file, line_start, scope_category}` (same format as `review-scope-guard`'s ledger fingerprint; reuse that rule). A residual is "carried" if no later cycle's `applied_fixes[]` contains an entry with a matching fingerprint. Matching on title alone is forbidden because generic adversarial titles collide across unrelated findings and could silently clear a residual.
    - If the carried residual set is empty, print `All findings resolved after N cycle(s).` (the clean-termination variant).
    - Otherwise print `Review cycle terminated after N cycle(s) with residuals carried forward.` (never the "resolved" line), followed by the `User-declined valid findings carried to termination:` and `Out-of-diff skipped findings carried to termination:` lists, with each entry showing `<title> (<file>:<line_start>, declined in cycle N)` so the user can audit.
    - Either way, also print the mandatory `⚠️ No automated verification was run` warning and the per-cycle applied fixes summary.
18. Case B: cap reached. Print the template in §Termination Criteria Case B. Do not automatically start a fourth cycle. Tell the user they can re-invoke the skill to run another 3-cycle pass.
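The residual rule in Case A can be sketched as follows. This is a minimal illustration, not the skill's implementation: it assumes `cycle_history` is a list of dicts and that each bucket holds fingerprint tuples directly (the real entries also carry fields such as `title` and `reason`). The helper names `Fingerprint`, `carried_residuals`, and `opening_line` are hypothetical.

```python
from typing import NamedTuple


class Fingerprint(NamedTuple):
    """Stable identity of a finding; the four fields come from the spec."""
    normalized_title: str
    file: str
    line_start: int
    scope_category: str


def carried_residuals(cycle_history):
    """Return (cycle, bucket, fingerprint) for residuals no later cycle fixed.

    Assumption: buckets hold Fingerprint values directly.
    """
    carried = []
    for index, entry in enumerate(cycle_history):
        # Only fixes applied in strictly later cycles can clear a residual.
        fixed_later = {
            fp
            for later in cycle_history[index + 1:]
            for fp in later.get("applied_fixes", [])
        }
        for bucket in ("user_declined", "skipped_for_scope"):
            for fp in entry.get(bucket, []):
                if fp not in fixed_later:
                    carried.append((index + 1, bucket, fp))
    return carried


def opening_line(cycle_history):
    """Pick the Case A opening line from the carried residual set."""
    n = len(cycle_history)
    if carried_residuals(cycle_history):
        return f"Review cycle terminated after {n} cycle(s) with residuals carried forward."
    return f"All findings resolved after {n} cycle(s)."
```

Because the comparison uses the whole tuple, two findings that share a generic title but point at different files never clear each other, which is the collision the title-only ban is meant to prevent.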
Review Context Format
Used only when variant == adversarial-review. The block is appended to the focus text argument with a single blank line between the two sections:
<review_context cycle="N">
<intent><![CDATA[<one-sentence change intent from Phase 0 step 7 DoD item 1>]]></intent>
<previous_fixes>
<fix cycle="N-1"><![CDATA[<applied finding title + one-line change summary>]]></fix>
</previous_fixes>
<angle_request><![CDATA[<one sentence; present only when V=0 override fired in the previous cycle>]]></angle_request>
<rejected_findings>
<rejected cycle="N-1" reason="invalid: file not in diff"><![CDATA[<finding title>]]></rejected>
<rejected cycle="N-1" reason="reject-out-of-scope: DoD explicit out-of-scope"><![CDATA[<finding title>]]></rejected>
</rejected_findings>
</review_context>
`<angle_request>` element (optional): present only when the previous cycle terminated at step 12 V=0 and the user chose `Run cycle N+1`. Contains a single sentence asking codex to try a different angle. Omit the element entirely when absent.
Template note: this block never carries user-declined findings. A user decline is a deferral — codex should remain free to re-raise the same finding next cycle so the user can reconsider. If a template reader is tempted to add a <rejected reason="user declined"> element, stop: that would let declined valid findings disappear from subsequent cycles and make Case A falsely claim resolution.
Rules:
- Cycle 1 carries `<intent>` (populated from Phase 0 step 7 DoD item 1 pre-collection); `<previous_fixes>` and `<rejected_findings>` are empty.
- `<review_context>` is preceded by this literal instruction, on its own line: `Do not re-report findings in <rejected_findings> unless you have a materially different angle.`
- Every user-facing string inside `<![CDATA[ ... ]]>` is quoted as-is. No JSON encoding. No HTML entity escaping. The CDATA wrapper keeps any `<`, `>`, `&` in codex output from terminating the block.
- This skill does not use a separate skip ledger. `<review_context>` is the only cross-cycle carry.
- `<previous_fixes>` window: the block carries only the immediately prior cycle (N-1), not a cumulative history. Cycle 3's `<review_context>` contains the 5 fixes from cycle 2; it does NOT also enumerate cycle 1's fixes. Each `<fix>` element uses the compact form `<fix cycle="N-1" category="must-fix|minimal-hygiene"><![CDATA[<title>: <≤40 word summary>]]></fix>`; summaries longer than 40 words are forbidden. Codex only needs the latest ground truth for cross-cycle suppression; older history would inflate the context block without improving review quality.
    - V=0 exception: when `cycle_history[N-1].no_fix_cycle == true` (prior cycle was a V=0 override retry and emitted no fixes), cycle N's `<previous_fixes>` skips the empty cycle N-1 and carries fixes from cycle N-2 instead. Without this exception, codex would lose context of cycle 1's applied fixes when cycle 2 was V=0 no-fix, causing re-surfacing of already-fixed findings in cycle 3.
- `<rejected_findings>` sources: the block aggregates two kinds of prior-cycle rejections: (1) entries in the `rejected_ledger` returned by `review-scope-guard` (scope-triage rejections: `reject-out-of-scope` / `reject-noise`), and (2) `claude_invalid[]` from the prior cycle's validity check. Each rejection renders as its own `<rejected>` element with the `reason` attribute carrying the original category and rationale (e.g. `reason="reject-out-of-scope: DoD explicit out-of-scope"`, `reason="invalid: file not in diff"`). Ledger entries with `count >= 2` render with an extra hint, `reason="reject-noise: already-rejected (count=N)"`, so codex sees how persistent the complaint is. User-declined findings are NOT included: a decline is a deferral, not a rejection, and codex is free to re-raise the same finding in the next cycle so the user can reconsider it.
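The `<previous_fixes>` window and its V=0 exception can be sketched as below. This is a hedged illustration only: `cycle_history` is assumed to be a 0-indexed list of dicts for cycles 1..N-1, the per-fix `summary` field is hypothetical (the spec only says each `<fix>` carries a title plus a summary of at most 40 words), and using the source cycle's number in the `cycle` attribute after a fallback is also an assumption.

```python
def previous_fixes_source(cycle_history, n):
    """Return the 1-based cycle whose fixes feed cycle n's <previous_fixes>."""
    if n <= 1:
        return None  # cycle 1 has no prior cycle
    prior = n - 1
    # V=0 exception: skip an empty no-fix cycle and fall back to N-2.
    if prior > 1 and cycle_history[prior - 1].get("no_fix_cycle"):
        prior -= 1
    return prior


def render_previous_fixes(cycle_history, n):
    """Render the <previous_fixes> block for cycle n as a string."""
    lines = ["<previous_fixes>"]
    source = previous_fixes_source(cycle_history, n)
    if source is not None:
        for fix in cycle_history[source - 1].get("applied_fixes", []):
            summary = fix["summary"]  # hypothetical field
            if len(summary.split()) > 40:
                raise ValueError("summary exceeds the 40-word limit")
            lines.append(
                f'<fix cycle="{source}" category="{fix["scope_category"]}">'
                f'<![CDATA[{fix["title"]}: {summary}]]></fix>'
            )
    lines.append("</previous_fixes>")
    return "\n".join(lines)
```

With cycle 2 recorded as a no-fix cycle, `previous_fixes_source(history, 3)` falls back to cycle 1, so cycle 3 still tells codex which findings were already fixed.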
Validity Check Summary
Full details live in `references/validity-checklist.md`. The six items are:
1. File exists in the diff: `finding.file` appears in the output of `review_target.diff_command` (plus `git ls-files --others --exclude-standard` when `review_target.scope == working-tree`).
2. Line range exists: `finding.line_start` is within the current file length; flag shifted ranges as `partially-valid`.
3. Premise matches artifact: Claude reads the cited lines and confirms codex's assertion.
4. Scope: `line_start..line_end` overlaps a changed hunk in the scope-appropriate diff (`git diff HEAD -- <file>` for working-tree; `git diff <base_sha>...HEAD -- <file>` for branch / base-ref, using the frozen Phase-0 SHA), not unchanged code in a touched file.
5. Recommendation concreteness: a specific failure mode is named, not a vague "consider…".
6. Target-kind consistency: plan cycles reject detailed-design nitpicks on `.md` / `.markdown` / `.txt` files.

Outcome: `valid` (all pass), `partially-valid` (items 2 or 5 returned `partially-valid`, no `invalid`), `invalid` (any of items 1, 3, 4, 6 returned `invalid`).
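The outcome rule above can be sketched as a small reducer. This is a sketch under assumptions: each checklist item is represented by the string `"pass"`, `"partially-valid"`, or `"invalid"`, keyed by item number, and any `"invalid"` result yields `invalid` (the summary only names items 1, 3, 4, and 6 as able to return it). The function name `validity_outcome` is hypothetical.

```python
def validity_outcome(item_results):
    """Reduce per-item results (keys 1-6) to valid / partially-valid / invalid."""
    # Assumption: any invalid item makes the finding invalid.
    if any(result == "invalid" for result in item_results.values()):
        return "invalid"
    # Only items 2 (line range) and 5 (concreteness) can be partially valid.
    if any(item_results.get(item) == "partially-valid" for item in (2, 5)):
        return "partially-valid"
    return "valid"
```

For example, a finding whose line range shifted but whose premise still holds reduces to `partially-valid` and stays selectable, while a finding on a file outside the diff reduces to `invalid` and is listed for audit only.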
Summary Output Template
Language reinforcement: the template below uses English for readability of the SKILL.md spec itself. When rendering actual output, translate ALL non-verbatim elements to the user's language per §Language: section headers, column headers (except Title (codex verbatim)), Claude's note content, Recommended action values, the recommendation block heading, stop-signal footer text, and termination messages. Only codex's title and recommendation fields stay in their original language (they are contractually verbatim).
Render after every cycle, before the user selection prompt:
### Cycle N review summary (variant: <review|adversarial-review>, target: <code|plan>)
| ID | Severity | File:Line | Title (codex verbatim) | Validity | Scope | Claude's note | Recommended action |
|----|----------|-----------|------------------------|----------|-------|---------------|--------------------|
| F1 | high | src/auth/login.ts:42 | Missing null check on userId | valid | must-fix | DoD required features; core correctness | Apply fix |
| F2 | medium | src/api/user.ts:88 | Consider adding retry logic | partially-valid | reject-noise | vague, no concrete failure mode | Skip |
| F3 | low | docs/plan.md:15 | Rename process to handler | invalid | reject-noise | detailed-design on plan target | Skip |
| F4 | medium | src/curl.rs:130 | --url-query value leaks to URL | valid | minimal-hygiene | value consume + warn; semantics NOT implemented | Apply 1-line hygiene |
| F5 | medium | src/curl.rs:120 | Implement --json shorthand body | valid | reject-out-of-scope | DoD explicit out-of-scope: cURL 7.82+ new | Skip (ledger fwd) |
**Recommendation (per finding)**:
- **F1**: <codex recommendation verbatim>
- **F2**: <codex recommendation verbatim>
...
Quote every finding's `recommendation` field verbatim below the table. Do not skip quoting even when the title seems to imply the recommendation — the user needs the full recommendation text to make an informed fix/decline decision without reading the raw codex JSON.
**Active stop signals** (footer rendered when ≥1 signal is `ADVISORY`/`ACTIVE`/`WARNING` **or** `not evaluated: metrics missing`; omit entirely only when all signals are truly `silent`. When only `not evaluated` rows exist, replace the full table with a compact one-liner `Not evaluated (metrics missing): <names>`):
| Signal | Status | Evidence |
|--------|--------|----------|
| ... | ... | ... |
Format rules that protect finding intent
- The `Title (codex verbatim)` column must contain codex's `title` field exactly. No paraphrase, no shortening, no translation.
- The `Recommendation (per finding)` block must contain each finding's full `recommendation` field verbatim, regardless of length. Never truncate, summarize, or abbreviate: the user needs the complete remediation text to make an informed fix/decline decision.
- Claude's interpretation lives only in the `Claude's note` column and the `Recommended action` column. Do not edit any other column based on what Claude thinks the finding "really means".
- If Claude judges a finding `invalid`, the row still appears in the table with the original title and recommendation. The `Claude's note` column then carries `invalid because <reason>`.
- If `review-scope-guard` classifies a finding as `reject-out-of-scope` or `reject-noise`, the row still appears in the table for audit. The `Scope` column carries the category and `Claude's note` carries the triage rationale verbatim from the skill's output.
- Severity values come from codex. Do not upgrade or downgrade severity based on Claude's validity or scope verdict.
User Selection UI
Language reinforcement: AskUserQuestion question, header, and option label/description fields must be in the user's language per §Language. Codex verbatim titles embedded in labels stay in their original language.
Use AskUserQuestion with multiSelect: true. Only findings whose scope is must-fix or minimal-hygiene AND whose validity is valid or partially-valid appear as options. invalid, reject-out-of-scope, and reject-noise findings are never selectable — the user sees them in the summary table above for audit trail only.
minimal-hygiene options include a (hygiene) marker in the label so the user knows the expected fix is 1-line value consume + warn, not a full implementation.
Base layout. Token rule: each option's description field must carry only the finding's file:line — nothing else. The label already encodes the title, severity, and scope; the summary table above already carries rationale and Claude's note. Repeating any of that in the description is wasted context.
question: "Which findings should I address in cycle N?"
header: "Cycle N fixes"
multiSelect: true
options:
- { label: "F1: Missing null check on userId (high, must-fix)", description: "src/auth/login.ts:42" }
- { label: "F4: --url-query value leaks to URL (medium, hygiene)", description: "src/curl.rs:130" }
- { label: "None — skip all, end cycle", description: "End this cycle" }
Overflow handling (more than 3 selectable findings per severity)
`AskUserQuestion` accepts a maximum of 4 options per question; reserve one for `None — skip all, end cycle`, leaving 3 finding slots per question. When a severity bucket has more than 3 selectable findings, issue multiple sequential `AskUserQuestion` calls (3 findings each) in severity order until every selectable finding has been surfaced. No finding may be silently deferred just because it did not fit on a page: the fix phase does not begin until every selectable finding has been shown to the user and either selected or declined.
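The paging rule can be sketched as below. This is one possible reading, stated as an assumption: all selectable findings are sorted by severity (`high`, `medium`, `low`, as in the summary table) and then split into pages of 3, so a page may mix severities at a bucket boundary. `page_findings` and `SEVERITY_RANK` are hypothetical names.

```python
SEVERITY_RANK = {"high": 0, "medium": 1, "low": 2}  # assumed label set


def page_findings(findings, slots=3):
    """Split selectable findings into severity-ordered pages of `slots` items."""
    # sorted() is stable, so findings keep their ID order within a severity.
    ordered = sorted(
        findings,
        key=lambda f: SEVERITY_RANK.get(f["severity"], len(SEVERITY_RANK)),
    )
    return [ordered[i:i + slots] for i in range(0, len(ordered), slots)]
```

Each page then becomes one `AskUserQuestion` call with the reserved fourth option appended, and the fix phase starts only after the last page has been answered.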
Termination Criteria
Language reinforcement: the templates below are in English for spec readability. Actual output must be in the user's language per §Language. Translate all headings, messages, and labels; keep codex verbatim titles and technical identifiers (must-fix, minimal-hygiene, file paths) as-is.
Case A — V == 0 (normal termination):
When the residual set (carried user-declined + carried out-of-diff skipped) is empty:
All findings resolved after N cycle(s).
⚠️ No automated verification was run. This skill never executes tests, lints, builds, or any verification command on behalf of the user. The "resolved" claim only means "codex returned zero selectable findings this cycle and no residuals were carried from prior cycles". Before shipping, review the applied diff and run your own verification (test suite, type check, lint, build, manual smoke) as appropriate for the change.
Applied fixes by cycle:
- Cycle 1: <list of finding titles or "none">
- Cycle 2: <list or "none">
- Cycle 3: <list or "none">
When any residuals exist (declined carry-overs, out-of-diff skips, or final-cycle declines), swap the opening line and list the residuals — do NOT print "All findings resolved":
Review cycle terminated after N cycle(s) with residuals carried forward.
⚠️ No automated verification was run. See the clean-termination variant above for rationale.
Applied fixes by cycle:
- Cycle 1: <list of finding titles or "none">
- Cycle 2: <list or "none">
- Cycle 3: <list or "none">
User-declined valid findings carried to termination: <titles from cycle_history[*].user_declined[] that never appear in a later cycle's applied_fixes[], or "none">
Out-of-diff skipped findings carried to termination: <titles from cycle_history[*].skipped_for_scope[] that never appear in a later cycle's applied_fixes[], or "none">
Case B — 3 cycles complete with unresolved valid findings:
## Review cycle terminated — cap reached
- Cycles run: 3 / 3
- Findings applied: <count>
- Findings still valid and unresolved at cap: <count>
⚠️ No automated verification was run on the applied fixes — see Case A for rationale.
### Unresolved valid findings
<Summary Output Template table, filtered to valid/partially-valid findings that were never applied>
### Next steps
- Re-run `codex-review-cycle` after further work, or
- Address the unresolved findings manually, or
- Explicitly accept them as known residuals.
The skill never advances to a fourth cycle. The user must invoke the skill again to continue.
19. Review assessment. After printing Case A or Case B output, render a concise review assessment block in the user's language (per §Language) to help the user decide whether to re-invoke the skill or move on:

        ## Review assessment
        **Trend**: <1 sentence, e.g. "converging (5 → 4 → 3, severity shift from high to medium)", "stable (structural gaps in each cycle)", "cascading (cycle N fixes created cycle N+1 findings)">
        **Character**: <1 sentence, e.g. "mostly state-model gaps", "edge cases and design-philosophy arguments", "doc/wording consistency issues">
        **Clusters** (optional; render only when ≥2 **rejected-ledger** entries share a `cluster_id`): `<cluster_id>`: <N> ledger entries across <M> cycle(s) (see ledger entries L<i>, L<j>, ...). Emit at most 3 cluster lines, sorted by finding count descending. If no cluster has ≥2 members, omit the line entirely. **Scope limitation**: cluster accounting is intentionally limited to rejected-ledger entries because only those carry `cluster_id` (see `review-scope-guard` Phase 3 step 9 assignment rule). Applied-fix findings do not participate in cluster summary; extending the carrier to applied fixes is deliberately deferred to avoid inconsistent partial counts.
        **Recommendation**: <"address residuals before shipping" | "stop and audit scope" | "move to next work" | "continue reviewing" with 1-sentence rationale. Determined from recorded state only:
        - If any `must-fix` or `minimal-hygiene` residual was carried to termination → "address residuals before shipping"
        - If any stop signal is `ACTIVE` or `WARNING` → "stop and audit scope" (aligns with review-scope-guard's stop-signal contract: ACTIVE/WARNING means diminishing returns or scope drift, not a reason to run more cycles)
        - If clean termination (no residuals) AND finding count decreased across cycles AND no stop signal tripped → "move to next work"
        - Otherwise → "continue reviewing" (default-safe)>
        **Suggested next action**: <concrete 1-line action, e.g. "squash and merge to main", "run 1-cycle working-tree dogfood on the applied fixes", "address the 2 carried residuals manually before merging">

    This block is advisory; it does not gate any action. Keep each part to one sentence; do not re-list findings or repeat the termination summary.
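The recommendation rules in the assessment block form an ordered decision, sketched below. Two details are assumptions rather than stated facts: "finding count decreased across cycles" is read as strictly decreasing over at least two cycles, and "no stop signal tripped" is read as every recorded status being `silent`. The function name `recommend` is hypothetical.

```python
def recommend(carried_residual_scopes, stop_signal_statuses, finding_counts):
    """Pick the recommendation string from recorded state only."""
    # Rule 1: selectable residuals carried to termination.
    if any(s in ("must-fix", "minimal-hygiene") for s in carried_residual_scopes):
        return "address residuals before shipping"
    # Rule 2: an ACTIVE or WARNING stop signal.
    if any(s in ("ACTIVE", "WARNING") for s in stop_signal_statuses):
        return "stop and audit scope"
    # Rule 3: clean termination, shrinking finding count, no tripped signal.
    decreased = len(finding_counts) >= 2 and all(
        earlier > later
        for earlier, later in zip(finding_counts, finding_counts[1:])
    )
    no_signal = all(s == "silent" for s in stop_signal_statuses)
    if not carried_residual_scopes and decreased and no_signal:
        return "move to next work"
    # Rule 4: default-safe fallback.
    return "continue reviewing"
```

The order matters: a carried residual wins over a stop signal, and anything that does not clearly satisfy the "move to next work" conditions falls through to the default.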
20. Soft-reset temporary cycle commits (`branch` / `base-ref` only). During the review run, the user created one commit per cycle at Claude's request (step 14 manual-commit pause). These are intermediate review-cycle artifacts, not the user's intended final commit. To keep the applied code changes while removing the intermediate commit history:
    - If `review_target.scope == working-tree` or no cycle commits were created, skip this step silently.
    - Terminal-cycle verification: before resetting, verify the final cycle's applied fixes were actually committed. Run `git status --porcelain -- <final cycle's touched_files>`. If any files have uncommitted changes, print `⚠️ Final cycle has uncommitted applied fixes (<file list>). Soft-reset will NOT stage these — only committed changes become staged after reset. Commit them first, or they will be lost from the staged state.` and skip the reset with a manual-squash fallback: `git reset --soft <pre_cycle_1_head>`.
    - Retrieve `pre_cycle_1_head` from Phase 0 step 7 and record the current `HEAD` as `final_head`.
    - Dirty-state audit (pre-preview): before preview, confirm no non-cycle-owned state would be staged by the reset. Compute `cycle_owned_files` = union of `cycle_history[*].applied_fixes[*].touched_files[]`, then:
        - Part A (uncommitted): run `git status --porcelain` and inspect every entry:
            - Entries that refer to files outside `cycle_owned_files` are unrelated uncommitted state that would survive `git reset --soft`.
            - Entries that refer to files inside `cycle_owned_files` are also blocking unless they are the final cycle's applied-fix files that the Terminal-cycle verification above already cleared. Any staged/unstaged edit on an earlier-cycle cycle-owned file (or on a final-cycle file that the Terminal verification failed on) bypasses `git reset --soft`: soft-reset preserves the index and working tree but will NOT stage an unstaged edit, so the workflow's "all applied fixes are staged" claim becomes false. Surface every such entry in the abort output below, including staged vs unstaged status, so the user can commit/stash before re-running.
        - Part B (committed-range ownership): run `git diff --name-only <pre_cycle_1_head>..<final_head>` and compare against `cycle_owned_files`. Any path in the committed delta that is NOT in `cycle_owned_files` is an unrelated file the user accidentally included in a cycle commit; `git reset --soft` will stage it into the final squash without the preview flagging it (the preview only shows `--stat`, which lists filenames but does not cross-check ownership). Part B catches what Part A cannot: unrelated work already committed into the cycle range. Paths present in `cycle_history[*].unrelated_commit_paths[]` (user-approved during the cycle-N>1 ownership gate) are NOT treated as abort-worthy at Part B, because they already got a user decision. Part B surfaces them in the preview output with a `(user-approved unrelated)` tag so the final squash commit accurately reflects what is being shipped.
        - If either Part A or Part B reports a blocking entry (state outside `cycle_owned_files`, or an uncleared dirty cycle-owned file from Part A), print the following and abort the soft-reset entirely:

              ⚠️ State outside the cycle-owned files detected. `git reset --soft <pre_cycle_1_head>` would preserve this into the final staged index, mixing unrelated work into what looks like a "cycle-only" squash.
              State leaking into the soft-reset target:
              <list every entry with its source tag: [A-uncommitted] / [A-dirty-owned] / [B-committed], one per line; if all three categories are empty this entire abort block is not printed>
              Resolve before continuing:
              - [A-uncommitted] paths: commit, stash, or clean them.
              - [A-dirty-owned] paths: commit or clean the staged/unstaged remnants on cycle-owned files.
              - [B-committed] paths: amend the offending cycle commit to drop the unrelated file, OR rewrite the commit range to exclude it, OR explicitly include them in a manual post-reset commit by running `git reset --soft <pre_cycle_1_head>` yourself after clearing [A-*].
              Re-invoke the skill after the range is cycle-clean.

        - Stop the skill here (do NOT proceed to preview or reset) until the user fixes the state and re-invokes. The soft-reset preview gate gives false confidence if the preview shows only the cycle diff while the actual post-reset staged index would contain unrelated work (uncommitted OR committed).
    - Soft-reset preview (confirmation gate): only reached when the dirty-state audit passes. Before running `git reset --soft`, show the user what will be squashed. Run `git log --oneline <pre_cycle_1_head>..<final_head>` to list the cycle commits, and `git diff --stat <pre_cycle_1_head>..<final_head>` to show the cumulative change. Print:

          About to soft-reset <N> cycle commit(s) onto <pre_cycle_1_head>.
          Why squash: the cycle commits (`review cycle 1 fixes`, `review cycle 2 fixes`, ...) are intermediate artifacts of the review loop. Most users want a single final commit that represents the applied change; soft-reset preserves every line of every cycle fix in the staged index, so you can write the final commit message yourself. Declining keeps the cycle commits in place if you prefer their granular history.
          After reset, all changes below are staged in the index (working tree unchanged). You create your own commit message.
          Commits to be collapsed:
          <output of git log --oneline ...>
          Cumulative change (will be staged):
          <output of git diff --stat ...>

      Approved-unrelated paths notice (only when `cycle_history[*].unrelated_commit_paths[]` is non-empty): before the main reset prompt, display an informational section:

          Approved unrelated paths from earlier cycles (user-tagged during cycle-N>1 ownership gate):
          - <path> (approved in cycle N)
          - ...
          These files are NOT in any applied fix's touched_files. They were bundled into cycle commits and will be preserved by the soft-reset into the final staged index. If you changed your mind about including them, decline the next prompt and amend the cycle commits manually.

      This is a display-only notice, not an `AskUserQuestion`: the user already tagged these paths as approved during the cycle-N>1 ownership gate. The main reset prompt below is the opportunity to back out.

      Then issue the main reset `AskUserQuestion`:
        - question: "Collapse these N cycle commit(s) into a staged index, ready for your commit?"
        - options:
            - `Yes, soft-reset now`: proceed to the next bullet.
            - `No, leave cycle commits as-is`: skip the reset; print ``Cycle commits left in place. Squash manually with `git reset --soft <pre_cycle_1_head>` if desired.`` and end the skill.
    - Run `git reset --soft <pre_cycle_1_head>`. This removes all intermediate cycle commits from HEAD but leaves the accumulated changes staged in the index. The user's working tree is unchanged.
    - Print:

          Soft-reset: <N> temporary cycle commit(s) (<pre_cycle_1_head>..<final_head>) removed.
          All applied fixes are staged in the index. Create your own commit:
            git commit -m "<your message>"
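The ownership comparison behind the dirty-state audit can be sketched as plain set logic. This is an illustration under assumptions, not the skill's code: the path lists are taken as already parsed from `git status --porcelain` and `git diff --name-only`, and the names `audit_soft_reset` and `cleared_final_paths` are hypothetical.

```python
def audit_soft_reset(cycle_history, status_paths, committed_paths,
                     cleared_final_paths=()):
    """Return (tag, path) pairs that must block the soft-reset.

    status_paths: paths reported by `git status --porcelain` (Part A).
    committed_paths: paths in the cycle commit range (Part B).
    cleared_final_paths: final-cycle files the terminal verification cleared.
    """
    owned = {
        path
        for entry in cycle_history
        for fix in entry.get("applied_fixes", [])
        for path in fix["touched_files"]
    }
    approved = {
        path
        for entry in cycle_history
        for path in entry.get("unrelated_commit_paths", [])
    }
    blocking = []
    for path in status_paths:  # Part A: uncommitted state
        if path not in owned:
            blocking.append(("A-uncommitted", path))
        elif path not in cleared_final_paths:
            blocking.append(("A-dirty-owned", path))
    for path in committed_paths:  # Part B: committed-range ownership
        if path not in owned and path not in approved:
            blocking.append(("B-committed", path))
    return blocking
```

An empty result means the audit passes and the preview gate may run; any returned pair maps onto one line of the abort output with its source tag.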
Preconditions Recap
- Git CLI available on `PATH`.
- Current working directory inside a git repository.
- The chosen review target produces a non-empty diff: either uncommitted changes exist (`working-tree`), or HEAD is ahead of the auto-detected default branch (`branch`), or the user-supplied ref exists and `<ref>...HEAD` is non-empty (`base-ref`).
- Codex plugin installed and `Skill(codex:setup)` reports a ready state.
- `review-scope-guard` skill available (invoked at step 10a for scope triage and DoD collection).
- Both `codex-review-cycle` and `review-scope-guard` are registered with the Claude Code harness. If not (e.g. during local development before marketplace publication), follow the SKILL.md steps manually; every step is self-contained.
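The git-level preconditions above could be probed roughly as follows. This is a sketch under stated assumptions: the function names are hypothetical, the mapping from each scope to a specific git command is one plausible choice rather than what the skill necessarily runs, and `git status --porcelain` also reports untracked files, which may or may not count as "uncommitted changes" for the review target. The codex plugin and skill-registration checks are outside its reach.

```python
import shutil
import subprocess
from typing import Callable, List, Optional, Tuple

# Hypothetical type: a probe returns (exit code, stripped stdout).
GitProbe = Callable[[List[str]], Tuple[int, str]]


def probe_git(args: List[str]) -> Tuple[int, str]:
    """Run git without raising, so exit codes can be inspected."""
    result = subprocess.run(["git", *args], capture_output=True, text=True)
    return result.returncode, result.stdout.strip()


def target_has_diff(scope: str, ref: Optional[str], git: GitProbe = probe_git) -> bool:
    """Return True when the chosen review target would yield a non-empty diff."""
    if scope == "working-tree":
        _, out = git(["status", "--porcelain"])
        return bool(out)
    if ref is None:
        raise ValueError(f"scope {scope!r} needs a ref")
    if scope == "branch":
        # ref is the auto-detected default branch; count commits HEAD is ahead.
        code, out = git(["rev-list", "--count", f"{ref}..HEAD"])
        return code == 0 and int(out) > 0
    if scope == "base-ref":
        code, _ = git(["rev-parse", "--verify", "--quiet", f"{ref}^{{commit}}"])
        if code != 0:
            return False  # the user-supplied ref does not exist
        # --quiet implies --exit-code: exit status 1 means differences exist.
        code, _ = git(["diff", "--quiet", f"{ref}...HEAD"])
        return code == 1
    raise ValueError(f"unknown scope: {scope!r}")


def check_preconditions(
    scope: str,
    ref: Optional[str] = None,
    git: GitProbe = probe_git,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return a list of unmet preconditions (empty when all git checks pass)."""
    if which("git") is None:
        return ["Git CLI not found on PATH."]
    code, out = git(["rev-parse", "--is-inside-work-tree"])
    if code != 0 or out != "true":
        return ["Current working directory is not inside a git repository."]
    if not target_has_diff(scope, ref, git):
        return [f"Review target for scope '{scope}' produces an empty diff."]
    return []
```

For example, `check_preconditions("base-ref", ref="v1.2.0")` would return an empty list only when the tag exists and differs from HEAD; the tag name here is purely illustrative.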
Failure Modes
- Codex CLI missing or setup incomplete: stop in Phase 0 step 4. Tell the user to install the codex plugin or run `/codex:setup`.
- Default branch not detected (scope = `branch`): stop in Phase 0 step 2 with guidance to re-run with scope = `base-ref` and an explicit ref.
- User-supplied ref not found (scope = `base-ref`): stop in Phase 0 step 2 with `Base ref '<ref>' not found in this repository.`
- JSON parse failure (adversarial): retry once; a second failure aborts the cycle with codex's raw stdout surfaced verbatim.
- File cited by codex no longer exists: item 1 of the validity check returns `invalid: file not in diff`. The finding is listed in the summary but not selectable.
- User has no working-tree diff after a cycle's fixes are applied (scope = `working-tree`): continue to the next cycle anyway (the next review will see the committed state). Do not silently skip cycles. For `branch`/`base-ref` scopes the diff is against a committed base, so in-cycle fixes never empty the diff.
- User declines every finding across all 3 cycles: terminate in Case A with the user-declined message, not Case B. The cap did not fire; the user actively closed the loop.
- User declines the DoD interview in cycle 1 step 7 (adversarial) or step 10a (review): `review-scope-guard` stays inside the 4-category invariant. Fall-through findings still classify as `minimal-hygiene`, and ledger/vague findings still classify as `reject-noise`. No 5th `unclassified` bucket is created. The summary table footer prints `⚠️ DoD not collected — scope triage degraded. Review each selectable finding manually before applying; the minimal-hygiene fall-through is weaker than a DoD-anchored classification.` The user is the last line of defense in this degraded mode.
- Stop signal `ACTIVE` or `WARNING` during cycles 1-2: print the recommendation in the summary but do not auto-stop. The cycle cap still governs termination.
- User chooses `Run cycle N+1` from a V=0 state but codex again returns 0 selectable findings: the next V=0 offer is still issued per step 12 (adversarial-review variant only); the user can choose to terminate or burn another cycle. The cap still governs. Do not suppress the offer just because it fired before.
- V=0 fires under `variant == review`: the override path is unavailable; skip directly to Phase 2 Case A as documented in step 12. The summary row for the cycle still renders, and the final `Review assessment` should note "V=0 under native review — override disabled, see step 12" so the user understands why no cycle N+1 offer appeared.
- `no_fix_cycle: true` entry is internally inconsistent: corruption is defined by same-entry contradiction, not by comparison with earlier cycles. A valid applied-then-V=0-retry sequence (`cycle 1: applied_fixes non-empty` → `cycle 2: no_fix_cycle=true, applied_fixes=[]` → `cycle 3: uses cycle-2 marker to exempt preflight`) must be honored; cycle 2 having a no-fix marker while cycle 1 had fixes is NORMAL. Treat the marker as corrupted ONLY when the same entry that carries `no_fix_cycle: true` also has non-empty `applied_fixes[]`, `user_declined[]`, or `skipped_for_scope[]`. In that (truly contradictory) case, print `⚠️ Inconsistent no_fix_cycle marker on cycle N-1 (marker true but the same entry has applied/declined/skipped entries). Running full preflight.` and run the full preflight ignoring the marker. This is defense-in-depth against a corrupted state writer; normal applied-then-V=0 flow is untouched.
- Conversation context is lost mid-run (e.g. compaction, tab close, long idle): the skill's state (`cycle_history`, `rejected_ledger`, `review_target`, `dod`) lives only in the active conversation. If context is truncated or the session resets, the in-flight run CANNOT be resumed automatically. Recovery steps: (1) if any cycle commits exist on `branch`/`base-ref` scope, the user may squash them manually with `git reset --soft <pre_cycle_1_head>`, using the `<pre_cycle_1_head>` SHA found in `git reflog`; (2) if applied fixes sit uncommitted on `working-tree` scope, they stay in place and the user commits normally; (3) restart the skill from Phase 0 on the current state. The new run does NOT know about prior cycles' `rejected_ledger`, so codex may re-raise findings that the earlier run rejected as noise. State persistence across session breaks is deferred to a separate plan; this bullet documents the current fallback.
- User wants to cancel the skill mid-cycle: at any `AskUserQuestion` prompt, the user can type a message indicating cancellation (e.g. "stop", "cancel", "abort"); Claude treats this as an early termination request. The current cycle's state is preserved as-is (no auto-rollback of applied fixes; no auto-commit). Claude prints a short summary: "Skill cancelled at cycle N step M. Applied fixes in this session: <list>. Remaining state: <working-tree dirty | N cycle commits on <branch>>. Manual cleanup may be needed depending on your preferences (git stash, git reset, amend, etc.)." The skill does NOT attempt any destructive cleanup on behalf of the user. Between-prompts cancellation (user Ctrl-C or tab close without an active prompt) falls under the "Conversation context is lost mid-run" bullet.
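Two of the failure modes above reduce to small, mechanical rules: the same-entry consistency check for `no_fix_cycle`, and the retry-once policy for JSON parse failures. The sketch below illustrates both. It assumes each `cycle_history` entry is a plain dict keyed by the field names used above, and that the review is invoked through a hypothetical `run_review` callable returning raw stdout; neither detail is confirmed by the skill text.

```python
import json
from typing import Any, Callable, Dict

# Field names taken from the failure-mode bullet; the dict shape is assumed.
CONTRADICTING_FIELDS = ("applied_fixes", "user_declined", "skipped_for_scope")


def marker_is_corrupted(entry: Dict[str, Any]) -> bool:
    """True only when ONE entry has the marker AND non-empty fix/decline/skip lists.

    Earlier cycles are deliberately never consulted: a no-fix cycle that
    follows a cycle with applied fixes is the normal retry flow.
    """
    if not entry.get("no_fix_cycle"):
        return False
    return any(entry.get(field) for field in CONTRADICTING_FIELDS)


def marker_exempts_preflight(previous_entry: Dict[str, Any]) -> bool:
    """Honor the marker unless it is contradicted by its own entry.

    When this returns False for a corrupted marker, the caller is expected
    to print the inconsistency warning and run the full preflight.
    """
    if not previous_entry.get("no_fix_cycle"):
        return False
    return not marker_is_corrupted(previous_entry)


class ReviewAborted(Exception):
    """Raised after the second parse failure; carries codex's raw stdout."""

    def __init__(self, raw_stdout: str) -> None:
        super().__init__("codex output was not valid JSON after one retry")
        self.raw_stdout = raw_stdout


def parse_review_json(run_review: Callable[[], str]) -> Any:
    """Invoke the review, retrying exactly once on a JSON parse failure."""
    raw = ""
    for _attempt in range(2):  # first try plus one retry
        raw = run_review()
        try:
            return json.loads(raw)
        except json.JSONDecodeError:
            continue
    # Surface the last raw stdout verbatim instead of a partial parse.
    raise ReviewAborted(raw)
```

Keeping the corruption check local to a single entry is what lets a cycle-1-fixes, cycle-2-no-fix history pass untouched while still catching a state writer that sets the marker alongside populated lists.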
References
- `references/focus-text.md`: target-kind detection and the canonical code/plan focus text.
- `references/validity-checklist.md`: full details of the six validity items.
- `references/summary-samples.ja.md`: sample summary table, stop-signal footer, and termination messages for rendering in Japanese.
- `skills/review-scope-guard/SKILL.md`: scope triage skill invoked at step 10a (DoD collection, 4-category triage, rejected ledger, stop signals).