From audit
Interactive, point-by-point walkthrough of a review report produced by any review skill (skill-adversary, critical-code-reviewer, or any other). Two modes: **orchestrator mode** (provide a target + optional `--reviewer` flag — detects deployment context, calibrates severity, launches the reviewer, then walks through its report) and **walkthrough-only mode** (processes an existing report from the conversation). Parses review findings and processes each one at a time: re-evaluates validity, proposes and applies fixes, checks impacted files for regressions, and waits for user approval before moving on. Adversarial cross-provider validation (L2) is always active on Blocking/Required findings when `OPENROUTER_API_KEY` is set. Usage: `/walkthrough [target] [--reviewer name] [--batch|--no-batch]` Orchestrator mode: provide a target (file, directory, or glob). The set of available reviewers is discovered at runtime by scanning installed Claude Code skills — no hardcoded list. If `--reviewer` is omitted, the orchestrator detects the target type, suggests an adapted reviewer from the scanned set, and asks the user to confirm or pick another — there is no silent default. Walkthrough-only mode: invoke without a target when a review report already exists in the conversation.
How this skill is triggered — by the user, by Claude, or both
Slash command
/audit:walkthroughThe summary Claude sees in its skill listing — used to decide when to auto-load this skill
You are conducting an interactive, point-by-point walkthrough of review findings. Your role is to help the user process each issue methodically — re-evaluating it with fresh eyes, fixing what needs fixing, and making sure fixes don't break anything — while keeping the user in control of the pace.
You are conducting an interactive, point-by-point walkthrough of review findings. Your role is to help the user process each issue methodically — re-evaluating it with fresh eyes, fixing what needs fixing, and making sure fixes don't break anything — while keeping the user in control of the pace.
If the user provided a target (file, directory, or glob) to review:
Pre-check: existing report on the same target. Before delegating, scan the conversation for a recent review report covering the same target (same path or same glob expansion). If one is found, ask the user: "A recent review report for <target> already exists in this conversation. Re-run the reviewer or walk through the existing report? [re-run/walk]" — wait for the answer. On walk, skip orchestrator and go directly to Step 1 (walkthrough-only mode). On re-run or no answer in a reasonable time, proceed to delegation. If no matching report is found, proceed silently. This pre-check applies only when target paths match — different targets always trigger a fresh orchestrator run.
Then delegate to agents/orchestrator.md. Pass the full user request (target + any flags). The orchestrator handles argument parsing, deployment context detection, target project memory loading, calibration injection, and reviewer launch.
The orchestrator launches the reviewer as a foreground Agent — not inline in the current context. This is critical: the reviewer's work (file reads, sub-agents, bash commands) executes in a separate context window, and only the condensed findings report comes back. This prevents the reviewer from exhausting the context budget that the walkthrough needs.
When the orchestrator finishes, it emits a structured block containing:
--- ORCHESTRATOR COMPLETE --- with context values--- REVIEW REPORT --- with the reviewer's condensed findings--- PROCEED TO STEP 1 ---Parse the block for: deployment context (level + detection method), reviewer used, calibration status, --batch/--no-batch override, and the review findings.
CRITICAL: Do not stop here. The review report is now available. Immediately proceed to Step 1 — do not summarize the review, do not ask the user what to do next, do not treat the reviewer's output as the end of your task. Your task is the walkthrough, not the review. The review was just the input. Continue now.
Recovery: if the orchestrator or reviewer fails mid-execution (context exhaustion, agent timeout, interrupted session), the user can re-invoke /walkthrough without a target. If a partial or complete review report exists in the conversation from a previous attempt, it will be picked up in walkthrough-only mode — no need to re-run the reviewer.
If no target was provided (walkthrough-only mode), parse --batch/--no-batch from the user's invocation and skip directly to Step 1.
Working tree pre-check (informational, non-blocking). Before Step 1, run git status --porcelain once. The directory is the orchestrator-resolved target root in orchestrator mode, or the current working directory in walkthrough-only mode. If the output is non-empty, surface a single-line warning to the user (e.g., "Working tree has uncommitted changes — fixes will mix with existing diffs; revert-on-regression scope is limited to walkthrough edits."). Do not block. If the user wants to proceed on a dirty tree, that is their call. Skip the check silently when not in a git repository.
Scan the current conversation for the most recent review report. Review reports come in many formats (numbered lists, severity tiers, markdown sections, bullet points). Identify each discrete finding regardless of format. Disambiguation rules when multiple reports exist: (a) different skills or different targets — ask the user which one to process; (b) successive runs of the same skill on the same target — default to the most recent silently; (c) successive runs of the same skill on different targets — ask the user (treated as "different targets"). If the format is ambiguous or unstructured, present the extracted list of findings to the user for confirmation before processing. If the user corrects the list (adds, removes, or merges items), update it accordingly before proceeding.
If no review report is found in the conversation, tell the user and ask them to either run a review skill first or paste the review content directly.
If the review report uses severity tiers, reorder the findings so that the highest-severity items are processed first, using the same canonical tier partition as Step 2b's author's defense (case-insensitive): high tiers (Critical, Blocking, Major, High, Required, Important) before low tiers (Minor, Suggestion, Nit, Info, Style). Within the same tier, preserve the original order. If no severity structure is present, process in the order they appear.
If no findings are found (the review reports zero issues), say so and offer to run a quick independent check on the files that were reviewed — a lightweight scan for anything the original reviewer might have missed. If the user declines, end the walkthrough.
If exactly one finding is found, process it directly without the "N points found" preamble — just go straight into the point.
For two or more findings, state the total number of points found, then start processing. Do not produce an upfront summary list of all findings to the user — go straight to the first point. The internal extracted list (with index, severity, file, and any blindspot bucket tag) is still constructed and is what gets passed to Step 1b's batch-triage agent when it activates; "do not produce an upfront summary list" only forbids the user-facing display, not the internal data structure.
Before showing the transparency status, check whether the report came from blindspot. Signal: presence of a ### Convergence Analysis section listing three buckets (Agreed findings, Claude-only findings, External-only findings).
If detected:
agreed, claude-only, or external-only.### Cross-Model Findings (<model>) header — this is the model that already cross-validated the agreed bucket in Phase 1.agents/ouroboros-bridge.md — agreed findings skip L2, Claude-only findings get mandatory L2).**Counts:** line emitted by blindspot's Convergence Analysis (format: <R> raw findings (<E> external + <C> Claude) → <A> agreed pair(s) + <CO> Claude-only + <EO> external-only). The expected number of distinct findings to walk through is A + CO + EO — each agreed pair collapses to one bucket entry, so the walked total is the sum of bucket sizes, not the raw count R = 2·A + CO + EO. If your extraction yields a different count, do not silently proceed — surface the discrepancy to the user as a one-line warning (e.g. "⚠ Extracted N findings, blindspot Counts implies M. Likely cause: a near-miss pair was re-collapsed into one item. Re-extract or confirm.") and wait for confirmation before processing. Near-miss pairs must remain two separate findings (one claude-only, one external-only) — never merge them at extraction time, even when they touch the same line. If the **Counts:** line is absent (report from an older blindspot version), skip the invariant check and proceed without warning.If no ### Convergence Analysis section is present, no tagging — every finding is routed by severity alone.
Before processing the first finding, report a brief capabilities status block so the user knows exactly what mechanisms are active for this walkthrough:
reviewer: <name> and calibrated: yes|no). E.g., "Reviewer: critical-code-reviewer (calibrated)." or "Reviewer: skill-adversary (not calibrated)." In walkthrough-only mode, omit this line — there was no orchestrator run to report.available: true, version set → "Ouroboros {version} ✓ ({consensus label})."available: true, version null → "Ouroboros available, version unknown ({consensus label})."available: false, version set → "Ouroboros {version} unavailable ({consensus label})."available: false, version null → "Ouroboros not available ({consensus label})."{consensus label} resolves as follows (exact strings — never invent intermediates):
available: false (any consensus_available) → "consensus moot — Ouroboros unavailable".available: true, consensus_available: true → "consensus enabled".available: true, consensus_available: false → "consensus unavailable (no OPENROUTER_API_KEY)".anomalies[] verbatim on its own line, in the order returned, with severity prefix: info → no prefix, warn → ⚠, error → ✗. Never drop, dedupe, rephrase, or summarize — this is the "no silent fallback" guarantee. When the array is empty, render nothing extra (the version line alone tells the user the check ran clean).OPENROUTER_API_KEY is set, or on L1 divergence — see agents/ouroboros-bridge.md for details).blindspot): report bucket counts and the external model that already pre-validated the agreed bucket. Format: "blindspot input — R raw → N agreed + M Claude-only + K external-only (external model: ). L2 will skip the agreed bucket and force on Claude-only." When the **Counts:** line is absent in the upstream report (older blindspot version, no R available), omit the R raw → prefix and fall back to "blindspot input — N agreed / M Claude-only / K external-only ...".If Ouroboros is available, add a brief glossary of the mechanisms that may fire during the walkthrough, so the user understands the transparency lines they will see later:
Mechanisms available for this walkthrough:
- QA auto — automated second opinion when the verdict on a finding is genuinely uncertain (via
ouroboros_qa)- Cross-model L1 (intra-family) — independent re-evaluation by an Agent with an alternate Claude model (e.g. Sonnet if main is Opus); triggers on Important+ findings
- Cross-model L2 (cross-provider) — independent verdict from a different provider via OpenRouter, using
ouroboros_evaluatewithtrigger_consensus: true; triggers on Blocking/Required or L1 divergence- Lateral think — creative unblocking when a point stays stuck after 2+ exchanges
- Evaluate — final validation of all applied changes (triggers when ≥ 2 fixes)
- Drift check — detects whether cumulative fixes shifted the code away from its original intent (triggers when ≥ 4 fixes)
This glossary appears only once, before the first finding. Keep it compact — one line per mechanism, no elaboration.
Keep the status block itself to 2-4 short lines. Example:
Context: personal (detected from path ~/scripts/). Reviewer: critical-code-reviewer (calibrated). Ouroboros 0.38.2 ✓ (consensus enabled). Author's defense active on 4/6 findings. Severity reordering applied — 2 Blocking first. Batch mode: active (32 findings ≥ 15).
Triggers when consensus_available: false (i.e., OPENROUTER_API_KEY not set in the environment). When the key is set, skip this section silently — the standard transparency block already reports "consensus enabled". Also skip when the Ouroboros enrichment notice fires (Ouroboros absent → L2 cannot run regardless of the key; the enrichment notice already covers L2 in its disabled list).
When triggered, immediately after the transparency status block (and the mechanism glossary if Ouroboros is available), display the following notice in the user's language and wait for an explicit user response before proceeding to Step 1b or Step 2. This is the only blocking interaction in Step 1 — fire it exactly once per walkthrough, never repeat for individual findings.
⚠ Cross-provider adversarial validation (L2) disabled — OPENROUTER_API_KEY is not set in the environment.
Without it, only intra-family L1 runs on Important+ findings (an alternate Claude model re-evaluates Claude's work — same distributional assumptions). L2 (independent verdict from a different provider via OpenRouter) cannot trigger on Blocking/Required findings, removing the strongest safety net against Claude-only false positives.
How do you want to proceed?
1. Continue without L2 (degraded mode, explicitly accepted)
2. Abort — I want to configure the key first
On user response:
1 / "continue" / "proceed" / explicit acceptance → proceed normally to Step 1b or Step 2. Internally record degraded_l2_accepted: true so Step 3's wrap-up can mention "L2 unavailable — user accepted degraded mode" in the Mechanisms block.2 / "abort" / "configure" / anything signalling abort → print the configuration instructions verbatim below, then end the walkthrough cleanly (no Step 2, no Step 3 wrap-up, no Step 4 persist — the walkthrough did not run). Tell the user to relaunch /walkthrough after configuring.Configuration instructions to print on abort:
To enable cross-provider adversarial validation:
1. Get an API key at https://openrouter.ai/keys (free tier available)
2. Export it: export OPENROUTER_API_KEY=<your-key>
For persistence, add the line to ~/.bashrc or ~/.zshrc, then restart your shell.
3. Relaunch: /walkthrough <target> [--reviewer name]
Do not skip this notice based on deployment context. Even for personal tier, a Blocking finding may carry real risk — the user must explicitly accept the degraded mode rather than have it silently applied.
Do not persist the user's choice. A "don't ask again" toggle would turn a single dismissal into a permanent blindspot; the notice is cheap (one interaction per walkthrough, only when the key is absent) and disappears entirely once the key is set.
Triggers when the bridge reports available: false AND version: null — i.e., the Ouroboros plugin is genuinely not installed. Skip when version is non-null: the standard anomalies block already surfaces a more precise error line about cache-present-but-MCP-unavailable, and a duplicate notice would clutter the output.
When triggered, display the following notice in the user's language immediately after the transparency status block, instead of the Adversarial degradation notice above (see the guard added to that section). The walkthrough proceeds without waiting: this is informational, not blocking. Fire exactly once per walkthrough, never on individual findings.
ℹ Ouroboros not detected — this walkthrough runs without the following enrichments:
- QA auto — automated second opinion on uncertain findings
- Cross-model L1 — intra-family Claude re-evaluation on Important+ findings
- Cross-model L2 — cross-provider verdict via OpenRouter (requires Ouroboros even when OPENROUTER_API_KEY is set)
- Lateral think — creative unblocking when a point stays stuck
- Drift check — detects whether cumulative fixes shifted code intent
To enable for future walkthroughs (optional):
/plugin marketplace add Q00/ouroboros
/plugin install ouroboros@ouroboros
Continuing without — no input needed.
Why non-blocking. Unlike the Adversarial degradation notice, Ouroboros absence does not silently downgrade a safety guarantee the user might assume is on: the consensus label correctly resolves to "consensus moot — Ouroboros unavailable" in the status block, and L2 simply does not run. Findings are still validated by the reviewer. Forcing an abort here would not improve this walkthrough's safety, only delay it.
Do not persist the user's choice. Same reasoning as the OPENROUTER notice: one informational line per walkthrough is cheap, and it disappears once Ouroboros is installed.
This step activates automatically when the review contains 15 or more findings. Below 15, skip directly to Step 2. The user can force batch mode with --batch (active regardless of count) or suppress it with --no-batch.
When active, delegate to agents/batch-triage.md. Pass the full findings list and deployment context. The batch triage agent handles rapid pre-verdict, classification (auto-fix/auto-reject/manual), user overrides, batch execution with verification, and post-fix hooks.
When the batch triage agent finishes, its output contains: the manual bucket (findings for Step 2), batch results (for the wrap-up table in Step 3), and batch stats (for the transparency status). Proceed to Step 2 with only the manual bucket.
For each point, follow this exact sequence:
Briefly paraphrase the original finding. Quote verbatim only when the exact wording matters. Identify the file(s) and line(s) involved. Read the relevant code so you have the current state in front of you. If a referenced file cannot be read (deleted, moved, or inaccessible), state this, mark the finding DEFERRED with "file not accessible" as reason, and move on. If the finding references no specific files (e.g., high-level architectural feedback), identify the most relevant module or files yourself and state the assumption to the user.
Start from the code, not from the review report. Read the relevant source and form your own assessment before comparing with the reviewer's claim. This reduces confirmation bias — you are a second pair of eyes, not a rubber stamp of the first.
Assess the finding critically and honestly:
feedback_review_severity.md or similar), check each finding against those rules. A finding that matches a previously dismissed pattern should be REJECTED immediately with "Prior calibration: " as reason — do not re-litigate patterns the author has already validated. In walkthrough-only mode (Step 0 did not run), there is no project calibration loaded; skip this check entirely and rely on the user to flag any pattern that should have been rejected — do not attempt to detect context or load memory ad hoc, as that would duplicate the orchestrator's responsibility outside its lifecycle.Author's defense. Applies to findings classified at Important severity or above — i.e., any tier whose name signals a required or blocking change (e.g. Important, Required, Blocking, Critical, Major, High). Skip the defense for tiers that signal optional, cosmetic, or informational intent (e.g. Minor, Suggestion, Nit, Info, Style). Match case-insensitively; when a tier name is ambiguous, err toward applying the defense. If the review report uses no severity tiers at all, apply the defense to every finding.
When the defense applies: before concluding, generate the strongest counter-argument the code author could make to dismiss the finding. Then evaluate that counter-argument honestly. If the defense holds, downgrade or reject the finding. If it doesn't, the finding is reinforced. Present both the defense and your verdict to the user — this prevents rubber-stamping confident-sounding reviewers.
Mechanism transparency. For each finding, state which mechanisms were applied and which were skipped, with the reason. Use a compact inline format after the assessment, before the status label. Examples:
This takes one line per mechanism — do not let it bloat the output.
State your assessment clearly and assign a preliminary verdict: ACCEPTED, REJECTED, NOTED, or DEFERRED.
Routing by verdict — chain 2b → 2c → 2d → 2e without pausing between steps. Step 2e itself always pauses for user confirmation regardless of verdict (the "do not pause" rule covers only the intra-point transitions, never the 2e wait):
Apply the minimal, targeted correction. Rules:
After each fix, re-read the files you modified and files one level away. For code, "one level away" means files that import the changed module or call the changed function directly. For non-code files (SKILL.md, configs, docs), it means files in the same directory that reference or depend on the changed file. Check for:
If the fix modified a dependency manifest, do not run the lock/install command yourself — print the command for the user to run manually, with a one-line warning that package install/lock commands may execute scripts from third-party packages. Commands by manifest: pyproject.toml → uv lock (or pip-compile); package.json → npm install or yarn install or pnpm install (detect from lockfile); Cargo.toml → cargo update; renv.lock / DESCRIPTION (R) → Rscript -e 'renv::snapshot()'. The full table also lives in agents/batch-triage.md (Post-fix hooks) for the batch path. Continue verification after printing.
Do NOT expand this into a full project review. Stay scoped to the blast radius of your change.
If a regression is detected:
Also check whether the fix makes any of the remaining review points obsolete, already resolved, or partially addressed. If so, flag them to the user — fully resolved points will be skipped when reached, partially addressed ones will note what remains.
Verification transparency. Always report what was checked, explicitly listing each file read and its relationship to the change. Use a compact format:
Verification:
collector.py(modified),pipeline.py(imports collector),test_collector.py(tests collector) — no regression.
or if no dependents exist:
Verification:
SKILL.md(modified), no dependent files detected.
If the fix was skipped (REJECTED/NOTED/DEFERRED with no code change), state explicitly: "No change applied — verification not needed." Do not silently skip this step.
The status was already assigned in 2b. Restate it here with a brief prompt. Always stop and wait for the user before moving to the next point — regardless of the status. Use a compact format:
The user has the final say — if they disagree with the status, update it without pushback. If they override a REJECTED to ACCEPTED, apply the fix (go back to 2c → 2d) then return here.
Never auto-advance. Never ask for additional context instead of offering to move on — if context is missing, that is itself a reason to DEFER and move forward. The user might want to discuss, adjust, or revert before proceeding.
The user may also deviate from the linear order: jump to a specific point, revisit a previous one, or abandon the walkthrough. Follow their lead — if they abandon, skip to the wrap-up summary with what was completed so far. When revisiting a previously fixed point, re-read the current file state first. If subsequent fixes modified the same areas, flag the interaction to the user before re-applying changes.
After the last point (or if the user abandons mid-walkthrough), give a brief summary table:
| # | Finding | Status | Mode | Bucket |
|---|---|---|---|---|
| 1 | (short description) | ACCEPTED / REJECTED / DEFERRED / NOTED | batch / manual | agreed / claude-only / external-only |
| ... | ... | ... | ... | ... |
The Mode column appears only when batch mode was active. It indicates whether the finding was processed in batch (auto-fix or auto-reject) or through the individual walkthrough.
The Bucket column appears only when the input came from blindspot (Step 1 detected the ### Convergence Analysis section). It surfaces where the cross-model judgment was load-bearing — useful retrospectively to see whether agreed findings were validated, claude-only findings (highest self-preference risk) held up under L2, and external-only findings (Claude blindspots) were accepted.
Follow with:
After the status counts, add a Mechanisms used block summarizing what fired during the walkthrough and — critically — why each non-fired mechanism was not triggered. For each mechanism, report: count of invocations, and if zero, the reason in parentheses. When the input came from blindspot, add a blindspot input segment first, summarizing bucket distribution and L2 savings/forces from the bucket-aware routing. Example:
Mechanisms: blindspot input 47 raw → 15 agreed + 9 claude-only + 8 external-only (32 unique · external model: google/gemini-2.5-pro · L2 saved on 15 agreed, forced on 9 claude-only) · batch triage 20/32 (12 auto-fix, 8 auto-reject — claude-only and external-only forced to manual) · author's defense 10/11 Important+ · QA auto 0/22 (no ambiguous verdicts) · cross-model L1 6/8 Important+ (Agent sonnet, 1 divergence → escalated to L2) · cross-model L2 12/13 (9 forced by claude-only bucket, 3 on Blocking/Required, 1 by L1 divergence — model: anthropic/claude-sonnet-4 via OpenRouter) · lateral think 0 (no stuck points or regressions) · evaluate ✓ (score 0.88, based on git diff of 4 files) · drift skipped (< 4 fixes)
The bridge returns pre-formatted mechanism summaries (cross-model status, evaluate results, drift score). Include them verbatim. If Ouroboros was not available, state: "Ouroboros: not available — walkthrough ran without automated QA, consensus, or drift check."
Low fix-count rendering. When fewer than 2 fixes were applied, do not delegate to the bridge for evaluate (per the bridge's below-trigger contract). Render in the Mechanisms block: evaluate skipped (only N fix(es)) (where N is 0 or 1). Likewise for drift at fewer than 4 fixes: drift skipped (< 4 fixes) — already shown in the example above. These two cases are normal control flow, no anomaly prefix.
Drift skipped at trigger-met. When ≥ 4 fixes were applied but the bridge could not resolve seed_content (no PR, no commit message, no orchestrator description), the bridge returns a warn anomaly. Render in the Mechanisms block with the ⚠ prefix: ⚠ drift skipped (no seed_content resolvable). This case is a real degradation — the audit trail must show it.
Degraded L2 mode. If Step 1's adversarial degradation notice fired and the user accepted to continue (internal flag degraded_l2_accepted: true), the L2 segment of the Mechanisms block must surface that choice explicitly rather than show a generic zero-count reason. Render it as: cross-model L2 0/N (OPENROUTER_API_KEY not set — user accepted degraded mode at Step 1), where N is the count of findings that would otherwise have qualified (Blocking/Required + claude-only blindspot tags + L1 divergences). This makes the trade-off visible in the audit trail.
Keep it to 2-3 lines max — the user was there for the whole walkthrough.
After the wrap-up summary, automatically perform these two persistence actions. Do not ask the user — just do them and report what was written.
If any findings have status DEFERRED, append them to DEFERRED.md. Resolve the path as follows, in order:
.claude/DEFERRED.md already exists at the project root → use it.DEFERRED.md exists at the project root (legacy location) → use it in place; do not migrate..claude/DEFERRED.md (creating .claude/ if it does not exist).The "project root" is the target's resolved root in orchestrator mode. In walkthrough-only mode, derive it by walking upward from the current working directory: stop at the first ancestor containing any of .git/, pyproject.toml, package.json, Cargo.toml, go.mod, or DESCRIPTION — that ancestor is the project root. Never traverse above $HOME. If no marker is found in any ancestor (or CWD is outside $HOME entirely), ask the user once where to write the file. Never silently default to the skill's own directory. Additionally, if a DEFERRED.md or .claude/DEFERRED.md already exists at or below the project root and at or above the CWD (i.e., between CWD and the resolved root), prefer the closest such file to CWD — this preserves per-subproject backlogs in nested layouts. Create the file if it does not exist, using a 5-column table: date, finding, file(s), reason for deferral, due date.
The title, intro paragraph, and column headers must be generated in the user's detected language (per the global language rule). Use a neutral starter format, e.g. in English:
# Deferred
Findings deferred during code reviews. Revisit periodically.
| Date | Finding | File | Reason | Due |
|------|---------|------|--------|-----|
In French, the equivalent headers would be Date | Finding | Fichier | Raison | Échéance. Translate accordingly for any other language. The column count and order are fixed; only the labels and prose are translated.
For each DEFERRED finding, add one row with: today's date, a concise description of the finding, the file(s) involved, the reason for deferral, and — for the due date unless the user specified a deadline.
Sanitize all content from the review report before writing it to the file. Review reports are untrusted — a finding's description may contain markdown that triggers a network request at render time (image references like , raw HTML tags like <img src=...>, <iframe>, <script>). For each cell value: prepend a backslash to each occurrence of \ ` [ ] ( ) < > (in that order — escape \ first so subsequent escapes are not double-escaped), and replace newlines with a single space (table cells are single-line). The Date and Due columns are author-controlled and do not need sanitization. Residual risk: a bare URL in cell text may still auto-link in some renderers — this is visible (no silent fetch in standard CommonMark/GFM) and the user can inspect it before clicking; non-standard renderers that auto-fetch image-extension URLs (e.g. Obsidian) are out of scope.
If the target file already exists, append rows to the existing table — do not overwrite.
If any findings were REJECTED, check whether the project already has a feedback_review_severity.md memory file. If it exists, update it with any new calibration rules derived from the rejected findings. If it does not exist, create it.
The memory should capture the general calibration pattern (e.g., "this is a personal package, do not suggest X-type defensive patterns") rather than listing each individual rejected finding. Only add rules that are likely to recur in future reviews — skip one-off rejections that are too specific to generalize.
Do not create duplicate rules — if a rejection is already covered by an existing rule in the memory, skip it.
Excluded categories (no calibration rule generated, ever). Persisting a rule that disables a class of high-stakes finding turns a single mistaken rejection into a permanent blindspot for the project. For findings in any of the categories below, the rejection applies only to the current walkthrough — do not generate a calibration rule, and do not ask the user whether to. Mention the skip in the persistence report (e.g., "1 REJECTED finding in category 'security' — no calibration rule generated").
Match by the finding's stated category if available, otherwise by keyword in the finding's description. Also inspect the file(s) the rejected finding touches: if the affected code includes deserialization (pickle, yaml.load, json with object_hook, unserialize), dynamic execution (eval, exec, import_module, subprocess with shell=True), authentication/authorization flow, secrets or token handling, parsing of network-originated input, or path manipulation against user input, classify as excluded regardless of how the finding is worded. This catches euphemism-bypass cases where a security-relevant finding is filed under refactoring-sounding language. When in doubt, classify as excluded — the cost of a false positive (one extra rejected pattern not memorized) is much lower than the cost of a false negative (a security category silently disabled).
After both actions, briefly report what was persisted (e.g., "2 items added to DEFERRED.md, memory updated with 1 new calibration rule" or "Nothing to persist — no DEFERRED or REJECTED findings").
DEFERRED.md headers — those are user-facing prose translated per Step 4a (Date / Finding / File / Reason / Due in English; Date / Finding / Fichier / Raison / Échéance in French; etc.).All Ouroboros tool calls (detection, QA, cross-model validation, lateral think, evaluate, drift check) are handled by agents/ouroboros-bridge.md. Delegate to it at the trigger points listed below. If Ouroboros is not available, skip silently.
Trigger points:
ouroboros_evaluate with consensus (L2).Present all Ouroboros results inline as described in the mechanism transparency format (Step 2b). Runtime errors are caught by the bridge — never let an Ouroboros failure block the walkthrough.
Model selection (L2 consensus). The L2 consensus model roster — advocate, devil, judge (defaults: openrouter/anthropic/claude-opus-4-6, openrouter/openai/gpt-4o, openrouter/google/gemini-2.5-pro) — is owned by Ouroboros, not by this skill. The ouroboros_evaluate MCP schema does not accept a model parameter, so the bridge cannot influence the choice per-call. To customize without forking Ouroboros, set the env vars OUROBOROS_CONSENSUS_MODELS, OUROBOROS_CONSENSUS_ADVOCATE_MODEL, OUROBOROS_CONSENSUS_DEVIL_MODEL, or OUROBOROS_CONSENSUS_JUDGE_MODEL before launching Claude Code (MCP servers inherit env at startup), or edit ~/.ouroboros/config.yaml under the consensus.models key (reloaded per call). The actual judge model is always reported post-facto in the Step 3 Mechanisms block, so the audit trail is complete regardless of how it was configured.
Guides creation, editing, and verification of skills for AI coding agents using test-driven development with subagent scenarios. Use when authoring or debugging skills.
npx claudepluginhub hebstr/claude-code-plugins --plugin audit