From Wingman
Run cross-model review, categorize findings, fix issues, and encode learned patterns into project rules
How this skill is triggered — by the user, by Claude, or both
Slash command
/wingman:review-loopThe summary Claude sees in its skill listing — used to decide when to auto-load this skill
Run the full review feedback loop: review code with the configured reviewer
Run the full review feedback loop: review code with the configured reviewer
(WINGMAN_REVIEWER, default codex), present findings for user decision, fix
issues, and encode learned patterns so they don't recur.
Whenever the agent runs git push in a repo that has Wingman installed
(detectable by the presence of the # --- Wingman: Codex review marker
in the repo's pre-push hook), the review runs in the background and
finishes ~60-120 seconds later. The agent MUST:
git push completes
(the hook is non-blocking — don't wait for the review).ScheduleWakeup(delaySeconds: 120))
that re-enters this skill..reviews/*.json for the latest file whose status
is needs_categorization; if present, proceed to Step 1 below and
render the findings table inline in chat, unprompted.This makes the review an ambient part of the push flow, not a manual step the developer has to remember.
The current Wingman hook version is 4. Before gathering findings, read the
installed pre-push hook (the file containing the # --- Wingman: Codex review
marker — check git config core.hooksPath, then .git-hooks/, .githooks/,
.husky/, .git/hooks/) and find its # wingman-hook-version: N stamp.
If N is missing or less than 4, surface a one-line, non-blocking notice
before continuing:
⚠️ Your Wingman hook is v
N(latest is v4). Run/wingman:review-setupto upgrade — it's in-place, keeps exactly one hook block, and preserves your.reviews/data.
Then proceed normally — never block the review on a stale hook. (Updating the Wingman plugin does not update the git hook; this is the reminder to re-run setup.)
Check .reviews/ for files with "status": "needs_categorization".
If uncategorized findings exist, use those. If none exist (or .reviews/ is empty), run a fresh review:
WINGMAN_REVIEWER, default codex) to review the diff between the current branch and main. For codex, use /codex:rescue or codex review; for gemini/claude the pre-push hook constructs the prompt automatically (or invoke the CLI directly with the same instructions)..reviews/<timestamp>-<branch>.json in the standard formatwingman_schema_version: "2"){
"wingman_schema_version": "2",
"branch": "feature-x",
"timestamp": "2026-04-27-104051",
"base": "main", // git base used for the diff (WINGMAN_BASE)
"reviewer": {
"tool": "codex", // reflects WINGMAN_REVIEWER (codex|gemini|claude)
"tool_version": "0.121.0", // reviewer CLI version
"model": "gpt-5.5", // model that did the review
"provider": "openai",
"reasoning_effort": "medium",
"session_id": "019dce11-...",
"wall_seconds": 87 // time codex spent generating the review
},
"raw_review": "...full codex output, kept for forensics...",
"findings": [], // populated by /review-loop categorization
"resolutions": [], // populated when fixes land
"status": "needs_categorization"
}
Older files written by wingman v1 (no wingman_schema_version field) are
upgraded by python3 scripts/migrate-reviews.py from the wingman repo.
Consumer code should branch on the presence of wingman_schema_version
to handle both shapes.
For each uncategorized review file, parse the raw_review field and extract individual findings. Categorize each as:
| Category | Description |
|---|---|
| lint | Code style, formatting, naming |
| logic | Correctness, edge cases, bugs |
| architecture | Structure, coupling, abstractions |
| security | Vulnerabilities, injection, auth |
For EACH finding, Claude must also provide its own independent assessment in a "Claude" column:
Present the full overview table sorted by file path (alphabetical) then line number.
Table formatting rules:
01/16, 02/16, etc.· Header ·Example:
| · # · | · Severity · | · Category · | · File · | · Claude · |
|-------|----------------|--------------|------------------------------------------------|--------------------------------------------------------------------|
| 01/16 | 🔴 CRITICAL | security | `hello_world.py:114` | Agree — eval() is never safe on user input |
| 02/16 | 🟠 HIGH | security | `hello_world.py:22` | Agree — classic SQL injection |
| 03/16 | 🟠 HIGH | security | `hello_world.py:60` | Agree — same pattern as #2 |
| ... | ... | ... | ... | ... |
| 14/16 | 🟡 MEDIUM | architecture | `hello_world.py:34` | Disagree — test file, refactoring adds no value |
| 15/16 | 🔵 LOW | lint | `hello_world.py:81` | Agree — ruff already catches this with F841 |
After the table, show the action bar:
all fix everything · agreed fix only where Claude agrees · one walk through 1-by-1 · defer mark non-actionable with rationale · per-row e.g.
01:fix 02:defer 03:fix· none done
Users may reply with a space-delimited list pairing each finding number
with an action (fix, defer, skip). Examples:
02:fix 03:fix 01:defer — fix two, defer oneall:fix — same as the top-level allall:defer — defer everything (record rationale for each)When the user defers a finding, always prompt once for a one-line
rationale unless they included it inline (e.g. 01:defer="hifi-aligned").
That rationale lands in the review JSON AND in the rules file (Step 4),
so future reviews know not to re-flag the same pattern.
Spawn a background Agent to fix all findings in parallel:
fix(review): <description> with Wingman-finding: securityfix(review): <description> with Wingman-finding: logicfix(review): <description> with Wingman-finding: architecturechore(lint): add rule for <pattern>.claude/rules/review-patterns.md (Step 4)Same as "all" but only fix findings where Claude's assessment is "Agree" (including "Agree, bump severity"). Skip findings where Claude said "Disagree" or "Skip". Run as background agent.
Walk through each finding sequentially. Present each finding using this exact format:
Example:
| · # · | · Severity · | · Category · | · File · | · Claude · |
|-------|----------------|--------------|------------------------------------------------|--------------------------------------------------------------------|
| 01/16 | 🔴 CRITICAL | security | `hello_world.py:114` | Agree — eval() is never safe on user input |
```diff
- result = eval(user_input)
+ try:
+ result = int(user_input)
+ except ValueError:
+ print("Invalid input"); sys.exit(1)
```
> **y** fix · **n** skip · **all** fix remaining · **stop** done
Format rules:
--- between findingsWhen user says "y":
When user says "n":
When user says "all":
When user says "stop":
defer):Don't modify code. For each deferred finding:
resolution: "deferred" and add
rationale: "<user's reason>"..claude/rules/review-patterns.md under
a dedicated ### Known false positives / deferrals subsection within
the appropriate category (Logic / Architecture / Security Rules), so
future reviews from the same reviewer model don't re-flag the pattern.
Include the commit SHA where the deferred state was recorded.Deferrals are a first-class resolution — they are NOT "skip". Skip means "not applicable to this repo"; defer means "legitimate pattern, but the reviewer doesn't have the context to understand why it's correct".
Skip all fixes. Proceed directly to Step 4 to encode patterns for future prevention.
When fixing a lint finding (whether individually or in bulk mode):
pyproject.toml with [tool.ruff] → ruff.eslintrc* or eslint.config.* → eslintbiome.json → biomechore(lint): add rule for <pattern>For non-lint findings that were fixed (not skipped), add new entries to .claude/rules/review-patterns.md under the appropriate section (Logic Rules, Architecture Rules, or Security Rules).
Each pattern entry should be a single clear sentence that Claude Code can follow. Include enough context to apply the rule but keep it concise. Example:
## Logic Rules
- Always check for None before accessing `.metadata` on Lead objects — the field is nullable but commonly assumed present
- QuerySet.update() calls must explicitly set `updated_at=timezone.now()` since it bypasses Model.save()
Cap enforcement: If the file exceeds 30 patterns, remove the oldest entries from each section. Mention which patterns were archived in the summary.
Commit pattern updates separately:
docs(review): add learned patterns from Wingman review
For each processed review file:
findings array with structured entries:
{
"file": "campaigns/services/lead_service.py",
"line": 42,
"category": "logic",
"severity": "medium",
"description": "Missing null check on lead.metadata",
"resolution": "Added guard clause in process_lead()",
"status": "fixed",
"commit": "abc1234"
}
"status": "skipped"."status": "deferred" and include a
"rationale" field with the user's one-line reason. Also include a
"pattern_rule" field pointing at the heading in
.claude/rules/review-patterns.md where the deferral was encoded,
so future review passes can cross-reference.status to "categorized".Print a summary:
Wingman Review Summary
======================
Total findings: N
- Critical: N | High: N | Medium: N | Low: N
Fixed: N
- Security: N | Logic: N | Architecture: N | Lint: N
Deferred: N (with rationale encoded to rules)
Skipped: N
New linter rules added: N
New patterns encoded: N (of 30 max)
Patterns archived: N
Commits created:
- abc1234 fix(review): ...
- def5678 chore(lint): ...
- ghi9012 docs(review): ...
Provides a checklist for code reviews covering functionality, security, performance, maintainability, tests, and quality. Use for pull requests, audits, team standards, and developer training.
npx claudepluginhub ashbrener/wingman --plugin wingman