From gilfoyle
Use when responding to PR review feedback (human or bot) before applying any changes. Treats each finding as a hypothesis to verify, not an instruction to follow. Decides per finding whether the bug claim is real AND whether the proposed fix is right AND whether to accept, modify, or reject. Refuses to apply changes without per-finding verification.
Slash command: /gilfoyle:assessing-review-feedback
A reviewer gave you findings. Maybe a person. Maybe a bot. Maybe ten bots running in parallel. The instinct is to apply them all and move on. Resist it.
Each finding is a hypothesis with two parts: a claim that a bug exists, and a proposed fix for it.
Both can be wrong independently. The reviewer can be right about the bug but propose a fix that masks a deeper issue. The reviewer can propose a fine fix but the bug doesn't actually exist in your code today (vestigial finding from a tool with stale context). The reviewer can be right about both but the fix conflicts with project conventions you can't see from inside their analysis.
Treating "the reviewer said it" as sufficient grounds to apply a change is a regression to the workflow that produced the original bugs. Verification still has to happen, the same way it did in prove-it-prototype and falsifiable-design.
For each finding:
1. Verify the bug claim. Reproduce the alleged problem before accepting it exists.
2. Evaluate the proposed fix on its own merits. Is it the right fix, or a band-aid?
3. Decide: accept, modify, reject. Document why.
4. Apply only after step 3.
No applying without verifying. No "the reviewer said so."
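The four steps above can be sketched as a small gate. This is a minimal illustration, not part of the skill: the `Finding` record, its field names, and `may_apply` are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Decision(Enum):
    ACCEPT = "accept"
    MODIFY = "modify"
    REJECT = "reject"


@dataclass
class Finding:
    """Hypothetical record for one review finding."""
    summary: str
    claim_checked: bool = False          # step 1: was the bug claim verified?
    fix_evaluated: bool = False          # step 2: was the fix judged on its merits?
    decision: Optional[Decision] = None  # step 3: accept, modify, or reject
    rationale: str = ""                  # step 3: the documented "why"


def may_apply(finding: Finding) -> bool:
    """Step 4: a change may be applied only after steps 1-3 are complete."""
    if not (finding.claim_checked and finding.fix_evaluated):
        return False
    if finding.decision is None or not finding.rationale.strip():
        return False
    # A rejected finding is documented but produces no code change.
    return finding.decision in (Decision.ACCEPT, Decision.MODIFY)
```

A finding with no recorded rationale fails the gate even when the decision is Accept, which mirrors the "Document why" requirement in step 3.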
Use this after receiving review feedback on code that has passed gilfoyle/checkpointed-build, and before applying any review-driven changes.
Equally applicable to: human reviewer comments on a PR, bot review output (Claude code-reviewer, Sonar, GitHub Copilot), output from /pr-review-toolkit:review-pr, drive-by suggestions in chat.
Not applicable to: hard CI failures (failing tests, broken builds). Those are facts, not hypotheses.
Each finding falls into a category, such as a bug claim or a style claim.
Different categories get different verification standards. A bug claim that can't be reproduced gets rejected. A style claim with no convention to point at is the reviewer's preference, not yours.
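As an illustration of category-specific standards, here is a minimal sketch. The `bug` and `style` categories come from the examples in this document; the `triage` function, its parameters, and its return labels are hypothetical.

```python
from typing import Optional


def triage(category: str, reproduced: bool = False,
           convention: Optional[str] = None) -> str:
    """Return a next step for a finding (illustrative labels only)."""
    if category == "bug":
        # A bug claim that can't be reproduced gets rejected.
        return "evaluate-fix" if reproduced else "reject"
    if category == "style":
        # Without a project convention to point at, a style claim is
        # the reviewer's preference rather than a defect.
        return "evaluate-fix" if convention else "preference"
    raise ValueError(f"unknown category: {category!r}")
```

Raising on an unknown category is a deliberate choice in this sketch: a finding that has not been categorized has no verification standard to meet yet.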
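Before applying the fix, prove the bug exists: reproduce the alleged problem, or read the code closely enough to show the failing path.
A comment that says "this might fail in production" without naming a reproduction is not a finding. It's a feeling.
Even if the bug is real, the proposed fix may be wrong. Check whether it addresses the root cause or only masks a deeper issue, and whether it conflicts with project conventions.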
If the reviewer's fix is wrong, write a better one. Document the divergence so the reviewer (when human) understands you considered their suggestion and chose differently.
Three outcomes: accept, modify, or reject.
Each decision is documented. One line per finding, kept with the code or PR. Future reviewers (and future you) need to see "we considered finding X and decided Y because Z."
Deferral has a tracker-discipline tax. Any decision that's effectively "real concern, defer to follow-up" — i.e., a Reject (defer) or Modify (deferred work) — must name a tracker ID in the decision log's Note column. The procedure:
1. If an existing issue already covers the deferred work, put its ID in the Note column. The decision becomes "Reject (defer): tracked at rivets-XXX."
2. Otherwise, file a new one: rivets create --title "..." --type task --priority N --description "<one-line context + the finding's key claim>". Put the new ID in the Note column.

A "defer" decision with no tracker reference is a silent drop. Six months from now, no one knows the finding existed, and the next reviewer is likely to find it again, with both of you re-doing the same triage.
This is the same rule as falsifiable-design's tracker discipline, applied to review-time deferrals.
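The tracker rule can be checked mechanically. The sketch below is a hedged illustration: the row keys (`id`, `decision`, `note`) and the function name are hypothetical, and the ID pattern is only an assumption inferred from the "rivets-XXX" placeholder, so adjust it to the tracker's real format.

```python
import re

# Assumption: tracker IDs look like "rivets-" plus letters/digits, inferred
# from the "rivets-XXX" placeholder. The real format may differ.
TRACKER_ID = re.compile(r"\brivets-[A-Za-z0-9]+\b")


def silent_drops(rows):
    """Return the finding numbers of deferrals that name no tracker ID.

    Each row is a dict with hypothetical keys "id", "decision", and "note".
    """
    dropped = []
    for row in rows:
        # Covers both "Reject (defer)" and "Modify (deferred work)".
        is_deferral = "defer" in row["decision"].lower()
        if is_deferral and not TRACKER_ID.search(row["note"]):
            dropped.append(row["id"])
    return dropped
```

Run against a decision log before merging, an empty result means every deferral is traceable; anything else lists the findings that would otherwise disappear.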
Apply standard TDD discipline (gilfoyle/tdd-scoped) to any fix that introduces a behavior change. Pure doc/comment edits don't need TDD but still need to be defensible.
If a fix turns out to be wrong during implementation, that's another iteration of step 3. Don't ship a fix you no longer believe in just because you already started writing it.
The output of this skill is a decision log. Markdown is fine. A section in the PR description is fine. A comment thread is fine. The form matters less than the existence.
Example structure:
## Review-feedback decisions
| # | Finding (one line) | Reviewer | Category | Verified? | Decision | Note |
|---|---|---|---|---|---|---|
| 1 | debug_assert on empty prefix → silent in release | silent-failure-hunter | Bug | Yes (read code; release builds skip the assert) | Modify | Runtime guard returning Ok(None); reviewer's instinct right, fix slightly different |
| 2 | Reuse normalize_path helper | code-simplifier + silent-failure-hunter | Style | Yes (db/files.rs:20-27 has identical semantics) | Accept | Two reviewers agreed; eliminates Linux asymmetry |
| ... |
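If the log is kept as structured data, the table above can be generated rather than hand-edited. This is a minimal sketch assuming each row is a list of seven cell values in the column order shown; `render_decision_log` is a hypothetical helper, not something the plugin provides.

```python
COLUMNS = ["#", "Finding (one line)", "Reviewer", "Category",
           "Verified?", "Decision", "Note"]


def render_decision_log(rows):
    """Render rows (lists of 7 cell values) as a Markdown decision log."""
    lines = [
        "## Review-feedback decisions",
        "| " + " | ".join(COLUMNS) + " |",
        "|" + "---|" * len(COLUMNS),
    ]
    for row in rows:
        if len(row) != len(COLUMNS):
            raise ValueError(f"expected {len(COLUMNS)} cells, got {len(row)}")
        # Escape pipes so a cell's text can't break the table layout.
        cells = [str(cell).replace("|", "\\|") for cell in row]
        lines.append("| " + " | ".join(cells) + " |")
    return "\n".join(lines)
```

Rejecting rows with a missing cell keeps an incomplete assessment from being rendered as if it were finished.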
No fix gets applied until:
Every deferral has a tracker ID in the Note column: either an existing issue whose content covers the deferred work, or a freshly filed one.

If you can't fill out all five, you're not ready to apply. Keep thinking.
This is not "be contrarian for the sake of pushback." Most well-meaning reviewer findings are real and the proposed fixes are reasonable. The skill exists for the cases where they aren't — and to ensure the author thinks per-finding rather than batch-applying.
This is also not a license to ignore feedback. If you reject a finding, the rationale is part of the decision log. A future reviewer can challenge it. The asymmetry "easy to reject silently, harder to verify rejection" is exactly what causes findings to slip through.
The output is a decision log committed to the repo or attached to the PR, plus the actual code changes for accepted or modified findings, each in a separate commit (so the decision log lines up with git history).
If no findings were applied, the decision log alone is the output, and the reasoning is the artifact.
npx claudepluginhub dwalleck/gilfoyle --plugin gilfoyle