Skill

gh-adversarial-review

Use before tagging a release or merging important work, when the user wants a thorough, multi-agent, adversarial review that won't rubber-stamp the code. Triggers on "adversarial review", "pre-release review", "review before tagging", "multi-agent review", "red-team this", "tear this apart", "find every bug before we ship", "verify findings", "review with multiple reviewers", "is this really ready to release". Runs N reviewers across distinct dimensions, then adversarially verifies each finding before acting — fixing real issues with regression tests, recording declined findings with reasons. Run it as the gate before `v1.0.0`.

Invocation

How this skill is triggered — by the user, by Claude, or both

Slash command

/gh-toolkit:gh-adversarial-review

User invocable

Model invocable

Inline context

Default effort

Context Preview

The summary Claude sees in its skill listing — used to decide when to auto-load this skill

A pre-release gate that finds real problems and refuses to act on fake ones. The defining move:

SKILL.md

83 lines · ~956 tokens

Stats

LanguageShell

Stars0

MaintenanceExcellent

Last CommitJun 18, 2026

Actions

View Source View Plugin View on GitHub View README

Stats

Actions

gh-adversarial-review

A pre-release gate that finds real problems and refuses to act on fake ones. The defining move: every finding is adversarially verified before it's fixed, and every fix gets a regression test. No finding is trusted just because a reviewer is confident.

When to run

Before tagging a release (the v1.0.0 gate from gh-repo-mission), before merging a large branch, or whenever "looks fine" isn't good enough.

Procedure

1. Fan out: N reviewers × dimensions

Spawn several independent reviewers, each owning a distinct dimension so coverage doesn't collapse into one lens. Typical dimensions:

Correctness — logic bugs, wrong outputs, broken edge cases.
Contracts/interfaces — frontmatter/schema/CLI contracts honored; inputs validated.
Security & secrets — no committed secrets, injection, unsafe shell/eval, leaked tokens.
Idempotency & failure modes — re-runs safe? partial failure handled? exit codes correct?
Docs ↔ behavior — does the README/skill description match what the code actually does?
Portability — paths, shells, OS assumptions, missing tools.

Each reviewer returns findings as {title, location, severity, why-it's-a-bug, repro}.

2. Adversarially verify each finding

For every finding, run an independent check whose default is to refute it. A finding survives only if it reproduces. Concretely:

Reproduce the claim (run the command, trace the path, write the failing case).
Ask "what would make this NOT a bug?" and test that.
Drop findings that don't reproduce or that misread the code. A confident reviewer is not evidence; a reproduction is.

For high-stakes findings, use multiple verifiers with different lenses (does-it-reproduce, security-impact, is-it-intended) and require a majority to confirm.

3. Fix real ones — with a regression test

For each confirmed finding:

Write a test that fails against the current code (captures the bug).
Fix the code until that test passes.
Keep the test — it's the regression guard so the bug can't silently return.

4. Record declines

For every finding you do not fix (didn't reproduce, intended behavior, out of scope, accepted risk), write a one-line decline with the reason. Declines are part of the record — they show the finding was considered, not missed.

Output

A short report:

Confirmed & fixed — finding → fix → the regression test that now guards it.
Declined — finding → reason.
Verdict — READY TO TAG only if no confirmed-unfixed findings remain; otherwise BLOCKED with the blockers listed.

Scaling the effort

Quick check: a few reviewers, single-pass verification.
"Be thorough" / pre-v1.0.0: more reviewers, 3+ independent verifiers per finding, and a final completeness critic asking "what dimension didn't we review, what claim went unverified?".

Integration

Invoked by the adversarial review clause that gh-repo-mission bakes into every mission.
Runs immediately before the release tag; gh-publish only proceeds once the verdict is READY TO TAG.

gh-adversarial-review

Invocation

Context Preview

SKILL.md

gh-adversarial-review

Invocation

Context Preview

SKILL.md

gh-adversarial-review

When to run

Procedure

1. Fan out: N reviewers × dimensions

2. Adversarially verify each finding

3. Fix real ones — with a regression test

4. Record declines

Output

Scaling the effort

Integration

Similar Skills

gh-adversarial-review

When to run

Procedure

1. Fan out: N reviewers × dimensions

2. Adversarially verify each finding

3. Fix real ones — with a regression test

4. Record declines

Output

Scaling the effort

Integration

Similar Skills