From proving-ground
Use when reviewing code that implements rules from an external authority — tax, accounting, financial, legal or regulatory, pricing, unit-conversion, or date and interest logic — where being correct means matching the real-world rule, not merely passing tests or matching the repo's own spec. Use when asked "is this calculation correct", "review this tax/finance logic", "does this match the regulation", or before shipping any logic where a subtle domain error carries real money, legal, or compliance consequences. Use especially when the code, its comments, its tests, and its spec doc might all encode the same wrong assumption and need checking against authoritative ground truth.
How this skill is triggered — by the user, by Claude, or both
Slash command
/proving-ground:adversarial-domain-reviewThe summary Claude sees in its skill listing — used to decide when to auto-load this skill
Some code is governed by rules that live *outside* the repository — tax law, accounting standards, financial regulations, SI units, interest and day-count conventions, pricing contracts. For this code, "correct" does not mean "clean," "tested," or "matches the spec doc." It means **matches the external authority.** A function can be elegant, fully tested, internally consistent with its own comm...
Some code is governed by rules that live outside the repository — tax law, accounting standards, financial regulations, SI units, interest and day-count conventions, pricing contracts. For this code, "correct" does not mean "clean," "tested," or "matches the spec doc." It means matches the external authority. A function can be elegant, fully tested, internally consistent with its own comments and spec — and still be wrong, because the assumption it encodes is wrong about the real world.
Core principle: Assume a subtle domain bug exists, and try to prove it. Default to wrong until verified against the authority. Internal consistency is not correctness — it usually just means the same wrong assumption was copied into the code, the comments, the tests, and the spec, so nothing inside the repo can flag it.
The failure this prevents: a reviewer grades the code against the repo's RULES.md / comments / PR description, finds they agree, and approves — when the rule those docs state is itself wrong. The most dangerous bug is the one everyone in the repo already agrees on.
Not for: general code-quality or bug review with no external rule to match — use a normal code-review skill. This skill is about domain truth, not style or generic bugs.
These are not evidence of domain correctness — refuse to treat them as such:
RULES.md / the spec / the comment" → the spec can be wrong; that's often the actual bug.When you catch yourself grading the code against another artifact inside the repo, stop. The repo cannot certify itself.
List every real-world rule the code depends on — each rate, threshold, ordering, floor/cap, rounding rule, eligibility condition, and interaction between rules. Write them as falsifiable statements: "franking credits are non-refundable," "the levy is computed before offsets," "income in this band is taxed at 19%."
For each claim, go to the external source of truth: the tax authority's published rule, the standard, the regulation, the contract. Cite it. If you cannot name the authority you checked, you have not verified the claim — you have only confirmed the repo agrees with itself. If you're unsure, look it up; do not defer to the repo's assertion.
Pin the version. Domain rules change — rates, thresholds, effective dates, standard revisions, contract amendments. Verify against the authority for the specific period, jurisdiction, and version that applies to this code. Last year's rate is this year's bug, and the right answer for one jurisdiction is wrong for the next.
When the code's spec/comment states a domain rule, treat that statement as a hypothesis to test against the authority, not as the answer. If the spec is wrong, that is your highest-value finding.
Pick concrete inputs, compute the expected output by hand from the authority, and compare to what the code produces. Spend your examples where bugs hide:
The single example the buggy code's tests never include is usually the one that exposes it. (A refund floor hides until a worked example puts the credit above the tax.)
Single rules are often right in isolation and wrong in combination. Map the order and interaction: does offset A reduce base B before levy C is computed? Does one credit's phase-out affect another's eligibility? Bugs concentrate in the seams between rules that were each tested alone.
State each finding as one of:
Never recommend changing code or comments to agree with a spec you haven't verified — that cements the bug.
# Domain Review: [what] — verdict against authority
## Verdict
CORRECT / INCORRECT against the external authority. One line.
If incorrect: how many domain bugs, and the worst consequence.
## Domain claims checked
| Claim (as the code implements it) | Authority checked (cite it) | Verdict |
For each: ✓ matches authority / ✗ contradicts authority / ❓ couldn't verify.
## Worked examples
| Input | Expected (by hand, per authority) | Code output | Match? |
Include at least one boundary case and one excess/overflow case.
## Findings
Each: is it an implementation bug (code ≠ spec) or a specification bug
(code = spec, spec ≠ reality)? Consequence in real terms (who is over/under-
charged, the legal/compliance exposure). The fix, applied across code + spec
+ tests + comments together when it's a spec bug.
## Rule interactions checked
What combinations you tested and what you found.
| Mistake | Fix |
|---|---|
| Grading the code against the repo's spec/comments/tests | Grade it against the external authority; cite the authority |
| Treating green tests as proof of domain correctness | Tests can be co-wrong; write a worked example for the untested edge |
| "Spec and code agree, so it's correct" | Verify the spec itself; a spec bug is the highest-value finding |
| Recommending the comment/code be aligned to an unverified spec | Don't cement a rule you haven't checked against reality |
| Only testing typical inputs | Bugs hide at boundaries and excess/overflow cases |
| Using a rule from the wrong year or jurisdiction | Pin the effective period and jurisdiction; cite the dated authority |
| Checking rules one at a time | Check interactions — bugs live in the seams |
| "Couldn't find the rule, looks fine" | ❓ is not ✓. An unverified claim is an open risk, name it |
RULES.md / the spec / the comment."Provides UI/UX resources: 50+ styles, color palettes, font pairings, guidelines, charts for web/mobile across React, Next.js, Vue, Svelte, Tailwind, React Native, Flutter. Aids planning, building, reviewing interfaces.
Fetches up-to-date documentation from Context7 for libraries and frameworks like React, Next.js, Prisma. Use for setup questions, API references, and code examples.
npx claudepluginhub aidenhiew/proving-ground --plugin proving-ground