Skill

adversarial-domain-review

Use when reviewing code that implements rules from an external authority — tax, accounting, financial, legal or regulatory, pricing, unit-conversion, or date and interest logic — where being correct means matching the real-world rule, not merely passing tests or matching the repo's own spec. Use when asked "is this calculation correct", "review this tax/finance logic", "does this match the regulation", or before shipping any logic where a subtle domain error carries real money, legal, or compliance consequences. Use especially when the code, its comments, its tests, and its spec doc might all encode the same wrong assumption and need checking against authoritative ground truth.

Invocation

How this skill is triggered — by the user, by Claude, or both

Slash command

/proving-ground:adversarial-domain-review

User invocable

Model invocable

Inline context

Default effort

Context Preview

The summary Claude sees in its skill listing — used to decide when to auto-load this skill

Some code is governed by rules that live *outside* the repository — tax law, accounting standards, financial regulations, SI units, interest and day-count conventions, pricing contracts. For this code, "correct" does not mean "clean," "tested," or "matches the spec doc." It means **matches the external authority.** A function can be elegant, fully tested, internally consistent with its own comm...

SKILL.md

121 lines · ~2k tokens

Stats

Stars0

MaintenanceGood

Last CommitJun 5, 2026

Actions

View Source View Plugin View on GitHub View README

Stats

Actions

Adversarial Domain Review

Overview

Some code is governed by rules that live outside the repository — tax law, accounting standards, financial regulations, SI units, interest and day-count conventions, pricing contracts. For this code, "correct" does not mean "clean," "tested," or "matches the spec doc." It means matches the external authority. A function can be elegant, fully tested, internally consistent with its own comments and spec — and still be wrong, because the assumption it encodes is wrong about the real world.

Core principle: Assume a subtle domain bug exists, and try to prove it. Default to wrong until verified against the authority. Internal consistency is not correctness — it usually just means the same wrong assumption was copied into the code, the comments, the tests, and the spec, so nothing inside the repo can flag it.

The failure this prevents: a reviewer grades the code against the repo's RULES.md / comments / PR description, finds they agree, and approves — when the rule those docs state is itself wrong. The most dangerous bug is the one everyone in the repo already agrees on.

When to Use

Reviewing tax, payroll, accounting, or financial calculations
Legal / regulatory / compliance logic (thresholds, eligibility, disclosures)
Unit conversions, interest, dates, day-counts, currency rounding
Pricing, billing, proration, discount stacking
Any change where a wrong number has money, legal, or safety consequences

Not for: general code-quality or bug review with no external rule to match — use a normal code-review skill. This skill is about domain truth, not style or generic bugs.

The Trap to Refuse

These are not evidence of domain correctness — refuse to treat them as such:

✗ "The tests pass" → tests can encode the same wrong rule (co-wrong tests).
✗ "It matches RULES.md / the spec / the comment" → the spec can be wrong; that's often the actual bug.
✗ "The PR says it was verified" → a claim, not a verification.
✗ "The code is clean and the logic is internally consistent" → consistency ≠ truth.

When you catch yourself grading the code against another artifact inside the repo, stop. The repo cannot certify itself.

The Method

1. Extract the domain claims

List every real-world rule the code depends on — each rate, threshold, ordering, floor/cap, rounding rule, eligibility condition, and interaction between rules. Write them as falsifiable statements: "franking credits are non-refundable," "the levy is computed before offsets," "income in this band is taxed at 19%."

2. Verify each claim against the authority — not the repo

For each claim, go to the external source of truth: the tax authority's published rule, the standard, the regulation, the contract. Cite it. If you cannot name the authority you checked, you have not verified the claim — you have only confirmed the repo agrees with itself. If you're unsure, look it up; do not defer to the repo's assertion.

Pin the version. Domain rules change — rates, thresholds, effective dates, standard revisions, contract amendments. Verify against the authority for the specific period, jurisdiction, and version that applies to this code. Last year's rate is this year's bug, and the right answer for one jurisdiction is wrong for the next.

When the code's spec/comment states a domain rule, treat that statement as a hypothesis to test against the authority, not as the answer. If the spec is wrong, that is your highest-value finding.

3. Build worked examples with known-correct answers

Pick concrete inputs, compute the expected output by hand from the authority, and compare to what the code produces. Spend your examples where bugs hide:

Boundary cases: exactly on a threshold, $1 either side.
Excess / overflow cases: the credit larger than the tax, the discount larger than the price, the value past the cap.
Edge inputs: zero, negative, the lowest and highest bands.

The single example the buggy code's tests never include is usually the one that exposes it. (A refund floor hides until a worked example puts the credit above the tax.)

4. Check rule interactions

Single rules are often right in isolation and wrong in combination. Map the order and interaction: does offset A reduce base B before levy C is computed? Does one credit's phase-out affect another's eligibility? Bugs concentrate in the seams between rules that were each tested alone.

5. Separate "matches spec" from "matches reality"

State each finding as one of:

Code disagrees with the spec (implementation bug), or
Code agrees with the spec, but the spec is wrong (specification bug — fix the rule, the code, the tests, and the comment together).

Never recommend changing code or comments to agree with a spec you haven't verified — that cements the bug.

Output Template

# Domain Review: [what] — verdict against authority

## Verdict
CORRECT / INCORRECT against the external authority. One line.
If incorrect: how many domain bugs, and the worst consequence.

## Domain claims checked
| Claim (as the code implements it) | Authority checked (cite it) | Verdict |
For each: ✓ matches authority / ✗ contradicts authority / ❓ couldn't verify.

## Worked examples
| Input | Expected (by hand, per authority) | Code output | Match? |
Include at least one boundary case and one excess/overflow case.

## Findings
Each: is it an implementation bug (code ≠ spec) or a specification bug
(code = spec, spec ≠ reality)? Consequence in real terms (who is over/under-
charged, the legal/compliance exposure). The fix, applied across code + spec
+ tests + comments together when it's a spec bug.

## Rule interactions checked
What combinations you tested and what you found.

Common Mistakes

Mistake	Fix
Grading the code against the repo's spec/comments/tests	Grade it against the external authority; cite the authority
Treating green tests as proof of domain correctness	Tests can be co-wrong; write a worked example for the untested edge
"Spec and code agree, so it's correct"	Verify the spec itself; a spec bug is the highest-value finding
Recommending the comment/code be aligned to an unverified spec	Don't cement a rule you haven't checked against reality
Only testing typical inputs	Bugs hide at boundaries and excess/overflow cases
Using a rule from the wrong year or jurisdiction	Pin the effective period and jurisdiction; cite the dated authority
Checking rules one at a time	Check interactions — bugs live in the seams
"Couldn't find the rule, looks fine"	❓ is not ✓. An unverified claim is an open risk, name it

Red Flags — stop and verify against the authority

You're about to approve because "it matches RULES.md / the spec / the comment."
You're citing the PR description or a test as evidence the rule is right.
You haven't computed a single worked example by hand from the authority.
Every input you considered keeps the credit/discount/value below the cap or floor.
You're recommending a comment change to match the code, without checking which one reality agrees with.
You can't name the authoritative source you verified against.

adversarial-domain-review

Invocation

Context Preview

SKILL.md

adversarial-domain-review

Invocation

Context Preview

SKILL.md

Adversarial Domain Review

Overview

When to Use

The Trap to Refuse

The Method

1. Extract the domain claims

2. Verify each claim against the authority — not the repo

3. Build worked examples with known-correct answers

4. Check rule interactions

5. Separate "matches spec" from "matches reality"

Output Template

Common Mistakes

Red Flags — stop and verify against the authority

Similar Skills

Adversarial Domain Review

Overview

When to Use

The Trap to Refuse

The Method

1. Extract the domain claims

2. Verify each claim against the authority — not the repo

3. Build worked examples with known-correct answers

4. Check rule interactions

5. Separate "matches spec" from "matches reality"

Output Template

Common Mistakes

Red Flags — stop and verify against the authority

Similar Skills