From research-workflow
Use when you have a result you are about to trust — to red-team it against confirmation bias and the stable-but-wrong failure mode (numerical artifact, latent bug, boundary effect, or a mundane alternative explanation that fits the same data). Produces the strongest attacks plus the cheapest discriminating test for each. Don't use for reviewing CODE (→ scientific-code-reviewer and the Review cluster), the neutral close-out format (→ verification-gate), reporting the result's uncertainty budget (→ uncertainty-reporting-gate), or explaining a convergence/refinement-floor behavior (→ numerical-method-validation).
How this skill is triggered — by the user, by Claude, or both
Slash command
/research-workflow:adversarial-result-checkThe summary Claude sees in its skill listing — used to decide when to auto-load this skill
Posture: skeptical-by-default. The goal is to **kill the result**, not to confirm it. A result you wanted to be true, that didn't crash, and that "looks reasonable" is exactly the dangerous case — convergence to a clean wrong answer is silent. Stop and attack before you believe it, cite it, or build on it.
Posture: skeptical-by-default. The goal is to kill the result, not to confirm it. A result you wanted to be true, that didn't crash, and that "looks reasonable" is exactly the dangerous case — convergence to a clean wrong answer is silent. Stop and attack before you believe it, cite it, or build on it.
Hard: actually running the cheapest discriminating test before trusting a result — an unrun test is an open hole, not a pass. Adaptable: which attack lanes you use and in what order.
jax.grad — disagreement beyond ~1e-5 (float64) means a wrong gradient; (2) jnp.isnan/isinf on the gradient even where the forward value is finite (a sqrt/log/pow at the boundary back-propagates nan); (3) a silent zero or blocked gradient from stop_gradient, argmax/argsort (zero a.e.), a where() whose dead branch is still differentiated through a singular point, or clip/floor saturation pinning the output. Full protocol → gradient-validation.For each surviving attack, give: the failure it posits → the single cheapest discriminating test → whether you actually ran it (ran / not run). For any ran claim, attach the command and its output — a "ran" with nothing pasted is a not run. Rank by (plausibility × damage-if-true) / cost-to-test; do the top one or two now. Never report "survived" for a test you only described — an unrun discriminator is an open hole, not a pass.
verification-gate — sharpens its "what is not yet proven" into active attacks.discriminating-experiment-design — design the discriminator before the result exists; this skill attacks one already in hand.gradient-validation — the full protocol for the gradient/autodiff attack lane (finite-difference grad-check, nan/zero/blocked-gradient traps).uncertainty-reporting-gate — quantify the uncertainty once the result survives attack.reference-parity-audit — one strong way to attack a claimed result.npx claudepluginhub drannarosen/research-workflow --plugin research-workflowGuides creation, editing, and verification of skills for AI coding agents using test-driven development with subagent scenarios. Use when authoring or debugging skills.