From parallel-minds
Use when code has been written and is producing wrong results, when you suspect an implementation is incorrect but can't pinpoint why, when algorithm/logic/integration code needs verification against known-correct references, or when multiple fix attempts have failed to resolve an issue
How this skill is triggered — by the user, by Claude, or both
Slash command
/parallel-minds:implementation-scrutinyThe summary Claude sees in its skill listing — used to decide when to auto-load this skill
Dispatch parallel verification agents to investigate whether an implementation is correct. Each agent attacks the problem from a different angle — hunting reference implementations, writing empirical tests, auditing invariants, comparing against framework source — and every finding lands as a validatable artifact you can re-run.
Dispatch parallel verification agents to investigate whether an implementation is correct. Each agent attacks the problem from a different angle — hunting reference implementations, writing empirical tests, auditing invariants, comparing against framework source — and every finding lands as a validatable artifact you can re-run.
Core principle: every finding needs a validatable artifact. Prose reasoning is not evidence.
A "validatable artifact" is one of two things: a self-contained script the calling agent can execute via Bash, or a URL the calling agent can fetch via WebFetch. The calling agent — you — validates every artifact before accepting it. Scripts get run, URLs get fetched, outputs get compared. Findings that can't survive that gate are demoted to UNVERIFIED.
Two consequences fall out of this:
Use when code produces wrong output and you've already tried fixing it, when multiple confident-sounding fixes have failed, when you need to verify an implementation against a known-correct reference, when the domain needs specialized knowledge (DSP, distributed systems, security, data pipelines, ML), or when you suspect a subtle logic error that reads correct but behaves wrong.
Skip it when code doesn't compile (fix syntax first), when requirements are unclear (brainstorm first), or when the bug is obviously a typo or off-by-one (just fix it).
Write a short problem statement: what the code does, what it should do, what it's actually doing, and what's already been tried. Before asking the user for this, infer it from git diff, recent test failures, and error output, then present the inferred statement for confirmation.
Domain detection (one sentence, no tools): classify the problem as numerical/dsp, web-backend/distributed, frontend, data/ml, database, security, or general. This determines which optional agent roles activate — using DSP roles on a web bug wastes context.
Invariant extraction (one fast agent): read the code under scrutiny and produce a structured list of falsifiable statements pulled from the spec or intent, not from what the code currently does. Examples: "after this function, X should equal Y", "this value should never exceed Z", "these two collections should have the same length", "this endpoint should return 401 when the token is expired". Every verification agent receives this list as a testing target.
Fast mode (default): three core agents. Use when the problem is focused and the domain is clear. Full mode: three core + three to five domain-specific. Use when the problem is layered or core agents disagree. Auto-escalate: start fast; if core agents conflict, leave gaps, or can't locate the bug, escalate.
The three core agents always dispatch:
See domain-agents.md for the per-domain agent catalog (numerical/dsp, web-backend/distributed, frontend, data/ml, database, security, general).
All agents run with run_in_background: true. Use model: sonnet for most agents; use model: opus for the Invariant Auditor (derivations need stronger reasoning). Start with three agents in fast mode, six-plus in full mode — don't dispatch ten when three would suffice.
Every agent receives the Null Hypothesis Protocol: predict what correct code would produce before testing for bugs, then report prediction and result. Findings must come back as scripts or URLs, never as opinions. Agents must not suggest fixes — fix-suggestions anchor everyone on the first plausible patch.
See null-hypothesis-protocol.md for the full agent prompt template and required output format.
Collect all agent output, then run the Validation Gate before trusting anything:
ARTIFACT_TYPE: script finding, write the script, execute it via Bash, capture stdout/stderr verbatim, and compare against EXPECTED. Don't trust an agent's ACTUAL line — run it yourself.ARTIFACT_TYPE: url finding, fetch via WebFetch and confirm the cited content exists. 404 or content mismatch demotes the finding to UNVERIFIED.Group findings into PROVEN BUG / LIKELY BUG / CONTESTED / UNVERIFIED, persist validated scripts to scrutiny/, and present a coverage map rather than a "Verified Correct" stamp. Only after presenting findings, ask: "Want me to fix the proven bugs?"
See validation-gate.md for the validation procedure, artifact persistence rules, and the synthesis report template.
Problem: Shopping cart API occasionally shows stale item counts after concurrent add/remove operations. Multiple fix attempts (adding locks, retry logic) each fixed the test case but broke under load.
Domain detected: web-backend/distributed.
Invariant Extractor found: "cart.items.length should equal the number of successful add operations minus successful remove operations" and "concurrent operations on the same cart should serialize."
Reference Hunter found: Three cart implementations (Shopify Liquid, Medusa.js, Saleor) all use optimistic locking with version counters. Our code uses last-write-wins without versioning. Behavioral difference: all references reject stale writes; ours silently overwrites.
Empirical Tester found: Test with 10 concurrent add operations on the same cart. Expected: 10 items. Actual: 7–9 items (varies per run). Test executed, stdout captured.
Concurrency Auditor found: The updateCart function reads current state, modifies in memory, then writes back without any atomicity guarantee. Between read and write, another request can interleave. Traced three specific interleaving paths that produce data loss.
Result: Proven bug with triple-source evidence. Root cause: missing optimistic concurrency control. The previous "fix" (adding a mutex) only worked in single-process mode; under load balancing, requests hit different processes and the mutex is local.
Creates, edits, and optimizes skills for Claude Code, including drafting, evaluating with test prompts, iterating on performance, and improving skill descriptions for better triggering accuracy.
npx claudepluginhub ogabrielluiz/parallel-minds --plugin parallel-minds