From probe-loop
Runs a verification pass that catches bugs surviving green test suites by exercising real production paths, independently verifying actual output, and locking findings as regression tests.
How this skill is triggered — by the user, by Claude, or both
Slash command
/probe-loop:probe-loopWhen to use
When the user asks to harden, verify, audit, or stress-test code where unit tests pass but real-world behavior hasn't been checked. Also trigger on silent failures, "status says success but output is wrong" bugs, environment mismatches, cross-layer integration risks, or AI-generated code that hasn't been exercised through the real path. Signal phrases include "does this actually work", "verify end-to-end", "test against the real API", "find bugs that survived testing", "harden this", "audit this", "is the output actually correct", "the tests pass but I'm not sure if it works".
This skill is limited to the following tools:
The summary Claude sees in its skill listing — used to decide when to auto-load this skill
This skill runs a verification pass that catches the bug class which survives contract-level testing. Use it after tests are green and the architecture is settled, when the remaining question is whether the system actually works against the real runtime.
This skill runs a verification pass that catches the bug class which survives contract-level testing. Use it after tests are green and the architecture is settled, when the remaining question is whether the system actually works against the real runtime.
The core principle: tests verify contracts; probes verify reality. The bugs that survive a green test suite live in the gap between what the system reports and what it actually does. Status codes are not verification. Logs are not verification. Bytes, pixels, and stream events are verification.
Execute these in sequence, within a single continuous session whenever possible.
Submit a real operation through the real production path. Cross every layer boundary. Do not mock anything where the seam itself is what matters. Do not use fixtures where the runtime environment matters. If the operation hits a live API, hit a live API. If the runtime is an iOS app, run on a simulator or device. If the output is a file, write a real file.
When generating probes, scan the codebase for:
If the codebase is large, ask the user which subsystem or feature is in scope before generating probes. Do not silently probe the entire system.
Do not trust what the system reports. Read the actual output. Compare it independently against what the operation should have produced.
For each probe, write an explicit verification step that opens the produced artifact (file, stream, response body, database row) and asserts properties that distinguish "actually worked" from "reported success." When in doubt, hash the expected output and compare hashes.
The gap between what the system reports and what it actually does is where the bug lives. The reported value is usually correct in some narrow contract sense. The actual behavior diverges at a boundary nobody tested across.
When a probe fails or returns unexpected output, classify the failure against the six known bug categories in the taxonomy below. The category usually points to which layer holds the root cause.
Trace the divergence to its root layer. This is rarely the layer that surfaced the symptom. Read the full call stack across every layer the operation crossed. Identify the assumption that broke. Fix the cause, not the symptom.
Common patterns:
Before applying the fix, state the failure trace explicitly: which layer surfaced the symptom, which layer holds the cause, why the cause was invisible to existing tests. This trace becomes the rationale for both the fix and the regression test.
Convert the probe into a regression test. The behavior becomes a permanent assertion against the real production path, not a fixture in a sandbox.
The regression test should:
If the probe cannot be converted into a fully automated regression test (for example, it requires a live third-party API), lock it as a manual probe in a documented hardening checklist instead. Do not let the probe disappear.
When a probe fails, classify the failure. Each category is a structural blind spot of contract testing, and the category usually points to where the root cause lives.
Silent no-op. The operation completes successfully and has no effect. Output equals input. Status reports success. Only independent verification of the output catches it.
Environment mismatch. The code is correct against its specification and broken against the actual runtime. The test environment and the production environment differ in a way the developer did not anticipate.
Swallowed error. A failure path discards the diagnostic, catches and ignores the error, or reports success with truncated output.
Hardcoded parameter. A value that should come from configuration is hardcoded. The code works for the hardcoded case and silently produces wrong output for every other case.
Cross-layer trust violation. Layer A assumes Layer B handles something. Layer B assumes Layer A handles it. Neither does. Each layer passes its own tests because the tests do not cross the boundary.
Destructive edge case. A code path that is safe for the common case destroys data in an edge case nobody probed for. The edge case is obvious in retrospect but invisible until it is named.
Run a probe loop when any of these apply:
When the loop completes, report in this structure:
Do not:
npx claudepluginhub amirshayegh/probe-loop --plugin probe-loopProvides behavioral guidelines to reduce common LLM coding mistakes, focusing on simplicity, surgical changes, assumption surfacing, and verifiable success criteria.
Searches, retrieves, and installs Agent Skills from prompts.chat registry using MCP tools like search_skills and get_skill. Activates for finding skills, browsing catalogs, or extending Claude.
Creates, edits, and optimizes skills for Claude Code, including drafting, evaluating with test prompts, iterating on performance, and improving skill descriptions for better triggering accuracy.