Skill

probe-loop

Runs a verification pass that catches bugs surviving green test suites by exercising real production paths, independently verifying actual output, and locking findings as regression tests.

Invocation

Slash command

/probe-loop:probe-loop

User invocable

Model invocable

Inline context

Default effort

When to use

When the user asks to harden, verify, audit, or stress-test code where unit tests pass but real-world behavior hasn't been checked. Also trigger on silent failures, "status says success but output is wrong" bugs, environment mismatches, cross-layer integration risks, or AI-generated code that hasn't been exercised through the real path. Signal phrases include "does this actually work", "verify end-to-end", "test against the real API", "find bugs that survived testing", "harden this", "audit this", "is the output actually correct", "the tests pass but I'm not sure if it works".

Tool Access

This skill is limited to the following tools:

Bash, Read

Context Preview

This skill runs a verification pass that catches the class of bugs that survives contract-level testing. Use it after tests are green and the architecture is settled, when the remaining question is whether the system actually works against the real runtime.

SKILL.md

126 lines · ~2.1k tokens

Stats

Parent stars: 0

Maintenance: Good

Last Commit: May 18, 2026

The Probe Loop

The core principle: tests verify contracts; probes verify reality. The bugs that survive a green test suite live in the gap between what the system reports and what it actually does. Status codes are not verification. Logs are not verification. Bytes, pixels, and stream events are verification.

The five stages

Execute these in sequence, within a single continuous session whenever possible.

1. Probe

Submit a real operation through the real production path. Cross every layer boundary. Do not mock anything where the seam itself is what matters. Do not use fixtures where the runtime environment matters. If the operation hits a live API, hit a live API. If the runtime is an iOS app, run on a simulator or device. If the output is a file, write a real file.
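As a minimal sketch, assume a hypothetical `export_report` function whose real output is a CSV file. The probe calls that production function and lets it write to a real temporary directory instead of patching `open` or the filesystem. Only the location is sandboxed; the write path itself is the real one.

```python
import csv
import tempfile
from pathlib import Path


def export_report(rows: list[dict], dest: Path) -> str:
    """Hypothetical production function under probe: writes rows as CSV."""
    with dest.open("w", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)
    return "completed"  # the status the system reports


def probe_export() -> tuple[str, bytes]:
    """Run the real write path and capture both the reported status and the actual bytes."""
    rows = [{"id": 1, "name": "alpha"}, {"id": 2, "name": "beta"}]
    with tempfile.TemporaryDirectory() as tmp:
        dest = Path(tmp) / "report.csv"
        status = export_report(rows, dest)  # no mocks: real file I/O runs
        artifact = dest.read_bytes()  # read what was actually written, before cleanup
    return status, artifact
```

Calling `probe_export()` returns the reported status alongside the bytes on disk, which is the pair the Verify stage needs to compare.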

When generating probes, scan the codebase for:

Paths that have unit test coverage but no end-to-end coverage against the real runtime
Boundary crossings: HTTP, file I/O, network, stream, IPC, provider switches, platform APIs
Code paths added recently or generated by an agent without being exercised through the real path
Operations the user has explicitly asked to verify
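One way to start the scan for boundary crossings is a crude text search. The sketch below assumes a Python codebase and uses an illustrative, deliberately incomplete pattern list. It flags candidate lines for review; it does not prove that a path lacks end-to-end coverage.

```python
import re
from pathlib import Path

# Illustrative patterns only; extend them for the libraries the codebase actually uses.
BOUNDARY_PATTERNS = {
    "http": re.compile(r"\b(requests|httpx|urllib\.request)\."),
    "file_io": re.compile(r"\bopen\(|\.write_(text|bytes)\("),
    "subprocess": re.compile(r"\bsubprocess\."),
    "socket": re.compile(r"\bsocket\."),
}


def scan_source(text: str) -> list[tuple[int, str]]:
    """Return (line_number, boundary_kind) for every line matching a pattern."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for kind, pattern in BOUNDARY_PATTERNS.items():
            if pattern.search(line):
                hits.append((lineno, kind))
    return hits


def scan_tree(root: Path) -> dict[str, list[tuple[int, str]]]:
    """Scan every .py file under root, keeping only files that cross a boundary."""
    results = {}
    for path in sorted(root.rglob("*.py")):
        hits = scan_source(path.read_text(encoding="utf-8", errors="replace"))
        if hits:
            results[str(path)] = hits
    return results
```

Running `scan_tree` on the subsystem the user scoped, rather than on the whole repository, keeps the candidate list short enough to cross-check against existing end-to-end tests.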

If the codebase is large, ask the user which subsystem or feature is in scope before generating probes. Do not silently probe the entire system.

2. Verify

Do not trust what the system reports. Read the actual output. Compare it independently against what the operation should have produced.

Status codes are not verification. Read the response body.
"Completed" is not verification. Inspect the artifact.
A passing test is not verification of new behavior. Verify the underlying output directly.

For each probe, write an explicit verification step that opens the produced artifact (file, stream, response body, database row) and asserts properties that distinguish "actually worked" from "reported success." When in doubt, hash the expected output and compare hashes.
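A minimal sketch of the hash comparison, assuming the expected bytes can be produced independently of the code under test (for example, from a known-good reference file). The helper names are hypothetical; `hashlib.sha256` is from the Python standard library.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash the artifact's actual bytes, streaming in chunks so large files are fine."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_artifact(actual: Path, expected_bytes: bytes) -> None:
    """Fail loudly unless the produced file is byte-for-byte what was expected."""
    if not actual.exists():
        raise AssertionError(f"artifact missing: {actual}")
    expected = hashlib.sha256(expected_bytes).hexdigest()
    got = sha256_of(actual)
    if got != expected:
        raise AssertionError(f"hash mismatch for {actual}: {got} != {expected}")
```

Note that the check opens the artifact itself; nothing in it consults the status the system reported.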

3. Discover

The gap between what the system reports and what it actually does is where the bug lives. The reported value is usually correct in some narrow contract sense. The actual behavior diverges at a boundary nobody tested across.

When a probe fails or returns unexpected output, classify the failure against the six known bug categories in the taxonomy below. The category usually points to which layer holds the root cause.

4. Fix

Trace the divergence to its root layer. This is rarely the layer that surfaced the symptom. Read the full call stack across every layer the operation crossed. Identify the assumption that broke. Fix the cause, not the symptom.

Common patterns:

Symptom in the response handler, root in the parser
Symptom in the parser, root in the runtime that fed it
Symptom in the runtime, root in the platform API that diverges from its documented spec
Symptom in one provider integration, root in shared code that all providers depend on

Before applying the fix, state the failure trace explicitly: which layer surfaced the symptom, which layer holds the cause, why the cause was invisible to existing tests. This trace becomes the rationale for both the fix and the regression test.
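The trace can be captured as a small record so that the same text feeds both the fix rationale and the regression test's comment. This is a hypothetical structure whose fields mirror the three questions above; the example values describe an illustrative bug, not a real one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FailureTrace:
    symptom_layer: str  # where the failure became visible
    cause_layer: str  # where the broken assumption actually lives
    why_invisible: str  # why existing tests could not see it

    def rationale(self) -> str:
        """Render a one-line rationale for a commit message or test comment."""
        return (
            f"Symptom in {self.symptom_layer}, root in {self.cause_layer}: "
            f"{self.why_invisible}"
        )


# Hypothetical example values.
trace = FailureTrace(
    symptom_layer="response handler",
    cause_layer="parser",
    why_invisible="parser tests used fixtures that never contained multi-byte UTF-8",
)
```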

5. Lock

Convert the probe into a regression test. The behavior becomes a permanent assertion against the real production path, not a fixture in a sandbox.

The regression test should:

Run against the same real path the probe ran against, or as close as the test environment allows
Verify the same actual output the probe verified, not just a status code
Fail if the original bug reappears
Carry a comment naming the bug category from the taxonomy
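A minimal sketch, assuming pytest-style discovery (a plain `test_` function with bare `assert`) and a hypothetical `encode_text` function that once ignored its `encoding` argument and always used UTF-8. The test checks the produced bytes rather than a status and names the bug category in a comment; both the function and the bug are illustrative.

```python
def encode_text(text: str, encoding: str = "utf-8") -> bytes:
    """Hypothetical production function; the fixed version honors `encoding`."""
    return text.encode(encoding)


# Bug category: hardcoded parameter.
# Original bug (hypothetical): encode_text always used "utf-8", so every
# non-UTF-8 caller silently got the wrong bytes while the UTF-8 tests passed.
def test_encode_text_honors_encoding():
    out = encode_text("hé", "utf-16-le")
    assert out == b"h\x00\xe9\x00"  # the actual bytes, checked independently
    assert out != "hé".encode("utf-8")  # fails again if the hardcoded value returns
```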

If the probe cannot be converted into a fully automated regression test (for example, it requires a live third-party API), lock it as a manual probe in a documented hardening checklist instead. Do not let the probe disappear.

The bug taxonomy

When a probe fails, classify the failure. Each category is a structural blind spot of contract testing, and the category usually points to where the root cause lives.

Silent no-op. The operation completes successfully and has no effect. Output equals input. Status reports success. Only independent verification of the output catches it.
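The independent check for a silent no-op can be as blunt as comparing output to input. A minimal sketch, assuming the operation is always supposed to transform its input; `check_not_noop` is a hypothetical helper, and the equality test only makes sense for operations where unchanged output is never a valid result.

```python
class SilentNoOpError(AssertionError):
    """Raised when an operation reports success but left its input unchanged."""


def check_not_noop(status: str, before: bytes, after: bytes) -> None:
    """Flag a success status paired with output identical to the input."""
    if status == "success" and after == before:
        raise SilentNoOpError("reported success, but output is byte-identical to input")


# Usage: a hypothetical "strip trailing whitespace" step.
original = b"line one   \nline two  \n"
check_not_noop("success", original, b"line one\nline two\n")  # passes: content changed
try:
    check_not_noop("success", original, original)  # the silent no-op case
except SilentNoOpError as exc:
    print(f"caught: {exc}")
```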

Environment mismatch. The code is correct against its specification and broken against the actual runtime. The test environment and the production environment differ in a way the developer did not anticipate.

Swallowed error. A failure path discards the diagnostic, catches and ignores the error, or reports success with truncated output.
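A compact illustration of the pattern and its fix, using hypothetical `load_config` functions. The first version catches the parse error and reports success with empty output; the second keeps the diagnostic so a probe, and the caller, can see it.

```python
import json


def load_config_swallowing(text: str) -> tuple[str, dict]:
    """Buggy pattern: the parse error is caught and discarded."""
    try:
        return "ok", json.loads(text)
    except json.JSONDecodeError:
        return "ok", {}  # reports success with truncated (empty) output


def load_config(text: str) -> tuple[str, dict]:
    """Fixed: the diagnostic survives, including where parsing failed."""
    try:
        return "ok", json.loads(text)
    except json.JSONDecodeError as exc:
        raise ValueError(
            f"config is not valid JSON (line {exc.lineno}, col {exc.colno}): {exc.msg}"
        ) from exc
```

Fed the same malformed input, the first version returns `("ok", {})` and a status-only test passes; the second raises, which is the behavior a probe should demand.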

Hardcoded parameter. A value that should come from configuration is hardcoded. The code works for the hardcoded case and silently produces wrong output for every other case.

Cross-layer trust violation. Layer A assumes Layer B handles something. Layer B assumes Layer A handles it. Neither does. Each layer passes its own tests because the tests do not cross the boundary.

Destructive edge case. A code path that is safe for the common case destroys data in an edge case nobody probed for. The edge case is obvious in retrospect but invisible until it is named.
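A hypothetical illustration: a cleanup helper that removes an output directory is safe for the usual `build` argument, but in Python `Path("")` equals `Path(".")`, so an empty setting would target the root directory itself. The `clean_output_dir` name and its guard are assumptions for the example; a probe can exercise the named edge cases inside a throwaway temporary directory.

```python
import shutil
from pathlib import Path


def clean_output_dir(path_str: str, root: Path) -> None:
    """Delete an output directory, refusing any target not strictly inside root."""
    # Edge case 1: an empty string would resolve to root itself.
    if not path_str.strip():
        raise ValueError("refusing to clean: empty path")
    root_resolved = root.resolve()
    target = (root / path_str).resolve()
    # Edge case 2: ".", "..", or an absolute path that escapes root.
    if target == root_resolved or root_resolved not in target.parents:
        raise ValueError(f"refusing to clean {target}: not inside {root_resolved}")
    shutil.rmtree(target)
```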

When to run the probe loop

Run a probe loop when any of these apply:

A feature crosses provider, runtime, file, stream, or network boundaries
Tests are green but the actual output has not been independently inspected
Fixtures represent the API contract but not the live runtime
An agent generated code that appears correct but has not been exercised through the real path
The system reports success before anyone has inspected the artifact
A new layer was added without verifying that existing safety, timeout, or cancellation guarantees still cross it

When not to run the probe loop

New code being developed for the first time: use TDD, not probes
Pure refactoring with no intended behavior change: characterization tests are the right tool
Style, linting, or formatting passes: probes are not for cosmetic verification

Reporting the results

When the loop completes, report in this structure:

Probes run. A list of operations probed, each with the real path it exercised and what was verified.
Findings. Each bug found, with the category from the taxonomy and the root layer where the cause lives.
Fixes applied. Code changes made, with the failure trace as rationale.
Regression tests added. Locked probes, with the assertion each one enforces and the bug category it guards against.
Clean probes. Probes that ran and found nothing. List these too: a clean probe is still coverage added, and the user needs to know what was checked.
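If the report is assembled programmatically, a record with one field per section keeps clean probes from being silently dropped. This is a hypothetical sketch; the section names follow the structure above, and the rendering format is arbitrary.

```python
from dataclasses import dataclass, field


@dataclass
class ProbeReport:
    """One field per report section; every section is always rendered."""

    probes_run: list[str] = field(default_factory=list)
    findings: list[str] = field(default_factory=list)
    fixes_applied: list[str] = field(default_factory=list)
    regression_tests: list[str] = field(default_factory=list)
    clean_probes: list[str] = field(default_factory=list)

    def render(self) -> str:
        sections = [
            ("Probes run", self.probes_run),
            ("Findings", self.findings),
            ("Fixes applied", self.fixes_applied),
            ("Regression tests added", self.regression_tests),
            ("Clean probes", self.clean_probes),
        ]
        lines = []
        for title, items in sections:
            lines.append(f"{title}:")
            if items:
                lines.extend(f"  - {item}" for item in items)
            else:
                lines.append("  (none)")  # empty sections stay visible
        return "\n".join(lines)
```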

Anti-patterns

Do not:

Test against fixtures and call it a probe. The probe must exercise the real path.
Assert on status codes when the actual output is available. Read the output.
Fix the symptom. Trace to the root layer and fix the cause.
Skip the lock step. An unlocked probe is a one-time discovery; a locked probe is a permanent gate.
Treat all failures as the same category. Classify, then act.
Probe silently across an entire large codebase without scoping with the user first.
