From dh
Enforces scientific validation protocol for bug fixes and code changes: reproduce failing state, define success criteria, apply fix, verify observed outcomes. Use when claiming fixes complete.
How this skill is triggered — by the user, by Claude, or both
Slash command
/dh:validation-protocolThe summary Claude sees in its skill listing — used to decide when to auto-load this skill
Before claiming any fix works, follow this scientific validation protocol. Success means observing the intended behavior, not merely the absence of errors.
Before claiming any fix works, follow this scientific validation protocol. Success means observing the intended behavior, not merely the absence of errors.
Success = Observing the intended behavior, not absence of errors.
A fix that "runs without failing" is not validated. A fix that demonstrates the specific expected outcome is validated.
Objective: Establish the broken baseline before attempting any fix.
Actions:
Why: Without reproducing the failure, you cannot verify the fix addresses the actual problem. You may fix a different issue or introduce new problems.
Example:
# Create the broken state by running the relevant command or test
{test command from language manifest} -k test_broken_function
# Observe the failure
# Expected: Error or incorrect output
# Observed: Error or incorrect output confirmed
Objective: State what specific observable output indicates the fix worked.
Actions:
Why: Clear success criteria prevent false positives where code runs but doesn't actually work correctly.
Example:
Success Criteria:
- Function returns expected value: {"status": "processed", "count": 42}
- No exceptions raised
- Output matches test assertion exactly
- Performance within acceptable range (<100ms)
Objective: Implement the fix and capture what actually happens.
Actions:
Why: The fix may run without errors but still not produce the intended behavior. Observation reveals the truth.
Example:
# Run the fixed code
{test command from language manifest} -k test_fixed_function
# Observe the output
# Expected: All assertions pass with correct values
# Observed: All assertions pass with correct values
Objective: Confirm the broken state is now fixed and success criteria are met.
Actions:
Why: Verification ensures the fix actually solved the problem and didn't just change the symptoms.
Example:
Verification Results:
Step 1 Recheck:
- Reproduction steps no longer trigger failure
- Error message no longer appears
Step 2 Criteria Check:
- Function returns {"status": "processed", "count": 42}
- No exceptions raised
- Output matches test assertion
- Performance: 45ms (within <100ms requirement)
Conclusion: Fix verified. All success criteria met with evidence.
"I fixed the bug. The code runs now."
Problem: Without reproducing the failure, you don't know if the fix addresses the actual issue.
Correct Approach:
Step 1: Reproduced failure - function raised ValueError("Invalid input")
Step 2: Success = function returns valid output without error
Step 3: Applied fix - added input validation
Step 4: Verified - function now returns {"result": "valid"}, no ValueError
"The tests pass now, so the fix works."
Problem: Tests passing means no exceptions, not necessarily correct behavior.
Correct Approach:
Step 2: Success = function processes 1000 records and returns count=1000
Step 3: Observed output: {"processed": 1000, "failed": 0}
Step 4: Verified count matches expected value exactly
"I made the change. It should work now."
Problem: "Should work" is speculation, not verification.
Correct Approach:
Step 3: Applied fix
Step 4: Ran test suite - all 45 tests pass
Step 4: Manually tested edge case - correct behavior observed
Step 4: Checked logs - no error messages, expected INFO logs present
"The main case works, so the fix is complete."
Problem: Edge cases and boundary conditions may still be broken.
Correct Approach:
Step 2: Success criteria:
- Normal input: returns expected output
- Empty input: raises ValueError
- Large input (10k records): completes within 5s
- Invalid input: raises TypeError
Step 4: All criteria verified with evidence
This validation protocol complements but does not replace automated testing:
Automated Tests: Prevent regressions, verify expected behavior systematically
Validation Protocol: Ensures the specific fix addresses the specific problem observed
Use both:
Bug Report: "User authentication fails with 500 error"
Step 1: Reproduce Failing State
- Attempt login with valid credentials
- Observe: HTTP 500, server logs show "KeyError: 'user_id'"
- Confirmed: Failure reproduces consistently
Step 2: Define Success Criteria
- Login with valid credentials returns HTTP 200
- Response contains {"status": "authenticated", "user_id": <id>}
- No KeyError in server logs
- Session cookie is set
Step 3: Apply Fix and Observe
- Fixed: Added user_id field validation in auth handler
- Tested: Login with valid credentials
- Observed: HTTP 200 response
- Observed: {"status": "authenticated", "user_id": 42}
- Observed: No errors in logs
- Observed: Session cookie present
Step 4: Verify Result
- Reproduction steps no longer trigger 500 error
- HTTP 200 received (not 500)
- Response contains correct user_id
- No KeyError in logs
- Session cookie set correctly
Conclusion: Fix verified. All success criteria met with evidence.
Additional Verification:
- Added regression test for user_id validation
- Full test suite passes (156 tests)
- Deployed to staging, manual verification successful
This protocol ensures fixes are verified through observation rather than assumption:
Remember: Success = Observing the intended behavior, not absence of errors.
npx claudepluginhub jamie-bitflight/claude_skills --plugin dhReproduces failures before fixes and verifies resolutions after via command mappings, BEFORE/AFTER .progress.md docs, VF tasks, and mock test anti-pattern checks.
Guides bug fixes through reproduction, root-cause analysis, minimal changes, and validation. Use when tasked with finding and fixing a code defect.
Fixes bugs using test-first loop: add minimal failing reproduction test, apply smallest fix to affected module, verify full test suite and linters. Use for reported bugs needing verified low-impact fixes.