From claude-commands
Analyzes recurring failures to identify gaps in instructions, skills, tests, CI gates, and lint rules, then produces harness-level fixes.
Slash command: /claude-commands:harness-engineering
When a mistake or failure pattern is identified, analyze whether the root cause is a gap in the **harness** (instructions, skills, tests, guardrails, automation) rather than just the code. Produce a concrete fix at the harness level so the same class of mistake cannot recur without human intervention.
Command: ~/.claude/commands/harness.md
- ~/.claude/commands/harness.md and this skill apply to any repo unless a project overrides them.
- A repo may add .claude/commands/harness.md and/or .claude/skills/<name>/SKILL.md that extend user-scope rules (for example gateway operations). When both exist, read the repo-local file for project-specific failure modes.
- Collision: workspace-local .claude/commands/ overrides the same-named global command in that workspace.
- ~/.openclaw: Use the openclaw-harness skill in that repo for gateway, canary, deploy, and lane-backlog triage. Tracked user-scope copies for drift control live under docs/harness/ in jleechanclaw.

| Layer | Files | What it prevents |
|---|---|---|
| Instructions | CLAUDE.md, AGENTS.md (global + repo) | Wrong approach, wrong assumptions, wrong defaults |
| Skills | ~/.claude/skills/*.md, ~/.claude/commands/*.md | Repeated manual workflows, forgotten validation steps |
| Memory | ~/.claude/projects/*/memory/*.md | Forgetting user preferences, past corrections, project context |
| Integration tests | tests/test_integration_*.py, tests/test_*_test.py | Regressions in real behavior |
| CI gates | .github/workflows/*.yml, pre-commit hooks | Merging broken code, mislabeled artifacts |
| Lint/validation rules | .pre-commit-config.yaml, custom linters | Style drift, naming violations, structural problems |
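
The layer table can be turned into a quick presence check. The following is a minimal sketch, not part of the skill itself: `LAYER_GLOBS`, `audit_layers`, and `missing_layers` are hypothetical names, the patterns are copied from the table, and only repo-relative locations are covered (the `~/` user-scope paths are left out for brevity).

```python
from pathlib import Path

# Glob patterns per harness layer, copied from the table above.
# Assumption: patterns are resolved relative to a repo root; user-scope
# locations under ~/ are intentionally omitted from this sketch.
LAYER_GLOBS = {
    "Instructions": ["CLAUDE.md", "AGENTS.md"],
    "Integration tests": ["tests/test_integration_*.py", "tests/test_*_test.py"],
    "CI gates": [".github/workflows/*.yml"],
    "Lint/validation rules": [".pre-commit-config.yaml"],
}


def audit_layers(root: Path) -> dict:
    """Return, per layer, the files under `root` that match its patterns."""
    found = {}
    for layer, patterns in LAYER_GLOBS.items():
        matches = []
        for pattern in patterns:
            matches.extend(
                sorted(str(p.relative_to(root)) for p in root.glob(pattern))
            )
        found[layer] = matches
    return found


def missing_layers(root: Path) -> list:
    """Layers for which no matching file exists at all."""
    return [layer for layer, files in audit_layers(root).items() if not files]
```

Calling `missing_layers(Path("."))` from a repo root lists the layers with no file on disk. A file being present says nothing about whether the guardrail works, so this only answers the "missing" half of the question.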
When invoked, execute this sequence in full, every time:
Classify what went wrong:
- .bashrc-sourced secrets silently disappear; the process appears alive but all API calls fail

Ask "Why?" five times about the technical failure, drilling into root cause:
Why 1: Why did the observable failure happen?
Why 2: Why did the mechanism that caused Why 1 exist?
Why 3: Why wasn't that mechanism caught or prevented?
Why 4: Why wasn't there a guardrail at that level?
Why 5: Why was the system designed without that guardrail?
→ Root cause: <single sentence>
Stop earlier if you hit bedrock. Each answer should be more specific than the last.
Ask "Why?" five times about why the LLM (Claude Code or any coding agent) went down the wrong path. This is mandatory. Every harness failure has two dimensions: the technical problem AND the agent reasoning failure that let it slip through.
Why 1: Why did the agent not catch/prevent the failure?
Why 2: Why did the agent reason or act that way?
Why 3: Why didn't the agent's instructions prevent that reasoning?
Why 4: Why wasn't there a skill, memory, or rule that would have redirected the agent?
Why 5: Why was the harness incomplete for this class of agent behavior?
→ Agent root cause: <single sentence>
Key questions to drive this:
- heuristic_decision() but not is_approval_candidate() will silently fail: is_approval_candidate() is the gate that decides whether heuristic_decision() ever runs)
- Before encoding a gh api, jq, or CLI command into CLAUDE.md / skills / memory, run it against a real target (e.g., a real PR) and confirm the output is non-empty and correct. Do NOT encode a command pattern until you have verified it returns the expected output for the actual API response shape. Failure mode: the agent writes .state == "FAILURE" (wrong field for GitHub Actions) when it should be .conclusion == "FAILURE", and the command silently returns 0 failures.

For each failure class, check which harness layers are missing or insufficient:
- Instructions: ~/.claude/CLAUDE.md, repo CLAUDE.md, ~/.codex/AGENTS.md
- Skills: ~/.claude/skills/, ~/.claude/commands/
- Memory: ~/.claude/projects/*/memory/
Critical check — harness layer present but broken: For each harness layer that exists, verify it actually works, not just that it exists. A broken guardrail is as bad as a missing one and is harder to detect.
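
The wrong-field failure mode described earlier (filtering on `state` when the result lives in `conclusion`) is a small instance of a guardrail that exists but is broken. A minimal sketch of the difference, assuming hypothetical check records in `SAMPLE_CHECKS` and hypothetical helper names; real payloads should be captured from an actual PR before any pattern is encoded:

```python
# Hypothetical check-run records. The field names mirror the failure mode
# described above: `conclusion` carries the result and `state` is absent.
SAMPLE_CHECKS = [
    {"name": "build", "status": "COMPLETED", "conclusion": "SUCCESS"},
    {"name": "skeptic", "status": "COMPLETED", "conclusion": "FAILURE"},
]


def count_failures_unchecked(checks, field):
    """Naive filter: a missing field never equals "FAILURE", so it reports 0."""
    return sum(1 for c in checks if c.get(field) == "FAILURE")


def count_failures_checked(checks, field):
    """Same filter, but refuses to answer when it cannot verify the field."""
    if not checks:
        raise ValueError("no check records returned; cannot verify the filter")
    if not any(field in c for c in checks):
        raise KeyError(f"field {field!r} not present in any record")
    return sum(1 for c in checks if c.get(field) == "FAILURE")
```

With the sample data, `count_failures_unchecked(SAMPLE_CHECKS, "state")` returns 0 even though one check failed, while `count_failures_checked(SAMPLE_CHECKS, "state")` raises instead of reporting a clean result.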
Output a concrete action plan:
FAILURE CLASS: <classification>
5 WHYS — TECHNICAL:
1. <why>
2. <why>
3. <why>
4. <why>
5. <why>
→ Root cause: <sentence>
5 WHYS — AGENT PATH:
1. <why>
2. <why>
3. <why>
4. <why>
5. <why>
→ Agent root cause: <sentence>
HARNESS FIXES (in order of priority):
1. [LAYER] FILE: <path> — <what to add/change>
2. [LAYER] FILE: <path> — <what to add/change>
...
VERIFICATION: <how to confirm the fix prevents recurrence>
After user approval (or if invoked with --fix):
Observable: Gateway returns unauthorized / embedded fallback; logs show token value __OPENCLAW_REDACTED__ or similar.
Failure class: Silent degradation (harness script looked correct: “get token, export, run”) plus knowledge gap (CLI intentionally does not echo real secrets).
Technical 5 Whys (compressed): Export used openclaw config get … → CLI prints redacted sentinel → shell passed sentinel to gateway → auth failed → embedded fallback.
Agent 5 Whys (compressed): Pattern “export token then call CLI” is common in docs → agent did not verify the value was non-redacted → no instruction forbidding export of config get for secrets → recurrence.
Harness fixes (typical):
- ~/.claude/CLAUDE.md: never export gateway tokens from config get when output can be redacted; unset override env vars; read JSON or use provider-specific keys (MINIMAX_API_KEY) as documented.
- ~/.claude/commands/claw.md (or repo overlay): validate token with a file read / non-redacted check; document the redaction trap explicitly.
- MEMORY.md.

Verification: Run the command path and confirm the exported string is not a known redaction sentinel and gateway returns status: ok without “falling back to embedded”.
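
The non-redacted check in this example could look roughly like the sketch below. The sentinel `__OPENCLAW_REDACTED__` comes from the example above; the function names and the substring heuristic for "or similar" placeholders are assumptions made for illustration, not the CLI's documented behavior.

```python
from typing import Optional

# Known redaction sentinels. `__OPENCLAW_REDACTED__` is taken from the
# example above; add any other placeholder your tooling prints.
REDACTION_SENTINELS = {"__OPENCLAW_REDACTED__"}


def is_usable_secret(value: Optional[str]) -> bool:
    """True only for a non-empty value that is not a redaction placeholder."""
    if value is None:
        return False
    stripped = value.strip()
    if not stripped:
        return False
    if stripped in REDACTION_SENTINELS:
        return False
    # Assumption: any value containing "REDACTED" is treated as a placeholder,
    # to cover sentinels that are similar to the known one.
    return "REDACTED" not in stripped.upper()


def require_secret(name: str, value: Optional[str]) -> str:
    """Return the stripped secret, or raise so a bad token is never exported."""
    if not is_usable_secret(value):
        raise ValueError(f"{name} is empty or redacted; refusing to export it")
    return value.strip()
```

Raising instead of returning an empty string keeps the failure loud: the script stops before the gateway ever sees the sentinel.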
Observable: PR shows Green Gate completed/success in gh pr checks but skeptic VERDICT was never posted. Agent reports PR as 7-green.
Failure class: Silent CI success — a workflow can exit 0 at the job level while an inner step fails. The harness layer (Green Gate) is present and running, but it reports the wrong thing at the job level.
Technical 5 Whys:
- completed/success even though Gate 7 was failing?

Agent 5 Whys:
- gh pr checks output to determine Gate 7 status?
- gh pr checks is insufficient for Gate 7?

Harness fixes:
- pr-green-definition.md: Add mandatory REST API VERDICT check procedure (not just policy statement)
- pr-green-definition.md: Add counter-example showing Green Gate completed/success while Gate 7 fails
- /green.md: Already exists but relied on workflow logs only; updated to mandate REST check for Gate 7
- feedback_2026_04_21_silent_ci_success.md: Document the specific PRs where this was observed

Verification: For any PR where Green Gate shows completed/success, independently run the REST API VERDICT check and confirm the VERDICT comment exists before reporting 7-green.
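
The two-signal rule in this example (job status plus an independently fetched VERDICT comment) can be sketched as a single predicate. This is illustrative only: `is_seven_green`, the record shape, and the assumption that a verdict comment contains the literal marker `VERDICT` are all hypothetical, and fetching the checks and comments is left to the caller.

```python
def is_seven_green(checks, comment_bodies, verdict_marker="VERDICT"):
    """Green only if every check succeeded AND a verdict comment exists.

    `checks` is a list of dicts with `status` and `conclusion` keys;
    `comment_bodies` is a list of PR comment strings fetched separately.
    """
    if not checks:
        # No check data is not evidence of success.
        return False
    all_checks_passed = all(
        str(c.get("status", "")).lower() == "completed"
        and str(c.get("conclusion", "")).lower() == "success"
        for c in checks
    )
    verdict_posted = any(verdict_marker in body for body in comment_bodies)
    return all_checks_passed and verdict_posted
```

A PR whose only evidence is a completed/success job, with no verdict comment, returns `False` here, which is the counter-example this failure class calls for.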
npx claudepluginhub jleechanorg/claude-commands --plugin claude-commands