From harness
Use when harness output has plateaued, the evaluator's grading disagrees with a human spot-check, the same criterion keeps failing across runs, or quality is drifting between iterations. Triggers on "/tune-harness", "tune the harness", "the evaluator is wrong", "the generator keeps missing X", "improve the harness". This is the trace-reading loop — the primary engineering work after instantiation.
How this skill is triggered — by the user, by Claude, or both
Slash command
/harness:tune-harnessThe summary Claude sees in its skill listing — used to decide when to auto-load this skill
This is where the real harness engineering happens. It is a discipline, not a magic trick. The pattern: read traces, find divergence, update one prompt, re-run, compare. Repeat.
This is where the real harness engineering happens. It is a discipline, not a magic trick. The pattern: read traces, find divergence, update one prompt, re-run, compare. Repeat.
state/traces/ and state/verdicts/.planner.md.generator.md.evaluator.md (most common).rubric.json..harness/calibration/ example..harness/state/progress.jsonl:
{"t":"...","event":"prompt_edit","file":"evaluator.md","reason":"approved iter 7 despite c2 obviously failing on line 14"}
| Symptom | First file to inspect |
|---|---|
| Iterations look good individually but drift across runs | evaluator.md — calibration not anchoring |
| Generator builds the wrong thing | planner.md — spec was too vague |
| Evaluator passes obviously broken artifacts | evaluator.md — hedge list or "skeptical" framing missing |
| Same criterion keeps failing without progress | rubric.json — criterion too coarse, OR add calibration example |
| Generator self-approves in handoff | generator.md — role-as-builder not enforced |
| Verdicts are vague ("could be better") | evaluator.md — granularity requirement missing |
| Two criteria contradict each other | rubric.json — merge or re-define |
A tuning session that:
Creates, edits, and optimizes skills for Claude Code, including drafting, evaluating with test prompts, iterating on performance, and improving skill descriptions for better triggering accuracy.
npx claudepluginhub kamilseghrouchni/harness --plugin harness