Domain-agnostic planner / generator / evaluator harness for long-running complex tasks. Scaffolds a .harness/ folder per project with typed JSON handoffs, file-system state, calibrated evaluators, pivot-on-stuck, and continuous JSONL logging.
Use when starting a new project that should run under the planner/generator/evaluator harness, or when an existing project needs a new harness for a different task type. Triggers on "/instantiate-harness", "set up a harness for this project", "scaffold a harness", "I want to run [X] under the harness". Walks an AskUserQuestion wizard and emits a .harness/ folder with config, prompts, and rubric.
Use every time a new model lands (Opus / Sonnet / Haiku release, or a meaningful provider change). Triggers on "/re-assess-harness", "the new model is out — what should we strip", "re-evaluate the harness", "is X still needed in the harness". Strips scaffolding that no longer earns its cost; flags capabilities now unlocked.
Use to execute the harness loop on a project that has a .harness/ folder. Triggers on "/run-harness", "run the harness", "kick off the harness on [X]", "resume the harness". You — Claude Code — are the orchestrator. Each role (planner, generator, evaluator) is a subagent invocation via the Task tool. State lives on disk under .harness/state/.
Use when harness output has plateaued, the evaluator's grading disagrees with a human spot-check, the same criterion keeps failing across runs, or quality is drifting between iterations. Triggers on "/tune-harness", "tune the harness", "the evaluator is wrong", "the generator keeps missing X", "improve the harness". This is the trace-reading loop — the primary engineering work after instantiation.
Use when authoring or revising the evaluator.md system prompt — the most tuning-intensive role in the harness. Triggers on "/write-evaluator-prompt", "draft the evaluator", "rewrite the evaluator", "tune the evaluator", "the evaluator keeps approving bad work". Enforces skeptical voice, operate-the-artifact verification, and calibration grounding.
A domain-agnostic plugin for running planner / generator / evaluator loops on complex tasks. Built around the lessons from Anthropic's long-running agent harness work, but stripped of coding-specific framing so the same shape works for research briefs, outreach batches, data deliverables, strategy memos, design, and code.
You give it a project of any kind. It scaffolds a .harness/ folder with three roles: planner, generator, and evaluator.
The roles communicate through JSON files on disk. Every event appends to progress.jsonl. When the generator gets stuck hill-climbing on one criterion, the runtime archives the attempt and pivots to a fresh approach. When a new model lands, you re-assess and strip the scaffolding that's no longer load-bearing.
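As a rough illustration of the append-only event log described above, the sketch below writes one JSON object per line and reads the log back. The function names, the record fields (`ts`, `role`, `event`), and the location of `progress.jsonl` are assumptions made for this example; the plugin's actual record shape is defined by its own schemas and may differ.

```python
import json
import time
from pathlib import Path


def append_event(log_path: Path, role: str, event: str, **fields) -> dict:
    """Append one event as a single JSON line and return the record.

    Field names here are hypothetical, not the plugin's real schema.
    """
    record = {"ts": time.time(), "role": role, "event": event, **fields}
    log_path.parent.mkdir(parents=True, exist_ok=True)
    # Append mode means earlier events are never rewritten.
    with log_path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record, sort_keys=True) + "\n")
    return record


def read_events(log_path: Path) -> list[dict]:
    """Read every event back in order, skipping blank lines."""
    if not log_path.exists():
        return []
    with log_path.open(encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]
```

Because each line is a complete JSON document, a trace reader can process the log incrementally without parsing the whole file as one structure.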
This is a Claude Code plugin. Drop it into a marketplace or install locally:
git clone git@github.com:kamilseghrouchni/harness.git ~/.claude/plugins/harness
Restart Claude Code and the eight skills under skills/ become invokable.
/instantiate-harness
Walks an AskUserQuestion wizard, then writes .harness/config.json and four prompt files. Drop 3–5 calibration examples into .harness/calibration/, then run:
/run-harness
Claude Code is the orchestrator. When you type /run-harness, the skill dispatches each role (planner, generator, evaluator) as a fresh subagent via the Task tool — each gets its own context window. State lives on disk under .harness/state/. The Python under runtime/cli/ is utility-only (validate JSON, append to JSONL, decide pivots, archive stuck attempts) — called from the skill via Bash, not the other way around.
A headless runtime/runner.py exists for automation (cron, CI), but the primary path is Claude Code.
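The "decide pivots" and "archive stuck attempts" utilities could be sketched roughly as follows. This is not the code under runtime/cli/: the plateau rule (no improvement over the last `patience` scores), the parameter names, and the archive directory layout are all assumptions chosen for illustration.

```python
import shutil
from pathlib import Path


def should_pivot(scores: list[float], patience: int = 3, min_gain: float = 0.0) -> bool:
    """Return True when the last `patience` scores fail to beat the earlier
    best by more than `min_gain` (a hypothetical plateau rule)."""
    if len(scores) <= patience:
        return False  # not enough history to call it a plateau
    best_before = max(scores[:-patience])
    recent_best = max(scores[-patience:])
    return recent_best - best_before <= min_gain


def archive_attempt(attempt_dir: Path, archive_root: Path) -> Path:
    """Move a stuck attempt into the archive and return its new location."""
    archive_root.mkdir(parents=True, exist_ok=True)
    dest = archive_root / attempt_dir.name
    suffix = 1
    while dest.exists():  # never overwrite an earlier archived attempt
        dest = archive_root / f"{attempt_dir.name}-{suffix}"
        suffix += 1
    shutil.move(str(attempt_dir), str(dest))
    return dest
```

Keeping the decision a pure function of the score history makes it easy to call from a skill via Bash and to test against recorded traces.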
| Path | Purpose |
|---|---|
| .claude-plugin/plugin.json | Plugin manifest |
| CLAUDE.md | Auto-loaded principles digest |
| docs/principles.html | Full architecture spec (the source of truth) |
| skills/ | Eight skills (instantiate · run · 4× prompt-writers · tune · re-assess) |
| templates/ | Files copied into .harness/ on instantiation |
| runtime/cli/ | Bash-callable utilities the run-harness skill invokes |
| runtime/schemas/ | JSON Schemas for plan, contract, verdict, handoff, progress |
| runtime/runner.py | Optional headless orchestrator (non-interactive use) |
| examples/outreach-batch/ | Worked instantiation |
Split work across three roles, each with its own context window. Use the file system as state, JSON as the message format. Make the evaluator skeptical and calibrated, not just told to be harsh. Have generator and evaluator negotiate a contract before any work starts. Verify by operating the artifact, not by reading a description of it. Allow throwing the work away when stuck. Read the traces — that is the tuning loop. Re-examine the harness every model release and strip what no longer earns its cost.
See docs/principles.html for the full sixteen.
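To make "file system as state, JSON as the message format" concrete, here is a minimal sketch of a typed handoff between roles: one role writes a verdict file, another reads it and rejects anything malformed. The `Verdict` fields and both function names are hypothetical; the real shapes live in runtime/schemas/ and are not reproduced here.

```python
import json
import os
import tempfile
from dataclasses import asdict, dataclass
from pathlib import Path


@dataclass(frozen=True)
class Verdict:
    # Hypothetical fields, chosen only for this example.
    criterion: str
    passed: bool
    evidence: str


def write_handoff(path: Path, verdict: Verdict) -> None:
    """Write the verdict atomically so a reader never sees a partial file."""
    path.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        json.dump(asdict(verdict), fh, indent=2)
    os.replace(tmp_name, path)  # atomic rename within the same directory


def read_handoff(path: Path) -> Verdict:
    """Load a verdict, rejecting missing, extra, or mistyped fields."""
    data = json.loads(path.read_text(encoding="utf-8"))
    expected = {"criterion": str, "passed": bool, "evidence": str}
    if not isinstance(data, dict):
        raise ValueError("handoff must be a JSON object")
    if set(data) != set(expected):
        raise ValueError(f"unexpected fields: {sorted(data)}")
    for key, typ in expected.items():
        if not isinstance(data[key], typ):
            raise ValueError(f"field {key!r} must be {typ.__name__}")
    return Verdict(**data)
```

Validating on read means a malformed handoff fails loudly at the role boundary instead of silently steering the next subagent.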
MIT.
npx claudepluginhub kamilseghrouchni/harness --plugin harness