Search everything...

Stats

Actions

Available In

harness

Name: harness
Author: kamilseghrouchni

By kamilseghrouchni

Domain-agnostic planner / generator / evaluator harness for long-running complex tasks. Scaffolds a .harness/ folder per project with typed JSON handoffs, file-system state, calibrated evaluators, pivot-on-stuck, and continuous JSONL logging.

npx claudepluginhub kamilseghrouchni/harness --plugin harness

Popularity

Stars

Med: 0·Avg: 285

Installs

Med: 0·Avg: 1

What's Inside

Skills8

instantiate-harness

/instantiate-harness

Use when starting a new project that should run under the planner/generator/evaluator harness, or when an existing project needs a new harness for a different task type. Triggers on "/instantiate-harness", "set up a harness for this project", "scaffold a harness", "I want to run [X] under the harness". Walks an AskUserQuestion wizard and emits a .harness/ folder with config, prompts, and rubric.

re-assess-harness

/re-assess-harness

Use every time a new model lands (Opus / Sonnet / Haiku release, or a meaningful provider change). Triggers on "/re-assess-harness", "the new model is out — what should we strip", "re-evaluate the harness", "is X still needed in the harness". Strips scaffolding that no longer earns its cost; flags capabilities now unlocked.

run-harness

/run-harness

Use to execute the harness loop on a project that has a .harness/ folder. Triggers on "/run-harness", "run the harness", "kick off the harness on [X]", "resume the harness". You — Claude Code — are the orchestrator. Each role (planner, generator, evaluator) is a subagent invocation via the Task tool. State lives on disk under .harness/state/.

tune-harness

/tune-harness

Use when harness output has plateaued, the evaluator's grading disagrees with a human spot-check, the same criterion keeps failing across runs, or quality is drifting between iterations. Triggers on "/tune-harness", "tune the harness", "the evaluator is wrong", "the generator keeps missing X", "improve the harness". This is the trace-reading loop — the primary engineering work after instantiation.

write-evaluator-prompt

/write-evaluator-prompt

Use when authoring or revising the evaluator.md system prompt — the most tuning-intensive role in the harness. Triggers on "/write-evaluator-prompt", "draft the evaluator", "rewrite the evaluator", "tune the evaluator", "the evaluator keeps approving bad work". Enforces skeptical voice, operate-the-artifact verification, and calibration grounding.

Stats

Version0.2.0

LanguagePython

Stars0

MaintenanceGood

LicenseMIT

Last CommitJun 1, 2026

AddedJun 1, 2026

Actions

View on GitHub View README Plugin Marketplace JSON

Own this plugin?

Verify ownership to unlock analytics, metadata editing, and a verified badge. GitHub access is read-only (username + org membership).

Available In

harness

README

harness

A domain-agnostic plugin for running planner / generator / evaluator loops on complex tasks. Built around the lessons from Anthropic's long-running agent harness work, but stripped of coding-specific framing so the same shape works for research briefs, outreach batches, data deliverables, strategy memos, design, and code.

What it does

You give it a project — any kind. It scaffolds a .harness/ folder with three roles:

Planner — turns a short prompt into a high-level spec (intentionally not granular).
Generator — builds the artifact in iterations.
Evaluator — operates the artifact (drives the live thing, walks the citations, dry-runs the email) and grades it against a weighted rubric you wrote.

The roles communicate through JSON files on disk. Every event appends to progress.jsonl. When the generator gets stuck hill-climbing on one criterion, the runtime archives the attempt and pivots to a fresh approach. When a new model lands, you re-assess and strip the scaffolding that's no longer load-bearing.

Install

This is a Claude Code plugin. Drop it into a marketplace or install locally:

git clone [email protected]:kamilseghrouchni/harness.git ~/.claude/plugins/harness

Restart Claude Code and the eight skills under skills/ become invokable.

First run

/instantiate-harness

Walks an AskUserQuestion wizard:

Domain (code · research · writing · outreach · data · design · other)
The user prompt that triggers a run
The verification action the evaluator will perform
Top 3–5 weighted criteria
Model per role, session strategy, budget cap, pivot policy

Writes .harness/config.json and four prompt files. Drop 3–5 calibration examples into .harness/calibration/, then:

/run-harness

How it runs

Claude Code is the orchestrator. When you type /run-harness, the skill dispatches each role (planner, generator, evaluator) as a fresh subagent via the Task tool — each gets its own context window. State lives on disk under .harness/state/. The Python under runtime/cli/ is utility-only (validate JSON, append to JSONL, decide pivots, archive stuck attempts) — called from the skill via Bash, not the other way around.

A headless runtime/runner.py exists for automation (cron, CI), but the primary path is Claude Code.

What's in this repo

Path	Purpose
`.claude-plugin/plugin.json`	Plugin manifest
`CLAUDE.md`	Auto-loaded principles digest
`docs/principles.html`	Full architecture spec — the source of truth
`skills/`	Eight skills (instantiate · run · 4× prompt-writers · tune · re-assess)
`templates/`	Files copied into `.harness/` on instantiation
`runtime/cli/`	Bash-callable utilities the `run-harness` skill invokes
`runtime/schemas/`	JSON Schemas for plan, contract, verdict, handoff, progress
`runtime/runner.py`	Optional headless orchestrator (non-interactive use)
`examples/outreach-batch/`	Worked instantiation

Principles in one paragraph

Split work across three roles, each with its own context window. Use the file system as state, JSON as the message format. Make the evaluator skeptical and calibrated, not just told to be harsh. Have generator and evaluator negotiate a contract before any work starts. Verify by operating the artifact, not by reading a description of it. Allow throwing the work away when stuck. Read the traces — that is the tuning loop. Re-examine the harness every model release and strip what no longer earns its cost.

See docs/principles.html for the full sixteen.

License

MIT.

harness

Popularity

What's Inside

Confidence

README

harness

What it does

Install

First run

How it runs

What's in this repo

Principles in one paragraph

License

Similar Plugins

claude-mem

antigravity-bundle-web-designer

caveman

frontend-design

ui-design

marketing-skills

harness

What it does

Install

First run

How it runs

What's in this repo

Principles in one paragraph

License

Popularity

Health & Quality

Similar Plugins

claude-mem

antigravity-bundle-web-designer

caveman

frontend-design

ui-design

marketing-skills