mh

Name: mh
Author: yannabadie

Scientific harness optimizer for Claude Code. Proposes controlled candidates, evaluates with evidence, tracks a Pareto frontier.

npx claudepluginhub yannabadie/meta-harness-ygn

Popularity

Stars

Med: 0·Avg: 285

Installs

Med: 0·Avg: 1

What's Inside

Agents (4)

context-harvester

/context-harvester

Extract and structure project context from CLAUDE.md, memory, git history, docs, and installed plugins for harness optimization.

harness-evaluator

/harness-evaluator

Evaluate harness candidates using deterministic checks and LLM judgment. Reads ONLY disk artifacts — never the proposer's reasoning.

harness-proposer

/harness-proposer

Propose safe, testable improvements to repo-local Claude Code harness assets by inspecting full run history, scores, traces, and regressions.

regression-auditor

/regression-auditor

Analyze regressions across harness candidates using scores, traces, and diffs. Focus on causal explanations and safer next steps.

Skills (6)

bootstrap

/harness-bootstrap

Analyze the current project and generate initial eval tasks for harness optimization. Creates regression and capability eval tasks based on project structure.

dashboard

/harness-dashboard

Full Meta-Harness status view — Pareto frontier, recent runs, regressions, eval health, installed plugins.

eval

/harness-eval

Run the evaluation suite on the current harness or a specific candidate run. Reports deterministic check results and LLM-judge assessment.

evolve

/harness-evolve

Evolve repo-local Claude Code harness assets through a 5-phase pipeline — harvest context, propose candidate, evaluate with evidence, audit regressions, report results.

frontier

/harness-frontier

Summarize the current Meta-Harness-style frontier of harness candidates, including quality, cost, latency, and safety notes.

Hooks (1)

Event Hooks

File writes

7 hooks across 6 events

MCP Servers (1)

mh-server

admin

Stats

Version: 1.3.0

Released: Apr 8, 2026

Language: Python

Stars: 0

Maintenance: Excellent

License: MIT

Last Commit: Apr 8, 2026

Added: Apr 7, 2026

Safety Signals

Critical

Admin access level

Server config contains admin-level keywords

Caution

Modifies files

Hook triggers on file write and edit operations

README

⚗ Meta-Harness-YGN

Don't guess. Evolve. Prove.

The Problem

You've spent hours tweaking your CLAUDE.md, writing custom skills, adjusting agent prompts. But you have no idea if any of it actually helped. Did that new rule reduce errors? Did that prompt rewrite cost more tokens? Did the last edit break something that used to work?

Every other approach to harness optimization is guesswork:

- Edit, hope, repeat: no measurement, no history, no rollback
- Copy someone else's CLAUDE.md: their project isn't yours
- Add more instructions: research shows this often makes things worse (ETH Zurich: LLM-generated context files degrade performance by 3%)

The Solution

Meta-Harness turns harness engineering into a scientific process:

1. You describe what to improve: /mh:evolve "reduce tool thrashing on refactoring tasks"
2. The plugin proposes a controlled change: one hypothesis, one patch, with predicted impact and risk assessment
3. It evaluates the change with evidence: 9 deterministic checks, not vibes
4. It tracks everything on a Pareto frontier: score vs. latency vs. token cost, so you see trade-offs
5. If something regresses, it explains why: causal analysis, not just "score went down"

Every improvement has a measured before/after delta. Every regression has a diagnosis. Nothing is lost.
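To make "one hypothesis, one patch" concrete, here is a minimal sketch of what a candidate record and its before/after delta might look like. The `Candidate` class, its field names, the patch path, and the scores are illustrative assumptions, not the plugin's actual schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """Hypothetical record: a single hypothesis paired with a single patch."""
    run_id: str
    hypothesis: str
    patch_path: str        # assumed location of the reversible patch
    predicted_impact: str
    risk: str              # e.g. "low", "medium", "high"


def score_delta(before: float, after: float) -> float:
    """Measured before/after delta; positive means the score improved."""
    return round(after - before, 4)


candidate = Candidate(
    run_id="run-0012",
    hypothesis="Scoping rules reduce tool thrashing on refactoring tasks",
    patch_path="runs/run-0012/candidate.patch",  # hypothetical path
    predicted_impact="fewer redundant tool calls",
    risk="low",
)
print(candidate.run_id, score_delta(before=0.76, after=0.82))
```

Keeping the record immutable (`frozen=True`) mirrors the idea that a recorded run is evidence and should not be edited after the fact.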

What Can You Do With It?

"My CLAUDE.md is 300 lines and I don't know what's helping"

/mh:eval

Runs 9 deterministic checks against your current harness. Shows exactly what's valid, what's broken, and what's untested. Then:

/mh:evolve "simplify CLAUDE.md — remove instructions Claude follows without being told"

The proposer reads your CLAUDE.md, compares against actual Claude behavior, and suggests specific deletions with predicted token savings.
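The listing does not enumerate the 9 deterministic checks, so the following is only a sketch of how a check runner could report "valid", "broken", and "untested". All three check functions are hypothetical stand-ins chosen for illustration.

```python
from typing import Callable, Optional

# Each check returns True (valid), False (broken), or None (untested).
Check = Callable[[str], Optional[bool]]


def has_content(text: str) -> Optional[bool]:
    """Hypothetical check: the harness file is not empty."""
    return bool(text.strip())


def within_line_budget(text: str, max_lines: int = 300) -> Optional[bool]:
    """Hypothetical check: the file stays under a line budget."""
    return len(text.splitlines()) <= max_lines


def references_resolve(text: str) -> Optional[bool]:
    """Hypothetical check that this sketch cannot run, so it reports untested."""
    return None


def run_checks(text: str, checks: dict[str, Check]) -> dict[str, str]:
    """Run every check and map its result to a status label."""
    labels = {True: "valid", False: "broken", None: "untested"}
    return {name: labels[check(text)] for name, check in checks.items()}


CHECKS: dict[str, Check] = {
    "has_content": has_content,
    "within_line_budget": within_line_budget,
    "references_resolve": references_resolve,
}

report = run_checks("# Project rules\n- run tests before committing\n", CHECKS)
print(report)
```

Because each check is a pure function of the file text, the same input always yields the same report, which is what makes the result deterministic rather than a judgment call.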

"Claude keeps editing files it shouldn't touch"

/mh:evolve "add scope constraints to prevent application code edits"

The proposer creates a .claude/rules/ file with path-scoped constraints. The evaluator checks that the files_in_scope guard passes. If promoted, the change is tracked with a reversible patch.
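A scope guard of this kind can be approximated with glob matching. The sketch below is an assumption about how a `files_in_scope` style check might behave, not the plugin's implementation; the function name `out_of_scope_files` and the allowed patterns are hypothetical.

```python
from fnmatch import fnmatchcase


def out_of_scope_files(changed_files: list[str], allowed_patterns: list[str]) -> list[str]:
    """Return the changed files that match none of the allowed glob patterns."""
    return [
        path
        for path in changed_files
        if not any(fnmatchcase(path, pattern) for pattern in allowed_patterns)
    ]


# Assumed policy: a candidate may touch harness assets only.
ALLOWED = ["CLAUDE.md", ".claude/*"]

violations = out_of_scope_files(
    ["CLAUDE.md", ".claude/rules/scope.md", "src/app.py"],
    ALLOWED,
)
print("guard passes" if not violations else f"out of scope: {violations}")
```

In `fnmatchcase`, `*` also matches path separators, so `.claude/*` covers nested files such as `.claude/rules/scope.md`.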

"Someone changed the prompts and now everything is worse"

/mh:regressions

Shows which run caused the score drop, compares the patch diff against the frontier leader, and identifies confounds ("prompt rewrite and stop condition changed simultaneously — test them in isolation").
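As a rough illustration of the first two steps, the sketch below finds the run with the largest score drop relative to its predecessor and flags a patch as confounded when it changes more than one component at once. The function names, the score history, and the component labels are hypothetical.

```python
from typing import Optional


def find_regression(history: list[tuple[str, float]]) -> Optional[tuple[str, float]]:
    """Return (run_id, drop) for the largest score drop between consecutive runs."""
    worst_run, worst_drop = None, 0.0
    for (_, prev_score), (run_id, score) in zip(history, history[1:]):
        drop = prev_score - score
        if drop > worst_drop:
            worst_run, worst_drop = run_id, drop
    if worst_run is None:
        return None  # scores never decreased
    return worst_run, round(worst_drop, 4)


def is_confounded(changed_components: set[str]) -> bool:
    """More than one simultaneous change means the cause cannot be isolated."""
    return len(changed_components) > 1


# Hypothetical score history, oldest run first.
history = [("run-0009", 0.76), ("run-0010", 0.80), ("run-0011", 0.61)]
print(find_regression(history))
print(is_confounded({"prompt rewrite", "stop condition"}))
```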

/mh:rollback run-0011

Reverse-applies the patch with a safety git tag. One command, no risk.
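The rollback can be pictured as two ordinary git operations: tag the current state, then reverse-apply the run's patch. The sketch below assumes that ordering, a hypothetical tag naming scheme, and a hypothetical patch path; the plugin's real procedure may differ.

```python
import subprocess


def rollback_commands(run_id: str, patch_path: str) -> list[list[str]]:
    """Build the git commands for a tagged rollback (assumed order: tag, then revert)."""
    return [
        ["git", "tag", f"pre-rollback-{run_id}"],    # hypothetical tag name
        ["git", "apply", "--reverse", patch_path],   # undo the candidate's patch
    ]


def rollback(run_id: str, patch_path: str) -> None:
    """Execute the commands in the current repository, stopping on the first failure."""
    for command in rollback_commands(run_id, patch_path):
        subprocess.run(command, check=True)


# Inspect the commands without touching a repository.
for command in rollback_commands("run-0011", "runs/run-0011/candidate.patch"):
    print(" ".join(command))
```

Separating command construction from execution makes the plan easy to review or dry-run before anything in the working tree changes.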

"I want to optimize but I don't know where to start"

/mh:bootstrap

Analyzes your project — CLAUDE.md, rules, skills, agents, git history, installed plugins — and generates initial eval tasks. Creates both regression tests (things that should always work) and capability tests (things you want to improve).

"I have 8 plugins installed but no idea how they interact"

/mh:dashboard

Scans all installed Claude Code plugins, maps their skill/agent/hook surfaces, shows your Pareto frontier, eval health, and active regressions in one view.

"I want to know if my harness is actually getting better over time"

Run /mh:evolve repeatedly. Each run is recorded with full metrics and compared against the Pareto frontier:

◆ FRONTIER ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
| Run       | Score | Latency | Tokens | Risk |
|-----------|-------|---------|--------|------|
| run-0012  | 0.82  | 7340ms  | 10.9K  | low  |
| run-0009  | 0.76  | 7800ms  | 12.1K  | low  |
| run-0006  | 0.95  | 5200ms  | 8.5K   | low  |

Non-dominated: 3 | Total runs: 12
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

Only non-dominated candidates stay on the frontier. You always know the best trade-offs.
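For readers unfamiliar with the term, a candidate is dominated when another candidate is at least as good on every metric and strictly better on at least one. The sketch below shows a standard non-dominated filter over score (higher is better), latency, and tokens (lower is better). The `Run` class and the sample values are hypothetical, and the plugin may weigh additional dimensions such as risk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Run:
    run_id: str
    score: float       # higher is better
    latency_ms: int    # lower is better
    tokens: int        # lower is better


def dominates(a: Run, b: Run) -> bool:
    """True if a is no worse than b on every metric and strictly better on one."""
    no_worse = (
        a.score >= b.score
        and a.latency_ms <= b.latency_ms
        and a.tokens <= b.tokens
    )
    strictly_better = (
        a.score > b.score
        or a.latency_ms < b.latency_ms
        or a.tokens < b.tokens
    )
    return no_worse and strictly_better


def pareto_frontier(runs: list[Run]) -> list[Run]:
    """Keep only the runs that no other run dominates."""
    return [r for r in runs if not any(dominates(other, r) for other in runs)]


# Hypothetical candidates for illustration only.
runs = [
    Run("cand-a", score=0.90, latency_ms=9000, tokens=14000),
    Run("cand-b", score=0.82, latency_ms=7000, tokens=11000),
    Run("cand-c", score=0.70, latency_ms=5000, tokens=8000),
    Run("cand-d", score=0.68, latency_ms=7500, tokens=12000),
]
frontier_ids = [r.run_id for r in pareto_frontier(runs)]
print(frontier_ids)
```

Here `cand-d` drops out because `cand-b` beats it on all three metrics, while the remaining three each win on at least one axis and therefore represent genuine trade-offs.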

Quick Start

# 1. Clone
git clone https://github.com/yannabadie/Meta-Harness-YGN.git
pip install "mcp>=1.12"   # optional — MCP server only

# 2. Load
claude --plugin-dir ./Meta-Harness-YGN

# 3. Go
/mh:bootstrap                    # generate eval tasks for your project
/mh:evolve "improve validation"  # propose a measured improvement
/mh:dashboard                    # see the full picture

How It Works

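When you run /mh:evolve, five phases execute in sequence: harvest context, propose a candidate, evaluate with evidence, audit regressions, and report results.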
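The evolve skill description names the five phases as harvest context, propose candidate, evaluate with evidence, audit regressions, and report results. A sequential pipeline of that shape can be sketched as follows; the handler functions are stubs and the state keys are assumptions made for illustration.

```python
from typing import Callable

Handler = Callable[[dict], dict]

# Phase order taken from the evolve skill description.
PHASES = [
    "harvest_context",
    "propose_candidate",
    "evaluate_with_evidence",
    "audit_regressions",
    "report_results",
]


def run_pipeline(goal: str, handlers: dict[str, Handler]) -> dict:
    """Run each phase in order, passing the accumulated state forward."""
    state: dict = {"goal": goal, "completed": []}
    for phase in PHASES:
        state = handlers[phase](state)
        state["completed"] = state["completed"] + [phase]
    return state


def stub(key: str, value: str) -> Handler:
    """Hypothetical placeholder handler that records one piece of output."""
    def handler(state: dict) -> dict:
        return {**state, key: value}
    return handler


handlers = {
    "harvest_context": stub("context", "CLAUDE.md, git history, docs"),
    "propose_candidate": stub("candidate", "one hypothesis, one patch"),
    "evaluate_with_evidence": stub("evaluation", "deterministic checks"),
    "audit_regressions": stub("audit", "no regressions found"),
    "report_results": stub("report", "summary written"),
}

result = run_pipeline("reduce tool thrashing on refactoring tasks", handlers)
print(result["completed"])
```

Passing state forward explicitly means a later phase sees only what earlier phases wrote down, in the same spirit as an evaluator that reads disk artifacts rather than the proposer's reasoning.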
