By yannabadie
Scientific harness optimizer for Claude Code. Proposes controlled candidates, evaluates with evidence, tracks a Pareto frontier.
Extract and structure project context from CLAUDE.md, memory, git history, docs, and installed plugins for harness optimization.
Evaluate harness candidates using deterministic checks and LLM judgment. Reads ONLY disk artifacts — never the proposer's reasoning.
Propose safe, testable improvements to repo-local Claude Code harness assets by inspecting full run history, scores, traces, and regressions.
Analyze regressions across harness candidates using scores, traces, and diffs. Focus on causal explanations and safer next steps.
Analyze the current project and generate initial eval tasks for harness optimization. Creates regression and capability eval tasks based on project structure.
Full Meta-Harness status view — Pareto frontier, recent runs, regressions, eval health, installed plugins.
Run the evaluation suite on the current harness or a specific candidate run. Reports deterministic check results and LLM-judge assessment.
Evolve repo-local Claude Code harness assets through a 5-phase pipeline — harvest context, propose candidate, evaluate with evidence, audit regressions, report results.
Summarize the current Meta-Harness-style frontier of harness candidates, including quality, cost, latency, and safety notes.
Admin access level
Server config contains admin-level keywords
Modifies files
Hook triggers on file write and edit operations
Uses power tools
Uses Bash, Write, or Edit tools
Don't guess. Evolve. Prove.
You've spent hours tweaking your CLAUDE.md, writing custom skills, adjusting agent prompts. But you have no idea if any of it actually helped. Did that new rule reduce errors? Did that prompt rewrite cost more tokens? Did the last edit break something that used to work?
Every other approach to harness optimization is guesswork:
Meta-Harness turns harness engineering into a scientific process:
/mh:evolve "reduce tool thrashing on refactoring tasks"
Every improvement has a measured before/after delta. Every regression has a diagnosis. Nothing is lost.
/mh:eval
Runs 9 deterministic checks against your current harness. Shows exactly what's valid, what's broken, and what's untested. Then:
/mh:evolve "simplify CLAUDE.md — remove instructions Claude follows without being told"
The proposer reads your CLAUDE.md, compares against actual Claude behavior, and suggests specific deletions with predicted token savings.
/mh:evolve "add scope constraints to prevent application code edits"
The proposer creates a .claude/rules/ file with path-scoped constraints. The evaluator checks that the files_in_scope guard passes. If promoted, the change is tracked with a reversible patch.
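A scope guard of this kind can be pictured as a glob check over the files a candidate touched. The sketch below is an assumption about how such a check might look, not the plugin's actual files_in_scope implementation; the function name `out_of_scope_files` and the allowed patterns are hypothetical.

```python
from fnmatch import fnmatchcase


def out_of_scope_files(changed_files: list[str], allowed_patterns: list[str]) -> list[str]:
    """Return the changed files that match none of the allowed glob patterns."""
    return [
        path
        for path in changed_files
        if not any(fnmatchcase(path, pattern) for pattern in allowed_patterns)
    ]


# Hypothetical scope: only harness assets may be edited, never application code.
allowed = ["CLAUDE.md", ".claude/*"]
changed = ["CLAUDE.md", ".claude/rules/scope.md", "src/app.py"]

violations = out_of_scope_files(changed, allowed)
guard_passes = not violations  # the guard passes only when nothing is out of scope
print(violations, guard_passes)
```

Note that `fnmatchcase` lets `*` match across `/`, so `.claude/*` covers nested rule files as well.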
/mh:regressions
Shows which run caused the score drop, compares the patch diff against the frontier leader, and identifies confounds ("prompt rewrite and stop condition changed simultaneously — test them in isolation").
/mh:rollback run-0011
Reverse-applies the patch with a safety git tag. One command, no risk.
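For readers who want to see what a tagged reverse-apply involves, here is a minimal sketch using plain git commands. It is not the plugin's rollback code: the tag naming scheme and the patch path are hypothetical, and the real command may do more than this.

```python
import subprocess


def rollback_commands(run_id: str, patch_path: str) -> list[list[str]]:
    """Build the git commands for a tagged reverse-apply of one patch."""
    tag = f"pre-rollback-{run_id}"  # hypothetical tag name
    return [
        ["git", "tag", tag],                            # mark the current HEAD first
        ["git", "apply", "-R", "--check", patch_path],  # verify the reverse patch applies cleanly
        ["git", "apply", "-R", patch_path],             # reverse-apply the patch
    ]


def rollback(run_id: str, patch_path: str, dry_run: bool = True) -> None:
    """Print the commands, or run them and stop at the first failure."""
    for cmd in rollback_commands(run_id, patch_path):
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)


# Dry run only; the patch path is a placeholder, not a path the plugin is known to use.
rollback("run-0011", "run-0011.patch")
```

Tagging before touching the working tree means the pre-rollback state stays reachable even if the reverse-apply turns out to be the wrong call.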
/mh:bootstrap
Analyzes your project — CLAUDE.md, rules, skills, agents, git history, installed plugins — and generates initial eval tasks. Creates both regression tests (things that should always work) and capability tests (things you want to improve).
/mh:dashboard
Scans all installed Claude Code plugins, maps their skill/agent/hook surfaces, shows your Pareto frontier, eval health, and active regressions in one view.
Run /mh:evolve repeatedly. Each run is recorded on the Pareto frontier with full metrics:
◆ FRONTIER ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
| Run | Score | Latency | Tokens | Risk |
|-----------|-------|---------|--------|------|
| run-0012 | 0.82 | 7340ms | 10.9K | low |
| run-0009 | 0.76 | 7800ms | 12.1K | low |
| run-0006 | 0.95 | 5200ms | 8.5K | low |
Non-dominated: 3 | Total runs: 12
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Only non-dominated candidates stay on the frontier. You always know the best trade-offs.
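Non-dominance can be made concrete with a short sketch. It assumes score is maximized while latency and tokens are minimized, and it ignores the risk column; the `Run` type and the three runs are hypothetical and invented for illustration, not taken from the plugin or from the table above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Run:
    name: str
    score: float       # higher is better
    latency_ms: float  # lower is better
    tokens: float      # lower is better


def dominates(a: Run, b: Run) -> bool:
    """True if a is no worse than b on every metric and strictly better on at least one."""
    no_worse = a.score >= b.score and a.latency_ms <= b.latency_ms and a.tokens <= b.tokens
    strictly_better = a.score > b.score or a.latency_ms < b.latency_ms or a.tokens < b.tokens
    return no_worse and strictly_better


def pareto_frontier(runs: list[Run]) -> list[Run]:
    """Keep only the runs that no other run dominates."""
    return [r for r in runs if not any(dominates(o, r) for o in runs if o is not r)]


# Hypothetical runs, invented for this example.
runs = [
    Run("run-a", score=0.90, latency_ms=9000, tokens=14000),  # best score, most expensive
    Run("run-b", score=0.80, latency_ms=6000, tokens=9000),   # cheaper, lower score
    Run("run-c", score=0.75, latency_ms=7000, tokens=10000),  # worse than run-b on every metric
]
frontier = pareto_frontier(runs)
print([r.name for r in frontier])  # → ['run-a', 'run-b']
```

Here run-a and run-b are both kept because each wins on a different metric, while run-c drops off because run-b beats it everywhere.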
# 1. Clone
git clone https://github.com/yannabadie/Meta-Harness-YGN.git
pip install "mcp>=1.12" # optional — MCP server only
# 2. Load
claude --plugin-dir ./Meta-Harness-YGN
# 3. Go
/mh:bootstrap # generate eval tasks for your project
/mh:evolve "improve validation" # propose a measured improvement
/mh:dashboard # see the full picture
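When you run /mh:evolve, five phases execute in sequence: harvest context, propose a candidate, evaluate with evidence, audit regressions, and report results.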
npx claudepluginhub yannabadie/meta-harness-ygn