
Available In

harness-engineering

Name: harness-engineering
Author: emingenc


Dual-track workflow plugin for Claude Code: Track 1 (surgical fixes) and Track 2 (spec-driven features). Enforces TDD, context budgets, PTC scripts, and micro-task decomposition via the MACHINE framework.

npx claudepluginhub emingenc/harness-engineering --plugin harness-engineering


What's Inside

Slash Commands (11)

/auto — Auto-Execute Loop

/auto

Auto-execute Track 2 tasks, pausing only at human-in-the-loop (HIL) checkpoints

/dashboard — Task Progress Dashboard

/dashboard

Display task progress dashboard with dependency graph and metrics

/execute — Track 2 Execution

/execute

Track 2 Phase 4: Execute one micro-task from tasks.json with TDD

/fix — Track 1 Surgical Fix

/fix

Track 1: Apply a surgical fix to the codebase

/handoff — Context Transfer

/handoff

Write a structured handoff document for context transfer between sessions
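/execute and /auto both pull work from tasks.json, and /dashboard draws a dependency graph, but the file's schema isn't shown on this page. As a minimal sketch, assuming a hypothetical layout where each entry has an `id`, a `status`, and a `depends_on` list, picking the next micro-task could look like this:

```python
import json
from typing import Optional

# Hypothetical tasks.json shape; the plugin's real schema may differ.
SAMPLE = {
    "tasks": [
        {"id": "T1", "status": "done", "depends_on": []},
        {"id": "T2", "status": "pending", "depends_on": ["T1"]},
        {"id": "T3", "status": "pending", "depends_on": ["T2"]},
    ]
}


def load_tasks(path: str = "tasks.json") -> dict:
    """Read the task file from disk."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def next_ready_task(tasks_doc: dict) -> Optional[dict]:
    """Return the first pending task whose dependencies are all done, or None."""
    tasks = tasks_doc.get("tasks", [])
    done = {t["id"] for t in tasks if t.get("status") == "done"}
    for task in tasks:
        if task.get("status") != "pending":
            continue
        if all(dep in done for dep in task.get("depends_on", [])):
            return task
    return None  # everything is done, or the remaining tasks are blocked


print(next_ready_task(SAMPLE)["id"])  # T2
```

Picking the first ready task in file order keeps execution deterministic; a real scheduler might instead prioritize by size or by how many other tasks each one unblocks.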

Skills (7)

executor

/executor

Execute one micro-task from tasks.json using TDD. Use when user says "execute", "run next task", "implement next", "continue execution", or runs /execute. This is the fourth phase of Track 2. Enforces TDD gate: failing tests FIRST, then implementation.

planner

/planner

Generate design documents for Track 2 features. Use when user says "plan this", "design this feature", "create a plan", "write a design doc", or after research is complete and the user wants to move to planning. This is the second phase of Track 2.

prompt-enhancer

/prompt-enhancer

Improve and enhance prompts for LLM interactions. Use when user says "improve this prompt", "make this prompt better", "enhance prompt", "review my prompt", or shares a prompt and asks for feedback.

researcher

/researcher

Conduct codebase research using parallel sub-agents and PTC scripts. Use when user says "research", "investigate", "explore", "understand how", "find out about", "analyze the codebase", or before planning a feature. This is the first phase of Track 2.

skill-factory

/skill-factory

Create new Claude Code skills from scratch. Use when the user says "new skill", "create a skill", "make a skill for X", "turn this into a skill", "automate this pattern", or "scaffold a skill".
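The executor's TDD gate (failing tests first, then implementation) boils down to two checks on the test suite's exit status. The sketch below is an illustration, not the plugin's code; it assumes pytest as the runner, and `tests_pass`, `tdd_gate`, and `DEFAULT_CMD` are hypothetical names.

```python
import subprocess
import sys
from typing import Sequence

# Assumed default: run pytest through the current interpreter.
DEFAULT_CMD = [sys.executable, "-m", "pytest", "-q"]


def tests_pass(cmd: Sequence[str] = DEFAULT_CMD) -> bool:
    """Run the test command; exit code 0 means the suite passed."""
    result = subprocess.run(list(cmd), capture_output=True, text=True)
    return result.returncode == 0


def tdd_gate(phase: str, cmd: Sequence[str] = DEFAULT_CMD) -> None:
    """Raise if the suite is in the wrong state for the given TDD phase.

    "red":   new tests were just written, so the suite must FAIL.
    "green": implementation was just written, so the suite must PASS.
    """
    if phase not in ("red", "green"):
        raise ValueError(f"unknown phase: {phase!r}")
    passed = tests_pass(cmd)
    if phase == "red" and passed:
        raise RuntimeError("TDD gate: tests already pass; write a failing test first")
    if phase == "green" and not passed:
        raise RuntimeError("TDD gate: tests still fail after implementation")
```

Call `tdd_gate("red")` after writing the new tests and `tdd_gate("green")` after the implementation. Treating every nonzero exit as "failing" is a simplification: pytest returns exit code 5 when no tests were collected, which a stricter gate should reject rather than accept as a valid red phase.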

Hooks (1)

Event Hooks

6 hooks across 4 events
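The page doesn't say which 6 hooks ship with the plugin. For orientation: Claude Code command hooks receive a JSON payload on stdin (with fields such as `hook_event_name`, `tool_name`, and `tool_input`), and a `PreToolUse` hook that exits with code 2 blocks the tool call and feeds its stderr back to Claude. The sketch below is a hypothetical hook that could back the TDD gate; the marker file, the test-file heuristic, and all function names are assumptions.

```python
import json
import sys
from pathlib import Path
from typing import Tuple

# Hypothetical marker a "red" TDD step could create once a failing test exists.
RED_MARKER = Path(".harness/red_confirmed")
EDIT_TOOLS = {"Write", "Edit", "MultiEdit"}


def is_test_file(path: str) -> bool:
    """Rough heuristic: pytest-style file names or anything under a tests/ dir."""
    p = Path(path)
    return p.name.startswith("test_") or p.name.endswith("_test.py") or "tests" in p.parts


def check(payload: dict, marker: Path = RED_MARKER) -> Tuple[int, str]:
    """Return (exit_code, message); exit code 2 asks Claude Code to block the call."""
    if payload.get("hook_event_name") != "PreToolUse":
        return 0, ""
    if payload.get("tool_name") not in EDIT_TOOLS:
        return 0, ""
    path = payload.get("tool_input", {}).get("file_path", "")
    if is_test_file(path) or marker.exists():
        return 0, ""
    return 2, f"Blocked edit to {path}: write a failing test first (TDD gate)."


def main() -> int:
    """Hook entry point: read the payload from stdin, report on stderr, return the exit code."""
    code, message = check(json.load(sys.stdin))
    if message:
        print(message, file=sys.stderr)
    return code
```

A hook script would end with `raise SystemExit(main())` and be registered under the `PreToolUse` event in the plugin's hook configuration. Keeping the decision in a pure `check` function makes it testable without piping JSON through stdin.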

Stats

Version: 0.1.0

Language: Python

Stars: 0

Forks: 1

Maintenance: Excellent

License: MIT

Last Commit: Apr 3, 2026

Added: Mar 22, 2026


README

Harness Engineering

A Claude Code plugin that keeps AI output quality high by keeping context clean.

Stop one-shotting entire apps. Start engineering the harness.

The Problem

You've seen it happen. Claude starts strong — clean code, sharp reasoning — then 40 minutes in, it loses the thread. Repeats itself. Forgets decisions it made 10 messages ago. Hallucinates file states. The code quality drops off a cliff.

This isn't a model failure. It's context rot.

┌─────────────────────────┐
│                         │
│    Context Rot Zone     │  ← Quality degrades here.
│    ···················  │    The model is "drunk"
│    ···················  │    on its own noise.
│                         │
├ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┤  ← ~50% utilization threshold
│                         │
│                         │
│     Quality Zone        │  ← Sharp, coherent output.
│                         │    This is where you want
│                         │    to stay.
│                         │
└─────────────────────────┘
       Context Window

Research from both Anthropic and OpenAI confirms it: past ~40-50% context utilization, model performance degrades. The bigger the task, the faster you hit the rot zone. That's why "just asking Claude to build the whole thing" doesn't scale.
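The threshold idea is simple arithmetic: utilization is tokens used divided by window size. A minimal sketch, taking the README's ~50% rule of thumb as the default threshold (the function names and the 200k-token window in the example are illustrative assumptions):

```python
def context_utilization(used_tokens: int, window_tokens: int) -> float:
    """Fraction of the context window currently occupied."""
    if window_tokens <= 0:
        raise ValueError("window_tokens must be positive")
    return used_tokens / window_tokens


def context_zone(used_tokens: int, window_tokens: int, threshold: float = 0.5) -> str:
    """'quality' at or below the threshold, 'rot' above it."""
    return "rot" if context_utilization(used_tokens, window_tokens) > threshold else "quality"


def headroom(used_tokens: int, window_tokens: int, threshold: float = 0.5) -> int:
    """Tokens left before crossing the threshold (0 if already past it)."""
    return max(0, int(threshold * window_tokens) - used_tokens)


# Example numbers only: a 200k-token window.
print(context_zone(80_000, 200_000), headroom(80_000, 200_000))  # quality 20000
```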

Why This Exists

The LLM is a brain. Its "IQ" — the quality of its output — depends entirely on what's in its context window.

%%{init: {"theme": "base", "themeVariables": {"primaryTextColor": "#111827", "clusterTextColor": "#111827", "clusterBkg": "#f8fafc", "clusterBorder": "#e2e8f0", "lineColor": "#94a3b8", "fontFamily": "sans-serif"}}}%%
graph TD
    classDef default fill:#ffffff,stroke:#94a3b8,stroke-width:2px,color:#0f172a,rx:4,ry:4;
    subgraph "LLM — The Brain"
        IQ["Output Quality<br/><i>what you actually get</i>"]
    end

    W["1. Weights<br/><small>Training data — fixed</small>"] --> IQ
    P["2. Prompt & History<br/><small>Your instructions + conversation</small>"] --> IQ
    D["3. Dynamic Sources<br/><small>RAG, MCP servers, tools, files</small>"] --> IQ

    style IQ fill:#2d7d46,stroke:#1a5c30,color:#fff
    style W fill:#4a4a4a,stroke:#333,color:#fff
    style P fill:#2563eb,stroke:#1e4fba,color:#fff
    style D fill:#7c3aed,stroke:#5b21b6,color:#fff

You can't change the weights. But you can engineer what goes into the prompt, history, and dynamic context. That's what this plugin does.

All we're trying to do is optimize context to maximize output quality.
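Only the second and third inputs are under your control. One way to make "optimize context" concrete is to pack them under a fixed token budget. The sketch below assumes a priority order (instructions, then the newest history, then dynamic sources) and approximates tokens by counting words; both are assumptions for illustration, not how the plugin does it.

```python
from typing import List


def approx_tokens(text: str) -> int:
    # Crude stand-in for a real tokenizer: one token per whitespace-separated word.
    return len(text.split())


def build_context(instructions: str, history: List[str], sources: List[str], budget: int) -> List[str]:
    """Pack instructions, then the most recent history, then dynamic sources, within budget."""
    used = approx_tokens(instructions)
    if used > budget:
        raise ValueError("instructions alone exceed the budget")
    parts = [instructions]

    kept_history: List[str] = []
    for msg in reversed(history):  # newest first; stop at the first message that won't fit
        cost = approx_tokens(msg)
        if used + cost > budget:
            break
        kept_history.append(msg)
        used += cost
    parts.extend(reversed(kept_history))  # restore chronological order

    for src in sources:  # add whichever dynamic sources still fit
        cost = approx_tokens(src)
        if used + cost <= budget:
            parts.append(src)
            used += cost
    return parts
```

Dropping the oldest history first matches the intuition that recent turns matter most; a production version would use the model's real tokenizer and would probably summarize dropped history instead of discarding it.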

The Three Disciplines

This isn't a new idea — it's the natural evolution of how we work with LLMs:

%%{init: {"theme": "base", "themeVariables": {"primaryTextColor": "#111827", "clusterTextColor": "#111827", "clusterBkg": "#f8fafc", "clusterBorder": "#e2e8f0", "lineColor": "#94a3b8", "fontFamily": "sans-serif"}}}%%
graph LR
    classDef default fill:#ffffff,stroke:#94a3b8,stroke-width:2px,color:#0f172a,rx:4,ry:4;
    subgraph HE["Harness Engineering"]
        subgraph CE["Context Engineering"]
            subgraph PE["Prompt Engineering"]
                pe_desc["Craft better prompts<br/><small>roles, examples, formatting</small>"]
            end
            ce_desc["Manage what enters the<br/>context window<br/><small>RAG, tools, trimming, MCP</small>"]
        end
        he_desc["Orchestrate the full<br/>development lifecycle<br/><small>tasks, TDD, state, hooks</small>"]
    end

    style PE fill:#dbeafe,stroke:#2563eb,color:#1e3a5f
    style CE fill:#ede9fe,stroke:#7c3aed,color:#3b1d6e
    style HE fill:#fef3c7,stroke:#d97706,color:#78350f

Discipline	What it optimizes	Example
Prompt Engineering	The instruction itself	"You are a senior engineer. Write tests first."
Context Engineering	What's in the window	PTC scripts return 50 tokens instead of 2000. Sub-agents get fresh context.
Harness Engineering	The entire workflow	Track routing, TDD gates, micro-task decomposition, state recovery across sessions.

Each layer contains the previous. Prompt engineering alone can't save you from context rot. Context engineering alone can't enforce TDD. You need the full harness.
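The table's Context Engineering example (a PTC script returning 50 tokens instead of 2000) is about condensing raw tool output before it reaches the model. The README doesn't show a PTC script, so here is a hypothetical illustration of the idea: reduce a verbose test log to one summary line. It assumes result lines start with `PASSED ` or `FAILED ` followed by a test id, roughly what `pytest -rA` prints in its short summary.

```python
from typing import List


def summarize_test_log(log: str, max_failures: int = 3) -> str:
    """Condense a verbose test log into one short line for the model's context."""
    passed = 0
    failed: List[str] = []
    for line in log.splitlines():
        if line.startswith("PASSED "):
            passed += 1
        elif line.startswith("FAILED "):
            parts = line.split()
            failed.append(parts[1] if len(parts) > 1 else "?")  # keep only the test id
    summary = f"{passed} passed, {len(failed)} failed"
    if failed:
        shown = ", ".join(failed[:max_failures])
        more = len(failed) - max_failures
        summary += f": {shown}" + (f" (+{more} more)" if more > 0 else "")
    return summary
```

The model sees the counts and the failing test ids, which is usually enough to decide the next step; the full log stays on disk if it needs to dig deeper.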

Think of It Like Water Bottles

%%{init: {"theme": "base", "themeVariables": {"primaryTextColor": "#111827", "clusterTextColor": "#111827", "clusterBkg": "#f8fafc", "clusterBorder": "#e2e8f0", "lineColor": "#94a3b8", "fontFamily": "sans-serif"}}}%%
graph TB
    classDef default fill:#ffffff,stroke:#94a3b8,stroke-width:2px,color:#0f172a,rx:4,ry:4;
    subgraph PE_COL["Prompt Engineering"]
        direction TB
        POUR["Pour water<br/><small>craft tokens</small>"]
        BOTTLE1["🫙 One bottle"]
    end


Similar Plugins

claude-pilot

claude-harness

conductor

flow

tandemkit

claudekit
