By emingenc
Dual-track workflow plugin for Claude Code: Track 1 (surgical fixes) and Track 2 (spec-driven features). Enforces TDD, context budgets, PTC scripts, and micro-task decomposition via the MACHINE framework.
Auto-execute Track 2 tasks with HIL-only pauses
Display task progress dashboard with dependency graph and metrics
Track 2 Phase 4: Execute one micro-task from tasks.json with TDD
Track 1: Apply a surgical fix to the codebase
Write a structured handoff document for context transfer between sessions
Execute one micro-task from tasks.json using TDD. Use when user says "execute", "run next task", "implement next", "continue execution", or runs /execute. This is the fourth phase of Track 2. Enforces TDD gate: failing tests FIRST, then implementation.
Generate design documents for Track 2 features. Use when user says "plan this", "design this feature", "create a plan", "write a design doc", or after research is complete and the user wants to move to planning. This is the second phase of Track 2.
Improve and enhance prompts for LLM interactions. Use when user says "improve this prompt", "make this prompt better", "enhance prompt", "review my prompt", or shares a prompt and asks for feedback.
Conduct codebase research using parallel sub-agents and PTC scripts. Use when user says "research", "investigate", "explore", "understand how", "find out about", "analyze the codebase", or before planning a feature. This is the first phase of Track 2.
Create new Claude Code skills from scratch. Use when the user says "new skill", "create a skill", "make a skill for X", "turn this into a skill", "automate this pattern", or "scaffold a skill".
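The commands above mention picking "one micro-task from tasks.json", a dependency graph, and a TDD gate ("failing tests FIRST, then implementation"). The sketch below shows one way those two ideas could work together. It assumes a hypothetical `tasks.json` schema (`id`, `status`, `depends_on`) and hypothetical helpers `next_task` and `tdd_gate`. It is not the plugin's documented format or code.

```python
# Hypothetical sketch: the tasks.json schema and both helpers are assumptions
# for illustration, not the plugin's actual format or implementation.
import json
from typing import Optional

TASKS_JSON = """
{"tasks": [
  {"id": "T1", "title": "Add parser",    "status": "done",    "depends_on": []},
  {"id": "T2", "title": "Add validator", "status": "pending", "depends_on": ["T1"]},
  {"id": "T3", "title": "Wire CLI",      "status": "pending", "depends_on": ["T2"]}
]}
"""


def next_task(tasks: list) -> Optional[dict]:
    """Return the first pending task whose dependencies are all done."""
    done = {t["id"] for t in tasks if t["status"] == "done"}
    for task in tasks:
        if task["status"] == "pending" and all(d in done for d in task["depends_on"]):
            return task
    return None  # nothing runnable: every pending task is blocked


def tdd_gate(tests_failed_before_impl: bool, tests_pass_after_impl: bool) -> None:
    """Enforce red-then-green: tests must fail first, then pass after implementation."""
    if not tests_failed_before_impl:
        raise RuntimeError("TDD gate: write a failing test before implementing")
    if not tests_pass_after_impl:
        raise RuntimeError("TDD gate: implementation does not pass the tests yet")


tasks = json.loads(TASKS_JSON)["tasks"]
task = next_task(tasks)
print(task["id"] if task else "nothing runnable")  # T2
```

Executing exactly one runnable task per call keeps each session's context small, which is the point of micro-task decomposition.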
A Claude Code plugin that keeps AI output quality high by keeping context clean.
Stop one-shotting entire apps. Start engineering the harness.
You've seen it happen. Claude starts strong — clean code, sharp reasoning — then 40 minutes in, it loses the thread. Repeats itself. Forgets decisions it made 10 messages ago. Hallucinates file states. The code quality drops off a cliff.
This isn't a model failure. It's context rot.
┌─────────────────────────┐
│                         │
│    Context Rot Zone     │  ← Quality degrades here.
│   ···················   │    The model is "drunk"
│   ···················   │    on its own noise.
│                         │
├ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┤  ← ~50% utilization threshold
│                         │
│                         │
│      Quality Zone       │  ← Sharp, coherent output.
│                         │    This is where you want
│                         │    to stay.
│                         │
└─────────────────────────┘
      Context Window
Research from both Anthropic and OpenAI confirms it: past ~40-50% context utilization, model performance degrades. The bigger the task, the faster you hit the rot zone. That's why "just asking Claude to build the whole thing" doesn't scale.
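The threshold idea reduces to simple arithmetic: utilization = tokens used / window size. Below is a minimal sketch of a budget check under stated assumptions. The 0.5 cutoff mirrors the ~50% line in the diagram. Token counts are assumed to come from your own tokenizer or API usage data. `ContextUsage` and `advise` are hypothetical names, not part of the plugin.

```python
# Hypothetical sketch of a context-budget check. The 0.5 threshold mirrors the
# ~50% line in the diagram; token counts are assumed inputs from your tooling.
from dataclasses import dataclass

ROT_THRESHOLD = 0.5  # assumption: ~50% utilization marks the start of the rot zone


@dataclass
class ContextUsage:
    used_tokens: int
    window_tokens: int

    def __post_init__(self) -> None:
        if self.window_tokens <= 0:
            raise ValueError("window_tokens must be positive")

    @property
    def utilization(self) -> float:
        return self.used_tokens / self.window_tokens

    def in_rot_zone(self, threshold: float = ROT_THRESHOLD) -> bool:
        return self.utilization >= threshold


def advise(usage: ContextUsage) -> str:
    """Hypothetical policy: past the threshold, hand off to a fresh session."""
    if usage.in_rot_zone():
        return "hand off: write a handoff doc and start a fresh session"
    return "continue"


usage = ContextUsage(used_tokens=120_000, window_tokens=200_000)
print(f"{usage.utilization:.0%}", advise(usage))  # 60% hand off: ...
```

Handing off early, instead of pushing past the threshold, trades a small summarization cost for keeping the next session in the quality zone.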
The LLM is a brain. Its "IQ" — the quality of its output — depends entirely on what's in its context window.
%%{init: {"theme": "base", "themeVariables": {"primaryTextColor": "#111827", "clusterTextColor": "#111827", "clusterBkg": "#f8fafc", "clusterBorder": "#e2e8f0", "lineColor": "#94a3b8", "fontFamily": "sans-serif"}}}%%
graph TD
classDef default fill:#ffffff,stroke:#94a3b8,stroke-width:2px,color:#0f172a,rx:4,ry:4;
subgraph "LLM — The Brain"
IQ["Output Quality<br/><i>what you actually get</i>"]
end
W["1. Weights<br/><small>Training data — fixed</small>"] --> IQ
P["2. Prompt & History<br/><small>Your instructions + conversation</small>"] --> IQ
D["3. Dynamic Sources<br/><small>RAG, MCP servers, tools, files</small>"] --> IQ
style IQ fill:#2d7d46,stroke:#1a5c30,color:#fff
style W fill:#4a4a4a,stroke:#333,color:#fff
style P fill:#2563eb,stroke:#1e4fba,color:#fff
style D fill:#7c3aed,stroke:#5b21b6,color:#fff
You can't change the weights. But you can engineer what goes into the prompt, history, and dynamic context. That's what this plugin does.
All we're trying to do is optimize context to maximize output quality.
This isn't a new idea — it's the natural evolution of how we work with LLMs:
%%{init: {"theme": "base", "themeVariables": {"primaryTextColor": "#111827", "clusterTextColor": "#111827", "clusterBkg": "#f8fafc", "clusterBorder": "#e2e8f0", "lineColor": "#94a3b8", "fontFamily": "sans-serif"}}}%%
graph LR
classDef default fill:#ffffff,stroke:#94a3b8,stroke-width:2px,color:#0f172a,rx:4,ry:4;
subgraph HE["Harness Engineering"]
subgraph CE["Context Engineering"]
subgraph PE["Prompt Engineering"]
pe_desc["Craft better prompts<br/><small>roles, examples, formatting</small>"]
end
ce_desc["Manage what enters the<br/>context window<br/><small>RAG, tools, trimming, MCP</small>"]
end
he_desc["Orchestrate the full<br/>development lifecycle<br/><small>tasks, TDD, state, hooks</small>"]
end
style PE fill:#dbeafe,stroke:#2563eb,color:#1e3a5f
style CE fill:#ede9fe,stroke:#7c3aed,color:#3b1d6e
style HE fill:#fef3c7,stroke:#d97706,color:#78350f
| Discipline | What it optimizes | Example |
|---|---|---|
| Prompt Engineering | The instruction itself | "You are a senior engineer. Write tests first." |
| Context Engineering | What's in the window | PTC scripts return 50 tokens instead of 2000. Sub-agents get fresh context. |
| Harness Engineering | The entire workflow | Track routing, TDD gates, micro-task decomposition, state recovery across sessions. |
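The Context Engineering row credits PTC scripts with returning ~50 tokens instead of ~2000. The sketch below illustrates that principle only. It condenses raw `path:line:text` search output into a one-line summary before anything enters the context window. `summarize_matches` is a hypothetical helper, not one of the plugin's scripts.

```python
# Hypothetical sketch of the "compact output" principle: condense raw
# grep-style results into a short summary instead of pasting them verbatim.
from collections import Counter


def summarize_matches(raw_lines: list, top_n: int = 3) -> str:
    """Condense 'path:line:text' search output into a one-line summary."""
    files = Counter(line.split(":", 1)[0] for line in raw_lines if ":" in line)
    total = sum(files.values())
    top = ", ".join(f"{path} ({n})" for path, n in files.most_common(top_n))
    return f"{total} matches in {len(files)} files; top: {top or 'none'}"


raw = ["src/a.py:10:foo()", "src/a.py:22:foo()", "src/b.py:3:foo()"]
print(summarize_matches(raw))  # 3 matches in 2 files; top: src/a.py (2), src/b.py (1)
```

The agent can request the full lines for a specific file afterward, so detail is fetched only when it is actually needed.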
Each layer contains the previous. Prompt engineering alone can't save you from context rot. Context engineering alone can't enforce TDD. You need the full harness.
%%{init: {"theme": "base", "themeVariables": {"primaryTextColor": "#111827", "clusterTextColor": "#111827", "clusterBkg": "#f8fafc", "clusterBorder": "#e2e8f0", "lineColor": "#94a3b8", "fontFamily": "sans-serif"}}}%%
graph TB
classDef default fill:#ffffff,stroke:#94a3b8,stroke-width:2px,color:#0f172a,rx:4,ry:4;
subgraph PE_COL["Prompt Engineering"]
direction TB
POUR["Pour water<br/><small>craft tokens</small>"]
BOTTLE1["🫙 One bottle"]
end
npx claudepluginhub emingenc/harness-engineering --plugin harness-engineering