Health check and optimization for existing project Harness setups. Activate when the user mentions "check Harness", "optimize CLAUDE.md", "Agent keeps making mistakes", "Harness health", "audit Harness", "evaluate AI coding environment", "harness audit", "check Agent config", "why won't the Agent follow instructions", "improve Agent effectiveness", "add Harness to existing project", or "legacy optimization". Also use this Skill when the user complains that Agent behavior deviates from expectations, the Agent repeatedly makes the same mistakes, or the project has been around for a while but lacks a systematic Harness framework — diagnose and improve.
Slash command: /harness:audit
This Skill performs a systematic diagnosis of an existing project's Harness framework, identifies weak points, and provides concrete optimization recommendations. Core principle: Build constraints around failure modes you have actually observed, not hypothetical ones.
Use an Explore subagent or directly scan the following files and directories:
```shell
# Check each of the six Harness layers
echo "=== 1. Memory Layer ===" && wc -l CLAUDE.md 2>/dev/null
echo "=== 2. Rules Layer ===" && cat .claude/settings.json 2>/dev/null
echo "=== 3. Skills Layer ===" && ls .claude/skills/ .claude/commands/ 2>/dev/null
echo "=== 4. Agents Layer ===" && ls .claude/agents/ 2>/dev/null
echo "=== 5. Hooks Layer ===" && grep -r "hooks" .claude/settings.json 2>/dev/null
echo "=== 6. Tools Layer ===" && grep -r "mcpServers" .claude/settings.json 2>/dev/null
echo "=== Documentation ===" && ls docs/ 2>/dev/null
echo "=== ADR ===" && ls docs/decisions/ 2>/dev/null
```
Evaluate and score each dimension (0-3) based on the OpenAI Scorecard framework:
| Dimension | Evaluation Question | 0 Points | 1 Point | 2 Points | 3 Points |
|---|---|---|---|---|---|
| Bootstrap | Can the Agent complete first-time setup and self-test without human intervention? | No automation | Partial scripts | Self-test but manual steps required | Fully automated |
| Task Entry | Are entry tasks clear and discoverable? | No navigation | CLAUDE.md has a list | Has Commands | Has Skills + Commands |
| Validation | Can CI/tests automatically validate Agent output? | No tests | Manual testing | CI has tests | CI + Hooks auto-validate |
| Lint Gates | Do format checks run automatically on pre-commit? | No checks | Exists but manual | pre-commit | PostToolUse Hook |
| Repo Map | Does the repo have a clear domain architecture diagram? | No docs | README | architecture.md | architecture.md with dependency rules |
| Structured Docs | Are design docs structured with cross-links? | No docs/ | Exists but scattered | Has structure | Structured + cross-linked |
| Decision Records | Are architecture decisions recorded and maintained as ADRs? | No ADR | Exists but outdated | Exists and updated | Exists with deprecation records |
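
The seven dimension scores add up to the total out of 21 used in the report. A minimal sketch of that arithmetic follows; the function name `harness_score_total` and the example scores are hypothetical, not part of the Skill.

```shell
# Sum seven dimension scores (each 0-3) and print "total / 21".
# Prints an error to stderr and returns 1 if the count or range is wrong.
# NOTE: harness_score_total is an illustrative helper, not a Harness command.
harness_score_total() {
  if [ "$#" -ne 7 ]; then
    echo "expected 7 scores, got $#" >&2
    return 1
  fi
  total=0
  for s in "$@"; do
    case "$s" in
      0|1|2|3) total=$((total + s)) ;;
      *) echo "score out of range: $s" >&2; return 1 ;;
    esac
  done
  echo "$total / 21"
}

# Hypothetical audit, in table order:
# Bootstrap=1 TaskEntry=2 Validation=2 LintGates=1 RepoMap=1 StructuredDocs=0 DecisionRecords=0
harness_score_total 1 2 2 1 1 0 0   # prints "7 / 21"
```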
Score Interpretation:
Check the following common failure modes and generate a diagnostic report:
A. Memory File (CLAUDE.md) Diagnostics
Checklist:
[ ] Line count exceeds 60? -> Needs trimming
[ ] Contains rules the Agent already follows naturally? -> Remove redundant rules
[ ] Contains vague, unverifiable rules (e.g., "write good code")? -> Replace with specific, verifiable rules
[ ] Contains rules that should be enforced by Hooks but are in the memory file? -> Migrate to Hooks
[ ] Contains outdated rules? -> Delete or flag
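
The line-count and vague-rule checks above can be partly automated. The sketch below assumes a 60-line budget as stated in the checklist; the helper names and the list of vague phrases are illustrative assumptions and should be tuned per project.

```shell
# Report whether a memory file exceeds a line budget (default 60).
# Returns 0 when within budget, 1 when trimming is needed, 2 when the file is missing.
# NOTE: check_memory_file and find_vague_rules are hypothetical helper names.
check_memory_file() {
  file=$1
  limit=${2:-60}
  if [ ! -f "$file" ]; then
    echo "missing: $file"
    return 2
  fi
  lines=$(wc -l < "$file" | tr -d ' ')
  if [ "$lines" -gt "$limit" ]; then
    echo "trim: $file has $lines lines (limit $limit)"
    return 1
  fi
  echo "ok: $file has $lines lines"
}

# List lines containing phrases that are hard to verify.
# The phrase list is an assumption; grep returns 1 when nothing matches.
find_vague_rules() {
  grep -n -i -E 'good code|clean code|best practices' "$1"
}
```

Usage: `check_memory_file CLAUDE.md` followed by `find_vague_rules CLAUDE.md`. Each hit is a candidate for rewriting into a specific, verifiable rule.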
B. Hook Coverage Diagnostics
Checklist:
[ ] Is there a Stop Hook for quality gates? -> Highest priority
[ ] Is there a PreToolUse Hook to protect sensitive files? -> Security essential
[ ] Is there a PostToolUse Hook for auto-formatting? -> Consistency guarantee
[ ] Are Hooks silent on success? -> Success output pollutes the context
[ ] Do Hooks use the correct exit code on failure (exit 2)? -> Exit code 2 is what feeds the error back to the Agent
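
A PreToolUse guard that satisfies the last two checks (silent on success, exit 2 on failure) might look like the sketch below. The function name and the protected patterns are assumptions for illustration, and so is the JSON field shown in the trailing comment; verify it against the hook input your Claude Code version actually sends.

```shell
# Decide whether a write to the given path should be blocked.
# Allowed: no output, status 0. Blocked: message on stderr, status 2.
# NOTE: guard_sensitive_path and its pattern list are hypothetical examples.
guard_sensitive_path() {
  case "$1" in
    .env|*/.env|.env.*|*/.env.*|*.pem|secrets/*|*/secrets/*)
      echo "Blocked: $1 is a protected file. Ask the user before editing it." >&2
      return 2
      ;;
  esac
  return 0
}

# In a real hook script the path would be read from the JSON event on stdin.
# Assumed field name, requires jq:
#   path=$(jq -r '.tool_input.file_path // empty')
#   guard_sensitive_path "$path" || exit $?
```

Keeping the decision in a function makes the hook easy to test without a live session, which helps when the audit needs to confirm exit codes.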
C. Context Health Diagnostics
Checklist:
[ ] Baseline cost (new session) < 20k tokens?
[ ] CLAUDE.md size < 2000 tokens?
[ ] Total MCP tool tokens < 20k?
[ ] Too many MCP Server connections? -> Connect on demand
[ ] Is test output silent on success?
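
One way to make test output silent on success is a small wrapper that buffers output and replays it only when the command fails. This is a sketch; `quiet_run` is a hypothetical helper name, not something the Skill ships.

```shell
# Run a command, discarding its output on success and replaying it on failure.
# The wrapped command's exit status is preserved.
# NOTE: quiet_run is an illustrative helper.
quiet_run() {
  log=$(mktemp) || return 1
  if "$@" >"$log" 2>&1; then
    rm -f "$log"
    return 0
  else
    status=$?
    cat "$log" >&2
    rm -f "$log"
    return "$status"
  fi
}
```

Usage: `quiet_run npm test` (substitute the project's own test command). A passing run adds nothing to the context, while a failing run still surfaces the full log.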
D. Architecture Constraint Diagnostics
Checklist:
[ ] Are there explicit dependency direction rules?
[ ] Are dependency rules automatically validated (Linter / structural tests)?
[ ] Is there an architecture.md documenting module boundaries?
[ ] Are architecture violations caught in CI?
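
Where no structural linter exists yet, a dependency direction rule can be approximated with a text search that runs in CI. The sketch below assumes imports are recognizable as a fixed string; the function name and the layer names in the usage note are hypothetical.

```shell
# Fail if any file under directory $1 contains the forbidden import string $2.
# Clean: no output, status 0. Violation: offending lines on stderr, status 1.
# NOTE: check_dependency_direction is an illustrative helper, and a plain-text
# match is only an approximation of a real structural test.
check_dependency_direction() {
  layer_dir=$1
  forbidden=$2
  hits=$(grep -rn -F -- "$forbidden" "$layer_dir" 2>/dev/null || true)
  if [ -n "$hits" ]; then
    echo "Dependency rule violated: $layer_dir must not depend on $forbidden" >&2
    echo "$hits" >&2
    return 1
  fi
  return 0
}
```

Usage: `check_dependency_direction src/domain "../ui/"` would flag any domain file importing from a UI layer, assuming that directory layout.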
E. Documentation System Diagnostics
Checklist:
[ ] Does architecture.md exist and match the code? -> Compare it against the actual directory structure
[ ] Is the ADR index complete? -> Check decisions/README.md
[ ] Are there ADRs with "deprecated" status? -> Deprecation records show that decisions are maintained, not just written once
[ ] Does progress tracking use JSON format?
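
The ADR index check can be sketched as a comparison between the files in the decisions directory and the names mentioned in its README.md. The function name is hypothetical, and the sketch assumes each ADR is a `.md` file referenced by filename in the index.

```shell
# List ADR files in directory $1 that are not mentioned in $1/README.md.
# Prints one missing filename per line.
# Returns 0 if the index is complete, 1 if entries are missing, 2 if there is no index.
# NOTE: check_adr_index is an illustrative helper.
check_adr_index() {
  dir=$1
  index="$dir/README.md"
  missing=0
  [ -f "$index" ] || { echo "no index: $index" >&2; return 2; }
  for adr in "$dir"/*.md; do
    [ -e "$adr" ] || continue
    name=$(basename "$adr")
    [ "$name" = "README.md" ] && continue
    if ! grep -q -F -- "$name" "$index"; then
      echo "$name"
      missing=1
    fi
  done
  return "$missing"
}
```

Usage: `check_adr_index docs/decisions` prints any ADR that the index omits.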
Sort issues by "frequency x severity" and output a structured optimization plan:
```markdown
## Harness Health Report
### Current Score: XX / 21
### RED — Immediate Action (This Week)
1. [Problem description] -> [Specific fix steps]
2. ...
### YELLOW — Complete This Month
1. [Problem description] -> [Specific fix steps]
2. ...
### GREEN — Continuous Improvement
1. [Problem description] -> [Specific fix steps]
2. ...
```
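
The "frequency x severity" ordering can be computed mechanically before the report is written. The sketch below assumes each issue is recorded as a `frequency|severity|description` line; that input format, the function name, and the sample issues are all hypothetical.

```shell
# Read "frequency|severity|description" lines on stdin and print
# "priority<TAB>description", highest priority (frequency x severity) first.
# NOTE: rank_issues and the input format are illustrative assumptions.
rank_issues() {
  awk -F'|' 'NF >= 3 { printf "%d\t%s\n", $1 * $2, $3 }' |
    sort -t "$(printf '\t')" -k1,1nr
}

# Hypothetical findings from an audit:
printf '%s\n' \
  '5|3|Agent edits .env files' \
  '2|2|CLAUDE.md exceeds 60 lines' \
  '3|1|Formatter not run after edits' | rank_issues
```

Which priority values map to the RED, YELLOW, and GREEN sections is left to the auditor's judgment.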
If the user agrees, execute the optimization directly:
Set up a "weekly Harness maintenance ritual" for the project:
Recommend setting up /harness:sync-docs and /harness:scan-arch scheduled tasks to automate these checks.
During audits, pay special attention to whether both feedforward and feedback controls are in place:
Guides (Feedforward Control): Steer the Agent before it acts
Sensors (Feedback Control): Validate after the Agent acts
Principle: Cover 80% of common issues with Computational approaches first (deterministic checks such as linters, tests, and hooks), then use Inferential approaches (LLM-based review) for the remaining 20% that require semantic understanding.
npx claudepluginhub huangbaixun/harness-engineering --plugin harness-engineering