By tuanle96
Solo-dev harness engineering kit — layered architecture, garbage-collection ritual, structural tests, review subagents. Optimized for Claude Code 2.1+.
Use this skill whenever a decision is made about architecture, dependencies, frameworks, naming conventions, or layer order. Creates a numbered ADR (Architecture Decision Record) in `.harness/docs/adr/` in the canonical Nygard format. Always invoke this before changing layer order, adding a layer, swapping a major dependency, or introducing a new external service.
Run Mini SWE-bench style harness regression tasks and A/B comparisons to measure harness improvement objectively.
Inspect context usage, token budget, compaction history, and overflow risk. Use when sessions get long, before large changes, after compaction, or when cost/context drift is suspected.
Build a compact read-only context packet for a natural-language codebase question. Use before editing unfamiliar code, when tracing task evidence, contracts, validation, or proof paths, or when the relevant files are not obvious.
Create a Story Packet for normal/high-risk features. Use after /feature-intake classifies work as normal or high-risk, or when the user asks to break a feature into acceptance criteria, test expectations, and agent-sized work units.
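To make the ADR skill described above more concrete, here is a minimal sketch of writing a numbered record in the Nygard layout (Title, Status, Context, Decision, Consequences) under `.harness/docs/adr/`. The helper names, the four-digit filename prefix, the slug rule, and the default `Accepted` status are assumptions for illustration, not the kit's actual implementation.

```python
import re
from pathlib import Path

ADR_DIR = Path(".harness/docs/adr")  # directory named in the skill description


def next_adr_number(adr_dir: Path) -> int:
    """Return one more than the highest numeric filename prefix in adr_dir."""
    numbers = []
    for path in adr_dir.glob("*.md"):
        match = re.match(r"(\d+)-", path.name)
        if match:
            numbers.append(int(match.group(1)))
    return max(numbers, default=0) + 1


def slugify(title: str) -> str:
    """Lowercase the title and join alphanumeric runs with hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")


def create_adr(title: str, context: str, decision: str,
               consequences: str, adr_dir: Path = ADR_DIR) -> Path:
    """Write one ADR using the Nygard section headings and return its path."""
    adr_dir.mkdir(parents=True, exist_ok=True)
    number = next_adr_number(adr_dir)
    lines = [
        f"# {number}. {title}", "",
        "## Status", "", "Accepted", "",  # assumed default status
        "## Context", "", context, "",
        "## Decision", "", decision, "",
        "## Consequences", "", consequences, "",
    ]
    # Hypothetical naming scheme: zero-padded number plus slug.
    path = adr_dir / f"{number:04d}-{slugify(title)}.md"
    path.write_text("\n".join(lines), encoding="utf-8")
    return path
```

Deriving the number from existing filenames keeps records append-only: a superseded decision gets a new ADR rather than an edit to an old one.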
Use this agent after changing language detection, adapter templates, structural-test runners, or capability declarations. Verifies that README claims, capability matrix, render paths, hooks, and adapter tests agree across TypeScript, Python, Go, Rust, Swift, and Kotlin. Read-only.
Use this agent after adding or modifying any public API endpoint, exported function, CLI command, or RPC handler. Verifies naming, response shape, error format, and versioning conventions match `.harness/docs/api-conventions.md` (or the kit's defaults if that file doesn't exist). Read-only.
Use this agent after adding or modifying eval tasks, hidden checks, or model-judge rubrics. Verifies that deterministic checks cover objective facts and that rubric dimensions require evidence instead of vague pass/fail judgment. Read-only.
Use this agent after adding loops over large collections, database queries, render paths, or anything in a hot path. Catches N+1 queries, missing memoization, accidental quadratic loops, and unindexed sorts. Read-only. Runs on Haiku for speed.
Use this agent before publishing agent-harness-kit or after changes to release, installer, npm package, README, schema, generated templates, or marketplace metadata. Verifies release truth, package surface, and verification gates. Read-only.
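As an illustration of two issues the performance-review agent above is said to catch, the sketch below contrasts an N+1 query pattern with a batched lookup, and an accidental quadratic loop with a set-based one. `FakeDb` and every function name here are hypothetical stand-ins, not part of the kit.

```python
class FakeDb:
    """Hypothetical database client that counts round trips."""

    def __init__(self, authors: dict):
        self._authors = authors
        self.queries = 0

    def get_author(self, author_id: int) -> str:
        self.queries += 1  # one round trip per call
        return self._authors[author_id]

    def get_authors(self, author_ids) -> dict:
        self.queries += 1  # one round trip for the whole batch
        return {i: self._authors[i] for i in author_ids}


def author_names_n_plus_one(db: FakeDb, posts: list) -> list:
    # N+1 pattern: one query per post on top of the query that loaded posts.
    return [db.get_author(p["author_id"]) for p in posts]


def author_names_batched(db: FakeDb, posts: list) -> list:
    # Single batched query, then in-memory lookups.
    authors = db.get_authors({p["author_id"] for p in posts})
    return [authors[p["author_id"]] for p in posts]


def shared_ids_quadratic(a: list, b: list) -> list:
    # `x in b` scans the list each time: O(len(a) * len(b)).
    return [x for x in a if x in b]


def shared_ids_linear(a: list, b: list) -> list:
    # Set membership is O(1) on average: O(len(a) + len(b)).
    b_set = set(b)
    return [x for x in a if x in b_set]
```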
Matches all tools
Hooks run on every tool call, not just specific ones
Executes bash commands
Hook triggers when Bash tool is used
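The hook behavior listed above (matching every tool call, acting when Bash is used) can be sketched as a small script. This assumes the Claude Code hook convention of passing a JSON event with `tool_name` and `tool_input` on stdin and treating exit code 2 as "block this call"; check that convention against the current hooks documentation before relying on it. The deny-list and function names are hypothetical and are not the kit's actual hook.

```python
import json
import sys

# Hypothetical deny-list; a real project would choose its own patterns.
BLOCKED_SUBSTRINGS = ("rm -rf /", "git push --force")


def decide(event: dict) -> tuple:
    """Return (exit_code, message) for one tool-call event."""
    if event.get("tool_name") != "Bash":
        # The hook matches all tools but only inspects Bash calls.
        return 0, ""
    command = event.get("tool_input", {}).get("command", "")
    for pattern in BLOCKED_SUBSTRINGS:
        if pattern in command:
            # Exit code 2 is assumed to mean "block the tool call".
            return 2, f"Blocked by harness hook: command contains {pattern!r}"
    return 0, ""


def run(stdin_text: str) -> int:
    """Parse the event JSON, report any message on stderr, return the exit code."""
    exit_code, message = decide(json.loads(stdin_text))
    if message:
        print(message, file=sys.stderr)
    return exit_code

# As a standalone hook script, the last line would be:
#   sys.exit(run(sys.stdin.read()))
```

Keeping the decision in a pure function (`decide`) makes the hook testable without spawning a process.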
Modifies files
Hook triggers on file write and edit operations
Uses power tools
Uses Bash, Write, or Edit tools
The infrastructure layer that makes AI agents production-ready.
Solo-dev harness engineering kit for Claude Code, with an experimental Codex-readable runtime surface. One command, ~30 minutes, and your hobby project gets the patterns that took OpenAI from prototype to 1M lines of agent-generated code: layered architecture, structural tests, garbage collection, review subagents, JSON feature tracking, and pre-completion checklists — without the enterprise overhead.
February 2026: OpenAI published "Harness engineering: leveraging Codex in an agent-first world" documenting how their Frontier Product Exploration team built an internal product with ~1 million lines of code over 5 months — with zero lines manually written by humans.
The results:
The insight: The work shifted from writing code to engineering the harness — the infrastructure, constraints, and feedback loops that make agents reliable at scale.
March 2026: LangChain demonstrated this principle empirically. By improving their agent harness alone (no model changes), they jumped from 52.8% → 66.5% on Terminal-Bench 2.0, climbing 25 spots on the leaderboard.
The pattern is clear: Harness quality matters more than model choice for production outcomes.
You're a solo developer or small team. You don't have OpenAI's infrastructure budget or Stripe's agent platform team. But you can adopt the same patterns at hobby-project scale:
/add-feature, /context-query, /garbage-collection, /remember-project, /project-status, /review-this-pr, etc.)
nextjs-saas, api-backend, and python-data
`passes: true` when the current diff has machine-readable proof, concrete checks, and a diff summary
`harness.db` records intake, stories, decisions, backlog, traces, friction, and trace quality without hand-editing Markdown tables

`npx claudepluginhub tuanle96/agent-harness-kit --plugin agent-harness-kit`
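The `passes: true` rule in the list above amounts to a gate over three kinds of evidence. A minimal sketch follows, assuming a hypothetical `FeatureEvidence` record; the field names are illustrative and do not reflect the kit's actual feature-tracking schema.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class FeatureEvidence:
    """Hypothetical evidence attached to one tracked feature."""
    proof: Optional[dict] = None                      # machine-readable proof, e.g. parsed test output
    checks: list = field(default_factory=list)        # concrete checks that were run
    diff_summary: str = ""                            # summary of the current diff


def may_mark_passing(evidence: FeatureEvidence) -> bool:
    """Allow passes=True only when all three kinds of evidence are present."""
    return (
        bool(evidence.proof)
        and bool(evidence.checks)
        and bool(evidence.diff_summary.strip())
    )
```

For example, `may_mark_passing(FeatureEvidence(checks=["pytest -q"]))` is `False`, because the proof and the diff summary are missing.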