Multi-agent sprint orchestration harness for long-running development
npx claudepluginhub xuzhijie-ownself/harness
Domain-blind core harness with 6 agents, GAN-like adversarial evaluation, a profile system, sprint contracts, and quantified 0-5 grading.
Software delivery lifecycle domain skills for the harness framework. Provides 5 domain skills (SDLC, EA, BA, SA, Ops) and an index skill for profile routing.
Multi-agent sprint orchestration middleware for long-running application development. Two-plugin architecture: a domain-blind core handles all orchestration, and domain skill suites provide profile-specific knowledge.
6 agents: initializer, planner, generator, evaluator, coordinator, releaser.
Dual-runtime: Works with both Claude Code and OpenAI Codex CLI.
Two plugins: harness (core) + harness-sdlc-suite (software delivery domain skills).
Methodology: PDCA (Plan-Do-Check-Act) + two innovations:
The core harness can be used standalone with a custom profile -- no domain skill suite required. Domain suites add pre-built profiles with evaluation criteria, artifact taxonomies, and verification procedures.
# Installs both core harness and SDLC suite
claude plugin install harness@harness
Then reload:
/reload-plugins
# Clone into your project
git clone https://github.com/xuzhijie-ownself/harness.git plugins/harness-repo
# Install (Mac / Linux / Git Bash)
bash plugins/harness-repo/install.sh
# Install (Windows CMD)
plugins\harness-repo\install.bat
The install script copies:
- plugins/harness/skills/harness/
- plugins/harness-sdlc-suite/skills/ (6 skills)

If using OpenAI Codex CLI, the plugin auto-loads from the .codex-plugin/plugin.json manifest:
# Clone the repo into your project
git clone https://github.com/xuzhijie-ownself/harness.git plugins/harness-repo
# Codex detects .codex-plugin/plugin.json automatically
# Skills from both plugins are loaded via dual skill paths
codex # start codex in the project directory
# Claude Code marketplace
claude plugin uninstall harness
# Local install (removes core + all domain skills)
bash plugins/harness-repo/install.sh --uninstall
# Marketplace
claude plugin update harness
# Local
cd plugins/harness-repo && git pull && bash install.sh
| Command | Purpose |
|---|---|
| /harness:start | Scaffold harness for a new project (run once) |
| /harness:session | Run one supervised sprint round |
| /harness:run | Continuous coordinator-driven loop (unattended) |
| /harness:reset | Checkpoint + handoff when context fills (Variant B) |
| /harness:release | Version bump, changelog, and git tag |
| Agent | Spawned by | Reference |
|---|---|---|
| initializer | /harness:start | plugins/harness/skills/harness/roles/initializer.md |
| planner | /harness:start | plugins/harness/skills/harness/roles/planner.md |
| generator | /harness:session, coordinator | plugins/harness/skills/harness/roles/generator.md |
| evaluator | /harness:session, coordinator | plugins/harness/skills/harness/roles/evaluator.md |
| coordinator | /harness:run | plugins/harness/skills/harness/roles/coordinator.md |
| releaser | /harness:release, coordinator | plugins/harness/skills/harness/roles/releaser.md |
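
The table above is keyed by agent. Reading it the other way round (which agents a given command or the coordinator may spawn) can be sketched as below. This is only an illustration built from the table rows; the helper names `agents_by_spawner` and `role_file` are hypothetical and are not part of the harness.

```python
from collections import defaultdict

# Agent -> spawners, transcribed from the table above.
SPAWNED_BY = {
    "initializer": ["/harness:start"],
    "planner": ["/harness:start"],
    "generator": ["/harness:session", "coordinator"],
    "evaluator": ["/harness:session", "coordinator"],
    "coordinator": ["/harness:run"],
    "releaser": ["/harness:release", "coordinator"],
}


def agents_by_spawner(spawned_by):
    """Invert the agent -> spawners mapping into spawner -> agents."""
    inverted = defaultdict(list)
    for agent, spawners in spawned_by.items():
        for spawner in spawners:
            inverted[spawner].append(agent)
    return dict(inverted)


def role_file(agent):
    """Hypothetical helper: build the role-file path shown in the Reference column."""
    return f"plugins/harness/skills/harness/roles/{agent}.md"


if __name__ == "__main__":
    for spawner, agents in agents_by_spawner(SPAWNED_BY).items():
        print(spawner, "->", ", ".join(agents))
```

Read this way, the coordinator is the only agent that spawns other agents, and /harness:run reaches the generator, evaluator, and releaser only through it.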
The harness follows an adversarial PDCA pattern: the generator produces artifacts (Do) and the evaluator grades them (Check). The generator cannot self-approve; the evaluator cannot edit product artifacts. This separation prevents the common failure mode where a model is too lenient grading its own work. Agent files are thin YAML wrappers -- all instructions live in role files as the single source of truth.
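
The role separation described above can be pictured as a permission check on each action. The following is a minimal sketch under stated assumptions, not the harness's implementation: `SprintRound`, `RoleViolation`, and the method names are hypothetical, and the real harness expresses these rules in role files rather than Python.

```python
from dataclasses import dataclass, field

GENERATOR = "generator"
EVALUATOR = "evaluator"


class RoleViolation(Exception):
    """Raised when a role attempts an action reserved for the other role."""


@dataclass
class SprintRound:
    # Hypothetical in-memory state for one round.
    artifacts: dict = field(default_factory=dict)  # path -> content
    grades: dict = field(default_factory=dict)     # path -> score on the 0-5 scale

    def write_artifact(self, role, path, content):
        """Do step: only the generator may create or edit product artifacts."""
        if role != GENERATOR:
            raise RoleViolation(f"{role} cannot edit product artifacts")
        self.artifacts[path] = content
        # Assumption: editing an artifact discards any earlier grade for it.
        self.grades.pop(path, None)

    def record_grade(self, role, path, score):
        """Check step: only the evaluator may grade, so no self-approval."""
        if role != EVALUATOR:
            raise RoleViolation(f"{role} cannot grade artifacts")
        if path not in self.artifacts:
            raise KeyError(path)
        if not 0 <= score <= 5:
            raise ValueError("score must be on the 0-5 scale")
        self.grades[path] = score

    def is_fully_graded(self):
        """True once every artifact carries an evaluator grade."""
        return bool(self.artifacts) and set(self.grades) == set(self.artifacts)


if __name__ == "__main__":
    round_ = SprintRound()
    round_.write_artifact(GENERATOR, "docs/spec.md", "draft v1")
    round_.record_grade(EVALUATOR, "docs/spec.md", 4)
    print(round_.is_fully_graded())
```

Routing every write and every grade through a role check means neither agent can be lenient on its own work, which is the failure mode the separation targets.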
Every evaluation round applies a binary pass/fail Authenticity Gate after domain criteria scoring. This catches technically-competent-but-generic output -- artifacts that score adequately on domain criteria yet show no evidence of project-specific decision-making.
| Dimension | What it checks |
|---|---|
| internal_consistency | Artifacts share consistent conventions -- structure, terminology, style form a unified whole |
| intentionality | Evidence of project-specific decisions, not unmodified defaults or template output |
| craft | Technical fundamentals correct -- hierarchy, structure, naming, formatting |
| fitness_for_purpose | Deliverables usable by target audience without additional explanation |
The gate is dual-side: generators apply a pre-implementation checklist (prevention), evaluators apply a post-grading gate (detection). Any dimension failure fails the round regardless of domain scores.
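
The rule that any gate failure overrides the domain scores can be sketched as follows. The four dimension names come from the table above; everything else is an assumption made for illustration, in particular the function name `evaluate_round`, the `RoundVerdict` shape, and the pass threshold of 3 on the 0-5 scale, which the source does not specify.

```python
from dataclasses import dataclass

# Dimension names taken from the Authenticity Gate table.
GATE_DIMENSIONS = (
    "internal_consistency",
    "intentionality",
    "craft",
    "fitness_for_purpose",
)


@dataclass(frozen=True)
class RoundVerdict:
    passed: bool
    reasons: tuple  # empty when the round passes


def evaluate_round(domain_scores, gate_results, pass_threshold=3):
    """Combine 0-5 domain scores with the binary gate.

    pass_threshold is a hypothetical value; the harness may define
    thresholds per profile or per criterion.
    """
    missing = [d for d in GATE_DIMENSIONS if d not in gate_results]
    if missing:
        raise ValueError(f"gate dimensions not assessed: {missing}")

    reasons = []
    # Domain criteria are scored first.
    for name, score in domain_scores.items():
        if not 0 <= score <= 5:
            raise ValueError(f"{name}: score must be on the 0-5 scale")
        if score < pass_threshold:
            reasons.append(f"domain criterion below threshold: {name}={score}")
    # The gate is applied afterwards; a single failed dimension fails the round.
    for dim in GATE_DIMENSIONS:
        if not gate_results[dim]:
            reasons.append(f"authenticity gate failed: {dim}")
    return RoundVerdict(passed=not reasons, reasons=tuple(reasons))


if __name__ == "__main__":
    gate = dict.fromkeys(GATE_DIMENSIONS, True)
    gate["intentionality"] = False  # competent but generic output
    print(evaluate_round({"functionality": 5, "design": 4}, gate))
```

In the usage example the domain scores are high, yet the round fails because one gate dimension is false. That is the "technically competent but generic" case the gate exists to catch.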