From trine-eval
Initialize eval-driven development with Planner-Generator-Evaluator architecture
How this skill is triggered — by the user, by Claude, or both
Slash command
/trine-eval:harness-kickoffThis skill is limited to the following tools:
The summary Claude sees in its skill listing — used to decide when to auto-load this skill
You are initializing the eval-driven development harness. Follow these steps exactly.
You are initializing the eval-driven development harness. Follow these steps exactly.
The user may pass --mode minimal (or --mode standard, the default) in $ARGUMENTS. Parse it out of the prompt before treating the remainder as the product description.
standard (default): runs the full Planner-Generator-Evaluator harness. All components_enabled flags default to true.minimal: collapses the Planner and Generator into main-thread drafting for roughly 25–40% token savings per sprint. The Evaluator subagent still runs forked — this preserves Generator-Evaluator separation, which is the playbook's core independence guarantee. Contract negotiation still runs (the Evaluator's contract review is the cheapest high-leverage gate).Remember the parsed mode; it drives config defaults in Step 2 and the Planner gate in Step 3.
Check if .harness/config.json already exists. If it does, read it and confirm with the user whether to reinitialize or resume.
If no config exists, determine the project type by:
CLAUDE.md, package.json, pyproject.toml, Cargo.toml, go.mod, or similar project filesweb-app, rag-system, cli-tool, or api-serviceCreate the following structure:
.harness/
├── config.json
├── sprint-state.json
├── contracts/
├── evals/
├── progress.md
Write config.json with detected project type and defaults. The mode field and the four new components_enabled flags (generator_subagent, per_sprint_aci_review, calibration_writes, plus the existing planner) together determine which parts of the harness run as subagents versus in the main thread.
For --mode standard (default), all components_enabled flags are true:
{
"mode": "standard",
"project_type": "<detected>",
"rubric": "<matching rubric name>",
"max_retries": 3,
"pass_threshold": {
"per_dimension_minimum": 2,
"critical_dimensions": ["functionality"],
"critical_minimum": 3
},
"contract_negotiation_rounds": 2,
"git_checkpoint": true,
"components_enabled": {
"planner": true,
"generator_subagent": true,
"contract_negotiation": true,
"sprint_decomposition": true,
"per_sprint_aci_review": true,
"calibration_writes": true,
"eval_summary": true
}
}
For --mode minimal, set mode: "minimal" and flip planner, generator_subagent, per_sprint_aci_review, and calibration_writes to false. Leave contract_negotiation, sprint_decomposition, and eval_summary as true — contract negotiation is the cheapest high-leverage gate and should stay on even in minimal mode.
Initialize sprint-state.json with machine-readable state:
{
"current_sprint": 1,
"sprints": [],
"last_updated": "<ISO 8601 timestamp>"
}
As sprints complete, each entry in sprints will be populated:
{
"number": 1,
"title": "Sprint title from sprints.json",
"status": "pass",
"rounds": 2,
"criteria_passed": 8,
"criteria_total": 10,
"weighted_score": 85
}
Initialize progress.md:
# Harness Progress Log
## Initialized
- Date: <current date>
- Project type: <type>
- Rubric: <rubric>
If .harness/bootstrap/failure-catalog.json exists, read it and pass the failure data to the Planner:
critical severity should appear as criteria in the first sprintIf the catalog does not exist, skip this step. The bootstrap is optional — the harness works without it.
The user's prompt (the text they provided when invoking /harness-kickoff, minus any --mode argument) is the product description. Produce .harness/spec.md and .harness/sprints.json — which route depends on components_enabled.planner in the config you just wrote.
components_enabled.planner is true — standard mode)Spawn the Planner subagent using the Agent tool:
planner agent definition.harness/spec.md and .harness/sprints.jsoncomponents_enabled.planner is false — minimal mode)Do not spawn the Planner subagent. Instead, draft the spec and sprint plan directly in the main thread, producing the same two artifacts (.harness/spec.md and .harness/sprints.json) in the same format — only the author changes, not the file shape or downstream consumers.
.harness/spec.md with the sections defined in agents/planner.md: Product Vision (2–3 sentences), Feature List grouped Must/Should/Nice-to-have, User Interaction Patterns, Technical Constraints drawn from the detected stack, and Success Criteria (testable, unambiguous pass/fail)..harness/sprints.json matching the planner schema: {"sprints": [{"number", "title", "features", "estimated_complexity": "low"|"medium"|"high", "dependencies": [sprint numbers]}]}. Aim for 3–8 sprints, ordered so each builds on the last, each completable in one context window.After either path, verify both files exist and are well-formed:
spec.md should have a product vision, feature list, and success criteriasprints.json should parse as valid JSON with a sprints arrayIf config.json has git_checkpoint: true:
git add .harness
git commit -m "harness: initialize spec and sprint plan"
Print:
/harness-sprint to begin, or edit .harness/sprints.json to adjust the plan.".harness/ directory should be committed to version control so progress survives across sessionsnpx claudepluginhub ats-kinoshita-iso/trine-eval --plugin trine-evalProvides behavioral guidelines to reduce common LLM coding mistakes, focusing on simplicity, surgical changes, assumption surfacing, and verifiable success criteria.
Searches, retrieves, and installs Agent Skills from prompts.chat registry using MCP tools like search_skills and get_skill. Activates for finding skills, browsing catalogs, or extending Claude.
Creates, edits, and optimizes skills for Claude Code, including drafting, evaluating with test prompts, iterating on performance, and improving skill descriptions for better triggering accuracy.