From trine-eval
Initialize eval-driven development with Planner-Generator-Evaluator architecture
Slash command
/trine-eval:harness-kickoff
You are initializing the eval-driven development harness. Follow these steps exactly.
The user may pass --mode minimal (or --mode standard, the default) in $ARGUMENTS. Parse it out of the prompt before treating the remainder as the product description.
- standard (default): runs the full Planner-Generator-Evaluator harness. All components_enabled flags default to true.
- minimal: collapses the Planner and Generator into main-thread drafting for roughly 25–40% token savings per sprint. The Evaluator subagent still runs forked, which preserves Generator-Evaluator separation, the playbook's core independence guarantee. Contract negotiation still runs (the Evaluator's contract review is the cheapest high-leverage gate).
Remember the parsed mode; it drives config defaults in Step 2 and the Planner gate in Step 3.
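The mode parsing described above can be sketched as follows. This is a minimal illustration, not the skill's actual implementation: the function name `parse_mode` is hypothetical, and accepting the `--mode=value` spelling in addition to `--mode value` is an assumption, since the source only shows the space-separated form.

```python
VALID_MODES = {"standard", "minimal"}


def parse_mode(arguments: str) -> tuple[str, str]:
    """Split a raw $ARGUMENTS string into (mode, product_description).

    Hypothetical helper: the skill text only says to parse --mode out of
    the prompt and treat the remainder as the product description.
    """
    tokens = arguments.split()
    mode = "standard"  # the documented default
    description_tokens = []
    i = 0
    while i < len(tokens):
        token = tokens[i]
        if token == "--mode" and i + 1 < len(tokens):
            mode = tokens[i + 1]
            i += 2
        elif token.startswith("--mode="):  # assumed alternative spelling
            mode = token.split("=", 1)[1]
            i += 1
        else:
            description_tokens.append(token)
            i += 1
    if mode not in VALID_MODES:
        raise ValueError(f"unknown mode {mode!r}; expected one of {sorted(VALID_MODES)}")
    # Note: joining on single spaces collapses any repeated whitespace.
    return mode, " ".join(description_tokens)
```

For example, `parse_mode("--mode minimal A todo app")` yields the mode `"minimal"` and the description `"A todo app"`, while a prompt with no flag falls back to `"standard"`.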
Check if .harness/config.json already exists. If it does, read it and confirm with the user whether to reinitialize or resume.
If no config exists, determine the project type by:
- Checking for CLAUDE.md, package.json, pyproject.toml, Cargo.toml, go.mod, or similar
- Classifying the project as web-app, rag-system, cli-tool, api-service, eval-harness, or harness-build (for agent runtime harnesses evaluated against the playbook stages). For harness-adjacent projects: choose eval-harness (meta layer: grading the eval methodology itself) when the primary deliverable is the eval infrastructure such as contract format, grader hierarchy, or sprint workflow; choose harness-build (runtime layer: grading the agent control plane) when the primary deliverable is the agentic loop, tool registry, sandboxing, or governance. See plugins/trine-eval/skills/eval-rubric/rubrics/README.md for the full decision guide.
Create the following structure:
.harness/
├── config.json
├── sprint-state.json
├── contracts/
├── evals/
└── progress.md
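The directory layout above could be created with a few lines of Python. This is only a sketch: `scaffold_harness` is a hypothetical name, and the files are created empty here because their contents are written in the following steps.

```python
from pathlib import Path


def scaffold_harness(root: Path) -> Path:
    """Create the .harness/ skeleton under `root` and return its path.

    Safe to call more than once: existing directories and files are kept.
    """
    harness = root / ".harness"
    for subdir in ("contracts", "evals"):
        (harness / subdir).mkdir(parents=True, exist_ok=True)
    for filename in ("config.json", "sprint-state.json", "progress.md"):
        # Placeholders only; later steps fill in the real content.
        (harness / filename).touch(exist_ok=True)
    return harness
```

Because every call uses `exist_ok=True`, running the helper against an already initialized project leaves existing files untouched, which fits the reinitialize-or-resume check earlier in the skill.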
Write config.json with detected project type and defaults.
Use the following project_type → rubric mapping to fill the "rubric" field:
- web-app → "rubric": "web-app"
- rag-system → "rubric": "rag-system"
- cli-tool → "rubric": "cli-tool"
- api-service → "rubric": "api-service"
- eval-harness → "rubric": "eval-harness"
- harness-build → "rubric": "harness-build"
The mode field and the four components_enabled flags (generator_subagent, per_sprint_aci_review, calibration_writes, plus the existing planner) together determine which parts of the harness run as subagents versus in the main thread.
For --mode standard (default), all components_enabled flags are true:
{
  "mode": "standard",
  "project_type": "<detected>",
  "rubric": "<matching rubric name — see routing table above>",
  "max_retries": 3,
  "pass_threshold": {
    "per_dimension_minimum": 2,
    "critical_dimensions": ["functionality"],
    "critical_minimum": 3
  },
  "contract_negotiation_rounds": 2,
  "git_checkpoint": true,
  "components_enabled": {
    "planner": true,
    "generator_subagent": true,
    "contract_negotiation": true,
    "sprint_decomposition": true,
    "per_sprint_aci_review": true,
    "calibration_writes": true,
    "eval_summary": true
  }
}
For --mode minimal, set mode: "minimal" and flip planner, generator_subagent, per_sprint_aci_review, and calibration_writes to false. Leave contract_negotiation, sprint_decomposition, and eval_summary as true — contract negotiation is the cheapest high-leverage gate and should stay on even in minimal mode.
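The mode-dependent defaults can be sketched as one function that starts from the standard config and flips the four flags for minimal mode. The name `build_config` and the up-front validation are illustrative assumptions; the field values are copied from the config shown above, and the rubric is set equal to the project type because the routing table maps each type to a rubric of the same name.

```python
import copy

PROJECT_TYPES = {
    "web-app", "rag-system", "cli-tool",
    "api-service", "eval-harness", "harness-build",
}

# Flags that minimal mode turns off, per the skill text.
MINIMAL_DISABLED = (
    "planner", "generator_subagent",
    "per_sprint_aci_review", "calibration_writes",
)

STANDARD_DEFAULTS = {
    "max_retries": 3,
    "pass_threshold": {
        "per_dimension_minimum": 2,
        "critical_dimensions": ["functionality"],
        "critical_minimum": 3,
    },
    "contract_negotiation_rounds": 2,
    "git_checkpoint": True,
    "components_enabled": {
        "planner": True,
        "generator_subagent": True,
        "contract_negotiation": True,
        "sprint_decomposition": True,
        "per_sprint_aci_review": True,
        "calibration_writes": True,
        "eval_summary": True,
    },
}


def build_config(mode: str, project_type: str) -> dict:
    """Return the config.json content for the given mode and project type."""
    if mode not in ("standard", "minimal"):
        raise ValueError(f"unknown mode: {mode!r}")
    if project_type not in PROJECT_TYPES:
        raise ValueError(f"unknown project type: {project_type!r}")
    config = {"mode": mode, "project_type": project_type, "rubric": project_type}
    # Deep copy so nested dicts are never shared between configs.
    config.update(copy.deepcopy(STANDARD_DEFAULTS))
    if mode == "minimal":
        for flag in MINIMAL_DISABLED:
            config["components_enabled"][flag] = False
    return config
```

Serializing the result with `json.dumps(build_config("minimal", "cli-tool"), indent=2)` would give a file in the same shape as the standard example, with only the mode and the four flags changed.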
Initialize sprint-state.json with machine-readable state:
{
  "current_sprint": 1,
  "sprints": [],
  "last_updated": "<ISO 8601 timestamp>"
}
As sprints complete, each entry in sprints will be populated:
{
  "number": 1,
  "title": "Sprint title from sprints.json",
  "status": "pass",
  "rounds": 2,
  "criteria_passed": 8,
  "criteria_total": 10,
  "weighted_score": 85
}
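As an illustration of how a completed sprint might be recorded, the sketch below adds an entry to the state and refreshes the timestamp. `record_sprint` is a hypothetical helper. It leaves current_sprint unchanged because the text above does not say when that counter advances, and replacing an existing entry with the same number is an assumption.

```python
from datetime import datetime, timezone


def record_sprint(state: dict, entry: dict) -> dict:
    """Return a new sprint-state dict with `entry` added.

    An existing entry with the same sprint number is replaced (assumed
    behaviour). The input state is not mutated.
    """
    sprints = [s for s in state["sprints"] if s["number"] != entry["number"]]
    sprints.append(dict(entry))
    sprints.sort(key=lambda s: s["number"])
    return {
        "current_sprint": state["current_sprint"],  # advancement rule not specified
        "sprints": sprints,
        "last_updated": datetime.now(timezone.utc).isoformat(),  # ISO 8601, UTC
    }
```

Returning a new dict rather than editing in place makes it straightforward to write the result to sprint-state.json only after the update succeeds.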
Initialize progress.md:
# Harness Progress Log
## Initialized
- Date: <current date>
- Project type: <type>
- Rubric: <rubric>
If .harness/bootstrap/failure-catalog.json exists, read it and pass the failure data to the Planner:
- Failures with critical severity should appear as criteria in the first sprint
If the catalog does not exist, skip this step. The bootstrap is optional; the harness works without it.
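The optional catalog read might look like the sketch below. The catalog schema is not given in this document, so the layout assumed here (a top-level object with a "failures" list whose items carry a "severity" field) is purely hypothetical, as is the helper name `load_critical_failures`.

```python
import json
from pathlib import Path


def load_critical_failures(harness_dir: Path) -> list[dict]:
    """Return critical-severity failures, or [] when no catalog exists.

    ASSUMED schema: {"failures": [{"severity": "critical", ...}, ...]}.
    Adjust the field names to match the real bootstrap output.
    """
    catalog_path = harness_dir / "bootstrap" / "failure-catalog.json"
    if not catalog_path.exists():
        return []  # bootstrap is optional; skip silently
    catalog = json.loads(catalog_path.read_text(encoding="utf-8"))
    return [
        failure
        for failure in catalog.get("failures", [])
        if failure.get("severity") == "critical"
    ]
```

The file is only opened once its existence is confirmed, which matches the lazy-read behaviour the skill describes for this step.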
The user's prompt (the text they provided when invoking /harness-kickoff, minus any --mode argument) is the product description. Produce .harness/spec.md and .harness/sprints.json — which route depends on components_enabled.planner in the config you just wrote.
When components_enabled.planner is true (standard mode):
Spawn the Planner subagent using the Agent tool:
- Use the planner agent definition
- The Planner writes .harness/spec.md and .harness/sprints.json
When components_enabled.planner is false (minimal mode):
Do not spawn the Planner subagent. Instead, draft the spec and sprint plan directly in the main thread, producing the same two artifacts (.harness/spec.md and .harness/sprints.json) in the same format. Only the author changes, not the file shape or downstream consumers.
- Write .harness/spec.md with the sections defined in agents/planner.md: Product Vision (2–3 sentences), Feature List grouped Must/Should/Nice-to-have, User Interaction Patterns, Technical Constraints drawn from the detected stack, and Success Criteria (testable, unambiguous pass/fail).
- Write .harness/sprints.json matching the planner schema: {"sprints": [{"number", "title", "features", "estimated_complexity": "low"|"medium"|"high", "dependencies": [sprint numbers]}]}. Aim for 3–8 sprints, ordered so each builds on the last, each completable in one context window.
After either path, verify both files exist and are well-formed:
- spec.md should have a product vision, feature list, and success criteria
- sprints.json should parse as valid JSON with a sprints array
If config.json has git_checkpoint: true:
git add .harness
git commit -m "harness: initialize spec and sprint plan"
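The well-formedness check that precedes this checkpoint could be sketched as below. `verify_plan` is a hypothetical helper, and matching section names by case-insensitive substring is an assumption about how spec.md is laid out, not a rule from the skill.

```python
import json
from pathlib import Path

REQUIRED_SPEC_SECTIONS = ("product vision", "feature list", "success criteria")


def verify_plan(harness_dir: Path) -> list[str]:
    """Return a list of problems found; an empty list means both files look fine."""
    problems = []

    spec_path = harness_dir / "spec.md"
    if not spec_path.is_file():
        problems.append("spec.md is missing")
    else:
        spec_text = spec_path.read_text(encoding="utf-8").lower()
        for section in REQUIRED_SPEC_SECTIONS:
            if section not in spec_text:  # assumed: section titles appear verbatim
                problems.append(f"spec.md lacks a '{section}' section")

    sprints_path = harness_dir / "sprints.json"
    if not sprints_path.is_file():
        problems.append("sprints.json is missing")
    else:
        try:
            data = json.loads(sprints_path.read_text(encoding="utf-8"))
        except json.JSONDecodeError as exc:
            problems.append(f"sprints.json is not valid JSON: {exc}")
        else:
            if not isinstance(data, dict) or not isinstance(data.get("sprints"), list):
                problems.append("sprints.json has no 'sprints' array")

    return problems
```

Collecting every problem instead of stopping at the first lets the user see everything that needs fixing before the harness commits or moves on.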
Print:
- "Run /harness-sprint to begin, or edit .harness/sprints.json to adjust the plan."
The .harness/ directory should be committed to version control so progress survives across sessions.
This kickoff skill uses just-in-time (JIT) context retrieval: each step reads only the minimal set of files needed at that point. Context reads are deferred to the step that actually requires them rather than front-loaded at session start.
Deferrable reads in this skill:
- bootstrap/failure-catalog.json: deferred until Step 2b, and only if it exists (lazy read)
- spec.md and sprints.json: deferred until Step 5 when the Planner has already written them
Reading all available files at Step 1 would waste context window on content that is not yet needed. For example, reading spec.md before the Planner has even written it would fail, and reading the entire project codebase to detect project type is unnecessary when manifest files suffice.
npx claudepluginhub ats-kinoshita-iso/trine-eval