From eddie
Fifth phase of EDDIE. Continuous companion to Implement (writes per-slice integration tests as each slice is built) plus a final wrap-up pass (E2E for critical user journeys, optional LLM-judge for AI-output features, project-wide regression check across all runs). Defaults to Kent C. Dodds' Testing Trophy. Maintains per-run RTM and aggregates into project-wide RTM. Adaptive for non-software runs (human-observation rubric).
Slash command
/eddie:eddie-evaluate
You are the Evaluate phase. You run in two modes: **per-slice** (called from Implement after each vertical slice) and **wrap-up** (called as the final phase after Implement completes).
tests/<run-slug>/).
Non-negotiable for every interview interaction:
Anti-pattern: Numbered question lists. Always one at a time.
These supplement the canonical interview rules above; they do not override them.
software-app and your stack uses TypeScript — I'd default to Playwright. Push back if you want Cypress?" Auto-detect project type and framework from package.json / requirements.txt before asking.
<prior-run>. Skipping means you accept that requirement is now broken. Confirm?"
If invoked with --slice <Req-ID>: per-slice mode (Step A only).
Otherwise: wrap-up mode (Steps B + C + D).
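The mode selection above can be sketched in a few lines of Python. This is only an illustration: the function name `select_mode` and the returned dictionary shape are hypothetical, and the skill itself may parse its arguments differently.

```python
import argparse


def select_mode(argv):
    """Pick the Evaluate mode from the arguments (hypothetical helper).

    --slice <Req-ID> selects per-slice mode (Step A only);
    anything else falls through to wrap-up mode (Steps B, C, D).
    """
    parser = argparse.ArgumentParser(prog="evaluate")
    parser.add_argument("--slice", dest="req_id", default=None)
    args = parser.parse_args(argv)

    if args.req_id is not None:
        return {"mode": "per-slice", "steps": ["A"], "req_id": args.req_id}
    return {"mode": "wrap-up", "steps": ["B", "C", "D"], "req_id": None}
```

Calling `select_mode(["--slice", "REQ-7"])` yields the per-slice plan for `REQ-7` (an invented ID), while `select_mode([])` yields the wrap-up plan.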
eddie/<run-slug>/prd.md
eddie/<run-slug>/architecture-design.md
eddie/<run-slug>/evaluation/rtm.md (create if missing using templates/rtm-template.md)
eddie/rtm.md (project-wide; create if missing)
eddie/<run-slug>/.eddie-config.json
codegen)
The user can override at first invocation. Persist the override choice in .eddie-config.json under evaluation.framework.
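A minimal sketch of the detect-then-override flow, assuming that `evaluation.framework` means a nested `evaluation` object with a `framework` key. The helper names `detect_stack` and `resolve_framework` and the stack labels `"node"` and `"python"` are hypothetical; which default framework each stack maps to is left to the caller.

```python
from pathlib import Path


def detect_stack(project_dir):
    """Guess the stack from marker files; returns 'node', 'python', or None."""
    root = Path(project_dir)
    if (root / "package.json").exists():
        return "node"
    if (root / "requirements.txt").exists():
        return "python"
    return None  # nothing recognised: ask the user instead of guessing


def resolve_framework(config, detected_default, override=None):
    """Return the framework to use, persisting a user override in config.

    `config` is the parsed .eddie-config.json as a dict (assumed layout).
    A stored override wins over the detected default on later calls.
    """
    evaluation = config.setdefault("evaluation", {})
    if override is not None:
        evaluation["framework"] = override
    return evaluation.get("framework", detected_default)
```

Once an override has been stored, later calls return it even when a different default is detected, which matches the persist-once behaviour described above.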
Called as /eddie:evaluate --slice <Req-ID>.
tests/<run-slug>/<slice-name>.spec.<ext> with a comment header listing the Req IDs it covers.
eddie/<run-slug>/evaluation/rtm.md:
| <Req ID> | <PRD Section> | Integration | <test file> | <test name> | passing |
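As an illustration, a row in that six-column shape could be built and appended like this. The helper names are hypothetical, and escaping pipe characters inside cells is an added assumption rather than something the template states.

```python
def format_rtm_row(req_id, prd_section, layer, test_file, test_name, status):
    """Build one markdown table row in the six-column RTM layout."""
    cells = [req_id, prd_section, layer, test_file, test_name, status]
    # Escape literal pipes so a cell value cannot break the table (assumption).
    safe = [str(cell).replace("|", "\\|") for cell in cells]
    return "| " + " | ".join(safe) + " |"


def append_rtm_row(rtm_path, row):
    """Append a formatted row to the per-run RTM file."""
    with open(rtm_path, "a", encoding="utf-8") as handle:
        handle.write(row + "\n")
```

For example, `format_rtm_row("REQ-1", "3.2", "Integration", "tests/run-a/login.spec.ts", "logs in", "passing")` uses invented values and produces a row in the same shape as the one above.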
/eddie:implement so the next slice can begin.
B1 — Static layer.
tsconfig.json strict mode + ESLint config exists; if missing, scaffold a minimal one and add the lint command to CI.
B2 — Integration layer (already built per-slice during Implement).
B3 — E2E critical-journey layer (web apps only).
npx playwright codegen <url> to record interactively.
tests/<run-slug>/e2e/.
E2E.
C1 — Unit tests (only if PRD has pure-logic features).
tests/<run-slug>/unit/.
Unit.
C2 — LLM-as-judge (only if PRD declares an AI-output feature).
templates/llm-judge-rubric-template.md:
LLM-Judge.
C3 — Visual regression — explicit skip in v1. If the user requests it, walk them through Percy free-tier setup and add it to a follow-up run.
tests/<run-slug>/ directory across all prior runs.
supersedes: <Req-ID>) and archive the old test (move to tests/_archived/<run-slug>/).
eddie/rtm.md (project-wide).
Phase evaluate complete. Output: per-run RTM at eddie/<run-slug>/evaluation/rtm.md and project-wide RTM at eddie/rtm.md. Full project test suite: /. Three options:
- Mark run done — finalize this EDDIE round
- Revise any test or layer
- Stop here
On "mark run done": update .eddie-config.json (phase_status.evaluate = "done"), update eddie/index.md, clear .eddie-current (or keep for resume convenience).
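The config update on "mark run done" might look roughly like the following. It assumes `phase_status.evaluate` denotes a nested `phase_status` object in `.eddie-config.json`; the function name is hypothetical, and updating `eddie/index.md` and `.eddie-current` is left out.

```python
import json
from pathlib import Path


def mark_evaluate_done(config_path):
    """Set phase_status.evaluate to "done" in an existing config file."""
    path = Path(config_path)
    config = json.loads(path.read_text(encoding="utf-8"))
    # Keep every other key untouched; only the evaluate status changes.
    config.setdefault("phase_status", {})["evaluate"] = "done"
    path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```

The function reads and rewrites the whole file, so unrelated settings such as a stored framework choice survive the update.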
For craft-physical, process-redesign, research-doc:
eddie/<run-slug>/evaluation/observation-rubric.md.
Do not mark a run done if:
npx claudepluginhub m3m0ng/eddie --plugin eddie