From mh
Run the evaluation suite on the current harness or a specific candidate run. Reports deterministic check results and LLM-judge assessment.
Slash command
/mh:harness-eval
This skill is limited to the following tools:
Run the evaluation suite to measure harness quality.
python3 ${CLAUDE_PLUGIN_ROOT}/scripts/eval_runner.py --eval-dir ${CLAUDE_PLUGIN_ROOT}/eval-tasks --cwd . 2>&1 || echo "Eval runner not available"
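The same invocation can be sketched in Python for callers that would rather not depend on shell operators. This is a minimal sketch, not part of the plugin: `build_eval_command` and `run_eval` are hypothetical helper names, and the only flags used are the `--eval-dir` and `--cwd` flags that appear in the command above. POSIX-style paths are assumed.

```python
import os
import subprocess

FALLBACK_MESSAGE = "Eval runner not available"


def build_eval_command(plugin_root: str, cwd: str = ".") -> list[str]:
    """Build the argv list equivalent to the shell command above."""
    return [
        "python3",
        os.path.join(plugin_root, "scripts", "eval_runner.py"),
        "--eval-dir",
        os.path.join(plugin_root, "eval-tasks"),
        "--cwd",
        cwd,
    ]


def run_eval(plugin_root: str, cwd: str = ".") -> str:
    """Run the eval runner, mirroring `2>&1 || echo ...` from the shell form."""
    try:
        result = subprocess.run(
            build_eval_command(plugin_root, cwd),
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # same effect as 2>&1
            text=True,
            check=False,
        )
    except FileNotFoundError:
        # python3 itself is not on PATH
        return FALLBACK_MESSAGE + "\n"
    output = result.stdout
    if result.returncode != 0:
        # same effect as `|| echo "Eval runner not available"`
        output += FALLBACK_MESSAGE + "\n"
    return output
```

Calling `run_eval(os.environ["CLAUDE_PLUGIN_ROOT"])` returns the runner's combined output as a string, with the fallback message appended whenever the runner exits with a non-zero status.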
For tasks with llm_judge criteria, evaluate the criteria manually.
If a specific run_id was provided as $ARGUMENTS, evaluate that candidate's artifacts in runs/{run_id}/. Otherwise, evaluate the current harness state.
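The target-selection rule above can be sketched as a small function. This is an illustration only: `resolve_eval_target` is a hypothetical name, and treating the working directory as "the current harness state" is an assumption, since the source does not say where that state lives.

```python
from pathlib import Path


def resolve_eval_target(arguments: str, cwd: Path = Path(".")) -> Path:
    """Pick the directory to evaluate from the raw $ARGUMENTS string."""
    run_id = arguments.strip()
    if run_id:
        # A run_id was given: evaluate that candidate's artifacts.
        return cwd / "runs" / run_id
    # No run_id: fall back to the current harness state
    # (assumed here to be the working directory itself).
    return cwd
```

For example, `resolve_eval_target("abc123")` points at `runs/abc123`, where `abc123` is a made-up run ID, while an empty or whitespace-only argument string falls back to the working directory.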
npx claudepluginhub yannabadie/meta-harness-ygn