From crucible
Contains evaluation data for measuring skill selection accuracy, including direct, negative, context-dependent, and cascade-ordering tests.
Slash command
/crucible:skill-selection-evals
This is not an executable skill. It contains evaluation data for measuring the accuracy of skill selection (routing) decisions.
Crucible's 49 execution evals measure quality once a skill is invoked. Selection evals measure whether the right skill gets invoked in the first place.
Each eval is rated easy, medium, or hard according to its routing ambiguity. This enables stratified baseline measurement: improvements that lift hard cases (high value) can be told apart from results that merely confirm easy cases already work (low signal).
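The stratified measurement described above can be sketched roughly as follows. This is a minimal illustration, not Crucible's actual grading code: the record fields (`difficulty`, `expected`, `selected`), the skill names, and the convention that a negative test expects no skill (`None`) are all assumptions made for the example and are not taken from evals/evals.json or GRADING.md.

```python
from collections import defaultdict

# Hypothetical graded results. The field names and skill names are
# illustrative only; they are not the schema of evals/evals.json.
results = [
    {"difficulty": "easy", "expected": "skill-a", "selected": "skill-a"},
    {"difficulty": "easy", "expected": "skill-b", "selected": "skill-b"},
    {"difficulty": "medium", "expected": "skill-a", "selected": "skill-c"},
    {"difficulty": "medium", "expected": "skill-c", "selected": "skill-c"},
    # Assumed convention: a negative test expects no skill at all (None).
    {"difficulty": "hard", "expected": None, "selected": "skill-a"},
    {"difficulty": "hard", "expected": "skill-b", "selected": "skill-b"},
]


def stratified_accuracy(records):
    """Return selection accuracy per difficulty tier."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for record in records:
        tier = record["difficulty"]
        total[tier] += 1
        # A selection counts as correct only if it matches the expectation,
        # including the "no skill expected" case.
        if record["selected"] == record["expected"]:
            correct[tier] += 1
    return {tier: correct[tier] / total[tier] for tier in total}


if __name__ == "__main__":
    for tier, accuracy in stratified_accuracy(results).items():
        print(f"{tier}: {accuracy:.0%}")
```

Reporting one number per tier, rather than a single overall accuracy, is what lets a change that only moves the hard tier stand out from one that leaves it flat.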
evals/evals.json: the eval data
GRADING.md: grading criteria and baseline measurement protocol
npx claudepluginhub raddue/crucible