From evalview
Generates EvalView test cases from SKILL.md files using an LLM, captures real agent interactions as tests, or creates individual test YAMLs manually.
Slash command
/evalview:generate-tests
Use this skill when the user wants to create test cases for their AI agent or skill without writing YAML by hand.
Use the generate_skill_tests MCP tool to auto-generate a test suite from a skill definition. This reads the SKILL.md and produces YAML test cases covering explicit triggers, implicit triggers, contextual triggers, and negative cases.
Steps:
Call generate_skill_tests with:
  skill_path: path to the SKILL.md file
  output_path (optional): where to save the generated YAML
  count (optional): number of test cases (default: 10)
Run the generated tests with run_skill_test.
CLI equivalent:
evalview skill generate-tests .claude/skills/my-skill/SKILL.md --auto
evalview skill generate-tests .claude/skills/my-skill/SKILL.md -c 20 -o tests/my-skill-tests.yaml
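For readers calling the MCP tool directly rather than through Claude, the parameters listed above can be sketched as a request payload. This is a minimal illustration, assuming the EvalView server is reached through the standard MCP `tools/call` JSON-RPC method. The helper names `build_tool_call` and `generate_skill_tests_args` are hypothetical, and passing `count` as an integer is an assumption. Only the tool name and the three parameter names come from the text above.

```python
import json


def build_tool_call(name, arguments, request_id=1):
    """Wrap a tool name and its arguments in an MCP `tools/call` request.

    Hypothetical helper: it only builds the payload, it does not send it.
    """
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


def generate_skill_tests_args(skill_path, output_path=None, count=None):
    """Collect arguments for generate_skill_tests.

    Optional parameters are omitted when unset, so the tool's own
    defaults apply (the text above gives a default count of 10).
    """
    args = {"skill_path": skill_path}
    if output_path is not None:
        args["output_path"] = output_path
    if count is not None:
        args["count"] = count  # assumption: the tool accepts an integer
    return args


request = build_tool_call(
    "generate_skill_tests",
    generate_skill_tests_args(
        ".claude/skills/my-skill/SKILL.md",
        output_path="tests/my-skill-tests.yaml",
        count=20,
    ),
)
print(json.dumps(request, indent=2))
```

In normal use Claude issues this call itself when the skill is active. The sketch only makes the shape of the arguments explicit.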
Use the create_test MCP tool to create a single test YAML file from a description.
Steps:
Call create_test with the parameters.
Run run_snapshot to establish the golden baseline.
Use the CLI evalview capture command to proxy real agent traffic and save interactions as test YAMLs automatically. This records the query, output, and tool calls from live usage.
CLI equivalent:
evalview capture --agent http://localhost:8080/execute --output-dir tests/test-cases
evalview capture --multi-turn # saves all turns as one multi-turn conversation test
Use validate_skill to check a SKILL.md for correct structure and completeness before generating tests from it.
After generating tests, execute them with run_skill_test:
  test_file: path to the generated YAML
  no_rubric: true for fast deterministic-only checks (no LLM cost)
  verbose: true for detailed output on all tests
CLI equivalent:
evalview skill test tests/my-skill-tests.yaml
evalview skill test tests/my-skill-tests.yaml --no-rubric # fast, $0
evalview skill test tests/my-skill-tests.yaml --verbose --model claude-sonnet-4-20250514
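The generate-then-test flow can also be scripted. The sketch below builds the command lines shown in this document from Python, mapping the run_skill_test options (test_file, no_rubric, verbose) onto the corresponding CLI flags. The function names `generate_command` and `skill_test_command` are hypothetical, and only flags that appear in the examples above are used. The ordering of a cheap `--no-rubric` pass before a full run is a suggestion, not something the tool requires.

```python
import shlex


def generate_command(skill_path, count=None, output=None):
    """Build `evalview skill generate-tests` with the -c and -o flags."""
    cmd = ["evalview", "skill", "generate-tests", skill_path]
    if count is not None:
        cmd += ["-c", str(count)]
    if output is not None:
        cmd += ["-o", output]
    return cmd


def skill_test_command(test_file, no_rubric=False, verbose=False, model=None):
    """Map run_skill_test-style options onto `evalview skill test` flags."""
    cmd = ["evalview", "skill", "test", test_file]
    if no_rubric:
        cmd.append("--no-rubric")  # deterministic-only checks, no LLM cost
    if verbose:
        cmd.append("--verbose")
    if model is not None:
        cmd += ["--model", model]
    return cmd


tests_yaml = "tests/my-skill-tests.yaml"
pipeline = [
    generate_command(".claude/skills/my-skill/SKILL.md", count=20, output=tests_yaml),
    skill_test_command(tests_yaml, no_rubric=True),
    skill_test_command(tests_yaml, verbose=True, model="claude-sonnet-4-20250514"),
]

# Print the commands instead of running them, so the sketch works
# without evalview installed.
for cmd in pipeline:
    print(shlex.join(cmd))
```

To execute the steps, each list can be passed to `subprocess.run(cmd, check=True)`, assuming `evalview` is on the PATH.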
npx claudepluginhub hidai25/eval-view