By matantsach
Eval runner for AI skills. Design test scenarios, run with/without skill comparisons, grade assertions, iterate on quality.
Set up evaluations for an AI skill from scratch — designs test scenarios, writes evals.json, and runs the first benchmark. Use when no evals exist yet and the user wants to evaluate, test, benchmark, or review a skill. Triggers on "evaluate my skill", "test my skill", "set up evals", "how good is my skill", "benchmark this skill", "create evals for", or any request to assess skill quality when there is no existing evals/evals.json file.
Run and iterate on existing skill evaluations. Use when evals/evals.json already exists and the user wants to run evals, re-evaluate after skill changes, check results, compare iterations, add/modify eval cases, or gate CI with thresholds. Triggers on "run evals", "re-eval", "how did it do", "check results", "compare iterations", "run benchmarks", or any eval-related request when evals already exist.
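As a rough illustration of the two descriptions above, the sketch below shows how the choice between the setup and run workflows could hinge on whether evals/evals.json exists, and how a with/without-skill comparison might feed a CI threshold. The names `RunResult`, `choose_workflow`, `pass_rate`, and `gate`, and the default threshold of 0.8, are assumptions made for this example; they are not taken from the plugin and do not reflect its actual evals.json schema or API.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass
class RunResult:
    """Assertion outcomes for one eval scenario (hypothetical shape)."""
    scenario: str
    assertions_passed: int
    assertions_total: int


def choose_workflow(skill_dir: Path) -> str:
    """Pick 'run' when evals/evals.json exists, otherwise 'setup'."""
    return "run" if (skill_dir / "evals" / "evals.json").is_file() else "setup"


def pass_rate(results: list[RunResult]) -> float:
    """Fraction of assertions that passed across all scenarios."""
    total = sum(r.assertions_total for r in results)
    passed = sum(r.assertions_passed for r in results)
    return passed / total if total else 0.0


def gate(with_skill: list[RunResult], without_skill: list[RunResult],
         min_pass_rate: float = 0.8) -> bool:
    """True when the with-skill run clears the (assumed) threshold
    and does at least as well as the baseline run without the skill."""
    skilled = pass_rate(with_skill)
    baseline = pass_rate(without_skill)
    return skilled >= min_pass_rate and skilled >= baseline
```

In a CI job, a `False` result from `gate` would translate into a non-zero exit code; requiring the with-skill run to match or beat the baseline is one possible policy, not necessarily the one the plugin applies.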
Based on adoption, maintenance, documentation, and repository signals. Not a security audit or endorsement.
npx claudepluginhub matantsach/snapeval
Watchdog plugin that detects stuck, looping, or stalled agents and nudges them back on track.
Talent scouting system for seed-stage funds — discovers, scores, and tracks pre-founders using LinkedIn intelligence and OSINT enrichment. Built for Israeli cybersecurity, configurable for other verticals.
Benchmark, evaluate, and optimize skills to ensure reliable performance across all LLMs
Agent and skill evaluation harness with MLflow integration
Representation Synthesis workflow for auditing agent skills in Claude Code.
Skill evaluation and benchmarking - test skill effectiveness with behavioral eval cases, grade results, and track quality improvements
Self-evolving skill engine for Claude Code. Creates, scores, repairs, and hardens skills autonomously through recursive improvement cycles.
Professional skill creation with TDD workflow. Features dual-mode (fast/full), behavioral validation, and automated quality gates for 9.0/10+ scores.