From team-shinchan
Views agent evaluation history and detects performance regressions. Supports specific agent views, regression detection, and comparisons.
How this command is triggered — by the user, by Claude, or both
Slash command
/team-shinchan:evalThe summary Claude sees in its command listing — used to decide when to auto-load this command
# Eval Command View agent evaluation history and detect performance regressions. See `skills/eval/SKILL.md` for full documentation. ## Usage
View agent evaluation history and detect performance regressions.
See skills/eval/SKILL.md for full documentation.
/team-shinchan:eval
/team-shinchan:eval --agent bo
/team-shinchan:eval --regression
/team-shinchan:eval --compare
npx claudepluginhub seokan-jeong/team-shinchan --plugin team-shinchan/agent-evalRuns evaluation fixtures against review agents, grades JSON outputs against expected results for status, issues, and summary matches, and reports pass/fail accuracy table. Accepts --agent, --fixture, --trials, --verbose flags.
/improve-agentOptimizes AI agents through performance analysis, user feedback review, failure classification, baseline metrics generation, and prompt engineering enhancements.
/agent-reviewPerformance scorecard for LLM agents — lists all agents with invocation count, cost, and pass rate, or drills into one agent for cost analysis, failure modes, and prompt-tuning suggestions.
/improve-agentOptimizes existing agents via performance analysis (metrics, feedback, failures), baseline reporting, and prompt engineering (chain-of-thought, few-shot examples).
/evaluate-skillEvaluates recent skill executions from logs, prompts for 1-5 effectiveness rating, friction points, and improvement suggestions, then updates log entries with JSON feedback. Supports --all and --date options.
/compareCompares the current harness evaluation side by side with a previous one for history analysis.