Use when running benchmark competitions or improving scoring accuracy. Used by pss-agent-profiler. Trigger with /pss-benchmark.
How this skill is triggered — by the user, by Claude, or both
Slash command
/perfect-skill-suggester:pss-benchmark-agentThe summary Claude sees in its skill listing — used to decide when to auto-load this skill
Benchmark protocol for Opus agents competing to improve the PSS scoring engine. Defines report structure, tracking format, sacred parameters, and anti-patterns.
Benchmark protocol for Opus agents competing to improve the PSS scoring engine. Defines report structure, tracking format, sacred parameters, and anti-patterns.
rust/skill-suggester/src/main.rsdocs_dev/benchmark-v2-prompts-100.jsonl and docs_dev/benchmark-v2-gold-100.jsondocs_dev/methodology-improvement-history.mdcargo build --release in rust/skill-suggester/Write every file to $MAIN_ROOT/reports/pss-benchmark-agent/ — the main
repo root's reports folder, never the worktree's own. Both ./reports/
and ./reports_dev/ are gitignored project-wide. Resolve the path with
this shell prologue at the start of your Bash section:
MAIN_ROOT="$(git worktree list | head -n1 | awk '{print $1}')"
REPORT_DIR="$MAIN_ROOT/reports/pss-benchmark-agent"
mkdir -p "$REPORT_DIR"
TIMESTAMP="$(date +%Y%m%d_%H%M%S%z)" # local time + GMT offset, e.g. 20260421_183012+0200
REPORT_FILE="$REPORT_DIR/$TIMESTAMP-worktree-${AGENT_ID}-report.md"
LOG_FILE="$REPORT_DIR/$TIMESTAMP-worktree-${AGENT_ID}-benchmark-log.md"
docs_dev/methodology-improvement-history.md and current main.rs$REPORT_FILE (resolved via the prologue above)$LOG_FILE (same prologue)cargo test and cargo build --releaseToken savings: Use mcp__plugin_llm-externalizer_llm-externalizer__code_task (when available) to analyze benchmark logs and scoring engine source. Use chat to compare log snapshots in parallel. Reserve Opus reasoning for scoring algorithm changes.
Copy this checklist and track your progress:
cargo build --release && uv run scripts/pss_agent_benchmark.py --binary target/release/pss
$MAIN_ROOT/reports/pss-benchmark-agent/<TS±TZ>-worktree-{AGENT_ID}-report.md -- structured report with all mandatory sections$MAIN_ROOT/reports/pss-benchmark-agent/<TS±TZ>-worktree-{AGENT_ID}-benchmark-log.md -- per-prompt benchmark results (append-only)20260421_183012+0200). Both dirs are gitignored project-wide.rust/skill-suggester/src/main.rs -- with improvements to the scoring engineIf the benchmark script fails or produces unexpected output, check docs_dev/methodology-improvement-history.md for known issues.
rust/skill-suggester/src/main.rs — scoring enginedocs_dev/benchmark-v2-prompts-100.jsonl — promptsdocs_dev/benchmark-v2-gold-100.json — gold standarddocs_dev/methodology-improvement-history.md — historyCreates, edits, and optimizes skills for Claude Code, including drafting, evaluating with test prompts, iterating on performance, and improving skill descriptions for better triggering accuracy.
npx claudepluginhub emasoft/emasoft-plugins --plugin perfect-skill-suggester