From workflow-optimizer
Run a workflow N times and compute aggregate metrics (success rate, duration, failures). Standalone — does not invoke other skills.
Slash command: `/workflow-optimizer:measure`
Arguments:

1. **workflow-path**: path to the workflow definition (`workflow.md`)
Read the workflow.md file. Extract:
If the workflow has a setup step, run it first.
For each run `i` (1..N), run the workflow against fixture `fixtures[i % fixtures.length]` and record:

    Run {i}:
      success: PASS / FAIL
      fixture: {fixture_id}
      duration_ms: {wall clock time}
      error: {error message or "none"}
      output: {first 500 chars of stdout}
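Here is a minimal Python sketch of the per-run bookkeeping. It assumes fixtures are identified by string IDs. `RunRecord`, `pick_fixture`, and `render_record` are hypothetical names, not part of the plugin. The sketch only shows how the fixture index cycles and how the record fields map onto the `Run {i}:` layout.

```python
from dataclasses import dataclass


@dataclass
class RunRecord:
    """One run's result; fields mirror the record layout above (hypothetical type)."""
    index: int
    success: bool
    fixture_id: str
    duration_ms: float
    error: str = "none"
    output: str = ""


def pick_fixture(fixtures: list[str], i: int) -> str:
    # Cycle through fixtures so N may exceed the number of fixtures.
    return fixtures[i % len(fixtures)]


def render_record(r: RunRecord) -> str:
    # Keep only the first 500 characters of stdout, as the skill specifies.
    return "\n".join([
        f"Run {r.index}:",
        f"  success: {'PASS' if r.success else 'FAIL'}",
        f"  fixture: {r.fixture_id}",
        f"  duration_ms: {r.duration_ms:.0f}",
        f"  error: {r.error}",
        f"  output: {r.output[:500]}",
    ])
```

Runs are numbered from 1, so run 1 uses the second fixture when there are several. Every fixture is still used once per `len(fixtures)` consecutive runs.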
If a run exceeds the timeout, mark it as FAIL with error "TIMEOUT".
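One way to apply the timeout rule is sketched below. It assumes, hypothetically, that a single run can be launched as a subprocess command. `run_once` and its `cmd` argument are illustrative only; the skill itself runs workflows through Claude's tools, not through this function.

```python
import subprocess
import sys
import time


def run_once(cmd: list[str], timeout_s: float) -> dict:
    """Run one invocation; exceeding the timeout is a FAIL with error "TIMEOUT"."""
    start = time.perf_counter()
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child and waits for it before re-raising.
        return {
            "success": False,
            "error": "TIMEOUT",
            "duration_ms": (time.perf_counter() - start) * 1000,
            "output": "",
        }
    duration_ms = (time.perf_counter() - start) * 1000
    ok = proc.returncode == 0
    return {
        "success": ok,
        # Fall back to the exit code when the process wrote nothing to stderr.
        "error": "none" if ok else (proc.stderr.strip() or f"exit code {proc.returncode}"),
        "duration_ms": duration_ms,
        "output": proc.stdout[:500],
    }


if __name__ == "__main__":
    # Example: a command that finishes well within the timeout.
    print(run_once([sys.executable, "-c", "print('hello')"], timeout_s=5.0))
```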
For workflows with no shared state between runs, use the Agent tool to run subsets concurrently (up to 3 agents). Each agent writes results to .workflow-optimizer/{workflow-id}/runs/run-{i}.md.
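The Agent tool is not a Python API, so this sketch uses a thread pool as a stand-in to show the shape of the concurrent path. Runs are split round-robin into at most three disjoint subsets, and each run writes its own `runs/run-{i}.md`. `split_runs`, `run_concurrently`, and the `run_fn` callback are hypothetical.

```python
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Callable


def split_runs(n: int, workers: int = 3) -> list[list[int]]:
    """Assign runs 1..n round-robin to at most `workers` subsets."""
    subsets: list[list[int]] = [[] for _ in range(min(workers, n))]
    for i in range(1, n + 1):
        subsets[(i - 1) % len(subsets)].append(i)
    return subsets


def run_concurrently(
    n: int, out_dir: Path, run_fn: Callable[[int], str], workers: int = 3
) -> list[Path]:
    """Execute runs 1..n in parallel subsets; each run writes runs/run-{i}.md."""
    runs_dir = out_dir / "runs"
    runs_dir.mkdir(parents=True, exist_ok=True)

    def worker(indices: list[int]) -> list[Path]:
        # Each worker owns a disjoint set of run indices, so no two
        # workers ever write the same file.
        written = []
        for i in indices:
            path = runs_dir / f"run-{i}.md"
            path.write_text(run_fn(i), encoding="utf-8")
            written.append(path)
        return written

    with ThreadPoolExecutor(max_workers=workers) as pool:
        batches = list(pool.map(worker, split_runs(n, workers)))
    # Flatten and restore run order (1..n) for the aggregation step.
    return sorted((p for batch in batches for p in batch),
                  key=lambda p: int(p.stem.split("-")[1]))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for p in run_concurrently(5, Path(tmp), lambda i: f"Run {i}:\n"):
            print(p.name)
```

The subsets are disjoint, so each worker touches only its own files and no locking is needed. This holds only when runs share no state, which is why the skill restricts concurrency to that case.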
Compute across all N runs:
| Metric | Formula |
|---|---|
| Success rate | pass_count / N (reported as a percentage) |
| Avg duration | mean(duration_ms) (reported in seconds) |
| Failure distribution | count per error type |
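The three metrics in the table can be computed in a few lines. This sketch assumes each run record is a dict with `success`, `duration_ms`, and `error` keys. It also assumes `error` already holds an error type such as `TIMEOUT` rather than a free-form message. `aggregate` is a hypothetical helper and expects at least one run.

```python
from collections import Counter
from statistics import mean


def aggregate(runs: list[dict]) -> dict:
    """Compute success rate, average duration, and failure distribution."""
    n = len(runs)  # Assumes n >= 1; n == 0 would raise ZeroDivisionError below.
    pass_count = sum(1 for r in runs if r["success"])
    # Count failures per error type, most frequent first.
    failures = Counter(r["error"] for r in runs if not r["success"])
    return {
        "n": n,
        "success_rate": pass_count / n,
        "avg_duration_ms": mean(r["duration_ms"] for r in runs),
        "failures": dict(failures.most_common()),
    }
```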
Report the aggregate metrics in this format:

    ============================================================
    {workflow-name} — {N} runs
    ============================================================
    Success Rate : XX.X%
    Avg Duration : XX.Xs
    Failures:
      TIMEOUT 2
      TIMING 1
    ============================================================
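The sketch below renders the summary layout above from already-computed metrics, converting milliseconds to seconds for `Avg Duration`. `format_report` is hypothetical. The 60-character rule width, the padding on failure lines, and printing `none` when nothing failed are assumptions rather than part of the template.

```python
RULE = "=" * 60  # assumed width of the "=" rule in the template


def format_report(
    name: str, n: int, success_rate: float, avg_duration_ms: float,
    failures: dict[str, int],
) -> str:
    """Render aggregate metrics in the summary layout (hypothetical helper)."""
    lines = [
        RULE,
        f"{name} \u2014 {n} runs",  # \u2014 is the dash used in the template
        RULE,
        f"Success Rate : {success_rate * 100:.1f}%",
        f"Avg Duration : {avg_duration_ms / 1000:.1f}s",  # ms -> seconds
        "Failures:",
    ]
    if failures:
        # Most frequent error type first; ties broken alphabetically.
        for err, count in sorted(failures.items(), key=lambda kv: (-kv[1], kv[0])):
            lines.append(f"  {err:<12} {count}")
    else:
        lines.append("  none")
    lines.append(RULE)
    return "\n".join(lines)


if __name__ == "__main__":
    print(format_report("checkout-flow", 10, 0.7, 12345.0, {"TIMEOUT": 2, "TIMING": 1}))
```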
Write results to .workflow-optimizer/{workflow-id}/baseline.md with the aggregate metrics and per-run details.
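Here is a sketch of persisting the baseline. It assumes the report text and per-run records have already been rendered as strings. `write_baseline` and the `## Aggregate metrics` / `## Per-run details` section layout are hypothetical. The skill only specifies the path and that both aggregate metrics and per-run details are included.

```python
import tempfile
from pathlib import Path


def _indent(text: str) -> str:
    # Four leading spaces make Markdown render the text verbatim.
    return "\n".join("    " + line for line in text.splitlines())


def write_baseline(root: Path, workflow_id: str, report: str, run_records: list[str]) -> Path:
    """Write metrics and per-run details to .workflow-optimizer/{workflow-id}/baseline.md."""
    path = root / ".workflow-optimizer" / workflow_id / "baseline.md"
    path.parent.mkdir(parents=True, exist_ok=True)
    sections = ["## Aggregate metrics", _indent(report), "## Per-run details"]
    sections += [_indent(record) for record in run_records]
    path.write_text("\n\n".join(sections) + "\n", encoding="utf-8")
    return path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        out = write_baseline(Path(tmp), "demo", "Success Rate : 50.0%", ["Run 1:\n  success: PASS"])
        print(out.read_text(encoding="utf-8"))
```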
Install: `npx claudepluginhub yihan2099/workflow-optimizer --plugin workflow-optimizer`