From mycelium
A/B tests CLAUDE.md instruction changes against optimization and holdout eval benchmarks. Captures baselines, tests variants, generates comparison reports flagging overfitting.
How this skill is triggered — by the user, by Claude, or both
Slash command
/mycelium:prompt-optimizerThe summary Claude sees in its skill listing — used to decide when to auto-load this skill
Systematically improve Mycelium instructions through measurement. Adapted from n-trax.
Systematically improve Mycelium instructions through measurement. Adapted from n-trax.
baseline -- Capture current performance/mycelium:eval-runner run-split optimization — record as optimization scores/mycelium:eval-runner run-split holdout — record as holdout scores.claude/optimization/baseline.json: timestamp, CLAUDE.md hash, optimization metrics, holdout metrics, overall and per-category metricstest <variant> -- Test a variant.claude/optimization/variants/<variant>.md/mycelium:eval-runner run-split optimization — this is the hill-climbing signal/mycelium:eval-runner run-split holdout — this validates generalization.claude/optimization/results/<variant>.jsonreport -- Compare all variantsGenerate comparison table with split-aware columns:
| Variant | Opt Pass Rate | Holdout Pass Rate | Delta Opt | Delta Holdout | Overfit? | Decision |
Flag Overfit? = YES when optimization delta is positive but holdout delta is negative.
exemplar <eval-name> -- Capture winning trajectoryAfter a clean eval win (1 iteration, fast), save the approach to .claude/optimization/exemplars/.
npx claudepluginhub haabe/mycelium --plugin myceliumRuns autonomous optimization loops to iteratively improve prompts, templates, configs, or code using four-way separation of main agent, eval agent, test runner, and deterministic eval.py judge. Invoke via /autoresearch or 'optimize this prompt'.
Designs test cases, adversarial inputs, and iterates on prompts based on eval results. Useful for prompt-engineering tasks like drafting, testing, and refining prompts and skills.