From viv
Trace a simulation regression across vivarium repositories to identify the behavioral change causing it.
How this command is triggered — by the user, by Claude, or both
Slash command
/viv:model-regression-debugger Describe the regression symptom, repos/branches involved, and any researcher hypotheses.
The summary Claude sees in its command listing — used to decide when to auto-load this command
Investigate a simulation regression described by: $ARGUMENTS The fan-out runs in this main-session context (Claude sub-agents cannot spawn further sub-agents, so the `model_regression_debugger` orchestrator agent cannot do this on its own — that's why this slash command exists). ## Phase 1 — Scope the Problem Gather from $ARGUMENTS (and ask the user for anything missing): - **Symptom**: What metric is wrong and in what direction? (e.g., "incidence too low", "mortality underestimated by 15%") - **Repos**: Which repositories are involved? - **Affected entities**: Which diseases, risks, or...
Investigate a simulation regression described by: $ARGUMENTS
The fan-out runs in this main-session context (Claude sub-agents cannot
spawn further sub-agents, so the `model_regression_debugger` orchestrator
agent cannot do this on its own — that's why this slash command exists).
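Gather from $ARGUMENTS (and ask the user for anything missing):

- **Symptom**: What metric is wrong and in what direction? (e.g., "incidence too low", "mortality underestimated by 15%")
- **Repos**: Which repositories are involved?
- **Affected entities**: Which diseases, risks, or...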
The goal is to establish a good ref and a bad ref for each relevant repository.
- `git log --after=... --before=... --oneline` to find candidate boundary commits.
- `git bisect` (with `git bisect run` if a quick test exists, manual otherwise).

For each repository with established good/bad refs, invoke a
_diff_analyzer sub-agent in parallel (one Agent call per repo, all
in a single message). For each, provide:
Wait for all diff analyses to complete before proceeding.
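
For the `git bisect run` path mentioned in the ref-finding step, the "quick test" has to report its result through its exit code: 0 marks a commit good, 125 asks bisect to skip it, and any other code from 1 to 127 marks it bad. The following is a minimal sketch of such a predicate, not part of the command itself. The function names, the expected value of 0.12, and the 5% tolerance are all hypothetical placeholders, and `run_quick_test` stands in for whatever fast check reproduces the symptom.

```python
import math

# Exit codes understood by `git bisect run`.
GOOD, BAD, SKIP = 0, 1, 125


def classify(observed, expected, rel_tol=0.05):
    """Map an observed metric value to a bisect exit code.

    `observed` is None or NaN when the quick test could not produce a
    value (for example, the commit does not build); such commits are
    skipped rather than blamed.
    """
    if observed is None or math.isnan(observed):
        return SKIP
    if math.isclose(observed, expected, rel_tol=rel_tol):
        return GOOD
    return BAD


def run_quick_test():
    """Hypothetical placeholder: run a short simulation, return the metric."""
    raise NotImplementedError("replace with a fast reproduction of the symptom")


def main():
    try:
        value = run_quick_test()
    except Exception:
        value = None  # untestable commit: let bisect skip it
    # 0.12 is an assumed known-good value, used only for illustration.
    return classify(value, expected=0.12)

# When saved as a script, end the file with: sys.exit(main())
```

Assuming the script is saved as `check_metric.py` (a made-up name), the bisect would run as `git bisect start <bad-ref> <good-ref>` followed by `git bisect run python check_metric.py`.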
Starting from the affected output metric, trace backward through the code to find where old and new behavior diverge:
- `git show <ref>:<path>` and the current code side by side

From the diff analyses and data flow tracing, formulate specific
hypotheses. Then invoke a _hypothesis_tester sub-agent in parallel
for each hypothesis (one Agent call per hypothesis, all in a single
message). For each, provide:
Collect all verdicts (CONFIRMED / REFUTED / INCONCLUSIVE) before proceeding.
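
One way to keep the collected verdicts organized before writing up findings is a small record per hypothesis plus a tally. This is only an illustrative sketch: the `Verdict`, `HypothesisResult`, and `summarize` names are assumptions, not something the sub-agents are known to emit.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class Verdict(Enum):
    CONFIRMED = "CONFIRMED"
    REFUTED = "REFUTED"
    INCONCLUSIVE = "INCONCLUSIVE"


@dataclass
class HypothesisResult:
    hypothesis: str   # free-text statement that was tested
    verdict: Verdict  # outcome reported for that hypothesis


def summarize(results):
    """Return (verdict counts, hypotheses that were confirmed)."""
    counts = Counter(r.verdict for r in results)
    confirmed = [r.hypothesis for r in results if r.verdict is Verdict.CONFIRMED]
    return counts, confirmed
```

Anything left INCONCLUSIVE stays visible in the counts, so it can be flagged for follow-up rather than silently dropped.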
Structure findings with these sections:
When analyzing vivarium framework code, avoid these mistakes:
- `sim.get_value()` for AttributePipelines. AttributePipelines are read via `population_view.get(index, pipeline_name)`.
- `register_*` calls.
- `pivot_categorical` and other data transformation utilities. Their signatures and behavior may have changed.

npx claudepluginhub ihmeuw/vivarium-suite --plugin viv