Execute the evidence-gated RL experiment loop: launch approved training/eval jobs, parse results, update the report, decide accept/reject/inconclusive, and propose the next experiment within the approved budget.
Slash command: /rl-experiment-assistant:rl-experiment-loop
Run the iterative experiment loop only after `.rlxp/contract.yaml` and `.rlxp/report.md` record task, metrics, tuning scope, budget, hardware, baseline command, and evaluation protocol.
The agent owns the operational loop. Run approved shell commands, helper scripts, metric parsers, report updates, and subagent reviews yourself when available; ask the user only for missing approvals, credentials, hardware access, or policy decisions. Do not turn bundled Python helper invocations into manual user instructions.
Use these files as the source of truth:
.rlxp/adapter.yaml: canonical repo-specific command/config/log mapping.
.rlxp/contract.yaml: launch gate for confirmed task, metrics, tuning scope, budget, hardware, baseline, and eval protocol.
.rlxp/report.md: living human-readable report.
.rlxp/experiments.yaml: proposed and approved experiment queue.
.rlxp/ledger.jsonl: append-only machine-readable run ledger.
.rlxp/runs/<experiment_id>/: logs, command, config diff, metrics, checkpoints, videos, result summary.
If .rlxp/contract.yaml is missing, incomplete, contradicts .rlxp/adapter.yaml, lacks status: approved_for_launch, or lacks derived training_allowed: true, stop and ask for confirmation. Do not treat an approved experiment entry as sufficient by itself.
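The launch gate described above can be sketched as a pure check over an already-parsed contract. This is a minimal illustration, not the plugin's implementation: it assumes `status` and `training_allowed` sit at the top level of the contract, and the names in `REQUIRED_FIELDS` are hypothetical keys standing in for the items the contract must record. It does not cover the cross-check against .rlxp/adapter.yaml.

```python
from typing import Any, List, Mapping, Optional

# Hypothetical key names for the items the contract must record;
# the real contract schema may name or nest them differently.
REQUIRED_FIELDS = (
    "task",
    "metrics",
    "tuning_scope",
    "budget",
    "hardware",
    "baseline_command",
    "eval_protocol",
)


def launch_blockers(contract: Optional[Mapping[str, Any]]) -> List[str]:
    """Return the reasons the launch gate is closed; an empty list means open."""
    if not contract:
        return ["contract is missing or empty"]
    # A field that is absent or empty counts as incomplete.
    blockers = [
        f"missing field: {name}"
        for name in REQUIRED_FIELDS
        if not contract.get(name)
    ]
    if contract.get("status") != "approved_for_launch":
        blockers.append("status is not approved_for_launch")
    # Require the literal boolean True, not merely a truthy value.
    if contract.get("training_allowed") is not True:
        blockers.append("training_allowed is not true")
    return blockers
```

Returning a list of reasons rather than a single boolean makes it straightforward to tell the user exactly which confirmation is still missing before stopping.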
For each approved experiment:
Re-read .rlxp/contract.yaml immediately before launch.
Verify contract status is approved_for_launch, all required confirmations are explicit, approval record is complete, budget remains, hardware target is available, and the queued experiment is inside approved scope.
Re-read .rlxp/adapter.yaml; prefer commands.train_command_runnable or commands.runnable_train_template.
Create an isolated run directory and, if editing code, use a git branch/worktree or clean patch when the target repo is version-controlled.
Record command, git commit if applicable, diff, environment, seed(s), GPU IDs, and expected output paths before launch.
For Holosoma scene traversal, the canonical raw command is:
python -m holosoma.train_agent exp:g1-29dof-scene-traversal-hurdle logger:wandb-packman-scene-traversal
and the runnable command is:
mamba run -n hssim python -m holosoma.train_agent exp:g1-29dof-scene-traversal-hurdle logger:wandb-packman-scene-traversal
Re-read the contract again after any code/config patch. Block if scope, baseline, eval protocol, or budget no longer match.
Run a dry run or smoke test only if the contract approves it and it fits within budget.
Launch training only within approved budget.
Capture stdout/stderr and structured logs.
Run evaluation using the approved eval protocol.
Parse metrics, using scripts/rlxp_score.py internally when it fits the log format, and update .rlxp/report.md and .rlxp/ledger.jsonl.
When subagents are available, use independent reviewers for metric validity and reward/curriculum/DR risk before accepting a run or launching a changed reward/curriculum configuration.
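The ledger update in the steps above relies on .rlxp/ledger.jsonl being append-only. A minimal sketch of that pattern follows, assuming one JSON object per line; the entry fields used in the usage note (`experiment_id`, `decision`) are hypothetical, and the helper names are illustrative rather than part of the bundled scripts.

```python
import json
from pathlib import Path
from typing import Any, Dict, List, Mapping


def append_ledger_entry(ledger_path: Path, entry: Mapping[str, Any]) -> None:
    """Append one run record as a single JSON line; earlier lines are never rewritten."""
    ledger_path.parent.mkdir(parents=True, exist_ok=True)
    line = json.dumps(dict(entry), sort_keys=True)
    # Mode "a" only ever adds to the end of the file.
    with ledger_path.open("a", encoding="utf-8") as fh:
        fh.write(line + "\n")


def read_ledger(ledger_path: Path) -> List[Dict[str, Any]]:
    """Load every recorded run in the order it was appended."""
    with ledger_path.open(encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]
```

For example, `append_ledger_entry(Path(".rlxp/ledger.jsonl"), {"experiment_id": "exp-001", "decision": "inconclusive"})` adds one record, and `read_ledger` returns the full history for comparison against the baseline.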
Classify each run as accept, reject, or inconclusive.
Never accept a change solely because training reward increased.
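One way to honor this rule is to keep training reward out of the decision function entirely. The sketch below is an assumption-laden illustration, not the skill's actual policy: it assumes a single evaluation metric where higher is better, a caller-supplied `noise_margin` (for example, derived from seed-to-seed variation), and that any guardrail failure leads to rejection.

```python
def classify_run(eval_delta: float, noise_margin: float, guardrails_ok: bool) -> str:
    """Classify a run from evaluation evidence only.

    eval_delta: candidate eval metric minus baseline/current-best eval metric
        (assumed higher-is-better).
    noise_margin: smallest difference treated as a real change (assumed given).
    guardrails_ok: whether all guardrail checks passed.

    Training reward is deliberately not a parameter.
    """
    if noise_margin < 0:
        raise ValueError("noise_margin must be non-negative")
    if not guardrails_ok:
        return "reject"  # assumption: a guardrail failure overrides any gain
    if eval_delta > noise_margin:
        return "accept"
    if eval_delta < -noise_margin:
        return "reject"
    return "inconclusive"  # the change is within the noise band
```

Because the function has no training-reward argument, a run whose reward rose while evaluation stayed flat falls into the inconclusive band rather than being accepted.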
Choose the next experiment from evidence:
After each loop, produce:
# Experiment Update
## Run summary
## Metric comparison against baseline/current best
## Guardrail status
## Decision
## Evidence for decision
## Remaining budget
## Next proposed experiment
## Report updates
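The update skeleton above can be rendered programmatically so that no section is silently dropped. This is a minimal sketch: the section titles are taken from the template, while the function name, the `bodies` mapping, and the placeholder text are assumptions made for illustration.

```python
from typing import Mapping

# Section titles in the order the template lists them.
SECTIONS = (
    "Run summary",
    "Metric comparison against baseline/current best",
    "Guardrail status",
    "Decision",
    "Evidence for decision",
    "Remaining budget",
    "Next proposed experiment",
    "Report updates",
)


def render_update(bodies: Mapping[str, str], placeholder: str = "(not recorded)") -> str:
    """Render the experiment update; sections without a body get a visible placeholder."""
    lines = ["# Experiment Update", ""]
    for title in SECTIONS:
        lines.append(f"## {title}")
        lines.append(bodies.get(title, "").strip() or placeholder)
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"
```

Calling `render_update({"Decision": "inconclusive"})` yields all eight headings in order, with the placeholder marking every section that still needs evidence.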
npx claudepluginhub junhyekh/rlxp --plugin rl-experiment-assistant