Skill

rl-experiment-plan

Turn an audited RL task into a user-confirmed experiment plan with metrics, tuning scope, GPU budget, launcher commands, report skeleton, and first baseline/ablation proposal. Use before launching RL training jobs.

Invocation

Slash command: /rl-experiment-assistant:rl-experiment-plan

Context Preview

Create a consensus-gated experiment plan for RL training. This skill runs after `rl-task-audit` or after the user provides equivalent task/context details.

SKILL.md

105 lines · ~1.2k tokens

Stats

Language: Python
Parent stars: 0
Maintenance: Good
Last commit: May 14, 2026

Non-negotiable gates

Before any GPU-consuming run, obtain or record user confirmation for:

Task definition.
Primary metric and guardrail metrics.
Tuning scope: allowed reward parameters, reward code edits, curriculum/adaptive-sampling changes, domain-randomization changes, disallowed changes.
Budget: max GPU-hours, wall-clock per run, max iterations, seeds, parallelism.
Hardware: machine names or GPU IDs, launcher type, queue constraints.
Baseline command and evaluation command.
Contract file: .rlxp/contract.yaml records the launch gate and explicitly lists missing confirmations.

If any confirmation is absent, create or update .rlxp/report.md and list the exact confirmations that are missing. Do not launch training.
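
The gate check can be pictured as a small lookup over recorded confirmations. The following is a minimal sketch, not the plugin's implementation: the gate names in `REQUIRED_CONFIRMATIONS` and both function names are hypothetical, and the real keys are whatever .rlxp/contract.yaml defines.

```python
# Hypothetical gate names; the real keys are defined by .rlxp/contract.yaml.
REQUIRED_CONFIRMATIONS = (
    "task_definition",
    "metrics",
    "tuning_scope",
    "budget",
    "hardware",
    "baseline_command",
    "evaluation_command",
)


def missing_confirmations(confirmations):
    """Return the required gates that are not explicitly confirmed."""
    return [
        name
        for name in REQUIRED_CONFIRMATIONS
        if confirmations.get(name) is not True
    ]


def may_launch(confirmations):
    """Training may start only when no required confirmation is missing."""
    return not missing_confirmations(confirmations)


# Example: only two gates confirmed, one explicitly unconfirmed.
recorded = {"task_definition": True, "metrics": True, "budget": False}
print(missing_confirmations(recorded))
print(may_launch(recorded))
```

Treating anything other than an explicit `True` as missing keeps the default on the blocked side, which matches the rule that an absent confirmation must never lead to a launch.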

Run initialization and report setup yourself when local shell access and edit permission are available. The user-facing interface is the plugin prompt, not manual execution of bundled Python scripts.

Contract status model

draft_blocked: the generated default. training_allowed is false and required confirmations are missing.
ready_for_user_confirmation: the audit/plan has filled enough fields for user review; launch is still blocked.
approved_for_launch: all required confirmations and the approval record are explicit. training_allowed is derived from those records and applies only to queued experiments inside the approved scope and budget.
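
One way to read the three statuses is as a pure function of the recorded fields and the approval record. The sketch below assumes a hypothetical `ContractField` shape, treats "filled enough" as "every field has a value", and reduces the scope/budget check to a single GPU-hour comparison; none of these details come from the contract template itself.

```python
from dataclasses import dataclass


@dataclass
class ContractField:
    """Hypothetical shape of one contract entry."""
    value: object = None     # filled in by the audit/plan step
    confirmed: bool = False  # set only after explicit user confirmation


def contract_status(fields, approval_record):
    """Derive the status from recorded fields; never store it independently."""
    if any(field.value is None for field in fields.values()):
        return "draft_blocked"
    if approval_record and all(field.confirmed for field in fields.values()):
        return "approved_for_launch"
    return "ready_for_user_confirmation"


def training_allowed(fields, approval_record, requested_gpu_hours, approved_gpu_hours):
    """Allowed only when approved and the request fits the approved budget."""
    approved = contract_status(fields, approval_record) == "approved_for_launch"
    return approved and requested_gpu_hours <= approved_gpu_hours


draft = {"budget": ContractField()}
review = {"budget": ContractField(value="40 GPU-hours")}
approved = {"budget": ContractField(value="40 GPU-hours", confirmed=True)}
record = {"approved_by": "user"}  # hypothetical approval record

print(contract_status(draft, None))
print(contract_status(review, None))
print(contract_status(approved, record))
print(training_allowed(approved, record, requested_gpu_hours=50, approved_gpu_hours=40))
```

Deriving `training_allowed` rather than storing it means a stale flag cannot outlive a withdrawn confirmation.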

Planning principles

Prefer baseline replication, a short smoke test, or metric instrumentation over immediate creative reward rewrites.
Use held-out evaluation whenever possible. Do not optimize only training reward.
Define a scalar comparison score while keeping component metrics visible.
Use at least one safety/physical-plausibility guardrail for robotics.
Keep each experiment minimally confounded unless the user explicitly allows bundled changes.
Prefer reward parameter tuning before reward engineering; prefer curriculum/DR changes only when diagnostics indicate distribution/difficulty mismatch.
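
The principles about a scalar comparison score and guardrails can be illustrated with a short sketch. The metric names, weights, and limit below are invented for illustration only; the actual score formula is whatever the user confirms in the plan.

```python
def comparison_score(metrics, weights):
    """Weighted sum of the selected component metrics."""
    return sum(weight * metrics[name] for name, weight in weights.items())


def violated_guardrails(metrics, limits):
    """Names of guardrail metrics that exceed their upper limit."""
    return [name for name, limit in limits.items() if metrics[name] > limit]


# Hypothetical held-out evaluation metrics for one run.
heldout = {
    "success_rate": 0.8,
    "tracking_error": 0.2,
    "torque_limit_violation_rate": 0.03,
}
weights = {"success_rate": 1.0, "tracking_error": -0.5}
limits = {"torque_limit_violation_rate": 0.05}

report = {
    "score": comparison_score(heldout, weights),
    "components": heldout,  # keep component metrics visible next to the score
    "violated_guardrails": violated_guardrails(heldout, limits),
}
print(report)
```

Keeping the guardrail outside the weighted sum means a run cannot buy back a safety violation with a higher task score; it is reported separately and can disqualify the run outright.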

Report setup

Create .rlxp/ if missing by running the bundled helper internally:

python <plugin-root>/scripts/rlxp_init.py --root . --project-name <name>

For Holosoma scene traversal, when the user supplies equivalent setup details, pass the train command and the config fragments as separate flags so that no config fragment is recorded twice:

python <plugin-root>/scripts/rlxp_init.py \
  --root . \
  --project-name holosoma-scene-traversal \
  --profile holosoma-scene-traversal \
  --mamba-env hssim \
  --train-command "python -m holosoma.train_agent" \
  --exp-fragment "exp:g1-29dof-scene-traversal-hurdle" \
  --logger-fragment "logger:wandb-packman-scene-traversal"

Record the raw command and the runnable command (the raw command wrapped in the environment launcher) separately:

python -m holosoma.train_agent exp:g1-29dof-scene-traversal-hurdle logger:wandb-packman-scene-traversal
mamba run -n hssim python -m holosoma.train_agent exp:g1-29dof-scene-traversal-hurdle logger:wandb-packman-scene-traversal
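
The relationship between the two recorded commands can be sketched as follows. This is an illustration under assumptions, not the logic of rlxp_init.py: the function names are hypothetical, and the de-duplication rule (skip a fragment already present in the train command) is one plausible reading of "avoid duplicate config fragments".

```python
import shlex


def raw_command(train_command, fragments):
    """Join the train command with config fragments, skipping duplicates."""
    tokens = shlex.split(train_command)
    for fragment in fragments:
        if fragment not in tokens:
            tokens.append(fragment)
    return shlex.join(tokens)


def runnable_command(raw, mamba_env):
    """Wrap the raw command so it runs inside the named mamba environment."""
    return shlex.join(["mamba", "run", "-n", mamba_env] + shlex.split(raw))


raw = raw_command(
    "python -m holosoma.train_agent",
    [
        "exp:g1-29dof-scene-traversal-hurdle",
        "logger:wandb-packman-scene-traversal",
    ],
)
runnable = runnable_command(raw, "hssim")
print(raw)
print(runnable)
```

Keeping the raw form separate makes it possible to swap the environment wrapper later without touching the experiment definition.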

Then update:

.rlxp/adapter.yaml, using the fields of the canonical templates/adapter-template.yaml.
.rlxp/contract.yaml, using the status and confirmation fields of the canonical templates/contract-template.yaml.
.rlxp/report.md.
.rlxp/experiments.yaml.

If the current working directory is not the target RL repository, use the absolute target path for --root and write updates there. Do not write generated .rlxp/ state into the plugin package or installed plugin cache.
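
A guard for this rule might look like the sketch below. The function name and the example paths are hypothetical; the only behavior taken from the text is that generated state goes under the target repository and never under the plugin package or cache.

```python
from pathlib import Path


def rlxp_dir(target_root, plugin_root):
    """Return <target_root>/.rlxp, refusing roots inside the plugin package."""
    root = Path(target_root).resolve()
    plugin = Path(plugin_root).resolve()
    if root == plugin or plugin in root.parents:
        raise ValueError(f"refusing to write .rlxp/ state under plugin path: {root}")
    return root / ".rlxp"


# Hypothetical locations for the target repository and the installed plugin.
print(rlxp_dir("/srv/work/my-rl-repo", "/opt/plugins/rl-experiment-assistant"))
```

Resolving both paths first makes the comparison robust to relative paths and symlinked working directories.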

First-run proposal template

The first proposal should usually be one of:

Baseline replication: same command as existing repo baseline, fixed seed(s), parse metrics.
Short smoke test: small env count / short horizon to validate command, logging, and metric extraction.
Metric instrumentation run: no training change; add logging for reward components, terminations, tracking errors, contacts, or per-scene bins.

Only propose reward changes after baseline evidence exists, unless the codebase already has a known failing baseline and the user explicitly approves direct intervention.

Output

Return:

# Experiment Plan

## Confirmed objective
## Metrics and score formula
## Guardrails
## Approved tuning scope
## Budget and hardware
## Baseline command
## Evaluation command
## Contract gate
## First experiment proposal
## Report files created/updated
## Items still requiring confirmation
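
The skeleton above can be rendered mechanically so that unfilled sections stay visible instead of being silently dropped. The following is a minimal sketch; the `render_plan` helper and its placeholder text are hypothetical and not part of the bundled scripts.

```python
SECTIONS = [
    "Confirmed objective",
    "Metrics and score formula",
    "Guardrails",
    "Approved tuning scope",
    "Budget and hardware",
    "Baseline command",
    "Evaluation command",
    "Contract gate",
    "First experiment proposal",
    "Report files created/updated",
    "Items still requiring confirmation",
]

PLACEHOLDER = "TODO: not yet confirmed"


def render_plan(content):
    """Render every section in order, marking the ones without content."""
    lines = ["# Experiment Plan", ""]
    for heading in SECTIONS:
        lines.append(f"## {heading}")
        lines.append(content.get(heading, PLACEHOLDER))
        lines.append("")
    return "\n".join(lines)


plan = render_plan({"Contract gate": "draft_blocked"})
print(plan)
```

Any section still showing the placeholder is a natural candidate for the final "Items still requiring confirmation" list.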
