Audit an RL training codebase to infer task definition, training/eval commands, rewards, terminations, curriculum, domain randomization, logs, and metrics before planning experiments. Use for robotics/RL codebases, reward tuning, curriculum design, domain-randomization design, or ambiguous task requests.
Slash command: /rl-experiment-assistant:rl-task-audit
You are auditing an RL training repository. Your output is a grounded task card, not an experiment proposal yet.
Read templates/adapter-template.yaml, especially environment.manager, environment.command_prefix, commands.train_entrypoint, commands.train_command_raw, commands.train_command_runnable, commands.fragments, paths, metrics, hardware, and contract_gate.required_before_training.
Do not run mamba, simulators, W&B/network calls, or target train modules from this skill.
Treat .rlxp/contract.yaml as the launch authority. Audit/setup may only create status: draft_blocked or prepare ready_for_user_confirmation; it must not approve launch.
Resolve <plugin-root> to the installed rl-experiment-assistant plugin directory:
<plugin-root>.
${CLAUDE_PLUGIN_ROOT} when present; otherwise infer from the loaded plugin/skill path.
Look for:
Training entry points: train.py, train_agent.py, runner.py, scripts/train*, launch*.
Evaluation entry points: eval.py, eval_agent.py, play.py, replay.py, rollout.py.
Reward and termination terms: reward, rewards, terms, tracking, contact, penalty, regularization, alive, termination.
Curriculum: curriculum, difficulty, level, adaptive, sampler, hard, success.
Domain randomization: randomization, domain_rand, friction, mass, com, noise, push, latency.
Launch infrastructure: torchrun, slurm, sbatch, tmux, ray, CUDA_VISIBLE_DEVICES, Kubernetes.
When the user provides a concrete environment or command, preserve it verbatim in the audit as intent, not permission to train:
Mamba/conda environment, especially hssim.
Raw training command, especially:
python -m holosoma.train_agent exp:g1-29dof-scene-traversal-hurdle logger:wandb-packman-scene-traversal
Runnable command prefix, especially mamba run -n hssim.
Config fragments such as exp:g1-29dof-scene-traversal-hurdle and logger:wandb-packman-scene-traversal.
For Holosoma scene traversal, prefer the module entry point python -m holosoma.train_agent over a source-file path. Treat the full mamba-wrapped command as a launch candidate only.
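As a rough illustration of recording a user-provided command as intent, the sketch below splits the raw command into an entry point and config fragments and attaches the runnable prefix without executing anything. The LaunchCandidate class and parse_candidate function are hypothetical names, and the rule that a fragment is any bare group:name token is an assumption; neither is part of the plugin.

```python
import shlex
from dataclasses import dataclass, field


@dataclass
class LaunchCandidate:
    """A user-provided command recorded as intent, never as approval (hypothetical)."""
    raw: str                                   # preserved verbatim
    entrypoint: str                            # e.g. "python -m holosoma.train_agent"
    fragments: list[str] = field(default_factory=list)
    runnable: str = ""                         # prefix + raw; still only a candidate


def parse_candidate(raw: str, prefix: str = "") -> LaunchCandidate:
    tokens = shlex.split(raw)
    # Assumption: fragments are bare "group:name" tokens with no leading dash.
    fragments = [t for t in tokens if ":" in t and not t.startswith("-")]
    entry_tokens = [t for t in tokens if t not in fragments]
    runnable = f"{prefix} {raw}".strip()
    return LaunchCandidate(raw, " ".join(entry_tokens), fragments, runnable)


raw_cmd = (
    "python -m holosoma.train_agent "
    "exp:g1-29dof-scene-traversal-hurdle logger:wandb-packman-scene-traversal"
)
candidate = parse_candidate(raw_cmd, prefix="mamba run -n hssim")
```

The parsed fields only describe the command; nothing here launches it, which matches the rule that the mamba-wrapped command is a launch candidate only.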
If .rlxp/ is missing and files may be edited, initialize it in the target repository before writing the report. Use the bundled helpers as internal agent tools:
python <plugin-root>/scripts/rlxp_init.py --root . --project-name <name>
python <plugin-root>/scripts/rlxp_scan.py --root . --out .rlxp/scan.json
If the current working directory is not the target RL repository, pass the absolute target path to --root. Never write .rlxp/ into the plugin package, Codex plugin cache, Claude plugin cache, or agent home directory unless that is explicitly the target repository.
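The placement rule above can be checked before any file is written. The following is a minimal sketch, assuming the caller supplies the directories to avoid (plugin package, plugin caches, agent home directory); resolve_rlxp_dir and its parameters are hypothetical and are not one of the bundled helpers.

```python
from pathlib import Path


def resolve_rlxp_dir(
    target_root: str,
    forbidden_roots: list[str],
    explicit_target: bool = False,
) -> Path:
    """Return <target>/.rlxp, refusing targets inside a forbidden directory.

    explicit_target=True models the exception where the user has explicitly
    named one of those directories as the target repository.
    """
    target = Path(target_root).expanduser().resolve()
    if not explicit_target:
        for raw in forbidden_roots:
            forbidden = Path(raw).expanduser().resolve()
            # Reject the forbidden directory itself and anything nested under it.
            if target == forbidden or forbidden in target.parents:
                raise ValueError(f"refusing to write .rlxp/ under {forbidden}")
    return target / ".rlxp"
```

Resolving to an absolute path first means the check behaves the same whether the agent was started inside the target repository or somewhere else.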
Produce this exact structure in the conversation and write it into .rlxp/report.md if project files may be edited:
# Task Consensus Card
## Codebase-derived task definition
## User objective interpreted as
## Training command candidates
## Evaluation command candidates
## Reward / termination / curriculum / DR surfaces found
## Candidate primary metric
## Candidate guardrail metrics
## Candidate tuning scope
## Required user confirmations before launch
1. Is the task definition correct?
2. Which primary metric should define improvement?
3. Which guardrails should block acceptance?
4. Which reward/curriculum/DR surfaces are allowed to change?
5. What is the GPU/wall-clock/iteration budget?
6. Which machines/GPU IDs may be used?
7. Which baseline command and evaluation protocol are approved?
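One way to keep the card structure stable across audits is to render it from a fixed list of headings. This is only a sketch: render_card, the findings dictionary, and the placeholder text for empty sections are assumptions made for illustration, not plugin behavior.

```python
CARD_SECTIONS = [
    "Codebase-derived task definition",
    "User objective interpreted as",
    "Training command candidates",
    "Evaluation command candidates",
    "Reward / termination / curriculum / DR surfaces found",
    "Candidate primary metric",
    "Candidate guardrail metrics",
    "Candidate tuning scope",
    "Required user confirmations before launch",
]

CONFIRMATIONS = [
    "Is the task definition correct?",
    "Which primary metric should define improvement?",
    "Which guardrails should block acceptance?",
    "Which reward/curriculum/DR surfaces are allowed to change?",
    "What is the GPU/wall-clock/iteration budget?",
    "Which machines/GPU IDs may be used?",
    "Which baseline command and evaluation protocol are approved?",
]


def render_card(findings: dict[str, str]) -> str:
    """Render the card as Markdown; missing sections get a placeholder line."""
    lines = ["# Task Consensus Card", ""]
    for section in CARD_SECTIONS:
        lines.append(f"## {section}")
        if section == "Required user confirmations before launch":
            lines += [f"{i}. {q}" for i, q in enumerate(CONFIRMATIONS, 1)]
        else:
            # Placeholder wording is an assumption, not mandated by the skill.
            lines.append(findings.get(section, "Not found in the audited code."))
        lines.append("")
    return "\n".join(lines)
```

The returned string can be shown in the conversation and, when project files may be edited, written to .rlxp/report.md.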
Assign each finding a confidence level: high if directly from code/config or user-provided setup, medium if inferred from nearby code, low if guessed from naming. User-provided command facts are high-confidence for intent; code support remains unverified until inspected in the target repository.
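The grading rule can be written down as a small lookup. In this sketch the Evidence enum, the grade function, and the code_support field are hypothetical names chosen for illustration; only the high/medium/low mapping comes from the rule above.

```python
from enum import Enum


class Evidence(Enum):
    CODE_OR_CONFIG = "code_or_config"   # read directly from code or config
    USER_PROVIDED = "user_provided"     # setup or command stated by the user
    NEARBY_CODE = "nearby_code"         # inferred from surrounding code
    NAMING_ONLY = "naming_only"         # guessed from file or symbol names


CONFIDENCE = {
    Evidence.CODE_OR_CONFIG: "high",
    Evidence.USER_PROVIDED: "high",
    Evidence.NEARBY_CODE: "medium",
    Evidence.NAMING_ONLY: "low",
}


def grade(evidence: Evidence, verified_in_repo: bool = False) -> dict[str, str]:
    """Return the confidence level plus whether code support has been checked."""
    verified = evidence is Evidence.CODE_OR_CONFIG or verified_in_repo
    return {
        "confidence": CONFIDENCE[evidence],
        "code_support": "verified" if verified else "unverified",
    }
```

Keeping code_support separate from confidence reflects the distinction that a user-provided command is high-confidence for intent while its support in the repository stays unverified until inspected.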
npx claudepluginhub junhyekh/rlxp --plugin rl-experiment-assistant