By junhyekh
Evidence-driven RL experiment assistant for reward tuning, reward engineering, curriculum design, and domain randomization design.
Coordinates RL experiment planning and execution, ensuring user-confirmed metrics, scope, budget, and evidence-gated decisions.
Analyzes RL training/evaluation metrics, reward components, failure modes, confidence, and guardrails before deciding whether an experiment improved.
Reviews reward/curriculum/domain-randomization changes for correctness, reward hacking risk, confounding, and reproducibility.
Execute the evidence-gated RL experiment loop: launch approved training/eval jobs, parse results, update the report, decide accept/reject/inconclusive, and propose the next experiment within the approved budget.
Turn an audited RL task into a user-confirmed experiment plan with metrics, tuning scope, GPU budget, launcher commands, report skeleton, and first baseline/ablation proposal. Use before launching RL training jobs.
Design evidence-backed RL reward parameter changes, reward code changes, curriculum schedules, adaptive sampling, or domain-randomization distributions. Use after baseline/result analysis identifies a concrete failure mode.
Audit an RL training codebase to infer task definition, training/eval commands, rewards, terminations, curriculum, domain randomization, logs, and metrics before planning experiments. Use for robotics/RL codebases, reward tuning, curriculum design, domain-randomization design, or ambiguous task requests.
Uses power tools
Uses Bash, Write, or Edit tools
Dual-host Codex and Claude Code plugin for evidence-gated reinforcement-learning experiment work. It gives LLM agents the workflow and deterministic helper tools they need to audit an RL repository, create a task/metric/scope/budget contract, initialize repo-local .rlxp/ state, score results, and propose bounded next experiments without treating training reward as the only objective.
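As a rough illustration of what "evidence-gated" scoring could look like, the sketch below turns one candidate-versus-baseline comparison into an accept/reject/inconclusive decision. The `Evidence` fields, the `decide` function, and the confidence-interval rule are all assumptions made for this example; the plugin's own scoring helpers may use a different rule.

```python
from dataclasses import dataclass, field


@dataclass
class Evidence:
    """Hypothetical summary of one candidate-vs-baseline comparison."""
    # Candidate minus baseline on the held-out primary metric (higher is better).
    primary_delta: float
    # Half-width of the confidence interval around primary_delta.
    ci_half_width: float
    # Names of guardrail metrics that regressed beyond their allowed tolerance.
    guardrail_violations: list[str] = field(default_factory=list)


def decide(evidence: Evidence) -> str:
    """Return 'accept', 'reject', or 'inconclusive' for one experiment."""
    # Any guardrail violation rejects the change, however good the primary metric looks.
    if evidence.guardrail_violations:
        return "reject"
    lower = evidence.primary_delta - evidence.ci_half_width
    upper = evidence.primary_delta + evidence.ci_half_width
    if lower > 0:
        return "accept"        # the whole interval shows an improvement
    if upper < 0:
        return "reject"        # the whole interval shows a regression
    return "inconclusive"      # the interval straddles zero; more evidence is needed


print(decide(Evidence(primary_delta=0.05, ci_half_width=0.10)))
```

Checking guardrails first is what keeps a higher training reward from being accepted when a held-out or safety metric has regressed.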
Setup, audit, planning, package validation, and offline validation do not run GPU jobs, simulators, W&B/network calls, or training entry points. Training remains blocked unless the target repository's .rlxp/contract.yaml records explicit approval for:
Generated .rlxp/ state belongs in the target RL repository, not in this plugin package except temporary validation fixtures.
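The training gate described above can be pictured as a check over the parsed contract. This is a minimal sketch, assuming the contract has already been loaded into a dictionary and that approvals live under a hypothetical `approvals` key. The approval names are passed in by the caller because the real list is defined by the plugin's contract schema, not by this example.

```python
def training_allowed(contract: dict, required_approvals: list[str]) -> tuple[bool, list[str]]:
    """Return (allowed, missing) for a parsed contract.

    `contract` stands in for the parsed content of .rlxp/contract.yaml.
    The "approvals" key is an assumption made for illustration only.
    """
    approvals = contract.get("approvals", {})
    # Only an explicit True counts as approval; absent, None, or "yes" do not.
    missing = [name for name in required_approvals if approvals.get(name) is not True]
    return (not missing, missing)


# Placeholder approval names, not the plugin's actual list.
contract = {"approvals": {"approval_a": True, "approval_b": False}}
allowed, missing = training_allowed(contract, ["approval_a", "approval_b"])
print(allowed, missing)
```

Treating anything other than a literal `True` as "not approved" makes the gate fail closed, which matches the intent that training stays blocked by default.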
.
├── .agents/plugins/marketplace.json
├── .claude-plugin/marketplace.json
└── plugins/rl-experiment-assistant/
    ├── .codex-plugin/plugin.json
    ├── .claude-plugin/plugin.json
    ├── skills/
    ├── agents/
    ├── scripts/
    ├── templates/
    └── examples/
From this package root:
codex plugin marketplace add .
Then open /plugins, install local-rl-experiment-assistant / rl-experiment-assistant, and start a fresh Codex session.
Typical request from a target RL repository:
Use RL Experiment Assistant to audit this RL codebase. Prepare the task card, metric candidates, tuning scope, budget questions, and .rlxp report skeleton. Do not launch training.
To initialize state through the plugin, ask the agent directly:
Use RL Experiment Assistant to initialize .rlxp for this repository, then audit the RL task and prepare the first experiment plan. Use the current working directory as the target repository. Do not launch training.
From this package root:
claude plugin marketplace add .
Then in Claude Code:
/plugin install rl-experiment-assistant@local-rl-experiment-assistant
/reload-plugins
Development fallback without marketplace installation:
claude --plugin-dir ./plugins/rl-experiment-assistant
Claude-specific agents are additive wrappers around the same contract-gated skills:
experiment-director: coordinates audit/plan/loop and owns .rlxp/contract.yaml checks.
metrics-analyst: validates run evidence against held-out metrics and guardrails.
reward-reviewer: reviews reward/curriculum/domain-randomization changes before deployment.
Normal setup is agent-driven. The user should not need to run the bundled Python helpers by hand. Ask Codex or Claude Code to initialize the target repository, and the plugin skill should resolve the installed plugin root, run the helper internally, and write .rlxp/ into the target RL repository.
Good prompt:
Use RL Experiment Assistant to initialize .rlxp for /path/to/target-rl-repo with project name <project-name>. Audit the codebase and leave training blocked until the contract is confirmed.
Expected target-repository state:
.rlxp/
├── adapter.yaml
├── contract.yaml
├── report.md
├── experiments.yaml
├── ledger.jsonl
└── runs/
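To confirm that initialization produced the layout listed above, a small check along these lines may help. It is a sketch rather than part of the plugin: `missing_rlxp_entries` is a hypothetical helper, and it only verifies that the entries exist, not that their contents are valid.

```python
from pathlib import Path

# Entries taken from the expected .rlxp/ layout shown above.
EXPECTED_FILES = ("adapter.yaml", "contract.yaml", "report.md", "experiments.yaml", "ledger.jsonl")
EXPECTED_DIRS = ("runs",)


def missing_rlxp_entries(target_root: Path) -> list[str]:
    """List expected .rlxp/ entries that are absent under target_root."""
    state_dir = Path(target_root) / ".rlxp"
    missing = [name for name in EXPECTED_FILES if not (state_dir / name).is_file()]
    missing += [name + "/" for name in EXPECTED_DIRS if not (state_dir / name).is_dir()]
    return missing


if __name__ == "__main__":
    # Check the current working directory as the target repository.
    print(missing_rlxp_entries(Path.cwd()))
```

An empty list means every expected file and directory is present.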
When this plugin is used inside a Codex session, initialization still writes to the target repository the agent passes as --root. The plugin root, Codex plugin cache, and Codex home are not the experiment target.
Preferred pattern:
Use RL Experiment Assistant to initialize .rlxp for this repository. Use the current working directory as --root. Do not launch training.
If Codex is not running from the target repository, name the absolute target path:
Use RL Experiment Assistant to initialize .rlxp for /path/to/target-rl-repo. Treat that path as --root even if the Codex session is currently elsewhere. Do not launch training.
The agent may run scripts/rlxp_init.py internally as an implementation detail. It should not ask the user to paste Python commands unless the agent cannot execute local commands.
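The root-resolution rule above (use the path passed as --root, fall back to the working directory, and never treat the plugin's own locations as the target) could be sketched as follows. `resolve_target_root` and its `non_targets` parameter are hypothetical names for this illustration; the actual interface of scripts/rlxp_init.py is not shown in this document and may differ.

```python
from pathlib import Path
from typing import Optional, Sequence


def resolve_target_root(root: Optional[str], cwd: str, non_targets: Sequence[str] = ()) -> Path:
    """Pick the target repository path and refuse known non-target locations.

    root        -- value the agent would pass as --root, or None to use cwd
    cwd         -- the session's working directory
    non_targets -- directories such as the plugin root or plugin cache (assumed inputs)
    """
    # Joining with cwd keeps absolute roots unchanged and anchors relative ones.
    target = (Path(cwd) / root).resolve() if root else Path(cwd).resolve()
    for raw in non_targets:
        blocked = Path(raw).resolve()
        if target == blocked or blocked in target.parents:
            raise ValueError(f"{target} is inside {blocked}, which is not an experiment target")
    return target
```

Raising an error instead of silently falling back keeps .rlxp/ state from being written into the plugin package by accident.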
Holosoma support is a bundled profile and validation example, not a package-level assumption. The core workflow is framework-agnostic:
.rlxp/adapter.yaml for commands, config, metrics, logs, and hardware
.rlxp/contract.yaml for task, metric, scope, budget, and launch approval
npx claudepluginhub junhyekh/rlxp --plugin rl-experiment-assistant