Skill

rl-reward-curriculum-design

Design evidence-backed RL reward parameter changes, reward code changes, curriculum schedules, adaptive sampling, or domain-randomization distributions. Use after baseline/result analysis identifies a concrete failure mode.

Invocation

How this skill is triggered — by the user, by Claude, or both

Slash command

/rl-experiment-assistant:rl-reward-curriculum-design

User invocable

Model invocable

Inline context

Default effort

Context Preview

The summary Claude sees in its skill listing — used to decide when to auto-load this skill

Design interventions for reward tuning, reward engineering, curriculum learning, adaptive sampling, and domain randomization. Do not invent changes without evidence from code inspection, baseline metrics, reward component statistics, videos, termination causes, per-scene/per-motion performance, or user-provided diagnostics.

SKILL.md

63 lines · ~845 tokens

Stats

LanguagePython

Parent stars0

MaintenanceGood

Last CommitMay 14, 2026

Actions

View Source View Plugin View on GitHub View README

Stats

Actions

Intervention hierarchy

Prefer this order unless evidence justifies skipping:

Instrumentation: log missing reward components, termination causes, tracking errors, contact metrics, difficulty bins, scene IDs, motion windows.
Reward parameter tuning: adjust existing weights/scales/margins/thresholds with bounded search.
Curriculum/adaptive sampling: alter progression, hard-window sampling, scene difficulty distribution, or success thresholds.
Domain randomization tuning: adjust ranges/schedules only when robustness, transfer, or overfitting evidence exists.
Reward engineering: add or replace reward terms only when existing reward lacks a necessary behavioral signal or encourages reward hacking.

Reward parameter tuning rules

Use log-scale multiplicative changes for positive weights.
Avoid changing many weights at once unless the terms are mathematically coupled.
Preserve sign conventions for penalties/rewards.
Normalize by existing component magnitudes when available.
Propose paired ablations for ambiguous trade-offs, such as tracking vs. regularization.

Reward engineering rules

Before editing reward code, produce:

Failure mode: what behavior is missing or exploited.
Proposed term: exact formula and variables required.
Expected effect: which metric improves and which guardrails might degrade.
Static checks: shape, dtype, finite values, no privileged future information unless the task already uses it, no discontinuity that destabilizes training.
Rollout diagnostics: term distribution on baseline rollouts, correlation with success metric, saturation rate.

Curriculum / DR rules

Curriculum should move probability mass toward learnable but currently difficult cases, not only the hardest cases.
Keep a held-out evaluation distribution fixed.
For scene/motion tasks, track per-bin and worst-bin metrics, not only global mean.
DR ranges should start from physically plausible bounds and widen only when policy competence is adequate.
If DR prevents learning, use staged DR: train on nominal dynamics first, then gradually widen.

Output

Return an experiment proposal conforming to templates/experiment-proposal.schema.json, plus a short rationale:

# Intervention Proposal

## Evidence
## Hypothesis
## Exact change
## Command/config patch
## Expected metric movement
## Risks and guardrails
## Decision rule

Also provide the machine-readable proposal JSON, or explicitly state which schema field is still unknown.

rl-reward-curriculum-design

Invocation

Context Preview

SKILL.md

rl-reward-curriculum-design

Invocation

Context Preview

SKILL.md

Intervention hierarchy

Reward parameter tuning rules

Reward engineering rules

Curriculum / DR rules

Output

Similar Skills

Intervention hierarchy

Reward parameter tuning rules

Reward engineering rules

Curriculum / DR rules

Output

Similar Skills