From robotics-connect
Run a trained policy OUTSIDE its RL env — in a control loop or a multi-robot demo, not just the RL harness. Use after training a policy with stage-isaac-rl-env. It reconstructs the exact observation and action map (obs term order, action = raw*scale + default, joint order) so the exported actor runs standalone, and lifts a walk/locomotion ONNX into a torch MLP on the GPU because there is no onnxruntime CUDA provider on the DGX Spark (aarch64/GB10). The reach deploy is ambidextrous with no observation change. Driven by the robot descriptor; reference impl proven on the Unitree G1.
How this skill is triggered — by the user, by Claude, or both
Slash command
/robotics-connect:deploy-policyThe summary Claude sees in its skill listing — used to decide when to auto-load this skill
A policy trained in Isaac Lab is only useful if it runs **outside** the training harness — in a control
A policy trained in Isaac Lab is only useful if it runs outside the training harness — in a control
loop, a multi-robot demo, or eventually on hardware. That means reproducing the env's observation and
action exactly, by hand. scripts/deploy.py does it, lifted from the eye-verified
armwaheed/robots#2 locomotion.py and generalized to read the joint set / EE links / action scale from
the robot descriptor.
The exported actor is a stateless MLP — feed it the wrong observation and it silently misbehaves. The
ReachPolicy rebuilds the training observation term-for-term:
base_lin_vel(3) base_ang_vel(3) projected_gravity(3) hand_target(7 = base-frame pos+quat)
joint_pos_rel(ALL joints) joint_vel_rel(ALL joints) last_action(len action_joints)
and the action map: target = raw * action_scale + default_joint_pos, applied to the descriptor's
action joints (the locked, absent-on-hardware joints are excluded and stay at default — exactly as
in training). The hand target is given in the base frame (the policy is yaw-invariant) and clamped
into the trained command box, so a receding base can't drive an out-of-distribution command (which the
policy would chase by leaning harder → stepping back → a runaway topple).
import deploy
desc = deploy.load_descriptor(".../descriptors/unitree_g1_29dof.json")
reacher = deploy.ReachPolicy(robot, "cuda", ".../exported/policy.pt", desc, action_scale=0.5)
reacher.set_world_target(headward_corner_xyz) # aim it; ambidextrous — picks the same-side hand
# per control step (~50 Hz):
reacher.act()
Ambidexterity is free at deploy — ReachPolicy reads the active hand from the (clamped) target's
y-sign, so two robots flanking opposite sides of a bed each lead with their natural same-side hand, with
no change to the observation. active_wrist_link() tells you which wrist the grasp attaches to.
The DGX Spark (aarch64 / GB10, sm_121) has no onnxruntime CUDA execution provider, and the project
rule is no CPU inference. load_onnx_mlp_to_torch lifts an exported MLP-actor ONNX (a Gemm/ELU stack)
straight into a torch.nn.Linear stack on cuda — ONNX Gemm transB=1 stores weight as (out, in),
exactly torch.nn.Linear layout, so the weights copy 1:1 (parity ~3e-6).
VelocityWalker uses it to run Unitree's velocity-walk policy (480-D obs, 5-step term-major history,
raw*0.25 + default action) so the robot walks to the work site before the reach policy takes over.
The walk's joint order + default pose come from the descriptor (sim_asset.joint_order,
effectors.default_pose).
walker = deploy.VelocityWalker(robot, "cuda", ".../g1_velocity_walk.onnx",
joint_names=desc["sim_asset"]["joint_order"],
default_pos=[...]) # the policy's deploy default
walker.act([vx, vy, wz]) # per control step
Walking to a position, deciding to release / retry / ask a peer, and avoiding another robot are
behaviour-layer jobs, not terms in the low-level RL policy (training them in would make it
intractable). The deploy pattern is: behaviour picks the target and the walk→reach handoff;
VelocityWalker gets the robot there; ReachPolicy does the planted, balanced, ambidextrous reach.
Running the policy in a control loop (above) is not the same as putting it on the real motors.
A whole-body policy drives the legs, so deploying it means taking the legs off the vendor balance
controller and letting the RL net balance — a first transfer falls if anything is off, fast
(sub-second). Stage it as a ladder; each rung gates the next (full rationale + the stop hierarchy
in SAFETY.md). The robot-agnostic implementation is
lib/policy_deploy.py (contract + obs builder + the three rungs, with
SafeStop baked into every motion stage); a robot supplies a RobotIO:
| Rung | Runs | Fall risk | Mechanism |
|---|---|---|---|
| 0 offline | obs+policy, printed, no commands | none | read rt/lowstate; build the obs; print actions |
| 1 partial (preferred) | policy's arms only, legs on vendor balance | none | rt/arm_sdk weight-blend overlay (motor_cmd[29].q = weight) |
| 2 whole-body (last resort) | all joints, operator-driven low-level mode (NOT a software release) | HIGH | rt/lowcmd (explicit VOLATILE QoS), robot feet-on-ground + slack gantry |
Prefer rung 1 for any useful task the arms can do — fall-safe, no gantry. Take rung 2 only if the task demonstrably needs whole-body balance the vendor controller can't provide.
Verify the joint order from the ACTUAL env, not just the descriptor. IsaacLab orders the
articulation DOFs interleaved (left/right pairs by tree depth) — e.g. action index 9 is
left_shoulder_pitch, sitting between the knees (7,8) and ankles (11,12). That is not the
Unitree SDK's 0..28 index order. Dump the ground-truth contract by booting the exact training env
and reading robot.joint_names / the action term's resolved order + scale + default offset
(armwaheed/robots#3 rl/dump_deploy_contract.py → deploy_contract.json), then map sim↔SDK by joint
name. Self-check at rung 0: the predicted standing-pose offsets must land on the named joints
(G1 BalanceStand crouch → left_knee joint_pos_rel ≈ +0.2, hip_pitch ≈ −0.17, arms ≈ 0).
G1 EDU motor table (verified live): indexed like the 29-DOF G1 — legs 0–11, waist_yaw 12,
L-arm 15–19, R-arm 22–26; the 6 joints the EDU lacks (13,14,20,21,27,28) are present-but-zero
and never in the 23-joint action set.
First, the bigger call: PREFER rung 1 (arm overlay) on the G1 EDU. Whole-body rt/lowcmd is only
valid when the high-level controller is fully off (Develop mode); against a live controller it
jitters/oscillates (unitree_sdk2_python#43/#108). The rt/arm_sdk overlay blends with the running
balancer instead of fighting it, so the bedside reach is fall-safe and gantry-free. Use the vendor
LocoClient.Move for locomotion. Treat rung 2 as gantry-only research.
Rung-2 takeover (the parts that bite):
CheckMode. The operator drives the G1 into Develop mode (suspend → L2+B → L2+R2).
CheckMode is useless as a gate — on the EDU it returns name='ai' in both AI-Sport and Develop, and
rt/sportmodestate is silent on this variant. The reliable signal is a read-only loco GET-FSM RPC
(ROBOT_API_ID_LOCO_GET_FSM_ID): it answers (code 0) when the balancer is running and errors/times
out when it's off (Develop → code 3102, verified live). verify_whole_body_ready gates on that
(answers → refuse; down → proceed; probe unavailable → proceed only on an operator Develop assertion).
There is no ReleaseMode() call — that software path took the legs in a real incident (SAFETY.md §0).confirm_abort_live: operator presses+releases,
code sees the latch). In Regular/AI-Sport mode the buttons fire vendor gestures, not aborts.rt/lowcmd writer = explicit VOLATILE + keep-last-1 QoS so a torn-down writer leaves no latched
command. The plain ChannelPublisher can't set this — build the Channel from the factory and call
SetWriter(qos) (verified against the robot's own unitree_sdk2_python, cyclonedds 0.10.2).mode_pr=PR, echo mode_machine from
rt/lowstate, motor_cmd[i].mode=1.arm_sdk weight 0→1 holding the live pose.)SafeStop so it damps on return/exception/SIGINT/
SIGTERM, and never kill -9 it (a hard kill latches the last high-gain command → runaway; this
broke a window once — SAFETY.md §0). Abort/end → low-level damp (kp=0, kd≈3).RobotIO.damp_once
contract + the G1 binding. For a planned exit after reaching into clutter, retrace the recorded
approach path in reverse before releasing — a direct return can catch on the furniture the careful
approach avoided (hardware-observed: the hand snagged the mattress on a direct return).LocoClient created AFTER the 500 Hz
rt/lowstate subscriber (and the policy load) can't match its reply reader → GET_FSM_ID times out
(code 3104) even when the balancer is up (a clean-context probe returns code 0). Prime the
loco client in connect() BEFORE the subscribers and POLL (first code-0 = up); a single in-context
call false-negatives. The robust fix is a clean-SUBPROCESS liveness probe; until then gate runs on an
independent clean GET_FSM_ID 0 (--assume-balancer-up), which is false-negative-only + fall-safe.ReachPolicy and VelocityWalker take the joint set, EE links, and scales from the descriptor, so the
same deploy code runs a 23-DOF or 29-DOF G1, or a new humanoid, once its descriptor and exported policy
exist. The only robot-specific knowledge is in the descriptor — which is the whole point. The hardware
ladder above is likewise robot-agnostic: only the damp/release/gain bindings change per robot, and the
safe-stop contract (lib/safe_stop.py) is mandatory for every robot.
Creates, edits, and optimizes skills for Claude Code, including drafting, evaluating with test prompts, iterating on performance, and improving skill descriptions for better triggering accuracy.
npx claudepluginhub armwaheed/robotics-connect --plugin robotics-connect