From agentdb-learning
Train one of AgentDB's 9 RL algorithms on a stream of episodes. Use when the user has accumulated successful/failed episodes and wants to derive a policy, or when a task type is repeated enough to benefit from RL routing.
Slash command
/agentdb-learning:agentdb-learn
Train an RL agent on episode data to derive a policy.
| Algo | Best for |
|---|---|
| Q-Learning | Tabular state-spaces, discrete actions |
| SARSA | On-policy variant, conservative exploration |
| DQN | High-dimensional state, neural Q-fn |
| PPO | Continuous control, high-dim action |
| Actor-Critic | Baseline reduction, stable training |
| Policy Gradient | Direct policy parameterization |
| Decision Transformer | Offline RL on trajectories |
| MCTS | Tree-search planning under known dynamics |
| Model-Based RL | Sample-efficient when env model is learnable |
If you don't know which to pick, call agentdb_learning_route first — the bandit suggests one based on past performance on similar task signatures.
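The source does not say how the bandit arrives at its suggestion. Purely as an illustration, the sketch below uses a UCB1 bandit with one set of arms per task signature. The class name `AlgorithmBandit`, its methods, the algorithm identifiers, and the choice of UCB1 are all assumptions, not AgentDB's implementation.

```python
import math
from collections import defaultdict

# Hypothetical identifiers for the nine algorithms in the table above.
ALGORITHMS = [
    "q-learning", "sarsa", "dqn", "ppo", "actor-critic",
    "policy-gradient", "decision-transformer", "mcts", "model-based",
]


class AlgorithmBandit:
    """Hypothetical UCB1 bandit: one set of arms per task signature."""

    def __init__(self, arms=ALGORITHMS, exploration=2.0):
        self.arms = list(arms)
        self.exploration = exploration
        # stats[signature][arm] = [pull_count, total_reward]
        self.stats = defaultdict(lambda: {a: [0, 0.0] for a in self.arms})

    def record(self, signature, arm, reward):
        """Store the observed reward (assumed to lie in [0, 1]) for one run."""
        entry = self.stats[signature][arm]
        entry[0] += 1
        entry[1] += reward

    def suggest(self, signature):
        """Return the arm with the highest upper confidence bound."""
        stats = self.stats[signature]
        # Try every arm once before trusting any estimate.
        for arm in self.arms:
            if stats[arm][0] == 0:
                return arm
        total = sum(count for count, _ in stats.values())

        def ucb(arm):
            count, reward_sum = stats[arm]
            bonus = math.sqrt(self.exploration * math.log(total) / count)
            return reward_sum / count + bonus

        return max(self.arms, key=ucb)
```

The exploration bonus shrinks as an arm accumulates runs, so an algorithm with few recorded results on a signature still gets tried occasionally instead of being ruled out by one bad outcome.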
agentdb_learning_train(
  algorithm: <one of above>,                 // or 'auto' to let bandit pick
  episodes: [<episodeId>, ...],              // or task name → fetch automatically
  hyperparams: { lr, gamma, epsilon, ... },
  iterations: N
)
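To make the hyperparameters concrete, here is a minimal sketch of what training tabular Q-learning on stored episodes could look like. The episode format (lists of `(state, action, reward, next_state, done)` tuples) and the function names are assumptions for illustration; the source does not describe AgentDB's episode schema or internals.

```python
import random
from collections import defaultdict


def train_q_learning(episodes, actions, lr=0.1, gamma=0.9, iterations=50):
    """Replay stored episodes and apply the tabular Q-learning update.

    episodes: list of episodes, each a list of
              (state, action, reward, next_state, done) tuples (assumed format).
    """
    q = defaultdict(float)  # (state, action) -> estimated return
    for _ in range(iterations):
        for episode in episodes:
            for state, action, reward, next_state, done in episode:
                # Bootstrap from the best next action unless the episode ended.
                best_next = 0.0 if done else max(q[(next_state, a)] for a in actions)
                target = reward + gamma * best_next
                q[(state, action)] += lr * (target - q[(state, action)])
    return q


def epsilon_greedy(q, state, actions, epsilon=0.1, rng=random):
    """Pick a random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda a: q[(state, a)])


# Two tiny hand-written episodes: "right" reaches the goal, "left" does not.
episodes = [
    [("s0", "right", 0.0, "s1", False), ("s1", "right", 1.0, "goal", True)],
    [("s0", "left", 0.0, "s0", False), ("s0", "left", -1.0, "fail", True)],
]
actions = ["left", "right"]
q = train_q_learning(episodes, actions, lr=0.5, gamma=0.9, iterations=100)
policy = {s: epsilon_greedy(q, s, actions, epsilon=0.0) for s in ("s0", "s1")}
```

Here `lr` scales each update, `gamma` discounts future reward, and `iterations` is the number of passes over the episode set. `epsilon` only matters when the derived policy is used to act; with `epsilon=0.0` the toy policy above picks "right" in both states.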
agentdb_learning_route(task) may pick this trained skill if it scores well.
If agentdb_learning_route says "no algorithm has > 0.6 expected reward on this task", the answer is to gather more episodes, not to force-train.
npx claudepluginhub ruvnet/agentdb --plugin agentdb-learning
Ask the AgentDB bandit which RL algorithm / skill / pattern fits the current task best. Use at task start when there are multiple plausible approaches and you want a data-driven pick.
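The 0.6 expected-reward rule above can be written as a small guard. This is a sketch under assumptions: the shape of the routing result (a mapping from algorithm name to expected reward) and the function name `decide_next_step` are hypothetical, not part of AgentDB's documented interface.

```python
REWARD_THRESHOLD = 0.6  # threshold quoted in the routing message


def decide_next_step(expected_rewards, threshold=REWARD_THRESHOLD):
    """Return ("train", algorithm) or ("gather_episodes", None).

    expected_rewards: dict of algorithm name -> expected reward
                      (assumed shape of a routing result).
    """
    if not expected_rewards:
        return ("gather_episodes", None)
    best = max(expected_rewards, key=expected_rewards.get)
    if expected_rewards[best] > threshold:
        return ("train", best)
    # No algorithm clears the bar: collect more data instead of force-training.
    return ("gather_episodes", None)
```

The comparison is strict, matching the "> 0.6" wording: an algorithm sitting exactly at the threshold still results in gathering more episodes.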