From NVIDIA
Starts or patches Dynamo router modes (round-robin, KV, least-loaded, device-aware) and runs endpoint smoke checks. Useful for bring-up and mode comparison.
How this skill is triggered — by the user, by Claude, or both
Slash command
/nvidia-skills:dynamo-router-starterThe summary Claude sees in its skill listing — used to decide when to auto-load this skill
<!--
Make Dynamo routing feel easy by getting a baseline router mode running, enabling KV-aware routing when appropriate, and proving the endpoint works. Keep the user focused on exact commands and success signals, not router internals.
dynamo package importable (python3 -m dynamo.frontend --help works).kubectl configured with access to the target namespace and a deployed Dynamo recipe./v1/models returns at least one entry).Collect or infer:
round-robin, kv, least-loaded, device-aware-weighted, direct, or random/v1/models cannot discover itFor local bring-up with already registered workers:
python3 -m dynamo.frontend --router-mode round-robin --http-port 8000
For Kubernetes, inspect the selected recipe deploy.yaml and locate the
frontend service. If the recipe is not already deployed, use
dynamo-recipe-runner first.
For local frontend:
python3 -m dynamo.frontend --router-mode kv --http-port 8000
For Kubernetes, patch only the frontend service env:
envs:
- name: DYN_ROUTER_MODE
value: kv
If backend workers are not publishing KV cache events, set approximate mode instead of leaving the router waiting for events:
envs:
- name: DYN_ROUTER_USE_KV_EVENTS
value: "false"
After port-forwarding the frontend service or starting local frontend, run:
python3 scripts/check_router_health.py \
--base-url http://127.0.0.1:8000
This must verify /v1/models and, when a model is discoverable, one
/v1/chat/completions request.
When comparing round-robin vs KV routing:
If the endpoint is unhealthy or workers are missing, switch to
dynamo-troubleshoot.
| Script | Purpose | Arguments |
|---|---|---|
scripts/check_router_health.py | Smoke-test /v1/models and one chat completion against a Dynamo frontend | --base-url, --retries, --timeout |
Invoke via the agentskills.io run_script() protocol:
run_script("scripts/check_router_health.py", args=["--base-url", "http://127.0.0.1:8000"])
Local KV-routed frontend on port 8000, then smoke-test it:
python3 -m dynamo.frontend --router-mode kv --http-port 8000 &
python3 scripts/check_router_health.py --base-url http://127.0.0.1:8000
Kubernetes-deployed frontend reachable via port-forward:
kubectl port-forward svc/qwen-vllm-disagg-frontend 8000:8000 -n dynamo-demo &
python3 scripts/check_router_health.py --base-url http://127.0.0.1:8000 --retries 3
Equivalent through the agent protocol:
run_script("scripts/check_router_health.py", args=["--base-url", "http://127.0.0.1:8000", "--retries", "3"])
Return:
dynamo-benchmark for throughput/latency numbers.| Symptom | Likely cause | Next step |
|---|---|---|
/v1/models returns empty list | No worker registered with the frontend | Verify worker pods are Ready; confirm they connect to the same etcd/NATS |
| Smoke chat request times out | Frontend up, workers not serving | Switch to dynamo-troubleshoot; inspect worker logs |
| KV mode hangs | Workers do not publish KV cache events | Set DYN_ROUTER_USE_KV_EVENTS=false (approximate mode) |
| Connection refused on port-forward | Port-forward dropped or wrong service name | Re-run port-forward; verify the frontend service name matches the recipe |
See BENCHMARK.md for the NVCARPS-EVAL performance report (auto-generated by the NVSkills CI pipeline). To refresh, re-run /nvskills-ci on an upstream PR touching this skill.
references/router-modes.md for the compact mode/env map.scripts/check_router_health.py for endpoint smoke tests.npx claudepluginhub sayalinvidia/sayali-skills-test --plugin nvidia-skillsStarts or patches Dynamo router modes (round-robin, KV, least-loaded, device-aware) and runs endpoint smoke checks. Useful for bring-up and mode comparison.
Generates version-correct YAML config for Apollo Router v1.x and v2.x. Guides setup of CORS, JWT auth, telemetry, Rhai scripts, coprocessors, and connectors.
Deploys and configures Kong or Traefik API gateway for traffic management, authentication, rate limiting, and routing. Use when unifying multiple backend services.