From vanguard-frontier-agentic
Statically reviews Triton Inference Server deployments for model repository layout, config.pbtxt, dynamic batching, ensemble/BLS pipelines, custom backend trust, endpoint auth, response cache, and metrics exposure.
How this skill is triggered — by the user, by Claude, or both
Slash command
/vanguard-frontier-agentic:nvidia-triton-inference-serving-review
This skill is limited to the following tools:
The summary Claude sees in its skill listing — used to decide when to auto-load this skill
Static review of Triton Inference Server deployments against NVIDIA's Triton documentation — model repository layout, dynamic batching, ensemble pipelines, custom backend trust, gRPC/HTTP authentication, model encryption at rest, response cache poisoning surface. This skill is doc-anchored: it grounds review findings in NVIDIA's published documentation rather than in a certification blueprint, because no NVIDIA certification currently covers this developer-facing surface as a standalone exam objective.
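As an illustration of the kind of static check this skill describes for dynamic batching, the following is a minimal sketch that scans the text of a config.pbtxt for a dynamic_batching block with no explicit max_queue_delay_microseconds. The function names, the finding dictionary shape, and the regex-based scan are assumptions made for illustration; they are not part of the skill or of Triton. A real review would more likely parse the file as protobuf text format rather than match on text.

```python
import re
from typing import Optional


def find_dynamic_batching_block(config_text: str) -> Optional[str]:
    """Return the body of the dynamic_batching { ... } block, or None if absent.

    Hypothetical helper: a brace-matching text scan, not a protobuf parser.
    """
    # Naively drop '#' comments so commented-out settings are not counted.
    # (Assumption: no '#' characters appear inside quoted strings.)
    text = re.sub(r"#.*", "", config_text)
    match = re.search(r"\bdynamic_batching\s*:?\s*\{", text)
    if match is None:
        return None
    depth = 1
    start = match.end()
    for i in range(start, len(text)):
        if text[i] == "{":
            depth += 1
        elif text[i] == "}":
            depth -= 1
            if depth == 0:
                return text[start:i]
    return None  # unbalanced braces: treat as "no usable block"


def check_queue_delay(config_text: str) -> list:
    """Report a low-severity finding when dynamic batching is enabled
    but max_queue_delay_microseconds is not set explicitly."""
    body = find_dynamic_batching_block(config_text)
    if body is None:
        return []  # dynamic batching not enabled, nothing to report
    if re.search(r"\bmax_queue_delay_microseconds\s*:\s*\d+", body):
        return []  # an explicit value is present
    return [{
        "severity": "low",
        "field": "max_queue_delay_microseconds",
        "message": "dynamic_batching is enabled but the queue delay is left "
                   "at its default; compare against the latency SLO.",
    }]


if __name__ == "__main__":
    sample = 'name: "example"\nmax_batch_size: 8\ndynamic_batching { }\n'
    for finding in check_queue_delay(sample):
        print(finding["severity"], finding["field"], finding["message"])
```

The sketch only reports that the field is unset. Whether the default is acceptable still depends on the latency SLO, which a static scan of the config cannot see.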
[...] model_repository/ tree and config.pbtxt files as evidence; otherwise fall back to documentation-based inference.
[...] --model-repository mount as a high finding: silent model substitution.
[...] :8002) exposed to the public network without scraping ACLs as a medium finding: model name and shape leakage.
[...] max_queue_delay_microseconds left at default with latency SLOs in the millisecond range as a low finding: throughput-vs-latency tuning is wrong by default.
[...] tritonserver and perf_analyzer commands the user should run; do not execute them.
Return, at minimum:
npx claudepluginhub raishin/vanguard-frontier-agentic --plugin vanguard-frontier-agentic
Static review of TensorRT/TensorRT-LLM deployment pipelines: ONNX/PyTorch export, precision selection, calibration cache, dynamic shapes, plugin loading, engine provenance, and runtime memory sizing.
Audits ML pipeline reproducibility, experiment tracking hygiene, and model versioning. Advises on serving patterns and prompt evaluation across MLflow, W&B, SageMaker, Vertex AI.
Compares SGLang, vLLM, and TensorRT-LLM for the same model and workload to find the best deployment command under a given GPU budget and latency SLA.