From vanguard-frontier-agentic
Static review of TensorRT/TensorRT-LLM deployment pipelines: ONNX/PyTorch export, precision selection, calibration cache, dynamic shapes, plugin loading, engine provenance, and runtime memory sizing.
How this skill is triggered — by the user, by Claude, or both
Slash command
/vanguard-frontier-agentic:nvidia-tensorrt-llm-deployment-reviewThis skill is limited to the following tools:
The summary Claude sees in its skill listing — used to decide when to auto-load this skill
Static review of TensorRT and TensorRT-LLM deployment pipelines against NVIDIA's TensorRT Developer Guide — ONNX/PyTorch export, FP16/INT8/FP8/INT4 precision, calibration data integrity, dynamic shape profiles, plugin trust boundaries, engine cache provenance. This skill is doc-anchored: it grounds review findings in NVIDIA's published documentation rather than in a certification blueprint, bec...
Static review of TensorRT and TensorRT-LLM deployment pipelines against NVIDIA's TensorRT Developer Guide — ONNX/PyTorch export, FP16/INT8/FP8/INT4 precision, calibration data integrity, dynamic shape profiles, plugin trust boundaries, engine cache provenance. This skill is doc-anchored: it grounds review findings in NVIDIA's published documentation rather than in a certification blueprint, because no NVIDIA certification currently covers this developer-facing surface as a standalone exam objective.
.engine, .plan) distributed without sha256 verification or provenance attestation as a high finding — silent model substitution.optimization_profiles for variable input shapes as a medium finding — builds either fail at runtime or fall back to padded inference.--workspace or --memory-pool-size values that exceed the deployment GPU's free memory as a medium finding — engine build will OOM in CI.--strict-types without explicit precision tagging on every layer as a low finding — actual precision drifts from intent.trtexec, polygraphy run, or tensorrt_llm/build.py commands the user should run — do not execute them.Return, at minimum:
npx claudepluginhub raishin/vanguard-frontier-agentic --plugin vanguard-frontier-agenticStatically reviews Triton Inference Server deployments for model repository layout, config.pbtxt, dynamic batching, ensemble/BLS pipelines, custom backend trust, endpoint auth, response cache, and metrics exposure.
Analyzes torch.profiler traces for SGLang, vLLM, and TensorRT-LLM. Inspects existing trace.json(.gz) files or drives live profiling, returning kernel, overlap-opportunity, and fuse-pattern tables.
Analyzes Huawei Ascend NPU profiling data to detect performance anomalies and reverse-engineer model architecture. Outputs a Markdown report with bubble detection, wait-anchor analysis, and layer classification.