From vanguard-frontier-agentic
Reviews NVIDIA GPU infrastructure deployments (DGX, HGX, MGX) against reference architectures, checking BMC segmentation, firmware, driver versions, ECC, persistence mode, and MIG configuration.
How this skill is triggered — by the user, by Claude, or both
Slash command
/vanguard-frontier-agentic:nvidia-ai-infrastructure-operations
This skill is limited to the following tools:
The summary Claude sees in its skill listing — used to decide when to auto-load this skill
Review NVIDIA GPU infrastructure deployments (DGX, HGX, MGX, certified OEM systems) against NVIDIA reference architectures and the NCA-AIIO / NCP-AII certification body of knowledge. Anchor judgments on driver + firmware + CUDA toolkit + AI Enterprise support matrix alignment, BMC/iDRAC/iLO segmentation, and host-level GPU configuration (persistence mode, ECC, MIG capability, vGPU).
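As a rough illustration of the host-level GPU configuration checks named above (persistence mode, ECC, MIG, driver version), the following Python sketch parses CSV output from `nvidia-smi --query-gpu` and lists deviations. The function names, the `FIELDS` labels, the pass/fail policy, and the sample rows are all hypothetical and are not part of the skill itself. MIG mode is only recorded, not judged, since whether it should be enabled depends on the deployment.

```python
import csv
import io

# Column labels for the fields requested from nvidia-smi. On a GPU host the
# input text could be produced with, for example:
#   nvidia-smi --query-gpu=index,name,driver_version,persistence_mode,ecc.mode.current,mig.mode.current --format=csv,noheader
# The labels below are local names chosen for this sketch.
FIELDS = ["index", "name", "driver_version", "persistence_mode", "ecc_mode", "mig_mode"]


def parse_gpu_rows(csv_text):
    """Turn 'csv,noheader' nvidia-smi output into a list of dicts, one per GPU."""
    reader = csv.reader(io.StringIO(csv_text), skipinitialspace=True)
    return [dict(zip(FIELDS, (cell.strip() for cell in row))) for row in reader if row]


def review_gpus(gpus, expected_driver=None):
    """Return human-readable findings under an assumed policy:
    persistence mode and ECC should be Enabled, and the driver version
    should match expected_driver when one is given."""
    findings = []
    for gpu in gpus:
        tag = f"GPU {gpu['index']} ({gpu['name']})"
        if gpu["persistence_mode"] != "Enabled":
            findings.append(f"{tag}: persistence mode is {gpu['persistence_mode']}")
        if gpu["ecc_mode"] != "Enabled":
            findings.append(f"{tag}: ECC mode is {gpu['ecc_mode']}")
        if expected_driver and gpu["driver_version"] != expected_driver:
            findings.append(
                f"{tag}: driver {gpu['driver_version']} differs from expected {expected_driver}"
            )
    return findings


if __name__ == "__main__":
    # Made-up sample rows for illustration only; not captured from a real system.
    sample = (
        "0, NVIDIA A100-SXM4-80GB, 535.129.03, Enabled, Enabled, Disabled\n"
        "1, NVIDIA A100-SXM4-80GB, 535.129.03, Disabled, Enabled, Disabled\n"
    )
    for finding in review_gpus(parse_gpu_rows(sample), expected_driver="535.129.03"):
        print(finding)
```

Keeping the parser separate from the policy check means the same review logic can run against sanitized output pasted into a ticket when live command access is not available.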
nvidia-smi, nvidia-smi -q, dmidecode, ipmitool lan print, dcgmi diag) when the active client exposes it; otherwise fall back to NVIDIA Enterprise Support documentation, sanitized topology diagrams, and the AI Enterprise compatibility matrix.
Load these only when needed:
Return, at minimum:
npx claudepluginhub raishin/vanguard-frontier-agentic --plugin vanguard-frontier-agentic
Reviews day-2 operations of NVIDIA GPU fleets: DCGM exporter/diag posture, GPU telemetry into Prometheus/Grafana, MIG partitioning lifecycle, Xid error runbooks, fleet upgrades, and incident response for GPU-failure modes.
Provides NVIDIA partner intelligence: NIM microservices, NeMo, AI Enterprise, DGX Cloud, Inception program, and EMEA accelerator wave. Use for NVIDIA content, partnership prep, or evaluating new releases.
Checks and compares software component versions across SageMaker HyperPod cluster nodes including NVIDIA drivers, CUDA, cuDNN, NCCL, EFA, OFI NCCL, GDRCopy, MPI, Neuron SDK, Python, and PyTorch. Useful for verifying compatibility, detecting mismatches, and planning upgrades.