From vllm-ascend
vLLM-Ascend serving toolchain. Use when installing vLLM on Ascend NPUs, running offline inference, launching a model as an OpenAI-compatible API server, tuning throughput/latency for a specific serving scenario, or contributing to the vllm-ascend project. Trigger whenever the user discusses vLLM deployment, vLLM errors, serving a model on Ascend, or wants to get inference running before evaluation.
Slash command
/vllm-ascend:vllm-ascend install / run / serve / contribute
Handles vLLM-Ascend installation, running, performance tuning, and contribution workflow.
Before any vLLM task:
- Run /ascend to verify NPUs are healthy and free (npu-smi info).
- Get the model with /model-download. Never pass an online model ID to vLLM; always use a local path.
- Make sure vllm and vllm-ascend are installed from source in editable mode.

Once the API server is up, use /aisbench to run accuracy or performance benchmarks against it.
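The local-path rule can be enforced with a small guard before any vLLM command is built. This is a minimal Python sketch, assuming a Hugging Face-style checkpoint directory that contains config.json; resolve_local_model is a hypothetical helper, not part of vLLM or vllm-ascend.

```python
from pathlib import Path


def resolve_local_model(model: str) -> str:
    """Return an absolute local model directory, refusing hub-style model IDs.

    Assumption (hypothetical helper): a complete local checkpoint is a directory
    containing config.json, as Hugging Face-format checkpoints do.
    """
    path = Path(model).expanduser()
    if not path.is_dir():
        # Strings like "org/model-name" are online IDs unless they exist on disk.
        raise ValueError(
            f"{model!r} is not a local directory; download it first with /model-download"
        )
    if not (path / "config.json").is_file():
        raise ValueError(f"{path} has no config.json; is the download complete?")
    return str(path.resolve())
```

The returned path is what goes into the serving script, for example as the model argument of vllm serve.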
All vLLM commands (offline inference, online serving) must be saved to a shell script and executed through it so output is captured in a timestamped log file. See the template in /ascend → "Common Requirement: Run via Shell Script with Log Output".
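The logging rule can be sketched in Python as below. run_via_script and the logs/<name>_<timestamp>.log layout are hypothetical; the authoritative template is the one in /ascend. The generated script uses set -o pipefail so a failing vLLM command still fails the run even though its output passes through tee.

```python
import shlex
import subprocess
from datetime import datetime
from pathlib import Path


def run_via_script(cmd: list[str], name: str, workdir: str = ".") -> Path:
    """Save cmd to <name>.sh, run it with bash, and tee output to a timestamped log.

    Hypothetical helper; see /ascend for the canonical script template.
    """
    base = Path(workdir)
    log_dir = base / "logs"
    log_dir.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    log_file = log_dir / f"{name}_{stamp}.log"
    script = base / f"{name}.sh"
    # pipefail preserves the command's exit status through the tee pipe.
    script.write_text(
        "#!/usr/bin/env bash\n"
        "set -euo pipefail\n"
        f"{shlex.join(cmd)} 2>&1 | tee {shlex.quote(str(log_file))}\n"
    )
    script.chmod(0o755)
    # check=True raises CalledProcessError if the script exits non-zero.
    subprocess.run(["bash", str(script)], check=True)
    return log_file
```

For example, run_via_script(["vllm", "serve", "/path/to/local/model", "--port", "8000"], "serve") would write serve.sh and block while the server runs, with its output in logs/. The model path and port are placeholders.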
vllm and vllm-ascend are installed in editable mode. Run pip show <package> to find the source directory before modifying or referencing them. Create a debug branch before editing: git checkout -b debug/TOPIC
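To find the source checkout programmatically, a sketch like the following parses pip show output. parse_pip_show and source_dir are hypothetical helpers; the Editable project location field is printed by recent pip versions for editable installs, and the code falls back to Location (usually site-packages) when that field is absent.

```python
import subprocess
import sys


def parse_pip_show(text: str) -> dict[str, str]:
    """Turn `pip show` output ("Key: value" lines) into a dict."""
    fields = {}
    for line in text.splitlines():
        # Split on the first colon only, so URLs in values stay intact.
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def source_dir(package: str) -> str:
    """Return the editable source checkout for package, else its install Location."""
    out = subprocess.run(
        [sys.executable, "-m", "pip", "show", package],
        capture_output=True, text=True, check=True,
    ).stdout
    fields = parse_pip_show(out)
    return fields.get("Editable project location") or fields["Location"]
```

Calling source_dir("vllm-ascend") should then point at the checkout to branch and edit.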
npx claudepluginhub starmountain1997/g-claude --plugin vllm-ascend