Run vLLM performance benchmark using synthetic random data to measure throughput, TTFT (Time to First Token), TPOT (Time per Output Token), and other key performance metrics. Use when the user wants to quickly test vLLM serving performance without downloading external datasets.
Benchmark vLLM or OpenAI-compatible serving endpoints using vllm bench serve. Supports multiple datasets (random, sharegpt, sonnet, HF), backends (openai, openai-chat, vllm-pooling, embeddings), throughput/latency testing with request-rate control, and result saving. Use when benchmarking LLM serving performance, measuring TTFT/TPOT, or load testing inference APIs.
Deploy vLLM using Docker (pre-built images or build-from-source) with NVIDIA GPU support and run the OpenAI-compatible server.
Deploy vLLM to Kubernetes (K8s) with GPU support, health probes, and OpenAI-compatible API endpoint. Use this skill whenever the user wants to deploy, run, or serve vLLM on a Kubernetes cluster, including creating deployments, services, checking existing deployments, or managing vLLM on K8s.
Quick install and deploy vLLM, start serving with a simple LLM, and test OpenAI API.
A personally maintained collection of skills for deploying and benchmarking vLLM. This project follows the anthropics/skills template format and is installable as a Claude Code plugin marketplace.
This repository provides modular, reusable agent skills for operating and benchmarking vLLM, following the Anthropic SKILL.md specification. Each skill is a self-contained directory containing the automation, scripts, and metadata for one operational task.
| Skill | Description |
|---|---|
| vllm-deploy-docker | Deploy vLLM using Docker (pre-built images or build-from-source) with NVIDIA GPU support and run the OpenAI-compatible server. |
| vllm-deploy-k8s | Deploy vLLM to Kubernetes with GPU support, health probes, and OpenAI-compatible API endpoint. |
| vllm-deploy-simple | Quick install and deploy vLLM, start serving with a simple LLM, and test OpenAI API. |
| vllm-prefix-cache-bench | Benchmark the efficiency of vLLM automatic prefix caching using fixed prompts, real datasets, or synthetic prefix/suffix patterns. |
| vllm-bench-random-synthetic | Run vLLM performance benchmark using synthetic random data to measure throughput, TTFT, TPOT, and other key performance metrics without downloading external datasets. |
| vllm-bench-serve | Benchmark vLLM or OpenAI-compatible serving endpoints using vllm bench serve. |
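As a rough illustration of what the benchmarking skills drive, the sketch below prints a `vllm bench serve` command for the random dataset so it can be reviewed before it is run. The helper name `bench_cmd`, the `BASE_URL` variable, and the chosen lengths and prompt count are assumptions made for this example, not values taken from the skills; confirm the flags against `vllm bench serve --help` for your installed version.

```shell
# Hypothetical helper: print (do not run) a vllm bench serve command.
# Arguments: model name, optional request rate (requests per second, default "inf").
# BASE_URL and all numeric values below are illustrative assumptions.
bench_cmd() {
  model="$1"
  rate="${2:-inf}"
  printf '%s\n' "vllm bench serve --backend openai --base-url ${BASE_URL:-http://localhost:8000} --model $model --dataset-name random --random-input-len 512 --random-output-len 128 --num-prompts 200 --request-rate $rate --save-result"
}
```

Running `bench_cmd Qwen/Qwen2.5-1.5B-Instruct 4` prints a single command line that can be pasted into a terminal once a server is up. Printing first keeps the benchmark parameters visible and easy to adjust.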
Install directly from the plugin marketplace in Claude Code:
/plugin marketplace add Ben-cpy/Deploy-skill
/plugin install deploy-skill@deploy-skill
Clone the repository and copy skills to your Claude Code skills directory:
git clone https://github.com/Ben-cpy/Deploy-skill.git
cd Deploy-skill
Copy to global skill folder:
cp -r plugins/deploy-skill/skills/vllm-deploy-simple ~/.claude/skills/
Or copy to the project skill folder:
cp -r plugins/deploy-skill/skills/vllm-deploy-simple .claude/skills/
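The copy commands above install a single skill. To install every skill in one pass, a small loop is one option. This is a minimal sketch: the function name `install_skills` is hypothetical, and it assumes each skill is a directory containing a SKILL.md file.

```shell
# Hypothetical helper: copy every skill directory from SRC into DEST.
# A directory counts as a skill only if it contains a SKILL.md file.
install_skills() {
  src="$1"
  dest="$2"
  mkdir -p "$dest" || return 1
  for dir in "$src"/*/; do
    dir="${dir%/}"                      # drop trailing slash so cp copies the directory itself
    [ -f "$dir/SKILL.md" ] || continue  # skip anything that is not a skill
    cp -r "$dir" "$dest/" || return 1
    echo "installed $(basename "$dir")"
  done
}
```

From the repository root, `install_skills plugins/deploy-skill/skills ~/.claude/skills` would install globally, and passing `.claude/skills` as the second argument would install into the current project.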
Once installed, use the skills with slash commands or natural language:
/vllm-deploy-simple
Deploy vLLM with Qwen2.5-1.5B-Instruct on port 8000
Install and start a vLLM server using the vllm-deploy-simple skill
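After a skill has started a server, it can help to confirm that the OpenAI-compatible endpoint is answering before sending prompts. The sketch below polls the `/v1/models` route until it responds or a deadline passes. The function name `wait_for_vllm`, the default URL (port 8000, as in the example above), and the 60-second default are assumptions for illustration.

```shell
# Hypothetical helper: wait until an OpenAI-compatible server answers /v1/models.
# Arguments: base URL (default http://localhost:8000), attempts (default 60).
# Each attempt sleeps one second, so the deadline is approximate.
wait_for_vllm() {
  base_url="${1:-http://localhost:8000}"
  timeout_s="${2:-60}"
  elapsed=0
  while [ "$elapsed" -lt "$timeout_s" ]; do
    if curl -sf --max-time 2 "$base_url/v1/models" >/dev/null 2>&1; then
      echo "ready"
      return 0
    fi
    sleep 1
    elapsed=$((elapsed + 1))
  done
  echo "timed out after ${timeout_s} attempts" >&2
  return 1
}
```

A typical use is `wait_for_vllm http://localhost:8000 120 && curl -s http://localhost:8000/v1/models`, which lists the served model once loading has finished.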
See vLLM documentation for the full list.
This project follows the anthropics/skills template. When adding new skills:
Create a new directory under plugins/deploy-skill/skills/ (e.g., plugins/deploy-skill/skills/your-skill/).
Add a SKILL.md file with YAML frontmatter:
---
name: your-skill
description: Brief description of what this skill does
---
Add scripts/, references/, and assets/ directories as needed.
Licensed under the Apache License 2.0. See LICENSE.
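The layout for a new skill described above (a directory under plugins/deploy-skill/skills/, a SKILL.md with YAML frontmatter, and the scripts/, references/, and assets/ directories) can be scaffolded with a short function. This is a sketch under stated assumptions: `new_skill` is a hypothetical name, and the frontmatter contains only the two fields shown in the example.

```shell
# Hypothetical helper: scaffold a new skill directory.
# Arguments: skill name, description, optional skills root
# (defaults to plugins/deploy-skill/skills).
new_skill() {
  name="$1"
  description="$2"
  root="${3:-plugins/deploy-skill/skills}"
  skill_dir="$root/$name"
  if [ -e "$skill_dir" ]; then
    echo "refusing to overwrite $skill_dir" >&2
    return 1
  fi
  mkdir -p "$skill_dir/scripts" "$skill_dir/references" "$skill_dir/assets" || return 1
  # Write the YAML frontmatter; the skill's instructions go below it.
  printf '%s\n' '---' "name: $name" "description: $description" '---' > "$skill_dir/SKILL.md"
}
```

For example, `new_skill your-skill "Brief description of what this skill does"` run from the repository root creates the directory tree and a SKILL.md ready to be filled in.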
npx claudepluginhub ben-cpy/deploy-skill --plugin deploy-skill