optimize-llm

Provides LLM serving optimization recommendations for latency, inference costs, and throughput. Scans configs, detects stacks like vLLM/TGI, suggests quantization, batching, KV cache, and framework changes.

ai-ml

performance

Invocation

How this skill is triggered — by the user, by Claude, or both

Slash command

/systems-design:optimize-llm

User invocable

Model invocable

Inline context

Default effort

Tool Access

This skill is limited to the following tools:

Read, Glob, Grep, Task

Context Preview

The summary Claude sees in its skill listing — used to decide when to auto-load this skill

Get quick, actionable recommendations for LLM serving optimization.

SKILL.md

82 lines · ~515 tokens

Stats

Language: Python

Parent stars: 67

Parent forks: 10

Maintenance: Good

Last Commit: Feb 15, 2026

Optimize LLM Command

Get quick, actionable recommendations for LLM serving optimization.

Usage

/sd:optimize-llm [focus]

Arguments

focus (optional): Optimization priority
- latency - Focus on reducing response time
- cost - Focus on reducing inference costs
- throughput - Focus on maximizing requests/second
- If omitted: Provide balanced recommendations
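
As an illustration of the argument rules above, here is a minimal Python sketch of how the optional focus value could be normalized. The function name `parse_focus`, the `"balanced"` label for the omitted case, the case-insensitive matching, and the error on unknown values are all assumptions made for this example; the skill itself does not document how it handles invalid input.

```python
from typing import Optional

# The three focus values listed under Arguments.
VALID_FOCUSES = ("latency", "cost", "throughput")


def parse_focus(raw: Optional[str]) -> str:
    """Return the optimization priority, or "balanced" when none is given."""
    # An omitted or empty argument means balanced recommendations.
    if raw is None or not raw.strip():
        return "balanced"
    focus = raw.strip().lower()  # assumption: matching is case-insensitive
    if focus not in VALID_FOCUSES:
        # Assumption: unknown values are rejected rather than ignored.
        raise ValueError(f"unknown focus {raw!r}; expected one of {VALID_FOCUSES}")
    return focus
```

Under these assumptions, `parse_focus(None)` yields the balanced mode and `parse_focus("cost")` yields `"cost"`.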

Examples

/sd:optimize-llm
/sd:optimize-llm latency
/sd:optimize-llm cost

Workflow

1. Gather Context
   - Search for LLM-related configuration files
   - Look for: model configs, serving configs, inference scripts
   - Identify current serving stack (vLLM, TGI, TensorRT-LLM, etc.)
2. Spawn LLM Optimization Advisor Agent
   Use the llm-optimization-advisor agent to analyze and provide recommendations. The agent specializes in:
   - Quantization strategies (INT8, INT4, FP16)
   - Batching optimization (continuous, dynamic)
   - KV cache optimization (PagedAttention)
   - Serving framework selection
   - Cost reduction strategies
3. Present Recommendations
   Display optimization opportunities organized by:
   - Quick Wins - Low effort, high impact changes
   - Medium Effort - Moderate changes with significant benefits
   - Advanced - Architectural changes for maximum performance
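
The context-gathering step can be pictured with a rough Python sketch that scans a project tree for text markers of the three serving stacks named above. This is not the skill's actual logic, which relies on the Read, Glob, and Grep tools. The marker patterns, the file-type filter, and the name `detect_stacks` are hypothetical choices for this example only.

```python
import re
from pathlib import Path
from typing import Dict, List

# Hypothetical text markers per stack; the agent's real heuristics are not documented.
STACK_MARKERS = {
    "vLLM": re.compile(r"\bvllm\b", re.IGNORECASE),
    "TGI": re.compile(r"text-generation-inference|text_generation_server", re.IGNORECASE),
    "TensorRT-LLM": re.compile(r"tensorrt[-_]llm|\btrtllm\b", re.IGNORECASE),
}
# Assumed set of files worth scanning: scripts, configs, dependency lists.
CANDIDATE_SUFFIXES = {".py", ".yaml", ".yml", ".json", ".toml", ".txt"}
CANDIDATE_NAMES = {"Dockerfile"}


def detect_stacks(root: str) -> Dict[str, List[str]]:
    """Map each detected serving stack to the files (relative to root) that mention it."""
    hits: Dict[str, List[str]] = {}
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        if path.suffix not in CANDIDATE_SUFFIXES and path.name not in CANDIDATE_NAMES:
            continue
        try:
            text = path.read_text(encoding="utf-8", errors="ignore")
        except OSError:
            continue  # unreadable file: skip rather than abort the scan
        for stack, pattern in STACK_MARKERS.items():
            if pattern.search(text):
                hits.setdefault(stack, []).append(str(path.relative_to(root)))
    return hits
```

An empty result corresponds to the "unknown" framework case in the report, where the user would be asked instead.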

Output Format

## LLM Optimization Report

### Current Setup
- Model: [detected or ask]
- Framework: [detected or unknown]
- Hardware: [detected or ask]

### Quick Wins
1. [Optimization] - [Expected impact]
2. ...

### Medium Effort Optimizations
1. [Optimization] - [Expected impact]
2. ...

### Advanced Optimizations
1. [Optimization] - [Expected impact]
2. ...

### Estimated Total Impact
- Latency: [X]% improvement
- Cost: [X]% reduction
- Throughput: [X]x increase
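
To make the template concrete, the following Python sketch fills it from structured data. The `OptimizationReport` dataclass and `render_report` function are hypothetical names introduced here. The skill produces this report through the agent, not through code like this, and the defaults simply reuse the template's own placeholders so that no impact figures are invented.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Item = Tuple[str, str]  # (optimization, expected impact)


@dataclass
class OptimizationReport:
    # Defaults mirror the placeholders in the template above.
    model: str = "[detected or ask]"
    framework: str = "[detected or unknown]"
    hardware: str = "[detected or ask]"
    quick_wins: List[Item] = field(default_factory=list)
    medium_effort: List[Item] = field(default_factory=list)
    advanced: List[Item] = field(default_factory=list)
    latency_gain: str = "[X]"
    cost_reduction: str = "[X]"
    throughput_gain: str = "[X]"


def render_report(r: OptimizationReport) -> str:
    """Render the report as Markdown in the section order of the template."""
    lines = [
        "## LLM Optimization Report",
        "",
        "### Current Setup",
        f"- Model: {r.model}",
        f"- Framework: {r.framework}",
        f"- Hardware: {r.hardware}",
    ]
    sections = (
        ("Quick Wins", r.quick_wins),
        ("Medium Effort Optimizations", r.medium_effort),
        ("Advanced Optimizations", r.advanced),
    )
    for title, items in sections:
        lines += ["", f"### {title}"]
        # Number the entries within each section, starting at 1.
        lines += [f"{i}. {opt} - {impact}" for i, (opt, impact) in enumerate(items, start=1)]
    lines += [
        "",
        "### Estimated Total Impact",
        f"- Latency: {r.latency_gain}% improvement",
        f"- Cost: {r.cost_reduction}% reduction",
        f"- Throughput: {r.throughput_gain}x increase",
    ]
    return "\n".join(lines)
```

Keeping the three tiers as separate lists preserves the Quick Wins, Medium Effort, Advanced ordering regardless of the order in which recommendations are collected.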
