From rag-skills
Route RAG performance work for latency, caching, indexing, filtering, batching, and query optimization.
Slash command
/rag-skills:performance-optimization
Use this parent skill when the RAG system works functionally but is too slow, expensive, or unstable under expected traffic. Route to targeted latency and retrieval optimization guidance.
RAG latency can come from embedding calls, vector search, metadata filters, reranking, prompt assembly, or repeated work. Optimization requires profiling the full retrieval path before changing architecture.
Measure embedding, search, filtering, reranking, prompt assembly, and model latency separately.
Use caching, batching, payload indexes, top-k tuning, and reranker gating where profiling shows bottlenecks.
Confirm optimizations do not reduce recall, faithfulness, citation quality, or operational reliability.
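The per-stage measurement described above could be sketched roughly as follows. This is a minimal illustration, not part of the skill itself: `StageTimer`, `embed`, `search`, `rerank`, and `answer_query` are hypothetical names, and the stage bodies are stand-ins that sleep instead of calling a real embedding model, vector store, or reranker. The `lru_cache` on `embed` shows one possible way to avoid repeated embedding work for identical queries.

```python
import time
from collections import defaultdict
from contextlib import contextmanager
from functools import lru_cache


class StageTimer:
    """Accumulates wall-clock time per named pipeline stage."""

    def __init__(self):
        self.totals = defaultdict(float)
        self.counts = defaultdict(int)

    @contextmanager
    def stage(self, name):
        start = time.perf_counter()
        try:
            yield
        finally:
            # Record elapsed time even if the stage raises.
            self.totals[name] += time.perf_counter() - start
            self.counts[name] += 1

    def report(self):
        """Return (stage, mean seconds) pairs, slowest stage first."""
        means = {n: self.totals[n] / self.counts[n] for n in self.totals}
        return sorted(means.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical stand-ins for real pipeline components (assumptions).
@lru_cache(maxsize=1024)
def embed(query: str) -> tuple:
    time.sleep(0.005)  # simulated embedding call
    return tuple(float(ord(c)) for c in query[:8])


def search(vector, top_k):
    time.sleep(0.002)  # simulated vector search
    return [f"doc-{i}" for i in range(top_k)]


def rerank(query, docs):
    time.sleep(0.003)  # simulated reranker
    return sorted(docs)


def answer_query(query, timer, top_k=5):
    """Run the retrieval path, timing each stage separately."""
    with timer.stage("embedding"):
        vector = embed(query)
    with timer.stage("search"):
        docs = search(vector, top_k)
    with timer.stage("rerank"):
        docs = rerank(query, docs)
    return docs


timer = StageTimer()
for q in ["what is rag", "what is rag", "payload index"]:
    answer_query(q, timer)

for name, mean_s in timer.report():
    print(f"{name:<10} {mean_s * 1000:.2f} ms")
```

A report like this only shows where time goes. Any change it motivates (smaller top-k, caching, skipping the reranker) would still need to be checked against recall and answer quality, as the last step above says.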
npx claudepluginhub goodnight77/rag-skills --plugin rag-skills
Covers RAG architecture including design patterns, chunking strategies, embedding models, retrieval techniques, hybrid search, and context assembly for LLM pipelines.
Designs and implements production-grade RAG systems: chunking documents, generating embeddings, configuring vector stores, hybrid search, reranking, and retrieval evaluation.