ClaudePluginHub

Community directory for discovering and installing Claude Code plugins.


© 2025 ClaudePluginHub

Community Maintained · Not affiliated with Anthropic


performance-optimization | rag-skills


Skill

performance-optimization

From rag-skills

Route RAG performance work for latency, caching, indexing, filtering, batching, and query optimization.

Popularity

Stars

4

Invocation

How this skill is triggered — by the user, by Claude, or both

Slash command

/rag-skills:performance-optimization

User invocable

Model invocable

Inline context

Default effort

Tool Access

This skill is limited to the following tools:

Read, Grep, Glob

Context Preview

The summary Claude sees in its skill listing — used to decide when to auto-load this skill

Use this parent skill when the RAG system works functionally but is too slow, expensive, or unstable under expected traffic. Route to targeted latency and retrieval optimization guidance.

SKILL.md

57 lines · ~590 tokens

Stats

Language: Python

Stars: 4

Maintenance: Excellent

Last Commit: Apr 11, 2026


Performance Optimization

Overview

Use this parent skill when the RAG system works functionally but is too slow, expensive, or unstable under expected traffic. Route to targeted latency and retrieval optimization guidance.

Problem Statement

RAG latency can come from embedding calls, vector search, metadata filters, reranking, prompt assembly, or repeated work. Optimization requires profiling the full retrieval path before changing architecture.

Key Concepts

Latency budget: Allocate time across embedding, retrieval, reranking, and generation.
Caching: Avoid repeated work for common queries and stable documents.
Index tuning: Match search parameters and indexes to the workload.
Query planning: Reduce unnecessary retrieval and reranking work.
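
As an illustration of the latency budget concept, the sketch below compares measured per-stage latencies against a budget and lists the stages that overrun, worst first. The stage names follow the concept above; the millisecond values and the names `BUDGET_MS`, `Overrun`, and `find_overruns` are hypothetical placeholders, not part of the skill.

```python
from dataclasses import dataclass
from typing import Mapping

# Hypothetical per-stage budget in milliseconds. The numbers are placeholders,
# not recommendations from the skill.
BUDGET_MS = {
    "embedding": 50.0,
    "retrieval": 100.0,
    "reranking": 150.0,
    "generation": 1200.0,
}


@dataclass(frozen=True)
class Overrun:
    stage: str
    budget_ms: float
    measured_ms: float

    @property
    def excess_ms(self) -> float:
        """How far the measured latency exceeds the budget."""
        return self.measured_ms - self.budget_ms


def find_overruns(
    measured_ms: Mapping[str, float],
    budget_ms: Mapping[str, float] = BUDGET_MS,
) -> list[Overrun]:
    """Return the stages whose measured latency exceeds their budget, worst first."""
    overruns = [
        Overrun(stage, budget_ms[stage], measured)
        for stage, measured in measured_ms.items()
        if stage in budget_ms and measured > budget_ms[stage]
    ]
    return sorted(overruns, key=lambda o: o.excess_ms, reverse=True)
```

Sorting by excess points optimization effort at the stage that misses its budget by the widest margin, rather than at the stage that is merely the slowest.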

Implementation Guide

Step 1: Profile the Retrieval Path

Measure embedding, search, filtering, reranking, prompt assembly, and model latency separately.
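
One way to measure each stage separately is a small timing context manager wrapped around every step of a request. This is a minimal sketch using only the standard library. `StageTimer` and `answer` are hypothetical names, the callables passed to `answer` stand in for real components, and metadata filtering is assumed to happen inside the `search` callable (time it as its own stage if it runs separately).

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class StageTimer:
    """Accumulates wall-clock time per named stage of one request."""

    def __init__(self):
        self.elapsed_ms = defaultdict(float)

    @contextmanager
    def stage(self, name):
        start = time.perf_counter()
        try:
            yield
        finally:
            # Record the time even if the stage raises, so failures stay visible.
            self.elapsed_ms[name] += (time.perf_counter() - start) * 1000.0


def answer(query, embed, search, rerank, build_prompt, generate, timer):
    """Hypothetical pipeline; each callable stands in for a real component."""
    with timer.stage("embedding"):
        vector = embed(query)
    with timer.stage("search"):
        hits = search(vector)
    with timer.stage("reranking"):
        ranked = rerank(query, hits)
    with timer.stage("prompt_assembly"):
        prompt = build_prompt(query, ranked)
    with timer.stage("generation"):
        return generate(prompt)
```

After a request, `timer.elapsed_ms` holds one entry per stage, which can be logged or aggregated across requests before deciding where to optimize.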

Step 2: Apply Targeted Optimizations

Use caching, batching, payload indexes, top-k tuning, and reranker gating where profiling shows bottlenecks.
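
Caching and batching can be combined when profiling shows embedding calls dominate. The sketch below caches embeddings by normalized text and sends only the cache misses to the embedder in a single batched call. `EmbeddingCache` and the `embed_batch` callable are hypothetical, there is no eviction or TTL, and the lowercase and whitespace normalization is an assumption that may not suit case-sensitive embedding models.

```python
import hashlib
from typing import Callable, Sequence

Vector = list[float]


class EmbeddingCache:
    """In-memory embedding cache keyed by a hash of the normalized text."""

    def __init__(self, embed_batch: Callable[[Sequence[str]], list[Vector]]):
        # Hypothetical embedder that accepts many texts in one call.
        self._embed_batch = embed_batch
        self._store: dict[str, Vector] = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _key(text: str) -> str:
        # Assumption: case and extra whitespace do not change the embedding.
        normalized = " ".join(text.lower().split())
        return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

    def embed(self, texts: Sequence[str]) -> list[Vector]:
        keys = [self._key(t) for t in texts]
        # Collect each uncached text once, preserving first-seen order.
        missing: dict[str, str] = {}
        for key, text in zip(keys, texts):
            if key not in self._store and key not in missing:
                missing[key] = text
        # Duplicates inside one request count as hits: they cost no extra call.
        self.misses += len(missing)
        self.hits += len(texts) - len(missing)
        if missing:
            # One batched call for all misses instead of one call per text.
            vectors = self._embed_batch(list(missing.values()))
            for key, vector in zip(missing, vectors):
                self._store[key] = vector
        return [self._store[key] for key in keys]
```

The `hits` and `misses` counters make the cache's effect measurable, so the hit rate can be checked against the profile instead of assumed.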

Step 3: Recheck Quality and Cost

Confirm optimizations do not reduce recall, faithfulness, citation quality, or operational reliability.
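
A simple guard for the recall part of this check is to compare mean recall@k before and after an optimization on a labeled query set. The sketch below assumes such labels exist. The function names, the default `k`, and the `tolerance` value are hypothetical, and faithfulness, citation quality, and reliability would need their own checks.

```python
def recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the relevant documents that appear in the top-k results."""
    if not relevant:
        return 0.0
    return len(set(retrieved[:k]) & relevant) / len(relevant)


def mean_recall(runs: dict[str, list[str]], labels: dict[str, set[str]], k: int) -> float:
    """Average recall@k over every labeled query; missing runs count as empty."""
    if not labels:
        raise ValueError("need at least one labeled query")
    total = sum(recall_at_k(runs.get(q, []), rel, k) for q, rel in labels.items())
    return total / len(labels)


def recall_regressed(baseline, optimized, labels, k=5, tolerance=0.02) -> bool:
    """True if the optimized pipeline loses more than `tolerance` mean recall@k."""
    drop = mean_recall(baseline, labels, k) - mean_recall(optimized, labels, k)
    return drop > tolerance
```

Here `baseline` and `optimized` map each query ID to the ranked document IDs returned before and after the change, so the same labels score both runs.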

When to Use This Skill

Retrieval is too slow for the product experience
Reranking or query expansion adds too much latency
Metadata filters or vector indexes need tuning

When NOT to Use This Skill

Retrieval quality is poor but latency is acceptable
The corpus has not been chunked or indexed yet
The system lacks baseline measurements

Related Skills

Optimize Retrieval Latency
Qdrant for Production RAG
Multi-Pass Retrieval with Reranking

Metrics & Success Criteria

Lower p50 and p95 retrieval latency
Stable answer quality after optimization
Reduced cost per query without hiding failures
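
To track the latency criteria, p50 and p95 can be computed from recorded per-request latencies. The sketch below uses the nearest-rank definition of a percentile, which is one of several common conventions. `percentile` and `latency_summary` are hypothetical helper names.

```python
import math


def percentile(samples_ms: list[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least pct% of values at or below it."""
    if not samples_ms:
        raise ValueError("need at least one latency sample")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples_ms)
    # Multiply before dividing to keep the rank exact for whole-number percentiles.
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]


def latency_summary(samples_ms: list[float]) -> dict[str, float]:
    """Summarize retrieval latency with the two percentiles named above."""
    return {"p50": percentile(samples_ms, 50), "p95": percentile(samples_ms, 95)}
```

Comparing the summary before and after a change, on the same traffic sample, shows whether both the typical case and the tail improved.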

$ npx claudepluginhub goodnight77/rag-skills --plugin rag-skills

Similar Skills

rag-architecture

67

Covers RAG architecture including design patterns, chunking strategies, embedding models, retrieval techniques, hybrid search, and context assembly for LLM pipelines.

3 tools


rag-architect

14

Designs and implements production-grade RAG systems: chunking documents, generating embeddings, configuring vector stores, hybrid search, reranking, and retrieval evaluation.

5 files

aigroup-workflow


rag

18

Guides building RAG systems for Q&A, chatbots, knowledge bases, covering embedding models, chunking strategies, vector stores, ingestion pipelines, retrieval optimization.