From rag-development
Comprehensive RAG development knowledge base covering chunking, embeddings, vector databases, retrieval strategies, advanced patterns (Graph RAG, CRAG, Self-RAG, Agentic RAG), evaluation, and production deployment. TRIGGER WHEN: building, optimizing, or auditing RAG systems. DO NOT TRIGGER WHEN: the task is outside the specific scope of this component.
Slash command: /rag-development:rag-development
Comprehensive knowledge base for building production-grade Retrieval-Augmented Generation systems.
For 80% of use cases, start with:
text-embedding-3-small (best value) or Cohere embed-v4 (best accuracy)

Then upgrade incrementally based on measured failures:
Detailed reference documents are in the references/ directory:
- chunking-strategies.md -- all chunking approaches with code, benchmarks, and selection guide
- embedding-models.md -- model comparison, Matryoshka embeddings, fine-tuning, sparse/dense/multi-vector
- retrieval-patterns.md -- hybrid search, HyDE, contextual retrieval, re-ranking, MMR
- advanced-rag-patterns.md -- Graph RAG, RAPTOR, CRAG, Self-RAG, Agentic RAG, multi-modal RAG
- vector-databases.md -- Qdrant deep dive, database comparison, scaling strategies
- production-guide.md -- evaluation, observability, caching, security, cost optimization

Document Ingestion:
Raw Docs -> Preprocessing (Unstructured.io) -> Chunking -> Context Enrichment -> Embedding -> Vector DB
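The chunking step of the ingestion pipeline can be sketched in plain Python. This is a minimal illustration of recursive splitting, not the implementation from the reference documents: the function name `recursive_chunk`, the separator order, and the use of whitespace-separated words as a stand-in for tokens are all assumptions made for the example. A real pipeline would count tokens with the embedding model's tokenizer.

```python
def recursive_chunk(text, max_tokens=512, separators=("\n\n", "\n", ". ", " ")):
    """Split text on the coarsest separator that keeps chunks within max_tokens."""

    def n_tokens(s):
        # Assumption: whitespace-separated words approximate model tokens.
        return len(s.split())

    if n_tokens(text) <= max_tokens:
        return [text.strip()] if text.strip() else []

    if not separators:
        # No separators left: fall back to a hard split on word boundaries.
        words = text.split()
        return [" ".join(words[i:i + max_tokens])
                for i in range(0, len(words), max_tokens)]

    sep, finer = separators[0], separators[1:]
    chunks, current = [], ""
    for piece in text.split(sep):
        candidate = f"{current}{sep}{piece}" if current else piece
        if n_tokens(candidate) <= max_tokens:
            current = candidate  # keep packing pieces into the current chunk
            continue
        if current.strip():
            chunks.append(current.strip())
        if n_tokens(piece) > max_tokens:
            # The piece alone is too large: recurse with finer separators.
            chunks.extend(recursive_chunk(piece, max_tokens, finer))
            current = ""
        else:
            current = piece
    if current.strip():
        chunks.append(current.strip())
    return chunks


sample = ("para one has five words\n\n"
          "para two has five words\n\n"
          + " ".join(["w"] * 12))
chunks = recursive_chunk(sample, max_tokens=6)
```

Paragraph boundaries are tried first, so short paragraphs stay intact and only oversized ones are broken at finer separators.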
Query Pipeline:
User Query -> Query Transform -> Encode (Dense + Sparse) -> Hybrid Search -> Re-rank -> LLM Generation
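The hybrid search step has to merge a dense ranking and a sparse ranking into one list. The source does not say which fusion method it uses; the sketch below assumes Reciprocal Rank Fusion (RRF), one common choice, with the conventional constant `k=60`. The function name and document IDs are hypothetical.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into a single ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    so items ranked highly by more than one retriever rise to the top.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending score; break ties by ID for deterministic output.
    return sorted(scores, key=lambda d: (-scores[d], d))


dense_hits = ["a", "b", "c"]    # hypothetical dense (embedding) results
sparse_hits = ["c", "a", "d"]   # hypothetical sparse (keyword) results
fused = reciprocal_rank_fusion([dense_hits, sparse_hits])
```

RRF uses only rank positions, so it avoids having to normalize dense similarity scores against sparse keyword scores, which live on different scales.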
Evaluation Loop:
Ground Truth + Predictions -> RAGAS/DeepEval -> Faithfulness, Relevancy, Precision, Recall
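Before wiring up RAGAS or DeepEval, retrieval precision and recall can be checked by hand against a small ground-truth set. The sketch below is a simplified, ID-based stand-in and does not reproduce the LLM-judged metric definitions those libraries use; the function name and document IDs are hypothetical.

```python
def precision_recall_at_k(retrieved, relevant, k):
    """Return (precision@k, recall@k) for one query.

    retrieved: ranked list of document IDs returned by the retriever
    relevant:  set of ground-truth relevant document IDs
    """
    top_k = retrieved[:k]
    hits = sum(1 for doc_id in top_k if doc_id in relevant)
    precision = hits / len(top_k) if top_k else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


retrieved = ["d1", "d7", "d3", "d9"]   # hypothetical retriever output
relevant = {"d1", "d3", "d5"}          # hypothetical ground truth
precision, recall = precision_recall_at_k(retrieved, relevant, k=3)
```

Low precision suggests a re-ranker may help; low recall points back at chunking, embeddings, or missing keyword matches.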
| Decision | Default | Upgrade When |
|---|---|---|
| Chunking | Recursive 512 tok | Structured docs -> markdown-aware; cross-refs -> late chunking |
| Embedding | text-embedding-3-small | Need accuracy -> embed-v4; self-hosted -> NV-Embed-v2 |
| Vector DB | Qdrant + INT8 | Already on Postgres -> pgvector; need managed -> Pinecone |
| Search | Dense only | Keyword misses -> add sparse hybrid; poor diversity -> add MMR |
| Re-ranking | None | Top-k results contain irrelevant items -> add Cohere Rerank |
| Caching | None | Production latency/cost concerns -> semantic cache |
| Evaluation | Manual spot checks | Any production use -> RAGAS automated metrics |
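The "poor diversity -> add MMR" upgrade in the table can be illustrated with a small Maximal Marginal Relevance sketch. This assumes cosine similarity over plain Python lists and a hypothetical `lambda_` trade-off parameter; a production system would normally run this over vectors returned by the vector database rather than recomputing similarities by hand.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mmr(query_vec, doc_vecs, top_k=3, lambda_=0.5):
    """Select indices of doc_vecs, balancing relevance against redundancy.

    lambda_=1.0 ranks purely by relevance to the query; lower values
    penalize documents similar to ones that are already selected.
    """
    relevance = [cosine(query_vec, d) for d in doc_vecs]
    selected = []
    candidates = list(range(len(doc_vecs)))
    while candidates and len(selected) < top_k:
        def score(i):
            # Redundancy is the highest similarity to any already-selected doc.
            redundancy = max(
                (cosine(doc_vecs[i], doc_vecs[j]) for j in selected),
                default=0.0,
            )
            return lambda_ * relevance[i] - (1 - lambda_) * redundancy

        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


query = [1.0, 0.0]
docs = [[1.0, 0.0], [0.99, 0.1], [0.6, 0.8]]   # docs 0 and 1 are near-duplicates
diverse = mmr(query, docs, top_k=2, lambda_=0.3)
relevance_only = mmr(query, docs, top_k=2, lambda_=1.0)
```

With `lambda_=0.3` the near-duplicate of the first pick is skipped in favor of the less similar document, while `lambda_=1.0` returns the two most relevant documents regardless of overlap.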
npx claudepluginhub acaprino/alfio-claude-plugins