From ork
Provides production RAG patterns for grounded LLM responses including core RAG, embeddings, hybrid search, contextual retrieval, HyDE, agentic/multimodal RAG, query decomposition, reranking, and pgvector.
Slash command: `/ork:rag-retrieval`
Comprehensive patterns for building production RAG systems. Each category has individual rule files in `rules/` loaded on-demand.

- checklists/rag-quality.md
- checklists/search-implementation-checklist.md
- examples/chatbot-with-rag-example.ts
- examples/examples/orchestkit-retrieval.md
- metadata.json
- rules/_sections.md
- rules/_template.md
- rules/agentic-adaptive-retrieval.md
- rules/agentic-corrective-rag.md
- rules/agentic-knowledge-graph.md
- rules/agentic-self-rag.md
- rules/contextual-hybrid.md
- rules/contextual-pipeline.md
- rules/contextual-prepend.md
- rules/core-basic-rag.md
- rules/core-context-management.md
- rules/core-hybrid-search.md
- rules/core-pipeline-composition.md
- rules/embeddings-advanced.md
- rules/embeddings-chunking.md

| Category | Rules | Impact | When to Use |
|---|---|---|---|
| Core RAG | 4 | CRITICAL | Basic RAG, citations, hybrid search, context management |
| Embeddings | 3 | HIGH | Model selection, chunking, batch/cache optimization |
| Contextual Retrieval | 3 | HIGH | Context-prepending, hybrid BM25+vector, pipeline |
| HyDE | 3 | HIGH | Vocabulary mismatch, hypothetical document generation |
| Agentic RAG | 4 | HIGH | Self-RAG, CRAG, knowledge graphs, adaptive routing |
| Multimodal RAG | 3 | MEDIUM | Image+text retrieval, PDF chunking, cross-modal search |
| Query Decomposition | 3 | MEDIUM | Multi-concept queries, parallel retrieval, RRF fusion |
| Reranking | 3 | MEDIUM | Cross-encoder, LLM scoring, combined signals |
| PGVector | 4 | HIGH | PostgreSQL hybrid search, HNSW indexes, schema design |
Total: 30 rules across 9 categories

Core RAG: fundamental patterns for retrieval, generation, and pipeline composition.
| Rule | File | Key Pattern |
|---|---|---|
| Basic RAG | rules/core-basic-rag.md | Retrieve + context + generate with citations |
| Hybrid Search | rules/core-hybrid-search.md | RRF fusion (k=60) for semantic + keyword |
| Context Management | rules/core-context-management.md | Token budgeting + sufficiency check |
| Pipeline Composition | rules/core-pipeline-composition.md | Composable Decompose → HyDE → Retrieve → Rerank |
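
The RRF fusion named in the Hybrid Search rule can be sketched as below. This is a minimal illustration, not the contents of `rules/core-hybrid-search.md`: the function name `rrf_fuse` and the sample document IDs are assumptions, and only the constant k=60 comes from the table above.

```python
from collections import defaultdict


def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse several ranked lists of document IDs with Reciprocal Rank Fusion."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        # Ranks are 1-based; each list contributes 1 / (k + rank) per document.
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical result lists from a semantic and a keyword search.
semantic = ["a", "b", "c"]
keyword = ["b", "d", "a"]
fused = rrf_fuse([semantic, keyword])  # fused order: b, a, d, c
```

Because RRF uses ranks rather than raw scores, the two searches do not need comparable score scales.
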

Embeddings: embedding models, chunking strategies, and production optimization.
| Rule | File | Key Pattern |
|---|---|---|
| Models & API | rules/embeddings-models.md | Model selection, batch API, similarity |
| Chunking | rules/embeddings-chunking.md | Semantic boundary splitting, 512 token sweet spot |
| Advanced | rules/embeddings-advanced.md | Redis cache, Matryoshka dims, batch processing |
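
One way to split on semantic boundaries while staying near the 512-token target is to pack whole paragraphs into each chunk. The sketch below is an assumption about how that could look, not the rule file's implementation: `chunk_by_paragraph` is a hypothetical name, and tokens are estimated by whitespace-separated words where a real tokenizer would be more accurate.

```python
def chunk_by_paragraph(text: str, max_tokens: int = 512) -> list[str]:
    """Pack whole paragraphs into chunks of roughly max_tokens or fewer."""
    chunks: list[str] = []
    current: list[str] = []
    current_len = 0
    for para in (p.strip() for p in text.split("\n\n")):
        if not para:
            continue
        n = len(para.split())  # crude token estimate (assumption): word count
        # Start a new chunk when adding this paragraph would exceed the budget.
        if current and current_len + n > max_tokens:
            chunks.append("\n\n".join(current))
            current, current_len = [], 0
        current.append(para)
        current_len += n
    if current:
        chunks.append("\n\n".join(current))
    return chunks


sample = "one two three\n\nfour five\n\nsix seven eight nine"
small_chunks = chunk_by_paragraph(sample, max_tokens=5)
```

A paragraph longer than `max_tokens` is kept whole in this sketch; splitting it further (for example by sentence) is left out for brevity.
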

Contextual Retrieval: Anthropic's context-prepending technique, reported to cut failed retrievals by up to 67% when combined with BM25 and reranking.
| Rule | File | Key Pattern |
|---|---|---|
| Context Prepending | rules/contextual-prepend.md | LLM-generated context + prompt caching |
| Hybrid Search | rules/contextual-hybrid.md | 40% BM25 / 60% vector weight split |
| Complete Pipeline | rules/contextual-pipeline.md | End-to-end indexing + hybrid retrieval |
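
The 40% BM25 / 60% vector split can be illustrated with a weighted sum over normalized scores. This is a sketch under stated assumptions: the helper names are hypothetical, and min-max normalization is one common choice for putting the two score types on the same scale, not necessarily what `rules/contextual-hybrid.md` uses.

```python
def min_max(scores: dict[str, float]) -> dict[str, float]:
    """Scale scores to [0, 1]; a constant score set maps to 1.0."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}


def hybrid_scores(
    bm25: dict[str, float],
    vector: dict[str, float],
    bm25_weight: float = 0.4,
    vector_weight: float = 0.6,
) -> list[tuple[str, float]]:
    """Combine normalized BM25 and vector scores with a weighted sum."""
    b, v = min_max(bm25), min_max(vector)
    combined = {
        # A document missing from one result set contributes 0 for that signal.
        doc_id: bm25_weight * b.get(doc_id, 0.0) + vector_weight * v.get(doc_id, 0.0)
        for doc_id in b.keys() | v.keys()
    }
    return sorted(combined.items(), key=lambda item: item[1], reverse=True)


# Hypothetical raw scores from the two retrievers.
ranked = hybrid_scores({"a": 10.0, "b": 5.0}, {"a": 0.2, "b": 0.8, "c": 0.5})
```
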

HyDE: Hypothetical Document Embeddings for bridging vocabulary gaps.
| Rule | File | Key Pattern |
|---|---|---|
| Generation | rules/hyde-generation.md | Embed hypothetical doc, not query |
| Per-Concept | rules/hyde-per-concept.md | Parallel HyDE for multi-topic queries |
| Fallback | rules/hyde-fallback.md | 2-3s timeout → direct embedding fallback |
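
The timeout-with-fallback pattern from the Fallback rule might be written with `asyncio.wait_for`. In this sketch `hyde_query_text` and the injected `generate_hypothetical` callable are hypothetical names standing in for whatever LLM call produces the hypothetical document; the 2.5 second default simply sits inside the 2-3s range given in the table.

```python
import asyncio
from typing import Awaitable, Callable


async def hyde_query_text(
    query: str,
    generate_hypothetical: Callable[[str], Awaitable[str]],
    timeout_s: float = 2.5,
) -> str:
    """Return the text to embed: the hypothetical document, or the raw query on timeout."""
    try:
        return await asyncio.wait_for(generate_hypothetical(query), timeout=timeout_s)
    except asyncio.TimeoutError:
        # Generation was too slow: fall back to embedding the query directly.
        return query
```

The caller embeds whatever string comes back, so the retrieval path is identical whether or not HyDE finished in time.
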

Agentic RAG: self-correcting retrieval with LLM-driven decision making.
| Rule | File | Key Pattern |
|---|---|---|
| Self-RAG | rules/agentic-self-rag.md | Binary document grading for relevance |
| Corrective RAG | rules/agentic-corrective-rag.md | CRAG workflow with web fallback |
| Knowledge Graph | rules/agentic-knowledge-graph.md | KG + vector hybrid for entity-rich domains |
| Adaptive Retrieval | rules/agentic-adaptive-retrieval.md | Query routing to optimal strategy |
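
Binary grading and the corrective fallback can be combined in a few lines. The sketch below is a simplified reading of the Self-RAG and CRAG rows, not the workflow in the rule files: `retrieve`, `grade`, and `web_search` are hypothetical callables the application would supply, and a fuller CRAG flow would typically also rewrite the query before searching the web.

```python
from typing import Callable


def corrective_retrieve(
    query: str,
    retrieve: Callable[[str], list[str]],
    grade: Callable[[str, str], bool],
    web_search: Callable[[str], list[str]],
    min_relevant: int = 1,
) -> list[str]:
    """Keep documents graded relevant; fall back to web search if too few remain."""
    docs = retrieve(query)
    # Binary grading: each document is either kept or dropped.
    relevant = [doc for doc in docs if grade(query, doc)]
    if len(relevant) >= min_relevant:
        return relevant
    # Corrective step: supplement weak retrieval with web results.
    return relevant + web_search(query)
```

In practice `grade` would wrap an LLM call that answers yes or no for a (query, document) pair.
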

Multimodal RAG: image + text retrieval with cross-modal search.
| Rule | File | Key Pattern |
|---|---|---|
| Embeddings | rules/multimodal-embeddings.md | CLIP, SigLIP 2, Voyage multimodal-3 |
| Chunking | rules/multimodal-chunking.md | PDF extraction preserving images |
| Pipeline | rules/multimodal-pipeline.md | Dedup + hybrid retrieval + generation |

Query Decomposition: breaking complex queries into concepts for parallel retrieval.
| Rule | File | Key Pattern |
|---|---|---|
| Detection | rules/query-detection.md | Heuristic indicators (<1ms fast path) |
| Decompose + RRF | rules/query-decompose.md | LLM concept extraction + parallel retrieval |
| HyDE Combo | rules/query-hyde-combo.md | Decompose + HyDE for maximum coverage |
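
The heuristic fast path in the Detection rule could look like the check below. The specific indicators (conjunctions, comparison words, more than one question mark) are assumptions chosen for illustration; the actual indicators live in `rules/query-detection.md` and may differ.

```python
import re

# Hypothetical indicators that a query covers more than one concept.
_MULTI_CONCEPT = re.compile(
    r"\b(and|vs\.?|versus|compared? (?:to|with)|as well as)\b",
    re.IGNORECASE,
)


def looks_multi_concept(query: str) -> bool:
    """Cheap check used to decide whether LLM decomposition is worth calling."""
    return bool(_MULTI_CONCEPT.search(query)) or query.count("?") > 1
```

Only queries that pass this check need the slower LLM concept extraction; everything else can go straight to retrieval.
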

Reranking: post-retrieval re-scoring for higher precision.
| Rule | File | Key Pattern |
|---|---|---|
| Cross-Encoder | rules/reranking-cross-encoder.md | ms-marco-MiniLM (~50ms, free) |
| LLM Reranking | rules/reranking-llm.md | Batch scoring + Cohere API |
| Combined | rules/reranking-combined.md | Multi-signal weighted scoring |
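
Reranking reduces to scoring each (query, document) pair and keeping the best few. In the sketch below the scorer is injected, so a cross-encoder or an LLM could sit behind it; `rerank` and the toy `term_overlap` scorer are hypothetical and only stand in for a real model.

```python
from typing import Callable, Sequence


def rerank(
    query: str,
    candidates: Sequence[str],
    score: Callable[[str, str], float],
    top_n: int = 10,
) -> list[str]:
    """Re-score retrieved candidates and keep the top_n highest-scoring ones."""
    scored = [(score(query, doc), doc) for doc in candidates]
    # The sort is stable, so ties keep their original retrieval order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:top_n]]


def term_overlap(query: str, doc: str) -> float:
    """Toy scorer (assumption): number of shared lowercase terms."""
    return float(len(set(query.lower().split()) & set(doc.lower().split())))


docs = ["hnsw index tuning", "bm25 keyword search", "hnsw vs ivfflat index"]
top = rerank("hnsw index", docs, term_overlap, top_n=2)
```
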

PGVector: production hybrid search with PostgreSQL.
| Rule | File | Key Pattern |
|---|---|---|
| Schema | rules/pgvector-schema.md | HNSW index + pre-computed tsvector |
| Hybrid Search | rules/pgvector-hybrid-search.md | SQLAlchemy RRF with FULL OUTER JOIN |
| Indexing | rules/pgvector-indexing.md | HNSW (17x faster) vs IVFFlat |
| Metadata | rules/pgvector-metadata.md | Filtering, boosting, Redis 8 comparison |
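
The schema and hybrid query patterns in this table might look like the SQL below, held in Python strings so it can be passed to a database driver. This is a sketch, not the contents of the rule files: the `chunks` table, its column names, the 1536 dimension, and the psycopg-style `%(name)s` parameters are all assumptions, whereas the rule file describes a SQLAlchemy version.

```python
# Assumed schema: one row per chunk with an embedding and a pre-computed tsvector.
SCHEMA_SQL = """
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE chunks (
    id bigserial PRIMARY KEY,
    content text NOT NULL,
    embedding vector(1536),
    -- Pre-computed full-text vector, kept in sync by PostgreSQL.
    content_tsv tsvector GENERATED ALWAYS AS (to_tsvector('english', content)) STORED
);

CREATE INDEX chunks_embedding_hnsw ON chunks USING hnsw (embedding vector_cosine_ops);
CREATE INDEX chunks_tsv_gin ON chunks USING gin (content_tsv);
"""

# RRF over a semantic and a keyword ranking, joined with FULL OUTER JOIN so a
# chunk found by only one of the two searches still receives a score.
HYBRID_SQL = """
WITH semantic AS (
    SELECT id, ROW_NUMBER() OVER (ORDER BY embedding <=> %(query_embedding)s::vector) AS rank
    FROM chunks
    ORDER BY embedding <=> %(query_embedding)s::vector
    LIMIT 50
),
keyword AS (
    SELECT id, ROW_NUMBER() OVER (ORDER BY ts_rank_cd(content_tsv, q) DESC) AS rank
    FROM chunks, plainto_tsquery('english', %(query_text)s) AS q
    WHERE content_tsv @@ q
    ORDER BY ts_rank_cd(content_tsv, q) DESC
    LIMIT 50
)
SELECT COALESCE(s.id, k.id) AS id,
       COALESCE(1.0 / (60 + s.rank), 0) + COALESCE(1.0 / (60 + k.rank), 0) AS rrf_score
FROM semantic s
FULL OUTER JOIN keyword k ON s.id = k.id
ORDER BY rrf_score DESC
LIMIT 10;
"""
```

The candidate limit of 50 and final limit of 10 mirror the "retrieve 50, rerank to 10" guidance elsewhere in this skill; both are tunable.
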
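
    from openai import AsyncOpenAI

    client = AsyncOpenAI()  # reads OPENAI_API_KEY from the environment

    async def rag_query(question: str, top_k: int = 5) -> dict:
        """Basic RAG with citations."""
        # vector_db is a placeholder for your async vector store client.
        docs = await vector_db.search(question, limit=top_k)
        # Number each chunk so the model can cite it as [1], [2], ...
        context = "\n\n".join(f"[{i+1}] {doc.text}" for i, doc in enumerate(docs))
        response = await client.chat.completions.create(
            model="gpt-4o-mini",  # assumption: substitute your chat model
            temperature=0.1,  # low temperature for factual answers
            messages=[
                {"role": "system", "content": "Answer with inline citations [1], [2]. Use ONLY provided context."},
                {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
            ],
        )
        answer = response.choices[0].message.content
        return {"answer": answer, "sources": [d.metadata["source"] for d in docs]}
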
| Decision | Recommendation |
|---|---|
| Embedding model | text-embedding-3-small (general), voyage-3 (production) |
| Chunk size | 256-1024 tokens (512 typical) |
| Hybrid weight | 40% BM25 / 60% vector |
| Top-k | 3-10 documents |
| Temperature | 0.1-0.3 (factual) |
| Context budget | 4K-8K tokens |
| Reranking | Retrieve 50, rerank to 10 |
| Vector index | HNSW (production), IVFFlat (high-volume) |
| HyDE timeout | 2-3 seconds with fallback |
| Query decomposition | Heuristic first, LLM only if multi-concept |
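
The context budget row implies a packing step between retrieval and generation. A minimal sketch, assuming chunks arrive sorted best-first and using a word count as a stand-in for a real tokenizer (`pack_context` and `count_tokens` are hypothetical names):

```python
from typing import Callable


def pack_context(
    chunks: list[str],
    budget_tokens: int = 4000,
    count_tokens: Callable[[str], int] = lambda s: len(s.split()),
) -> list[str]:
    """Keep the highest-ranked chunks that fit within the token budget."""
    selected: list[str] = []
    used = 0
    for chunk in chunks:  # assumed to be sorted best-first
        cost = count_tokens(chunk)
        if used + cost > budget_tokens:
            break  # stop at the first chunk that does not fit
        selected.append(chunk)
        used += cost
    return selected


packed = pack_context(["a b c", "d e", "f g h i"], budget_tokens=5)
```

Stopping at the first chunk that does not fit keeps the selected context contiguous in rank order; skipping it and trying smaller chunks further down is an alternative trade-off.
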
See test-cases.json for 30 test cases across all categories.

- ork:langgraph - LangGraph workflow patterns (for agentic RAG workflows)
- caching - Cache RAG responses for repeated queries
- ork:golden-dataset - Evaluate retrieval quality
- ork:llm-integration - Local embeddings with nomic-embed-text
- vision-language-models - Image analysis for multimodal RAG
- ork:database-patterns - Schema design for vector search

Keywords: retrieval, context, chunks, relevance, rag

Keywords: hybrid, bm25, vector, fusion, rrf

Keywords: embedding, text to vector, vectorize, chunk, similarity

Keywords: contextual, anthropic, context-prepend, bm25

Keywords: hyde, hypothetical, vocabulary mismatch

Keywords: self-rag, crag, corrective, adaptive, grading

Keywords: multimodal, image, clip, vision, pdf

Keywords: decompose, multi-concept, complex query

Keywords: rerank, cross-encoder, precision, scoring

Keywords: pgvector, postgresql, hnsw, tsvector, hybrid

npx claudepluginhub yonatangross/orchestkit --plugin ork
<!-- AUTO-GENERATED by export-plugins.py — DO NOT EDIT -->