From qe-framework
Designs and implements RAG pipelines with chunking, embedding, vector DBs (Chroma, pgvector, Pinecone, Qdrant), hybrid search, reranking, and evaluation using RAGAS.
Slash command
/qe-framework:Qrag-architect
1. **Requirements** — Identify retrieval needs, latency, accuracy, scale
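# Pattern 1: Semantic chunking with overlap
def semantic_chunking(text: str, chunk_size: int = 800, overlap: int = 100) -> list[str]:
    """Split text on paragraph, line, and sentence boundaries with overlap for RAG."""
    # Newer LangChain releases expose the same class from the langchain_text_splitters package.
    from langchain.text_splitter import RecursiveCharacterTextSplitter
    # chunk_size and overlap are measured in characters (the default length function is len).
    splitter = RecursiveCharacterTextSplitter(
        chunk_size=chunk_size, chunk_overlap=overlap,
        separators=["\n\n", "\n", ". ", " "]
    )
    return splitter.split_text(text)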
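To make the effect of the overlap parameter concrete, here is a minimal, dependency-free sketch of fixed-size character windows that share a few characters with their neighbors. It is not what `RecursiveCharacterTextSplitter` does internally (that class also respects separators); the function name `sliding_window_chunks` is hypothetical and exists only for illustration.

```python
def sliding_window_chunks(text: str, chunk_size: int = 800, overlap: int = 100) -> list[str]:
    """Cut text into fixed-size character windows that share `overlap` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    step = chunk_size - overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


# Each chunk repeats the final two characters of the previous one.
print(sliding_window_chunks("abcdefghij", chunk_size=4, overlap=2))
# → ['abcd', 'cdef', 'efgh', 'ghij']
```

A larger overlap reduces the chance that a fact is split across a chunk boundary, at the cost of more chunks to embed and store.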
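# Pattern 2: Hybrid search with RRF
def hybrid_search(query: str, vector_results: list, bm25_results: list, k: int = 10):
    """Fuse dense + sparse retrieval using Reciprocal Rank Fusion."""
    # Both result lists are assumed to be sorted best-first; `query` is not used
    # here because retrieval has already happened upstream.
    rrf_scores = {}
    # Each list contributes 1 / (60 + rank) per document; 60 is the usual RRF smoothing constant.
    for rank, r in enumerate(vector_results[:k], 1):
        rrf_scores[r['id']] = rrf_scores.get(r['id'], 0) + 1 / (60 + rank)
    for rank, r in enumerate(bm25_results[:k], 1):
        rrf_scores[r['id']] = rrf_scores.get(r['id'], 0) + 1 / (60 + rank)
    # Returns (id, fused_score) pairs, highest score first.
    return sorted(rrf_scores.items(), key=lambda x: x[1], reverse=True)[:k]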
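As a worked illustration of the fusion rule above, the following sketch generalizes it to any number of ranked lists and runs it on toy data. The helper name `rrf_fuse`, the `rrf_k` parameter, and the document ids are assumptions made for this example; the scoring term 1 / (60 + rank) is the same one Pattern 2 uses.

```python
from collections import defaultdict


def rrf_fuse(rankings: list[list[str]], k: int = 10, rrf_k: int = 60) -> list[tuple[str, float]]:
    """Reciprocal Rank Fusion over any number of best-first id lists."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking[:k], 1):
            scores[doc_id] += 1.0 / (rrf_k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:k]


dense = ["a", "b", "c"]   # toy vector-search ranking
sparse = ["c", "a", "d"]  # toy BM25 ranking
fused = rrf_fuse([dense, sparse])
print([doc_id for doc_id, _ in fused])
# → ['a', 'c', 'b', 'd']
```

Document `a` wins because it ranks first and second (1/61 + 1/62), narrowly ahead of `c`, which ranks third and first (1/63 + 1/61). Documents found by only one retriever fall to the bottom.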
# Pattern 3: Retrieval evaluation
def evaluate_retrieval(queries: list, retrieved: list, ground_truth: list, k: int = 10):
    """Compute precision@k, recall@k, MRR on held-out queries."""
    precisions, recalls, mrrs = [], [], []
    for query, ret, rel in zip(queries, retrieved, ground_truth):
        rel_set, ret_set = set(rel), set(ret[:k])
        if not rel_set:
            continue  # skip queries with no labeled relevant documents
        hits = len(rel_set & ret_set)
        precisions.append(hits / k)
        recalls.append(hits / len(rel_set))
        # Reciprocal rank of the first relevant hit in the top k.
        for rank, doc in enumerate(ret[:k], 1):
            if doc in rel_set:
                mrrs.append(1 / rank)
                break
        else:
            # No relevant document retrieved: count as 0 so misses lower the mean.
            mrrs.append(0.0)
    return {
        "precision@k": sum(precisions) / len(precisions) if precisions else 0,
        "recall@k": sum(recalls) / len(recalls) if recalls else 0,
        "mrr": sum(mrrs) / len(mrrs) if mrrs else 0,
    }
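To check the arithmetic behind these metrics, here is a small single-query sketch with hand-verifiable numbers. The helper name `single_query_metrics` and the document ids are hypothetical; the definitions follow Pattern 3 (precision divides by k, recall by the number of relevant documents, reciprocal rank uses the first relevant hit).

```python
def single_query_metrics(retrieved: list[str], relevant: set[str], k: int) -> dict[str, float]:
    """precision@k, recall@k, and reciprocal rank for one query."""
    top_k = retrieved[:k]
    hits = len(relevant & set(top_k))
    reciprocal_rank = 0.0
    for rank, doc_id in enumerate(top_k, 1):
        if doc_id in relevant:
            reciprocal_rank = 1.0 / rank
            break
    return {
        "precision@k": hits / k,
        "recall@k": hits / len(relevant) if relevant else 0.0,
        "reciprocal_rank": reciprocal_rank,
    }


metrics = single_query_metrics(
    retrieved=["d3", "d1", "d7", "d2"],
    relevant={"d1", "d2", "d9"},
    k=4,
)
print(metrics)
```

Two of the four retrieved documents are relevant, so precision@4 is 0.5. Two of the three relevant documents were found, so recall@4 is 2/3. The first relevant document sits at rank 2, so the reciprocal rank is 0.5.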
def build_retrieval_index(documents: list, embedding_model: str):
    """One-line summary of indexing strategy.

    Longer: explain chunking approach, embedding rationale, guarantees.

    Args:
        documents: List of dicts with 'id', 'text', 'metadata'
        embedding_model: HuggingFace model (e.g., 'BAAI/bge-small-en-v1.5')

    Returns:
        Indexed vector database client

    Raises:
        ValueError: If documents lack 'id' or 'text' fields
    """
[tool.ruff]
line-length = 100
select = ["E", "F", "W", "UP"]
[tool.mypy]
python_version = "3.9"
disallow_untyped_defs = true
ignore_missing_imports = true
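The `Raises` clause in the `build_retrieval_index` docstring template promises a `ValueError` when a document lacks `'id'` or `'text'`. One way that check might look, fully typed so it would pass the `disallow_untyped_defs` setting above, is sketched here. The helper name `validate_documents` and the error message wording are assumptions, not part of the original skill.

```python
def validate_documents(documents: list[dict]) -> None:
    """Raise ValueError if any document lacks the required 'id' or 'text' field."""
    required = ("id", "text")
    for index, doc in enumerate(documents):
        missing = [name for name in required if name not in doc]
        if missing:
            raise ValueError(
                f"document at index {index} is missing required field(s): {', '.join(missing)}"
            )


# A well-formed batch passes silently.
validate_documents([{"id": "doc-1", "text": "hello", "metadata": {}}])

# A malformed document is reported with its position in the batch.
try:
    validate_documents([{"id": "doc-2"}])
except ValueError as err:
    print(err)
```

Running the check before any embedding call means a bad batch fails early, before time is spent on model inference.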
| Anti-pattern | Fix |
|---|---|
| Fixed chunk_size=512 without domain eval | Test 256–1024 on domain data; measure recall@10 |
| No reranking; direct LLM on top-1 result | Use BM25+vector hybrid + reranker (Cohere, ColBERT) |
| Only measuring LLM output; ignoring retrieval | Measure context_precision ≥0.7 AND answer_relevancy separately |
| Tight coupling to embedding model | Decouple via vector DB schema; version embeddings in metadata |
| Single vector search; no hybrid or filtering | Always use hybrid (dense+sparse) + metadata filters + reranking |
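The "version embeddings in metadata" fix from the table can be sketched as follows. This is an illustration under stated assumptions: `ChunkRecord`, `tag_embedding`, `stale_chunk_ids`, the metadata keys, and the `v1`/`v2` labels are all hypothetical and do not correspond to any particular vector DB schema.

```python
from dataclasses import dataclass, field


@dataclass
class ChunkRecord:
    """One stored chunk; field names are illustrative only."""
    chunk_id: str
    text: str
    vector: list[float]
    metadata: dict[str, str] = field(default_factory=dict)


def tag_embedding(record: ChunkRecord, model: str, version: str) -> ChunkRecord:
    """Record which embedding model and version produced the vector."""
    record.metadata["embedding_model"] = model
    record.metadata["embedding_version"] = version
    return record


def stale_chunk_ids(records: list[ChunkRecord], model: str, version: str) -> list[str]:
    """Return ids of chunks whose vectors came from a different model or version."""
    return [
        r.chunk_id
        for r in records
        if r.metadata.get("embedding_model") != model
        or r.metadata.get("embedding_version") != version
    ]


records = [
    tag_embedding(ChunkRecord("c1", "alpha", [0.1, 0.2]), "BAAI/bge-small-en-v1.5", "v1"),
    tag_embedding(ChunkRecord("c2", "beta", [0.3, 0.4]), "BAAI/bge-small-en-v1.5", "v2"),
    ChunkRecord("c3", "gamma", [0.5, 0.6]),  # never tagged, so treated as stale
]
print(stale_chunk_ids(records, "BAAI/bge-small-en-v1.5", "v2"))
# → ['c1', 'c3']
```

With the model identity stored next to each vector, switching embedding models becomes a re-embedding job over the stale ids rather than a silent mix of incompatible vectors in one index.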
MUST: Evaluate embeddings on domain data, implement hybrid search, measure retrieval quality, test on prod scale, monitor latency
MUST NOT: Use default chunk=512, skip reranking, ignore retrieval metrics, couple to embedding model, deploy without evaluation
npx claudepluginhub inho-team/qe-framework --plugin qe-framework