Skill

rag

Guides building RAG systems for Q&A, chatbots, and knowledge bases, covering embedding models, chunking strategies, vector stores, ingestion pipelines, and retrieval optimization.

Python

OpenAI

PostgreSQL

ai-ml

Invocation

How this skill is triggered — by the user, by Claude, or both

Slash command

/godmode:rag

User invocable

Model invocable

Inline context

Default effort

Context Preview

The summary Claude sees in its skill listing — used to decide when to auto-load this skill

- `/godmode:rag`, "build RAG system", "knowledge base"

SKILL.md

173 lines · ~1.2k tokens

Stats

Language: Shell

Stars: 18

Forks: 8

Maintenance: Excellent

Last Commit: Apr 25, 2026

Activate When

/godmode:rag, "build RAG system", "knowledge base"
"improve retrieval", "RAG hallucinating"
Building Q&A, chatbot, or search-augmented features

Workflow

1. Requirements

Use case: <questions the system must answer>
Data sources: <docs, wiki, DB, PDFs, code>
Corpus: <N documents, N tokens, N MB>
Update frequency: static|daily|real-time
Query patterns:
  Factual lookup (single-hop retrieval)
  Analytical (multi-document retrieval)
  Conversational (multi-turn Q&A)
  Structured (metadata filtering + retrieval)

2. Embedding Model Selection

| Model | Dims | MTEB | Cost (per 1M tokens) |
|---|---|---|---|
| text-embedding-3-large | 3072 | 64.6 | $0.13 |
| text-embedding-3-small | 1536 | 62.3 | $0.02 |
| Cohere embed-v3 | 1024 | 64.5 | $0.10 |
| Voyage voyage-3 | 1024 | 67.1 | $0.06 |
| BGE-large-en-v1.5 | 1024 | 64.2 | Free* |

* Open-weight model: no per-token fee, but you pay for the compute to host it.

IF budget-constrained: text-embedding-3-small. IF quality-critical: Voyage or BGE fine-tuned. IF multi-language: Cohere embed-v3.
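To make the budget rule concrete, here is a minimal sketch that estimates the one-time cost of embedding a corpus. Prices are copied from the table above; the dictionary keys, the function name `embedding_cost_usd`, and the 50M-token corpus are illustrative assumptions. Chunk overlap re-embeds some tokens, so real cost will be somewhat higher.

```python
# USD per 1M tokens, copied from the table above (keys are illustrative labels).
PRICE_PER_1M_TOKENS = {
    "text-embedding-3-large": 0.13,
    "text-embedding-3-small": 0.02,
    "cohere-embed-v3": 0.10,
    "voyage-3": 0.06,
}


def embedding_cost_usd(model: str, corpus_tokens: int) -> float:
    """Estimated cost of embedding the whole corpus exactly once."""
    return PRICE_PER_1M_TOKENS[model] * corpus_tokens / 1_000_000


# Hypothetical corpus size: 50M tokens.
for model in PRICE_PER_1M_TOKENS:
    print(f"{model}: ${embedding_cost_usd(model, 50_000_000):.2f}")
```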

3. Chunking Strategy

| Strategy | Best For |
|---|---|
| Fixed-size (token) | Baseline, uniform docs |
| Recursive character | General-purpose |
| Semantic | Varying topic density |
| Code-aware (AST) | Source code repos |
| Markdown headers | Structured docs |
| Sliding window | Boundary context critical |

ALWAYS set overlap >= 10% of chunk size. IF chunk_size > 1000 tokens: information dilution risk. IF chunk_size < 100 tokens: context too fragmented. Default: 500 tokens, 50 token overlap.
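A minimal sketch of fixed-size chunking with overlap, using the defaults above (500 tokens, 50 overlap) and rejecting overlap below 10%. It operates on a pre-tokenized list; the function name `chunk_tokens` is hypothetical, and splitting on whitespace in the usage line is only a stand-in for the embedding model's real tokenizer.

```python
def chunk_tokens(tokens: list[str], chunk_size: int = 500,
                 overlap: int = 50) -> list[list[str]]:
    """Split tokens into fixed-size windows that share `overlap` tokens."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    if overlap * 10 < chunk_size:
        raise ValueError("overlap must be at least 10% of chunk_size")

    step = chunk_size - overlap  # how far each window advances
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # this window already reached the end of the document
    return chunks


# Whitespace split is an assumption; use the model's tokenizer in practice.
tokens = ("word " * 1200).split()
print([len(c) for c in chunk_tokens(tokens)])
```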

4. Vector Store

| Store | Type | Scale | Best For |
|---|---|---|---|
| Pinecone | Managed | Billions | Production |
| Weaviate | Managed/Self | Millions | Hybrid search |
| Chroma | Embedded | Millions | Prototyping |
| pgvector | Extension | Millions | Existing PG |
| Qdrant | Managed/Self | Billions | High perf |

IF already using PostgreSQL: start with pgvector. IF < 100K chunks: Chroma for development. IF > 10M chunks: Pinecone or Qdrant.

5. Ingestion Pipeline

Document -> Parse/Extract -> Clean/Transform
  -> Chunk -> Embed -> Index
Loaders:
  PDF: PyMuPDF, pdfplumber, Unstructured
  HTML: BeautifulSoup, Unstructured
  Code: tree-sitter AST parser

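# Verify indexing. A bare Client() is in-memory and always starts empty,
# so open the persisted store instead (./chroma_db is an assumed path).
python -c "import chromadb; \
  c = chromadb.PersistentClient(path='./chroma_db'); print(c.list_collections())"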
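The pipeline stages above (clean, chunk, embed, index) can be sketched end to end with in-memory stand-ins. Everything here is hypothetical scaffolding: `fake_embed` is a hashed bag-of-words placeholder, not a real embedding model, the "index" is a plain list rather than a vector store, and word counts stand in for tokens. Parsing is omitted; the input is assumed to be already-extracted text.

```python
import hashlib
import re
from dataclasses import dataclass


@dataclass
class Chunk:
    doc_id: str
    position: int
    text: str
    vector: list[float]


def clean(text: str) -> str:
    """Collapse whitespace runs left behind by parsers."""
    return re.sub(r"\s+", " ", text).strip()


def split_words(text: str, size: int = 500, overlap: int = 50) -> list[str]:
    """Word-based chunking with overlap (words approximate tokens here)."""
    words = text.split()
    if not words:
        return []
    step = size - overlap
    # Stop early enough that the last chunk is not a pure repeat of overlap.
    starts = range(0, max(len(words) - overlap, 1), step)
    return [" ".join(words[i:i + size]) for i in starts]


def fake_embed(text: str, dims: int = 8) -> list[float]:
    """Placeholder embedding: hashed word counts. Not semantically meaningful."""
    vec = [0.0] * dims
    for word in text.lower().split():
        digest = hashlib.sha256(word.encode("utf-8")).hexdigest()
        vec[int(digest, 16) % dims] += 1.0
    return vec


def ingest(doc_id: str, raw_text: str, index: list[Chunk],
           size: int = 500, overlap: int = 50) -> int:
    """Clean -> chunk -> embed -> index. Returns the number of chunks added."""
    pieces = split_words(clean(raw_text), size, overlap)
    for pos, piece in enumerate(pieces):
        index.append(Chunk(doc_id, pos, piece, fake_embed(piece)))
    return len(pieces)


index: list[Chunk] = []
added = ingest("doc1", "some   extracted\n text " * 200, index)
print(added, len(index))
```

Swapping `fake_embed` for a real model call and the list for a vector-store client keeps the same shape.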

6. Retrieval Optimization

Hybrid search (RECOMMENDED for production):
  Dense (vector): semantic similarity
  Sparse (BM25): keyword/exact matching
  Fusion: Reciprocal Rank Fusion (RRF)
Top-K: 5-20 chunks (start with 10)
Reranker: cross-encoder on top-20 results
  (highest-impact single optimization)

IF recall < 70%: increase overlap, add BM25, try domain-specific embeddings. IF recall > 90% but bad answers: generation problem.
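The fusion step can be sketched as follows. Reciprocal Rank Fusion scores each document as the sum of 1/(k + rank) over every ranking it appears in; `k = 60` is the commonly used constant. The function name and the sample IDs are illustrative, and ties are broken by ID only to keep output deterministic.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked ID lists (best first) into one ranked list."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ID as a deterministic tie-breaker.
    return sorted(scores, key=lambda d: (-scores[d], d))


dense = ["a", "b", "c"]   # hypothetical vector-search results
sparse = ["c", "a", "d"]  # hypothetical BM25 results
print(reciprocal_rank_fusion([dense, sparse]))  # ['a', 'c', 'b', 'd']
```

RRF uses ranks only, so dense and sparse scores never need to be normalized onto a common scale.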

7. Context Assembly

Context window budget:
  System prompt: <N tokens>
  Retrieved context: <N tokens>
  Conversation history: <N tokens>
  Output reservation: <N tokens>
  Total < model context limit
Assembly: rank by relevance, include until budget.
  Format with source attribution.
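A minimal sketch of budgeted assembly: subtract the fixed allocations from the context limit, then add chunks in relevance order until the next one would not fit. The names `Retrieved`, `assemble_context`, and `word_count` are hypothetical, the `[source: ...]` format is one possible attribution style, and counting words is a crude stand-in for the model's tokenizer.

```python
from typing import Callable, NamedTuple


class Retrieved(NamedTuple):
    score: float
    source: str
    text: str


def word_count(text: str) -> int:
    """Crude token estimate (assumption); use the real tokenizer in practice."""
    return len(text.split())


def assemble_context(chunks: list[Retrieved], context_limit: int,
                     system_tokens: int, history_tokens: int,
                     output_reserve: int,
                     count_tokens: Callable[[str], int] = word_count
                     ) -> tuple[str, int]:
    """Return (context string, tokens used) within the retrieval budget."""
    budget = context_limit - system_tokens - history_tokens - output_reserve
    parts: list[str] = []
    used = 0
    for chunk in sorted(chunks, key=lambda c: c.score, reverse=True):
        block = f"[source: {chunk.source}]\n{chunk.text}"
        cost = count_tokens(block)
        if used + cost > budget:
            break  # stop at the first chunk that would exceed the budget
        parts.append(block)
        used += cost
    return "\n\n".join(parts), used


context, used = assemble_context(
    [Retrieved(0.9, "a.md", "high relevance text"),
     Retrieved(0.2, "b.md", "low relevance text here")],
    context_limit=8000, system_tokens=500, history_tokens=1000,
    output_reserve=1000,
)
print(used)
print(context)
```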

8. Evaluation

Retrieval metrics:
  Hit rate @ K: % queries with answer in top-K
  MRR: average 1/rank of first correct result
Generation metrics:
  Faithfulness: grounded in retrieved context
  Hallucination rate: answers without evidence
Targets:
  Recall@10 >= 80%, MRR >= 0.7
  Faithfulness >= 90%, Hallucination < 5%
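The two retrieval metrics can be computed from ranked result lists and per-query sets of relevant IDs. This is a sketch under the definitions above; the function names and the sample data are illustrative, and a query with no relevant result contributes 0 to MRR.

```python
def hit_rate_at_k(results: list[list[str]], relevant: list[set[str]],
                  k: int) -> float:
    """Fraction of queries with at least one relevant ID in the top k."""
    hits = sum(
        1 for ranked, rel in zip(results, relevant)
        if any(doc_id in rel for doc_id in ranked[:k])
    )
    return hits / len(results)


def mean_reciprocal_rank(results: list[list[str]],
                         relevant: list[set[str]]) -> float:
    """Average of 1/rank of the first relevant result (0 if none found)."""
    total = 0.0
    for ranked, rel in zip(results, relevant):
        for rank, doc_id in enumerate(ranked, start=1):
            if doc_id in rel:
                total += 1.0 / rank
                break
    return total / len(results)


# Hypothetical evaluation set: three queries, top-3 results each.
results = [["a", "b", "c"], ["x", "y", "z"], ["m", "n", "o"]]
relevant = [{"a"}, {"z"}, {"q"}]
print(hit_rate_at_k(results, relevant, k=3))
print(mean_reciprocal_rank(results, relevant))
```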

# RAG pipeline testing
python -m pytest tests/test_rag.py -v
curl -s "http://localhost:8080/api/search?q=test" | jq .results

Hard Rules

NEVER ship without measuring hallucination rate.
NEVER rely on vector search alone; always go hybrid (dense + BM25).
NEVER skip reranking in production.
NEVER stuff the full context window (fewer, more relevant chunks).
NEVER accept default chunking (500 chars, no overlap).
ALWAYS evaluate retrieval + generation separately.
ALWAYS include source citations in answers.
ALWAYS set chunk overlap >= 10% of chunk size.
ALWAYS have "I don't know" fallback.

TSV Logging

Append to .godmode/rag.tsv:

timestamp	action	chunks	recall_at_10	faithfulness	hallucination	status
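A minimal sketch of the append step using the standard `csv` module. The column order follows the header above; writing that header as the first row of a new file, the UTC ISO timestamp format, and the `log_run` name are assumptions.

```python
import csv
from datetime import datetime, timezone
from pathlib import Path

COLUMNS = ["timestamp", "action", "chunks", "recall_at_10",
           "faithfulness", "hallucination", "status"]


def log_run(path: str, action: str, chunks: int, recall_at_10: float,
            faithfulness: float, hallucination: float, status: str) -> None:
    """Append one tab-separated row, creating the file and header if needed."""
    tsv = Path(path)
    tsv.parent.mkdir(parents=True, exist_ok=True)
    is_new = not tsv.exists() or tsv.stat().st_size == 0
    with tsv.open("a", newline="", encoding="utf-8") as f:
        writer = csv.writer(f, delimiter="\t", lineterminator="\n")
        if is_new:
            writer.writerow(COLUMNS)
        timestamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
        writer.writerow([timestamp, action, chunks, recall_at_10,
                         faithfulness, hallucination, status])
```

Usage: `log_run(".godmode/rag.tsv", "add_bm25", 1200, 0.82, 0.91, 0.04, "keep")`, where the values are made-up examples.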

Keep/Discard

KEEP if: target metric improved AND hallucination
  did not increase.
DISCARD if: hallucination increased OR no improvement.
Never keep a change that increases hallucination.
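The keep/discard rule reduces to a two-condition check. This sketch assumes metrics are stored as dicts keyed by the TSV column names and that a higher target metric is better; `keep_change` is a hypothetical name.

```python
def keep_change(before: dict[str, float], after: dict[str, float],
                target_metric: str) -> bool:
    """KEEP only if the target improved and hallucination did not rise."""
    improved = after[target_metric] > before[target_metric]
    hallucination_ok = after["hallucination"] <= before["hallucination"]
    return improved and hallucination_ok


before = {"recall_at_10": 0.74, "hallucination": 0.06}
after = {"recall_at_10": 0.81, "hallucination": 0.06}
print(keep_change(before, after, "recall_at_10"))  # True
```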

Stop Conditions

STOP at the FIRST of:
  - Recall@10 >= 80%, faithfulness >= 90%,
    hallucination < 5%
  - Two consecutive iterations with < 2% improvement
  - Latency meets requirements
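One way to encode the stop check, evaluated in the order listed. Metrics are fractions in [0, 1]. Treating "< 2% improvement" as an absolute gain below 0.02 on the target metric over the last two iterations is an assumption, as are the function name and the returned reason strings.

```python
from typing import Optional, Sequence


def should_stop(recall_at_10: float, faithfulness: float,
                hallucination: float, recent_gains: Sequence[float],
                latency_ok: bool) -> Optional[str]:
    """Return the first stop reason that applies, or None to keep iterating."""
    if recall_at_10 >= 0.80 and faithfulness >= 0.90 and hallucination < 0.05:
        return "targets met"
    # Plateau: each of the last two iterations gained less than 0.02.
    if len(recent_gains) >= 2 and all(g < 0.02 for g in recent_gains[-2:]):
        return "plateau"
    if latency_ok:
        return "latency met"
    return None


print(should_stop(0.85, 0.92, 0.03, [0.10], latency_ok=False))  # targets met
```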

Autonomous Operation

On failure: git reset --hard HEAD~1. Never pause.

Error Recovery

| Failure | Action |
|---|---|
| Low recall (< 70%) | Increase overlap, add BM25, add a reranker |
| High hallucination | Add an "only use context" instruction, reduce chunks |
| High latency | Cache frequent queries, reduce top-K |
