From distill-rag-bridge
Distill conversation insights into durable memory/KB files, then index them into a vector database for semantic search. Use /search-kb to search prior knowledge.
Slash command: /distill-rag-bridge:distill-and-index
Extract high-value information from a conversation and persist it so future sessions pick up where this one left off. Knowledgebase files are embedded and indexed into a vector database for semantic search.
Conversation ──► Phase 1 (Distill) ──► memory/*.md + knowledgebase/*.yaml
                                                  │
                                           Phase 2 (Index)
                                                  │
                                    embed-server (primary, ~40ms)
                                    Ollama HTTP (fallback, ~330ms)
                                                  │
                                                  ▼
                                  /project/.claude/agentdb.sqlite3
                                    ──searchable via──► /search-kb
Embedding uses the embed-server.py daemon (sentence-transformers, ~40ms) with an Ollama HTTP fallback (~330ms); there are no npm dependencies.
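The primary/fallback arrangement described above can be sketched as a small wrapper. This is an illustration only: the wire format of the embed-server socket is not described here, so the primary backend is left as a caller-supplied function, and `embed_with_fallback` and `ollama_embed` are hypothetical names. The fallback assumes Ollama's `/api/embeddings` endpoint with the `all-minilm` model.

```python
import json
import urllib.request
from typing import Callable, List, Sequence, Tuple

# An embedder takes text and returns a vector of floats.
Embedder = Callable[[str], Sequence[float]]


def ollama_embed(text: str, model: str = "all-minilm",
                 url: str = "http://localhost:11434/api/embeddings") -> List[float]:
    """Assumed fallback: request one embedding from a local Ollama server."""
    payload = json.dumps({"model": model, "prompt": text}).encode("utf-8")
    req = urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)["embedding"]


def embed_with_fallback(text: str, primary: Embedder,
                        fallback: Embedder) -> Tuple[List[float], str]:
    """Try the fast backend first; if it raises, use the slower one."""
    try:
        return list(primary(text)), "primary"
    except Exception:
        # Primary unavailable (for example, the socket is missing).
        return list(fallback(text)), "fallback"
```

Passing the backends in as functions keeps the fallback logic independent of either transport, so it can be exercised without a running daemon or Ollama instance.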
Prerequisites:
- session-distillation skill installed
- embed-server.py daemon running (auto-started at container boot, Unix socket /tmp/embed-server.sock)
- all-minilm:latest pulled (fallback, auto-pulled at container start)
- /project/scripts/{embed-server,load-kb-to-memory,search-kb-memory}.py
Before any indexing or search operation, verify the embedding model is available:
if test -S /tmp/embed-server.sock; then
  echo "✔ embed-server daemon ready"
elif curl -s http://localhost:11434/api/tags 2>/dev/null | python3 -c "
import sys, json
d = json.load(sys.stdin)
models = [m['name'] for m in d.get('models', [])]
sys.exit(0 if any('all-minilm' in m for m in models) else 1)" 2>/dev/null; then
  echo "✔ Ollama fallback ready"
else
  echo "✖ No embedding model available — aborting."
  echo "Check: docker compose logs tooling | grep -E 'embed|ollama|model'"
  exit 1
fi
Do not proceed with Phase 2 (Index) or Phase 3 (Search) if this check fails. Phase 1 (Distill) can still run — memory/KB files will be written and indexed on the next successful start.
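If the same readiness check is needed from inside a Python script, it could look roughly like the sketch below. The function names are hypothetical; the socket path and the `all-minilm` substring match come from the shell check above, and `ollama_has_model` expects the raw JSON body returned by Ollama's `/api/tags`.

```python
import json
import os
import stat

SOCKET_PATH = "/tmp/embed-server.sock"


def socket_ready(path: str = SOCKET_PATH) -> bool:
    """True if `path` exists and is a Unix socket (same test as `test -S`)."""
    try:
        return stat.S_ISSOCK(os.stat(path).st_mode)
    except OSError:
        return False


def ollama_has_model(tags_json: str, needle: str = "all-minilm") -> bool:
    """True if the /api/tags body lists a model whose name contains `needle`."""
    try:
        models = json.loads(tags_json).get("models", [])
    except (json.JSONDecodeError, AttributeError):
        # Empty or malformed response: treat as "model not available".
        return False
    return any(needle in m.get("name", "") for m in models)
```

Either function returning True is enough to proceed with indexing or search; if both return False, only the distill phase should run.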
Run the session-distillation workflow:
- Read ~/.claude/projects/*/memory/MEMORY.md and knowledgebase/index.yaml before writing
- Write memory files with frontmatter:
---
name: descriptive-name
description: one-line summary
type: user | feedback | project | reference
---
Content...
- Update MEMORY.md and index.yaml
Build the vector index from knowledgebase YAML files:
python3 /project/scripts/load-kb-to-memory.py
This reads all knowledgebase/{decisions,patterns,sessions}/*.yaml files, generates 384-dim embeddings (fast via embed daemon, fallback to Ollama), and stores them in /project/.claude/agentdb.sqlite3. Uses INSERT OR REPLACE — safe to run repeatedly.
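To illustrate why INSERT OR REPLACE makes re-indexing safe, here is a minimal sketch of an idempotent upsert. The schema is an assumption: only the `embeddings` table and its `namespace` column appear in this document, so `entry_id`, `content`, `vector`, and the composite primary key are hypothetical, as are the function names.

```python
import sqlite3
from array import array


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open the database and create the (assumed) embeddings table."""
    db = sqlite3.connect(path)
    db.execute(
        """CREATE TABLE IF NOT EXISTS embeddings (
               namespace TEXT NOT NULL,
               entry_id  TEXT NOT NULL,
               content   TEXT NOT NULL,
               vector    BLOB NOT NULL,
               PRIMARY KEY (namespace, entry_id)
           )"""
    )
    return db


def upsert(db, namespace, entry_id, content, vector):
    """Insert a row, replacing any existing row with the same primary key."""
    blob = array("f", vector).tobytes()  # pack as single-precision floats
    db.execute(
        "INSERT OR REPLACE INTO embeddings (namespace, entry_id, content, vector) "
        "VALUES (?, ?, ?, ?)",
        (namespace, entry_id, content, blob),
    )
    db.commit()
```

Because the replace is keyed on the primary key, running the loader twice over the same files updates rows in place instead of duplicating them.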
Verify after indexing:
python3 -c "
import sqlite3
db = sqlite3.connect('/project/.claude/agentdb.sqlite3')
c = db.execute('SELECT namespace, COUNT(*) FROM embeddings GROUP BY namespace')
for r in c: print(f' {r[0]}: {r[1]}')
"
Search the vector database for relevant prior knowledge:
python3 /project/scripts/search-kb-memory.py "<query>" [-n namespace] [-l limit]
Common namespaces: decisions, patterns, sessions.
Users can also invoke search directly via /search-kb <query>.
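The ranking step behind a search like this can be sketched as follows. The similarity metric used by search-kb-memory.py is not stated here, so cosine similarity is an assumption, and `cosine` and `search` are hypothetical helpers that mirror the namespace and limit options.

```python
import math
from typing import Iterable, List, Optional, Sequence, Tuple

Row = Tuple[str, str, Sequence[float]]  # (namespace, entry_id, vector)


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(rows: Iterable[Row], query_vec: Sequence[float],
           namespace: Optional[str] = None,
           limit: int = 5) -> List[Tuple[float, str, str]]:
    """Score rows against the query, optionally filtered by namespace."""
    scored = [
        (cosine(query_vec, vec), ns, entry_id)
        for ns, entry_id, vec in rows
        if namespace is None or ns == namespace
    ]
    scored.sort(key=lambda item: item[0], reverse=True)  # best match first
    return scored[:limit]
```

A brute-force scan like this is usually adequate for a knowledge base of a few thousand entries; larger collections would call for an approximate index.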
Agents treat the distill-and-index pipeline as a two-way memory system:
Agent completes work
  → distill-and-index runs (manual or PreCompact hook)
  → Phase 1: session-distillation scans conversation, writes memory + KB files
  → Phase 2: files are embedded and indexed into SQLite vector DB
  → Knowledge becomes searchable by future sessions

Agent starts new task
  → runs /search-kb "<topic>" to find relevant prior knowledge
      → "has anyone solved something like this before?"
      → "what decisions shaped this area of the code?"
      → "what gotchas should I watch out for?"
  → uses findings to inform approach, skip solved problems, avoid known traps
Before implementing a feature:
- /search-kb "<feature> architecture": find design decisions
- /search-kb "PATTERN#* <domain>": find relevant patterns
When debugging a problem:
- /search-kb "<error message>": find past encounters
- Check the quickReference gotchas in knowledgebase/index.yaml
When a session ends (PreCompact hook):
- Distill the session, then re-index the KB files (load-kb-to-memory.py)
For automatic distillation before context compaction, add to .claude/settings.local.json:
{
  "hooks": {
    "PreCompact": [{
      "matcher": "auto",
      "hooks": [{
        "type": "agent",
        "prompt": "Run the distill-and-index skill. Phase 1: distill conversation into memory/KB files using session-distillation. Phase 2: run python3 /project/scripts/load-kb-to-memory.py to index KB files into the vector database. Verify entry counts.",
        "statusMessage": "Distilling session and indexing into vector DB..."
      }]
    }]
  }
}
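If .claude/settings.local.json already contains other settings, the hook has to be merged in rather than pasted over the file. A minimal sketch of such a merge, assuming the settings have already been loaded into a dict; `add_precompact_hook` is a hypothetical helper, not part of any tool mentioned here.

```python
import copy


def add_precompact_hook(settings: dict, hook: dict, matcher: str = "auto") -> dict:
    """Return a copy of `settings` with `hook` registered under PreCompact.

    The input dict is left untouched, and adding the same hook twice
    does not create a duplicate entry.
    """
    merged = copy.deepcopy(settings)
    groups = merged.setdefault("hooks", {}).setdefault("PreCompact", [])
    for group in groups:
        if group.get("matcher") == matcher:
            hooks = group.setdefault("hooks", [])
            if hook not in hooks:
                hooks.append(hook)
            return merged
    # No group with this matcher yet: create one.
    groups.append({"matcher": matcher, "hooks": [hook]})
    return merged
```

Reading the file with `json.load`, passing the result through this function, and writing it back with `json.dump(..., indent=2)` would preserve any unrelated settings already present.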
The load-kb-to-memory.py script is safe to run repeatedly — INSERT OR REPLACE ensures idempotency.
After running, confirm:
ls ~/.claude/projects/*/memory/
cat ~/.claude/projects/*/memory/MEMORY.md
cat knowledgebase/index.yaml
python3 -c "import sqlite3; db=sqlite3.connect('/project/.claude/agentdb.sqlite3'); print(db.execute('SELECT COUNT(*) FROM embeddings').fetchone()[0], 'entries')"
/search-kb "test query" -l 3
Install the plugin with:
npx claudepluginhub qoolqool/skills --plugin distill-rag-bridge