Claude Context Search QMD
Automatic context injection from past sessions and knowledge bases into Claude Code conversations. Uses QMD (BM25/hybrid/vector search) to find relevant past discussions and inject them as context on every message — transparently, without any special commands.
How It Works
Every time you send a message, a UserPromptSubmit hook:
- Reads your prompt
- Searches all configured QMD collections for relevant past context
- Injects the top results into Claude's context (up to 9K chars)
- Claude sees your message + relevant past discussions together
Claude doesn't know the search happened. It just has more context. If it needs more detail from a result, it uses the Read tool with the file path and line offset provided.
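The flow above can be sketched as a small function. This is an illustration, not the plugin's actual code: it assumes the hook payload arrives as JSON with a `prompt` field, that `search` is a stand-in callable returning result dictionaries, and that the keys `path`, `line`, `score`, and `snippet` are hypothetical names chosen for this example.

```python
import json

MAX_CONTEXT_CHARS = 9000  # the "up to 9K chars" budget described above


def format_result(result):
    # Keys (path, line, score, snippet) are hypothetical, chosen for illustration.
    header = f"[{result['path']}:{result['line']}] (score {result['score']:.2f})"
    return f"{header}\n{result['snippet']}"


def build_context(hook_input_json, search, max_chars=MAX_CONTEXT_CHARS):
    """Turn a hook payload into the text that would be injected as context."""
    payload = json.loads(hook_input_json)
    prompt = payload.get("prompt", "").strip()
    if not prompt:
        return ""  # nothing to search for

    blocks = []
    used = 0
    for result in search(prompt):
        block = format_result(result)
        separator = 2 if blocks else 0  # length of the "\n\n" joiner
        if used + separator + len(block) > max_chars:
            break  # stop before exceeding the character budget
        blocks.append(block)
        used += separator + len(block)
    return "\n\n".join(blocks)
```

Each block starts with the file path and line offset, which is what would let Claude follow up with the Read tool when it needs more detail.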
Installation
npm install -g @tobilu/qmd
/plugin marketplace add vranac/claude-context-search-qmd
Then run the interactive setup:
/context-search-qmd:setup
Quick Start
- Install QMD and the plugin (above)
- Run /context-search-qmd:setup to configure collections, database, and hooks
- Done — search is active on every message
Architecture
The plugin has two layers:
search-tool.py — the search engine. Takes a query, returns formatted results. This is the single callable interface for everything:
# Direct call
uv run /path/to/search-tool.py "your search query"
# With options
uv run /path/to/search-tool.py "query" --cwd /path/to/project --debug --no-reindex
search-hook.py — thin hook wrapper. Reads stdin from UserPromptSubmit, checks skip conditions, delegates to search-tool.py. That's it.
All search goes through search-tool.py:
- The hook calls it (automatic, every message)
- Claude calls it via Bash (on-demand, when it needs more context)
- Agents and subagents call it via Bash (when they need background on a topic)
- /context-search-qmd:search calls it (manual, user-initiated)
One tool, one config, consistent results everywhere.
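Because every caller goes through the same command line, a wrapper only has to assemble arguments. The sketch below uses the flags shown in the examples above (`--cwd`, `--debug`, `--no-reindex`); the function names are hypothetical, and it assumes search-tool.py writes its results to stdout and exits with status 0 on success.

```python
import subprocess


def build_search_command(tool_path, query, cwd=None, debug=False, no_reindex=False):
    """Assemble the `uv run search-tool.py ...` argument list."""
    cmd = ["uv", "run", tool_path, query]
    if cwd:
        cmd += ["--cwd", cwd]
    if debug:
        cmd.append("--debug")
    if no_reindex:
        cmd.append("--no-reindex")
    return cmd


def run_search(tool_path, query, **options):
    """Run the search tool and return its stdout, or "" if it fails.

    Assumption: results are printed to stdout and a non-zero exit
    status means there is nothing usable to inject.
    """
    cmd = build_search_command(tool_path, query, **options)
    completed = subprocess.run(cmd, capture_output=True, text=True, check=False)
    return completed.stdout if completed.returncode == 0 else ""
```

Passing the arguments as a list rather than a shell string avoids quoting problems when the prompt contains spaces or quotes.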
Configuration
search-config.yaml
Place search-config.yaml in your project root. This file is committable and shared with your team.
# search-config.yaml
search_mode: bm25 # "bm25" | "vector" | "hybrid"
max_results: 5 # max results to inject
min_score: 0.3 # minimum relevance score — results below this are discarded
database: local # "local" (.qmd/ in project root) | "global" (~/.cache/qmd/) | "custom"
database_path: .qmd # only used when database: custom
debug: false # when true, logs to stderr (timing, query terms, result counts)
# Collections — directories of markdown files to search
collections:
  team-docs:
    path: ~/team/docs
    context: "Team documentation and coding standards"
  personal-notes:
    path: ~/Documents/notes
    context: "Personal knowledge base"
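To make the interaction between `min_score`, `max_results`, and `database` concrete, here is a minimal sketch of how these settings might be interpreted. It is not the plugin's implementation: the function names and the `score` key are hypothetical, the defaults simply mirror the example values above, and resolving a relative `database_path` against the project root is an assumption.

```python
from pathlib import Path


def select_results(results, min_score=0.3, max_results=5):
    """Discard results below min_score, then keep the best max_results."""
    kept = [r for r in results if r["score"] >= min_score]
    kept.sort(key=lambda r: r["score"], reverse=True)
    return kept[:max_results]


def resolve_database_dir(config, project_root):
    """Map the `database` setting to a directory."""
    mode = config.get("database", "local")
    if mode == "local":
        return Path(project_root) / ".qmd"
    if mode == "global":
        return Path.home() / ".cache" / "qmd"
    if mode == "custom":
        # Assumption: relative paths are resolved against the project root.
        # An absolute database_path replaces the root entirely.
        return Path(project_root) / Path(config["database_path"]).expanduser()
    raise ValueError(f"unknown database mode: {mode!r}")
```

Filtering by score before truncating means a low `max_results` never hides a strong match behind weaker ones.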
No config file?
If no search-config.yaml exists, the hook exits silently — no search, no context injection. Run /context-search-qmd:setup to create it.
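The silent-exit behaviour amounts to a guard at the top of the hook. A minimal sketch, with hypothetical function names and a stand-in `search` callable:

```python
from pathlib import Path

CONFIG_NAME = "search-config.yaml"


def find_config(project_root):
    """Return the config path if it exists in the project root, else None."""
    candidate = Path(project_root) / CONFIG_NAME
    return candidate if candidate.is_file() else None


def run_hook(project_root, search):
    """Return the context to inject, or "" when the project is not configured."""
    config_path = find_config(project_root)
    if config_path is None:
        return ""  # no config: no search, no output, no error
    return search(config_path)
```

Returning an empty string instead of raising keeps unconfigured projects unaffected by the plugin.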
Search Modes
| Mode | Latency | Setup |
|---|---|---|
| bm25 | ~250-300ms | None |
| vector | ~800-900ms | qmd embed first |
| hybrid | ~800-900ms | qmd embed first (~5 min, ~330MB download) |
Three modes are available. We recommend vector or hybrid search, for the reasons given after this list:
- bm25: pure keyword matching. Fast, no models needed.
- vector: semantic similarity via embeddings. Finds conceptually related content. Requires qmd embed.
- hybrid: BM25 + vector + LLM reranking. Most sophisticated but the reranker can occasionally demote relevant results in favor of broadly matching ones.
In testing, vector search produced the best results for conversational queries: it found the same sections in long files that BM25 found, and it also surfaced more related results. Hybrid's reranker sometimes demoted correct chunks.
BM25 is pure keyword matching. It matches on individual words regardless of meaning. A prompt like "verify the results of the test" will match every session that mentions "verify", "results", or "test" — producing high-scoring but completely irrelevant results from unrelated projects. BM25 has no concept of what your prompt means, only what words it contains. Short, conversational messages ("sounds good, go ahead", "let's discuss this further") will consistently produce noise because common words match everywhere.
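The effect is easy to reproduce with a toy scorer. The sketch below is a textbook BM25 (Lucene-style IDF, with conventional defaults for `k1` and `b`) over three made-up documents. It is not QMD's implementation, and its scores are not comparable to QMD's `min_score` scale.

```python
import math
from collections import Counter


def tokenize(text):
    return text.lower().split()


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each document against the query with plain BM25."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n_docs

    # Document frequency: in how many documents each term appears.
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    query_terms = set(tokenize(query))
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            idf = math.log((n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5) + 1)
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


docs = [
    "verify the results of the search hook test",    # the session you meant
    "verify the gap analysis in the documentation",  # unrelated, shares "verify" and "the"
    "deploy notes for staging cluster",              # shares no query words
]
scores = bm25_scores("verify the results of the test", docs)
```

The unrelated second document still receives a positive score purely from shared words, while only a document with no overlapping words at all scores zero.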
Hybrid search adds vector embeddings + LLM reranking. The reranker understands semantic similarity — it knows that "verify the results" in the context of testing a search hook is unrelated to "verify the gap analysis" in a documentation session, even though they share the word "verify." This dramatically reduces false positives.
The trade-off is latency: ~300ms (BM25) vs ~900ms (hybrid). On modern hardware (M1/M2+ with 16GB+), the 900ms is acceptable. On older or resource-constrained machines, BM25 may be the only practical option.