Claude Context Search QMD

Automatic context injection from past sessions and knowledge bases into Claude Code conversations. Uses QMD (BM25/hybrid/vector search) to find relevant past discussions and inject them as context on every message — transparently, without any special commands.

How It Works

Every time you send a message, a UserPromptSubmit hook runs these steps:

1. Reads your prompt.
2. Searches all configured QMD collections for relevant past context.
3. Injects the top results into Claude's context (up to 9K chars).

Claude then sees your message and the relevant past discussions together.

Claude doesn't know the search happened. It just has more context. If it needs more detail from a result, it uses the Read tool with the file path and line offset provided.
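
The flow above can be sketched as a small Python hook. This is an illustration only, not the plugin's actual search-hook.py. It assumes Claude Code passes the hook a JSON object on stdin with a `prompt` field and adds whatever the hook writes to stdout to Claude's context. `search_collections` is a hypothetical stand-in for the QMD search, and the result fields (`path`, `line`, `snippet`) are assumed names.

```python
import json
import sys

MAX_CONTEXT_CHARS = 9000  # the "up to 9K chars" budget described above


def search_collections(prompt):
    """Hypothetical stand-in for the QMD search; returns a list of result dicts."""
    return []


def format_context(results, limit=MAX_CONTEXT_CHARS):
    """Concatenate results, stopping before the character budget is exceeded."""
    parts = []
    used = 0
    for r in results:
        # File path and line offset let Claude follow up with the Read tool.
        entry = f"{r['path']}:{r['line']}\n{r['snippet']}\n\n"
        if used + len(entry) > limit:
            break
        parts.append(entry)
        used += len(entry)
    return "".join(parts)


def main(stdin=sys.stdin, stdout=sys.stdout):
    """Entry point a hook script would call: read the prompt, search, emit context."""
    payload = json.load(stdin)          # assumed: JSON hook input with a "prompt" key
    prompt = payload.get("prompt", "")
    context = format_context(search_collections(prompt))
    if context:
        stdout.write(context)           # assumed: stdout is added to Claude's context
```

Keeping the budget check in `format_context` means whole results are dropped rather than cut mid-snippet, so every injected entry still carries a usable path and line offset.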

Installation

npm install -g @tobilu/qmd
/plugin marketplace add vranac/claude-context-search-qmd

Then run the interactive setup:

/context-search-qmd:setup

Quick Start

1. Install QMD and the plugin (see Installation above).
2. Run /context-search-qmd:setup to configure collections, database, and hooks.
3. Done: search is active on every message.

Architecture

The plugin has two layers:

search-tool.py — the search engine. Takes a query, returns formatted results. This is the single callable interface for everything:

# Direct call
uv run /path/to/search-tool.py "your search query"

# With options
uv run /path/to/search-tool.py "query" --cwd /path/to/project --debug --no-reindex

search-hook.py — thin hook wrapper. Reads stdin from UserPromptSubmit, checks skip conditions, delegates to search-tool.py. That's it.

All search goes through search-tool.py:

The hook calls it (automatic, every message)
Claude calls it via Bash (on-demand, when it needs more context)
Agents and subagents call it via Bash (when they need background on a topic)
/context-search-qmd:search calls it (manual, user-initiated)

One tool, one config, consistent results everywhere.
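
Because every caller goes through the same command line, an agent or script only needs to assemble the arguments shown above. The sketch below uses only the flags that appear in the examples (`--cwd`, `--debug`, `--no-reindex`). The path is the same placeholder used above, the helper names are hypothetical, and it assumes the tool prints its formatted results to stdout.

```python
import subprocess

SEARCH_TOOL = "/path/to/search-tool.py"  # placeholder path, as in the examples above


def build_command(query, cwd=None, debug=False, reindex=True):
    """Build the argv for search-tool.py using only the flags shown above."""
    cmd = ["uv", "run", SEARCH_TOOL, query]
    if cwd is not None:
        cmd += ["--cwd", cwd]
    if debug:
        cmd.append("--debug")
    if not reindex:
        cmd.append("--no-reindex")
    return cmd


def run_search(query, **options):
    """Run the tool and return its stdout (assumed to hold the formatted results)."""
    completed = subprocess.run(
        build_command(query, **options),
        capture_output=True,
        text=True,
        check=False,
    )
    return completed.stdout
```

For example, `build_command("auth refactor", cwd="/path/to/project", reindex=False)` yields the same invocation as the "With options" line above, minus `--debug`.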

Configuration

search-config.yaml

Place search-config.yaml in your project root. This file is committable and shared with your team.

# search-config.yaml

search_mode: bm25       # "bm25" | "vector" | "hybrid"
max_results: 5          # max results to inject
min_score: 0.3          # minimum relevance score — results below this are discarded
database: local         # "local" (.qmd/ in project root) | "global" (~/.cache/qmd/) | "custom"
database_path: .qmd     # only used when database: custom
debug: false            # when true, logs to stderr (timing, query terms, result counts)

# Collections — directories of markdown files to search
collections:
  team-docs:
    path: ~/team/docs
    context: "Team documentation and coding standards"
  personal-notes:
    path: ~/Documents/notes
    context: "Personal knowledge base"
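
To make the settings above concrete, here is a rough Python sketch of how a tool might interpret them once the YAML has been parsed into a dict. It is not the plugin's code. The function names are hypothetical, the fallback defaults simply mirror the sample values, and resolving a relative `database_path` against the project root is an assumption.

```python
from pathlib import Path

VALID_MODES = {"bm25", "vector", "hybrid"}


def check_mode(config):
    """Return the configured search mode, rejecting values outside the three listed."""
    mode = config.get("search_mode", "bm25")
    if mode not in VALID_MODES:
        raise ValueError(f"search_mode must be one of {sorted(VALID_MODES)}")
    return mode


def resolve_database_dir(config, project_root):
    """Map the `database` setting to a directory, following the config comments."""
    kind = config.get("database", "local")
    if kind == "local":
        return Path(project_root) / ".qmd"
    if kind == "global":
        return Path.home() / ".cache" / "qmd"
    if kind == "custom":
        custom = Path(config["database_path"]).expanduser()
        # Assumption: relative custom paths are taken relative to the project root.
        return custom if custom.is_absolute() else Path(project_root) / custom
    raise ValueError(f"unknown database setting: {kind!r}")


def select_results(results, config):
    """Discard results below min_score, then keep the best max_results."""
    min_score = config.get("min_score", 0.3)
    kept = [r for r in results if r["score"] >= min_score]
    kept.sort(key=lambda r: r["score"], reverse=True)
    return kept[: config.get("max_results", 5)]
```

Applying `min_score` before `max_results` means a weak query can inject fewer than `max_results` entries, or none at all.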

No config file?

If no search-config.yaml exists, the hook exits silently — no search, no context injection. Run /context-search-qmd:setup to create it.
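
The silent exit can be pictured as a guard at the top of the hook. This is a minimal sketch with hypothetical function names. It assumes the config is looked up only in the project root, as described above.

```python
import sys
from pathlib import Path

CONFIG_NAME = "search-config.yaml"


def find_config(project_root):
    """Return the config path if it exists in the project root, else None."""
    path = Path(project_root) / CONFIG_NAME
    return path if path.is_file() else None


def exit_if_unconfigured(project_root):
    """Exit with status 0 and no output when there is no config to search with."""
    if find_config(project_root) is None:
        sys.exit(0)
```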

Search Modes

Mode	Latency	Setup
bm25	~250-300ms	None
vector	~800-900ms	`qmd embed` first
hybrid	~800-900ms	`qmd embed` first (~5 min, ~330MB download)

Three modes are available:

bm25: pure keyword matching. Fast, no models needed.
vector: semantic similarity via embeddings. Finds conceptually related content. Requires qmd embed.
hybrid: BM25 + vector + LLM reranking. The most sophisticated mode, but the reranker can occasionally demote relevant results in favor of broadly matching ones.

We recommend vector or hybrid search. In testing, vector search produced the best results for conversational queries: it found the same relevant sections in long files that BM25 found, plus more related results. Hybrid's reranker sometimes demoted correct chunks.

BM25 is pure keyword matching. It matches on individual words regardless of meaning. A prompt like "verify the results of the test" will match every session that mentions "verify", "results", or "test" — producing high-scoring but completely irrelevant results from unrelated projects. BM25 has no concept of what your prompt means, only what words it contains. Short, conversational messages ("sounds good, go ahead", "let's discuss this further") will consistently produce noise because common words match everywhere.

Hybrid search adds vector embeddings + LLM reranking. The reranker understands semantic similarity — it knows that "verify the results" in the context of testing a search hook is unrelated to "verify the gap analysis" in a documentation session, even though they share the word "verify." This dramatically reduces false positives.

The trade-off is latency: ~250-300ms (BM25) vs ~800-900ms (vector or hybrid). On modern hardware (M1/M2+ with 16GB+), 900ms is acceptable. On older or resource-constrained machines, BM25 may be the only practical option.
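
The keyword-noise problem described above is easy to reproduce with a textbook Okapi BM25 scorer. The sketch below is not QMD's implementation. It uses the common `log(1 + ...)` IDF variant and whitespace tokenization, and the three sample "sessions" are invented for illustration.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each document against the query with the standard Okapi BM25 formula."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs
    # Document frequency: the number of documents each term appears in.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in set(query.lower().split()):
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


docs = [
    "verify the search hook results after the test run",  # the relevant session
    "verify the gap analysis in the documentation",        # unrelated, shares "verify" and "the"
    "deploy notes for the staging cluster",                # unrelated, shares only "the"
]
scores = bm25_scores("verify the results of the test", docs)
```

The relevant session scores highest, but both unrelated sessions still receive non-zero scores purely from shared words. Across a large collection, many such partial matches compete for the few result slots.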
