From mad-skills
Semantic search over a local document archive (PDFs, Word, PowerPoint, Excel, Markdown). Uses ChromaDB + any OpenAI-compatible embeddings endpoint.
Slash command: /mad-skills:docsearch
Semantic search over a local directory of documents. Indexes into ChromaDB using any OpenAI-compatible embeddings endpoint (LM Studio, Ollama, OpenAI, etc.) and supports incremental re-sync via content hashing.
Supported formats: .docx, .pptx, .pdf, .xlsx, .xlsm, .md, .txt, .csv
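Below is a minimal sketch of how a directory walk might select the supported formats, assuming a simple case-insensitive extension check. The name `iter_supported_files` is hypothetical and not part of docsearch. The explicit `.icloud` skip only mirrors this page's note that iCloud placeholders are skipped.

```python
from pathlib import Path

# Supported extensions from the list above, compared case-insensitively.
SUPPORTED = {".docx", ".pptx", ".pdf", ".xlsx", ".xlsm", ".md", ".txt", ".csv"}


def iter_supported_files(root):
    """Yield supported files under root (hypothetical helper), recursing into subdirectories."""
    for path in sorted(Path(root).expanduser().rglob("*")):
        if not path.is_file():
            continue
        suffix = path.suffix.lower()
        if suffix == ".icloud":
            # iCloud placeholder (e.g. ".report.pdf.icloud"): content not downloaded yet.
            continue
        if suffix in SUPPORTED:
            yield path
```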
| Topic | File |
|---|---|
| Embedding backend options (LM Studio, Ollama, OpenAI, remote) | embeddings.md |
| Troubleshooting install issues, venv setup, platform quirks | setup.md |
All configuration is via environment variables — no config files to edit.
| Variable | Default | Purpose |
|---|---|---|
| DOCSEARCH_DIR | (required) | Root directory to index and search |
| DOCSEARCH_DB | ~/.docsearch/chroma_db | ChromaDB storage path |
| DOCSEARCH_COLLECTION | (derived from dir name) | Override the collection name |
| DOCSEARCH_EMBED_URL | http://localhost:1234/v1/embeddings | Embeddings endpoint |
| DOCSEARCH_EMBED_MODEL | text-embedding-nomic-embed-text-v1.5 | Model name |
| DOCSEARCH_EMBED_KEY | lmstudio | API key |
| DOCSEARCH_EMBED_DIM | 768 | Embedding dimensions |
| DOCSEARCH_PYTHON | (python3 on PATH) | Override Python interpreter |
Set these in your shell profile (.zshrc, .bashrc) or pass them inline.
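The function below is a sketch of how these variables could resolve, applying the defaults from the table. `load_config` is hypothetical. The rule for deriving the collection name (here, simply the directory's final path component) is an assumption. DOCSEARCH_PYTHON is omitted because it only selects the interpreter.

```python
import os
from pathlib import Path

# Defaults copied from the table above (DOCSEARCH_PYTHON omitted).
DEFAULTS = {
    "DOCSEARCH_DB": "~/.docsearch/chroma_db",
    "DOCSEARCH_EMBED_URL": "http://localhost:1234/v1/embeddings",
    "DOCSEARCH_EMBED_MODEL": "text-embedding-nomic-embed-text-v1.5",
    "DOCSEARCH_EMBED_KEY": "lmstudio",
    "DOCSEARCH_EMBED_DIM": "768",
}


def load_config(env=None):
    """Resolve settings from environment variables (hypothetical helper)."""
    env = os.environ if env is None else env
    root = env.get("DOCSEARCH_DIR")
    if not root:
        raise SystemExit("DOCSEARCH_DIR is required")
    cfg = {key: env.get(key, default) for key, default in DEFAULTS.items()}
    cfg["DOCSEARCH_DIR"] = root
    cfg["DOCSEARCH_DB"] = os.path.expanduser(cfg["DOCSEARCH_DB"])
    cfg["DOCSEARCH_EMBED_DIM"] = int(cfg["DOCSEARCH_EMBED_DIM"])
    # Assumption: collection name defaults to the directory's final component.
    cfg["DOCSEARCH_COLLECTION"] = env.get(
        "DOCSEARCH_COLLECTION", Path(root).expanduser().resolve().name
    )
    return cfg
```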
```bash
docsearch search "QUERY" [--top N] [--filter KEY=VALUE ...]
```

Filters use path-depth metadata extracted at index time:

- --filter depth_1=customers: top-level subdirectory
- --filter depth_2=acme_corp: second-level subdirectory
- --filter extension=.pdf: file type

Output: JSON when piped (for agent use), human-readable when run in a terminal.
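The sketch below shows one way path-depth metadata could be derived and turned into a ChromaDB `where` clause. Both helper names are hypothetical. The field names follow the JSON output shown later on this page. Combining several filters with `$and` is an assumption about how multiple `--filter` flags are treated.

```python
from pathlib import PurePosixPath


def path_metadata(relative_path):
    """Derive filter fields from a path relative to DOCSEARCH_DIR (hypothetical helper)."""
    p = PurePosixPath(relative_path)
    meta = {"relative_path": str(p), "filename": p.name, "extension": p.suffix.lower()}
    # Parent directories only: customers/acme/proposal.pdf -> depth_1=customers, depth_2=acme
    for depth, part in enumerate(p.parent.parts, start=1):
        meta[f"depth_{depth}"] = part
    return meta


def to_where(filters):
    """Turn ["KEY=VALUE", ...] into a ChromaDB where clause (assumed AND semantics)."""
    clauses = [{key: value} for key, _, value in (f.partition("=") for f in filters)]
    if not clauses:
        return None
    # ChromaDB's $and operator expects a list of two or more conditions.
    return clauses[0] if len(clauses) == 1 else {"$and": clauses}
```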
```bash
docsearch index [--dir PATH] [--force]
```
Incremental by default — only processes new or changed files. --force rebuilds from scratch.
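Incremental re-sync via content hashing can be sketched as follows. Each file is hashed, the digests are compared against those recorded on the previous run, and only files whose digest differs are reprocessed. Where docsearch actually stores those digests is not documented here, so the `previous` dictionary is a stand-in. Reporting deleted files is an assumed extra, and `plan_sync` is a hypothetical name.

```python
import hashlib
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """SHA-256 of the file contents, read in chunks to bound memory use."""
    h = hashlib.sha256()
    with Path(path).open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def plan_sync(files, previous, force=False):
    """Return (changed, deleted, current) given path -> digest from the last run.

    With force=True every current file counts as changed, like a full rebuild.
    """
    current = {str(p): file_digest(p) for p in files}
    changed = sorted(k for k, d in current.items() if force or previous.get(k) != d)
    deleted = sorted(set(previous) - set(current))
    return changed, deleted, current
```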
```bash
docsearch info
docsearch config
```
When calling docsearch from a Bash tool, the output is JSON:
```json
[
  {
    "rank": 1,
    "score": 0.87,
    "file": "/path/to/document.pdf",
    "filename": "document.pdf",
    "relative_path": "customers/acme/proposal.pdf",
    "type": ".pdf",
    "depth_1": "customers",
    "depth_2": "acme",
    "preview": "..."
  }
]
```
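For scripted use, here is a sketch of calling the CLI and working with its JSON output. It assumes `docsearch` is on PATH and that `--filter` is repeated once per condition. `search`, `group_by_top_dir`, and `readable_directly` are hypothetical helpers.

```python
import json
import subprocess
from collections import defaultdict

TEXT_TYPES = {".md", ".txt", ".csv"}  # formats that can be read as plain text


def search(query, top=10, filters=()):
    """Run the CLI; stdout is a pipe here, so it should emit JSON."""
    cmd = ["docsearch", "search", query, "--top", str(top)]
    for f in filters:
        cmd += ["--filter", f]  # assumption: one --filter flag per condition
    out = subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
    return json.loads(out)


def group_by_top_dir(results):
    """Map depth_1 (or "" for top-level files) to the matching relative paths."""
    groups = defaultdict(list)
    for r in results:
        groups[r.get("depth_1", "")].append(r["relative_path"])
    return dict(groups)


def readable_directly(result):
    return result["type"] in TEXT_TYPES
```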
After finding a document:

- .md, .txt, .csv: read directly with the Read tool.
- .docx, .pptx, .pdf, .xlsx: show the path to the user or open it with open "FILE_PATH".

1. Python packages
```bash
pip install chromadb requests python-docx python-pptx pypdfium2 openpyxl
```
Requires Python 3.10+. Install into a venv if preferred.
2. Embeddings server — any OpenAI-compatible endpoint must be running when you index or search.
| Backend | Default URL | Quick start |
|---|---|---|
| LM Studio | http://localhost:1234/v1/embeddings | Load a nomic-embed-text model in the app |
| Ollama | http://localhost:11434/v1/embeddings | ollama pull nomic-embed-text |
| OpenAI | https://api.openai.com/v1/embeddings | Set DOCSEARCH_EMBED_KEY=sk-... and DOCSEARCH_EMBED_MODEL=text-embedding-3-small |
3. Make the script accessible — pick one approach:
```bash
# Symlink onto PATH
ln -s /path/to/mad-skills/skills/docsearch/scripts/docsearch.sh ~/.local/bin/docsearch

# Or call directly
python3 /path/to/mad-skills/skills/docsearch/scripts/docsearch.py search "query"
```
4. Set the required environment variable and build the index

```bash
export DOCSEARCH_DIR="$HOME/Documents/my-docs"
docsearch index   # walks DOCSEARCH_DIR, builds ChromaDB index
docsearch info    # verify
```
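A quick preflight along these lines can catch common setup gaps before indexing. The module names are the import names of the packages in step 1: python-docx imports as `docx` and python-pptx as `pptx`. The `preflight` function itself is hypothetical and does not check the embeddings server.

```python
import importlib.util
import os
import sys

# Import names for the pip packages in step 1.
REQUIRED_MODULES = ["chromadb", "requests", "docx", "pptx", "pypdfium2", "openpyxl"]


def preflight():
    """Return a list of problems; an empty list means the basics are in place."""
    problems = []
    if sys.version_info < (3, 10):
        problems.append(f"Python 3.10+ required, found {sys.version.split()[0]}")
    for name in REQUIRED_MODULES:
        if importlib.util.find_spec(name) is None:
            problems.append(f"missing module: {name}")
    root = os.environ.get("DOCSEARCH_DIR")
    if not root or not os.path.isdir(os.path.expanduser(root)):
        problems.append("DOCSEARCH_DIR is unset or not a directory")
    return problems
```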
See setup.md for venv setup, troubleshooting, and platform-specific notes. See embeddings.md for full backend configuration options.
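The embeddings server from step 2 only needs to speak the OpenAI embeddings format. A client sends a POST with `model` and `input`, and the server answers with a `data` list of objects carrying `index` and `embedding`. Below is a stdlib-only sketch of that exchange, assuming a Bearer token header. The helper names are hypothetical, and the dimension check simply compares each vector against DOCSEARCH_EMBED_DIM.

```python
import json
import urllib.request


def build_request(texts, url, model, key):
    """Build an OpenAI-style embeddings POST request."""
    body = json.dumps({"model": model, "input": list(texts)}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json", "Authorization": f"Bearer {key}"},
    )


def parse_embeddings(payload, dim):
    """Order vectors by their index and verify the configured dimension."""
    items = sorted(payload["data"], key=lambda item: item["index"])
    vectors = [item["embedding"] for item in items]
    for v in vectors:
        if len(v) != dim:
            raise ValueError(f"expected {dim} dimensions, got {len(v)}; check DOCSEARCH_EMBED_DIM")
    return vectors


def embed(texts, url="http://localhost:1234/v1/embeddings",
          model="text-embedding-nomic-embed-text-v1.5", key="lmstudio", dim=768):
    req = build_request(texts, url, model, key)
    with urllib.request.urlopen(req, timeout=60) as resp:
        return parse_embeddings(json.load(resp), dim)
```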
… .icloud placeholders) are automatically skipped.

Install from the plugin hub:

```bash
npx claudepluginhub thevgergroup/mad-skills --plugin docsearch
```