Fork notice: This is a personal fork of mixedbread-ai/mgrep by @JackDevAU. It is not actively maintained — use at your own risk. For a stable, supported version use the upstream project.
What's different in this fork:
- Local mode: run fully offline using Ollama for embeddings (nomic-embed-text by default) and LanceDB for vector storage (stored at ~/.mgrep/lancedb). No Mixedbread account or API key required.
- Privacy-first: in local mode your code never leaves your machine. No analytics or telemetry is sent in either mode. Outside local mode, the only external calls go to the Mixedbread API for core search functionality, and users are explicitly warned about file syncing at install time.
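Before relying on local mode, it can help to confirm that Ollama and the embedding model are actually present. The sketch below is not part of mgrep: `model_listed` and `check_local_embeddings` are hypothetical helper names, and the parsing assumes that `ollama list` prints a header row followed by one `name:tag` entry per line.

```shell
# Return 0 if MODEL (first argument) appears in `ollama list` output on stdin.
# Assumes a header row, then rows whose first column is "name:tag".
model_listed() {
  awk -v m="$1" 'NR > 1 { split($1, a, ":"); if (a[1] == m) found = 1 } END { exit !found }'
}

# Hypothetical preflight: check that the ollama CLI and the model are available.
check_local_embeddings() {
  model=${1:-nomic-embed-text}
  if ! command -v ollama >/dev/null 2>&1; then
    echo "ollama not found on PATH" >&2
    return 1
  fi
  if ! ollama list | model_listed "$model"; then
    echo "model missing; try: ollama pull $model" >&2
    return 1
  fi
  echo "ok: $model is available"
}
```

Running `check_local_embeddings` with no argument checks for the default model named above. Pass another model name if the fork is configured differently.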
Why mgrep?
- Natural-language search that feels as immediate as grep.
- Semantic, multilingual & multimodal (audio, video support coming soon!)
- Web search built-in: query the web alongside your local files with --web.
- Smooth background indexing via mgrep watch, designed to detect and keep up to date everything that matters inside any git repository.
- Friendly device-login flow and first-class coding agent integrations.
- Built for agents and humans alike, and designed to be a helpful tool, not a restrictive harness: quiet output, thoughtful defaults, and escape hatches everywhere.
- Reduces your agent's token usage by 2x while maintaining superior performance.
# index once
mgrep watch
# then ask your repo things in natural language
mgrep "where do we set up auth?"
Quick Start
- Install
npm install -g @mixedbread/mgrep # or pnpm / bun
- Sign in once
mgrep login
A browser window (or verification URL) guides you through Mixedbread authentication.
Alternative: API Key Authentication
For CI/CD or headless environments, set the MXBAI_API_KEY environment variable:
export MXBAI_API_KEY=your_api_key_here
This bypasses the browser login flow entirely.
- Index a project
cd path/to/repo
mgrep watch
watch performs an initial sync, respects .gitignore, then keeps the Mixedbread store updated as files change.
- Search anything
mgrep "where do we set up auth?" src/lib
mgrep -m 25 "store schema"
Searches default to the current working directory unless you pass a path.
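Because searches are scoped to the current working directory by default, a query run from a deep subdirectory only covers that subtree. One way to search the whole repository from anywhere inside it is to pass the repository root as the path argument. The helper below is a minimal sketch, not part of mgrep: `find_repo_root` is a hypothetical name, and it assumes the root is the nearest ancestor directory containing a `.git` entry.

```shell
# Walk up from the current directory until a .git entry is found,
# then print that directory. Returns 1 if no repository root is found.
find_repo_root() {
  dir=$(pwd -P)
  while [ "$dir" != "/" ]; do
    if [ -e "$dir/.git" ]; then
      printf '%s\n' "$dir"
      return 0
    fi
    dir=$(dirname "$dir")
  done
  return 1
}

# Hypothetical usage, assuming mgrep accepts an absolute path here:
# mgrep "where do we set up auth?" "$(find_repo_root)"
```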
Today, mgrep works great on: code, text, PDFs, images.
Coming soon: audio & video.
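For the headless setup described under "Alternative: API Key Authentication", a CI job can fail early with a clear message instead of stalling on a login prompt. This is a minimal sketch under that assumption. `require_api_key` is a hypothetical helper, and only the `MXBAI_API_KEY` variable name comes from the text above.

```shell
# Fail fast when MXBAI_API_KEY is missing or empty.
require_api_key() {
  if [ -z "${MXBAI_API_KEY:-}" ]; then
    echo "MXBAI_API_KEY is not set; export it or run 'mgrep login'" >&2
    return 1
  fi
}

# Hypothetical CI usage:
# require_api_key && mgrep "where do we set up auth?"
```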
Using it with Coding Agents
[!CAUTION]
Background Sync Enabled: When installed with a coding agent, mgrep runs a
background process that syncs your files to enable semantic search. This
process starts automatically when you begin a session and stops when your
session ends. You can see your current usage in the Mixedbread
platform.
[!NOTE]
Default Limits: mgrep enforces default limits to ensure optimal performance:
- Maximum file size: 1MB per file
- Maximum file count: 1,000 files per directory
These limits can be customized via CLI flags (--max-file-size, --max-file-count),
environment variables, or config files. See the Configuration section for details.
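To see whether a project is likely to hit these defaults before indexing, a quick check with `find` can help. The sketch below makes two assumptions the text does not confirm: that "1MB" means 1,048,576 bytes, and that the file count applies to files directly inside a directory rather than recursively. `oversized_files` and `direct_file_count` are hypothetical helper names.

```shell
# Print files under DIR larger than 1 MiB (1048576 bytes, an assumed reading of "1MB").
oversized_files() {
  find "$1" -type f -size +1048576c
}

# Count regular files directly inside DIR (non-recursive, an assumed reading of "per directory").
direct_file_count() {
  find "$1" -maxdepth 1 -type f | wc -l | tr -d ' '
}

# Example:
# oversized_files .
# direct_file_count src
```

If either check turns up problems, the limits can be raised with the flags named above, or the offending files can be excluded via .gitignore.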
If you prefer to manually start the file watcher instead of relying on the agent's
automatic background sync, you can run:
mgrep watch /path/to/your/project
This gives you explicit control over when indexing occurs and which directories are watched.