Gr0m_Mem

Zero-install persistent memory brain for any LLM runtime (Claude Code, Claude Desktop, Cursor, Gemini CLI, Continue, Cline, Zed, OpenAI Codex CLI, Aider, raw OpenAI / Anthropic / Gemini APIs, or a local Llama) that stops the model from re-asking and re-deriving across sessions.
This is the main branch — the zero-install core.
No ChromaDB, no Ollama, no 1 GB embedding model. Pure CPython stdlib + a couple of pure-Python wheels. pip install gr0m-mem and it just works.
For semantic retrieval (ChromaDB HNSW + Ollama embeddings) switch to the semantic branch.
Works with any LLM
Gr0m_Mem is universally compatible through three integration paths:
- MCP server: Claude Code, Claude Desktop, Cursor, Gemini CLI, Continue, Cline, Zed, OpenAI Codex CLI, and any other Model Context Protocol client. Setup snippets for every major client are in docs/integrations.md.
- CLI shell-out: any agent framework that can run shell commands (OpenAI Agents SDK, LangChain, LlamaIndex, Aider, raw API callers) can wrap gr0m_mem wakeup, gr0m_mem remember, and gr0m_mem search as tools.
- Paste-into-system-prompt: for models with no MCP and no tool calling at all, copy UNIVERSAL_PROMPT.md into your system prompt and the model will drive the CLI via shell.
The loop-prevention protocol is the same across all three paths.
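For the CLI shell-out path, a tool wrapper might look like the sketch below. Only the subcommand names (wakeup, remember, search) come from this README; the positional-argument layout, the helper names build_command and run_tool, and the timeout are assumptions for illustration, so check gr0m_mem's own help output for the real arguments.

```python
import subprocess

# Subcommand names are taken from the README; everything else here is assumed.
SUBCOMMANDS = {"wakeup", "remember", "search"}


def build_command(subcommand: str, *args: str) -> list[str]:
    """Build an argv list for the gr0m_mem CLI (hypothetical argument layout)."""
    if subcommand not in SUBCOMMANDS:
        raise ValueError(f"unsupported subcommand: {subcommand}")
    # An argv list (not a shell string) means arguments are never shell-parsed.
    return ["gr0m_mem", subcommand, *args]


def run_tool(subcommand: str, *args: str, timeout: float = 30.0) -> str:
    """Run the CLI and return its stdout; raises CalledProcessError on failure."""
    result = subprocess.run(
        build_command(subcommand, *args),
        capture_output=True,
        text=True,
        check=True,
        timeout=timeout,
    )
    return result.stdout
```

An agent framework would register run_tool (or three thin wrappers around it) as its tools. Passing an argv list rather than a shell string keeps user-supplied text from being interpreted by a shell.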
The problem
Claude forgets everything when a session ends. Next time you talk to it:
- It re-introduces itself.
- It asks what you're working on — again.
- It re-derives the same architectural decision you already locked in yesterday.
- It loses track of which features shipped and re-suggests them.
Other memory systems try to fix this with "let an LLM decide what to remember." That path is expensive, loses context, and still leaks reasoning. Gr0m_Mem takes the other path: record everything important explicitly, surface it at session start, and refuse to contradict it without you saying so.
How it fixes the loop
Four tools (and two Claude Code hooks) are the entire product:
| When | Tool | Effect |
|---|---|---|
| Session start | mem_wakeup | Returns a token-budgeted snapshot of identity / preferences / projects / decisions / open questions. Claude inlines it and stops re-introducing. |
| After a decision | mem_record_decision | Persists the decision + rationale against a subject. |
| Before asking a familiar question | mem_recall_decisions | Retrieves prior decisions on that subject. If any exist, Claude uses them instead of re-asking. |
| Learning anything durable | mem_remember | Stores a preference, project, milestone, context fact, or open question. |
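The record-then-recall loop in the table can be illustrated with a small SQLite sketch. This is not Gr0m_Mem's actual schema or API: the decisions table, its columns, and the functions open_store, record_decision, and recall_decisions are hypothetical stand-ins for what mem_record_decision and mem_recall_decisions are described as doing.

```python
import sqlite3


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open a store with a hypothetical decisions table (schema is assumed)."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS decisions (
               id        INTEGER PRIMARY KEY,
               subject   TEXT NOT NULL,
               decision  TEXT NOT NULL,
               rationale TEXT NOT NULL
           )"""
    )
    return conn


def record_decision(conn: sqlite3.Connection, subject: str,
                    decision: str, rationale: str) -> None:
    """Persist a decision and its rationale against a subject."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO decisions (subject, decision, rationale) VALUES (?, ?, ?)",
            (subject, decision, rationale),
        )


def recall_decisions(conn: sqlite3.Connection, subject: str) -> list[tuple[str, str]]:
    """Return (decision, rationale) pairs for a subject, newest first."""
    rows = conn.execute(
        "SELECT decision, rationale FROM decisions WHERE subject = ? ORDER BY id DESC",
        (subject,),
    )
    return rows.fetchall()
```

A caller would check recall_decisions before asking a familiar question and only ask when the list comes back empty.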
The plugin's Stop and PreCompact hooks flush a milestone after every session and before every context compaction, so nothing high-value is lost to /clear or window compression. Session ids are filtered to an allowlist of characters (tr -cd 'a-zA-Z0-9_-') before they are used in any path, so the shell-injection bug MemPalace had to patch (Issue #110) is ruled out by design here.
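The same allowlist filter, written in Python rather than tr, might look like this. The function name sanitize_session_id and the empty-result check are assumptions added for illustration; the hooks themselves are only described here as using tr -cd 'a-zA-Z0-9_-'.

```python
import re

# Everything outside the allowlist a-z, A-Z, 0-9, underscore, hyphen.
_DISALLOWED = re.compile(r"[^a-zA-Z0-9_-]")


def sanitize_session_id(raw: str) -> str:
    """Delete every character not in the allowlist, like tr -cd 'a-zA-Z0-9_-'."""
    cleaned = _DISALLOWED.sub("", raw)
    if not cleaned:
        # Assumed extra guard: refuse to build a path from an empty id.
        raise ValueError("session id is empty after sanitizing")
    return cleaned
```

Because slashes, dots, spaces, and shell metacharacters are all deleted, the result cannot escape a directory or inject a command when it is later used in a path.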
Zero-install promise
pip install gr0m-mem always produces a working brain. The main branch has exactly one backend:
- sqlite_fts: SQLite FTS5 BM25 full-text search. Ships with CPython's stdlib sqlite3 on every mainstream platform. No compiled extras, no embedding model, no Ollama, no network. Retrieval is lexical only, but mem_wakeup and mem_record_decision don't depend on the search backend: they use their own SQLite table.
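To show what FTS5 BM25 search looks like with nothing but the stdlib, here is a minimal sketch. The notes table, the body column, and the search helper are hypothetical and not Gr0m_Mem's actual schema; it also assumes the sqlite3 module in your Python build was compiled with FTS5.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# FTS5 virtual table with a single indexed text column (hypothetical schema).
conn.execute("CREATE VIRTUAL TABLE notes USING fts5(body)")
conn.executemany(
    "INSERT INTO notes (body) VALUES (?)",
    [
        ("Decided to use Postgres for the billing service",),
        ("User prefers tabs over spaces",),
        ("Open question: should billing retries be idempotent",),
    ],
)


def search(conn: sqlite3.Connection, query: str, limit: int = 5) -> list[str]:
    """Return matching bodies, best BM25 match first.

    bm25() yields lower (more negative) scores for better matches,
    so ascending order puts the best hit first. The query is passed
    through as FTS5 query syntax, so plain words are the safe input.
    """
    rows = conn.execute(
        "SELECT body FROM notes WHERE notes MATCH ? ORDER BY bm25(notes) LIMIT ?",
        (query, limit),
    )
    return [row[0] for row in rows]
```

Matching is lexical: a query for "postgres" finds the first note, but a query for "database" finds nothing, which is the gap the semantic branch is meant to cover.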
Run gr0m_mem doctor to verify.
Want semantic retrieval too?
Switch to the semantic branch. It adds two more backends with auto-selection:
- chromadb: HNSW cosine over ChromaDB, best retrieval quality
- sqlite_vec: pure-Python cosine over SQLite rows, using Ollama for embeddings
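As a rough illustration of "pure-Python cosine over SQLite rows", the sketch below scores stored vectors against a query vector with a brute-force scan. The items table, the JSON-encoded embedding column, and the function names are assumptions, not the semantic branch's real layout, and the vectors are hand-written rather than fetched from Ollama.

```python
import json
import math
import sqlite3


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def open_items() -> sqlite3.Connection:
    """In-memory store with embeddings kept as JSON text (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE items (text TEXT NOT NULL, embedding TEXT NOT NULL)")
    return conn


def add_item(conn: sqlite3.Connection, text: str, vector: list[float]) -> None:
    with conn:
        conn.execute("INSERT INTO items VALUES (?, ?)", (text, json.dumps(vector)))


def top_k(conn: sqlite3.Connection, query_vec: list[float], k: int = 3) -> list[str]:
    """Scan every row, score it against the query, return the k best texts."""
    scored = [
        (cosine(query_vec, json.loads(raw)), text)
        for text, raw in conn.execute("SELECT text, embedding FROM items")
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]
```

A linear scan like this needs no compiled extension, at the cost of touching every row per query, which is the trade-off against an HNSW index.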