hindsight-cc

A Claude Code plugin that provides persistent memory across conversations using the Hindsight vector database.
Installation
Install directly from the GitHub marketplace:
claude plugin add gcswan/hindsight-cc
The plugin runtime is stdlib-only: its hooks run under your system python3
(no virtualenv, no pip install). Nothing is installed on first run, so the
first session is not slowed down by a dependency install.
To verify installation:
claude plugin list
First-Run Setup
Configure the LLM provider Hindsight uses for its memory operations by running
the setup wizard inside Claude Code:
/hindsight-cc:setup
It asks for a provider, model, and (for cloud providers) an API key, then
writes ~/.config/hindsight-cc/config.env. The ensure-hindsight.sh
container-startup script reads that file when it creates the Hindsight
container, with precedence: explicit environment variable >
~/.config/hindsight-cc/config.env > built-in default. Local providers
(Ollama, LM Studio) need a base URL instead of an API key.
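The precedence rule above can be sketched in a few lines of Python. This is an illustration only, not the plugin's actual code: the helper names `parse_env_file` and `resolve`, the assumption that config.env holds simple KEY=VALUE lines, and the choice to treat empty values as unset are all hypothetical.

```python
import os
from pathlib import Path

# Path taken from this README; everything else in this sketch is an assumption.
CONFIG_PATH = Path.home() / ".config" / "hindsight-cc" / "config.env"


def parse_env_file(text):
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        if line.startswith("export "):
            line = line[len("export "):]
        key, _, value = line.partition("=")
        # Drop one layer of surrounding quotes, if any.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def resolve(name, environ, file_values, defaults):
    """Return the first non-empty value: env var, then config.env, then default."""
    for source in (environ, file_values, defaults):
        value = source.get(name, "")
        if value:
            return value
    return None


def load_file_values(path=CONFIG_PATH):
    """Read config.env if it exists; a missing file simply contributes nothing."""
    return parse_env_file(path.read_text()) if path.exists() else {}
```

For example, `resolve("HINDSIGHT_API_LLM_PROVIDER", os.environ, load_file_values(), {})` returns the exported value when one is set and otherwise falls back to whatever the wizard wrote.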
First-run onboarding note
On a brand-new machine, the very first Claude Code session may start before
you have run /hindsight-cc:setup, so no API key (or, for local providers, base
URL) is configured yet and memory will not be active for that session. This is
expected. Run /hindsight-cc:setup and then start a new session; the Hindsight
container is created with your chosen provider and memory features come online.
You can also set the provider via environment variables (instead of, or in
addition to, the wizard) before starting Claude Code — see the examples under
Requirements below.
Features
- Automatic Memory Injection: Relevant context from past conversations is automatically injected into your prompts
- Prompt Retention: User prompts are stored for future semantic search
- Transcript Retention: Complete conversation segments are stored at session end
- Per-Project Isolation: Each project has its own memory bank
- Automatic Server Management: Hindsight Docker container starts automatically when you begin a session
Requirements
- Docker installed and running
- A system python3 on PATH (Python 3.10+; the repo pins 3.13 for dev). The hooks call it directly; no virtualenv is created or used at runtime.
The examples below show provider configuration via environment variables. The
same values can be supplied through ~/.config/hindsight-cc/config.env via
/hindsight-cc:setup; the precedence is explicit env var > config.env >
built-in default. Cloud providers need an API key; local providers (Ollama, LM
Studio) need a base URL and no key.
# Each block below is an alternative: export the variables for ONE provider only.
# Groq (recommended for fast inference)
export HINDSIGHT_API_LLM_PROVIDER=groq
export HINDSIGHT_API_LLM_API_KEY=gsk_xxxxxxxxxxxx
export HINDSIGHT_API_LLM_MODEL=openai/gpt-oss-20b
# For free tier users: override to on_demand if you get service_tier errors
# export HINDSIGHT_API_LLM_GROQ_SERVICE_TIER=on_demand
# OpenAI
export HINDSIGHT_API_LLM_PROVIDER=openai
export HINDSIGHT_API_LLM_API_KEY=sk-xxxxxxxxxxxx
export HINDSIGHT_API_LLM_MODEL=gpt-4o
# Gemini
export HINDSIGHT_API_LLM_PROVIDER=gemini
export HINDSIGHT_API_LLM_API_KEY=xxxxxxxxxxxx
export HINDSIGHT_API_LLM_MODEL=gemini-2.0-flash
# Anthropic
export HINDSIGHT_API_LLM_PROVIDER=anthropic
export HINDSIGHT_API_LLM_API_KEY=sk-ant-xxxxxxxxxxxx
export HINDSIGHT_API_LLM_MODEL=claude-sonnet-4-20250514
# Ollama (local, no API key)
export HINDSIGHT_API_LLM_PROVIDER=ollama
export HINDSIGHT_API_LLM_BASE_URL=http://localhost:11434/v1
export HINDSIGHT_API_LLM_MODEL=llama3
# LM Studio (local, no API key)
export HINDSIGHT_API_LLM_PROVIDER=lmstudio
export HINDSIGHT_API_LLM_BASE_URL=http://localhost:1234/v1
export HINDSIGHT_API_LLM_MODEL=your-local-model
# OpenAI-compatible endpoint
export HINDSIGHT_API_LLM_PROVIDER=openai
export HINDSIGHT_API_LLM_BASE_URL=https://your-endpoint.com/v1
export HINDSIGHT_API_LLM_API_KEY=your-api-key
export HINDSIGHT_API_LLM_MODEL=your-model-name
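The key-versus-base-URL rule can be checked before starting Claude Code with a small script. The following is a hedged sketch: `missing_settings` is a hypothetical helper, not part of the plugin, and the set of local provider names is inferred from the examples above.

```python
# Provider names treated as local, taken from the examples in this README.
LOCAL_PROVIDERS = {"ollama", "lmstudio"}

PROVIDER = "HINDSIGHT_API_LLM_PROVIDER"
API_KEY = "HINDSIGHT_API_LLM_API_KEY"
BASE_URL = "HINDSIGHT_API_LLM_BASE_URL"


def missing_settings(config):
    """Return the variable names a provider still needs, as a list.

    Local providers need a base URL and no key; any other named provider
    is assumed to be a cloud provider that needs an API key. With no
    provider set there is nothing to check in this sketch.
    """
    provider = config.get(PROVIDER, "").strip().lower()
    if not provider:
        return []
    required = BASE_URL if provider in LOCAL_PROVIDERS else API_KEY
    return [] if config.get(required) else [required]
```

Calling `missing_settings(dict(os.environ))` after `import os` returns an empty list when the exported variables are consistent, or the name of the variable that still needs a value.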
Usage
Once installed, the plugin works automatically:
- On session start: The Hindsight server is started if not already running (an already-running server is reused)
- On each prompt: Your prompt is stored, and relevant memories are injected
- On session end: The conversation transcript is stored
Prompt and transcript retention are fire-and-forget (non-blocking). Memory
injection runs a recall that is hard-bounded at ~2.5s and soft-fails to no
injection, so a slow or unavailable server never holds up your prompt.
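The "hard-bounded, soft-fail" behaviour of recall can be illustrated with a stdlib-only sketch. This is not the plugin's implementation: `bounded_call` is a hypothetical helper, and the 2.5 second budget is simply the approximate figure quoted above.

```python
import threading
import time

# Approximate recall budget quoted in this README.
RECALL_BUDGET_SECONDS = 2.5


def bounded_call(fn, budget=RECALL_BUDGET_SECONDS):
    """Run fn() in a daemon thread and wait at most `budget` seconds.

    Returns fn's result, or None if fn raised or did not finish in time,
    so the caller can fall back to "no injection" instead of blocking.
    """
    result = []

    def worker():
        try:
            result.append(fn())
        except Exception:
            pass  # soft-fail: any error is treated as "no memories"

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    thread.join(budget)  # returns after `budget` seconds even if fn is still running
    return result[0] if result else None
```

A caller would treat a `None` return as "inject nothing" and pass the prompt through unchanged. The daemon thread means a stalled call cannot keep the hook process alive after the budget expires.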
Slash Commands