hindsight-cc

A Claude Code plugin that provides persistent memory across conversations using the Hindsight vector database.

Installation

Install directly from the GitHub marketplace:

claude plugin add gcswan/hindsight-cc

The plugin runtime is stdlib-only: its hooks run under your system python3 (no virtualenv, no pip install). Nothing is installed on first run, so the first session is not slowed down by a dependency install.

To verify installation:

claude plugin list

First-Run Setup

Configure the LLM provider Hindsight uses for its memory operations by running the setup wizard inside Claude Code:

/hindsight-cc:setup

The wizard asks for a provider, a model, and either an API key (cloud providers) or a base URL (local providers such as Ollama and LM Studio), then writes ~/.config/hindsight-cc/config.env. The ensure-hindsight.sh container-startup script reads that file when it creates the Hindsight container. Each setting is resolved with this precedence: explicit environment variable > ~/.config/hindsight-cc/config.env > built-in default.
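The precedence rule can be illustrated with a short Python sketch. This is not the plugin's code (the real lookup lives in ensure-hindsight.sh); the function names, the simple KEY=VALUE file format, and the choice to treat an empty environment variable as unset are all assumptions made for illustration.

```python
import os
from pathlib import Path

# Path taken from the text above; everything else here is a hypothetical sketch.
CONFIG_PATH = Path.home() / ".config" / "hindsight-cc" / "config.env"


def parse_env_file(text):
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        if line.startswith("export "):
            line = line[len("export "):]
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def resolve(name, file_values, default=None, environ=None):
    """Explicit env var wins, then the config file, then the default."""
    environ = os.environ if environ is None else environ
    if environ.get(name):  # assumption: an empty value counts as unset
        return environ[name]
    if name in file_values:
        return file_values[name]
    return default


def load_file_values(path=CONFIG_PATH):
    """Read the config file if it exists; a missing file yields no values."""
    return parse_env_file(path.read_text()) if path.exists() else {}
```

For example, `resolve("HINDSIGHT_API_LLM_PROVIDER", load_file_values(), default="some-default")` returns the exported value if one is set, otherwise the value the wizard wrote, otherwise the placeholder default.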

First-run onboarding note

On a brand-new machine, the very first Claude Code session may start before you have run /hindsight-cc:setup. In that case no API key is configured yet, and memory is not active for that session. This is expected. Run /hindsight-cc:setup and then start a new session; the Hindsight container is then created with your chosen provider and memory features come online.

You can also set the provider via environment variables (instead of, or in addition to, the wizard) before starting Claude Code; see the examples under Requirements below.

Features

Automatic Memory Injection: Relevant context from past conversations is automatically injected into your prompts
Prompt Retention: User prompts are stored for future semantic search
Transcript Retention: Complete conversation segments are stored at session end
Per-Project Isolation: Each project has its own memory bank
Automatic Server Management: Hindsight Docker container starts automatically when you begin a session

Requirements

Docker installed and running
A system python3 on PATH (Python 3.10+; the repo pins 3.13 for dev). The hooks call it directly; no virtualenv is created or used at runtime.

The examples below show provider configuration via environment variables. Export only the block that matches your provider, since every block sets the same variables. The same values can be supplied through ~/.config/hindsight-cc/config.env via /hindsight-cc:setup; the precedence is explicit env var > config.env > built-in default. Cloud providers need an API key; local providers (Ollama, LM Studio) need a base URL and no key.

# Groq (recommended for fast inference)
export HINDSIGHT_API_LLM_PROVIDER=groq
export HINDSIGHT_API_LLM_API_KEY=gsk_xxxxxxxxxxxx
export HINDSIGHT_API_LLM_MODEL=openai/gpt-oss-20b
# Free-tier users: override to on_demand if you get service_tier errors
# export HINDSIGHT_API_LLM_GROQ_SERVICE_TIER=on_demand

# OpenAI
export HINDSIGHT_API_LLM_PROVIDER=openai
export HINDSIGHT_API_LLM_API_KEY=sk-xxxxxxxxxxxx
export HINDSIGHT_API_LLM_MODEL=gpt-4o

# Gemini
export HINDSIGHT_API_LLM_PROVIDER=gemini
export HINDSIGHT_API_LLM_API_KEY=xxxxxxxxxxxx
export HINDSIGHT_API_LLM_MODEL=gemini-2.0-flash

# Anthropic
export HINDSIGHT_API_LLM_PROVIDER=anthropic
export HINDSIGHT_API_LLM_API_KEY=sk-ant-xxxxxxxxxxxx
export HINDSIGHT_API_LLM_MODEL=claude-sonnet-4-20250514

# Ollama (local, no API key)
export HINDSIGHT_API_LLM_PROVIDER=ollama
export HINDSIGHT_API_LLM_BASE_URL=http://localhost:11434/v1
export HINDSIGHT_API_LLM_MODEL=llama3

# LM Studio (local, no API key)
export HINDSIGHT_API_LLM_PROVIDER=lmstudio
export HINDSIGHT_API_LLM_BASE_URL=http://localhost:1234/v1
export HINDSIGHT_API_LLM_MODEL=your-local-model

# OpenAI-compatible endpoint
export HINDSIGHT_API_LLM_PROVIDER=openai
export HINDSIGHT_API_LLM_BASE_URL=https://your-endpoint.com/v1
export HINDSIGHT_API_LLM_API_KEY=your-api-key
export HINDSIGHT_API_LLM_MODEL=your-model-name
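
Before starting Claude Code, it can help to check that the exported variables match the rule stated above: cloud providers need an API key, local providers need a base URL. The following Python sketch is a hypothetical pre-flight check, not part of the plugin; the function name and the set of local provider names are assumptions based on the examples in this section.

```python
# Provider names taken from the examples above; treating exactly these two
# as "local" is an assumption for this sketch.
LOCAL_PROVIDERS = {"ollama", "lmstudio"}


def missing_credentials(env):
    """Return the variable names still needed for the selected provider.

    `env` is any mapping of variable names to values, such as os.environ.
    If no provider is set, nothing is checked, because the built-in
    default would apply and this sketch does not know what it requires.
    """
    provider = env.get("HINDSIGHT_API_LLM_PROVIDER", "").lower()
    if not provider:
        return []
    if provider in LOCAL_PROVIDERS:
        needed = "HINDSIGHT_API_LLM_BASE_URL"
    else:
        needed = "HINDSIGHT_API_LLM_API_KEY"
    return [] if env.get(needed) else [needed]
```

Calling `missing_credentials(os.environ)` after `import os` returns an empty list when the current shell is ready, or the name of the variable that still has to be exported.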

Usage

Once installed, the plugin works automatically:

On session start: The Hindsight server is started if not already running (an already-running server is reused)
On each prompt: Your prompt is stored, and relevant memories are injected
On session end: The conversation transcript is stored

Prompt and transcript retention are fire-and-forget (non-blocking). Memory injection runs a recall that is hard-bounded at ~2.5s and soft-fails to no injection, so a slow or unavailable server never holds up your prompt.
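
The bounded, soft-failing recall described above can be sketched in stdlib Python as follows. This is an illustration of the pattern only, not the plugin's hook code: `recall_fn` stands in for whatever call actually queries the Hindsight server, and the function names are hypothetical. Only the ~2.5s budget comes from the text.

```python
import threading
import time

RECALL_BUDGET_SECONDS = 2.5  # the ~2.5s bound mentioned above


def recall_with_deadline(recall_fn, query, budget=RECALL_BUDGET_SECONDS):
    """Run recall_fn(query), but wait at most `budget` seconds.

    Returns the recalled memories, or an empty list if the call fails
    or does not finish in time (soft-fail: no injection).
    """
    result = {}

    def worker():
        try:
            result["memories"] = recall_fn(query)
        except Exception:
            pass  # any error means "no memories", never a crash

    # A daemon thread does not keep the process alive if it is still
    # running when the caller has already moved on.
    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    thread.join(budget)
    return result.get("memories") or []


def slow_recall(query):
    """Hypothetical stand-in for a server that responds too slowly."""
    time.sleep(1.0)
    return ["arrived too late"]
```

With a short budget, `recall_with_deadline(slow_recall, "deploy steps", budget=0.05)` returns an empty list after roughly 50 ms instead of waiting the full second, which is the behavior that keeps a slow server from holding up the prompt.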
