lethe

Λήθη — the ancient Greek personification of forgetfulness, and one of the five rivers of the underworld.

A memory store for LLM agents. Hybrid BM25 + dense retrieval, cross-encoder reranking, clustered retrieval-induced forgetting, and an optional LLM enrichment layer at write time.

On LongMemEval S (199,509 conversation turns, 500 questions, full-corpus NDCG@10):

Stage	NDCG@10	vs baseline
Hybrid BM25 + vector + cross-encoder	0.293	—
+ clustered+gap RIF (checkpoint 13)	0.312	+6.5%
+ LLM enrichment, on covered queries	0.473	+35%

The enrichment row is measured only on the 75 queries whose answer turns were enriched, so its 0.473 and +35% are not directly comparable to the full-set rows above. Averaged over all 500 queries, the gain is diluted by the uncovered ones. See BENCHMARKS.md.

Quick start

from lethe import MemoryStore
from sentence_transformers import SentenceTransformer, CrossEncoder

store = MemoryStore(
    "./my_memories",  # storage directory
    bi_encoder=SentenceTransformer("all-MiniLM-L6-v2"),  # dense embeddings for the FAISS index
    cross_encoder=CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2"),  # reranks the merged candidates
)

store.add("I prefer window seats on flights", session_id="trip")
store.add("My wife needs aisle seats", session_id="trip")
store.add("I work at Google as a software engineer", session_id="work")

results = store.retrieve("What are my travel preferences?", k=5)
for entry_id, content, score in results:
    print(f"  [{score:.1f}] {content}")

store.save()
store.close()

Architecture

Query
  │
  ├── FAISS top-30 (dense vector similarity)
  ├── BM25 top-30 (sparse keyword match)
  │
  └── Merge (RRF)
        │
        └── RIF suppression penalty (per-cluster, gap-based)
              │
              └── Cross-encoder rerank → top-k
                    │
                    └── Update suppression state, affinities, tier
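The "Merge (RRF)" step fuses the two top-30 lists by reciprocal rank fusion. A minimal sketch, assuming the conventional constant k = 60 (the README does not state lethe's value); the function name is hypothetical:

```python
def reciprocal_rank_fusion(ranked_lists, k=60):
    """Fuse ranked lists of entry ids with reciprocal rank fusion.

    An entry's fused score is the sum of 1 / (k + rank) over every list it
    appears in (rank starts at 1). k=60 is the common default, assumed here.
    """
    scores = {}
    for ranking in ranked_lists:
        for rank, entry_id in enumerate(ranking, start=1):
            scores[entry_id] = scores.get(entry_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by id so the order is deterministic.
    return sorted(scores, key=lambda entry_id: (-scores[entry_id], entry_id))


# Toy stand-ins for the FAISS and BM25 top-30 lists.
dense_top = ["e3", "e1", "e7"]
bm25_top = ["e1", "e9", "e3"]
print(reciprocal_rank_fusion([dense_top, bm25_top]))  # ['e1', 'e3', 'e9', 'e7']
```

In this example, entries found by both retrievers (e1, e3) outrank those found by only one. Because RRF works on ranks rather than raw scores, BM25 scores and cosine similarities never need to be put on a common scale.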

Optional write-time LLM enrichment layer (src/lethe/enrichment.py): before indexing, each memory can be processed by an LLM (default claude-haiku-4-5) to produce a gist, 3 anticipated queries, entities, and temporal markers. All generated fields are indexed alongside the original text, while the cross-encoder still scores against the original text alone. This targets the vocabulary-mismatch failure mode, where a question and the turn that answers it share meaning but few words.

Retrieval-induced forgetting (RIF)

On each retrieval, entries that reach the candidate pool but lose to the cross-encoder accumulate a per-cluster suppression score. On later retrievals in the same query cluster, their scores are penalized before the cross-encoder sees them, freeing slots for entries that were previously crowded out.
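The per-cluster bookkeeping can be sketched as follows. Everything here is an assumption for illustration: the class and method names are hypothetical, queries are assigned to precomputed k-means centroids by nearest distance, and the penalty is assumed to be a weighted subtraction from the merged score:

```python
import numpy as np


class ClusteredSuppression:
    """Hypothetical per-(cluster, entry) suppression store.

    Suppression recorded under one query cluster never touches another.
    """

    def __init__(self, centroids, weight: float = 1.0):
        self.centroids = np.asarray(centroids, dtype=float)  # shape (k, dim)
        self.weight = weight  # penalty strength, an assumed knob
        self.scores: dict[tuple[int, str], float] = {}

    def cluster_of(self, query_vec) -> int:
        # k-means assignment rule: nearest centroid by Euclidean distance.
        distances = np.linalg.norm(self.centroids - np.asarray(query_vec, dtype=float), axis=1)
        return int(np.argmin(distances))

    def penalize(self, cluster: int, candidates: dict[str, float]) -> dict[str, float]:
        # Lower each candidate's merged score before the cross-encoder sees it.
        return {
            eid: score - self.weight * self.scores.get((cluster, eid), 0.0)
            for eid, score in candidates.items()
        }

    def record(self, cluster: int, increments: dict[str, float]) -> None:
        # Accumulate suppression for entries that lost to the cross-encoder.
        for eid, amount in increments.items():
            key = (cluster, eid)
            self.scores[key] = self.scores.get(key, 0.0) + amount


centroids = np.array([[1.0, 0.0], [0.0, 1.0]])  # toy stand-ins for 30 k-means centroids
state = ClusteredSuppression(centroids, weight=0.5)
travel = state.cluster_of([0.9, 0.1])
food = state.cluster_of([0.1, 0.9])
state.record(travel, {"e2": 0.4})  # e2 lost to the cross-encoder on a travel query
travel_scores = state.penalize(travel, {"e1": 1.0, "e2": 1.0})
food_scores = state.penalize(food, {"e2": 1.0})
```

After one loss on a travel query, e2 is penalized for the next travel query but scored unchanged for a food query, which is the cue-dependence described below.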

Key design points:

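Clustered (k-means with 30 clusters, cue-dependent): suppression is tracked separately per query cluster, so an entry suppressed for "travel" queries stays available for "food" queries. 5× stronger than global suppression.
Rank-gap competition formula: max(0, xenc_rank − initial_rank) / pool × sigmoid(−xenc), where initial_rank is the entry's rank before reranking, xenc_rank its rank after the cross-encoder, pool the candidate pool size, and xenc its cross-encoder score. It only suppresses entries that both dropped in rank and were actively rejected (low xenc), not entries that merely lost a close race.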
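The rank-gap formula translates directly into code. A minimal sketch; the function names are hypothetical, and the rank convention (both ranks counted the same way, larger = worse) is an assumption:

```python
import math


def sigmoid(x: float) -> float:
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def rank_gap_suppression(initial_rank: int, xenc_rank: int, pool: int, xenc: float) -> float:
    """max(0, xenc_rank - initial_rank) / pool * sigmoid(-xenc).

    initial_rank: rank before cross-encoder reranking (larger = worse)
    xenc_rank:    rank after cross-encoder reranking, same convention
    pool:         number of candidates in the pool
    xenc:         raw cross-encoder score; very negative means actively rejected
    """
    rank_drop = max(0, xenc_rank - initial_rank) / pool  # 0 unless the entry fell
    rejection = sigmoid(-xenc)  # near 1 for strong rejections, small for near-misses
    return rank_drop * rejection


POOL = 30  # toy pool size
rose = rank_gap_suppression(initial_rank=20, xenc_rank=3, pool=POOL, xenc=4.0)
rejected = rank_gap_suppression(initial_rank=0, xenc_rank=25, pool=POOL, xenc=-8.0)
close_race = rank_gap_suppression(initial_rank=0, xenc_rank=25, pool=POOL, xenc=2.0)
```

The three calls show both gates: an entry that rose gets nothing, and of two entries with the same 25-place drop, the one scored at -8 by the cross-encoder receives over eight times the suppression of the one scored at +2.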

Based on Anderson's inhibition theory (1994) and the SAM competitive-sampling model (Raaijmakers & Shiffrin, 1981). First implementation in an AI memory system as far as I can tell.

Three storage layers

Layer	File	Purpose
SQLite	`lethe.db`	Entries, suppression state, rescue cache, stats
numpy + FAISS	`embeddings.npz`, `faiss.index`	Vector storage
BM25	In-memory, rebuilt on startup	Sparse keyword index
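Because the BM25 index lives only in memory, it has to be rebuilt from the stored entries on every startup. A minimal sketch of that step, with several assumptions: the `entries(id, content)` table layout is hypothetical, the Okapi BM25 scorer is hand-rolled with the common defaults k1 = 1.5 and b = 0.75, and lethe itself may use a library implementation instead:

```python
import math
import re
import sqlite3
from collections import Counter


def tokenize(text: str) -> list[str]:
    return re.findall(r"\w+", text.lower())


class BM25:
    """Minimal in-memory Okapi BM25 index (k1, b are common defaults, assumed)."""

    def __init__(self, docs: dict[str, str], k1: float = 1.5, b: float = 0.75):
        self.k1, self.b = k1, b
        self.tf = {eid: Counter(tokenize(text)) for eid, text in docs.items()}
        self.doc_len = {eid: sum(c.values()) for eid, c in self.tf.items()}
        self.avgdl = sum(self.doc_len.values()) / max(len(self.doc_len), 1)
        df = Counter(term for counts in self.tf.values() for term in counts)
        n = len(self.tf)
        # "+1" inside the log keeps idf positive even for very common terms.
        self.idf = {t: math.log(1 + (n - d + 0.5) / (d + 0.5)) for t, d in df.items()}

    def search(self, query: str, k: int = 30) -> list[tuple[str, float]]:
        terms = tokenize(query)
        scores = {}
        for eid, counts in self.tf.items():
            score = 0.0
            for term in terms:
                f = counts.get(term, 0)
                if f:
                    length_norm = 1 - self.b + self.b * self.doc_len[eid] / self.avgdl
                    score += self.idf[term] * f * (self.k1 + 1) / (f + self.k1 * length_norm)
            if score > 0:
                scores[eid] = score
        return sorted(scores.items(), key=lambda kv: -kv[1])[:k]


def rebuild_bm25(conn: sqlite3.Connection) -> BM25:
    # Hypothetical schema: entries(id TEXT PRIMARY KEY, content TEXT).
    rows = conn.execute("SELECT id, content FROM entries").fetchall()
    return BM25(dict(rows))


conn = sqlite3.connect(":memory:")  # stands in for lethe.db
conn.execute("CREATE TABLE entries (id TEXT PRIMARY KEY, content TEXT)")
conn.executemany(
    "INSERT INTO entries VALUES (?, ?)",
    [
        ("e1", "I prefer window seats on flights"),
        ("e2", "My wife needs aisle seats"),
        ("e3", "I work at Google as a software engineer"),
    ],
)
index = rebuild_bm25(conn)
hits = index.search("aisle seats")
```

Rebuilding costs one pass over the entries, which is why only the SQLite rows and the vectors need to be persisted.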

Entry lifecycle (germinal-center inspired)

NAIVE → GC → MEMORY
              ↓
         APOPTOTIC

Naive: new entries, unproven
GC (germinal center): retrieved 3+ times, actively evaluated
Memory: high affinity + frequently retrieved, stable, exempt from decay
Apoptotic: low affinity + idle > 1000 steps, excluded from search
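The tier rules can be sketched as a small state machine. Only the 3-retrieval and 1000-step thresholds come from the list above; the affinity cut-offs, the retrieval count for MEMORY, the `Entry` fields, and the choice to treat MEMORY and APOPTOTIC as final states are all hypothetical:

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    NAIVE = "naive"
    GC = "gc"
    MEMORY = "memory"
    APOPTOTIC = "apoptotic"


@dataclass
class Entry:
    tier: Tier = Tier.NAIVE
    retrievals: int = 0
    affinity: float = 0.0
    last_retrieved_step: int = 0


# From the README: 3+ retrievals for GC, more than 1000 idle steps for apoptosis.
GC_RETRIEVALS = 3
IDLE_STEPS = 1000
# Hypothetical values: the README does not give these numbers.
MEMORY_AFFINITY = 0.8
MEMORY_RETRIEVALS = 10
LOW_AFFINITY = 0.2


def update_tier(e: Entry, step: int) -> Tier:
    if e.tier in (Tier.MEMORY, Tier.APOPTOTIC):
        return e.tier  # assumed final: memory is stable, apoptotic stays excluded
    idle = step - e.last_retrieved_step
    if e.affinity < LOW_AFFINITY and idle > IDLE_STEPS:
        e.tier = Tier.APOPTOTIC
    elif e.tier is Tier.GC and e.affinity >= MEMORY_AFFINITY and e.retrievals >= MEMORY_RETRIEVALS:
        e.tier = Tier.MEMORY
    elif e.tier is Tier.NAIVE and e.retrievals >= GC_RETRIEVALS:
        e.tier = Tier.GC
    return e.tier


def searchable(e: Entry) -> bool:
    return e.tier is not Tier.APOPTOTIC


entry = Entry(retrievals=3, affinity=0.5, last_retrieved_step=10)
first = update_tier(entry, step=20)   # NAIVE -> GC after 3 retrievals
entry.retrievals, entry.affinity = 12, 0.9
second = update_tier(entry, step=30)  # GC -> MEMORY under the assumed thresholds
stale = Entry(affinity=0.1)
third = update_tier(stale, step=1001)  # idle > 1000 with low affinity -> APOPTOTIC
```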

Useful for long-running agents; doesn't directly improve retrieval quality (that's what RIF and enrichment do).

Deduplication (on add)

Exact: SHA-256 content hash (cheap: no embedding comparison needed)
Near-duplicate: cosine similarity > 0.95 (keeps the longer entry)
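Both checks can be sketched in a few lines. The `check_duplicate` helper and its return values are hypothetical; a real store would look hashes up in an index and find near-duplicates through the vector index rather than the linear scan used here:

```python
import hashlib

import numpy as np


def content_hash(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def check_duplicate(new_text, new_vec, existing, threshold=0.95):
    """Decide what to do with a new memory.

    existing: list of (entry_id, text, embedding) already stored.
    Returns ("skip", id), ("replace", id) or ("add", None).
    """
    new_hash = content_hash(new_text)
    for entry_id, text, _ in existing:
        if content_hash(text) == new_hash:
            return "skip", entry_id  # exact duplicate, caught before any vector math
    for entry_id, text, vec in existing:
        if cosine(new_vec, vec) > threshold:
            # Near-duplicate: keep whichever text is longer.
            return ("replace", entry_id) if len(new_text) > len(text) else ("skip", entry_id)
    return "add", None


existing = [("e1", "I prefer window seats", np.array([1.0, 0.0]))]
print(check_duplicate("I prefer window seats", np.array([1.0, 0.0]), existing))  # ('skip', 'e1')
print(check_duplicate("I prefer window seats on flights", np.array([0.99, 0.05]), existing))  # ('replace', 'e1')
print(check_duplicate("I work at Google", np.array([0.0, 1.0]), existing))  # ('add', None)
```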

Install

git clone https://github.com/teimurjan/lethe && cd lethe
uv venv --python 3.12 && uv pip install -e .

Benchmark

# prep LongMemEval
uv run python experiments/data_prep.py --dataset longmemeval

# retrieval-only baseline + RIF variants
uv run python benchmarks/run_benchmark.py
uv run python benchmarks/run_rif_benchmark.py

# LLM enrichment layer (needs ANTHROPIC_API_KEY)
export ANTHROPIC_API_KEY=sk-ant-...
uv run python experiments/enrich_longmemeval.py     # one-time, ~$16 for 10k entries
uv run python benchmarks/run_rif_enriched.py         # 3-arm benchmark

See BENCHMARKS.md for results and methodology.

Benchmark methodology

All numbers here are NDCG@10 for turn-level retrieval over the full 199,509-turn LongMemEval S corpus, so each question is a needle-in-a-haystack search among roughly 200k candidate turns.
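NDCG@10 is computed per question and typically averaged over questions. A minimal sketch of the per-question score, assuming binary relevance (a turn either is an answer turn or not); the benchmark scripts may handle details such as graded relevance differently:

```python
import math


def ndcg_at_k(ranked_ids, relevant_ids, k=10):
    """Binary-relevance NDCG@k: DCG of the ranking divided by the ideal DCG."""
    relevant = set(relevant_ids)
    # Each relevant hit at 1-based rank r contributes 1 / log2(r + 1).
    dcg = sum(
        1.0 / math.log2(rank + 1)
        for rank, entry_id in enumerate(ranked_ids[:k], start=1)
        if entry_id in relevant
    )
    # Ideal ranking puts every relevant turn (up to k of them) at the top.
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg > 0 else 0.0


score = ndcg_at_k(["t5", "t1", "t9"], ["t1"])  # answer turn found at rank 2
```

Finding the single answer turn at rank 2 instead of rank 1 already costs about 37% of the score (1 / log2(3) ≈ 0.63), which is why small rank improvements among 200k candidates move the metric noticeably.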

Similar Plugins

claude-mem

caveman

llm-council-plugin

self-improving-agent

obsidian
