lethe
Λήθη — the ancient Greek personification of forgetfulness, and one of the five rivers of the underworld.
A memory store for LLM agents. Hybrid BM25 + dense retrieval, cross-encoder reranking, clustered retrieval-induced forgetting, and an optional LLM enrichment layer at write time.
On LongMemEval S (199,509 conversation turns, 500 questions, full-corpus NDCG@10):
| Stage | NDCG@10 | vs baseline |
|---|---|---|
| Hybrid BM25 + vector + cross-encoder | 0.293 | — |
| + clustered+gap RIF (checkpoint 13) | 0.312 | +6.5% |
| + LLM enrichment, on covered queries | 0.473 | +35% |
The enrichment gain is measured on the 75 queries for which the answer turns are enriched; overall numbers are diluted by uncovered queries. See BENCHMARKS.md.
Quick start
from lethe import MemoryStore
from sentence_transformers import SentenceTransformer, CrossEncoder
store = MemoryStore(
    "./my_memories",
    bi_encoder=SentenceTransformer("all-MiniLM-L6-v2"),
    cross_encoder=CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2"),
)
store.add("I prefer window seats on flights", session_id="trip")
store.add("My wife needs aisle seats", session_id="trip")
store.add("I work at Google as a software engineer", session_id="work")
results = store.retrieve("What are my travel preferences?", k=5)
for entry_id, content, score in results:
    print(f" [{score:.1f}] {content}")
store.save()
store.close()
Architecture
Query
  │
  ├── FAISS top-30 (dense vector similarity)
  ├── BM25 top-30 (sparse keyword match)
  │
  └── Merge (RRF)
        │
        └── RIF suppression penalty (per-cluster, gap-based)
              │
              └── Cross-encoder rerank → top-k
                    │
                    └── Update suppression state, affinities, tier
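The merge step in the pipeline above is reciprocal rank fusion (RRF). A minimal sketch of that idea follows; the function name `rrf_merge` and the constant `k=60` (a commonly used RRF default) are assumptions for illustration, not values taken from lethe's source.

```python
from collections import defaultdict


def rrf_merge(rankings, k=60, top_n=None):
    """Fuse several ranked lists of entry ids with reciprocal rank fusion.

    Each list contributes 1 / (k + rank) to an entry's fused score.
    k=60 is a common default and an assumption here, not lethe's setting.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, entry_id in enumerate(ranking, start=1):
            scores[entry_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for determinism.
    fused = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return fused[:top_n] if top_n is not None else fused


# Hypothetical top hits from the dense and sparse retrievers.
dense_hits = ["m3", "m1", "m7"]
sparse_hits = ["m1", "m9", "m3"]
candidates = rrf_merge([dense_hits, sparse_hits])
print([entry_id for entry_id, _ in candidates])
```

Entries that appear in both lists rise above entries found by only one retriever, which is why RRF works without having to calibrate BM25 scores against cosine similarities.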
Optional write-time LLM enrichment layer (src/lethe/enrichment.py): before indexing, each memory can be processed by an LLM (default: claude-haiku-4-5) to produce a gist, three anticipated queries, entities, and temporal markers. All of these fields are indexed alongside the original text, while the cross-encoder still scores candidates against the original text. This targets the vocabulary-mismatch failure mode, where a query and the memory that answers it share few words.
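To make the enrichment idea concrete, here is a rough sketch of how enriched fields could be combined into indexed text while the original stays separate for reranking. The `Enrichment` dataclass, its field names, and `build_index_text` are hypothetical; the real structures in src/lethe/enrichment.py may look different, and no LLM call is made here.

```python
from dataclasses import dataclass, field


@dataclass
class Enrichment:
    """Hypothetical container for the LLM-produced fields."""
    gist: str
    anticipated_queries: list[str] = field(default_factory=list)
    entities: list[str] = field(default_factory=list)
    temporal_markers: list[str] = field(default_factory=list)


def build_index_text(original: str, enrichment: Enrichment) -> str:
    """Text handed to BM25 and the bi-encoder: original plus enriched fields."""
    parts = [
        original,
        enrichment.gist,
        *enrichment.anticipated_queries,
        *enrichment.entities,
        *enrichment.temporal_markers,
    ]
    return "\n".join(part for part in parts if part)


original = "I prefer window seats on flights"
enrichment = Enrichment(
    gist="User likes window seats when flying",
    anticipated_queries=["What are my travel preferences?"],
    entities=["flights"],
)
index_text = build_index_text(original, enrichment)  # used for retrieval
rerank_text = original                               # used by the cross-encoder
```

The anticipated query puts the words "travel preferences" into the index even though the memory itself never uses them, which is the vocabulary gap enrichment is meant to close.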
Retrieval-induced forgetting (RIF)
On each retrieval, entries that reach the candidate pool but lose to the cross-encoder accumulate a per-cluster suppression score. On future retrievals in the same query cluster, their scores get penalized before the cross-encoder sees them, freeing slots for entries that were previously crowded out.
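A minimal sketch of cue-dependent suppression, assuming the penalty is simply subtracted from the merged score and that state is keyed by (cluster, entry). The class name `SuppressionState`, the subtraction, and the `weight` parameter are illustrative assumptions rather than lethe's actual implementation.

```python
from collections import defaultdict


class SuppressionState:
    """Per-cluster suppression scores, keyed by (cluster_id, entry_id)."""

    def __init__(self):
        self._scores = defaultdict(float)

    def add(self, cluster_id, entry_id, amount):
        # Accumulate suppression for an entry that lost in this query cluster.
        self._scores[(cluster_id, entry_id)] += amount

    def penalize(self, cluster_id, candidates, weight=1.0):
        """Lower merged scores for entries suppressed in this cluster only."""
        adjusted = [
            (entry_id, score - weight * self._scores.get((cluster_id, entry_id), 0.0))
            for entry_id, score in candidates
        ]
        return sorted(adjusted, key=lambda c: c[1], reverse=True)


TRAVEL, FOOD = 0, 1  # hypothetical query-cluster ids
state = SuppressionState()
state.add(TRAVEL, "m2", 0.5)  # m2 lost a past rerank for a travel query

pool = [("m2", 0.9), ("m5", 0.6)]
travel_ranked = state.penalize(TRAVEL, pool)  # m2 drops below m5
food_ranked = state.penalize(FOOD, pool)      # m2 is unaffected
```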
Key design points:
- Clustered (k-means with k=30, cue-dependent): an entry suppressed for "travel" queries stays available for "food" queries. 5× stronger than global suppression.
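- Rank-gap competition formula:
  max(0, xenc_rank − initial_rank) / pool × sigmoid(−xenc), where initial_rank is the entry's rank before reranking, xenc_rank is its rank after the cross-encoder, xenc is its cross-encoder score, and pool is the candidate pool size. It only suppresses entries that both dropped in rank and were actively rejected by the cross-encoder, not entries that just lost a close race.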
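The rank-gap formula can be written out directly. This is a sketch under stated assumptions: `xenc_score` is taken to be the raw cross-encoder score (a logit), `pool_size` the number of candidates, and ranks are 1-based; the function name is hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def rank_gap_suppression(initial_rank, xenc_rank, xenc_score, pool_size):
    """max(0, xenc_rank - initial_rank) / pool * sigmoid(-xenc).

    The first factor is zero unless the entry fell in rank after reranking;
    the second is close to zero unless the cross-encoder score is low.
    """
    rank_drop = max(0, xenc_rank - initial_rank)
    return rank_drop / pool_size * sigmoid(-xenc_score)


# Ranked 2nd by the merge, pushed to 20th with a strongly negative score.
rejected = rank_gap_suppression(2, 20, xenc_score=-4.0, pool_size=30)
# Slipped one place but still scored well: a close race.
close_race = rank_gap_suppression(5, 6, xenc_score=3.0, pool_size=30)
# Moved up after reranking: never suppressed.
promoted = rank_gap_suppression(10, 1, xenc_score=5.0, pool_size=30)
print(round(rejected, 3), round(close_race, 4), promoted)
```

Because the two factors multiply, a large rank drop with a decent score, or a poor score with no rank drop, both yield little or no suppression.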
Based on Anderson's inhibition theory (1994) and the SAM competitive-sampling model (Raaijmakers & Shiffrin, 1981). First implementation in an AI memory system as far as I can tell.
Three storage layers
| Layer | File | Purpose |
|---|---|---|
| SQLite | lethe.db | Entries, suppression state, rescue cache, stats |
| numpy + FAISS | embeddings.npz, faiss.index | Vector storage |
| BM25 | In-memory, rebuilt on startup | Sparse keyword index |
Entry lifecycle (germinal-center inspired)
NAIVE → GC → MEMORY
↓
APOPTOTIC
- Naive: new entries, unproven
- GC: retrieved 3+ times, actively evaluated
- Memory: high affinity + frequently retrieved, stable, exempt from decay
- Apoptotic: low affinity + idle > 1000 steps, excluded from search
The lifecycle is useful for long-running agents; it doesn't directly improve retrieval quality (that's what RIF and enrichment do).
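The tiers above can be read as a small state machine. In the sketch below, only the "3+ retrievals" and "idle > 1000 steps" thresholds come from the list; the affinity cut-offs, the retrieval count needed for Memory, and the choice to treat Apoptotic as terminal are placeholder assumptions, as are all the names.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    NAIVE = "naive"
    GC = "gc"
    MEMORY = "memory"
    APOPTOTIC = "apoptotic"


@dataclass
class EntryStats:
    retrievals: int
    affinity: float
    idle_steps: int


def next_tier(tier, stats, gc_retrievals=3, max_idle=1000,
              high_affinity=0.8, low_affinity=0.2, memory_retrievals=10):
    """Return the tier an entry should move to given its current stats."""
    if tier is Tier.MEMORY:
        return tier  # stable, exempt from decay
    if stats.affinity <= low_affinity and stats.idle_steps > max_idle:
        return Tier.APOPTOTIC  # excluded from search
    if tier is Tier.NAIVE and stats.retrievals >= gc_retrievals:
        return Tier.GC
    if (tier is Tier.GC and stats.affinity >= high_affinity
            and stats.retrievals >= memory_retrievals):
        return Tier.MEMORY
    return tier


print(next_tier(Tier.NAIVE, EntryStats(retrievals=3, affinity=0.5, idle_steps=0)))
```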
Deduplication (on add)
- Exact: SHA-256 content hash (free)
- Near-duplicate: cosine similarity > 0.95 (keeps the longer entry)
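The two deduplication checks could be sketched as follows. The function names, the returned action tuples, and the plain-list vectors are hypothetical; a real store would compare against its embedding index rather than loop over every entry.

```python
import hashlib
import math


def content_hash(text):
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def dedup_decision(new_text, new_vec, existing, threshold=0.95):
    """Decide what to do with a new entry.

    existing: list of (text, vector) pairs already stored.
    Returns ("skip", None), ("replace", index) or ("add", None).
    """
    new_hash = content_hash(new_text)
    # Exact duplicates: a hash comparison, no embedding needed.
    if any(content_hash(text) == new_hash for text, _ in existing):
        return ("skip", None)
    # Near-duplicates: keep whichever entry is longer.
    for index, (text, vec) in enumerate(existing):
        if cosine(new_vec, vec) > threshold:
            if len(new_text) > len(text):
                return ("replace", index)
            return ("skip", None)
    return ("add", None)


stored = [("I prefer window seats", [1.0, 0.0])]
exact = dedup_decision("I prefer window seats", [1.0, 0.0], stored)
near = dedup_decision("I prefer window seats on flights", [0.99, 0.05], stored)
fresh = dedup_decision("My wife needs aisle seats", [0.0, 1.0], stored)
```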
Install
git clone https://github.com/teimurjan/lethe && cd lethe
uv venv --python 3.12 && uv pip install -e .
Benchmark
# prep LongMemEval
uv run python experiments/data_prep.py --dataset longmemeval
# retrieval-only baseline + RIF variants
uv run python benchmarks/run_benchmark.py
uv run python benchmarks/run_rif_benchmark.py
# LLM enrichment layer (needs ANTHROPIC_API_KEY)
export ANTHROPIC_API_KEY=sk-ant-...
uv run python experiments/enrich_longmemeval.py # one-time, ~$16 for 10k entries
uv run python benchmarks/run_rif_enriched.py # 3-arm benchmark
See BENCHMARKS.md for results and methodology.
Benchmark methodology
All numbers here are NDCG@10 over turn-level retrieval on the full 199,509-turn LongMemEval S corpus — needle-in-haystack among 200k candidates.
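For readers unfamiliar with the metric, here is a minimal NDCG@10 sketch assuming binary relevance (a turn is either an answer turn or not). The benchmark scripts may use graded relevance or a library implementation; the function names here are illustrative.

```python
import math


def dcg(gains):
    """Discounted cumulative gain with a log2 rank discount."""
    return sum(gain / math.log2(rank + 1) for rank, gain in enumerate(gains, start=1))


def ndcg_at_k(ranked_ids, relevant_ids, k=10):
    """NDCG@k for one query with binary relevance labels."""
    gains = [1.0 if entry_id in relevant_ids else 0.0 for entry_id in ranked_ids[:k]]
    # Ideal ranking: all relevant turns placed first.
    ideal_dcg = dcg([1.0] * min(len(relevant_ids), k))
    return dcg(gains) / ideal_dcg if ideal_dcg > 0 else 0.0


# Hypothetical turn ids: the single answer turn is retrieved at rank 2.
second_place = ndcg_at_k(["t4", "t9", "t1"], {"t9"})
first_place = ndcg_at_k(["t9", "t4", "t1"], {"t9"})
missed = ndcg_at_k(["t4", "t1"], {"t9"})
```

With a single answer turn, a hit at rank 2 scores 1 / log2(3), roughly 0.63, so an average near 0.3 over a corpus of about 200k turns still means the answer often lands in the top ten.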