Tense

Temporal memory for AI agents — knows which version is true.
Tense is an MCP server that gives an AI agent a
memory that tracks not just what it was told, but when each thing was true.
It stores knowledge as a hand-built bi-temporal graph on Postgres and can answer
which version is current, or what was true as of any past date, which a
plain vector store cannot do.
A vector store indexes "Zach reports to Alice" and "Zach reports to Bob" with
near-identical embeddings and happily returns both. It has no principled way to
know the second superseded the first, when that happened, or who Zach
reported to last quarter. Tense does.
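To make the contrast concrete, here is a minimal in-memory sketch of the valid-time half of that idea. It is not Tense's schema or code: the `Fact` shape, the `assertFact` and `asOf` helpers, and the dates are all hypothetical, and it assumes a single-valued relation (one manager at a time) with facts arriving in chronological order. A full bi-temporal model would also track when each fact was recorded, which this sketch omits.

```typescript
// Hypothetical shape for illustration only; Tense's real schema may differ.
interface Fact {
  subject: string;
  predicate: string;
  object: string;
  validAt: Date;          // when the fact became true in the world
  invalidAt: Date | null; // when it stopped being true; null means still current
}

// Record a new fact and close out the open fact it supersedes.
// Assumes a single-valued predicate and chronological arrival.
function assertFact(facts: Fact[], next: Omit<Fact, "invalidAt">): Fact[] {
  const updated = facts.map((f) =>
    f.subject === next.subject &&
    f.predicate === next.predicate &&
    f.invalidAt === null
      ? { ...f, invalidAt: next.validAt }
      : f
  );
  return [...updated, { ...next, invalidAt: null }];
}

// Return the fact whose validity interval contains `at`, if any.
function asOf(
  facts: Fact[],
  subject: string,
  predicate: string,
  at: Date
): Fact | undefined {
  return facts.find(
    (f) =>
      f.subject === subject &&
      f.predicate === predicate &&
      f.validAt.getTime() <= at.getTime() &&
      (f.invalidAt === null || at.getTime() < f.invalidAt.getTime())
  );
}

// Illustrative dates, not taken from the gold set.
let facts: Fact[] = [];
facts = assertFact(facts, {
  subject: "Zach",
  predicate: "reports_to",
  object: "Alice",
  validAt: new Date("2024-01-01T00:00:00Z"),
});
facts = assertFact(facts, {
  subject: "Zach",
  predicate: "reports_to",
  object: "Bob",
  validAt: new Date("2024-06-01T00:00:00Z"),
});

const lastQuarter = asOf(facts, "Zach", "reports_to", new Date("2024-03-15T00:00:00Z"));
const now = asOf(facts, "Zach", "reports_to", new Date("2024-09-01T00:00:00Z"));
console.log(lastQuarter?.object, now?.object); // Alice Bob
```

Because the superseded fact is closed rather than deleted, the same store can answer both the "now" question and the "last quarter" question.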
Short on time? docs/CASE-STUDY.md is the 2-minute
narrative — the problem, the bet, the decision I had to defend, and how it's proven
— with every claim linked to the code, eval, or ADR that backs it.
Install in your agent
Tense is just an MCP stdio server, so any coding agent with MCP support can use it.
For Claude Code, set TENSE_DATABASE_URL and OPENROUTER_API_KEY, then add the
server in one line:
claude mcp add tense -e TENSE_DATABASE_URL="$TENSE_DATABASE_URL" -e OPENROUTER_API_KEY="$OPENROUTER_API_KEY" -- npx -y github:Zacplischka/tense
For Cursor, Windsurf, Goose, Cline, Continue, or any other MCP client, run one line
and paste the generated mcpServers.tense block:
npx -y github:Zacplischka/tense init
Claude plugin route: this repo also carries a Claude Code plugin manifest. Add the
repo as a marketplace, then install the plugin:
claude plugin marketplace add github:Zacplischka/tense
claude plugin install tense@tense
The result
The headline result is on point-in-time questions whose answer changed over
time, the one place a recency-sorted vector store cannot win. All numbers are
measured against a fair vector baseline (same Sources, same embeddings, recency
tiebreak allowed):
| Metric (10-scenario gold set, live extraction) | Tense | Fair vector baseline |
|---|---|---|
| Temporal-QA — point-in-time (5 questions) | 100% | 0% |
| Temporal-QA — all questions (11) | 100% | 55% |
| Supersession precision / recall | 100% / 100% | — |
| False-supersession rate | 0% | — |
| Extraction triple-F1 / valid_at accuracy | 100% / 100% | — |
The point-in-time row is the headline: 5 questions whose answer changed over
time, where a recency-sorted vector store is structurally wrong. The baseline
still gets 6/11 overall (the "now" questions) — it loses precisely on the 5 it
cannot model. The rows below it are the supporting evidence (extraction and
supersession quality), measured over all 10 scenarios.
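For readers who want to check the arithmetic, the sketch below shows how percentages like these are conventionally computed. It assumes the standard set-based definitions of precision and recall, which may not match the eval's exact implementation; `precisionRecall`, `percent`, and the supersession key format are hypothetical. The counts 6 of 11 and 0 of 5 come from the table and the paragraph above.

```typescript
// Standard set-based precision and recall. This is an assumption about how
// such metrics are usually computed, not a copy of Tense's eval code.
function precisionRecall(
  predicted: Set<string>,
  gold: Set<string>
): { precision: number; recall: number } {
  let truePositives = 0;
  predicted.forEach((item) => {
    if (gold.has(item)) truePositives += 1;
  });
  return {
    // Treat an empty denominator as a perfect score to avoid dividing by zero.
    precision: predicted.size === 0 ? 1 : truePositives / predicted.size,
    recall: gold.size === 0 ? 1 : truePositives / gold.size,
  };
}

// Whole-number percentage, as shown in the table.
function percent(correct: number, total: number): number {
  return Math.round((correct / total) * 100);
}

// Hypothetical supersession keys of the form "old fact -> new fact".
const gold = new Set(["Zach reports_to Alice -> Zach reports_to Bob"]);
const predicted = new Set(["Zach reports_to Alice -> Zach reports_to Bob"]);
const scores = precisionRecall(predicted, gold);

const baselineAll = percent(6, 11);        // 55: the 6 "now" questions out of 11
const baselinePointInTime = percent(0, 5); // 0: none of the 5 point-in-time questions
console.log(scores, baselineAll, baselinePointInTime);
```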
Reproduce with pnpm eval — it prints those same denominators (all 11,
point-in-time (5)), so every number above reconciles against a live run. No API
key needed: pnpm eval:offline reproduces the same 5/5 point-in-time win
(100% vs 0%) with no spend, byte-identical every run. The fairness of the
baseline, the offline path, and the gold-set size are detailed below.
Reproduce it — and why the baseline is fair, not a strawman
Run it live. pnpm eval prints the denominators used in the table above (all 11,
point-in-time (5)), so each reported number can be checked against a live run. Want
the numbers without running anything? eval/RESULTS.md is a
committed, byte-identical snapshot with a per-question breakdown: every
point-in-time question, its as_of, and gold vs Tense vs baseline, so you can see
which answers the baseline gets wrong and why (it returns the most-recent value).
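The failure mode can be sketched in a few lines. This is not the baseline's actual code: the `Hit` shape and `recencyBaseline` are hypothetical, and the tied similarity scores are contrived to stand in for the near-identical embeddings described earlier. The point it illustrates is that a similarity-then-recency ranking has no input for the question's as_of date.

```typescript
// Hypothetical retrieval hit; field names are illustrative, not Tense's eval types.
interface Hit {
  answer: string;     // value carried by the retrieved source
  similarity: number; // cosine similarity to the question
  observedAt: Date;   // when the source was recorded
}

// Rank by similarity, break ties by recency, and answer with the top hit.
// The question's as_of date never enters the ranking.
function recencyBaseline(hits: Hit[]): string | undefined {
  const ranked = hits.slice().sort(
    (a, b) =>
      b.similarity - a.similarity ||
      b.observedAt.getTime() - a.observedAt.getTime()
  );
  return ranked[0]?.answer;
}

// Contrived tie: both sources embed almost identically.
const hits: Hit[] = [
  { answer: "Alice", similarity: 0.9, observedAt: new Date("2024-01-01T00:00:00Z") },
  { answer: "Bob", similarity: 0.9, observedAt: new Date("2024-06-01T00:00:00Z") },
];

// Whatever date the question asks about, the most recent value wins.
const baselineAnswer = recencyBaseline(hits);
console.log(baselineAnswer); // Bob
```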
No API key? pnpm eval:offline reproduces the headline row with no spend —
stub extraction plus hashed bag-of-words embeddings, Postgres only — printing the
same 5/5 point-in-time win (100% vs 0%), byte-identical every run.
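As a rough illustration of why hashed bag-of-words embeddings give repeatable results, here is a minimal feature-hashing sketch. It shows the general technique only and is not Tense's offline embedder: the `fnv1a`, `embed`, and `cosine` names, the tokenizer, and the 64-dimension default are assumptions made for this example.

```typescript
// FNV-1a-style 32-bit hash over UTF-16 code units. Deterministic, no randomness.
function fnv1a(token: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < token.length; i++) {
    hash ^= token.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return hash >>> 0; // force an unsigned 32-bit result
}

// Hash each token into one of `dim` buckets, count occurrences, then L2-normalize.
function embed(text: string, dim: number = 64): number[] {
  const vec: number[] = new Array<number>(dim).fill(0);
  const tokens = text.toLowerCase().match(/[a-z0-9]+/g) ?? [];
  for (const token of tokens) {
    const bucket = fnv1a(token) % dim;
    vec[bucket] = (vec[bucket] ?? 0) + 1;
  }
  const norm = Math.sqrt(vec.reduce((sum, x) => sum + x * x, 0));
  return norm === 0 ? vec : vec.map((x) => x / norm);
}

// Cosine similarity of two already-normalized vectors is their dot product.
function cosine(a: number[], b: number[]): number {
  return a.reduce((sum, x, i) => sum + x * (b[i] ?? 0), 0);
}

const alice = embed("Zach reports to Alice");
const bob = embed("Zach reports to Bob");
console.log(cosine(alice, bob)); // high overlap: the sentences share most tokens
```

With no model call and no random state, the same text always maps to the same vector, which is the property a byte-identical offline run depends on.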
