GROOM

Gated Refresh of Organizational Memory

A self-maintaining knowledge base for AI agents — consulting it is the act that keeps it current.

GROOM: a read activates a skill that returns in under 100 ms and spawns a detached agent which runs one bounded, checkpointed maintenance operation over the wiki.

Install

In Claude Code, add the marketplace and install the plugin. Two lines:

/plugin marketplace add beconfident-ai/groom
/plugin install groom@groom

This installs two things: the harness-wiki skill, which consults the bundled knowledge base and, in doing so, triggers a gated background refresh; and the groom subagent, which returns a structured, cited brief. Nothing else to wire up.

Other agents (Codex, Gemini CLI, Cursor, Windsurf, Cline, Copilot) read a ready-made rules file that ships in this repo. See Using it as agent context.

To run the maintenance pipeline or reproduce the benchmarks yourself, clone the repo instead and follow the Quickstart.

The problem

An LLM agent is only as current as the text it reads. Production agents ground on curated corpora — internal wikis, convention docs, runbooks, retrieval indices — and those corpora rot: the field moves, the text does not, and every agent that loads a stale page is silently degraded. Context engineering manages the window (what reaches the model at inference time); almost nobody maintains the source.

We made the cost concrete. When a consuming agent treats a corpus as authoritative, injecting staleness into five facts dropped its answer accuracy on those facts from 100% to 0% while untouched controls held at 100%. Corpus correctness is load-bearing — and maintaining it is nobody's immediate job, so it doesn't happen.

What GROOM does

GROOM makes consulting the knowledge base the act that maintains it. A consuming agent reads the corpus through a skill; that fires a gated launcher which returns in tens of milliseconds and, when a refresh is due, spawns a detached agent to run one bounded maintenance operation (lint, prune, expand, research, or iterate). The read never blocks; the next reader gets the benefit (stale-while-revalidate, for knowledge).
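The read path described above can be sketched roughly as follows. This is a minimal illustration, not GROOM's actual launcher: the function names (`onRead`, `tryClaim`, `isDue`, `spawnDetached`), the `last-refresh` stamp file, and the `refresh.lock` directory are all assumptions made for the example. It combines a staleness check with an atomic `mkdir` claim so that only one reader starts the worker, and it never waits for that worker.

```javascript
// Hypothetical sketch of a gated launcher. Names and paths are illustrative only.
const fs = require("node:fs");
const path = require("node:path");
const { spawn } = require("node:child_process");

// Claim the refresh slot. mkdir without `recursive` either creates the
// directory or fails with EEXIST, so at most one caller wins the claim.
function tryClaim(lockDir) {
  try {
    fs.mkdirSync(lockDir);
    return true;
  } catch (err) {
    if (err.code === "EEXIST") return false;
    throw err;
  }
}

// A refresh is due when no stamp exists or the stamp is older than maxAgeMs.
function isDue(stampFile, maxAgeMs, now = Date.now()) {
  try {
    return now - fs.statSync(stampFile).mtimeMs > maxAgeMs;
  } catch (err) {
    if (err.code === "ENOENT") return true;
    throw err;
  }
}

// Start a child process that outlives the reader and is never awaited.
function spawnDetached(command, args) {
  const child = spawn(command, args, { detached: true, stdio: "ignore" });
  child.on("error", () => {}); // a failed spawn must not break the read
  child.unref();
}

// Called on every read; returns immediately with what it decided.
// stateDir is assumed to exist already.
function onRead({ stateDir, maxAgeMs, startWorker }) {
  const stamp = path.join(stateDir, "last-refresh");
  const lock = path.join(stateDir, "refresh.lock");
  if (!isDue(stamp, maxAgeMs)) return "fresh";
  if (!tryClaim(lock)) return "already-claimed";
  startWorker(); // the worker is expected to write the stamp and remove the lock
  return "spawned";
}
```

A caller might pass `startWorker: () => spawnDetached("node", ["maintain.mjs"])`, where `maintain.mjs` is a placeholder script name. The sketch omits recovery from a worker that crashes while holding the lock; a real launcher would need some way to expire stale claims.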

Autonomous edits to a live corpus are the real risk, so every operation is wrapped in a git checkpoint behind a deterministic, token-free validator. An edit "counts" only if it reports terminal success, passes structural and fact-level validation, satisfies its postcondition, and touched nothing outside the corpus — otherwise the working tree is reset to the pre-operation commit. A bad edit becomes a recoverable no-op, never a committed corruption.
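The acceptance rule above can be written down as a small function. This is a sketch under stated assumptions, not GROOM's implementation: `runGated`, the `repo` object and its methods, and the `status` / `postconditionMet` fields are hypothetical stand-ins injected by the caller. Paths are assumed to be normalized and repo-relative.

```javascript
// Hypothetical sketch of the checkpoint-and-gate rule. `repo`, `operation`
// and `validate` are injected stand-ins, not GROOM's real interfaces.
// It is written synchronously because it would run inside the detached worker.
function runGated({ repo, operation, validate, corpusDir }) {
  const checkpoint = repo.head(); // the pre-operation commit
  let accepted = false;
  try {
    const result = operation(); // one bounded operation, e.g. lint or prune
    const changed = repo.changedPaths(); // paths modified in the working tree
    accepted =
      result.status === "success" && // terminal success
      validate() === true && // structural and fact-level validation
      result.postconditionMet === true && // operation-specific postcondition
      changed.every((p) => p.startsWith(corpusDir + "/")); // nothing outside the corpus
  } catch {
    accepted = false; // a crash is treated as a rejected edit
  }
  if (accepted) {
    repo.commit("groom: maintenance operation");
    return { accepted: true, checkpoint };
  }
  repo.resetTo(checkpoint); // the bad edit becomes a recoverable no-op
  return { accepted: false, checkpoint };
}
```

With plain git, one plausible mapping is `git rev-parse HEAD` for `head`, `git status --porcelain` for `changedPaths`, and `git reset --hard <checkpoint>` followed by `git clean -fd` for `resetTo`, since a hard reset alone leaves untracked files behind. Note that all four conditions must hold; any single failure, including an exception, triggers the reset.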

GROOM is content-agnostic (point it at any markdown knowledge base, or scaffold a fresh one) and retrieval-agnostic (it maintains clean markdown; how an agent retrieves — progressive disclosure, full-context, BM25, dense — is a pluggable layer, not GROOM's concern).

Results

Every number below is reproduced by the harness in eval/ — no agent calls, no network (single laptop, Node 22; timings are load-sensitive).

Property	Result
Staleness matters	A consuming agent's accuracy on affected facts collapses 100% → 0% under corpus staleness; controls hold at 100%.
Safety	Across 9 fault classes, the gate rejects every one and restores the corpus byte-identically to the checkpoint (n=450, ~13 ms median). A no-gate baseline that commits unconditionally corrupts the corpus 9/9.
Concurrency	The naive debounce stamp is a TOCTOU race — it resolves an 8-way trigger to one run only 28–59% of the time. An atomic `mkdir` claim fixes it to 500/500.
Cost	The validation gate scales linearly with page count (tens of µs/page, ~14–27 ms at 400 pages, load-sensitive); the read path adds ~50 ms when warm and never blocks.
Canaries	Structural validation alone misses 5/5 semantic-loss injections; fact-level canaries catch all 5 — at zero token cost.
Generalization	Across 3 unrelated agent-KB domains (an internal API/SDK reference, an SRE runbook, a SaaS support KB) and 2 retrievers (BM25 + dense), grooming yields a 45–51% relative gain in recall@1 (BM25 0.52→0.78, dense 0.56→0.81); a groomed corpus is ~40% smaller.
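To illustrate the Canaries row, here is a minimal sketch of why a structural check can pass while a fact-level canary fails. The canary format (`page` plus a `mustMatch` regular expression), the function names, and the sample page are all invented for this example; GROOM's real validator may look quite different. Both checks are plain string tests, so they spend no tokens.

```javascript
// Hypothetical canary format: each canary names a page and a pattern that
// must still match after an edit. Patterns should not use the `g` flag,
// because RegExp.prototype.test is stateful on global regexes.
function checkCanaries(pages, canaries) {
  const failures = [];
  for (const { page, mustMatch } of canaries) {
    const text = pages.get(page);
    if (text === undefined || !mustMatch.test(text)) {
      failures.push({ page, mustMatch: String(mustMatch) });
    }
  }
  return failures;
}

// A deliberately simple structural check for comparison:
// every page must start with a level-one heading.
function checkStructure(pages) {
  return [...pages]
    .filter(([, text]) => !/^# \S/.test(text))
    .map(([page]) => page);
}

// Made-up example: an edit keeps the page well-formed but drops a fact.
const canaries = [{ page: "deploy.md", mustMatch: /Rollback window is 30 minutes/ }];
const before = new Map([["deploy.md", "# Deploy\n\nRollback window is 30 minutes.\n"]]);
const after = new Map([["deploy.md", "# Deploy\n\nSee the runbook.\n"]]);

const structureProblems = checkStructure(after); // empty: the page still looks valid
const lostFacts = checkCanaries(after, canaries); // one entry: the fact is gone
```

In this toy case the edited page passes the structural check, and only the canary notices that the rollback fact was lost.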

Quickstart

npm install
npm test                       # 11-test behavior suite — free, no agent calls
node eval/fault-matrix.mjs     # reproduce the safety benchmark — also free
