dig
The open, local, reversible knowledge-base layer for AI agents. dig keeps a knowledge base organized to your policy (folder structure, naming, labels, no duplicates). Its agents enforce that policy, detect drift, fix it, and version every change so nothing is ever lost, and they retrieve content fast (hybrid full-text + semantic). A knowledge base kept this clean is one your agent can recall across sessions, so dig doubles as memory that doesn't rot: it serves recall, not answers. It plugs into any agent or framework via MCP + native SDKs. Humans keep editing with their own tools; dig reconciles around them instead of locking them out, and runs many agents in parallel without collisions. Open source, runs fully on your machine, and works with any OpenAI-compatible model, including a small local one.
A company's or a person's knowledge base rots: files land in the wrong place, names drift from convention, duplicates pile up, structure erodes. Keeping it tidy is real, recurring work most people would rather delegate. dig is that delegate — an agent harness that does the librarian's whole job (find, organize, dedupe, label, version, reconcile) over one content-addressed core, safely, even while humans and other agents touch the same library.
Most tools do one slice: some move bytes, some apply naming rules, some lint prose, some answer questions about your docs, some version. None manage the structure of a living knowledge base and keep it converged on your policy.
dig aims to be the pi.dev of KB management — a small, sharp core with a rich extension ecosystem. Need to store blobs in your own object store, back up on every change, parse a proprietary format, or add a command? That's an extension, not a fork.
Status: pre-1.0 (canary). The reversible core is shipped and tested — content store, organize, dedup, drift/reconcile, parallel work views + merge/escalation, hybrid FTS+vector retrieval, and watch — alongside the agent-memory loop (retain/recall), the dig mcp server, the dig serve daemon, and the @vllnt/dig / dig-client SDKs. Items still on the roadmap are marked planned in the command table below. Private repo, canary releases only — expect breaking changes until v1.
What dig does
              ┌──────────────────────────────────────────────┐
              │             dig: file librarian              │
              └──────────────────────────────────────────────┘
    retrieve       organize        dedupe         version     parallel-safe
    find fast    rules: name/     no copies    full history  isolate · merge
    & ranked      move/label        kept          + undo       · escalate
        └──────────────┴──────────────┬──────────────┴──────────────┘
                                      ▼
                    ┌───────────────────────────────────┐
                    │    one content-addressed store    │
                    │  blobs by hash + tree manifests   │
                    └───────────────────────────────────┘
- Retrieve fast. Indexed, ranked find across the whole library.
- Organize by policy. You declare the rules (naming conventions, folder layout, where things belong); dig makes the tree match and keeps it readable, like a librarian shelving books.
- No duplicates. Identical content is detected by construction (same hash) and collapsed per policy.
- Version everything. Every change is recorded; history is browsable; any change is reversible (dig undo).
- Detect & fix drift. Policy is a desired state. dig continuously compares it to the actual KB, reports what has drifted (misfiled, misnamed, duplicated, unlabeled), and reconciles: automatically where safe, by proposal where not.
- Coexist with humans. People keep using their notes app, Finder, Drive, their editor. dig observes those direct edits, folds them into history, and reconciles them against policy. It never demands you go "through" it, and never silently overrides a deliberate human change (it escalates instead).
- Parallel-safe. Multiple agents operate in isolated views, merge back automatically when they don't overlap, and escalate to a human only when a real conflict can't be resolved by policy.
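To make "policy is a desired state" concrete, here is a minimal Python sketch of drift detection. It is not dig's implementation or API: the policy (everything under notes/, kebab-case .md names), the `Drift` type, and the `detect_drift` and `plan` functions are all hypothetical, and the split between auto-fixable and proposal-only drift is an assumption made for illustration. Unlabeled files are left out to keep it short.

```python
import re
from dataclasses import dataclass

# Hypothetical policy: every file lives under notes/ and has a kebab-case .md name.
NAME_RULE = re.compile(r"^[a-z0-9]+(-[a-z0-9]+)*\.md$")

# Assumption: moves and renames are safe to apply automatically,
# while collapsing duplicates is only proposed.
AUTO_FIXABLE = {"misfiled", "misnamed"}


@dataclass(frozen=True)
class Drift:
    path: str
    kind: str    # "misfiled", "misnamed", or "duplicated"
    detail: str


def detect_drift(files: dict[str, str]) -> list[Drift]:
    """Compare the actual library (path -> content hash) with the policy."""
    drift = []
    first_seen = {}  # content hash -> first path (in sorted order) holding it
    for path in sorted(files):
        folder, _, name = path.rpartition("/")
        if folder != "notes":
            drift.append(Drift(path, "misfiled", "expected under notes/"))
        if not NAME_RULE.match(name):
            drift.append(Drift(path, "misnamed", "expected kebab-case .md"))
        digest = files[path]
        if digest in first_seen:
            drift.append(Drift(path, "duplicated", f"same content as {first_seen[digest]}"))
        else:
            first_seen[digest] = path
    return drift


def plan(drift: list[Drift]) -> tuple[list[Drift], list[Drift]]:
    """Split drift into (apply automatically, propose to a human)."""
    auto = [d for d in drift if d.kind in AUTO_FIXABLE]
    proposals = [d for d in drift if d.kind not in AUTO_FIXABLE]
    return auto, proposals


library = {
    "notes/meeting-notes.md": "h1",
    "inbox/Draft Plan.md": "h2",
    "notes/copy.md": "h1",
}
for item in detect_drift(library):
    print(item.kind, item.path, "-", item.detail)
```

Running the same comparison on every change, rather than once, is what turns a one-off cleanup into convergence on the policy.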
Why these aren't separate features: a single content-addressed store gives dedupe, versioning, cheap isolation, and mergeable changesets for free. See docs/architecture.md.
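A small Python sketch can show why one content-addressed store yields all four properties. This is an illustration under stated assumptions, not dig's actual data model: the `ContentStore` class, the `merge` function, the use of SHA-256, and the in-memory dictionaries are all hypothetical. Blobs are keyed by hash (dedupe), each commit is a manifest of path to hash (versioning), a view is just a copy of a manifest (cheap isolation), and two views merge automatically when they change different paths.

```python
import hashlib


class ContentStore:
    """Hypothetical in-memory store: blobs by hash plus one manifest per version."""

    def __init__(self):
        self.blobs = {}      # content hash -> bytes
        self.manifests = []  # version number -> {path: content hash}

    def put(self, data: bytes) -> str:
        digest = hashlib.sha256(data).hexdigest()
        self.blobs.setdefault(digest, data)  # identical content is stored once
        return digest

    def commit(self, tree: dict) -> int:
        """Record a new version from {path: bytes}; returns its version number."""
        self.manifests.append({path: self.put(data) for path, data in tree.items()})
        return len(self.manifests) - 1

    def checkout(self, version: int) -> dict:
        """Rebuild {path: bytes} for any earlier version (the basis of undo)."""
        return {p: self.blobs[h] for p, h in self.manifests[version].items()}


def changed_paths(base: dict, view: dict) -> set:
    return {p for p in base.keys() | view.keys() if base.get(p) != view.get(p)}


def merge(base: dict, a: dict, b: dict):
    """Three-way merge of manifests. Returns (merged, conflicts).

    Paths changed differently in both views are left at their base value
    and reported as conflicts for escalation.
    """
    changed_a, changed_b = changed_paths(base, a), changed_paths(base, b)
    conflicts = {p for p in changed_a & changed_b if a.get(p) != b.get(p)}
    merged = dict(base)
    for view, changed in ((a, changed_a), (b, changed_b)):
        for p in changed - conflicts:
            if p in view:
                merged[p] = view[p]
            else:
                merged.pop(p, None)  # the view deleted this path
    return merged, sorted(conflicts)


store = ContentStore()
v0 = store.commit({"a.md": b"hello", "b.md": b"hello"})  # two paths, one blob
base = store.manifests[v0]
view_a = {**base, "a.md": store.put(b"edited by agent A")}
view_b = {**base, "b.md": store.put(b"edited by agent B")}
merged, conflicts = merge(base, view_a, view_b)
print(len(merged), "paths merged,", len(conflicts), "conflicts")
```

Because manifests hold only hashes, creating a view or keeping an old version costs a small dictionary rather than a copy of the files.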
Scope