TraceRelay


Task-first runtime for schema evolution, shared memory, and gap-driven agent workflows.
TraceRelay is a local-first system that lets an LLM or agent:
- interpret a task and resolve the subject,
- recheck the abstract schema family when the requested shape points to a better fit,
- reuse prior memory before taking the next step,
- generate or reuse a schema,
- extract structured information,
- detect whether the current gap is missing values or missing structure,
- add new keys and relations only when they are truly needed,
- plan the next search or follow-up action from known facts and open gaps,
- re-run extraction until the task is filled or the loop limit is reached.
Every run is persisted as inspectable lineage, projected into PostgreSQL, browsable in Flask, and shared live through MCP, Codex, Claude Code, and LM Studio.
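The loop described above can be sketched in a few lines of Python. This is a minimal illustration, not TraceRelay's actual implementation: `Gap`, `detect_gap`, `run_task`, and the `extract` callback are hypothetical names, and the "schema" is reduced to a plain set of keys. It shows the distinction between a key the schema lacks (missing structure) and a key that exists but has no value (missing values), and how the loop stops when the task is filled or the round limit is reached.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set, Tuple

# Hypothetical names for illustration only; not taken from the TraceRelay codebase.


@dataclass
class Gap:
    missing_values: List[str]     # keys present in the schema but still empty
    missing_structure: List[str]  # keys the task needs that the schema lacks

    @property
    def is_closed(self) -> bool:
        return not self.missing_values and not self.missing_structure


def detect_gap(required: Set[str], schema: Set[str], record: Dict[str, object]) -> Gap:
    """Separate 'the schema lacks this key' from 'the key exists but is empty'."""
    missing_structure = sorted(required - schema)
    missing_values = sorted(k for k in required & schema if record.get(k) is None)
    return Gap(missing_values, missing_structure)


def run_task(
    required: Set[str],
    schema: Set[str],
    extract: Callable[[Set[str], Dict[str, object]], Dict[str, object]],
    max_rounds: int = 3,
) -> Tuple[Dict[str, object], List[dict]]:
    """Extract, check the gap, evolve the schema only if needed, and retry."""
    schema = set(schema)
    record: Dict[str, object] = {}
    lineage: List[dict] = []
    for round_no in range(1, max_rounds + 1):
        found = extract(schema, record)
        # Keep only values for keys the current schema knows about.
        record.update({k: v for k, v in found.items() if k in schema and v is not None})
        gap = detect_gap(required, schema, record)
        lineage.append({
            "round": round_no,
            "schema": sorted(schema),
            "missing_values": gap.missing_values,
            "missing_structure": gap.missing_structure,
        })
        if gap.is_closed:
            break
        # Add new keys only when the task has shown they are required.
        schema |= set(gap.missing_structure)
    return record, lineage


# Usage with a stand-in extractor that reads from a fixed dictionary.
source = {"name": "Ada", "born": 1815, "field": "mathematics"}


def fake_extract(schema: Set[str], record: Dict[str, object]) -> Dict[str, object]:
    return {k: source.get(k) for k in schema if k not in record}


record, lineage = run_task({"name", "born", "field"}, {"name"}, fake_extract)
```

In this sketch the first round reports a structural gap, the schema grows by two keys, and the second round fills them. A real extractor would call an LLM and the lineage would be persisted rather than kept in a list.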
Benchmark Snapshot
[Figure: Normalized Comparison]

What These Charts Are Meant To Show
- The benchmark is shown as one normalized profile so the comparison reads as an overall product shape, not five disconnected charts.
- Higher is better across all axes in this view.
- Query Quality reflects the inverse of broad or malformed query rate.
- Claim Support reflects the inverse of unsupported claim rate.
- Token Efficiency reflects the inverse of average tokens per successful task, so fewer tokens per success scores higher.
- Long-Task Recall reflects the inverse of long-task context forgetting rate.
- TraceRelay should win when the task depends on evolving structure, not just one-shot prompting.
- Schema-aware memory recall should reduce broad search, malformed search, and repeated search loops.
- Gap-directed retries should lower wasted token spend relative to agents that have to rediscover task structure each turn.
- Context-scoped memory and relay-style structured outputs should reduce long-task forgetting as the task gets deeper and more iterative.
- Traceable schema evolution and memory formation should reduce unsupported claims by making missing facts and missing structure explicit.
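As a rough illustration of how the four axes named above could be put on one "higher is better" scale, the sketch below inverts each rate as `1 - rate` and scales token usage against a reference budget. Both choices are assumptions: the README does not state the exact normalization, and the function name, dictionary keys, and `baseline_tokens` parameter are hypothetical. The numbers in the usage lines are placeholders, not benchmark results.

```python
from typing import Dict


def normalize_profile(raw: Dict[str, float], baseline_tokens: float) -> Dict[str, float]:
    """Map raw measurements to scores in [0, 1] where higher is better.

    Assumption: rates are fractions in [0, 1] and are inverted as 1 - rate.
    Assumption: token efficiency is baseline_tokens / average tokens, capped at 1.
    """
    avg_tokens = raw["avg_tokens_per_successful_task"]
    if avg_tokens <= 0:
        raise ValueError("average tokens per successful task must be positive")

    def clamp(x: float) -> float:
        return max(0.0, min(1.0, x))

    return {
        "Query Quality": clamp(1.0 - raw["broad_or_malformed_query_rate"]),
        "Claim Support": clamp(1.0 - raw["unsupported_claim_rate"]),
        "Token Efficiency": clamp(baseline_tokens / avg_tokens),
        "Long-Task Recall": clamp(1.0 - raw["long_task_forgetting_rate"]),
    }


# Placeholder inputs chosen only to show the arithmetic.
example = normalize_profile(
    {
        "broad_or_malformed_query_rate": 0.25,
        "unsupported_claim_rate": 0.125,
        "avg_tokens_per_successful_task": 2000.0,
        "long_task_forgetting_rate": 0.5,
    },
    baseline_tokens=1000.0,
)
```

With this kind of mapping, a system that issues fewer malformed queries, makes fewer unsupported claims, spends fewer tokens, or forgets less moves outward on the corresponding axis.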
Quick Start
cp .env.example .env
docker compose up -d --build postgres web mcp
docker compose logs -f web mcp
Then open:
http://127.0.0.1:5080/tasks
http://127.0.0.1:5080/memory
The default .env.example targets LM Studio. To use Ollama or an external embedding API, edit .env first. Full setup variants are in Setup Details.
What Makes It Strong
- Self-evolving structure: TraceRelay starts with the schema you need now, then adds fields and relations only when the task proves they are required.
- Context-scoped memory: daily work, deep research, and coding investigations do not get dumped into one noisy global memory pool.
- Better search inputs: later searches are driven by missing fields, missing relations, evolved schema versions, and prior extracted facts.
- Gap-driven next actions: TraceRelay can tell an agent what is still missing, whether the gap is values or structure, and which search phrases to try next.
- Relay memory across long tasks: each extraction round leaves behind structured outputs that the next round can reuse.
- Shared memory across agents: Codex, Claude Code, LM Studio, and MCP clients can work against the same live memory and lineage instead of keeping isolated assistant-local recall.
- Traceable decisions: interpretation, extraction, coverage, schema evolution, retries, and failures are persisted as lineage.
- Lower waste, fewer hallucinations: gap-directed retries reduce token burn, redundant prompting, malformed search, and unsupported guesses.
- Operational surfaces: the same runtime is exposed through Web, PostgreSQL, MCP, Codex, Claude Code, and LM Studio.
- Flexible deployment control: run locally with LM Studio or Ollama, use OpenAI-compatible APIs or Gemini APIs when needed, keep data in PostgreSQL, inspect everything in Flask, and avoid locking the runtime to a single hosted pipeline.
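Two of the points above, context-scoped memory and search inputs driven by missing fields, can be illustrated with a small sketch. Everything here is hypothetical: `ScopedMemory`, `next_search_phrases`, and the scope names are invented for the example, and an in-memory dictionary stands in for the PostgreSQL-backed store. The idea shown is that facts are recalled only within their own scope, and that follow-up queries are built from the subject, a known fact, and each missing key rather than from a broad restatement of the task.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


class ScopedMemory:
    """Keeps facts per context scope so one workflow does not pollute another."""

    def __init__(self) -> None:
        # scope -> {(subject, key): value}
        self._facts: Dict[str, Dict[Tuple[str, str], str]] = defaultdict(dict)

    def remember(self, scope: str, subject: str, key: str, value: str) -> None:
        self._facts[scope][(subject, key)] = value

    def recall(self, scope: str, subject: str) -> Dict[str, str]:
        """Return known facts for a subject, restricted to a single scope."""
        scoped = self._facts.get(scope, {})
        return {k: v for (s, k), v in scoped.items() if s == subject}


def next_search_phrases(subject: str, known: Dict[str, str], missing_keys: List[str]) -> List[str]:
    """Build one narrow query per missing key, anchored on the first known fact."""
    anchor = next(iter(known.values()), "")
    phrases = []
    for key in missing_keys:
        parts = [subject, anchor, key.replace("_", " ")]
        phrases.append(" ".join(p for p in parts if p))
    return phrases


# Usage: the coding scope never leaks into the research scope.
memory = ScopedMemory()
memory.remember("deep-research", "Ada Lovelace", "field", "mathematics")
memory.remember("coding", "Ada Lovelace", "repo", "analytical-engine")

known = memory.recall("deep-research", "Ada Lovelace")
phrases = next_search_phrases("Ada Lovelace", known, ["birth_year", "collaborators"])
```

Here `known` contains only the research fact, and each phrase targets exactly one open gap, which is the behavior the bullets describe as reducing broad and repeated searches.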
Why It Is Better Than Static Extraction