From ANCHOR
Use this skill when the user works with engineering documents — datasheets, leaflets, manuals, P&ID drawings — and wants to ingest them into a source-grounded knowledge base, query the structured contents, or build a workspace canvas where every value points back to its source page + bbox. ANCHOR has its OWN extraction pipeline (docling layout + OCR → per-page markdown and images → structured regions with bboxes). So when the user says "ingest this PDF", "read / OCR this document", "extract the specs", "what does the leaflet say about X", "place that spec on the canvas", or "wire this value into a simulation", drive it through ANCHOR — do NOT install OCR/PDF libraries or write your own parsing code.
Slash command: /anchor:anchor
<!-- Generated by scripts/build_claude_plugin.py from src/anchor/skills/. -->
ANCHOR turns a folder of engineering documents into a structured, source-grounded knowledge base you can drive over MCP. Every value the agent quotes points back to a specific page region.
ANCHOR already does document extraction end to end. When the user wants a PDF
read, OCR'd, ingested, or its contents extracted, run one command —
anchor ingest <pdf> (CLI) or the ingest_pdf MCP tool — and read the result
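with the query ops below. The pipeline is docling layout + OCR → per-page
markdown and images → structured regions, each tagged with its page and
bbox, the provenance every quoted value depends on.

Do NOT pip install an OCR/PDF library (pytesseract, pdfplumber,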
pdf2image, unstructured, …) or write your own parsing. That bypasses the gold
provenance ANCHOR exists to provide and duplicates work the tool already does.
If anchor ingest errors, fix the cause (run anchor check — endpoint, key,
model) rather than falling back to hand-rolled extraction.
Install the tool before using this skill:
uv tool install anchor-kb
# Fallback:
pipx install anchor-kb
anchor install <harness> registers an installed ANCHOR tool with an AI
harness. It does not install the tool itself.
Bronze and silver extraction run locally. Gold extraction requires
ANCHOR_OPENAI_API_KEY; set the other ANCHOR_OPENAI_* variables for your
provider as needed.
- Every canvas operation takes a workspace_slug. ANCHOR is multi-canvas; create one
  per question or project (canvas_create_workspace) and reuse it.
- Every evidence edge carries data.kind = "evidence" and data.source_ref = {page, bbox}. The
  system enforces this on anchored evidence edges.
- Slug example: pump-analysis.
- Call list_documents() first; if the slug exists with
  has_gold: true, skip ingest unless the user asks for a fresh pass.

The canvas streams changes over SSE. If a browser tab is open at the same time, the user sees your changes appear live. The server is authoritative and serialises commands per workspace, so you don't need to coordinate with the browser.
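As a minimal sketch of the evidence-edge rule above, the helper below assembles arguments for canvas_add_edge. The argument names and the data.kind / data.source_ref fields come from this document; the function name evidence_edge_args, the node ids, and the [x0, y0, x1, y1] bbox layout are assumptions made for illustration.

```python
from typing import Any


def evidence_edge_args(
    workspace_slug: str,
    source: str,
    target: str,
    page: int,
    bbox: list[float],
) -> dict[str, Any]:
    """Assemble canvas_add_edge arguments for an anchored evidence edge.

    The bbox layout ([x0, y0, x1, y1]) is an assumption; check the values
    returned by get_gold_regions for the real shape.
    """
    return {
        "workspace_slug": workspace_slug,
        "source": source,
        "target": target,
        "data": {
            "kind": "evidence",
            "source_ref": {"page": page, "bbox": list(bbox)},
        },
    }


if __name__ == "__main__":
    # Hypothetical node ids; real ids come from canvas_add_node results.
    print(evidence_edge_args("pump-analysis", "spec-1", "doc-1", 2, [10.0, 20.0, 110.0, 40.0]))
```

Keeping the payload in one helper makes it harder to forget source_ref, which the server rejects on anchored evidence edges.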
A folder containing an anchor.toml (created by anchor init) is an ANCHOR
project. It declares the data dir, the AI provider/data-zone, and the models.
Run ANCHOR from inside that folder and every adapter resolves the project
automatically. The CLI and anchor serve walk up from the working directory to
find anchor.toml; anchor-mcp does the same, or name it explicitly with
anchor-mcp --project <folder>. So a single MCP registration
(anchor install claude-code, no --data-dir) works for every project: open
the agent in the project folder and it targets that project, with no reinstall.
If you are unsure which project is active, run anchor from the folder you mean
(or pass --project/ANCHOR_CONFIG). Don't pass --data-dir ~/anchor-data
unless you specifically want the global default rather than the current project.
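The walk-up lookup described above can be pictured with a short sketch. This is not ANCHOR's own code; find_project_root is a hypothetical helper that only mirrors the stated behaviour (start at the working directory, climb until a folder contains anchor.toml).

```python
from pathlib import Path
from typing import Optional


def find_project_root(start: Path) -> Optional[Path]:
    """Return the nearest folder at or above `start` that holds anchor.toml."""
    start = start.resolve()
    for folder in (start, *start.parents):
        if (folder / "anchor.toml").is_file():
            return folder
    # No project found; per the text, ANCHOR then uses the global default.
    return None


if __name__ == "__main__":
    root = find_project_root(Path.cwd())
    print(root if root is not None else "no anchor.toml found above this folder")
```

Running a check like this from the folder you mean is a quick way to confirm which project an agent session would target.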
You can scaffold ANCHOR in any folder non-interactively, in the spirit of npm init / uv init. anchor init accepts
every choice as a flag, so no prompt blocks you:
# local-only (no document egress): no key, no endpoint
anchor init . --yes --provider local
# a named endpoint (Azure shown): the deployment name is the model
anchor init . --yes --provider azure \
--base-url https://<resource>.openai.azure.com/openai/v1/ \
--vision-model <deployment> --embed-model text-embedding-3-small
init self-corrects an Azure URL that is missing /openai/v1/. The API key is
never written to anchor.toml. Set ANCHOR_OPENAI_API_KEY in the environment
or a gitignored .env in the folder. Then verify before ingesting:
anchor check # offline: prints the data zone, repairs a bad endpoint
anchor check --probe # also makes one tiny call to confirm deployment + key
anchor check exits non-zero when something would break a real ingest, so you
can gate on it. Register the MCP once with anchor install claude-code.
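Gating on the exit code might look like the sketch below. It assumes only what the text states (anchor check and anchor check --probe exit non-zero on a problem); the helper name anchor_check_passes and the injectable run parameter are illustrative choices, the latter so the logic can be exercised without the CLI installed.

```python
import subprocess
from typing import Any, Callable


def anchor_check_passes(
    probe: bool = False,
    run: Callable[..., Any] = subprocess.run,
) -> bool:
    """Run `anchor check` and report whether it exited with status 0."""
    command = ["anchor", "check"]
    if probe:
        # --probe also makes one small call to confirm deployment and key.
        command.append("--probe")
    result = run(command, check=False)
    return result.returncode == 0
```

A caller would then ingest only when anchor_check_passes() returns True, and otherwise fix the reported cause first.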
Each project's data lives in its own data_dir (default <project>/anchor-data/
from anchor init, or the global ~/anchor-data/ when no project is found).
ANCHOR_DATA_DIR or an explicit --data-dir <path> override it; the HTTP
adapter uses the path passed to anchor serve.
- bronze/: raw PDFs
- silver/<slug>/: per-page markdown + page PNGs
- gold/<slug>/: structured regions with crops
- canvases/<slug>/: per-canvas durable state + events log

Each ingestion or simulation domain ships its own skill section explaining its tools and a typical flow. The composer concatenates the enabled extensions below this section so you see only what's available in the current install.
The canvas is the visible substrate humans and agents share. Each
workspace is an isolated folder under canvases/. Edits land in
real time on every connected client via SSE.
- canvas_create_workspace(slug, title?) and canvas_list_workspaces().
- canvas_get_state(workspace_slug): full state for the workspace.
- canvas_list_placeholders(workspace_slug): every node flagged
  data.placeholder == true with its placeholder_hint. The entry
  point when the user says "fill in the specs I marked".
- canvas_add_node(workspace_slug, node_type, label, x, y, data?).
- canvas_update_node(workspace_slug, id, ...) and
  canvas_remove_node(workspace_slug, id).
- canvas_add_edge(workspace_slug, source, target, edge_type?, data?)
  and canvas_remove_edge(workspace_slug, id).
- canvas_clear(workspace_slug): destructive; ask first.

| Node type | When to use |
|---|---|
| document | A whole PDF as a card on the canvas. |
| spec | A table of rows with values. Each row carries a source_ref. |
| fact | A free-form note tied to a source. |
| image | A region crop or screenshot. |
| concept / entity | Generic shapes for grouping or schematics. |
| canvas | A tile that links to a child canvas. |
The full list and the data shapes live in the on-disk substrate docs; this is the shortlist of the ones agents touch most.
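To make the spec row rule concrete, here is a hedged sketch of arguments for canvas_add_node. The parameter names, the spec node type, data.rows, and source_ref come from this document; the row keys name and value, the helper spec_node_args, and the bbox layout are assumptions, since the real data shapes live in the on-disk substrate docs.

```python
from typing import Any


def spec_node_args(
    workspace_slug: str,
    label: str,
    rows: list[dict[str, Any]],
    x: float = 0.0,
    y: float = 0.0,
) -> dict[str, Any]:
    """Assemble canvas_add_node arguments for a spec node.

    Every row must carry a source_ref with a page and a bbox, so each
    value on the canvas can be traced back to its source region.
    """
    for index, row in enumerate(rows):
        ref = row.get("source_ref")
        if not isinstance(ref, dict) or "page" not in ref or "bbox" not in ref:
            raise ValueError(f"row {index} is missing source_ref.page or source_ref.bbox")
    return {
        "workspace_slug": workspace_slug,
        "node_type": "spec",
        "label": label,
        "x": x,
        "y": y,
        "data": {"rows": rows},
    }


if __name__ == "__main__":
    # "name" and "value" are hypothetical row keys used only for illustration.
    row = {"name": "Max flow", "value": "<from gold region>",
           "source_ref": {"page": 2, "bbox": [10.0, 20.0, 110.0, 40.0]}}
    print(spec_node_args("pump-analysis", "Pump specs", [row]))
```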
anchor_pdfs: ingest engineering PDFs

Bronze → silver → gold pipeline that turns each page into structured regions tagged with the page number and bounding box they came from.
ingest_pdf(pdf_path, slug?, skip_polish?, skip_regions?, force?) — runs the
full pipeline. Idempotent: if the slug already has gold it returns
{skipped: true} without recomputing (the gold stage is billed). Pass
force=true (CLI --force) to re-ingest and overwrite.
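Because the gold stage is billed, it can help to decide client-side whether an ingest is needed at all. The sketch below assumes list_documents() yields a list of records with slug and has_gold fields (has_gold is mentioned in this document; the exact return shape is an assumption), and needs_ingest is a hypothetical helper name.

```python
from typing import Any


def needs_ingest(documents: list[dict[str, Any]], slug: str, fresh: bool = False) -> bool:
    """Decide whether to call ingest_pdf for `slug`.

    `documents` is the assumed result of list_documents(). A fresh pass
    requested by the user always wins (and would be sent with force=true).
    """
    if fresh:
        return True
    for doc in documents:
        if doc.get("slug") == slug and doc.get("has_gold"):
            return False
    return True


if __name__ == "__main__":
    docs = [{"slug": "datasheet", "has_gold": True}, {"slug": "manual", "has_gold": False}]
    print(needs_ingest(docs, "datasheet"))  # already has gold, so no ingest
    print(needs_ingest(docs, "manual"))     # silver only, so ingest
```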
There are TWO ingestion pathways; do not conflate them. ingest_pdf
(CLI anchor ingest) is the BUILT-IN pipeline: Anchor's own configured
LLM does the polish and region extraction, which needs an API key. The
HARNESS-DRIVEN session protocol (ingest_begin through
ingest_finalize, described below) is the no-key pathway where you,
the agent, do that extraction work yourself. If the user asks for
"harness ingestion", "agent-driven ingestion", or "no-key ingestion"
by name, use the session protocol; calling anchor ingest is not the
harness pathway.
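The choice between the two pathways reduces to a small rule, sketched here. The provider value harness and the trigger phrases are taken from this document; the function choose_pathway and its return labels are hypothetical names for illustration only.

```python
HARNESS_PHRASES = ("harness ingestion", "agent-driven ingestion", "no-key ingestion")


def choose_pathway(provider: str, user_request: str = "") -> str:
    """Pick the ingestion pathway.

    Returns "session_protocol" (ingest_begin through ingest_finalize) when
    the provider is harness or the user names that pathway; otherwise
    "ingest_pdf" (the built-in pipeline, which needs an API key).
    """
    request = user_request.lower()
    asked_by_name = any(phrase in request for phrase in HARNESS_PHRASES)
    if provider == "harness" or asked_by_name:
        return "session_protocol"
    return "ingest_pdf"


if __name__ == "__main__":
    print(choose_pathway("azure", "please ingest this PDF"))
    print(choose_pathway("azure", "use no-key ingestion for this one"))
```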
search_documents(query, k?) — semantic search across every
embedded gold region. Returns ranked hits with slug, page, region_id, text, score. This is how you "find stuff" in the documents by meaning,
not by guessing a page. CLI anchor search "<query>", HTTP
GET /api/search?q=…. Embeddings are created during ingest_pdf; if a
doc was ingested without them, run embed first (anchor embed).
list_documents() — every document and its current status.
get_document_index(slug) — silver outline (sections, tables, figures).
get_gold_regions(slug, page?) — structured regions with page + bbox.
get_page_text(slug, page) — polished or raw page markdown.
To answer "what does this document say about X" or "find the pricing /
the flow rate / the warranty", start with search_documents(query) —
it ranks gold regions across all documents by meaning. Each hit already
carries its slug, page, region_id, so follow up with
get_gold_regions(slug, page=…) or get_page_text(slug, page) to read
the full context and cite the page + bbox. Do not page through every
region by hand or re-read whole documents when search points you at the
right region directly.
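The search-then-read flow above can be sketched as follows. The tool names, their parameters, and the hit fields (slug, page, region_id) come from this document; call_tool is a hypothetical stand-in for however your harness invokes MCP tools, and the list-of-dicts return shapes are assumptions.

```python
from typing import Any, Callable

# Hypothetical signature: call_tool(tool_name, arguments) -> tool result.
ToolCaller = Callable[[str, dict[str, Any]], Any]


def find_with_context(call_tool: ToolCaller, query: str, k: int = 3) -> list[dict[str, Any]]:
    """Search by meaning, then load the gold regions for each hit's page."""
    hits = call_tool("search_documents", {"query": query, "k": k})
    results = []
    for hit in hits:
        regions = call_tool(
            "get_gold_regions", {"slug": hit["slug"], "page": hit["page"]}
        )
        results.append(
            {
                "slug": hit["slug"],
                "page": hit["page"],
                "region_id": hit["region_id"],
                "regions": regions,  # carries page + bbox for citation
            }
        )
    return results
```

Each result keeps the hit's slug, page, and region_id next to the page's regions, so the quoted value and its citation stay together.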
When the user drops a PDF and asks for specs on the canvas:
1. list_documents() first; skip ingest if the slug already has gold.
2. ingest_pdf(pdf_path="/abs/path/to/datasheet.pdf") only if needed.
3. search_documents("flow rate") to locate the right region(s), or
   get_gold_regions(slug=..., page=2) when you already know the page.
4. Add a document node on the canvas via canvas_add_node.
5. Add a spec node whose data.rows reference the regions, and an
   anchored evidence edge from each row to the document node.

Common errors:

- 404 / unknown slug → run list_documents() to see what's available.
- 400 / file is not a PDF → ANCHOR only ingests PDFs in this extension.
- gold extraction skipped in the status → no ANCHOR_OPENAI_API_KEY
  set; silver is still queryable but regions aren't structured. With
  provider harness this is expected for ingest_pdf; use the
  harness-driven session protocol below instead.

Use the harness-driven session protocol when EITHER applies: the project's provider is
harness (check anchor_status or anchor check), OR the user
explicitly asks for harness/agent-driven/no-key ingestion, regardless
of the configured provider. YOU are the extraction model. Under
provider harness, ingest_pdf will not produce gold; drive the
session protocol:
1. ingest_begin(pdf_path) - Anchor runs docling + page images and
   returns {session_id, page_count, pages[]}. If it returns
   resumed: true, some pages are already done - check pages[].status.
2. For each remaining page, ingest_get_page(session_id, page) gives the page image
   (read the returned path with your file tools), the raw markdown, and
   candidates (docling boxes with stable ids). Follow the returned
   instructions. Then ingest_submit_page(session_id, page, polished_md, regions).
3. Build regions from candidate ids, e.g. member_item_ids: ["p3-i0", "p3-i1"];
   the server computes the bbox. Use approx_bbox only when no
   candidate covers a visual.
4. A rejected submit returns errors naming the bad fields; fix and
   resubmit (resubmitting a page replaces it).
5. To parallelise, hand each worker the session_id and a contiguous batch of 3-5 pages, returning only
   the submit verdicts. Pages are independent; submits are idempotent.
6. When ingest_status(session_id) shows nothing remaining, call
   ingest_finalize(session_id, declared_model="<your model id>") -
   Anchor embeds locally and publishes gold atomically.
7. To resume an interrupted session, ingest_status(slug="<doc>") shows
   pages done/remaining; continue from there. ingest_abort discards
   staging.

Write descriptions that would rank well in semantic search: name the quantities and entities ("Max flow, head and motor sizes for LKH-5 to LKH-90"), not vague labels ("a table with numbers").
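Splitting the remaining pages into contiguous batches is simple to get subtly wrong, so here is a small sketch. The 3-5 page batch size comes from this document; contiguous_batches is a hypothetical helper, and it caps batches at the chosen size without padding short runs, so a trailing batch may hold fewer than three pages.

```python
def contiguous_batches(pages: list[int], size: int = 5) -> list[list[int]]:
    """Group page numbers into runs of consecutive pages, at most `size` long."""
    if size < 1:
        raise ValueError("size must be at least 1")
    batches: list[list[int]] = []
    for page in sorted(set(pages)):
        last = batches[-1] if batches else None
        if last and last[-1] + 1 == page and len(last) < size:
            last.append(page)  # extends the current consecutive run
        else:
            batches.append([page])  # gap in numbering or batch is full
    return batches


if __name__ == "__main__":
    # Pages 8 is already done, so 9 starts its own batch.
    print(contiguous_batches([1, 2, 3, 4, 5, 6, 7, 9]))
```

Because pages are independent and submits are idempotent, the batches can be processed in any order.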
CLI parity for shell-only harnesses: anchor ingest-session begin|get-page|submit-page|status|finalize|abort (JSON in/out).
Always pass --data-dir <data_dir from the begin work order> on every
command: the default resolves from the directory you invoke from, so a
cd between sibling commands silently switches projects and the session
appears unknown.
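One way to avoid the silent project switch is to build every command through a single helper that always attaches --data-dir. The subcommand names and the --data-dir flag come from this document; the helper ingest_session_argv, the position of the flag after the subcommand, and any extra arguments you pass are assumptions to verify against the CLI's own help output.

```python
SUBCOMMANDS = ("begin", "get-page", "submit-page", "status", "finalize", "abort")


def ingest_session_argv(subcommand: str, data_dir: str, extra: tuple[str, ...] = ()) -> list[str]:
    """Build an `anchor ingest-session` command that always pins --data-dir."""
    if subcommand not in SUBCOMMANDS:
        raise ValueError(f"unknown ingest-session subcommand: {subcommand}")
    if not data_dir:
        raise ValueError("data_dir is required; copy it from the begin work order")
    # Flag placement after the subcommand is an assumption.
    return ["anchor", "ingest-session", subcommand, "--data-dir", data_dir, *extra]


if __name__ == "__main__":
    # "/abs/project/anchor-data" is a placeholder path for illustration.
    print(ingest_session_argv("status", "/abs/project/anchor-data"))
```

Passing the resulting list to a process runner keeps the data directory fixed even if the working directory changes between commands.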
npx claudepluginhub novia-rdi-seafaring/anchor --plugin anchor