anchor

Use this skill when the user works with engineering documents — datasheets, leaflets, manuals, P&ID drawings — and wants to ingest them into a source-grounded knowledge base, query the structured contents, or build a workspace canvas where every value points back to its source page + bbox. ANCHOR has its OWN extraction pipeline (docling layout + OCR → per-page markdown and images → structured regions with bboxes). So when the user says "ingest this PDF", "read / OCR this document", "extract the specs", "what does the leaflet say about X", "place that spec on the canvas", or "wire this value into a simulation", drive it through ANCHOR — do NOT install OCR/PDF libraries or write your own parsing code.

Slash command: /anchor:anchor

Generated by scripts/build_claude_plugin.py from src/anchor/skills/.

Stats: Language Python · Parent stars 3 · Maintenance Good · Last commit Jun 12, 2026 · SKILL.md 290 lines, ~3.4k tokens

ANCHOR — agent-first engineering knowledge canvas

ANCHOR turns a folder of engineering documents into a structured, source-grounded knowledge base you can drive over MCP. Every value the agent quotes points back to a specific page region.

Use the pipeline — do not reinvent it

ANCHOR already does document extraction end to end. When the user wants a PDF read, OCR'd, ingested, or its contents extracted, run one command — anchor ingest <pdf> (CLI) or the ingest_pdf MCP tool — and read the result with the query ops below. The pipeline is:

bronze — docling layout analysis + OCR on the raw PDF
silver — per-page markdown + page PNGs
gold — structured regions (tables, specs, figures) each carrying a source bbox, the provenance every quoted value depends on

Do NOT pip install an OCR/PDF library (pytesseract, pdfplumber, pdf2image, unstructured, …) or write your own parsing. That bypasses the gold provenance ANCHOR exists to provide and duplicates work the tool already does. If anchor ingest errors, fix the cause (run anchor check — endpoint, key, model) rather than falling back to hand-rolled extraction.

Install ANCHOR

Install the tool before using this skill:

uv tool install anchor-kb
# Fallback:
pipx install anchor-kb

anchor install <harness> registers an installed ANCHOR tool with an AI harness. It does not install the tool itself.

Bronze and silver extraction run locally. Gold extraction requires ANCHOR_OPENAI_API_KEY; set the other ANCHOR_OPENAI_* variables for your provider as needed.

When to use

The user drops a PDF datasheet, leaflet, or manual and wants it readable.
The user asks "what does this document say about X" or looks up specs.
The user wants a spec table, document card, or region crop on a canvas.
The user wires a datasheet value into a simulation (FMU).
The user mentions a workspace folder or canvas.
The user asks "where does this number come from?" — provenance is the whole point.

Conventions

Always pass a workspace_slug. ANCHOR is multi-canvas; create one per question or project (canvas_create_workspace) and reuse it.
Provenance is the contract. When you place a spec value or quote a number, anchor it to its source via an edge carrying data.kind = "evidence" and data.source_ref = {page, bbox}. The system enforces this on anchored evidence edges.
Slug naming. Document slugs are filename-derived (lowercase, hyphenated). Canvas slugs are user-chosen, e.g. pump-analysis.
Don't re-ingest. list_documents() first; if the slug exists with has_gold: true, skip ingest unless the user asks for a fresh pass.
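The provenance convention can be sketched as two small helpers. This is a minimal illustration, not ANCHOR's own validation code: `evidence_edge_data` and `is_anchored_evidence` are hypothetical names, and the `[x0, y0, x1, y1]` bbox order is an assumption, since the skill only specifies `source_ref = {page, bbox}`.

```python
from __future__ import annotations

from typing import Any


def evidence_edge_data(page: int, bbox: list[float]) -> dict[str, Any]:
    """Build the `data` payload for an anchored evidence edge.

    Hypothetical helper; assumes bbox is ordered [x0, y0, x1, y1].
    """
    if isinstance(page, bool) or not isinstance(page, int):
        raise ValueError("page must be an integer")
    if len(bbox) != 4:
        raise ValueError("bbox must have exactly four coordinates")
    x0, y0, x1, y1 = (float(v) for v in bbox)
    if x0 > x1 or y0 > y1:
        raise ValueError("bbox must be ordered as [x0, y0, x1, y1]")
    return {"kind": "evidence", "source_ref": {"page": page, "bbox": [x0, y0, x1, y1]}}


def is_anchored_evidence(data: dict[str, Any]) -> bool:
    """True when edge data carries kind == "evidence" plus a page and a bbox."""
    ref = data.get("source_ref")
    return (
        data.get("kind") == "evidence"
        and isinstance(ref, dict)
        and "page" in ref
        and "bbox" in ref
    )
```

The returned dict is what you would pass as `data` to `canvas_add_edge`; the server stays the authority on whether the edge is accepted.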

Live state

The canvas streams its changes over SSE (server-sent events). If a browser tab is open at the same time, the user sees your changes appear live. The server is authoritative and serialises commands per workspace, so you don't need to coordinate with the browser.

Projects: a folder is the unit

A folder containing an anchor.toml (created by anchor init) is an ANCHOR project. It declares the data dir, the AI provider/data-zone, and the models. Run ANCHOR from inside that folder and every adapter resolves the project automatically. The CLI and anchor serve walk up from the working directory to find anchor.toml; anchor-mcp does the same, or name it explicitly with anchor-mcp --project <folder>. So a single MCP registration (anchor install claude-code, no --data-dir) works for every project: open the agent in the project folder and it targets that project, with no reinstall.
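The walk-up lookup is easy to reproduce when you want to confirm which project a given folder resolves to. A sketch that assumes only that the marker file is named anchor.toml; `find_project` is a hypothetical helper, not part of ANCHOR, and it does not read the file's contents.

```python
from __future__ import annotations

from pathlib import Path


def find_project(start: Path | None = None) -> Path | None:
    """Return the nearest folder, from start upwards, that contains anchor.toml."""
    current = (start or Path.cwd()).resolve()
    # Check the starting folder first, then each ancestor up to the filesystem root.
    for folder in (current, *current.parents):
        if (folder / "anchor.toml").is_file():
            return folder
    return None  # no project found: ANCHOR would fall back to the global default
```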

If you are unsure which project is active, run anchor from the folder you mean (or pass --project/ANCHOR_CONFIG). Don't pass --data-dir ~/anchor-data unless you specifically want the global default rather than the current project.

Set up a project (agent-drivable, like `npm init` / `uv init`)

You can scaffold ANCHOR in any folder non-interactively. anchor init accepts every choice as a flag, so no prompt blocks you:

# local-only (no document egress): no key, no endpoint
anchor init . --yes --provider local

# a named endpoint (Azure shown): the deployment name is the model
anchor init . --yes --provider azure \
  --base-url https://<resource>.openai.azure.com/openai/v1/ \
  --vision-model <deployment> --embed-model text-embedding-3-small

init self-corrects an Azure URL that is missing /openai/v1/. The API key is never written to anchor.toml. Set ANCHOR_OPENAI_API_KEY in the environment or a gitignored .env in the folder. Then verify before ingesting:

anchor check            # offline: prints the data zone, repairs a bad endpoint
anchor check --probe    # also makes one tiny call to confirm deployment + key

anchor check exits non-zero when something would break a real ingest, so you can gate on it. Register the MCP once with anchor install claude-code.
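Because `anchor check` signals problems through its exit status, a script can refuse to ingest until it passes. A minimal sketch: `anchor_check_cmd` and `gate` are hypothetical helpers, and only the `anchor check` / `--probe` command line comes from this skill.

```python
from __future__ import annotations

import subprocess
import sys


def anchor_check_cmd(probe: bool = False) -> list[str]:
    """argv for the documented health check; --probe adds one tiny live call."""
    return ["anchor", "check", "--probe"] if probe else ["anchor", "check"]


def gate(cmd: list[str], cwd: str | None = None) -> bool:
    """Run cmd and return True only if it exits with status 0."""
    try:
        result = subprocess.run(cmd, cwd=cwd, capture_output=True, text=True)
    except FileNotFoundError:
        print(f"{cmd[0]} not found on PATH", file=sys.stderr)
        return False
    if result.returncode != 0:
        # Surface the command's own output so the cause of the failure is visible.
        print(result.stdout + result.stderr, file=sys.stderr)
    return result.returncode == 0
```

In a script, `if not gate(anchor_check_cmd(probe=True)): raise SystemExit(1)` stops before any ingest starts.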

Where things live

Each project's data lives in its own data_dir (default <project>/anchor-data/ from anchor init, or the global ~/anchor-data/ when no project is found). ANCHOR_DATA_DIR or an explicit --data-dir <path> override it; the HTTP adapter uses the path passed to anchor serve.

bronze/ — raw PDFs
silver/<slug>/ — per-page markdown + page PNGs
gold/<slug>/ — structured regions with crops
canvases/<slug>/ — per-canvas durable state + events log
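The resolution rules and folder layout above can be expressed as a small path helper, useful when you want to inspect files on disk. This is a sketch under stated assumptions: the helper names are hypothetical, the precedence of `--data-dir` over ANCHOR_DATA_DIR is assumed (the text only says both override the default), and it uses the `<project>/anchor-data/` default rather than parsing a custom data dir from anchor.toml.

```python
from __future__ import annotations

import os
from collections.abc import Mapping
from pathlib import Path


def resolve_data_dir(
    cli_data_dir: str | None,
    project: Path | None,
    env: Mapping[str, str] | None = None,
) -> Path:
    """Pick the data dir; --data-dir winning over ANCHOR_DATA_DIR is an assumption."""
    env = os.environ if env is None else env
    if cli_data_dir:
        return Path(cli_data_dir).expanduser()
    if env.get("ANCHOR_DATA_DIR"):
        return Path(env["ANCHOR_DATA_DIR"]).expanduser()
    if project is not None:
        return project / "anchor-data"  # default created by anchor init
    return Path.home() / "anchor-data"  # global fallback when no project is found


def document_paths(data_dir: Path, doc_slug: str) -> dict[str, Path]:
    """Stage folders for one document, following the layout listed above."""
    return {
        "bronze": data_dir / "bronze",  # raw PDFs for every document
        "silver": data_dir / "silver" / doc_slug,
        "gold": data_dir / "gold" / doc_slug,
    }


def canvas_path(data_dir: Path, canvas_slug: str) -> Path:
    """Folder holding one canvas's durable state and events log."""
    return data_dir / "canvases" / canvas_slug
```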

Extensions

Each ingestion or simulation domain ships its own skill section explaining its tools and a typical flow. The composer concatenates the enabled extensions below this section so you see only what's available in the current install.

Canvas tools — workspaces, nodes, edges

The canvas is the visible substrate humans and agents share. Each workspace is an isolated folder under canvases/. Edits land in real time on every connected client via SSE.

Tools

canvas_create_workspace(slug, title?) and canvas_list_workspaces().
canvas_get_state(workspace_slug) — full state for the workspace.
canvas_list_placeholders(workspace_slug) — every node flagged data.placeholder == true with its placeholder_hint. The entry point when the user says "fill in the specs I marked".
canvas_add_node(workspace_slug, node_type, label, x, y, data?).
canvas_update_node(workspace_slug, id, ...) and canvas_remove_node(workspace_slug, id).
canvas_add_edge(workspace_slug, source, target, edge_type?, data?) and canvas_remove_edge(workspace_slug, id).
canvas_clear(workspace_slug) — destructive; ask first.
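Since `canvas_add_node` takes explicit `x` and `y`, placing several nodes at once benefits from a layout helper. A sketch: `grid_node_args` is hypothetical, the spacing values are arbitrary, and it assumes canvas coordinates grow rightwards and downwards.

```python
from __future__ import annotations

from typing import Any


def grid_node_args(
    workspace_slug: str,
    node_type: str,
    labels: list[str],
    columns: int = 3,
    dx: float = 320.0,
    dy: float = 200.0,
    origin: tuple[float, float] = (0.0, 0.0),
) -> list[dict[str, Any]]:
    """Keyword arguments for one canvas_add_node call per label, filled row by row."""
    if columns < 1:
        raise ValueError("columns must be at least 1")
    ox, oy = origin
    calls = []
    for i, label in enumerate(labels):
        row, col = divmod(i, columns)
        calls.append(
            {
                "workspace_slug": workspace_slug,
                "node_type": node_type,
                "label": label,
                "x": ox + col * dx,
                "y": oy + row * dy,
            }
        )
    return calls
```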

Picking a node type

Node type	When to use
`document`	A whole PDF as a card on the canvas.
`spec`	A table of rows with values. Each row carries a `source_ref`.
`fact`	A free-form note tied to a source.
`image`	A region crop or screenshot.
`concept` / `entity`	Generic shapes for grouping or schematics.
`canvas`	A tile that links to a child canvas.

The full list and the data shapes live in the on-disk substrate docs; this is the shortlist of the ones agents touch most.

`anchor_pdfs` — ingest engineering PDFs

Bronze → silver → gold pipeline that turns each page into structured regions tagged with the page number and bounding box they came from.

Tools

ingest_pdf(pdf_path, slug?, skip_polish?, skip_regions?, force?) — runs the full pipeline. Idempotent: if the slug already has gold it returns {skipped: true} without recomputing (the gold stage is billed). Pass force=true (CLI --force) to re-ingest and overwrite.

There are TWO ingestion pathways; do not conflate them. ingest_pdf (CLI anchor ingest) is the BUILT-IN pipeline: Anchor's own configured LLM does the polish and region extraction, which needs an API key. The HARNESS-DRIVEN session protocol (ingest_begin through ingest_finalize, described below) is the no-key pathway where you, the agent, do that extraction work yourself. If the user asks for "harness ingestion", "agent-driven ingestion", or "no-key ingestion" by name, use the session protocol; calling anchor ingest is not the harness pathway.
search_documents(query, k?) — semantic search across every embedded gold region. Returns ranked hits with slug, page, region_id, text, score. This is how you "find stuff" in the documents by meaning, not by guessing a page. CLI anchor search "<query>", HTTP GET /api/search?q=…. Embeddings are created during ingest_pdf; if a doc was ingested without them, run embed first (anchor embed).
list_documents() — every document and its current status.
get_document_index(slug) — silver outline (sections, tables, figures).
get_gold_regions(slug, page?) — structured regions with page + bbox.
get_page_text(slug, page) — polished or raw page markdown.
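The "don't re-ingest" rule combines `list_documents()` with `ingest_pdf`. A sketch of that decision, assuming `list_documents()` yields dicts with `slug` and `has_gold` keys; `guess_slug` only approximates the filename-derived slug (lowercase, hyphenated) and ANCHOR's real rule may differ, so prefer the slug that `list_documents()` reports.

```python
from __future__ import annotations

import re
from pathlib import PurePath
from typing import Any


def guess_slug(pdf_path: str) -> str:
    """Approximate a filename-derived slug: lowercase, runs of non-alphanumerics -> '-'."""
    stem = PurePath(pdf_path).stem.lower()
    return re.sub(r"[^a-z0-9]+", "-", stem).strip("-")


def plan_ingest(
    documents: list[dict[str, Any]], slug: str, pdf_path: str, force: bool = False
) -> dict[str, Any] | None:
    """Return ingest_pdf kwargs, or None when the document already has gold."""
    existing = next((d for d in documents if d.get("slug") == slug), None)
    if existing is not None and existing.get("has_gold") and not force:
        return None  # skip: gold exists and the gold stage is billed
    kwargs: dict[str, Any] = {"pdf_path": pdf_path, "slug": slug}
    if force:
        kwargs["force"] = True  # explicit fresh pass, overwrites existing gold
    return kwargs
```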

Finding content — search first, then retrieve

To answer "what does this document say about X" or "find the pricing / the flow rate / the warranty", start with search_documents(query) — it ranks gold regions across all documents by meaning. Each hit already carries its slug, page, region_id, so follow up with get_gold_regions(slug, page=…) or get_page_text(slug, page) to read the full context and cite the page + bbox. Do not page through every region by hand or re-read whole documents when search points you at the right region directly.
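The search-then-retrieve step can be planned by grouping hits per page, so each page is read once. A sketch that relies only on the documented hit fields (slug, page, region_id, score) and assumes a higher score means a better match; `follow_up_pages` is a hypothetical helper.

```python
from __future__ import annotations

from typing import Any


def follow_up_pages(hits: list[dict[str, Any]], limit: int = 3) -> list[dict[str, Any]]:
    """Group search hits by (slug, page), best-scoring pages first."""
    best: dict[tuple[str, int], float] = {}
    regions: dict[tuple[str, int], list[str]] = {}
    for hit in hits:
        key = (hit["slug"], hit["page"])
        best[key] = max(best.get(key, float("-inf")), hit["score"])
        regions.setdefault(key, []).append(hit["region_id"])
    ranked = sorted(best, key=best.__getitem__, reverse=True)[:limit]
    # Each entry maps onto one get_gold_regions(slug, page=...) call.
    return [
        {"slug": slug, "page": page, "region_ids": regions[(slug, page)]}
        for slug, page in ranked
    ]
```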

Typical flow

When the user drops a PDF and asks for specs on the canvas:

list_documents() first; skip ingest if the slug already has gold.
ingest_pdf(pdf_path="/abs/path/to/datasheet.pdf") only if needed.
search_documents("flow rate") to locate the right region(s), or get_gold_regions(slug=..., page=2) when you already know the page.
Place a document node on the canvas via canvas_add_node.
Place a spec node whose data.rows reference the regions, and an anchored evidence edge from each row to the document node.
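The last two steps of the flow can be sketched as payload builders. Assumptions, all hypothetical: the row field names `label` and `value`, node ids being strings returned by `canvas_add_node`, and one evidence edge per row running from the spec node to the document node; only `rows`, `source_ref`, `kind = "evidence"` and the `canvas_add_edge` parameters come from this skill.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Reading:
    """One value read from a gold region (field names are illustrative)."""

    label: str
    value: str
    page: int
    bbox: tuple[float, float, float, float]

    def source_ref(self) -> dict[str, Any]:
        return {"page": self.page, "bbox": list(self.bbox)}


def spec_node_data(readings: list[Reading]) -> dict[str, Any]:
    """`data` for a spec node: every row carries its own source_ref."""
    return {
        "rows": [
            {"label": r.label, "value": r.value, "source_ref": r.source_ref()}
            for r in readings
        ]
    }


def evidence_edge_args(
    workspace_slug: str, spec_node_id: str, document_node_id: str, readings: list[Reading]
) -> list[dict[str, Any]]:
    """canvas_add_edge kwargs: one evidence edge per row, spec node -> document node."""
    return [
        {
            "workspace_slug": workspace_slug,
            "source": spec_node_id,
            "target": document_node_id,
            "data": {"kind": "evidence", "source_ref": r.source_ref()},
        }
        for r in readings
    ]
```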

Common errors

404 / unknown slug → run list_documents() to see what's available.
400 / file is not a PDF → ANCHOR only ingests PDFs in this extension.
gold extraction skipped in the status → no ANCHOR_OPENAI_API_KEY set; silver is still queryable but regions aren't structured. With provider harness this is expected for ingest_pdf; use the harness-driven session protocol below instead.

Harness-driven ingestion (you are the extraction model, no API key)

Use this protocol when EITHER applies: the project's provider is harness (check anchor_status or anchor check), OR the user explicitly asks for harness/agent-driven/no-key ingestion, regardless of the configured provider. YOU are the extraction model. Under provider harness, ingest_pdf will not produce gold; drive the session protocol:

ingest_begin(pdf_path): Anchor runs docling, renders the page images, and returns {session_id, page_count, pages[]}. If it returns resumed: true, some pages are already done; check pages[].status.
Per page: ingest_get_page(session_id, page) gives the page image (read the returned path with your file tools), the raw markdown, and candidates (docling boxes with stable ids). Follow the returned instructions. Then ingest_submit_page(session_id, page, polished_md, regions).
- Name region geometry with member_item_ids: ["p3-i0", "p3-i1"]; the server computes the bbox. Use approx_bbox only when no candidate covers a visual.
- A rejection returns errors naming the bad fields; fix and resubmit (resubmitting a page replaces it).
For documents over ~4 pages, fan out: spawn subagents, each given the session_id and a contiguous batch of 3-5 pages, returning only the submit verdicts. Pages are independent; submits are idempotent.
When ingest_status(session_id) shows nothing remaining, call ingest_finalize(session_id, declared_model="<your model id>"); Anchor then embeds locally and publishes gold atomically.
Interrupted or fresh context? ingest_status(slug="<doc>") shows pages done/remaining; continue from there. ingest_abort discards staging.
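The fan-out step needs contiguous batches of 3 to 5 pages. One way to split them, as a sketch: it takes the page numbers straight from `pages[]` or from the remaining pages reported by `ingest_status`, so it makes no assumption about page indexing; the balancing strategy (fewest batches, near-equal sizes) is a choice made here, not something ANCHOR prescribes, and `page_batches` is hypothetical.

```python
from __future__ import annotations

import math


def page_batches(pages: list[int], single_limit: int = 4, max_batch: int = 5) -> list[list[int]]:
    """Split pages into contiguous, near-equal batches for subagents.

    Up to `single_limit` pages stay in one batch (no fan-out). Above that,
    the fewest batches of at most `max_batch` pages are used; with the
    defaults every batch then holds 3 to 5 pages.
    """
    ordered = sorted(pages)
    n = len(ordered)
    if n == 0:
        return []
    if n <= single_limit:
        return [ordered]
    k = math.ceil(n / max_batch)  # fewest batches that respect max_batch
    base, extra = divmod(n, k)  # the first `extra` batches get one more page
    batches: list[list[int]] = []
    start = 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        batches.append(ordered[start:start + size])
        start += size
    return batches
```

Each batch goes to one subagent together with the session_id; because submits are idempotent, a batch that fails can simply be handed out again.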

Write descriptions that would rank well in semantic search: name the quantities and entities ("Max flow, head and motor sizes for LKH-5 to LKH-90"), not vague labels ("a table with numbers").
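Before calling `ingest_submit_page`, it can help to check member ids locally and preview the area a region will cover. A sketch under assumptions: candidates are dicts with `id` and an `[x0, y0, x1, y1]` `bbox`, the region's text field is named `description`, and the server's computed bbox is presumed to be the union of the member boxes. The server computes the real bbox; this only previews it.

```python
from __future__ import annotations

from typing import Any


def union_bbox(candidates: list[dict[str, Any]], member_ids: list[str]) -> list[float]:
    """Smallest [x0, y0, x1, y1] box covering the named candidate boxes."""
    if not member_ids:
        raise ValueError("a region needs at least one member id")
    by_id = {c["id"]: c["bbox"] for c in candidates}
    missing = [m for m in member_ids if m not in by_id]
    if missing:
        raise KeyError(f"unknown candidate ids: {missing}")
    boxes = [by_id[m] for m in member_ids]
    return [
        min(b[0] for b in boxes),
        min(b[1] for b in boxes),
        max(b[2] for b in boxes),
        max(b[3] for b in boxes),
    ]


def region_payload(
    description: str, member_ids: list[str], candidates: list[dict[str, Any]]
) -> dict[str, Any]:
    """Region entry for ingest_submit_page; ids are checked locally first."""
    union_bbox(candidates, member_ids)  # raises early on a mistyped id
    return {"description": description, "member_item_ids": list(member_ids)}
```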

CLI parity for shell-only harnesses: anchor ingest-session begin|get-page|submit-page|status|finalize|abort (JSON in/out). Always pass --data-dir <data_dir from the begin work order> on every command: the default resolves from the directory you invoke from, so a cd between sibling commands silently switches projects and the session appears unknown.
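A wrapper that pins the data dir once removes the risk of a stray `cd` switching projects between commands. A sketch: `session_cli` is hypothetical, the subcommand names are taken from this skill, any other arguments (session id, page, JSON input) are passed through untouched because their exact form is not documented here, and placing `--data-dir` right after the subcommand is an assumption; check `anchor ingest-session <subcommand> --help`.

```python
from __future__ import annotations

import os
from collections.abc import Callable

SUBCOMMANDS = frozenset({"begin", "get-page", "submit-page", "status", "finalize", "abort"})


def session_cli(data_dir: str) -> Callable[..., list[str]]:
    """Return an argv builder with --data-dir fixed to one absolute path."""
    pinned = os.path.abspath(data_dir)  # resolved once, at creation time

    def argv(sub: str, *extra: str) -> list[str]:
        if sub not in SUBCOMMANDS:
            raise ValueError(f"unknown ingest-session subcommand: {sub}")
        return ["anchor", "ingest-session", sub, "--data-dir", pinned, *extra]

    return argv
```

Create the builder from the data_dir in the begin work order and run every later command through it, for example with `subprocess.run(argv("status", ...))`.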
