From karpathy-wiki
Processes claimed captures into karpathy-wiki pages via a spawned Claude ingester. Runs a 9-step orientation: reads schema/index/log, extracts signals from the capture, scores/ranks/reads candidates, and decides create/augment/no-op.
Slash command: /karpathy-wiki:karpathy-wiki-ingest
You are a detached claude -p ingester invoked by
wiki-spawn-ingester.sh. Your job: process one already-claimed
capture into wiki pages and commit. The main agent does NOT read this
skill — it's loaded by your spawn prompt.
Before any wiki write, run this 9-step orientation protocol. The goal: your view of the wiki is shaped by the actual page content, not by titles alone.
1. Read <wiki>/schema.md: current categories, taxonomy, thresholds.
2. Read <wiki>/index.md (or <wiki>/<category>/_index.md per category): what pages exist, with one-line summaries.
3. Read <wiki>/log.md: recent activity.
4. Extract candidate signals from the capture. From the body and frontmatter, gather:
   - The tags: list (if any).
   - For chat-only and chat-attached captures: meaningful nouns and proper-noun phrases from the body. Use judgment (you're an LLM, not a regex); pick names, identifiers, technical terms, version numbers.
   - For raw-direct captures: filename basename + the file's first 200 lines.
5. Score candidates against the index. A page is a candidate if ANY extracted signal:
   - matches the page's _index.md one-line summary.
   This is a deterministic substring + tag match: no embeddings, no semantic similarity. See "Why no embeddings / vector search" below for the rationale.
6. Pick up to 7 candidates ordered by:
7. Read all picked candidates in full. Zero is a valid count for a small or fresh wiki (see "Cold start" below).
8. Decide: create new page, augment an existing page, or no-op (the capture's content is already covered by an existing page). Write your decision rationale into the commit message later.
9. Issue reporting (during steps 5-7). While reading the index and the candidate pages, observe issues. Append each as one JSONL line to <wiki>/.ingest-issues.jsonl via bash scripts/wiki-issue-log.sh. Do NOT fix issues inline; report only. wiki doctor consumes the log later.
Example invocation:
bash "${CLAUDE_PLUGIN_ROOT}/scripts/wiki-issue-log.sh" \
--wiki "${WIKI_ROOT}" \
--ingester-run "${RUN_ID}" \
--capture "${WIKI_CAPTURE#${WIKI_ROOT}/}" \
--page "concepts/auth.md" \
--type broken-cross-link \
--severity warn \
--detail "Links to /concepts/legacy.md which does not exist." \
--suggested-action "Remove link or create stub"
Issue types to watch for (the --type enum in wiki-issue-log.sh):
- broken-cross-link: page links to /path/foo.md but that file does not exist.
- type: concept (singular) but directory is concepts/; page lacks required frontmatter; page uses retired categories.
- quality.overall < 3.0 was rated by ingester (not human).
- _index.md or other page.

Severity:
Each issue gets one JSONL line, ≤ 4096 bytes total. Over-length
--detail is auto-truncated by the helper.
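The one-line, size-capped record described above can be sketched in Python. This is only an illustration: the real record layout and truncation logic live in wiki-issue-log.sh and are not shown in this document, so the function name `build_issue_line` and the field names (borrowed from the flags in the example invocation) are assumptions.

```python
import json

MAX_LINE_BYTES = 4096  # per-line cap stated above


def build_issue_line(page: str, issue_type: str, severity: str, detail: str) -> str:
    """Return one JSONL line, shrinking `detail` until the line fits the cap."""
    record = {"page": page, "type": issue_type, "severity": severity, "detail": detail}
    line = json.dumps(record, ensure_ascii=False)
    # Trim characters from the end of detail until the encoded line
    # (plus its trailing newline) fits within the cap.
    while len(line.encode("utf-8")) + 1 > MAX_LINE_BYTES and record["detail"]:
        overflow = len(line.encode("utf-8")) + 1 - MAX_LINE_BYTES
        keep = max(0, len(record["detail"]) - overflow)
        record["detail"] = record["detail"][:keep]
        line = json.dumps(record, ensure_ascii=False)
    return line
```

Measuring the encoded bytes rather than the character count matters here, because a detail string with non-ASCII text can be longer in bytes than in characters.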
When the candidate list from step 6 is empty or near-empty (the index
has fewer than 8 pages total), step 7 reads few or no pages. The role
guardrail in <wiki>/.wiki-config becomes the PRIMARY lens, not a
backup tripwire:
- role: project or role: project-pointer: write specifics about THIS codebase / instance / situation. Document the symptom, the pinpoint, what code path triggered it. Do NOT generalize.
- role: main: write general patterns reusable across projects. Do NOT name specific apps or instances; abstract them.

In the cold-start state the index has insufficient gravity to shape the page; the role hint carries the lens. Defer cross-linking to a future ingester run when more pages exist.
The substring + tag match in step 5 is deterministic, scriptable, testable, and cheap. Embedding-based candidate selection is the "smart" upgrade but explicitly out of scope per CLAUDE.md ("do not add: vector search — defer until genuine scaling pain"). The substring/tag approach degrades gracefully — at large scale it hits more candidates than 7, but the ranking still produces a usable top-7. When that breaks, vector search becomes worth its complexity; not before.
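A rough Python sketch of what a deterministic substring + tag match with a top-7 cut might look like. The `IndexEntry` type, the `score` function, and the sort key are all assumptions made for illustration; the ordering rule the ingester actually applies is not reproduced in this document.

```python
from dataclasses import dataclass, field

MAX_CANDIDATES = 7


@dataclass
class IndexEntry:
    path: str
    summary: str  # one-line summary from _index.md
    tags: set[str] = field(default_factory=set)


def score(entry: IndexEntry, signals: list[str]) -> int:
    """Count signals that equal a tag or occur as a substring of the summary."""
    summary = entry.summary.lower()
    tags = {t.lower() for t in entry.tags}
    return sum(1 for s in signals if s.lower() in tags or s.lower() in summary)


def pick_candidates(index: list[IndexEntry], signals: list[str]) -> list[str]:
    """Return up to 7 page paths that match at least one signal."""
    hits = [(score(e, signals), e.path) for e in index]
    hits = [(n, p) for n, p in hits if n > 0]
    # Assumed ordering: most matching signals first, path as a stable tie-break.
    hits.sort(key=lambda item: (-item[0], item[1]))
    return [p for _, p in hits[:MAX_CANDIDATES]]
```

Because the match is plain string comparison, the same capture and index always yield the same candidate list, which is what makes the step scriptable and testable.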
3-7 extra file reads per ingestion (zero on cold-start wikis). Each page is typically 5-20 KB. The ingester is already reading the capture and the schema; this is the same order of magnitude. No measurable spawn-time impact.
At the very start of your work, generate a unique run_id and append
a "spawned" record to <wiki>/.ingest-runs.jsonl:
RUN_ID="in-$(date +%s)-$(openssl rand -hex 4 2>/dev/null || echo $$)"
ts="$(date -u +%Y-%m-%dT%H:%M:%SZ)"
mkdir -p "${WIKI_ROOT}/.locks"
{
flock 7
printf '{"run_id":"%s","capture":"%s","started_at":"%s","status":"spawned"}\n' \
"${RUN_ID}" "${WIKI_CAPTURE}" "${ts}" >> "${WIKI_ROOT}/.ingest-runs.jsonl"
} 7>"${WIKI_ROOT}/.locks/ingest-runs.lock"
At the END of your work (success or failure), append a closing record:
ts="$(date -u +%Y-%m-%dT%H:%M:%SZ)"
status="completed" # or "failed" if you exit non-zero
exit_code=0 # or your real exit code
{
flock 7
printf '{"run_id":"%s","ended_at":"%s","status":"%s","exit_code":%d}\n' \
"${RUN_ID}" "${ts}" "${status}" "${exit_code}" >> "${WIKI_ROOT}/.ingest-runs.jsonl"
} 7>"${WIKI_ROOT}/.locks/ingest-runs.lock"
The two records (spawned + closing) tie back via run_id. wiki status reads both files and surfaces asymmetric outcomes in "both" mode (a fork that has a project record but no main record is flagged).
If the ingester crashes between the two records, the spawned record
remains without a closing record. wiki status flags such records as
"in-flight or stalled" if they're > 30 minutes old.
On platforms without flock(1) (macOS), use the same noclobber-spin
fallback used by scripts/wiki-manifest-lock.sh. The 4 KB-per-line
JSONL discipline keeps the append atomic on POSIX even without a lock,
but the lock prevents any partial-write contention under high
concurrency.
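The "in-flight or stalled" check on spawned records without a closing record can be sketched as follows. This is not the wiki status implementation, which is not shown here; `find_stalled` is a hypothetical helper that only assumes the two record shapes printed by the shell snippets above.

```python
import json
from datetime import datetime, timedelta, timezone

STALL_AFTER = timedelta(minutes=30)


def find_stalled(jsonl_text: str, now: datetime) -> list[str]:
    """Return run_ids with a spawned record, no closing record, and age > 30 min."""
    started: dict[str, datetime] = {}
    closed: set[str] = set()
    for raw in jsonl_text.splitlines():
        if not raw.strip():
            continue
        rec = json.loads(raw)
        if rec.get("status") == "spawned":
            started[rec["run_id"]] = datetime.strptime(
                rec["started_at"], "%Y-%m-%dT%H:%M:%SZ"
            ).replace(tzinfo=timezone.utc)
        elif "ended_at" in rec:
            closed.add(rec["run_id"])
    return sorted(
        run_id
        for run_id, ts in started.items()
        if run_id not in closed and now - ts > STALL_AFTER
    )
```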
Read <wiki>/.wiki-config to determine the wiki's role:
- role: project or role: project-pointer → you are writing for a PROJECT wiki. Document specifics: how this app handles X, where bug Y lived, what we decided for this codebase. Do NOT generalize.
- role: main → you are writing for the MAIN wiki. Extract general patterns reusable across projects. Do NOT name specific apps or instances.

The role field is the primary lens during cold start (≤ 7 pages in the wiki) and a sanity tripwire in mature wikis (the index pull is the primary lens once enough pages exist). See "Deep orientation — Cold start" above.
If the index pulls strongly toward instance-style or pattern-style writing (you read 5+ existing pages of one shape), trust the pull. The role hint then becomes a tripwire — "if you find yourself generalizing in a project wiki / specifying in a main wiki, stop."
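As a compact restatement of the role guardrail, the sketch below maps a role and a page count to a writing style and a primary lens. The function name `writing_lens` and the returned labels are invented for illustration; only the role values and the 7-page cold-start boundary come from the text above.

```python
COLD_START_MAX_PAGES = 7  # "≤ 7 pages in the wiki" per the text above


def writing_lens(role: str, page_count: int) -> dict[str, str]:
    """Pick the writing style from the role and the primary lens from wiki size."""
    if role in ("project", "project-pointer"):
        style = "instance"  # specifics about this codebase, no generalizing
    elif role == "main":
        style = "pattern"   # general patterns, no named apps or instances
    else:
        raise ValueError(f"unknown role: {role!r}")
    primary = "role" if page_count <= COLD_START_MAX_PAGES else "index"
    return {"style": style, "primary_lens": primary}
```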
See <plugin>/skills/karpathy-wiki-capture/references/capture-schema.md
for the canonical capture frontmatter contract — capture_kind enum,
body floors, evidence rules, legacy backward-compat. Do not duplicate
that schema here.
See references/page-conventions.md for the canonical page
frontmatter and cross-link conventions.
Before any other work, measure the capture body size in bytes (content
AFTER the closing --- of frontmatter). Apply the per-capture_kind
floor:
- raw-direct → no floor (body is auto-generated boilerplate).
- chat-attached → 1000 bytes.
- chat-only → 1500 bytes.
- Legacy captures (no capture_kind): apply backward-compat per <plugin>/skills/karpathy-wiki-capture/references/capture-schema.md.

If body is BELOW its floor:
- Add needs_more_detail: true to the capture's frontmatter.
- Add needs_more_detail_reason: "body is <N> bytes; floor for capture_kind=<K> is <F> bytes".
- Rename .md.processing → .md so the next session-start drain re-presents it after the main agent expands it.
- Append to log.md: ## [<timestamp>] reject | <capture-basename> — body <N>b below <F>b floor.

A thin-capture rejection is a feature, not a failure.
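The floor check above can be sketched in Python as follows. The helper names `body_bytes` and `check_floor` are hypothetical, the frontmatter parsing is deliberately simplistic, and legacy captures without a capture_kind are left out because their rules live in capture-schema.md.

```python
from typing import Optional

# Floors in bytes per capture_kind, as listed above.
FLOORS = {"raw-direct": 0, "chat-attached": 1000, "chat-only": 1500}


def body_bytes(capture_text: str) -> int:
    """Size in bytes of the content after the closing --- of the frontmatter."""
    lines = capture_text.split("\n")
    if lines and lines[0].strip() == "---":
        for i in range(1, len(lines)):
            if lines[i].strip() == "---":
                return len("\n".join(lines[i + 1:]).encode("utf-8"))
    return len(capture_text.encode("utf-8"))  # no frontmatter found


def check_floor(capture_text: str, capture_kind: str) -> Optional[str]:
    """Return a needs_more_detail_reason if the body is below its floor, else None."""
    floor = FLOORS[capture_kind]
    size = body_bytes(capture_text)
    if size < floor:
        return f"body is {size} bytes; floor for capture_kind={capture_kind} is {floor} bytes"
    return None
```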
1. The capture is already claimed for you. Read ${WIKI_CAPTURE} (it's a .md.processing file). If for some reason ${WIKI_CAPTURE} is unset or missing, call wiki_capture_claim "${WIKI_ROOT}" to grab any pending capture as fallback.
2. Read the orientation files: schema.md, index.md, last 10 entries of log.md (per the Orientation section above).
3. Read the capture body.
4. Copy evidence with write-staging discipline (v2.4):
Atomic write to raw/ requires the staging dance (skip steps 1–2 and 7 if evidence_type is conversation AND the capture has no real path):
   1. Copy the evidence to <wiki>/.raw-staging/<basename> (NOT directly to raw/).
   2. Acquire <wiki>/.locks/manifest.lock via bash scripts/wiki-manifest-lock.sh ... for the manifest write + rename block.
   3. If raw/<basename> already exists with the same sha256, this is a duplicate: skip (apply the sha256 short-circuit below).
   4. Write the manifest entry for raw/<basename> (origin, sha, copied_at, last_ingested, referenced_by).
   5. Write the manifest atomically (.manifest.json.tmp + os.rename); wiki-manifest.py build already does this.
   6. Rename <wiki>/.raw-staging/<basename> → <wiki>/raw/<basename> (POSIX rename is atomic on the same filesystem).
   7. If the source was <wiki>/inbox/<basename> (raw-direct via inbox queue), rm it.

Crash recovery: if the ingester crashes between steps 1 and 7, a file lingers in .raw-staging/. The SessionStart recovery scan SKIPS .raw-staging/ (it's a reserved dot-prefixed directory). Future cleanup is wiki doctor's responsibility.
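The core of the staging dance (stage, compare sha256, swap the manifest, rename into raw/) can be sketched in Python. This is an illustration under simplifying assumptions: `stage_and_publish` is a hypothetical helper, it takes no manifest lock (the real flow holds one via wiki-manifest-lock.sh), and it records only the sha256 field of the manifest entry.

```python
import hashlib
import json
import os
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    return hashlib.sha256(path.read_bytes()).hexdigest()


def stage_and_publish(wiki: Path, evidence: Path) -> str:
    """Copy evidence into raw/ via .raw-staging/; return "duplicate" or "copied"."""
    staging = wiki / ".raw-staging" / evidence.name
    final = wiki / "raw" / evidence.name
    manifest_path = wiki / ".manifest.json"
    staging.parent.mkdir(parents=True, exist_ok=True)
    final.parent.mkdir(parents=True, exist_ok=True)

    shutil.copy2(evidence, staging)  # never write into raw/ directly
    new_sha = sha256_of(staging)
    manifest = json.loads(manifest_path.read_text()) if manifest_path.exists() else {}
    key = f"raw/{evidence.name}"
    if final.exists() and manifest.get(key, {}).get("sha256") == new_sha:
        staging.unlink()  # identical content already ingested: skip
        return "duplicate"

    manifest.setdefault(key, {})["sha256"] = new_sha
    tmp = manifest_path.with_name(".manifest.json.tmp")
    tmp.write_text(json.dumps(manifest, indent=2))
    os.replace(tmp, manifest_path)  # atomic manifest swap
    os.replace(staging, final)      # atomic when both paths share a filesystem
    return "copied"
```

If the process dies after the copy but before the final rename, the only leftover is the file in .raw-staging/, which matches the crash-recovery note above.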
chat-only / legacy conversation): write or update the manifest entry for the raw file in <wiki>/.manifest.json:
{
"raw/<basename>": {
"sha256": "<sha256 of raw file>",
"origin": "<exact value of capture's `evidence` field>",
"copied_at": "<iso-8601 utc, preserve if already present>",
"last_ingested": "<iso-8601 utc, now>",
"referenced_by": ["<pages added to this list below>"]
}
}
- Run python3 scripts/wiki-manifest.py build "${WIKI_ROOT}" at the end of ingest to refresh sha256 and last_ingested. This is mandatory; the manifest is the drift-detection source of truth.
- origin is the capture's evidence field value, never the string "file", "conversation" (when a real path was available), "mixed", or the evidence_type/capture_kind. If capture_kind == "chat-only" AND the capture has no real path, origin is the literal string "conversation". Any other value is a validator failure.
- sha256 short-circuit: if raw/<basename> already exists AND sha256(new) == manifest[raw/<basename>].sha256, the evidence content is identical. Skip re-ingest of this capture and append ## [<timestamp>] skip | <capture-basename> — sha match, no-op to log.md. Archive the capture normally (step 10). This prevents re-ingesting the same research file twice when the capture-trigger fires on a near-duplicate.
- Title-scope check: if the new evidence is broader than an existing page's title (e.g. concepts/gemma4-27b-hardware-requirements.md vs new evidence covering a 27B+31B comparison), do NOT force-merge the broader content into the narrower-titled page. Instead, either (a) create a sibling concept page with a scope-appropriate slug (e.g. concepts/gemma4-27b-vs-31b-hardware-comparison.md) and cross-link both, or (b) rename the existing page's slug AND frontmatter title to cover the new scope, then merge. Use (b) only if no other wiki page currently links to the old slug; if any do, use option (a) to avoid broken links. Log which option was taken in log.md.
- Overwrite: if raw/<basename> already exists AND the new sha256 differs from the manifest entry AND the manifest's last_ingested is within the last 60 minutes (the evidence file on disk was replaced since the previous ingest), treat this as an overwrite situation: copy the new evidence to raw/<basename> AS NORMAL, but also append ## [<timestamp>] overwrite | <capture-basename> — raw sha changed since <previous_ingested_iso>, previous referenced_by: [<list>] to log.md. Proceed with the rest of step 4 and the title-scope check in step 6 as above. The overwrite is not an error: it is the exact scenario from the failure-mode transcript (two research agents both wrote to 2026-04-24-gemma4-hardware.md), and the title-scope check catches the content-divergence part.

5. Decide target pages: suggested_pages is a hint; orientation may change it.
6. For each target page:
   a. Acquire a page lock (wiki_lock_wait_and_acquire).
   b. Read current page content (read-before-write).
   c. Merge new material. Do NOT replace existing claims; add dated findings, and use contradictions: frontmatter if they disagree.
   d. Release lock (wiki_lock_release).
6.5. Self-rate every page you just touched. For each page, use the cheap model to score on four dimensions (1-5 each), compute overall as round(mean, 2), and write the following into the page's frontmatter (creating the quality: block if missing, preserving rated_by: human if the page already has it):
quality:
  accuracy: <1-5>
  completeness: <1-5>
  signal: <1-5>
  interlinking: <1-5>
  overall: <float>
  rated_at: "<ISO-8601 UTC now>"
  rated_by: ingester
Rating criteria in one line each:
- sources:?

Never clobber rated_by: human. If the existing page has quality.rated_by == "human", skip this step for that page entirely. See references/page-conventions.md for the full quality block contract.
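The arithmetic of the self-rating step, including the rule about human ratings, can be sketched as below. `rate_page` is a hypothetical helper that works on an already-parsed frontmatter dict; how the scores are obtained from the cheap model is outside this sketch.

```python
from statistics import mean

DIMENSIONS = ("accuracy", "completeness", "signal", "interlinking")


def rate_page(frontmatter: dict, scores: dict, rated_at: str) -> dict:
    """Return frontmatter with an ingester quality block, unless a human rated it."""
    existing = frontmatter.get("quality") or {}
    if existing.get("rated_by") == "human":
        return frontmatter  # never clobber a human rating
    for dim in DIMENSIONS:
        if not 1 <= scores[dim] <= 5:
            raise ValueError(f"{dim} must be 1-5, got {scores[dim]}")
    quality = {dim: scores[dim] for dim in DIMENSIONS}
    quality["overall"] = round(mean(quality.values()), 2)  # mean of the four scores
    quality["rated_at"] = rated_at
    quality["rated_by"] = "ingester"
    return {**frontmatter, "quality": quality}
```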
7. Update indexes via wiki-build-index.py. Do NOT write index.md or any _index.md directly. Instead, for each unique parent directory of a touched page (deduplicated from touched_pages), invoke:
for dir in "${TOUCHED_DIRS[@]}"; do
python3 "${CLAUDE_PLUGIN_ROOT}/scripts/wiki-build-index.py" \
--wiki-root "${WIKI_ROOT}" "${dir}"
done
The script regenerates _index.md in that directory and walks UP to ancestors (path-order locks, leaves first). Root MOC (index.md) is rebuilt automatically by the script if a top-level category was added or removed.
If the script exits non-zero (lock timeout, discovery failure), log the failure to log.md and continue. The next ingest catches up because indexes are a function of directory state.
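Deriving the deduplicated directory list that feeds the loop above is a small step, sketched here in Python. The helper names are hypothetical; the command shape simply mirrors the shell invocation shown above, and the commands are built but not executed.

```python
from pathlib import PurePosixPath


def touched_dirs(touched_pages: list[str]) -> list[str]:
    """Unique parent directories of the touched pages, sorted for determinism."""
    return sorted({str(PurePosixPath(p).parent) for p in touched_pages})


def build_index_commands(plugin_root: str, wiki_root: str, touched_pages: list[str]) -> list[list[str]]:
    """One wiki-build-index.py invocation per unique parent directory."""
    script = f"{plugin_root}/scripts/wiki-build-index.py"
    return [
        ["python3", script, "--wiki-root", wiki_root, d]
        for d in touched_dirs(touched_pages)
    ]
```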
7.5. Missed-cross-link check. Pass the freshly-edited page content AND the relevant _index.md content (the page's parent directory's index, NOT root index.md) to the cheap model with this prompt: "Identify any existing wiki page in _index.md that this page obviously should link to but currently does not. Return a list of (target-page-path, anchor-text) pairs, or an empty list. Do not propose new pages; only propose links to pages already in _index.md." For each returned pair, insert a markdown link at a relevant point in the page (or append to a ## See also section, creating it if absent), re-acquire the page lock, save, release. Re-validate.
7.6. Per-_index.md size threshold check. After step 7's invocation, check the size of every _index.md the script touched:
for idx in "${TOUCHED_INDEXES[@]}"; do
size="$(wc -c < "${idx}" | tr -d ' ')"
if [[ "${size}" -gt 8192 ]]; then
slug="$(echo "${idx#${WIKI_ROOT}/}" | tr '/' '-' | tr '.' '-')"
proposal_pattern="${WIKI_ROOT}/.wiki-pending/schema-proposals/*-${slug}-index-split.md"
if ! find ${proposal_pattern} -mtime -1 2>/dev/null | grep -q .; then
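ts="$(date -u +%Y-%m-%dT%H-%M-%SZ)"
mkdir -p "${WIKI_ROOT}/.wiki-pending/schema-proposals"  # ensure the proposals directory exists before writing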
cat > "${WIKI_ROOT}/.wiki-pending/schema-proposals/${ts}-${slug}-index-split.md" <<EOF
---
title: "Schema proposal: split ${idx#${WIKI_ROOT}/} (size threshold exceeded)"
captured_at: "$(date -u +%Y-%m-%dT%H:%M:%SZ)"
trigger: "${idx#${WIKI_ROOT}/} size = ${size} bytes (threshold 8192 bytes)"
---
The sub-index file ${idx#${WIKI_ROOT}/} exceeded the 8 KB orientation-degradation threshold. Recommended: split this directory into sub-categories, OR consolidate scope. Root MOC is exempt from this threshold (capped via Rule 3 instead — see Category discipline).
EOF
fi
fi
done
The root index.md (small MOC built by _build_root_moc) is exempt from this 8 KB threshold. The MOC is bounded by Rule 3 (≥8 categories soft ceiling) instead — see Category discipline section.
8. Append to log.md.
9. If this wiki is a project wiki, decide propagation. A project wiki evaluates whether each capture is general-interest (useful across projects) or project-specific, using these criteria:
   - If general-interest, write a capture into the main wiki's .wiki-pending/ with propagated_from: <project wiki path>.
   Main wiki ingestion is otherwise identical to project wiki ingestion.
10. Archive the capture from .processing to .wiki-pending/archive/YYYY-MM/: wiki_capture_archive "${WIKI_ROOT}" "${WIKI_CAPTURE}". The helper strips the .processing suffix on rename, so archived basenames end in .md.
11. Call auto-commit (from skill's base dir): bash scripts/wiki-commit.sh "${WIKI_ROOT}" "ingest: <capture title>"
12. Exit.
Before exiting, run the validator on every page you just touched:
for page in "${touched_pages[@]}"; do
python3 "${CLAUDE_PLUGIN_ROOT}/scripts/wiki-validate-page.py" \
--wiki-root "${WIKI_ROOT}" "${page}"
done
The validator checks:
- Required fields (title, type, tags, sources, created, updated) present.
- type is one of concept, entity, query.
- Timestamps are full ISO-8601 UTC (2026-04-24T13:00:00Z, not 2026-04-24).
- sources: is a flat list of strings (no nested mappings).
- Every quality.* field is present and in range.

If the validator exits non-zero for any page, fix the mechanical issue and re-validate. The commit script enforces this in code (0.2.8 #3): wiki-commit.sh runs wiki-validate-page.py on every staged content page and refuses to commit if any page fails. If the commit step refuses, fix the failing page and re-attempt the commit. Do not bypass by skipping wiki-commit.sh.
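To make the mechanical checks concrete, here is a minimal Python sketch of a few of them. It is not wiki-validate-page.py, whose code is not shown here: `validate_frontmatter` is a hypothetical function over an already-parsed frontmatter dict, it assumes created and updated are the timestamp fields, and it omits the quality.* range check.

```python
import re

REQUIRED = ("title", "type", "tags", "sources", "created", "updated")
PAGE_TYPES = {"concept", "entity", "query"}
ISO_UTC = re.compile(r"^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z$")


def validate_frontmatter(fm: dict) -> list[str]:
    """Return a list of human-readable errors; an empty list means the page passes."""
    errors = [f"missing field: {key}" for key in REQUIRED if key not in fm]
    if "type" in fm and fm["type"] not in PAGE_TYPES:
        errors.append(f"bad type: {fm['type']}")
    for key in ("created", "updated"):  # assumed to be the timestamp fields
        if key in fm and not ISO_UTC.match(str(fm[key])):
            errors.append(f"{key} is not a full ISO-8601 UTC timestamp")
    sources = fm.get("sources", [])
    if not isinstance(sources, list) or not all(isinstance(s, str) for s in sources):
        errors.append("sources must be a flat list of strings")
    return errors
```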
If a contradiction surfaces, add contradictions: frontmatter pointing to the conflicting page — do NOT resolve it during ingest. (Contradictions are a judgement call, not a validator violation.)
Additionally: after running the validator, also run wiki-lint-tags.py if it exists in the plugin. If it reports proposed new tags, drop a schema-proposal capture in .wiki-pending/schema-proposals/ and continue — do NOT rename tags inline.
- raw/archive/, update references).
- Split index.md when it exceeds ~200 entries / 8KB / 2000 tokens; orientation degrades beyond that. Evidence: Chroma Context Rot research shows retrieval accuracy starts degrading around 1,000 tokens of preamble; Obsidian MOC practitioners cap at 25 items per MOC; Starmorph flags 100-200 pages as the scale-out point.

When a threshold is reached, propose the restructure via a schema-proposal capture in .wiki-pending/schema-proposals/. Do NOT restructure during the current ingest.
Three rules govern how categories grow. Each has a firing mechanism aimed at the agent during ingest, not at the user via status alarms.
Rule 1: Don't create a new category to file ONE page. Before mkdir-ing during ingest step 5 (decide-target-page), the agent checks: are there ≥3 pages I can place here, or one page that will grow to ≥3? If neither, place this page in an existing category instead. Mechanism: the cheap model's prompt at step 5 carries this rule explicitly. Decision-time prevention beats after-the-fact warning.
Rule 2: Sub-directory depth has a HARD cap of 4. Validator REJECTS any page placed at depth ≥5 (category/a/b/c/d/page.md). The cheap model is told at step 5 to place shallower if a deeper position would be required. Mechanism: validator exit non-zero. wiki-status.sh surfaces "categories exceeding depth 4" (always 0 if validator is doing its job).
Rule 3: ≥8 categories soft ceiling triggers schema-proposal. When current category count is already 8 and the cheap model wants to mkdir a 9th, it files a schema-proposal capture in .wiki-pending/schema-proposals/<timestamp>-9th-category-<name>.md instead of mkdir-ing. The current capture is filed in the existing-best-fit category for now. User reviews schema-proposal and can mkdir themselves to override. Mechanism: schema-proposal capture (not a hard reject — flexibility preserved). wiki-status.sh surfaces "category count vs soft-ceiling 8."
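Rules 2 and 3 are mechanical enough to sketch in Python. The function `placement_decision` and its return strings are invented for illustration; Rule 1 is a judgment call made by the model and is not modelled here. Depth is counted as the number of directory components, so category/a/b/c/d/page.md has depth 5.

```python
from pathlib import PurePosixPath

MAX_DEPTH = 4     # Rule 2: hard cap on directory depth
SOFT_CEILING = 8  # Rule 3: soft ceiling on category count


def placement_decision(page_path: str, existing_categories: set[str]) -> str:
    """Classify a proposed page path as ok, rejected, or needing a schema proposal."""
    dirs = PurePosixPath(page_path).parts[:-1]  # directory components only
    if len(dirs) > MAX_DEPTH:
        return "reject: depth exceeds 4"
    category = dirs[0] if dirs else ""
    is_new = bool(category) and category not in existing_categories
    if is_new and len(existing_categories) >= SOFT_CEILING:
        return "schema-proposal: category soft ceiling reached"
    return "ok"
```

The asymmetry is deliberate in the rules above: exceeding the depth cap is a hard validator failure, while exceeding the category ceiling only redirects the decision to a schema proposal.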
- using-karpathy-wiki/SKILL.md): single source of truth.
- karpathy-wiki-capture/SKILL.md.
- references/page-conventions.md.
- <plugin>/skills/karpathy-wiki-capture/references/capture-schema.md.