From pka-skills
ALWAYS use this skill when the user wants to process, route, OCR, index, or organize documents in their knowledge system. Triggers on: processing an inbox or team-inbox, routing dropped files to the right folders, OCR on PDFs or scanned documents, re-indexing a knowledge base, setting up a scanner (ScanSnap, iPhone), organizing files into a folder structure, inventorying what's in an inbox, extracting text from PDFs, or any document ingestion task within a PKA or personal knowledge system. Use this skill even when the user just says "process my inbox", "what's in the inbox", "route these docs", "re-index", "OCR these", "set up my scanner", or drops files and asks what to do with them. Handles transcript detection (holds .vtt/.srt files for meeting processing instead of routing them). Also runs lint / health checks on the knowledge base (orphan files, broken links, stale wiki sources, contradiction candidates) — triggered by "run a health check", "lint my knowledge base", "what needs attention", or "check for broken links". Works standalone or with pka-bootstrap.
Slash command
/pka-skills:pka-librarian
This skill is limited to the following tools:
Ingest, OCR, categorize, and index documents for a Personal Knowledge Assistance system.
- .pkaignore if present; apply defaults if not
- CLAUDE.md ## Repo Map if present
- .pka/knowledge.db — determines whether to update SQLite
- references/inference-guide.md and infer routing from destination folder structure
- obsidian_present := directory_exists("./knowledge/.obsidian")
- hybrid_monorepo_present := file_exists("./.meta") AND directory_exists("./.git")
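The two detection flags at the end of the list above could be computed along these lines. This is a minimal sketch, not the skill's actual implementation: the function name `detect_environment` and the `root` parameter are hypothetical, and only the paths named in the flag definitions come from the source.

```python
from pathlib import Path


def detect_environment(root: Path) -> dict[str, bool]:
    """Evaluate the two gating flags relative to a PKA root directory.

    `root` and this function name are illustrative assumptions; the
    checked paths mirror the flag definitions in the text.
    """
    return {
        # obsidian_present := directory_exists("./knowledge/.obsidian")
        "obsidian_present": (root / "knowledge" / ".obsidian").is_dir(),
        # hybrid_monorepo_present := file_exists("./.meta") AND directory_exists("./.git")
        "hybrid_monorepo_present": (root / ".meta").is_file()
        and (root / ".git").is_dir(),
    }
```

Calling it once at the start of a session and passing the resulting dictionary around is one way to avoid re-checking the filesystem on every route.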
These are evaluated once per session and used to gate the additive behaviors at the end of routing.
Before routing anything, scan team-inbox/ for transcript files.
Transcript detection patterns: *.vtt, *.srt, files matching *transcript*, GMT*.txt, *_recording.txt, *recording*.docx
If transcripts found:
- <list>. Process these with pka-meetings or route them to a specific location?"
- pka-meetings: hand off and exit the librarian flow for those files
This prevents transcripts from being buried in the knowledge base before reconciliation.
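The transcript scan described above might look roughly like the following. It is a sketch under assumptions: the helper names are hypothetical, matching is case-sensitive, and only the top level of the inbox is scanned. The source specifies the glob patterns but none of those details.

```python
from fnmatch import fnmatchcase
from pathlib import Path

# Patterns copied from the transcript detection list in the text.
TRANSCRIPT_PATTERNS = (
    "*.vtt",
    "*.srt",
    "*transcript*",
    "GMT*.txt",
    "*_recording.txt",
    "*recording*.docx",
)


def is_transcript(filename: str) -> bool:
    """Return True if the bare filename matches any transcript pattern.

    Case-sensitive matching is an assumption, not something the source states.
    """
    return any(fnmatchcase(filename, pattern) for pattern in TRANSCRIPT_PATTERNS)


def find_transcripts(inbox: Path) -> list[Path]:
    """List transcript files directly inside `inbox` (no recursion, by assumption)."""
    return sorted(p for p in inbox.iterdir() if p.is_file() and is_transcript(p.name))
```

Anything returned by `find_transcripts` would be held back and surfaced to the user instead of being routed.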
Check in order before any OCR attempt:
python3 -c "import pdfplumber" 2>/dev/null && echo "pdfplumber available"
python3 -c "import pypdf" 2>/dev/null && echo "pypdf available"
which tesseract 2>/dev/null && echo "tesseract available"
If nothing available:
- pip install pdfplumber --break-system-packages
- brew install tesseract)" — add to report, never silently skip
See references/ocr-patterns.md for full detection and fallback strategy.
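The same availability checks can be expressed in Python when the caller needs a structured result instead of shell output. This is a sketch: the function names are hypothetical, and treating the check order as a preference order is an assumption based on "check in order" above.

```python
import importlib.util
import shutil
from typing import Optional

# Order mirrors the shell checks in the text.
CHECK_ORDER = ("pdfplumber", "pypdf", "tesseract")


def available_ocr_tools() -> dict[str, bool]:
    """Report which extraction tools are usable without importing or running them."""
    return {
        "pdfplumber": importlib.util.find_spec("pdfplumber") is not None,
        "pypdf": importlib.util.find_spec("pypdf") is not None,
        "tesseract": shutil.which("tesseract") is not None,
    }


def first_available(tools: dict[str, bool]) -> Optional[str]:
    """Pick the first usable tool in check order, or None if nothing is installed."""
    for name in CHECK_ORDER:
        if tools.get(name):
            return name
    return None
```

When `first_available` returns `None`, the files would be listed in the report as awaiting OCR, in line with the rule that nothing is skipped silently.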
- references/file-routing-rules.md.
- .txt + SQLite ocr_text + search_fts; originals never modified
- file_index, per-folder table, search_fts; append to session-log.md
- obsidian_present and the destination is inside knowledge/) — see references/obsidian-routing.md for the per-route checklist. Conventions live in .pka/roles/_obsidian.md.
- _MOC.md) — see references/pointer-layer.md for the cluster-discovery and append-only contract. Runs whether or not Obsidian is present; the retrieval value comes from FTS.
- hybrid_monorepo_present and the destination is inside a child repo) — see references/commit-protocol.md for the trigger and message format. Rules live in .pka/roles/_git-protocol.md. The pointer-row update rides along in the same commit as the route.
- owner-inbox/librarian-report-<YYYY-MM-DD>.md with counts, destinations, OCR status, unsorted items, and (when obsidian_present) any malformed-frontmatter files surfaced for user review, plus any pointer rows that hit the 8-file soft cap
Unsorted files → team-inbox/unsorted/, never silently discarded.
Run on "index my knowledge base" or after a project transition. Reads .md content, extracts PDF text, reads OCR sidecars; populates search_fts.content. This is what makes full-text search work — bootstrap intentionally defers it.
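To illustrate what populating `search_fts.content` involves, here is a minimal sketch using SQLite's FTS5 extension. The table layout (a `path` column next to `content`), the function names, and the rebuild-from-scratch strategy are all assumptions; the source only names the `search_fts` table and its `content` column. It also assumes the local SQLite build includes FTS5.

```python
import sqlite3
from typing import Iterable


def build_fts(conn: sqlite3.Connection, documents: Iterable[tuple[str, str]]) -> None:
    """Rebuild a full-text table from (path, extracted_text) pairs.

    The two-column schema is hypothetical, chosen only for illustration.
    """
    conn.execute(
        "CREATE VIRTUAL TABLE IF NOT EXISTS search_fts USING fts5(path, content)"
    )
    conn.execute("DELETE FROM search_fts")  # full rebuild, by assumption
    conn.executemany(
        "INSERT INTO search_fts (path, content) VALUES (?, ?)", documents
    )
    conn.commit()


def search(conn: sqlite3.Connection, query: str) -> list[str]:
    """Return matching paths, best match first."""
    rows = conn.execute(
        "SELECT path FROM search_fts WHERE search_fts MATCH ? ORDER BY rank",
        (query,),
    )
    return [row[0] for row in rows]
```

The `documents` iterable would be fed from the three text sources named above: Markdown bodies, extracted PDF text, and OCR sidecars.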
Diff file_index by path + modified_at; update changed entries only; report N added/updated/removed.
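The diff step can be sketched as a pure function over two snapshots keyed by path. The snapshot shape (a mapping from path to `modified_at` value) and the function name are assumptions made for illustration; the source only states that the comparison uses path and `modified_at`.

```python
def diff_index(
    indexed: dict[str, float], on_disk: dict[str, float]
) -> tuple[list[str], list[str], list[str]]:
    """Compare the stored index against the filesystem.

    Both arguments map path -> modified_at (hypothetical shape).
    Returns (added, updated, removed) path lists, each sorted.
    """
    added = sorted(on_disk.keys() - indexed.keys())
    removed = sorted(indexed.keys() - on_disk.keys())
    updated = sorted(
        path
        for path in on_disk.keys() & indexed.keys()
        if on_disk[path] != indexed[path]
    )
    return added, updated, removed
```

The lengths of the three lists give the N added/updated/removed counts for the report, and only the `added` and `updated` paths need their text re-extracted.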
Trigger: "run a health check", "lint my knowledge base", "what needs attention", "check for broken links".
Runs a non-destructive scan producing owner-inbox/librarian-lint-<YYYY-MM-DD>.md with findings across 7 rule categories:
- [[wikilinks]] to missing paths)
Lint reports only — never auto-fixes. User acts on the report. See references/lint-rules.md for full rule definitions and references/cross-reference-maintenance.md for the back-reference model.
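As an illustration of the broken-link rule, the check below extracts `[[wikilink]]` targets and reports those with no matching note. It is a sketch under assumptions: the function name is hypothetical, targets are resolved against a set of known note names, and `|alias` and `#heading` suffixes are ignored. The actual rule definitions live in references/lint-rules.md and may differ.

```python
import re

# Capture the target part of [[target]], [[target|alias]], or [[target#heading]].
WIKILINK = re.compile(r"\[\[([^\]\|#]+)")


def broken_wikilinks(text: str, known_notes: set[str]) -> list[str]:
    """Return sorted, de-duplicated link targets that do not resolve.

    Resolution by exact note name is an assumption for illustration.
    Nothing is modified; the result is only meant for a report.
    """
    targets = (match.group(1).strip() for match in WIKILINK.finditer(text))
    return sorted({target for target in targets if target not in known_notes})
```

Consistent with the report-only contract, the function returns findings and leaves the files untouched.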
Variants:
- team-inbox/ without explicit confirmation
- pka-meetings
- obsidian_present, never modify a file's existing frontmatter fields — merge only. See .pka/roles/_obsidian.md.
- obsidian_present, never read file bodies during the Obsidian bootstrap (mechanical retrofit only). The lazy/per-route behavior may use frontmatter and routing context, but bootstrap uses filename + folder structure exclusively.
- hybrid_monorepo_present, auto-commits land only in child repos (knowledge/, projects/*). The root repo is never auto-committed; root-tracked side effects are staged and surfaced for user review.
npx claudepluginhub rappdw/pka-skills --plugin pka-skills