From pdf-to-markdown
Convert any PDF into clean Markdown for your Coworker knowledge base. PyMuPDF extracts text locally (no AI reads every page). Claude maps the document structure in one pass. Python cleans and saves. Say "convert this PDF to markdown", "PDF to markdown", "ingest this PDF", "turn this PDF into a knowledge file", or "PDF to md".
How this skill is triggered — by the user, by Claude, or both
Slash command
/pdf-to-markdown:pdf-to-markdown
The summary Claude sees in its skill listing, used to decide when to auto-load this skill
Convert an uploaded PDF to clean Markdown — without reading every page through AI.
How it works: PyMuPDF pulls text locally (pure library, zero tokens). Claude sees only 500-char previews per page to map structure in one pass. Python does the cleanup. Output lands in your Coworker folder.
pip install pymupdf --break-system-packages -q
Run this automatically at Step 1 if pymupdf is not yet installed.
Step 1 — Install dependency & locate PDF
Run: pip install pymupdf --break-system-packages -q
The uploaded PDF is at /path/from/uploads/<filename>.pdf. Confirm it exists.
If no PDF is found, ask the user to re-upload before continuing.
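A minimal sketch of the Step 1 checks, assuming they are done from Python rather than the shell. The helper names (`is_installed`, `ensure_pymupdf`, `pdf_exists`) are hypothetical and not part of the skill; the pip flags are the ones given above.

```python
import importlib.util
import subprocess
import sys
from pathlib import Path


def is_installed(module_name: str) -> bool:
    """Return True if the module can be imported by this interpreter."""
    return importlib.util.find_spec(module_name) is not None


def ensure_pymupdf() -> None:
    """Install PyMuPDF only when it is missing, using the flags from Step 1."""
    # PyMuPDF is importable as "pymupdf" (newer releases) or "fitz" (classic name).
    if not is_installed("pymupdf") and not is_installed("fitz"):
        subprocess.run(
            [sys.executable, "-m", "pip", "install", "pymupdf",
             "--break-system-packages", "-q"],
            check=True,
        )


def pdf_exists(pdf_path: str) -> bool:
    """Confirm the uploaded file is present and has a .pdf extension."""
    path = Path(pdf_path)
    return path.is_file() and path.suffix.lower() == ".pdf"
```

If `pdf_exists` returns False, the step above applies: ask the user to re-upload before continuing.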
Step 2 — Extract page previews (no AI)
python "{base_dir}/references/pdf_to_md.py" extract "<pdf_path>"
This prints JSON: [{"page_num": N, "preview": "...(500 chars)", "char_count": N}, ...]
PyMuPDF does all the work. No tokens consumed.
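The contents of pdf_to_md.py are not shown here, so the following is only a sketch of how the extract step could produce that JSON shape. It assumes page numbers are 1-based and that char_count is the length of the full page text; `build_previews` and `extract_previews` are hypothetical names.

```python
import json


def build_previews(page_texts, preview_chars=500):
    """Turn a list of per-page text strings into preview records."""
    return [
        {
            "page_num": i,                    # assumed 1-based
            "preview": text[:preview_chars],  # first 500 characters only
            "char_count": len(text),          # assumed: full page text length
        }
        for i, text in enumerate(page_texts, start=1)
    ]


def extract_previews(pdf_path):
    """Read every page locally with PyMuPDF and return preview records."""
    import pymupdf  # imported lazily so build_previews works without it

    with pymupdf.open(pdf_path) as doc:
        texts = [page.get_text() for page in doc]
    return build_previews(texts)


if __name__ == "__main__":
    # Demo with in-memory text instead of a real PDF.
    print(json.dumps(build_previews(["Chapter 1\nHello", ""]), indent=2))
```

Keeping the record-building separate from the PDF read makes the preview logic easy to check without a PDF on hand.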
Step 3 — Scan for image-only PDF
If more than 80% of pages have char_count < 50, the PDF is likely scanned/image-based.
Stop and tell the user: "This PDF appears to be image-based. PyMuPDF can't extract text
from scanned pages. Consider running it through an OCR tool first (e.g., Adobe Acrobat,
Tesseract), then re-upload the text-based version."
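The 80% rule above can be sketched as a small check over the Step 2 records. The function name `looks_scanned` is hypothetical; the thresholds are the ones stated in this step.

```python
def looks_scanned(previews, min_chars=50, threshold=0.8):
    """Return True when more than 80% of pages have fewer than 50 characters."""
    if not previews:
        return False  # nothing to judge; let the caller handle an empty PDF
    sparse = sum(1 for p in previews if p["char_count"] < min_chars)
    return sparse / len(previews) > threshold


# Nine near-empty pages out of ten is 90%, which is over the threshold.
pages = [{"char_count": 3}] * 9 + [{"char_count": 1200}]
print(looks_scanned(pages))
```

Note that the comparison is strict: exactly 80% sparse pages does not trigger the stop.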
Step 4 — Analyze structure inline
Read the page previews. Identify:
Chapter heading pattern (e.g. ^Chapter \d+)
Write your analysis to /tmp/pdf_structure.json:
{
"skip_pages": [1, 2, 3],
"content_start_page": 4,
"content_end_page": 210,
"header_pattern": "Book Title",
"footer_pattern": "^\\d+$",
"chapter_pattern": "^Chapter \\d+"
}
Use null for any field you cannot identify. Never guess — if uncertain, include
the page rather than skip it.
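As an illustration of the "use null" rule, here is one way the structure file could be assembled so that any unidentified field is written as null. The helper names `make_structure` and `write_structure` are hypothetical; the field names are the ones in the JSON above.

```python
import json

FIELDS = (
    "skip_pages", "content_start_page", "content_end_page",
    "header_pattern", "footer_pattern", "chapter_pattern",
)


def make_structure(**known):
    """Build the structure dict; any field not supplied stays None (JSON null)."""
    unknown = set(known) - set(FIELDS)
    if unknown:
        raise ValueError(f"unexpected fields: {sorted(unknown)}")
    return {field: known.get(field) for field in FIELDS}


def write_structure(structure, path="/tmp/pdf_structure.json"):
    """Write the analysis where Step 5 expects to find it."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(structure, fh, indent=2)


# Only the chapter pattern and start page were identified with confidence.
structure = make_structure(content_start_page=4, chapter_pattern=r"^Chapter \d+")
print(json.dumps(structure, indent=2))
```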
Step 5 — Clean and assemble
python "{base_dir}/references/pdf_to_md.py" clean "<pdf_path>" /tmp/pdf_structure.json "<output_path>"
Output path: <workspace>/training/knowledge/<filename-slug>.md
where <workspace> is the user's mounted Coworker folder.
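The clean command's internals are not documented here. The sketch below shows one plausible way the structure fields could be applied: filter pages, drop lines matching the header or footer pattern, and turn chapter lines into Markdown headings. `keep_page` and `clean_page` are hypothetical names, and the real script may behave differently.

```python
import re


def keep_page(page_num, structure):
    """Apply skip_pages and the content start/end bounds; null means no limit."""
    start = structure.get("content_start_page")
    end = structure.get("content_end_page")
    if page_num in (structure.get("skip_pages") or []):
        return False
    if start is not None and page_num < start:
        return False
    if end is not None and page_num > end:
        return False
    return True


def clean_page(text, structure):
    """Drop header/footer lines and promote chapter lines to '# ' headings."""
    header = structure.get("header_pattern")
    footer = structure.get("footer_pattern")
    chapter = structure.get("chapter_pattern")
    out = []
    for line in text.splitlines():
        stripped = line.strip()
        if header and re.search(header, stripped):
            continue  # running header
        if footer and re.search(footer, stripped):
            continue  # e.g. a bare page number
        if chapter and re.search(chapter, stripped):
            out.append(f"# {stripped}")
        else:
            out.append(line.rstrip())
    return "\n".join(out).strip()
```

A null pattern simply disables that rule, which matches the advice to include content when uncertain.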
Step 6 — Report and offer ingest
Show the user:
.md file
Then ask: "Want me to ingest this into your knowledge index now?"
If yes, trigger the file-master:ingest skill.
npx claudepluginhub heymitch/vip-accelerator --plugin pdf-to-markdown