Skill

pdf-to-markdown

Convert any PDF into clean Markdown for your Coworker knowledge base. PyMuPDF extracts text locally (no AI reads every page). Claude maps the document structure in one pass. Python cleans and saves. Say "convert this PDF to markdown", "PDF to markdown", "ingest this PDF", "turn this PDF into a knowledge file", or "PDF to md".

Invocation

How this skill is triggered — by the user, by Claude, or both

Slash command

/pdf-to-markdown:pdf-to-markdown

User invocable

Model invocable

Inline context

Default effort

Context Preview

The summary Claude sees in its skill listing — used to decide when to auto-load this skill

Convert an uploaded PDF to clean Markdown — without reading every page through AI.

Supporting Files

references/pdf_to_md.py

SKILL.md

101 lines · ~880 tokens

Stats

Language: TypeScript

Parent stars: 0

Maintenance: Good

Last Commit: Apr 24, 2026

PDF to Markdown

Convert an uploaded PDF to clean Markdown — without reading every page through AI.

How it works: PyMuPDF pulls text locally (pure library, zero tokens). Claude sees only 500-char previews per page to map structure in one pass. Python does the cleanup. Output lands in your Coworker folder.

Setup (first run only)

pip install pymupdf --break-system-packages -q

Run this automatically at Step 1 if pymupdf is not yet installed.
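One way to make the install conditional, as the note above asks, is to probe for the module before calling pip. This is a minimal sketch, not the skill's actual logic: the helper names `module_available` and `ensure_pymupdf` are hypothetical, and checking both `pymupdf` and `fitz` as import names is an assumption about how the installed release exposes the library.

```python
import importlib.util
import subprocess
import sys


def module_available(name: str) -> bool:
    """Return True if a top-level module with this name can be imported."""
    return importlib.util.find_spec(name) is not None


def ensure_pymupdf() -> None:
    """Run the documented pip command only when PyMuPDF is missing."""
    # Assumption: PyMuPDF is importable as `pymupdf` or, in older
    # releases, as `fitz`. Either one counts as "installed" here.
    if module_available("pymupdf") or module_available("fitz"):
        return
    subprocess.run(
        [sys.executable, "-m", "pip", "install", "pymupdf",
         "--break-system-packages", "-q"],
        check=True,  # raise if pip exits with a non-zero status
    )
```

Calling pip through `sys.executable -m pip` keeps the install tied to the interpreter that will later run the script.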

Flow

Step 1 — Install dependency & locate PDF

Run: pip install pymupdf --break-system-packages -q

The uploaded PDF is at /path/from/uploads/<filename>.pdf. Confirm it exists. If no PDF is found, ask the user to re-upload before continuing.

Step 2 — Extract page previews (no AI)

python "{base_dir}/references/pdf_to_md.py" extract "<pdf_path>"

This prints a JSON array with one record per page:

[{"page_num": N, "preview": "...(500 chars)", "char_count": N}, ...]

PyMuPDF does all the work; no tokens are consumed.
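The contents of references/pdf_to_md.py are not shown here, so the following is only a sketch of how an extract step with this output shape could look. The function names are hypothetical, and two details are assumptions: page numbers are 1-based (to match the structure file in Step 4), and `char_count` is measured after trimming whitespace.

```python
import json


def build_previews(page_texts, preview_len=500):
    """Turn per-page text into the preview records described above."""
    records = []
    for page_num, text in enumerate(page_texts, start=1):  # 1-based (assumed)
        text = text.strip()
        records.append({
            "page_num": page_num,
            "preview": text[:preview_len],  # at most 500 chars reach Claude
            "char_count": len(text),        # length of the full page text
        })
    return records


def read_page_texts(pdf_path):
    """Read plain text for every page with PyMuPDF."""
    import fitz  # PyMuPDF; imported lazily so build_previews works without it

    with fitz.open(pdf_path) as doc:
        return [page.get_text() for page in doc]


def extract(pdf_path):
    """Print the preview records as JSON."""
    print(json.dumps(build_previews(read_page_texts(pdf_path))))
```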

Step 3 — Scan for image-only PDF

If more than 80% of pages have char_count < 50, the PDF is likely scanned/image-based. Stop and tell the user: "This PDF appears to be image-based. PyMuPDF can't extract text from scanned pages. Consider running it through an OCR tool first (e.g., Adobe Acrobat, Tesseract), then re-upload the text-based version."
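The 80% rule can be expressed as a small check over the preview records from Step 2. This is an illustrative sketch; the function name `looks_image_based` is hypothetical, and treating an empty record list as "not image-based" is an assumption.

```python
def looks_image_based(previews, min_chars=50, threshold=0.8):
    """Return True when more than `threshold` of pages have almost no text."""
    if not previews:
        return False  # assumption: nothing to judge, so do not block
    sparse = sum(1 for p in previews if p["char_count"] < min_chars)
    return sparse / len(previews) > threshold
```

Because the rule says "more than 80%", a document where exactly 80% of pages are sparse still passes through to Step 4.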

Step 4 — Analyze structure inline

Read the page previews. Identify:

Front matter to skip: title page, copyright, ISBN, dedication, TOC
Back matter to skip: index, "Also by", about the author, acknowledgements
Repeating headers/footers: strings that appear at the top/bottom of most pages
Chapter pattern: regex that matches chapter headings (e.g. ^Chapter \d+)
Content range: first and last page of actual body content

Write your analysis to /tmp/pdf_structure.json:

{
  "skip_pages": [1, 2, 3],
  "content_start_page": 4,
  "content_end_page": 210,
  "header_pattern": "Book Title",
  "footer_pattern": "^\\d+$",
  "chapter_pattern": "^Chapter \\d+"
}

Use null for any field you cannot identify. Never guess — if uncertain, include the page rather than skip it.
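A sketch of writing the structure file so that every unidentified field ends up as null. The helper names are hypothetical. Downgrading a regex that does not compile to null is an assumption, chosen because it errs on the side of keeping content rather than stripping it.

```python
import json
import re

FIELDS = ("skip_pages", "content_start_page", "content_end_page",
          "header_pattern", "footer_pattern", "chapter_pattern")
PATTERN_FIELDS = ("header_pattern", "footer_pattern", "chapter_pattern")


def normalize_structure(analysis):
    """Keep only the known fields; anything missing becomes None (JSON null)."""
    structure = {field: analysis.get(field) for field in FIELDS}
    for field in PATTERN_FIELDS:
        pattern = structure[field]
        if pattern is None:
            continue
        try:
            re.compile(pattern)
        except re.error:
            structure[field] = None  # unusable regex: treat as unidentified
    return structure


def write_structure(analysis, path="/tmp/pdf_structure.json"):
    """Write the normalized analysis to the path used in Step 5."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(normalize_structure(analysis), fh, indent=2)
```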

Step 5 — Clean and assemble

python "{base_dir}/references/pdf_to_md.py" clean "<pdf_path>" /tmp/pdf_structure.json "<output_path>"
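The script's cleanup logic is not shown here, so the following is a hedged sketch of what a clean step driven by the structure file might do, not the actual implementation. The function names are hypothetical, and several choices are assumptions: the header pattern is tested only against a page's first line and the footer pattern only against its last line, chapter headings become level-1 Markdown headings, and null fields fall back to "keep everything".

```python
import re


def _compile(pattern):
    """Compile a pattern, or return None when the field was null."""
    return re.compile(pattern) if pattern else None


def clean_pages(page_texts, structure):
    """Apply skip list, content range, and patterns; return Markdown text."""
    skip = set(structure.get("skip_pages") or [])
    start = structure.get("content_start_page") or 1
    end = structure.get("content_end_page") or len(page_texts)
    header_re = _compile(structure.get("header_pattern"))
    footer_re = _compile(structure.get("footer_pattern"))
    chapter_re = _compile(structure.get("chapter_pattern"))

    blocks = []
    for page_num, text in enumerate(page_texts, start=1):
        if page_num in skip or not (start <= page_num <= end):
            continue
        lines = [ln.strip() for ln in text.strip().splitlines()]
        # Drop a running header only if it is the first line of the page.
        if lines and header_re and header_re.search(lines[0]):
            lines = lines[1:]
        # Drop a running footer only if it is the last line of the page.
        if lines and footer_re and footer_re.search(lines[-1]):
            lines = lines[:-1]
        # Promote chapter headings to Markdown headings (level is assumed).
        lines = [f"# {ln}" if chapter_re and chapter_re.search(ln) else ln
                 for ln in lines]
        if lines:
            blocks.append("\n".join(lines))
    return "\n\n".join(blocks) + "\n"
```

Restricting header and footer matching to the page edges avoids deleting body sentences that happen to mention the book title.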

Output path: <workspace>/training/knowledge/<filename-slug>.md where <workspace> is the user's mounted Coworker folder.
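The output path can be derived from the PDF filename. The slug rules are not specified above, so this sketch assumes lowercase ASCII letters and digits joined by hyphens; `slugify` and `output_path` are hypothetical helper names.

```python
import re
from pathlib import Path


def slugify(filename):
    """Lowercase the file stem and collapse other characters into hyphens."""
    stem = Path(filename).stem.lower()
    slug = re.sub(r"[^a-z0-9]+", "-", stem).strip("-")
    return slug or "document"  # fallback name is an assumption


def output_path(workspace, pdf_path):
    """Build <workspace>/training/knowledge/<filename-slug>.md."""
    return Path(workspace) / "training" / "knowledge" / f"{slugify(pdf_path)}.md"
```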

Step 6 — Report and offer ingest

Show the user:

Pages processed / pages skipped
Output file size (chars)
Link to the saved .md file

Then ask: "Want me to ingest this into your knowledge index now?" If yes, trigger the file-master:ingest skill.
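The three report items can be assembled from values already available after Step 5. This is a small sketch with a hypothetical function name and an assumed layout; the skill does not prescribe an exact wording.

```python
def build_report(total_pages, processed_pages, markdown_text, output_path):
    """Summarize page counts, output size in characters, and file location."""
    skipped = total_pages - processed_pages
    return (
        f"Pages processed: {processed_pages} / skipped: {skipped}\n"
        f"Output size: {len(markdown_text):,} chars\n"
        f"Saved to: {output_path}"
    )
```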

Guardrails

Never load full page text into context — 500-char previews only in Step 4
Never skip uncertain pages — include them; the user can edit later
Output goes to the Coworker workspace only — never leave the final file in /tmp
Large PDFs (300+ pages): warn the user that processing may take 30–60 seconds, then proceed
Encoding errors: if the Python script errors on a page, skip that page and note it in the report
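The last guardrail, skipping a failing page and noting it, might be sketched as a loop that isolates errors per page. The names `process_pages` and `process_page` are hypothetical, and catching every `Exception` is an assumption that follows the guardrail's broad wording.

```python
def process_pages(page_numbers, process_page):
    """Run process_page on each page; collect failures instead of aborting."""
    results, failed = {}, []
    for num in page_numbers:
        try:
            results[num] = process_page(num)
        except Exception as exc:  # broad on purpose: any per-page error
            failed.append((num, str(exc)))  # kept for the Step 6 report
    return results, failed
```

The `failed` list gives the report a record of which pages were dropped and why, so the user can check them by hand.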
