By hungson175
Ingest business documents (.pptx, .pdf, .xlsx, .docx) into FAITHFUL markdown for RAG/review — VLM-first (Vision-Language Model), not OCR. Extracts charts (to JSON), tables (to HTML), and verbatim text, and self-checks fidelity with an independent judge model. Validated 10/10 on real documents.
A Claude Code skill that ingests business documents — .pptx, .pdf, .xlsx, .docx — into faithful Markdown for a RAG knowledge base or human review.
It is VLM-first (Vision-Language Model), not OCR. Plain text extraction looks like it works but silently drops charts, stacked/racing bars, merged-cell "dashboard" spreadsheets, and mixed-language layout. This skill extracts semantic structure — charts → JSON/Markdown, tables → HTML, verbatim text, [UNCLEAR: …] on doubt — and only routes genuinely-visual pages to the VLM, keeping text-dominant pages on a cheap, exact native path.
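The routing idea in the paragraph above can be sketched as a small decision function. This is only an illustration: the `PageStats` fields, the `route_page` name, and both thresholds are hypothetical, and the skill's actual routing rule may use different signals.

```python
from dataclasses import dataclass

# Hypothetical thresholds, chosen for illustration only.
MIN_TEXT_CHARS = 200        # below this, native extraction recovered too little text
MAX_IMAGES_FOR_NATIVE = 0   # any embedded image or chart sends the page to the VLM


@dataclass
class PageStats:
    text_chars: int   # characters recovered by native text extraction
    image_count: int  # embedded images or charts detected on the page


def route_page(stats: PageStats) -> str:
    """Return "native" for text-dominant pages and "vlm" for visual ones."""
    if stats.image_count > MAX_IMAGES_FOR_NATIVE:
        return "vlm"
    if stats.text_chars < MIN_TEXT_CHARS:
        # Little extractable text usually means a scan, a chart, or a diagram.
        return "vlm"
    return "native"
```

Keeping the rule in one pure function makes it easy to tune the thresholds against a sample of pages without touching the rest of the pipeline.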
It also ships a built-in faithfulness self-check: an independent judge model reads the original page image and the produced Markdown and grades fidelity. Because the judge is a different model from the ingester, this is an honest cross-check rather than a model grading its own output. This is the procedure that validated the method 10/10 on real documents (see skills/document-processor/references/validation.md).
As a Claude Code plugin:
/plugin marketplace add hungson175/document-processor-skill
/plugin install document-processor@document-processor-skill
Or copy the skill manually:
git clone https://github.com/hungson175/document-processor-skill.git
cp -r document-processor-skill/skills/document-processor ~/.claude/skills/
Requirements:
- Python (uv or any venv with langchain-openai, python-pptx, python-dotenv, httpx, pypdfium2).
- LibreOffice (soffice): renders .pptx/.xlsx/.docx pages.
- Poppler (pdftotext, pdfinfo): native PDF text and page info.
- OPENROUTER_API_KEY in the environment or ~/dev/.env: the VLM runs via OpenRouter.
- macOS note: headless LibreOffice can hang on Gatekeeper, so render non-PDF formats on a Linux host; PDFs are fine anywhere.
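Before a first run, the prerequisites above can be checked with a short preflight script. This is a sketch, not part of the skill: the `missing_prerequisites` helper is hypothetical, and it only looks at the process environment, so a key stored in ~/dev/.env would be reported as missing.

```python
import os
import shutil

# Command-line tools named in the requirements list.
REQUIRED_TOOLS = ("soffice", "pdftotext", "pdfinfo")


def missing_prerequisites(env=None, which=shutil.which):
    """Return the names of tools and variables that could not be found."""
    env = os.environ if env is None else env
    # shutil.which returns None when an executable is not on PATH.
    missing = [tool for tool in REQUIRED_TOOLS if which(tool) is None]
    if not env.get("OPENROUTER_API_KEY"):
        missing.append("OPENROUTER_API_KEY")
    return missing


if __name__ == "__main__":
    problems = missing_prerequisites()
    if problems:
        print("missing: " + ", ".join(problems))
    else:
        print("all prerequisites found")
```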
# convert one file to faithful markdown
python skills/document-processor/scripts/ingest.py --in deck.pptx --out deck.md
Model tiers (DOC_VLM_MODEL):
- medium (default) = qwen/qwen3-vl-32b-instruct: fast, cheap, faithful on charts/tables/text.
- flagship = qwen/qwen3-vl-235b-a22b-instruct: escalate only for dense same-shape cluster diagrams (honeycombs, packed icon grids) that the medium model under-enumerates.
- small = qwen/qwen3-vl-8b-instruct.

DOC_VLM_MODEL=flagship python skills/document-processor/scripts/ingest.py --in dense_diagrams.pdf --out out.md
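The tier names map to OpenRouter model IDs as listed above. A minimal sketch of that lookup follows; the `resolve_model` helper is hypothetical, and rejecting unknown values is an assumption, since the README does not say how ingest.py treats them.

```python
import os

# Tier names and model IDs as listed in this README.
MODEL_TIERS = {
    "small": "qwen/qwen3-vl-8b-instruct",
    "medium": "qwen/qwen3-vl-32b-instruct",
    "flagship": "qwen/qwen3-vl-235b-a22b-instruct",
}


def resolve_model(env=None):
    """Map DOC_VLM_MODEL to a model ID, defaulting to the medium tier."""
    env = os.environ if env is None else env
    tier = env.get("DOC_VLM_MODEL", "medium")
    if tier not in MODEL_TIERS:
        # Assumption: unknown tiers are an error rather than a silent fallback.
        raise ValueError(f"unknown DOC_VLM_MODEL tier: {tier!r}")
    return MODEL_TIERS[tier]
```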
Verify faithfulness (recommended): render → prepare_judge.py → independent judge (references/faithfulness_judge.md) → judge_summary.py --dir. Don't trust the VLM ingester blindly — the whole point of this skill is that it self-verifies with a different model. See skills/document-processor/SKILL.md and references/method_and_gotchas.md for the full method and gotchas.
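The last step of that pipeline, summarizing per-page judge verdicts, might look roughly like the sketch below. The verdict fields (`page`, `faithful`) and the `summarize` function are hypothetical; the real output format of judge_summary.py is not shown here and may differ.

```python
def summarize(verdicts, ingester_model, judge_model):
    """Aggregate per-page verdicts into a pass count and a list of failed pages.

    Each verdict is assumed to be a dict like {"page": 3, "faithful": True}.
    """
    if ingester_model == judge_model:
        # The cross-check is only meaningful when the judge is a different model.
        raise ValueError("judge must be a different model from the ingester")
    failed = [v["page"] for v in verdicts if not v["faithful"]]
    return {
        "pages": len(verdicts),
        "passed": len(verdicts) - len(failed),
        "failed_pages": failed,
    }
```

Failed pages are returned by number so they can be re-ingested with a higher model tier or reviewed by hand.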
MIT licensed.