From doc-skills
Convert PDF documents to AI-accessible markdown format using IBM's Docling library. This skill should be used when the user needs to extract content from PDFs including text, figures, and tables in a structured markdown format. It handles scientific papers, technical documents, reports, and any PDF requiring content extraction for AI processing or analysis.
Slash command
/doc-skills:docling-pdf
Convert PDF documents to structured markdown using IBM's Docling library. Extract complete document content including text, figures (as PNG files), and tables (as separate markdown files) in an AI-accessible format optimized for further processing and analysis.
Use this skill when the user needs the text, figures, or tables of a PDF extracted for AI processing or analysis.
Convert a PDF using the bundled script:
uv run --python 3.10 scripts/convert_pdf.py input.pdf output_folder
This produces:
full_document.md - Complete markdown with cleaned references
figures/ - Numbered PNG files (figure_001.png, figure_002.png, etc.)
tables/ - Individual markdown tables (table_001.md, table_002.md, etc.)
metadata.json - Document statistics and conversion timing
The script uses uv to manage dependencies automatically, so no manual installation is required. The script's inline metadata specifies all required packages.
Run the conversion script with the following syntax:
uv run --python 3.10 scripts/convert_pdf.py <pdf_file> <output_folder> [options]
Required arguments:
pdf_file - Path to the input PDF file
output_folder - Directory where output will be saved
Optional arguments:
--image-resolution-scale F - Scale factor for extracted images (default: 2.0)
Examples:
# Basic conversion
uv run --python 3.10 scripts/convert_pdf.py paper.pdf output/
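When several PDFs need converting, the same command line can be assembled from Python. The following is a minimal sketch, not part of the skill itself: `build_command` and `convert_all` are hypothetical helper names, and the only flag used is the `--image-resolution-scale` option documented above.

```python
import subprocess
from pathlib import Path
from typing import List, Optional


def build_command(
    pdf_file,
    output_folder,
    image_resolution_scale: Optional[float] = None,
    script: str = "scripts/convert_pdf.py",
) -> List[str]:
    """Assemble the documented `uv run` invocation as an argument list."""
    cmd = ["uv", "run", "--python", "3.10", script, str(pdf_file), str(output_folder)]
    if image_resolution_scale is not None:
        # Optional flag from the usage section; the script defaults to 2.0.
        cmd += ["--image-resolution-scale", str(image_resolution_scale)]
    return cmd


def convert_all(pdf_dir, output_root, image_resolution_scale: Optional[float] = None) -> None:
    """Convert every PDF in pdf_dir into its own folder under output_root."""
    for pdf in sorted(Path(pdf_dir).glob("*.pdf")):
        out = Path(output_root) / pdf.stem
        # check=True raises CalledProcessError if a conversion fails.
        subprocess.run(build_command(pdf, out, image_resolution_scale), check=True)
```

Calling `convert_all("papers", "output")` would run one conversion per PDF and write each result to `output/<pdf name>/`. This assumes `uv` is on the PATH and the call is made from the skill's root directory.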
The conversion creates a structured output directory:
output_folder/
├── full_document.md # Complete markdown (cleaned references)
├── figures/ # PNG images
│ ├── figure_001.png
│ ├── figure_002.png
│ └── ...
├── tables/ # Markdown tables
│ ├── table_001.md
│ ├── table_002.md
│ └── ...
└── metadata.json # Conversion statistics
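Once a conversion finishes, the output folder can be inspected programmatically. The sketch below relies only on the file and directory names shown in the tree above. `summarize_output` is a hypothetical helper, and because the keys inside metadata.json are not documented here, the file is loaded as-is without assuming any field names.

```python
import json
from pathlib import Path


def summarize_output(output_folder) -> dict:
    """Collect a quick overview of a conversion output folder."""
    root = Path(output_folder)
    metadata_path = root / "metadata.json"
    return {
        # Size of the full markdown document, in characters.
        "markdown_chars": len((root / "full_document.md").read_text(encoding="utf-8")),
        # Extracted figures and tables, sorted by their numbered file names.
        "figures": sorted(p.name for p in (root / "figures").glob("figure_*.png")),
        "tables": sorted(p.name for p in (root / "tables").glob("table_*.md")),
        # Raw metadata; an empty dict if the file is missing.
        "metadata": json.loads(metadata_path.read_text(encoding="utf-8"))
        if metadata_path.exists()
        else {},
    }
```

A caller might use the returned lists to decide which tables or figures to load next, rather than reading the whole folder into context at once.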
Key features:
Figures are saved as numbered PNG files in the figures/ directory
The scripts/convert_pdf.py script is a standalone Python script with inline dependencies.
Important: The script is designed to be run with uv run, which handles environment creation and dependency management automatically. Do not run it directly with python3 unless its dependencies are already installed.
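For readers unfamiliar with inline dependencies: scripts of this kind usually carry a PEP 723 comment block at the top, which uv reads to build the environment. The sketch below shows what such a block might look like and how to extract it. The header in `EXAMPLE_SCRIPT` is an assumption for illustration (the real convert_pdf.py may declare different packages and versions), and `read_inline_metadata` is a hypothetical helper using the regular expression from the PEP 723 reference implementation.

```python
import re
from typing import Optional

# Regular expression taken from the PEP 723 reference implementation.
BLOCK_RE = re.compile(
    r"(?m)^# /// (?P<type>[a-zA-Z0-9-]+)$\s(?P<content>(^#(| .*)$\s)+)^# ///$"
)

# Hypothetical script header; the actual dependency list is not shown in this document.
EXAMPLE_SCRIPT = '''\
# /// script
# requires-python = ">=3.10"
# dependencies = [
#     "docling",
# ]
# ///
print("conversion logic would go here")
'''


def read_inline_metadata(source: str) -> Optional[str]:
    """Return the TOML text of the `script` metadata block, or None if absent."""
    match = BLOCK_RE.search(source)
    if match is None or match.group("type") != "script":
        return None
    lines = match.group("content").splitlines(keepends=True)
    # Strip the leading "# " (or bare "#") comment prefix from each line.
    return "".join(line[2:] if line.startswith("# ") else line[1:] for line in lines)
```

Running the script with plain python3 ignores this block entirely, which is why the dependencies would be missing unless they were installed some other way.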
npx claudepluginhub aeghnnsw/cc-toolkit --plugin doc-skills