From paper-siphon
Convert academic PDFs to clean Markdown. Use whenever you encounter a PDF paper (local file or URL) that needs to be read, analyzed, or referenced as text.
Slash command: `/paper-siphon:paper-siphon`
Convert academic PDFs into clean Markdown using **paper-siphon**. Runs via `uvx` — no installation needed.
# Local PDF
uvx paper-siphon paper.pdf
# Remote PDF (e.g. arXiv)
uvx paper-siphon https://arxiv.org/pdf/1706.03762.pdf
# Custom output path
uvx paper-siphon paper.pdf -o paper-notes.md
By default, output is written to the same filename with a .md extension (e.g. paper.pdf → paper.md).
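A script that needs to know where the Markdown will land can mirror this naming rule. The following is a minimal sketch, assuming the tool only swaps a trailing `.pdf` for `.md` and keeps the file in the same directory; `default_output_path` is a hypothetical helper, not part of paper-siphon, and the naming of outputs for URL inputs is not covered.

```shell
# Hypothetical helper: predict the default output path for a local PDF.
# Assumption: a trailing .pdf is replaced with .md and the directory is unchanged.
default_output_path() {
  printf '%s\n' "${1%.pdf}.md"
}
```

When the exact location matters, passing `-o` explicitly avoids relying on this assumption.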
Paper Siphon has multiple extraction pipelines. Pick the right one based on the paper's complexity.
Best for: most papers — standard single/double-column text, simple tables, no heavy math.
uvx paper-siphon paper.pdf
This is the fastest option. It uses Docling to parse PDF structure, strips margin line numbers (common in journal proofs), and normalizes whitespace. Start here — it handles the majority of papers well.
VLM pipeline (--vlm)
Best for: papers with complex layouts (multi-column figures interleaved with text, unusual formatting, scanned documents), or when the default pipeline produces garbled output.
uvx paper-siphon --vlm paper.pdf
This uses a vision-language model to interpret page images directly. It is significantly slower but handles visual complexity that pure text extraction misses.
On Apple Silicon (M-series), this automatically uses MLX acceleration. To disable it:
uvx paper-siphon --vlm --no-mlx paper.pdf
To use the VLM pipeline with MLX dependencies explicitly included:
uvx --with 'paper-siphon[mlx]' paper-siphon --vlm paper.pdf
Formula enrichment (--enrich-formula)
Best for: math-heavy papers where correct LaTeX rendering of equations matters.
uvx paper-siphon --enrich-formula paper.pdf
This post-processes extracted math expressions for better fidelity. Warning: resource-intensive — only enable when the paper's math content is important for the task at hand.
| Paper type | Command |
|---|---|
| Standard text-heavy paper | uvx paper-siphon paper.pdf |
| Complex layout / scanned PDF | uvx paper-siphon --vlm paper.pdf |
| Math-heavy paper | uvx paper-siphon --enrich-formula paper.pdf |
| Math-heavy + complex layout | uvx paper-siphon --vlm --enrich-formula paper.pdf |
Rule of thumb: try the default pipeline first. If the output is garbled or incomplete, escalate to --vlm.
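The decision table above can be expressed as a small lookup. This is only a sketch: `siphon_flags` is a hypothetical helper, and the two yes/no inputs are a judgment the caller makes after looking at the paper, not something paper-siphon detects.

```shell
# Hypothetical helper: map two yes/no answers to the flags from the table.
#   $1 = "yes" if the layout is complex or the PDF is scanned
#   $2 = "yes" if the paper is math-heavy
siphon_flags() {
  if [ "$1" = "yes" ] && [ "$2" = "yes" ]; then
    echo "--vlm --enrich-formula"
  elif [ "$1" = "yes" ]; then
    echo "--vlm"
  elif [ "$2" = "yes" ]; then
    echo "--enrich-formula"
  else
    echo ""   # default pipeline: no extra flags
  fi
}
```

A call such as `uvx paper-siphon $(siphon_flags yes no) paper.pdf` leaves the substitution unquoted on purpose, so that two flags split into separate arguments.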
uvx paper-siphon [OPTIONS] INPUT
| Option | Description |
|---|---|
| INPUT | Path to a local PDF file, or a URL pointing directly to a PDF |
| -o, --output PATH | Custom output file path (default: input filename with .md extension) |
| --vlm | Use vision-language model pipeline for complex layouts |
| --mlx / --no-mlx | Enable/disable Apple Silicon MLX acceleration (default: enabled when available) |
| --enrich-formula | Enrich mathematical expressions (resource-intensive) |
| -v, --verbose | Enable detailed debug logging |
uvx paper-siphon ./downloads/paper.pdf
uvx paper-siphon https://arxiv.org/pdf/1706.03762.pdf
For arXiv, use the /pdf/ URL (not /abs/). For example:
Correct: https://arxiv.org/pdf/1706.03762.pdf
Incorrect: https://arxiv.org/abs/1706.03762
uvx paper-siphon https://arxiv.org/pdf/1706.03762.pdf -o attention.md
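The /abs/ to /pdf/ rule for arXiv links is mechanical enough to script. A minimal sketch follows; `arxiv_pdf_url` is a hypothetical helper, and it assumes abstract URLs have exactly the `https://arxiv.org/abs/<id>` shape shown above.

```shell
# Hypothetical helper: rewrite an arXiv abstract URL into a direct PDF URL.
# Any input that is not an /abs/ URL is passed through unchanged.
arxiv_pdf_url() {
  case "$1" in
    https://arxiv.org/abs/*)
      printf 'https://arxiv.org/pdf/%s.pdf\n' "${1#https://arxiv.org/abs/}"
      ;;
    *)
      printf '%s\n' "$1"
      ;;
  esac
}
```

It can be combined with the converter as `uvx paper-siphon "$(arxiv_pdf_url https://arxiv.org/abs/1706.03762)" -o attention.md`.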
for f in papers/*.pdf; do uvx paper-siphon "$f"; done
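For larger folders, the one-line loop can be extended to skip papers that were already converted and to keep going when one file fails. This is a sketch under stated assumptions: `batch_convert` and the `SIPHON_CMD` variable are hypothetical names, and writing each output next to its source with `-o` is a choice made here rather than documented tool behavior.

```shell
# Command used for conversion; override it (e.g. with a stub) for dry runs.
SIPHON_CMD="${SIPHON_CMD:-uvx paper-siphon}"

# Convert every PDF in directory $1, skipping files that already have a
# non-empty .md next to them and reporting failures without stopping.
batch_convert() {
  for f in "$1"/*.pdf; do
    [ -e "$f" ] || continue        # the glob matched nothing
    out="${f%.pdf}.md"
    if [ -s "$out" ]; then
      echo "skip: $f"
      continue
    fi
    echo "convert: $f"
    # $SIPHON_CMD is left unquoted so "uvx paper-siphon" splits into words.
    $SIPHON_CMD "$f" -o "$out" || echo "failed: $f" >&2
  done
}
```

Running `batch_convert papers` processes the same folder as the loop above; rerunning it after an interruption only converts the files that are still missing.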
scholar-search skill:
uvx paper-siphon https://arxiv.org/pdf/<id>.pdf -o paper.md
Use this skill when a PDF paper (local file or URL) needs to be read, analyzed, or referenced as text.
npx claudepluginhub mrshu/agent-skills --plugin paper-siphon