From pdf-simplifier
Convert a PDF into an interactive web reader with three reading modes (ELI5/Medium/Full), chapter navigation, and math rendering. Only activate when invoked directly via /pdf-simplifier.
Slash command
/pdf-simplifier:pdf-simplifier
Converts a PDF into a self-contained static web reader with three reading modes (ELI5, Medium, Full), sidebar navigation, KaTeX math rendering, and extracted images.
- assets/template/app.js
- assets/template/index.html
- assets/template/styles.css
- references/content-prompts.md
- references/manual-structure-format.md
- references/structure-detection.md
- references/subagent-prompt.md
- scripts/assemble_content.py
- scripts/build_content_shell.py
- scripts/detect_structure.py
- scripts/extract_images.py
- scripts/extract_text.py
- scripts/render_template.py
- scripts/validate_sections.py
| Stage | What | Tool | Output |
|---|---|---|---|
| 1 | Extract text + fonts | scripts/extract_text.py | .state/pages.json |
| 2 | Extract images | scripts/extract_images.py | images/ + .state/images.json |
| 3 | Detect structure | scripts/detect_structure.py | .state/structure.json |
| 4 | Build content shell | scripts/build_content_shell.py | .state/content-shell.json + .state/sections/*.txt |
| 5 | Generate content | Claude (subagents) | .state/sections/*.json |
| 6 | Assemble site | scripts/validate_sections.py + scripts/assemble_content.py + scripts/render_template.py | content.json + index.html + styles.css + app.js |
The PDF path is provided as the skill argument (pdf_path). Verify the file exists:
test -f "{pdf_path}" && echo "OK" || echo "NOT FOUND"
If the file doesn't exist, report the error and stop.
Ask the user where they want the output directory. Do not assume a default — wait for their answer.
Optionally, ask if they have a manual structure file (a structure.json) they'd like to use.
python3 -c "import fitz; print(fitz.version)" 2>/dev/null || echo "MISSING"
If pymupdf is missing, ask the user:
pymupdf is required for PDF extraction. Install it with pip install pymupdf?
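The same dependency check can be done without importing the package. This is a minimal sketch using only the standard library; the helper name `has_module` is hypothetical and not part of the skill's scripts.

```python
import importlib.util


def has_module(name: str) -> bool:
    """Return True if a top-level module named `name` can be found, without importing it."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    # PyMuPDF is imported as `fitz`, the same name the shell check above uses.
    print("OK" if has_module("fitz") else "MISSING")
```

Looking up the module spec avoids loading the package just to learn whether it is installed.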
If the output directory already exists, check .state/progress.json:
cat {output}/.state/progress.json 2>/dev/null
If it exists, report what's already done and offer to resume or start fresh.
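A minimal sketch of how the resume point could be derived from the state directory. It assumes the stage names this skill writes to progress.json ("complete", "generating", "content-shell-built") and the state files pages.json and structure.json; the function name and the returned labels are hypothetical, chosen only for illustration.

```python
import json
from pathlib import Path


def resume_point(output_dir: str) -> str:
    """Return a hypothetical label for where a run on `output_dir` should pick up."""
    state = Path(output_dir) / ".state"
    progress_file = state / "progress.json"

    stage = None
    if progress_file.is_file():
        stage = json.loads(progress_file.read_text()).get("stage")

    if stage == "complete":
        return "done"      # offer to regenerate sections or start fresh
    if stage in ("generating", "content-shell-built"):
        return "stage-5"   # generate the sections that are still missing
    if (state / "pages.json").is_file():
        # Text is extracted; what remains depends on whether structure exists.
        if (state / "structure.json").is_file():
            return "stage-4"
        return "stage-3"
    return "stage-1"       # nothing usable yet, start from extraction
```

Checking the recorded stage first and the files second means a partially written state directory still resolves to the earliest safe stage.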
python3 {skill_dir}/scripts/extract_text.py "{pdf_path}" "{output_dir}"
Read the output JSON to confirm page count and outline entries. Report to user:
Extracted text from {N} pages. Found {M} outline/bookmark entries.
python3 {skill_dir}/scripts/extract_images.py "{pdf_path}" "{output_dir}"
Report:
Extracted {N} images.
If the user provided a manual structure file, copy it to .state/structure.json instead of running detection:
cp "{manual_structure_path}" "{output_dir}/.state/structure.json"
Otherwise, run detection:
python3 {skill_dir}/scripts/detect_structure.py "{output_dir}/.state/pages.json" "{output_dir}"
USER INTERACTION POINT: Display the detected structure and ask for confirmation:
Detected structure (method: {method}):
Title: {title}
| Chapter | Sections | Pages |
|---|---|---|
| {ch.title} | {len(ch.sections)} | {first_page}-{last_page} |
| ... | ... | ... |

Does this look correct? You can:
- Proceed with this structure
- Provide a manual structure file (see format with /pdf-simplifier help structure)
- Ask me to adjust specific chapters/sections
If the user wants adjustments, read references/manual-structure-format.md for the schema and help them create one.
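The confirmation table could be built from the structure file along these lines. This is a sketch under assumptions: the keys `chapters`, `sections`, `title`, `startPage` and `endPage` are guesses at the structure.json layout (only the section page fields are named elsewhere in this skill), and `structure_rows` is a hypothetical helper.

```python
def structure_rows(structure: dict) -> list[str]:
    """Build the rows of the chapter confirmation table from a structure dict."""
    rows = ["| Chapter | Sections | Pages |", "|---|---|---|"]
    for ch in structure.get("chapters", []):
        sections = ch.get("sections", [])
        if sections:
            # A chapter spans from its earliest section start to its latest end.
            first_page = min(s["startPage"] for s in sections)
            last_page = max(s["endPage"] for s in sections)
            pages = f"{first_page}-{last_page}"
        else:
            pages = "n/a"
        rows.append(f"| {ch['title']} | {len(sections)} | {pages} |")
    return rows
```

Calling `print("\n".join(structure_rows(structure)))` would give the table shown to the user before they confirm.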
python3 {skill_dir}/scripts/build_content_shell.py "{output_dir}"
Report:
Built content shell: {N} chapters, {M} sections. Math detected: {yes/no}.
This is the core stage where Claude generates the three reading modes.
USER INTERACTION POINT: Before starting, report:
Ready to generate content for {N} sections. This will use subagents to process sections in parallel. Proceed?
- Read .state/progress.json to find which sections are already complete
- Read .state/content-shell.json to get the full section list

Subagent dispatch (process 3-5 sections concurrently):
For each section, spawn an Agent with subagent_type: "general-purpose" and model: "sonnet".
Read the prompt template from references/subagent-prompt.md and fill in the variables:
- {skill_dir}: path to this skill's directory
- {section.id}, {section.title}, {section.startPage}, {section.endPage}: from the content shell
- {hasMath}: from .state/content-shell.json meta
- {output_dir}: the user's chosen output directory

After each batch of subagents completes, update .state/progress.json yourself (do not have subagents write to this file; concurrent writes will corrupt it):
python3 -c "
import json
p = json.load(open('{output_dir}/.state/progress.json'))
p['completedSections'].extend([{list of completed section IDs from this batch}])
p['stage'] = 'generating'
json.dump(p, open('{output_dir}/.state/progress.json', 'w'))
"
Report progress periodically:
Generated {completed}/{total} sections...
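The batching and progress bookkeeping described above could be sketched as follows. Both helpers are hypothetical; the sketch assumes progress.json holds the `completedSections` list and `stage` field used in the update command, and it adds two things that command does not do: skipping IDs that are already recorded, and writing through a temporary file.

```python
import json
from pathlib import Path


def pending_batches(all_ids, completed_ids, batch_size=5):
    """Split the section IDs that are not yet complete into batches."""
    done = set(completed_ids)
    pending = [sid for sid in all_ids if sid not in done]
    return [pending[i:i + batch_size] for i in range(0, len(pending), batch_size)]


def mark_completed(progress_path, batch_ids):
    """Record a finished batch in progress.json, skipping IDs already present."""
    path = Path(progress_path)
    progress = json.loads(path.read_text())
    completed = progress.setdefault("completedSections", [])
    for sid in batch_ids:
        if sid not in completed:
            completed.append(sid)
    progress["stage"] = "generating"
    # Write to a temporary file, then swap it in, so an interrupted
    # write cannot leave a truncated progress.json behind.
    tmp = path.with_suffix(".json.tmp")
    tmp.write_text(json.dumps(progress))
    tmp.replace(path)
    return progress
```

Keeping a single writer for progress.json matches the instruction that subagents must not touch the file, and skipping duplicates keeps the completed count accurate if a batch is retried.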
Once all sections are generated:
Validate sections:
python3 {skill_dir}/scripts/validate_sections.py "{output_dir}"
If errors are reported, log them. Retry failed sections or proceed with what's available.
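To decide which sections to retry, the generated files can be sorted into usable and failed. The sketch below is not the logic of validate_sections.py. It assumes each section file is a JSON object with one non-empty entry per reading mode, and the key names `eli5`, `medium` and `full` are hypothetical.

```python
import json
from pathlib import Path

# Hypothetical key names for the three reading modes.
REQUIRED_MODES = ("eli5", "medium", "full")


def split_sections(sections_dir):
    """Return (ok_ids, failed_ids) for the *.json files in `sections_dir`."""
    ok, failed = [], []
    for path in sorted(Path(sections_dir).glob("*.json")):
        try:
            data = json.loads(path.read_text())
        except json.JSONDecodeError:
            failed.append(path.stem)  # unparseable output, candidate for retry
            continue
        if isinstance(data, dict) and all(data.get(mode) for mode in REQUIRED_MODES):
            ok.append(path.stem)
        else:
            failed.append(path.stem)  # a mode is missing or empty
    return ok, failed
```

The failed IDs can be sent through another round of subagents, while the usable ones proceed to assembly if the user prefers to continue with what is available.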
Assemble content.json:
python3 {skill_dir}/scripts/assemble_content.py "{output_dir}"
Render template:
Read .state/content-shell.json to get the title and meta.hasMath flag, then:
python3 {skill_dir}/scripts/render_template.py \
"{skill_dir}/assets/template" \
"{output_dir}" \
"{title}" \
"{has_math}"
This copies template files, safely substitutes the title, and bundles KaTeX locally when math is detected (falls back to CDN if download fails).
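Safe title substitution usually means escaping the title before it is placed in the HTML. A minimal sketch of that idea follows; the `{{TITLE}}` placeholder and the function name are assumptions for illustration, not the actual contents of render_template.py or the template.

```python
import html


def render_title(template: str, title: str, placeholder: str = "{{TITLE}}") -> str:
    """Insert `title` into `template`, escaping HTML special characters first."""
    return template.replace(placeholder, html.escape(title))
```

Escaping matters because PDF titles can contain characters such as `<` or `&`, which would otherwise break the page or inject markup.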
Update progress:
python3 -c "
import json
p = json.load(open('{output_dir}/.state/progress.json'))
p['stage'] = 'complete'
json.dump(p, open('{output_dir}/.state/progress.json', 'w'))
"
Report completion:
Reader generated at {output_dir}/
To view it:
cd {output_dir} && python3 -m http.server 8000
Then open http://localhost:8000
When invoked on an existing output directory:
- Read .state/progress.json
- If stage is "complete" → offer to regenerate specific sections or start fresh
- If stage is "generating" → find incomplete sections and resume from there
- If stage is "content-shell-built" → resume from Stage 5
- If .state/pages.json exists but no structure.json → resume from Stage 3
- If .state/pages.json exists and structure.json exists → resume from Stage 4

- The {skill_dir} placeholder refers to the directory containing this SKILL.md file. Resolve it relative to the skill's location in the plugin.
- Requires pymupdf (pip install pymupdf).

npx claudepluginhub callumthomas/claude-tools --plugin pdf-simplifier