From yt-ribosome
Slash command: /yt-ribosome:epub-image-index
Build a tutor-facing index that maps every meaningful image in a book to where it lives (a navigation breadcrumb) and what surrounds it (caption, heading, nearby text), so a teaching agent can pull the right visual when explaining a topic.
This is an agentic skill, not a one-shot program. A thin script
(epub_extract.py) does only the deterministic plumbing; you — the agent —
do the judgment: inspect how this particular book marks its images, choose a
tagging rule, and write the final index. EPUBs vary enormously (textbooks,
novels, manga, papers), so a fixed classifier would be brittle. You have the
book in front of you — adapt to it.
EPUB extraction gives structure for free: every image already carries its breadcrumb and context. Keeping a well-organized image therefore costs nothing, while dropping one risks losing something the tutor would have wanted.
Do not filter images out. Tag each with a type and keep them all. Even
inline math images are worth keeping as exact references (a faithful equation
image can beat a re-typeset LaTeX that transcribes wrong). The tutor agent
filters by type at use-time; that's a softer, reversible decision than
deleting at extraction time.
Run the extractor (deterministic; no judgment):
python3 "${CLAUDE_PLUGIN_ROOT}/skills/epub-image-index/scripts/epub_extract.py" "<EPUB_OR_DIR>" [--out-dir DIR]
For each book it writes <out>/<Book Title>/extract.json + images/ (every
referenced image copied once) and prints a one-line profile per book:
N unique images / M occurrences | K in <figure> | C captioned.
Inspect how this book marks images. Read the profile and skim a few
extract.json entries. Decide which signals are reliable for this book:
- Mostly <figure> (K large)? Then in_figure is your gold signal and bare <img> is mostly math/decoration.
- Mostly bare <img> (K≈0)? Then lean on dimensions and captions instead.
- Look at image dimensions for in_figure vs not, and at sample captions, to pick any size threshold you need.

Tag every image with a type, e.g. figure / equation / photo /
diagram / decorative / cover. Apply a per-book rule in bulk (read
extract.json, write a tiny snippet, or extend the extractor): you are not
classifying images one by one, you are choosing the rule after seeing the
book. Keep all images; tagging organizes, it doesn't delete.
Write the tutor-facing index.json next to extract.json, keyed for
topic lookup (see schema). Organize by breadcrumb — it's free.
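The tagging step above could be sketched roughly as follows. This is a hypothetical per-book rule, not the extractor's own logic: the function names, the `figure_heavy` flag (set by you after reading the K value in the profile), and the 200-pixel `min_side` threshold are all assumptions for illustration. Only the field names (`occurrences`, `in_figure`, `caption`, `width`, `height`) come from the extract.json description.

```python
def tag_image(entry, figure_heavy, min_side=200):
    """Return a type tag for one extract.json image entry (illustrative rule only)."""
    occurrences = entry.get("occurrences", [])
    in_figure = any(o.get("in_figure") for o in occurrences)
    captioned = any(o.get("caption") for o in occurrences)
    longest = max(entry.get("width") or 0, entry.get("height") or 0)
    small = longest < min_side  # min_side is an assumed threshold; tune it per book

    if in_figure or captioned:
        return "figure"
    if figure_heavy:
        # Book wraps real figures in <figure>, so bare images are mostly math/decoration.
        return "equation" if small else "decorative"
    # Book rarely uses <figure>: fall back on dimensions.
    return "equation" if small else "figure"


def tag_all(entries, figure_heavy, min_side=200):
    """Tag every entry and keep them all; nothing is filtered out."""
    return [{**e, "type": tag_image(e, figure_heavy, min_side)} for e in entries]
```

The rule is chosen once per book and applied in bulk; `tag_all` returns as many entries as it received, so the tutor can still filter by type later.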
If the extractor errors on an unusual EPUB, it's a starting point, not sacred — inspect the file and adapt inline. Errors here are small and recoverable; handle them as you go.
Each extract.json image entry has: source_image, image (copied path), width/height, and occurrences[];
each occurrence has: breadcrumb (TOC path), nearest_heading, in_figure,
css_class, role (epub:type), alt, caption, context_before,
context_after, source_file, spine_index. Plus top-level book + toc.
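For orientation, one entry might look roughly like the dict below. The field names are the ones listed above; every value is invented, and the value types (for example, breadcrumb as a list of strings) are assumptions that should be checked against a real extract.json.

```python
# Hypothetical example entry: field names from the list above, values made up.
sample_entry = {
    "source_image": "OEBPS/images/fig3_1.png",
    "image": "images/fig3_1.png",  # copied path
    "width": 1200,
    "height": 800,
    "occurrences": [
        {
            "breadcrumb": ["Chapter 3", "Descriptive Statistics"],  # TOC path (assumed list form)
            "nearest_heading": "3.2 Histograms",
            "in_figure": True,
            "css_class": "figure-img",
            "role": "",  # epub:type, if any
            "alt": "Histogram of exam scores",
            "caption": "Figure 3.1 Distribution of exam scores",
            "context_before": "The histogram below shows...",
            "context_after": "Notice the right skew...",
            "source_file": "OEBPS/ch03.xhtml",
            "spine_index": 5,
        }
    ],
}
```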
- Some books put images in plain <div>/<p>, not <figure>. When a book wraps real figures in <figure>, that's the cleanest figure signal and bare images are mostly equations/decoration. Don't assume every book does this.
- breadcrumb is the authoritative location; the nearest DOM heading is a finer hint that can differ from the TOC. Both are provided.
- Tag cover and front-matter images as cover/frontmatter; don't mix them into topic figures.

index.json schema (tutor-facing)

Top level: book, toc, figure_count (or image_count), and images[].
Each image: id, image (path), type (your tag), breadcrumb,
chapter/section (convenience), nearest_heading, caption, alt,
description (caption → alt → context snippet), context_before/after,
width/height, occurrences. Optionally a by_topic/by_breadcrumb reverse
index so the tutor can look up visuals for a topic quickly.
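A minimal sketch of building that index and looking images up by topic, under stated assumptions: the helper names, the `img-0001` id format, the `" > "` breadcrumb separator, and the 160-character snippet length are all hypothetical, and image-level fields are taken from the first occurrence. The chapter/section convenience fields are left out for brevity.

```python
import json


def describe(occ):
    """Fallback chain from the schema: caption, then alt, then a context snippet."""
    snippet = (occ.get("context_before") or occ.get("context_after") or "")[:160]
    return occ.get("caption") or occ.get("alt") or snippet


def breadcrumb_key(breadcrumb):
    """Flatten a breadcrumb (list or string) into one lookup key."""
    if isinstance(breadcrumb, (list, tuple)):
        return " > ".join(breadcrumb)
    return str(breadcrumb or "")


def build_index(book, toc, tagged_entries):
    """Turn tagged extract.json entries into a tutor-facing index dict."""
    images, by_breadcrumb = [], {}
    for n, entry in enumerate(tagged_entries, start=1):
        occurrences = entry.get("occurrences", [])
        first = occurrences[0] if occurrences else {}
        record = {
            "id": f"img-{n:04d}",
            "image": entry.get("image"),
            "type": entry.get("type"),
            "breadcrumb": first.get("breadcrumb"),
            "nearest_heading": first.get("nearest_heading"),
            "caption": first.get("caption"),
            "alt": first.get("alt"),
            "description": describe(first),
            "context_before": first.get("context_before"),
            "context_after": first.get("context_after"),
            "width": entry.get("width"),
            "height": entry.get("height"),
            "occurrences": occurrences,
        }
        images.append(record)
        by_breadcrumb.setdefault(breadcrumb_key(record["breadcrumb"]), []).append(record["id"])
    return {"book": book, "toc": toc, "image_count": len(images),
            "images": images, "by_breadcrumb": by_breadcrumb}


def find_images(index, topic, skip_types=("equation",)):
    """Case-insensitive substring match over breadcrumb, caption, and description."""
    topic = topic.lower()
    hits = []
    for rec in index["images"]:
        if rec["type"] in skip_types:
            continue
        haystack = " ".join([breadcrumb_key(rec["breadcrumb"]),
                             rec["caption"] or "", rec["description"] or ""]).lower()
        if topic in haystack:
            hits.append(rec)
    return hits
```

The result of `build_index` can be written next to extract.json with `json.dump`. Filtering happens only in `find_images`, at use-time, so passing `skip_types=()` brings equation images back without re-running anything.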
- Find candidates by matching the topic against breadcrumb / caption / context (or a reverse index), filtering by type (e.g. skip equation unless a visual is wanted).
- Show images/<file> and narrate with the caption + breadcrumb ("Here's Figure 3.1 from Chapter 3 → Descriptive Statistics…").

- Entity declarations (<!ENTITY>) are rejected to block XXE / billion-laughs; no third-party parser.
- No fig*.jpg filename assumptions, no fixed CSS classes; you supply the per-book judgment.
- Related: epub-to-html (human reading); this skill targets the agent.

- scripts/epub_extract.py: thin deterministic extractor (run it). It is an example/starting point, not a do-everything program: adapt it per book.

npx claudepluginhub ssfskim/yt-ribosome --plugin yt-ribosome