From yt-ribosome
This skill should be used when the user asks to "turn this YouTube video into a blog post", "make a full blog from a YouTube URL with images", "유튜브 영상을 블로그로 변환해줘", "video to blog", "embed slides into the transcript", or wants the transcript PLUS meaningful frame snapshots in an HTML page. Extracts frames by uniform sampling, deduplicates with perceptual hash, ranks with Gemini Flash against transcript context, and renders semantic HTML with clickable YouTube deep-links. For transcript-only output, use the `transcribe` skill instead.
Slash command
/yt-ribosome:full-blog <youtube-url> [--out-dir DIR] [--ranker-model gemini-2.5-flash|gemini-2.0-flash] [--max-frames-per-video N] [--sample-interval N] [--batch-size N] [--workers N] [--max-cost-usd N] [--keep-temp] [--no-resume] [--force]
Files:
conftest.py
references/ranker-prompt.md
references/usage.md
requirements-full-blog.txt
scripts/frame_extract.py
scripts/frame_rank.py
scripts/full_blog.py
scripts/render_html.py
scripts/upgrade_frames.py
tests/fixtures/SOURCE.txt
tests/fixtures/short_talk.md
tests/fixtures/short_talk.srt
tests/fixtures/slide_a.jpg
tests/fixtures/slide_a_dup.jpg
tests/fixtures/slide_b.jpg
tests/test_e2e.py
tests/test_frame_extract.py
tests/test_frame_rank.py
tests/test_render_html.py

Turn a YouTube video or playlist into an HTML blog post: the transcript text plus
~10–25 meaningful frame snapshots embedded inline at the moments they appear in
the source video. The output is a self-contained .html file plus an adjacent
folder of .jpg frames.
Run the bundled script — it does the whole pipeline.
python3 "${CLAUDE_PLUGIN_ROOT}/skills/full-blog/scripts/full_blog.py" "<URL>" [options]
${CLAUDE_PLUGIN_ROOT} is set automatically when this skill runs inside the
Claude Code plugin runtime. If invoking the script from a vanilla shell,
replace it with the absolute path to the repo root (e.g.
/path/to/yt-ribosome).
python3 here must be the interpreter you installed the deps into. On macOS,
/usr/bin/python3 and a Homebrew python3 use separate site-packages dirs;
if you hit ModuleNotFoundError, double-check which python3 matches the
one used for pip install -r requirements-full-blog.txt.
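A quick way to check the interpreter mismatch described above is to ask python3 itself which binary it is and which modules it can see. This is a minimal sketch: `missing_modules` is a hypothetical helper, and the module names passed to it are placeholders, since the contents of requirements-full-blog.txt are not listed here.

```python
import importlib.util
import sys


def missing_modules(names):
    """Return the top-level module names this interpreter cannot import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # The binary actually running this snippet; it should be the same one
    # that pip installed the dependencies into.
    print("interpreter:", sys.executable)
    # Placeholder names: substitute the import names of the packages listed
    # in requirements-full-blog.txt.
    print("missing:", missing_modules(["json", "not_a_real_module_xyz"]))
```

Installing with `python3 -m pip install -r requirements-full-blog.txt` ties pip to the same interpreter and usually avoids the problem.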
Per video, the script:
1. Runs transcribe.py to produce .md + sentence-level .srt.
2. Downloads the video with yt-dlp to a temp dir.
3. Samples frames every --sample-interval seconds (default 5) with ffmpeg.
   Uniform sampling (vs. scene-cut) is intentional: pixel-based scene detection
   misses slide transitions on lecture/code content where the template stays
   the same and only text changes.
4. Deduplicates near-identical frames with a perceptual hash.
5. Ranks the remaining frames with Gemini Flash against the transcript context.
6. Renders HTML with the selected frames as inline <figure> blocks. Adjacent
   figures inside the same paragraph are grouped into a .figure-row gallery.

Cost target: ~$0.10 per 60-min video with gemini-2.5-flash; ~$0.03 with
gemini-2.0-flash.
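The uniform-sampling and perceptual-hash dedup steps can be sketched in a few lines. This is an illustration, not the code in scripts/frame_extract.py: the function names, the Hamming-distance threshold of 4, and the choice to compare each frame only against the last kept frame are all assumptions, and the hashes are plain integers standing in for real perceptual hashes.

```python
def sample_timestamps(duration_s, interval_s=5):
    """Uniform sample times in seconds: 0, interval, 2*interval, ... < duration."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    return list(range(0, int(duration_s), interval_s))


def hamming(a, b):
    """Number of differing bits between two integer hashes."""
    return bin(a ^ b).count("1")


def dedup(frames, max_distance=4):
    """Drop frames whose hash is within max_distance bits of the last kept frame.

    `frames` is a list of (timestamp_s, hash_int) pairs in time order.
    The threshold is a hypothetical value, not the script's setting.
    """
    kept = []
    for ts, h in frames:
        if not kept or hamming(kept[-1][1], h) > max_distance:
            kept.append((ts, h))
    return kept


if __name__ == "__main__":
    print(sample_timestamps(20))  # [0, 5, 10, 15]
    frames = [(0, 0b11110000), (5, 0b11110001), (10, 0b00001111)]
    # The frame at 5 s differs from the frame at 0 s by one bit, so it is dropped.
    print(dedup(frames))  # [(0, 240), (10, 15)]
```

Because a slide whose template stays fixed still changes its text, a hash comparison on sampled frames can separate slides that a pixel-based scene detector would merge.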
Requires yt-dlp, ffmpeg, and the Python deps from
requirements-full-blog.txt, plus a Gemini API key
(GEMINI_API_KEY or GOOGLE_API_KEY) in the environment or in a .env file in the CWD.

- --out-dir DIR: where to write .html and image folders (default blogs).
- --ranker-model: gemini-2.5-flash (default) or gemini-2.0-flash (3× cheaper).
- --max-frames-per-video N: final cap (default 25).
- --sample-interval N: seconds between uniform samples (default 5).
  Lower = denser coverage + more Gemini cost; raise to 10 for long lectures
  where you don't need slide-by-slide capture.
- --batch-size N: frames per Gemini call (default 10; lower if you hit
  RPM rate limits).
- --workers N: parallel videos (default 2; Gemini RPM-aware).
- --keep-temp: preserve the /tmp/yt-ribosome-blog-*/ workdir after
  each video (useful for debugging frame extraction or replaying renders
  without re-downloading).
- --max-cost-usd N: soft ceiling on estimated total Gemini spend (default 1.00).
- --no-resume: don't reuse cached /tmp dirs from earlier runs (default off, i.e. reuse enabled).
- --force: overwrite existing .html.

After the script finishes, edit each .html to add <h2>
section headings, a lead paragraph, and dividers. A _run_summary.json is written to the output directory.

The script renders paragraphs flat: one <p data-srt-start="N"> per
transcript paragraph, with figures spliced in between. That's deliberate:
deciding the structure of a blog (topic boundaries, lead, hierarchy) is an
editorial judgement that belongs to you, not to the renderer.
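To see the flat structure before editing, it can help to list each paragraph with its start time. The sketch below uses only the standard library and assumes data-srt-start holds a number of seconds; `ParagraphIndex` and `index_paragraphs` are hypothetical names, not part of the plugin.

```python
from html.parser import HTMLParser


class ParagraphIndex(HTMLParser):
    """Collect (start_seconds, text) for each <p data-srt-start="N">."""

    def __init__(self):
        super().__init__()
        self.paragraphs = []
        self._start = None  # start time of the paragraph being read, if any
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        start = dict(attrs).get("data-srt-start")
        if tag == "p" and start is not None:
            self._start = int(float(start))
            self._chunks = []

    def handle_data(self, data):
        if self._start is not None:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if tag == "p" and self._start is not None:
            self.paragraphs.append((self._start, "".join(self._chunks).strip()))
            self._start = None


def index_paragraphs(html):
    """Return the timed paragraphs found in an HTML string, in document order."""
    parser = ParagraphIndex()
    parser.feed(html)
    parser.close()
    return parser.paragraphs


if __name__ == "__main__":
    sample = (
        '<div class="post-body"><p data-srt-start="0">Intro.</p>'
        '<figure><img src="a.jpg"></figure>'
        '<p data-srt-start="42">Next <code>API</code> point.</p></div>'
    )
    for start, text in index_paragraphs(sample):
        print(start, text)
```

Paragraphs without the attribute, such as the continuation pieces of a split paragraph, are skipped, so the listing stays aligned with the original transcript timing.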
For each .html the script produced, use Read + Edit to:
1. Read the body. The data-srt-start attribute
   on each <p> gives you the second mark in the source video, so you can
   group paragraphs by time + topic.
2. Wrap the opening paragraph in <p class="lead">…</p> (drop cap is
   automatic). If the first transcript paragraph is throat-clearing
   ("Hello everyone, today…"), tighten it into a 1–2 sentence hook.
3. Add <h2> section headings at natural topic boundaries, usually
   3–6 sections for a typical talk. Headings should be short (2–6 words) and
   substantive (The Essence of APIs, A Real-Life Analogy, Why Standards Matter), not
   sequential ("Part 1, Part 2"). Use the speaker's words where possible.
4. Split long paragraphs so each <p> stays ~2–4 sentences. Preserve data-srt-start on the first
   piece of a split paragraph; omit it on the continuation pieces.
5. Add <hr class="divider"> between major sections only when the topic
   really shifts (it renders as a triple-dot ornament; don't overuse it).
6. Look for <section class="tail-section">; if absent (common when all
   frames aligned cleanly), skip this step. Otherwise move each <figure>
   in that section into the body section matching its data-timestamp, then
   delete the empty tail <section>. (If a figure truly doesn't belong
   anywhere, leaving it in the tail is fine, but try first.)
7. Check the title. <h1 class="post-title"> is the
   raw YouTube title (often padded with prefixes like 01. or channel
   noise). Rewrite it as a clean editorial title if it reads poorly.

Do not:
- Change <figure> elements other than their position in
  the document. The src, alt, caption, data-timestamp, and deep-link
  href are correct as-emitted.
- Touch the CSS, <head>, or page chrome. Only edit inside
  <div class="post-body">.
- Translate the text yourself. Run the translate skill on the result (it preserves the structure).

The CSS classes the template understands:
p.lead (drop-cap lead), h2 / h3 (sectioning), blockquote
(pull-quotes for memorable lines), hr.divider (triple-dot ornament),
ul/ol (lists), inline <code> for technical terms.
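For the tail-section step, deciding which body section a leftover figure belongs to is a lookup of its timestamp against the section start times. The sketch below assumes data-timestamp is a number of seconds, like data-srt-start, and that each section's start is the data-srt-start of its first paragraph; `section_for` is a hypothetical helper.

```python
from bisect import bisect_right


def section_for(timestamp_s, section_starts):
    """Index of the section whose start is the latest one <= timestamp_s.

    `section_starts` must be sorted ascending. Timestamps earlier than the
    first section fall into section 0.
    """
    return max(bisect_right(section_starts, timestamp_s) - 1, 0)


if __name__ == "__main__":
    starts = [0, 180, 420]  # start seconds of three <h2> sections
    print(section_for(200, starts))  # 1
    print(section_for(420, starts))  # 2
```

Within the chosen section, the figure still has to be placed by hand next to the paragraph it illustrates.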

- transcribe.py is required and runs first; full-blog will fail for videos
  without captions and without an audio fallback (no transcript = no blog).
- To translate the output, use the translate skill: it recognizes
  .html and translates only visible text + alt-text.
- … [DEGRADED] in the summary.

- scripts/full_blog.py: orchestrator (run this; don't reimplement).
- scripts/frame_extract.py: uniform ffmpeg sampling + phash dedup.
- scripts/frame_rank.py: Gemini batched ranker.
- scripts/render_html.py: srt-paragraph alignment + HTML template.
- references/usage.md: options, prerequisites, troubleshooting.
- references/ranker-prompt.md: the Gemini ranker prompt (tunable).

npx claudepluginhub ssfskim/yt-ribosome --plugin yt-ribosome