Skill

full-blog

This skill should be used when the user asks to "turn this YouTube video into a blog post", "make a full blog from a YouTube URL with images", "유튜브 영상을 블로그로 변환해줘" ("convert this YouTube video into a blog"), "video to blog", "embed slides into the transcript", or wants the transcript PLUS meaningful frame snapshots in an HTML page. Extracts frames by uniform sampling, deduplicates them with a perceptual hash, ranks them with Gemini Flash against transcript context, and renders semantic HTML with clickable YouTube deep-links. For transcript-only output, use the `transcribe` skill instead.

Invocation

How this skill is triggered — by the user, by Claude, or both

Slash command

/yt-ribosome:full-blog <youtube-url> [--out-dir DIR] [--ranker-model gemini-2.5-flash|gemini-2.0-flash] [--max-frames-per-video N] [--sample-interval N] [--batch-size N] [--workers N] [--max-cost-usd N] [--keep-temp] [--no-resume] [--force]

User invocable

Model invocable

Inline context

Default effort

Argument hint

<youtube-url> [--out-dir DIR] [--ranker-model gemini-2.5-flash|gemini-2.0-flash] [--max-frames-per-video N] [--sample-interval N] [--batch-size N] [--workers N] [--max-cost-usd N] [--keep-temp] [--no-resume] [--force]

Tool Access

This skill is limited to the following tools:

Bash, Read, Write, Edit

Context Preview

The summary Claude sees in its skill listing — used to decide when to auto-load this skill

Turn a YouTube video or playlist into an HTML blog post: the transcript text plus

Supporting Files

- conftest.py
- references/ranker-prompt.md
- references/usage.md
- requirements-full-blog.txt
- scripts/frame_extract.py
- scripts/frame_rank.py
- scripts/full_blog.py
- scripts/render_html.py
- scripts/upgrade_frames.py
- tests/fixtures/SOURCE.txt
- tests/fixtures/short_talk.md
- tests/fixtures/short_talk.srt
- tests/fixtures/slide_a.jpg
- tests/fixtures/slide_a_dup.jpg
- tests/fixtures/slide_b.jpg
- tests/test_e2e.py
- tests/test_frame_extract.py
- tests/test_frame_rank.py
- tests/test_render_html.py

SKILL.md

157 lines · ~2.2k tokens

Stats

Language: Python

Stars: 0

Maintenance: Excellent

Last Commit: Jun 1, 2026

Transcribe YouTube to HTML blog with embedded frames

Turn a YouTube video or playlist into an HTML blog post: the transcript text plus ~10–25 meaningful frame snapshots embedded inline at the moments they appear in the source video. The output is a self-contained .html file plus an adjacent folder of .jpg frames.

How it works

Run the bundled script — it does the whole pipeline.

python3 "${CLAUDE_PLUGIN_ROOT}/skills/full-blog/scripts/full_blog.py" "<URL>" [options]

${CLAUDE_PLUGIN_ROOT} is set automatically when this skill runs inside the Claude Code plugin runtime. If invoking the script from a vanilla shell, replace it with the absolute path to the repo root (e.g. /path/to/yt-ribosome).

python3 here must be the interpreter you installed the deps into. On macOS, /usr/bin/python3 and a Homebrew python3 use separate site-packages dirs; if you hit ModuleNotFoundError, double-check which python3 matches the one used for pip install -r requirements-full-blog.txt.
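A quick way to see which interpreter is actually running, and whether a dependency resolves from it, is sketched below. `missing_modules` is a hypothetical helper, and `imagehash` is checked only because it is the one package this page names; the full list lives in requirements-full-blog.txt.

```python
import importlib.util
import sys


def missing_modules(names):
    """Return the module names that this interpreter cannot import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # sys.executable is the interpreter running this check, which may not be
    # the one that `pip install` targeted.
    print("interpreter:", sys.executable)
    # "imagehash" is the only dependency named in this document (assumption:
    # extend the list from requirements-full-blog.txt as needed).
    print("missing:", missing_modules(["imagehash"]))
```

Run it with the same python3 you pass to the command above. If it reports a missing module, installing with `python3 -m pip install -r requirements-full-blog.txt` ties the install to that interpreter.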

Per video, the script:

1. Calls the existing transcribe.py to produce .md + sentence-level .srt.
2. Downloads the video with yt-dlp to a temp dir.
3. Samples one frame every --sample-interval seconds (default 5) with ffmpeg. Uniform sampling (vs. scene-cut) is intentional: pixel-based scene detection misses slide transitions on lecture/code content where the template stays the same and only text changes.
4. Deduplicates near-identical frames with imagehash phash (Hamming ≤ 5). Held slides collapse into a single representative.
5. Batches the survivors to Gemini Flash with the matching transcript window; Gemini filters talking-head / duplicate / low-value frames and writes alt-text + caption for the keepers.
6. Aligns each kept frame to the right markdown paragraph (token overlap of sentence-level srt cues) and emits HTML with <figure> blocks. Adjacent figures inside the same paragraph are grouped into a .figure-row gallery.
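Two of the steps above, perceptual-hash dedup and token-overlap alignment, can be sketched in a few lines. This is an illustration under assumptions, not the code in frame_extract.py or render_html.py: hashes are modelled as plain 64-bit integers (with the imagehash library, subtracting two hash objects yields the same bit distance), each frame is compared only against the last kept frame, and `dedup`, `align_frame`, and `tokens` are hypothetical names.

```python
from __future__ import annotations

import re


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two integer hashes."""
    return bin(a ^ b).count("1")


def dedup(frames: list[tuple[float, int]], max_distance: int = 5) -> list[tuple[float, int]]:
    """Keep a (timestamp, hash) frame only if it differs from the last kept
    frame by more than max_distance bits, so a held slide collapses to one."""
    kept: list[tuple[float, int]] = []
    for timestamp, frame_hash in frames:
        if not kept or hamming(kept[-1][1], frame_hash) > max_distance:
            kept.append((timestamp, frame_hash))
    return kept


def tokens(text: str) -> set[str]:
    """Lowercased word tokens; \\w also matches non-Latin scripts such as Hangul."""
    return set(re.findall(r"\w+", text.lower()))


def align_frame(cue_text: str, paragraphs: list[str]) -> int | None:
    """Index of the paragraph sharing the most tokens with the srt cue text
    spoken at the frame's timestamp, or None when nothing overlaps."""
    cue = tokens(cue_text)
    best_index, best_score = None, 0
    for index, paragraph in enumerate(paragraphs):
        score = len(cue & tokens(paragraph))
        if score > best_score:
            best_index, best_score = index, score
    return best_index
```

A `None` result corresponds to a frame that cannot be placed in any paragraph.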

Cost target: ~$0.10 per 60-minute video with gemini-2.5-flash, or ~$0.03 with gemini-2.0-flash.
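Gemini cost scales with how many frames reach the ranker. As a rough upper bound (dedup removes frames before ranking, and the exact number of frames ffmpeg emits may differ by one), the sample and batch counts follow from the defaults. `sampling_budget` is a hypothetical helper, not part of the script, and it deliberately stops short of a dollar figure because per-frame pricing is not stated here.

```python
import math


def sampling_budget(duration_s: int, sample_interval: int = 5, batch_size: int = 10):
    """Return (sampled_frames, max_ranker_calls) for one video.

    sampled_frames approximates one frame every sample_interval seconds;
    max_ranker_calls assumes every sampled frame survives dedup, which
    overstates the real number of Gemini calls.
    """
    frames = math.ceil(duration_s / sample_interval)
    calls = math.ceil(frames / batch_size)
    return frames, calls


if __name__ == "__main__":
    print(sampling_budget(60 * 60))                      # defaults
    print(sampling_budget(60 * 60, sample_interval=10))  # sparser sampling
```

For a 60-minute video at the defaults this gives 720 sampled frames and at most 72 ranker calls; raising --sample-interval to 10 halves both.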

Steps

1. Confirm prerequisites. yt-dlp, ffmpeg, and Python deps from requirements-full-blog.txt. A Gemini API key (GEMINI_API_KEY or GOOGLE_API_KEY) in env or .env in CWD.
2. Choose options from the user's intent:
   - --out-dir DIR: where to write .html and image folders (default blogs).
   - --ranker-model: gemini-2.5-flash (default) or gemini-2.0-flash (3× cheaper).
   - --max-frames-per-video N: final cap (default 25).
   - --sample-interval N: seconds between uniform samples (default 5). Lower = denser coverage + more Gemini cost; raise to 10 for long lectures where you don't need slide-by-slide capture.
   - --batch-size N: frames per Gemini call (default 10; lower if you hit RPM rate limits).
   - --workers N: parallel videos (default 2; Gemini RPM-aware).
   - --keep-temp: preserve the /tmp/yt-ribosome-blog-*/ workdir after each video (useful for debugging frame extraction or replaying renders without re-downloading).
   - --max-cost-usd N: soft ceiling on estimated total Gemini spend (default 1.00).
   - --no-resume: don't reuse cached /tmp dirs from earlier runs (default off, i.e. reuse enabled).
   - --force: overwrite existing .html.
3. Run the script with the URL and options.
4. Restructure each HTML for readability (see "Restructure for readability" below). The script produces a scaffold: flat paragraphs in a styled template. You turn that scaffold into a real blog post by adding <h2> section headings, a lead paragraph, and dividers.
5. Report the per-video summary the script prints. The machine-readable _run_summary.json is written to the output directory.
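If the run is driven from Python rather than typed into a shell, the "choose options" and "run the script" steps can be sketched as an argument-list builder. `build_command` is a hypothetical wrapper; only the flags and defaults documented above are used, and the fallback of "." for the plugin root is an assumption.

```python
import os
import sys


def build_command(url, *, out_dir="blogs", ranker_model="gemini-2.5-flash",
                  sample_interval=5, max_frames=25, force=False, plugin_root=None):
    """Build the argv list for full_blog.py from the documented options."""
    # Prefer an explicit root, then CLAUDE_PLUGIN_ROOT, then the current dir
    # (the last fallback is an assumption for vanilla-shell use).
    root = plugin_root or os.environ.get("CLAUDE_PLUGIN_ROOT", ".")
    script = os.path.join(root, "skills", "full-blog", "scripts", "full_blog.py")
    cmd = [
        sys.executable,  # same interpreter the deps were installed into
        script,
        url,
        "--out-dir", out_dir,
        "--ranker-model", ranker_model,
        "--sample-interval", str(sample_interval),
        "--max-frames-per-video", str(max_frames),
    ]
    if force:
        cmd.append("--force")
    return cmd
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`. Passing a list rather than a shell string avoids quoting problems with URLs that contain `&` or `?`.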

Restructure for readability

The script renders paragraphs flat — one <p data-srt-start="N"> per transcript paragraph, with figures spliced in between. That's deliberate: deciding the structure of a blog (topic boundaries, lead, hierarchy) is an editorial judgement that belongs to you, not to the renderer.

For each .html the script produced, use Read + Edit to:

1. Read the file and skim the paragraphs. The data-srt-start attribute on each <p> gives the paragraph's start offset in seconds in the source video, so you can group paragraphs by time + topic.
2. Promote a lead paragraph. Take the first 1–3 sentences that frame the video's premise and wrap them as <p class="lead">…</p> (drop cap is automatic). If the first transcript paragraph is throat-clearing ("안녕하세요 여러분, 오늘은…", i.e. "Hello everyone, today…"), tighten it into a 1–2 sentence hook.
3. Insert <h2> section headings at natural topic boundaries, usually 3–6 sections for a typical talk. Headings should be short (2–6 words) and substantive (API의 본질 "the essence of an API", 실생활 비유 "a real-life analogy", 왜 표준이 중요한가 "why standards matter"), not sequential ("Part 1, Part 2"). Use the speaker's words where possible.
4. Split monolithic paragraphs. Transcript paragraphs are often 5–10 sentences glued together; break them at clear conversational pivots so each <p> stays ~2–4 sentences. Preserve data-srt-start on the first piece of a split paragraph; omit it on the continuation pieces.
5. Add <hr class="divider"> between major sections only when the topic really shifts (a triple-dot ornament; don't overuse).
6. Empty the "Additional frames" tail, but only if one was emitted. Search the file for <section class="tail-section">; if absent (common when all frames aligned cleanly), skip this step. Otherwise move each <figure> in that section into the body section matching its data-timestamp, then delete the empty tail <section>. (If a figure truly doesn't belong anywhere, leaving it in the tail is fine, but try to place it first.)
7. Polish the H1 if needed. The default <h1 class="post-title"> is the raw YouTube title (often padded with prefixes like 01. or channel noise). Rewrite it as a clean editorial title if it reads poorly.
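The attribute rule for split paragraphs (keep data-srt-start on the first piece, omit it on continuations) can be sketched as follows. `split_paragraph` is a hypothetical helper; where to break remains an editorial choice, so the break points are passed in rather than computed, and the sentence splitter is a naive assumption that sentences end in `.`, `!`, or `?` followed by whitespace.

```python
import re

P_RE = re.compile(r'\s*<p data-srt-start="(\d+)">(.*?)</p>\s*', re.S)


def split_paragraph(p_html, break_before):
    """Split one <p data-srt-start="N"> element into several <p> elements.

    break_before lists the 0-based sentence indices that should start a new
    paragraph. Only the first piece keeps data-srt-start.
    """
    match = P_RE.fullmatch(p_html)
    if match is None:
        raise ValueError('expected a single <p data-srt-start="N"> element')
    start, text = match.group(1), match.group(2).strip()
    sentences = re.split(r"(?<=[.!?])\s+", text)
    bounds = [0] + sorted(set(break_before)) + [len(sentences)]
    # Skip empty or out-of-range slices so stray indices cannot emit blank <p>s.
    pieces = [" ".join(sentences[a:b]) for a, b in zip(bounds, bounds[1:]) if a < b]
    result = [f'<p data-srt-start="{start}">{pieces[0]}</p>']
    result += [f"<p>{piece}</p>" for piece in pieces[1:]]
    return result
```

In practice the same edit is done by hand with Read + Edit; the sketch only pins down which piece carries the timestamp.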

Do not:

- Rewrite the meaning of paragraphs. This is a transcript-faithful blog, not a summary. Tighten phrasing only where the transcript is obviously speech-disfluent.
- Move, rename, or alter <figure> elements other than their position in the document. The src, alt, caption, data-timestamp, and deep-link href are correct as-emitted.
- Touch the CSS, <head>, or page chrome. Only edit inside <div class="post-body">.
- Translate. If a translation is needed, finish restructuring first, then run the translate skill on the result (it preserves the structure).
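One way to check the figure rule after editing is to compare the set of <figure> blocks before and after, ignoring their order. This is a sketch under assumptions: `figures_unchanged` is a hypothetical helper, figures are assumed not to be nested, and whitespace differences are ignored.

```python
import re
from collections import Counter

FIGURE_RE = re.compile(r"<figure\b.*?</figure>", re.S)


def _normalized_figures(html_text):
    """Multiset of <figure> blocks with runs of whitespace collapsed."""
    return Counter(re.sub(r"\s+", " ", block).strip()
                   for block in FIGURE_RE.findall(html_text))


def figures_unchanged(before_html, after_html):
    """True when both documents contain exactly the same <figure> blocks,
    regardless of where in the document each one sits."""
    return _normalized_figures(before_html) == _normalized_figures(after_html)
```

A `False` result means some src, alt, caption, data-timestamp, or href was altered, or a figure was dropped or duplicated during restructuring.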

The CSS classes the template understands: p.lead (drop-cap lead), h2 / h3 (sectioning), blockquote (pull-quotes for memorable lines), hr.divider (triple-dot ornament), ul/ol (lists), inline <code> for technical terms.

Notes

- transcribe.py is required and runs first; full-blog will fail for videos without captions and without an audio fallback (no transcript = no blog).
- Translate the resulting HTML with the translate skill; it recognizes .html and translates only visible text + alt-text.
- Frames that couldn't be aligned to any paragraph appear in an "Additional frames" tail section rather than being dropped.
- When the Gemini ranker fails after all retries, the run continues in degraded mode: frames are evenly sampled from phash survivors and the output is tagged [DEGRADED] in the summary.
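"Evenly sampled" in degraded mode is not specified further here. A minimal sketch, assuming it means picking evenly spaced indices from the dedup survivors up to a frame cap, could look like this; `evenly_sample` is a hypothetical name and the cap parameter is an assumption.

```python
def evenly_sample(frames, limit):
    """Pick up to `limit` items at evenly spaced positions, keeping order.

    The first and last survivors are always included when limit >= 2.
    """
    if limit <= 0:
        return []
    if len(frames) <= limit:
        return list(frames)
    if limit == 1:
        # A single pick has no spacing to define; take the middle frame.
        return [frames[len(frames) // 2]]
    # step > 1 here, so rounded indices are strictly increasing (no repeats).
    step = (len(frames) - 1) / (limit - 1)
    return [frames[round(i * step)] for i in range(limit)]
```

Spacing by index rather than by timestamp keeps the sketch simple; survivors cluster where the video changes often, so the picks lean toward those stretches.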

Resources

- scripts/full_blog.py: orchestrator (run this; don't reimplement).
- scripts/frame_extract.py: uniform ffmpeg sampling + phash dedup.
- scripts/frame_rank.py: Gemini batched ranker.
- scripts/render_html.py: srt-paragraph alignment + HTML template.
- references/usage.md: options, prerequisites, troubleshooting.
- references/ranker-prompt.md: the Gemini ranker prompt (tunable).
