Skill

ve-longform

Stage 10 of the video-essay pipeline. Expand a winning short into a 10-minute long-form video essay. Locks title + thumbnail first, then runs deep research → script → auto-critique → audio → images + QA → video assembly → publish. Use when the winner angle is set in packages.md and long-form video.mp4 is absent.

Invocation

How this skill is triggered — by the user, by Claude, or both

Slash command

/video-essay:ve-longform

User invocable

Model invocable

Inline context

Default effort

Context Preview

The summary Claude sees in its skill listing — used to decide when to auto-load this skill

Takes the winning angle's validated hook and expands it into a 10-minute video essay. The YouTube-native workflow: lock packaging first (title + thumbnail), then build content to deliver on the promise. Ends at `ve-publish-youtube`.

SKILL.md

157 lines · ~1.7k tokens

Stats

Language: Python

Stars: 0

Maintenance: Excellent

Last Commit: Apr 17, 2026


ve-longform — Stage 10

What this skill does

Working with the locked angle: the winner's research_hook_<winner>.md becomes the foundation for Stage 10b's deep research. The winning short's hook line and first-frame concept inform the long-form's title and thumbnail.

10a. Title & Thumbnail Lock

Title and thumbnail are the highest-leverage YouTube artifacts for long-form (unlike shorts where the first frame and cover do that work). Lock packaging first, then build content.

Draft 3 (title, thumbnail) pairs across at least two of the three proven archetypes:

Archetype A — Face-Driven Reaction (MrBeast / Veritasium covenant)
Archetype B — Text-Driven Claim (PolyMatter / Wendover / Mustard covenant)
Archetype C — Annotated Diagram (Johnny Harris / Real Engineering covenant)

Each pair uses this exact format in polish.md:

visual: <subject · composition · mood · palette · lighting — NO text content, the AI image model cannot render text reliably>
text_overlay: <REQUIRED 2–5 words, ≤20 chars total, ALL CAPS. use | to separate lines.>
text_position: upper-left | upper-right | lower-left | lower-right | center | center-bottom
text_color: white | yellow | red
text_stroke: black | white
archetype: face-driven | text-driven | annotated-diagram

Save the 3 pairs to $EPISODES_DIR/<slug>/polish.md (winner: null, judged_by: null).
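The field constraints above are easy to check mechanically before a pair is written to polish.md. The following is a minimal sketch, not part of the pipeline: `validate_pair` is a hypothetical helper, and it assumes the `|` line separator counts as neither a word nor a character toward the 20-character limit (the source does not say either way).

```python
# Allowed values copied from the pair format above.
ALLOWED = {
    "text_position": {"upper-left", "upper-right", "lower-left",
                      "lower-right", "center", "center-bottom"},
    "text_color": {"white", "yellow", "red"},
    "text_stroke": {"black", "white"},
    "archetype": {"face-driven", "text-driven", "annotated-diagram"},
}


def validate_pair(pair: dict) -> list[str]:
    """Return a list of problems; an empty list means the pair passes."""
    problems = []
    overlay = pair.get("text_overlay", "")
    # Assumption: "|" only marks a line break, so it is replaced by a space.
    text = " ".join(part.strip() for part in overlay.split("|"))
    words = text.split()
    if not 2 <= len(words) <= 5:
        problems.append(f"text_overlay has {len(words)} words, expected 2-5")
    if len(text) > 20:
        problems.append(f"text_overlay is {len(text)} chars, expected <=20")
    if text != text.upper():
        problems.append("text_overlay must be ALL CAPS")
    for field, allowed in ALLOWED.items():
        if pair.get(field) not in allowed:
            problems.append(f"{field}={pair.get(field)!r} not in {sorted(allowed)}")
    if not pair.get("visual", "").strip():
        problems.append("visual is empty")
    return problems
```

Running this on each of the 3 drafts before rendering catches over-long overlays early, when they are cheap to fix.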

Eval-optimize loop. Render each pair's thumbnail with the --text overlay applied, run the long-form LLM judge (references/judge_rubric.md) on the renders, refine, then generate 3 variants of the winning pair. See the v3.1 SKILL.md history for the full Round-1/2/3 loop logic. Final outputs:

thumbnail.png (canonical, used by ve-publish-youtube)
thumbnail_V{1,2,3}__<title-slug>.png (titled variants kept for reference — title in filename per the durable rule: thumbnail + title is one indivisible artifact)

Checkpoint: user reviews the 3 variants (with title pairing in filenames) and picks one.
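The variant naming rule can be illustrated with a short sketch. The slug convention here (lowercase, hyphen-separated) is an assumption; the source only specifies the `thumbnail_V{1,2,3}__<title-slug>.png` pattern, and both function names are hypothetical.

```python
import re


def title_slug(title: str) -> str:
    """Lowercase the title and collapse non-alphanumeric runs into hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")


def variant_filenames(title: str) -> list[str]:
    """Build the three titled variant filenames for one locked title."""
    slug = title_slug(title)
    return [f"thumbnail_V{n}__{slug}.png" for n in (1, 2, 3)]
```

For example, `variant_filenames("Why Ships Still Sink")` yields `thumbnail_V1__why-ships-still-sink.png` through `thumbnail_V3__why-ships-still-sink.png`, which keeps the title visible next to each thumbnail during review.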

10b. Deep research scoped to the winning thesis

Run the deep-research thorough-mode flow per references/deep_research/SKILL.md, narrowed to the winning angle's thesis. Write the output to $EPISODES_DIR/<slug>/research_deep.md. Do not overwrite research_hook_<winner>.md from Stage 4 or research_angles.md from Stage 2; all three files (angles, hook, deep) must coexist.

10c. Script draft + auto-critique

Read:

references/style_guide.md
At least one reference transcript from references/script_examples/ matching the intended mode (Wendover dense explainer / PolyMatter essayistic / Tom Scott narrative voice)
polish.md (locked title/thumbnail)
research_deep.md

Pick ONE cold-open model (declarative / setup-subvert / self-referential mystery) and ONE transition pattern (additive "but" / essayistic qualifier / first-person memory). Commit — don't mix.

Write $EPISODES_DIR/<slug>/script.md:

---
episode: <NNN>
slug: <slug>
title_working: "<locked title from polish.md>"
target_length_minutes: 10
target_words: 1500
wpm: 150
mode: dense_explainer | essayistic | narrative_voice
cold_open_model: declarative | setup_subvert | self_referential
transition_pattern: but_additive | essayistic_qualifier | first_person_memory
---

# Working Title

[VISUAL: opener — concrete scene]

First paragraph. Names something specific. Has a number or proper noun.

[VISUAL: next cue]

Next paragraph. Etc.

Place each [VISUAL: ...] tag before the paragraph it accompanies. Data-heavy cues marked [VISUAL: [HTML] ...] are rendered as HTML→PNG at Stage 10e.
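The frontmatter encodes a simple budget: 10 minutes × 150 wpm = 1500 words. The sketch below shows one way to compare a draft against that budget and list its HTML cues. It is an illustration under stated assumptions, not the pipeline's own tooling: `split_frontmatter` and `analyze_script` are hypothetical names, the frontmatter parser handles only flat `key: value` lines, and headings and [VISUAL: ...] lines are assumed not to count as spoken words.

```python
import re

VISUAL_TAG = re.compile(r"^\[VISUAL:\s*(.*)\]\s*$")


def split_frontmatter(text: str) -> tuple[dict, str]:
    """Split a script into (frontmatter fields, body)."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text
    fields = {}
    for i, line in enumerate(lines[1:], start=1):
        if line.strip() == "---":
            return fields, "\n".join(lines[i + 1:])
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip().strip('"')
    return {}, text  # no closing fence: treat everything as body


def analyze_script(text: str) -> dict:
    """Count spoken words, compute the word budget, and collect visual cues."""
    fields, body = split_frontmatter(text)
    cues, spoken = [], []
    for line in body.splitlines():
        stripped = line.strip()
        match = VISUAL_TAG.match(stripped)
        if match:
            cues.append(match.group(1))
        elif stripped and not stripped.startswith("#"):
            spoken.append(stripped)
    words = sum(len(paragraph.split()) for paragraph in spoken)
    minutes = float(fields.get("target_length_minutes", 10))
    wpm = float(fields.get("wpm", 150))
    return {
        "words": words,
        "budget": round(minutes * wpm),
        "estimated_minutes": words / wpm,
        "cue_count": len(cues),
        "html_cues": [cue for cue in cues if cue.startswith("[HTML]")],
    }
```

A draft whose `words` lands well above `budget` will run long at the stated pace, and the `html_cues` list doubles as the to-do list for the HTML→PNG renders in Stage 10e.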

Auto-critique substep. Run TWO passes:

Pass 1: Self-review — ctrl-F banned phrases, count words, spot-check facts against research_deep.md.
Pass 2: LLM retention judge — Agent invocation with references/script_judge_rubric.md, references/mrbeast_distillation.md, references/style_guide.md, the script, the locked thumbnail (Read the PNG into visual context). Two phases: retention simulation + prose quality. Retention failures take priority.

Apply judge fixes; present revised script to user.

Checkpoint: user reads, redlines, says "lock it."
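The "ctrl-F banned phrases" step of Pass 1 can be sketched as a case-insensitive scan. The real phrase list presumably lives in references/style_guide.md and is not reproduced here, so the list passed in below is a placeholder and `find_banned` is a hypothetical helper.

```python
def find_banned(script: str, banned: list[str]) -> list[tuple[int, str]]:
    """Return (line_number, phrase) for every case-insensitive hit."""
    hits = []
    for number, line in enumerate(script.splitlines(), start=1):
        lowered = line.lower()
        for phrase in banned:
            if phrase.lower() in lowered:
                hits.append((number, phrase))
    return hits


# Placeholder list for illustration only; load the real one from the style guide.
EXAMPLE_BANNED = ["time is money"]
```

Reporting line numbers rather than a pass/fail flag makes the hits easy to redline during the script-lock checkpoint.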

10d. Generate audio

ve-audio <slug>

Writes audio.mp3. The ElevenLabs voice comes from pipeline/config.toml or from the voice: field in the script frontmatter.

Checkpoint: user plays first 30–60 sec, flags pronunciation issues.

10e. Generate visuals + QA

For the first episode in a new style (or an unlocked style), run an A/B test:

ve-images <slug> --compare

Then full run with chosen preset:

ve-images <slug> --style-preset <preset_name>

For HTML-marked cues, write $EPISODES_DIR/<slug>/html_assets/NNN.html and render via headless Chrome to overwrite the AI-generated images/NNN.png.
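One way to drive the headless-Chrome render is to build the command per cue and hand it to `subprocess.run`. This is a sketch under assumptions: the `google-chrome` binary name, the 1920×1080 viewport, and the zero-padded three-digit reading of NNN are not stated in the source, and `chrome_screenshot_cmd` is a hypothetical helper. The `--headless`, `--screenshot`, and `--window-size` flags are standard Chrome command-line switches.

```python
from pathlib import Path


def chrome_screenshot_cmd(episode_dir, cue, chrome="google-chrome",
                          size=(1920, 1080)):
    """Build the argv that renders html_assets/NNN.html over images/NNN.png."""
    episode = Path(episode_dir)
    html = episode / "html_assets" / f"{cue:03d}.html"
    png = episode / "images" / f"{cue:03d}.png"
    return [
        chrome,
        "--headless",
        f"--screenshot={png}",
        f"--window-size={size[0]},{size[1]}",
        html.resolve().as_uri(),  # Chrome expects a file:// URL
    ]
```

Calling `subprocess.run(chrome_screenshot_cmd(episode_dir, 7), check=True)` would then overwrite images/007.png. Returning the argv separately from running it keeps the path logic easy to inspect before anything is overwritten.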

QA pass:

ve-qa-images <slug> --style-preset <preset_name> --fix

10f. Assemble video

ve-assemble <slug>

Writes video.mp4. Checkpoint: user scrubs through, flags timing issues.

10g. Publish long-form

Write $EPISODES_DIR/<slug>/description.md (hook line, 3–4 sentences setup, source bullet list from research_deep.md).

ve-publish-youtube <slug> --privacy public

Reads video.mp4, thumbnail.png, winning title from polish.md, description.md. Sets metadata, uploads custom thumbnail.
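Because the publish step reads four files, a quick preflight check avoids a failed upload. The sketch below is a hypothetical helper, not part of `ve-publish-youtube`. It checks only that the files exist, not that polish.md actually records a winner.

```python
from pathlib import Path

# The four inputs the publish step reads, per the description above.
REQUIRED = ("video.mp4", "thumbnail.png", "polish.md", "description.md")


def missing_publish_inputs(episode_dir) -> list[str]:
    """Return the required files that are absent from the episode directory."""
    episode = Path(episode_dir)
    return [name for name in REQUIRED if not (episode / name).is_file()]
```

An empty list means all four inputs are present. Anything else names exactly which earlier stage still needs to run.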

Checkpoint: confirm the video appears correctly in YouTube Studio.

Checkpoint discipline

Stage 10 has multiple checkpoints — never skip:

After 10a — thumbnail pick
After 10c — script lock
After 10d — audio review
After 10e — style A/B (if applicable)
After 10f — video assembled

Never auto-advance past these. Human judgment at every creative gate.
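The no-auto-advance rule can be pictured as a gate that refuses to move on without a recorded sign-off. This is a minimal sketch mirroring the five gates listed above. `next_stage` and the `approved` set are hypothetical, and how sign-off is actually recorded is not specified in the source.

```python
STAGES = ["10a", "10b", "10c", "10d", "10e", "10f", "10g"]

# The five gates listed above, keyed by the sub-stage they follow.
CHECKPOINTS = {
    "10a": "thumbnail pick",
    "10c": "script lock",
    "10d": "audio review",
    "10e": "style A/B (if applicable)",
    "10f": "video assembled",
}


def next_stage(current: str, approved: set[str]):
    """Return the stage after `current`, or raise if its gate lacks sign-off."""
    if current in CHECKPOINTS and current not in approved:
        raise PermissionError(
            f"Stage {current} needs human sign-off: {CHECKPOINTS[current]}"
        )
    index = STAGES.index(current)
    return STAGES[index + 1] if index + 1 < len(STAGES) else None
```

Raising an error instead of returning a flag makes a skipped gate fail loudly rather than pass silently.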

What to avoid (long-form specific)

Don't use AI image generation for text-heavy visuals — charts/diagrams/comparisons go through HTML→PNG.
Don't write bromide closings ("time is money"). Re-orient, don't recap.
Don't let AI voice leak through. If a sentence sounds like it could be in any LLM's blog post, rewrite it.
Don't invent visuals the user will have to defend. Every cue should be defensible to a skeptic.
