
video-assemble

Assembles a final recap video by mixing narration audio over source video with ducking, rendering subtitles (SRT/ASS, optionally burned in), and loudness-normalizing. Last stage of the video-recap bundle.

FFmpeg

automation

Popularity

Stars: 283

Forks: 49

Invocation

How this skill is triggered — by the user, by Claude, or both

Slash command

/video-recap-skills:video-assemble

Not user invocable

Model invocable

Inline context

Default effort

Context Preview

The summary Claude sees in its skill listing — used to decide when to auto-load this skill

1. Mixes the narration audio segments onto the source video at their placed times.

Supporting Files

scripts/assemble.py
scripts/export_jianying.py
scripts/lib.py
scripts/timeline.py

SKILL.md

63 lines · ~1.2k tokens

Stats

Language: Python

Stars: 283

Forks: 49

Maintenance: Excellent

Last Commit: Jun 18, 2026


What this does

Mixes the narration audio segments onto the source video at their placed times.
Ducks the original audio under narration (fixed / sidechain / zone modes).
Renders subtitles from the narration placement to subtitles.srt, plus subtitles.ass when burning them into the video (on by default; pass --no-burn-subtitles to disable).
Optional final loudness normalization to a target LUFS.
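To make the mixing step concrete, here is a minimal sketch of how narration clips could be placed at their start times with an ffmpeg filtergraph and optionally loudness-normalized. It is not taken from scripts/assemble.py: the helper name build_mix_filter, the input ordering, and the use of adelay/amix/loudnorm are assumptions for illustration, and ducking is deliberately left out. It also assumes a recent ffmpeg, since amix's normalize option is missing from very old builds.

```python
from __future__ import annotations


def build_mix_filter(starts_ms: list[int], target_lufs: float | None = None) -> str:
    """Hypothetical helper: return an ffmpeg -filter_complex string.

    Input 0 is the source video; inputs 1..N are the narration clips, in the
    same order as starts_ms (each clip's placed start time in milliseconds).
    Ducking of the original audio is deliberately left out of this sketch.
    """
    chains = []
    labels = []
    for i, start in enumerate(starts_ms, start=1):
        # adelay shifts the clip to its placed time; all=1 applies the delay to every channel.
        chains.append(f"[{i}:a]adelay={start}:all=1[n{i}]")
        labels.append(f"[n{i}]")
    # duration=first keeps the mix as long as the original audio track;
    # normalize=0 stops amix from scaling every input down by the input count.
    mix = f"[0:a]{''.join(labels)}amix=inputs={len(labels) + 1}:duration=first:normalize=0"
    if target_lufs is not None:
        # Optional final loudness normalization (single-pass loudnorm).
        mix += f",loudnorm=I={target_lufs}"
    chains.append(mix + "[aout]")
    return ";".join(chains)


# Usage: narration clips placed at 0 s and 1.5 s, normalized to -16 LUFS.
print(build_mix_filter([0, 1500], target_lufs=-16))
```

The resulting string would be passed as ffmpeg -i video.mp4 -i n1.wav -i n2.wav -filter_complex "<graph>" -map 0:v -map "[aout]" -c:v copy out.mp4, keeping the video stream untouched when no subtitles are burned.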

Input contract

<video> — the source video (the original, or edited_source.mp4 in cut mode).
work_dir/tts_meta.json — {segments: [...]} from video-voiceover (each segment carries audio_path, timing, pause_after_ms, and overlaps_speech/placement used for ducking + subtitles).
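A consumer of this contract would typically validate tts_meta.json before rendering. The sketch below is a hypothetical check, not the skill's own loader: the helper names parse_segments and load_segments are invented here, and treating audio_path, timing, and pause_after_ms as required keys is an assumption based only on the field list above (the internal shape of timing is not specified, so it is not inspected).

```python
from __future__ import annotations

import json
from pathlib import Path

# Keys named in the input contract; which ones are strictly required is an assumption.
REQUIRED_KEYS = ("audio_path", "timing", "pause_after_ms")


def parse_segments(meta: dict) -> list[dict]:
    """Validate a parsed tts_meta.json payload and return its segments."""
    segments = meta.get("segments")
    if not isinstance(segments, list):
        raise ValueError("tts_meta.json: expected a top-level 'segments' list")
    for i, seg in enumerate(segments):
        missing = [key for key in REQUIRED_KEYS if key not in seg]
        if missing:
            raise ValueError(f"tts_meta.json: segment {i} is missing {missing}")
    return segments


def load_segments(work_dir: str | Path) -> list[dict]:
    """Read <work_dir>/tts_meta.json and validate it."""
    path = Path(work_dir) / "tts_meta.json"
    return parse_segments(json.loads(path.read_text(encoding="utf-8")))
```

Splitting parsing from file I/O keeps the validation testable without touching the filesystem.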

Run

python3 scripts/assemble.py <video> --work-dir <work_dir> \
  [--recap-stem <name>] [--output-dir <dir>] [--no-burn-subtitles] \
  [--source-video <orig.mp4>] [--export-jianying [--jianying-out <dir>]]
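When driving the script from another Python stage, the documented flags can be assembled into an argv list. This is a hypothetical wrapper: assemble_cmd and run_assemble are invented names, and only the flags shown in the usage line above are used. It runs the script with sys.executable rather than assuming python3 is on PATH.

```python
from __future__ import annotations

import subprocess
import sys


def assemble_cmd(
    video: str,
    work_dir: str,
    *,
    recap_stem: str | None = None,
    output_dir: str | None = None,
    burn_subtitles: bool = True,
    source_video: str | None = None,
    export_jianying: bool = False,
    jianying_out: str | None = None,
) -> list[str]:
    """Build the argv for scripts/assemble.py from the documented flags."""
    # sys.executable reuses the current interpreter instead of assuming "python3" is on PATH.
    cmd = [sys.executable, "scripts/assemble.py", video, "--work-dir", work_dir]
    if recap_stem:
        cmd += ["--recap-stem", recap_stem]
    if output_dir:
        cmd += ["--output-dir", output_dir]
    if not burn_subtitles:
        cmd.append("--no-burn-subtitles")
    if source_video:
        cmd += ["--source-video", source_video]
    if export_jianying:
        cmd.append("--export-jianying")
        # --jianying-out is only meaningful together with --export-jianying.
        if jianying_out:
            cmd += ["--jianying-out", jianying_out]
    return cmd


def run_assemble(video: str, work_dir: str, **options) -> None:
    # check=True raises CalledProcessError on a non-zero exit,
    # for example when the subtitles/libass preflight fails.
    subprocess.run(assemble_cmd(video, work_dir, **options), check=True)
```

For a cut-mode run with a JianYing export, a call might look like run_assemble("edited_source.mp4", "work", source_video="orig.mp4", export_jianying=True).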

Output contract

recap_<stem>.mp4 — the final recap video (written to --output-dir or work_dir's parent). It is the stable output alias, overwritten in place on every run so iterating on the narration refreshes the same file.
work_dir/output.mp4 — the in-place render.
subtitles.srt — narration subtitles; subtitles.ass when burning subtitles (on by default).
timeline.json — backend-neutral multi-track model (video / original-audio / narration / BGM / subtitle tracks with ducking automation). Always written.
assembly_manifest.json — a slim render record: the input/source paths, the cut-mode source fingerprint (proving a stale ambient SOURCE_VIDEO did not leak into a full-mode export), the render settings, and the final output path.
JianYing (剪映) draft folder (recap_<stem>/draft_content.json, draft_info.json, and draft_meta_info.json), written only with --export-jianying.
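subtitles.srt uses the standard SubRip layout: a cue index, an HH:MM:SS,mmm --> HH:MM:SS,mmm line, the text, and a blank line between cues. The sketch below writes that layout from (start_seconds, end_seconds, text) tuples; the tuple shape and the helper names srt_timestamp and to_srt are assumptions for illustration, not the skill's internal representation of narration placement.

```python
from __future__ import annotations


def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(cues: list[tuple[float, float, str]]) -> str:
    """Render (start_s, end_s, text) cues as SRT text; cue numbering starts at 1."""
    blocks = []
    for index, (start, end, text) in enumerate(cues, start=1):
        blocks.append(f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n")
    # Joining with "\n" leaves the blank line SRT expects between cues.
    return "\n".join(blocks)


print(to_srt([(1.5, 3.0, "First narration line"), (4.0, 6.25, "Second line")]))
```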

Notes

Audio is mixed as tracks (like a cut-software timeline): the original audio, an optional BGM bed, and the narration.
Optional JianYing (剪映) export: --export-jianying (or EXPORT_JIANYING=1) converts timeline.json into an editable JianYing draft with the original clips, separate audio tracks, and volume keyframes for the ducking. The exporter is fully decoupled and lazily imported: the ffmpeg render never depends on it, and JianYing does not need to be installed. In cut mode, pass --source-video <orig> so the draft references the real clips. Point --jianying-out at JianYing's drafts root to open the draft in the app. If a draft folder with the same name already contains files, the export writes a numbered sibling folder instead of overwriting it. Media is bundled into the draft folder by default (--jianying-no-bundle-media references it in place instead); bundling is required on macOS, where JianYing is sandboxed and cannot read external paths. Because the draft references the un-burned original, the source's hardcoded subtitles remain visible there; mask them in JianYing if needed.
Subtitle appearance is tunable through settings such as SUBTITLE_FONT_SIZE, SUBTITLE_MARGIN_V, and SUBTITLE_MAX_CHARS, among others.
Ducking and loudness: the original audio rises to IDLE_ORIG_VOLUME in the gaps and ducks to SPEECH_DUCKING_VOLUME under narration, with DUCK_FADE_SECONDS smoothing each transition. Related settings are DUCKING_MODE, ZONE_DUCKING_VOLUME, FINAL_LOUDNORM, and TARGET_LUFS.
BGM (optional): set BGM_PATH to any audio file; it loops to length and ducks under narration (BGM_VOLUME / BGM_DUCKING_VOLUME).
Burning subtitles requires an ffmpeg with subtitles/libass support; assemble (and the recap orchestrator) preflight this and fail fast with a clear message if it is missing.
During original-audio blocks (the gaps between narration), the original dialogue is also burned in as subtitles, wrapped in 「」 to set it apart from narration, so the subtitle band is never blank while the original speaks (SUBTITLE_ORIGINAL_IN_GAPS, on by default). The preferred source is the agent-calibrated original_subtitles.json, a list of [{start, end, text}] entries in output-video time. Without it, a conservative auto-ASR mapping is used: in cut mode the ASR timestamps are remapped from source time to output time via the clip plan, each line is assigned to the single gap it lands in, and lines too dense to read are skipped.
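The fixed ducking behavior described in these notes can be pictured as a gain envelope over time. The sketch below is an assumption-labeled illustration of fixed mode only (sidechain and zone modes are not modeled): original_gain is a hypothetical function, its idle, ducked, and fade parameters stand in for IDLE_ORIG_VOLUME, SPEECH_DUCKING_VOLUME, and DUCK_FADE_SECONDS, the default values are illustrative rather than the skill's defaults, and the linear ramp shape is an assumption.

```python
from __future__ import annotations


def original_gain(
    t: float,
    narration: list[tuple[float, float]],
    idle: float = 1.0,
    ducked: float = 0.2,
    fade: float = 0.3,
) -> float:
    """Gain for the original audio at time t (seconds), fixed-mode ducking.

    narration holds (start, end) intervals in seconds. idle, ducked and fade
    stand in for IDLE_ORIG_VOLUME, SPEECH_DUCKING_VOLUME and DUCK_FADE_SECONDS;
    the default values here are illustrative, not the skill's defaults.
    """
    gain = idle
    for start, end in narration:
        if start <= t <= end:
            return ducked  # fully ducked while narration plays
        if fade > 0 and start - fade < t < start:
            # Linear ramp down: frac is 1 at the start of the fade, 0 when narration begins.
            frac = (start - t) / fade
        elif fade > 0 and end < t < end + fade:
            # Linear ramp back up after the narration ends.
            frac = (t - end) / fade
        else:
            continue
        # If fades from neighboring segments overlap, the quieter gain wins.
        gain = min(gain, ducked + (idle - ducked) * frac)
    return gain


narration = [(2.0, 4.0), (6.0, 9.5)]
print([round(original_gain(t, narration), 2) for t in (0.0, 1.85, 3.0, 4.15, 5.0)])
```

Sampling such a function at a fixed step yields volume keyframes; a renderer would more likely express the same ramps with ffmpeg filters than evaluate them sample by sample in Python.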

What this skill does NOT do

Does NOT generate narration or synthesize TTS.
Does NOT re-transcribe or alter timing decisions — it consumes placement from tts_meta.json.
Does NOT leave the video stream untouched by default: burning subtitles is on (--no-burn-subtitles to turn it off), and burning re-encodes the video to draw the subtitle band.
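Why burning forces a re-encode: the subtitle renderer draws onto every decoded frame, so the video cannot be stream-copied, while the audio can be. The sketch below builds a hypothetical ffmpeg argv for that step using ffmpeg's libass-based ass filter; the helper name burn_subtitles_cmd, the libx264 encoder, and the CRF value are assumptions, not what scripts/assemble.py necessarily uses, and the path is assumed to need no filtergraph escaping.

```python
from __future__ import annotations


def burn_subtitles_cmd(video: str, ass_path: str, out_path: str, crf: int = 18) -> list[str]:
    """Hypothetical argv for burning an .ass file into a video with ffmpeg.

    Assumes ass_path contains no characters that need filtergraph escaping
    (such as ':', ',', or quotes).
    """
    return [
        "ffmpeg", "-y",
        "-i", video,
        # The ass filter (libass) draws subtitles onto every frame,
        # so the video stream has to be re-encoded...
        "-vf", f"ass={ass_path}",
        "-c:v", "libx264", "-crf", str(crf),
        # ...while the audio stream can be copied unchanged.
        "-c:a", "copy",
        out_path,
    ]


print(" ".join(burn_subtitles_cmd("output.mp4", "subtitles.ass", "burned.mp4")))
```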
