From video-recap-skills
Generates Chinese-narration recap videos from source files. Orchestrates video understanding, narration writing, scene cutting, voiceover synthesis, and final assembly using a single MiMo API key and ffmpeg.
Slash command: /video-recap-skills:video-recap
A thin orchestrator over five independent, self-contained skills (each in skills/, sharing only
JSON/MP4 artifacts in a work_dir — no shared code):
video-understanding ─▶ (agent writes narration.json per video-script) ─▶ [video-cut] ─▶ video-voiceover ─▶ video-assemble
It is resume-safe: rerun the same command after writing narration.json to continue.
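The resume behaviour can be pictured as a decision made purely from which files already exist in work_dir. The sketch below is a minimal illustration under that assumption; the step names (`understand-and-pause`, `await-clip-plan`, `voiceover-and-assemble`) and the function `next_step` are hypothetical and not taken from recap.py.

```python
from typing import Set


def next_step(files: Set[str], edit_mode: str = "full") -> str:
    """Pick the next action for a rerun, given the file names present in work_dir.

    Hypothetical sketch: the returned step names are illustrative, not recap.py's real states.
    """
    if "narration.json" not in files:
        # First run: video-understanding + agent_narration_brief.md, then pause for the agent.
        return "understand-and-pause"
    if edit_mode == "cut" and "clip_plan.json" not in files:
        # Cut mode also needs the agent-written clip plan before it can continue.
        return "await-clip-plan"
    # Second run: validate narration, (cut) build edited_source.mp4, voiceover, assemble.
    return "voiceover-and-assemble"
```

For a real directory, the file set would come from something like `{p.name for p in Path(work_dir).iterdir()}`. Keeping the decision a pure function of file names is what makes "rerun the same command" safe: nothing depends on in-memory state from the previous run.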
Phase B (the rerun after narration.json is written) validates recap_run_manifest.json, so a work_dir left over from a different source video or different run settings is rejected instead of having its stale narration silently reused. Understanding artifacts are reused only when their provenance matches. For per-stage details, read each skill's own SKILL.md.
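One way to implement such a provenance check is to fingerprint the source video and record the run settings, then compare them against the stored manifest. This is a minimal sketch, assuming a manifest made of two hypothetical top-level keys (`source_sha256`, `settings`); the real recap_run_manifest.json schema is documented in references/data-schema.md and may differ.

```python
import hashlib
import json
from pathlib import Path
from typing import List, Optional


def source_sha256(path: Path) -> str:
    """Fingerprint the source video by content, reading 1 MiB at a time."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def manifest_mismatch(stored: Optional[dict], current: dict) -> List[str]:
    """Return the top-level keys whose values differ; [] means the work_dir may be reused."""
    if stored is None:  # no manifest yet: a fresh work_dir
        return []
    keys = sorted(set(stored) | set(current))
    return [k for k in keys if stored.get(k) != current.get(k)]


def check_work_dir(work_dir: Path, current: dict) -> None:
    """Reject a work_dir created for another source or other settings; otherwise record the manifest.

    `current` should hold only JSON-native values (dict/list/str/number) so it round-trips exactly.
    """
    path = work_dir / "recap_run_manifest.json"
    stored = json.loads(path.read_text(encoding="utf-8")) if path.exists() else None
    diff = manifest_mismatch(stored, current)
    if diff:
        raise RuntimeError(f"{path} does not match this run (differs in: {', '.join(diff)})")
    path.write_text(json.dumps(current, indent=2, sort_keys=True), encoding="utf-8")
```

Hashing content rather than the file name means a renamed copy of the same video still matches, while a different video with the same name is rejected.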
# ffmpeg: brew install ffmpeg | apt install ffmpeg | choco install ffmpeg
export MIMO_API_KEY=*** # ONE key drives ASR + VLM + TTS (all MiMo)
The whole pipeline runs on ffmpeg plus a single MiMo key: ASR (mimo-v2.5-asr), VLM (mimo-v2.5), and
TTS (mimo-v2.5-tts). Token Plan keys (prefix tp-) default to the cn cluster (controlled by MIMO_TOKEN_PLAN_CLUSTER).
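The cluster rule can be read as "look at the key prefix, then let the environment variable override the default". The sketch below encodes that reading; it assumes MIMO_TOKEN_PLAN_CLUSTER acts as an override and that non-Token-Plan keys need no cluster choice, neither of which is confirmed here. The function name is hypothetical.

```python
import os
from typing import Mapping, Optional


def token_plan_cluster(api_key: str, env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return the cluster a key should use, or None when no cluster choice applies (assumption)."""
    env = os.environ if env is None else env
    if not api_key.startswith("tp-"):
        # Not a Token Plan key: assume no cluster selection is needed.
        return None
    # Token Plan keys default to "cn"; the environment variable is assumed to override it.
    return env.get("MIMO_TOKEN_PLAN_CLUSTER", "cn")
```

Passing `env` explicitly keeps the function testable without touching the real process environment.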
Optional: enable MiMo scene-chunk video understanding with --mimo-video-overview.
Defaults need no configuration; to override them, see references/config-playbook.md.
If you can identify the source (show, film, topic), research it before analysis and write
work_dir/background_research.json (see video-understanding/references/research-guide.md).
video-understanding folds this file into the VLM context, so scene analysis can name characters and
interpret scenes with plot knowledge instead of labelling everyone "黑衣男子" ("man in black"). Skip this step if you cannot research the source.
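To make "folds it into the VLM context" concrete, here is a minimal sketch that writes the research file and flattens it into a text block a prompt could include. The field names (`title`, `synopsis`, `characters`, `name`, `appearance`) and both helper functions are hypothetical; the actual schema is defined in video-understanding/references/research-guide.md.

```python
import json
from pathlib import Path


def write_background_research(work_dir: Path, research: dict) -> Path:
    """Write work_dir/background_research.json (field names used here are hypothetical)."""
    work_dir.mkdir(parents=True, exist_ok=True)
    path = work_dir / "background_research.json"
    # ensure_ascii=False keeps Chinese names readable in the file.
    path.write_text(json.dumps(research, ensure_ascii=False, indent=2), encoding="utf-8")
    return path


def research_to_context(research: dict) -> str:
    """Flatten research into a text block that could be prepended to a VLM prompt."""
    parts = [f"Title: {research.get('title', 'unknown')}"]
    if research.get("synopsis"):
        parts.append(f"Synopsis: {research['synopsis']}")
    for person in research.get("characters", []):
        # Pairing a name with a visual cue is what lets the model say "Lin" instead of "man in black".
        parts.append(f"Character: {person['name']} ({person.get('appearance', 'no description')})")
    return "\n".join(parts)


example = {
    "title": "Example Show",
    "synopsis": "A detective investigates a string of thefts.",
    "characters": [{"name": "Lin", "appearance": "wears a black coat"}],
}
```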
python3 scripts/recap.py <video> --work-dir <work_dir> --context "背景"
Runs video-understanding (using background_research.json if you wrote it), writes
agent_narration_brief.md, and pauses. Then write work_dir/narration.json following the
video-script skill (read the brief first).
Cut mode (--edit-mode cut --target-duration 10m) also requires clip_plan.json.
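Before rerunning, it can help to sanity-check narration.json yourself. The checks below (numeric times, segments inside the video, no overlaps, non-empty text) are a minimal sketch of what "validates the narration" might involve; the segment shape `{"start", "end", "text"}` is an assumption, and the authoritative rules live in the video-script skill.

```python
from typing import List


def validate_narration(narration: dict, video_duration: float) -> List[str]:
    """Return human-readable problems with a narration dict; [] means it passed.

    Assumed (hypothetical) shape: {"segments": [{"start": s, "end": s, "text": str}, ...]}.
    """
    segments = narration.get("segments")
    if not isinstance(segments, list) or not segments:
        return ["narration must contain a non-empty 'segments' list"]
    errors: List[str] = []
    prev_end = 0.0
    for i, seg in enumerate(segments):
        start, end = seg.get("start"), seg.get("end")
        text = str(seg.get("text") or "")
        if not isinstance(start, (int, float)) or not isinstance(end, (int, float)):
            errors.append(f"segment {i}: start/end must be numbers")
            continue
        if not 0 <= start < end <= video_duration:
            errors.append(f"segment {i}: [{start}, {end}] is empty or outside 0..{video_duration}")
        if start < prev_end:
            errors.append(f"segment {i}: starts before the previous segment ends ({prev_end})")
        if not text.strip():
            errors.append(f"segment {i}: empty narration text")
        prev_end = max(prev_end, end)
    return errors
```

Returning a list instead of raising on the first problem lets the agent fix every issue in one edit of narration.json.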
Rerun the same command (narration.json now exists):
python3 scripts/recap.py <video> --work-dir <work_dir> # [--edit-mode cut] [--no-burn-subtitles]
This validates the narration, builds edited_source.mp4 (cut mode only), synthesizes the voiceover, and
assembles recap_<name>.mp4.
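In cut mode, edited_source.mp4 is built from the ranges in clip_plan.json. A common ffmpeg technique for keeping several ranges of one input is a `trim`/`atrim` + `concat` filter graph; the sketch below builds such a command. It assumes the clip plan reduces to `(start, end)` pairs in seconds and that the source has an audio stream; it is an illustration of the technique, not necessarily how the video-cut skill does it.

```python
from typing import List, Sequence, Tuple


def concat_filter(clips: Sequence[Tuple[float, float]]) -> str:
    """Build an ffmpeg -filter_complex graph that keeps only the given [start, end) ranges."""
    if not clips:
        raise ValueError("clip plan is empty")
    chains: List[str] = []
    pads: List[str] = []
    for i, (start, end) in enumerate(clips):
        if end <= start:
            raise ValueError(f"clip {i} has end <= start")
        # Trim each stream, then reset timestamps so the pieces butt together cleanly.
        chains.append(f"[0:v]trim=start={start}:end={end},setpts=PTS-STARTPTS[v{i}]")
        chains.append(f"[0:a]atrim=start={start}:end={end},asetpts=PTS-STARTPTS[a{i}]")
        pads.append(f"[v{i}][a{i}]")
    chains.append(f"{''.join(pads)}concat=n={len(clips)}:v=1:a=1[v][a]")
    return ";".join(chains)


def cut_command(src: str, dst: str, clips: Sequence[Tuple[float, float]]) -> List[str]:
    """Argument list suitable for subprocess.run(...); re-encodes with ffmpeg's default codecs."""
    return ["ffmpeg", "-y", "-i", src, "-filter_complex", concat_filter(clips),
            "-map", "[v]", "-map", "[a]", dst]
```

Filtering (rather than stream-copying with `-ss`/`-to`) re-encodes, which is slower but cuts on exact timestamps instead of the nearest keyframe.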
python3 scripts/recap.py --doctor
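The prerequisites named above are ffmpeg and a MiMo key, so an environment check plausibly covers at least those two. This sketch is an assumption about what such a check could look like, not a description of what `--doctor` actually inspects; the function name `doctor` is hypothetical.

```python
import os
import shutil
from typing import Dict, Mapping, Optional


def doctor(env: Optional[Mapping[str, str]] = None) -> Dict[str, bool]:
    """Report whether the two documented prerequisites appear to be present."""
    env = os.environ if env is None else env
    return {
        # shutil.which returns None when the executable is not on PATH.
        "ffmpeg on PATH": shutil.which("ffmpeg") is not None,
        "MIMO_API_KEY set": bool(env.get("MIMO_API_KEY")),
    }


if __name__ == "__main__":
    for check, ok in doctor().items():
        print(f"{'OK     ' if ok else 'MISSING'} {check}")
```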
Outputs:
recap_<video>.mp4: final video
subtitles.srt / .ass: subtitles
work_dir/: all intermediate artifacts (the inter-skill contract; see references/data-schema.md)
Options: --context, --scene-threshold, --style, --edit-mode {full,cut}, --target-duration,
--skip-asr, --mimo-video-overview, --consolidate, --consolidate-asr, --mimo-tts-voice,
--no-burn-subtitles (burn is on by default), --output-dir.
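Two details of these options are easy to get wrong when reimplementing them: a duration string such as `10m` needs converting to seconds, and a default-on behaviour is expressed as a `--no-...` flag. This argparse sketch shows both for a subset of the flags. The accepted duration formats (`10m`, `90s`, `1h30m`, bare seconds) and the `full` default for `--edit-mode` are assumptions, not confirmed recap.py behaviour.

```python
import argparse
import re

# Optional hours, optional minutes, optional seconds (trailing "s" optional).
_DURATION = re.compile(r"^(?:(\d+)h)?(?:(\d+)m)?(?:(\d+(?:\.\d+)?)s?)?$")


def parse_duration(text: str) -> float:
    """Parse '10m', '90s', '1h30m' or a bare number of seconds into seconds (format is an assumption)."""
    m = _DURATION.match(text.strip())
    if not m or not any(m.groups()):
        raise argparse.ArgumentTypeError(f"bad duration: {text!r}")
    hours, minutes, seconds = m.groups()
    return int(hours or 0) * 3600 + int(minutes or 0) * 60 + float(seconds or 0)


def build_parser() -> argparse.ArgumentParser:
    p = argparse.ArgumentParser(prog="recap.py")
    p.add_argument("video", nargs="?")
    p.add_argument("--work-dir")
    p.add_argument("--edit-mode", choices=["full", "cut"], default="full")
    p.add_argument("--target-duration", type=parse_duration)
    # store_false makes burn_subtitles default to True; the flag only turns it off.
    p.add_argument("--no-burn-subtitles", dest="burn_subtitles", action="store_false")
    return p
```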
npx claudepluginhub worldwonderer/video-recap-skills
Writes and validates timestamped Chinese narration scripts for analyzed videos. Use after video-understanding produces analysis files. Outputs validated narration.json.