Skill

video-recap

Generates Chinese-narration recap videos from source files. Orchestrates video understanding, narration writing, scene cutting, voiceover synthesis, and final assembly using a single MiMo API key and ffmpeg.

Invocation

How this skill is triggered — by the user, by Claude, or both

Slash command

/video-recap-skills:video-recap

User invocable

Model invocable

Inline context

Default effort

Supporting Files

references/config-playbook.md
references/data-schema.md
references/timeline-and-jianying.md
scripts/doctor.py
scripts/lib.py
scripts/recap.py

SKILL.md

90 lines · ~968 tokens

Stats

Language: Python
Stars: 283
Forks: 49
Maintenance: Excellent
Last Commit: Jun 18, 2026

What this is

A thin orchestrator over five independent, self-contained skills (each in skills/, sharing only JSON/MP4 artifacts in a work_dir — no shared code):

video-understanding ─▶ (agent writes narration.json per video-script) ─▶ [video-cut] ─▶ video-voiceover ─▶ video-assemble

It is resume-safe: rerun the same command after writing narration.json to continue. Phase B validates recap_run_manifest.json so an old work_dir from another source video or different run settings is rejected instead of silently reusing stale narration. Understanding artifacts are reused only when their provenance matches. For per-stage detail, read each skill's own SKILL.md.
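The stale-work_dir check described above can be pictured with a minimal sketch. The manifest fields used here (source_path, source_size, settings) and both helper names are hypothetical, chosen only for illustration; the real layout of recap_run_manifest.json is defined by the skill itself (see references/data-schema.md), not by this example.

```python
import json
import os


def build_manifest(source_path, settings):
    """Describe a run by its source file and settings (hypothetical fields)."""
    return {
        "source_path": os.path.abspath(source_path),
        "source_size": os.path.getsize(source_path),
        "settings": dict(sorted(settings.items())),
    }


def check_work_dir(work_dir, source_path, settings):
    """Record the manifest on first use; reject a work_dir from another run."""
    manifest_path = os.path.join(work_dir, "recap_run_manifest.json")
    current = build_manifest(source_path, settings)
    if not os.path.exists(manifest_path):
        # First run in this work_dir: remember what it was created for.
        os.makedirs(work_dir, exist_ok=True)
        with open(manifest_path, "w", encoding="utf-8") as fh:
            json.dump(current, fh, ensure_ascii=False, indent=2)
        return True
    with open(manifest_path, encoding="utf-8") as fh:
        previous = json.load(fh)
    if previous != current:
        # Refuse to reuse artifacts produced for a different source or settings.
        raise ValueError("work_dir belongs to a different source or run settings")
    return True
```

Comparing the whole manifest rather than individual fields means any change in the recorded settings invalidates the directory, which matches the "rejected instead of silently reusing" behavior described above.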

Install / env

# ffmpeg: brew install ffmpeg | apt install ffmpeg | choco install ffmpeg
export MIMO_API_KEY=***          # ONE key drives ASR + VLM + TTS (all MiMo)

The whole pipeline runs on ffmpeg plus a single MiMo key, which covers ASR (mimo-v2.5-asr), VLM (mimo-v2.5), and TTS (mimo-v2.5-tts). Token Plan keys (prefix tp-) default to the cn cluster (see MIMO_TOKEN_PLAN_CLUSTER). Pass --mimo-video-overview to enable the optional MiMo scene-chunk video understanding.
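Before a first run it can help to confirm both prerequisites are in place. The sketch below is an illustrative preflight and not the skill's own scripts/doctor.py; the function name preflight and its injectable env and which parameters are assumptions made so the check is easy to test.

```python
import os
import shutil


def preflight(env=None, which=shutil.which):
    """Return a list of problems; an empty list means both prerequisites are met."""
    env = os.environ if env is None else env
    problems = []
    if which("ffmpeg") is None:
        problems.append("ffmpeg not found on PATH")
    if not env.get("MIMO_API_KEY"):
        problems.append("MIMO_API_KEY is not set")
    return problems


if __name__ == "__main__":
    for problem in preflight():
        print("missing:", problem)
```

For the real diagnostics, the Self-check command (python3 scripts/recap.py --doctor) remains the authoritative route.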

Overridable defaults (zero-config otherwise): see references/config-playbook.md.

Use

0. Research first (recommended)

If you can identify the source (show, film, topic), research it before analyzing and write work_dir/background_research.json (see video-understanding/references/research-guide.md). video-understanding folds it into the VLM context, so scene analysis can name characters and read scenes with plot knowledge instead of labelling everyone "黑衣男子" ("man in black"). Skip this step if the source cannot be identified or researched.

1. Analyze → pause for narration

python3 scripts/recap.py <video> --work-dir <work_dir> --context "背景"          # "背景" = background; replace with your own context text

Runs video-understanding (using background_research.json if you wrote it), writes agent_narration_brief.md, and pauses. Then write work_dir/narration.json following the video-script skill (read the brief first). Cut mode (--edit-mode cut --target-duration 10m) also requires clip_plan.json.
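Since the second run depends on files the agent writes by hand, a quick check of what is still missing can save a failed rerun. This is a minimal sketch under the file names given above; the helper name missing_agent_files is hypothetical, and the check only tests for existence, not for valid content.

```python
import os


def missing_agent_files(work_dir, edit_mode="full"):
    """List the agent-authored files that must exist before rerunning recap.py."""
    required = ["narration.json"]
    if edit_mode == "cut":
        # Cut mode additionally needs the clip plan.
        required.append("clip_plan.json")
    return [name for name in required
            if not os.path.exists(os.path.join(work_dir, name))]
```

An empty list only means the files are present; whether the narration is acceptable is decided by the pipeline's own validation on the rerun.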

2. Continue → produce the recap

Rerun the same command (narration.json now exists):

python3 scripts/recap.py <video> --work-dir <work_dir>          # [--edit-mode cut] [--no-burn-subtitles]

This validates the narration, builds edited_source.mp4 (cut mode only), synthesizes the voiceover, and assembles recap_<name>.mp4.

Self-check

python3 scripts/recap.py --doctor

Output

recap_<video>.mp4: final video
subtitles.srt / .ass: subtitles
work_dir/: all intermediate artifacts (the inter-skill contract; see references/data-schema.md)

Options (passed through to the stage skills)

--context, --scene-threshold, --style, --edit-mode {full,cut}, --target-duration, --skip-asr, --mimo-video-overview, --consolidate, --consolidate-asr, --mimo-tts-voice, --no-burn-subtitles (burn is on by default), --output-dir.
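When the command is launched from another script, assembling the argument list programmatically avoids quoting mistakes. The sketch below uses only flags listed above; the function name recap_command and its keyword parameters are hypothetical conveniences, and it covers just a subset of the options.

```python
def recap_command(video, work_dir, context=None, edit_mode="full",
                  target_duration=None, burn_subtitles=True):
    """Build an argv list for scripts/recap.py from a few of its documented flags."""
    argv = ["python3", "scripts/recap.py", video, "--work-dir", work_dir]
    if context:
        argv += ["--context", context]
    if edit_mode != "full":
        argv += ["--edit-mode", edit_mode]
    if target_duration:
        argv += ["--target-duration", target_duration]
    if not burn_subtitles:
        # Burning subtitles is on by default, so only the opt-out flag exists.
        argv.append("--no-burn-subtitles")
    return argv
```

The returned list can be passed directly to subprocess.run(argv, check=True), which sidesteps shell quoting for context strings that contain spaces or non-ASCII text.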

What this skill does NOT do

Does NOT write narration.json / clip_plan.json — the agent authors those (see the video-script skill).
Does NOT hard-block on the narration review (advisory; validate.py is the hard gate).
Is NOT an unattended scheduler — it is human-in-the-loop and posts to no channel.
Shares NO code between stage skills — they communicate only through work_dir artifacts.
