From video-recap-skills
Writes and validates timestamped Chinese narration scripts for analyzed videos. Use after video-understanding produces analysis files. Outputs validated narration.json.
Slash command: /video-recap-skills:video-script
Authoring + validation of the narration script. The agent writes work_dir/narration.json
following the rules below; then validate.py lints it against the understanding index, and in
full mode time-aligns it to quiet windows.
Read work_dir/agent_narration_brief.md (scenes, durations, quiet windows, char budget) first.
Digest long dialogue via asr_writing_chunks.json; judge "is there speech/a silent slot here?"
via timeline_fusion.json. Check raw vlm_analysis.json / asr_result.json for details.
All timestamps are original-video time.
[
{"start": 5.0, "end": 12.0, "narration": "解说文本。", "pause_after_ms": 250, "overlaps_speech": true}
]
| Field | Meaning |
|---|---|
| start / end | narration start/end in seconds (original-video time) |
| narration | narration text |
| pause_after_ms | pause after the segment in ms, default 250 (keeps a tight rhythm) |
| overlaps_speech | whether the segment overlaps original dialogue; default true for continuous-bed style, false only in true silence |
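The field defaults in the table can be illustrated with a minimal Python sketch. It is not validate.py's implementation: the helper names (`char_budget`, `normalize`) and the `_budget` / `_chars` report keys are hypothetical, and it assumes the `(end - start - 0.25) × 3` expression from the writing rules is the per-segment character budget.

```python
import json

DEFAULT_PAUSE_MS = 250          # default from the field table
DEFAULT_OVERLAPS_SPEECH = True  # default from the field table
CHARS_PER_SECOND = 3            # assumption: the "× 3" factor in the budget rule


def char_budget(start: float, end: float, pause_s: float = 0.25) -> int:
    """Assumed per-segment character budget: (end - start - pause) × 3."""
    return max(0, int((end - start - pause_s) * CHARS_PER_SECOND))


def normalize(segments: list[dict]) -> list[dict]:
    """Fill in field defaults and attach a simple budget report (hypothetical keys)."""
    out = []
    for i, seg in enumerate(segments):
        for key in ("start", "end", "narration"):
            if key not in seg:
                raise ValueError(f"segment {i}: missing {key!r}")
        if seg["end"] <= seg["start"]:
            raise ValueError(f"segment {i}: end must be after start")
        # Defaults first, then the authored values override them.
        full = {
            "pause_after_ms": DEFAULT_PAUSE_MS,
            "overlaps_speech": DEFAULT_OVERLAPS_SPEECH,
            **seg,
        }
        full["_budget"] = char_budget(full["start"], full["end"])
        full["_chars"] = len(full["narration"])
        out.append(full)
    return out


segments = json.loads('[{"start": 5.0, "end": 12.0, "narration": "解说文本。"}]')
report = normalize(segments)
for seg in report:
    print(seg["start"], seg["end"], seg["_chars"], "/", seg["_budget"])
```

A segment whose `_chars` exceeds `_budget` is a candidate for trimming before running the real lint.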
Optionally also author original_subtitles.json, a list of {start, end, text} entries in OUTPUT time: the calibrated
original dialogue burned in during the original-audio gaps (ASR errors and names fixed, only what is actually
spoken there). It is rendered in 「」 to set it apart from narration. If the file is omitted, assemble falls back to a
conservative auto-ASR mapping. See the brief's 原声留白字幕 (original-audio gap subtitles) section.
Writing rules:
- overlaps_speech defaults to true; set it to false only for a beat deliberately placed in a truly silent slot.
- Character budget per segment: (end - start - 0.25) × 3.
- When --context or background_research.json gives character names, prefer them.

A separate quality pass (LLM-as-judge), distinct from the mechanical lint below. Needs the chat API key (same as VLM).
python3 scripts/review.py --work-dir <work_dir>
Read narration_review.md. For every error finding (ESPECIALLY category=hallucination, a claim
not grounded in the visual/ASR evidence), revise narration.json and re-run review until either:
- the review is OK with zero error findings, OR
- work_dir/narration_review_override.md has an entry naming WHICH finding (segment + category), WHY it is acceptable, and who signed off.

Unaddressed error findings with no override entry mean the draft is NOT ready.
Then move on to validation (validate.py, the hard gate).
GATE rule: review NEVER blocks the tooling (it leans on a flaky chat API and a re-render is cheap).
validate.py is the deterministic hard gate. The override log makes "we saw the finding and chose to
ship it" auditable; review.py / validate.py never read it. It is a record for the human in the loop.
Override block shape — work_dir/narration_review_override.md (append-only):
## Override — <date>
- Finding: segment 4 / category=hallucination
- Reviewer said: "‘他早已知情’无画面/对白依据"
- Decision: KEEP — grounded in the --context synopsis (s2 reveal); reviewer lacked that context.
- Signed: <agent/human>
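The readiness rule (zero error findings, or every error finding covered by an override entry) can be sketched as a small helper for the agent or human in the loop. This is not part of the tooling, which never reads the override log. The in-memory `findings` structure, its `severity` key, and the `pacing` / `style` categories are hypothetical, since the layout of narration_review.md is not specified here. Only the `- Finding: segment N / category=...` line shape is taken from the override block above.

```python
import re

# Matches the "- Finding:" line of an override block as shown above.
FINDING_RE = re.compile(r"^- Finding: segment (\d+) / category=(\w+)", re.MULTILINE)


def parse_overrides(markdown: str) -> set[tuple[int, str]]:
    """Collect (segment, category) pairs named in '- Finding:' lines."""
    return {(int(seg), cat) for seg, cat in FINDING_RE.findall(markdown)}


def unaddressed(findings: list[dict], overrides: set[tuple[int, str]]) -> list[dict]:
    """Return error findings that have no matching override entry."""
    return [
        f for f in findings
        if f["severity"] == "error"
        and (f["segment"], f["category"]) not in overrides
    ]


override_md = """\
- Finding: segment 4 / category=hallucination
- Decision: KEEP
- Signed: <agent/human>
"""

# Hypothetical findings, as if transcribed from narration_review.md by hand.
findings = [
    {"segment": 4, "category": "hallucination", "severity": "error"},
    {"segment": 7, "category": "pacing", "severity": "error"},
    {"segment": 2, "category": "style", "severity": "warning"},
]

overrides = parse_overrides(override_md)
open_items = unaddressed(findings, overrides)
ready = not open_items
print("ready" if ready else f"NOT ready: {open_items}")
```

Here segment 4 is covered by the override, segment 2 is only a warning, and segment 7 remains open, so the draft is not ready.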
python3 scripts/validate.py --work-dir <work_dir> --mode full # or --mode cut
Writes narration_lint.json; in full mode rewrites narration.json with quiet-window alignment.
Fix any lint errors and re-run until clean.
Before narration, write work_dir/clip_plan.json (original-time source ranges to keep), optionally
self-review it in clip_plan_review.md (agent-only; the tooling does not read it), then write
narration.json with timestamps that fall inside the kept clips. The video-cut skill maps both to
the shortened timeline. Validate with --mode cut.
{"target_duration": "10m", "clips": [{"start": 12.0, "end": 38.0, "reason": "冲突开端"}]}
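A minimal sketch of the cut-mode constraint, checking that a narration segment falls inside one kept clip and mapping its original-video time to the shortened timeline. This is not the video-cut skill's or validate.py's code. The function names and the `outside_kept_clips` label are hypothetical (only `crosses_clip_boundary` comes from the tooling), the second clip is invented for illustration, and the mapping assumes kept clips are concatenated in order with no transitions.

```python
def locate(seg_start: float, seg_end: float, clips: list[dict]):
    """Return (clip_index, status) for a narration segment in original-video time."""
    for i, clip in enumerate(clips):
        if clip["start"] <= seg_start and seg_end <= clip["end"]:
            return i, "inside"
        # Partial overlap with a kept clip means the segment crosses its boundary.
        if seg_start < clip["end"] and seg_end > clip["start"]:
            return i, "crosses_clip_boundary"
    return None, "outside_kept_clips"  # hypothetical label


def to_cut_time(t: float, clip_index: int, clips: list[dict]) -> float:
    """Map an original-video time to the shortened timeline (plain concatenation assumed)."""
    offset = sum(c["end"] - c["start"] for c in clips[:clip_index])
    return offset + (t - clips[clip_index]["start"])


clips = [
    {"start": 12.0, "end": 38.0, "reason": "冲突开端"},
    {"start": 60.0, "end": 75.0, "reason": "hypothetical second clip"},
]

for start, end in [(14.0, 20.0), (35.0, 41.0), (62.0, 70.0), (40.0, 50.0)]:
    idx, status = locate(start, end, clips)
    mapped = to_cut_time(start, idx, clips) if status == "inside" else None
    print(start, end, status, mapped)
```

Running a check like this before `--mode cut` makes it easier to see which segments need their `[start, end]` pulled back inside a clip.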
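Writing notes for cut mode (the narration must match the footage after cutting, not the original film):
- Do not let [start, end] cross a clip boundary; otherwise the segment is trimmed to the clip length and the voiceover ends up narrating footage that was cut (--mode cut warns with crosses_clip_boundary).
- Use source_clip_id to make sure each segment maps to the correct clip.

When the title or genre is clear but plot context is missing, first write background_research.json following the
background research guide (背景调研指南), then write the narration; otherwise the narration can only describe what is on screen.
When the substrate is thin, the brief lowers the density target to a ceiling rather than a quota: better to write less and stay factual than to pad with shot descriptions.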
- review.py does NOT edit narration.json and does NOT block the pipeline; it is advisory.
- validate.py does NOT rewrite the meaning of the text; it only checks/aligns timing and quiet windows.