Automatically produce Chinese-narration video recaps from any video file using AI video understanding, narration script generation, scene cutting, TTS voiceover synthesis, and final assembly with ffmpeg and a MiMo API key.
Assemble a final recap video: mux narration audio over the source video, duck the original audio under the narration, render subtitles (SRT/ASS, optionally burned in), and loudness-normalize. Use as the last stage of the video-recap bundle. Consumes the source video + tts_meta.json (+ narration placement); produces recap_<name>.mp4 + subtitles.srt/.ass. Trigger words: 视频合成, 混音, 字幕, 压字幕, assemble video, mux, ducking, subtitles, 成片.
Cut a long video down to selected source ranges (montage / clip assembly). Part of the video-recap bundle: in the orchestrated (two-pass) flow, it consumes clip_plan.json + the source video and produces edited_source.mp4; the agent then writes narration.json against the output timeline. When invoked standalone WITHOUT --no-narration-map, it also remaps an existing narration.json → narration_mapped.json (legacy single-pass path). Trigger words: 视频剪辑, 剪辑式解说, video cut, clip plan, 拼剪.
Generate a Chinese-narration recap video from an input video, end to end. Use when the user gives a video file (.mp4 / .mov / .mkv / .webm) and asks to add narration, generate voiceover, dub, summarize, or produce a recap (short dramas / TV series / films / documentaries / popular science). Orchestrates the video-* skill bundle: understanding → (agent writes narration) → cut → voiceover → assemble. Trigger words: 视频解说, 视频旁白, 生成解说, 视频recap, video recap, voiceover, narration, auto-dub, recap.
Write a timestamped Chinese narration script (commentary / voiceover) for an already-analyzed video, then lint/validate it. Use after video-understanding has produced agent_narration_brief.md + vlm_analysis.json, when you need to author the recap narration (style, anti-hallucination, character-count formula, density, hook/throughline). Input: the understanding index in work_dir. Output: narration.json (validated). Trigger words: 解说词, 写解说, 视频旁白, narration script, 写稿, 解说文案.
Analyze a video into a structured understanding index: scene detection, ASR transcript, per-scene visual (VLM) analysis, silence windows, a fused timeline, and a narration-writing brief. Use to understand / index / summarize what happens in a video, or as the first stage of the video-recap bundle before writing narration. Input: a video file. Output: scenes.json, asr_result.json, vlm_analysis.json, silence_periods.json, timeline_fusion.json, agent_narration_brief.md. Trigger words: 视频理解, 视频分析, 视频索引, video understanding, analyze video, 看懂视频.
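Turn a video into a Chinese-narrated recap with one sentence. Just ask in Claude Code and it starts running. Locally you need only ffmpeg plus a Xiaomi MiMo API key: no GPU, no model downloads, and it runs on macOS, Linux, and Windows.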
https://github.com/user-attachments/assets/92698ec6-0d23-4f9f-8825-c3684ef57aff
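Beyond the finished video, you can also export a Jianying (剪映) draft in one click for manual polishing, with the source video, narration, BGM, and subtitles each on its own track: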
flowchart LR
research["Background research"] --> understand
video(["Video"]) --> understand["Understand<br/>Scenes · ASR · VLM"] --> script["Script<br/>Agent"] --> voiceover["Voiceover<br/>MiMo TTS"] --> assemble["Assemble<br/>Mixing · Subtitles"] --> output(["Recap"])
understand -. cut mode, cut first then narrate .-> cut["Cut<br/>Finished edit first"] -.-> script
classDef io fill:#eef6ff,stroke:#4f86c6,color:#1f2937;
classDef opt fill:#f3f4f6,stroke:#9ca3af,color:#374151;
class video,output io;
class research,cut opt;
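- No dependencies other than ffmpeg.
- Only with background_research.json can the VLM tell who is who.
- --edit-mode cut first cuts the long video down to a finished edit, then narration is written against that edit, so the timelines align naturally; before the script is finalized, an LLM review pass checks for hallucinations, the hook, and the throughline.
- With ffmpeg, a finished video comes out even without Jianying (剪映) installed.

① Install the plugin. Tell Claude Code:
Install this plugin: https://github.com/worldwonderer/video-recap-skills
② Install ffmpeg (the pipeline itself needs no pip install; the scripts use only the standard library plus the ffmpeg on PATH, with Python 3.10+):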
brew install ffmpeg # macOS
sudo apt install ffmpeg # Debian/Ubuntu
choco install ffmpeg # Windows (or scoop / winget install ffmpeg)
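Subtitles are burned into the picture by default, which requires an ffmpeg built with libass (the subtitles filter). The packages above almost always include it. If your ffmpeg was built without libass, the run fails immediately at startup with a hint (you can also pass --no-burn-subtitles to output an MP4 without the black masking bar plus an external .srt subtitle file). Run python3 scripts/recap.py --doctor to self-check.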
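The libass requirement above can be checked by hand. The following is a minimal sketch, assuming the `subtitles` filter shows up in the output of `ffmpeg -filters` when libass is compiled in; the function names are hypothetical and this is not necessarily how `--doctor` performs its check.

```python
import shutil
import subprocess


def has_subtitles_filter(filters_listing: str) -> bool:
    """Return True if an `ffmpeg -filters` listing names the `subtitles` filter."""
    for line in filters_listing.splitlines():
        parts = line.split()
        # Filter rows have a flags column first and the filter name second,
        # e.g. " ... subtitles  V->V  Render text subtitles ...".
        if len(parts) >= 2 and parts[1] == "subtitles":
            return True
    return False


def ffmpeg_can_burn_subtitles() -> bool:
    """Query the ffmpeg on PATH; False if ffmpeg is missing or lacks the filter."""
    if shutil.which("ffmpeg") is None:
        return False
    result = subprocess.run(
        ["ffmpeg", "-hide_banner", "-filters"],
        capture_output=True,
        text=True,
        check=False,
    )
    return has_subtitles_filter(result.stdout)
```

Matching on the name column rather than searching the whole line avoids a false positive from other filters whose descriptions merely mention subtitles.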
③ Configure the MiMo API key (one key drives ASR / VLM / TTS; keep it in an environment variable and do not commit it to the repository):
export MIMO_API_KEY=your-mimo-key
# tp-* Token-Plan keys connect to a cluster automatically; optionally choose cn | sgp | ams:
export MIMO_TOKEN_PLAN_CLUSTER=cn
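Pay-as-you-go sk-* keys default to https://api.xiaomimimo.com/v1. Everything else has defaults; to configure keys/URLs separately or to change models, voices, loudness, subtitles, and so on, see the configuration manual.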
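The key-prefix rule above can be sketched as a small classifier. This is an illustration only: `describe_mimo_key` is a hypothetical helper, the Token-Plan endpoint URLs are not stated here and are left as `None`, and how the real scripts treat other prefixes or an unset cluster is an assumption.

```python
from typing import Dict, Optional

# Both values are taken from the text above.
DEFAULT_PAYG_BASE_URL = "https://api.xiaomimimo.com/v1"
VALID_CLUSTERS = ("cn", "sgp", "ams")


def describe_mimo_key(env: Dict[str, str]) -> Dict[str, Optional[str]]:
    """Classify MIMO_API_KEY by prefix (hypothetical helper, not the plugin's code)."""
    key = env.get("MIMO_API_KEY", "")
    if not key:
        raise ValueError("MIMO_API_KEY is not set")
    if key.startswith("tp-"):
        cluster = env.get("MIMO_TOKEN_PLAN_CLUSTER")
        if cluster is not None and cluster not in VALID_CLUSTERS:
            raise ValueError(f"unknown cluster: {cluster!r}")
        # The cluster endpoint is resolved by the pipeline; it is not guessed here.
        return {"kind": "token-plan", "cluster": cluster, "base_url": None}
    if key.startswith("sk-"):
        return {"kind": "pay-as-you-go", "cluster": None,
                "base_url": DEFAULT_PAYG_BASE_URL}
    # Assumption: any other prefix is simply reported as unknown.
    return {"kind": "unknown", "cluster": None, "base_url": None}
```

Calling `describe_mimo_key(dict(os.environ))` after `import os` would report which path the current shell configuration selects.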
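Hand it a video and, while you are at it, give it some background on the video:
Make a narrated recap of /path/to/video.mp4. This is episode 1 of 《庆余年》 (Joy of Life); the protagonist is 范闲 (Fan Xian).
It analyzes the video, writes narration informed by that background, and produces a subtitled recap_<name>.mp4. Other variations are likewise one sentence away:
Cut /path/to/long.mp4 into a narrated short of about ten minutes, with subtitles burned into the picture.
Behind the scenes, an orchestrator chains the stages together and pauses midway for the agent to write the narration (cut mode pauses twice: first to write clip_plan.json and pick the clips, then, once the finished edit is cut, to write narration.json against that edit). Before the first run, check the environment: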
python3 skills/video-recap/scripts/recap.py --doctor
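Cut mode, as described above, places the selected source ranges back to back, so narration written against the edit needs no remapping. A minimal sketch of the underlying arithmetic follows, assuming each clip is a `(start, end)` pair in source seconds; the schema of clip_plan.json is not given here, so the tuple layout and function names are hypothetical.

```python
from typing import List, Optional, Tuple

Range = Tuple[float, float]  # (start, end) in source seconds, hypothetical layout


def output_offsets(ranges: List[Range]) -> List[float]:
    """Start time of each kept range on the edited (output) timeline."""
    offsets: List[float] = []
    cursor = 0.0
    for start, end in ranges:
        if end <= start:
            raise ValueError(f"empty or reversed range: {(start, end)}")
        offsets.append(cursor)
        cursor += end - start  # the next clip begins where this one ends
    return offsets


def source_to_output(t: float, ranges: List[Range]) -> Optional[float]:
    """Map a source timestamp to the output timeline, or None if it was cut out."""
    for (start, end), offset in zip(ranges, output_offsets(ranges)):
        if start <= t < end:
            return offset + (t - start)
    return None
```

With ranges `[(10.0, 20.0), (50.0, 65.0)]`, the second clip starts at 10.0 on the output timeline, and a source timestamp that falls between the two clips has no output position.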
| Skill | Responsibility | Input → Output (work_dir contract) |
|---|---|---|
| video-understanding | Scene detection · frame extraction · ASR (mimo-v2.5-asr) · VLM (mimo-v2.5) · timeline fusion · brief generation (--consolidate indexing on by default) | video → scenes / asr_result / vlm_analysis / silence_periods / timeline_fusion / agent_narration_brief.md |
| video-script | Writing rules (SKILL.md) + review (LLM judge) + lint/validation | brief + index → narration.json |
| video-cut | Clip plan → cut into a finished edit (cut mode cuts first and narrates after; narration is written on the edit's timeline, so no remapping is needed) | clip_plan.json + video → edited_source.mp4 |
| video-voiceover | Synthesize narration audio (MiMo TTS, mimo-v2.5-tts) | narration.json → tts_segments/ + tts_meta.json |
| video-assemble | Mixing · ducking the original audio · subtitle rendering · multitrack timeline (optional Jianying export) | video + tts_meta → recap_<name>.mp4 + subtitles.srt/.ass + timeline.json |
| video-recap | Orchestrator + --doctor | video → recap_<name>.mp4 |
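The work_dir contract in the table can be read as a checklist of files per stage. The sketch below is an assumption-laden illustration, not the orchestrator's logic: it treats tts_meta.json as living directly in work_dir, ignores cut mode, and uses plain file existence rather than recap_phase.json to decide what is left to do.

```python
from pathlib import Path
from typing import List, Optional, Tuple

# (skill, files it is documented to produce), in pipeline order.
STAGES: List[Tuple[str, List[str]]] = [
    ("video-understanding", [
        "scenes.json", "asr_result.json", "vlm_analysis.json",
        "silence_periods.json", "timeline_fusion.json",
        "agent_narration_brief.md",
    ]),
    ("video-script", ["narration.json"]),
    ("video-voiceover", ["tts_meta.json"]),
]


def next_stage(work_dir: Path) -> Optional[str]:
    """Return the first skill whose outputs are incomplete, or None if all exist."""
    for skill, outputs in STAGES:
        if not all((work_dir / name).exists() for name in outputs):
            return skill
    return None
```

A call such as `next_stage(Path("work_dir"))` returning `"video-script"` would mean the understanding index is complete and the narration has yet to be written.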
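- recap_<video>.mp4: the finished video (fixed output name, overwritten in place on every run, so iterating on the narration refreshes the same file).
- subtitles.srt (subtitles are burned in by default and subtitles.ass is produced alongside; --no-burn-subtitles turns burning off)
- work_dir/narration.json: the narration script (with narration_lint.json for timing diagnostics and narration_review.md for review notes)
- work_dir/agent_narration_brief.md: the timing and scene brief for the agent
- work_dir/vlm_analysis.json · asr_result.json · silence_periods.json · timeline_fusion.json: understanding artifacts
- work_dir/clip_plan.json · edited_source.mp4 · recap_phase.json: cut-mode artifacts (narration is written on the edit's timeline; recap_phase.json records cut/voiceover progress so an interrupted run can resume)
- work_dir/timeline.json · work_dir/assembly_manifest.json · tts_segments/ · tts_meta.json: multitrack timeline, render record, and TTS audio
- skills/<skill>/SKILL.md (the writing rules are in video-script's SKILL.md)

MIT, see LICENSE.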
npx claudepluginhub worldwonderer/video-recap-skills