Smart-cut a monologue / talking-head video on macOS (Apple Silicon) by auto-transcribing with mlx-qwen3-asr, having Claude identify filler words (嗯/啊/呃/那个), repetitions, restarts, and long pauses, then showing the user an interactive review page (like 剪映 智能剪口播) to approve cuts before ffmpeg produces a clean video + synced SRT + clean transcript. Use this skill whenever the user wants to clean up a monologue, lecture, podcast-style video, or "口播视频" by removing filler / stumbles / dead air — covers phrases like "帮我剪这个视频", "清理一下口癖", "口播清理", "去口癖", "smart cut", "自动剪辑口播", "像剪映那样剪口播", "remove filler from this video". Only works on Apple Silicon macOS.
Slash command: /podcast-video-toolkit:cut-fillers
A skill for one-command cleanup of monologue videos. The user hands you a video, you hand back a tighter version of the same video with filler words, pauses, and restarts removed — always with a human-in-the-loop review step before any cutting happens.
Best suited to videos that are primarily one person talking to camera: course recordings, video essays, product demos, tutorial voice-overs, podcasts with video, Xiaohongshu/Douyin talking-head clips (口播), YouTube monologues. Not suitable for: multi-speaker interviews (not supported in v1), music videos, scripted drama, or anything where pauses are intentional pacing.
Hard requirements:
If either is violated, say so and stop rather than half-working.
Before anything, run scripts/preflight.sh. It verifies / helps install:
- Apple Silicon (uname -m == arm64). If not, abort.
- ffmpeg and ffprobe on PATH. If missing, ask the user before running brew install ffmpeg.
- uv on PATH. If missing, ask the user before running brew install uv; don't silently install.
- mlx-qwen3-asr installed as a uv tool (isolated env, no global Python pollution). If missing, ask the user before running uv tool install mlx-qwen3-asr. Warn the user that the first transcription will download the Qwen3 model weights (several GB).

All later scripts assume these prerequisites pass. If any step can't continue, report the exact failing check to the user; don't improvise a fallback.
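The checks above can be sketched in Python as follows. This is an illustration only, not the contents of scripts/preflight.sh: the function name preflight and its injectable which parameter are hypothetical, and the mlx-qwen3-asr uv-tool check is left out.

```python
import platform
import shutil

# Binaries the later stages expect to find on PATH.
REQUIRED_TOOLS = ("ffmpeg", "ffprobe", "uv")


def preflight(machine=None, which=shutil.which):
    """Return a list of human-readable failures; an empty list means all checks passed.

    `machine` and `which` are injectable so the check can be exercised
    without touching the real system (hypothetical design, not the script's).
    """
    machine = machine if machine is not None else platform.machine()
    failures = []
    if machine != "arm64":
        failures.append(f"not Apple Silicon (machine is {machine!r})")
    for tool in REQUIRED_TOOLS:
        if which(tool) is None:
            failures.append(f"{tool} not found on PATH")
    return failures
```

Returning the full list of failures, rather than stopping at the first one, makes it easy to report the exact failing checks to the user in a single message.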
The full flow is five stages. Walk through them in order. Each stage writes into a per-video work directory so you can resume if interrupted:
<video-dir>/smart-cut/<video-stem>/
├── transcript.json # mlx-qwen3-asr word-level output
├── transcript.srt # original subtitle
├── silence.json # ffmpeg silencedetect ranges
├── suggestions.json # Claude's proposed cuts
├── cuts.json # user-confirmed cuts (written by review server)
├── <stem>_cut.mp4 # final output
├── <stem>_cut.srt
└── <stem>_cut.txt
If a later stage's output already exists and the user hasn't asked to redo, skip that stage.
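A minimal sketch of that skip rule, assuming each stage is identified by the files it writes into the work directory. The STAGE_OUTPUTS mapping and the should_run name are hypothetical; only the file names come from the layout above.

```python
from pathlib import Path

# Hypothetical mapping from stage name to the files that stage must produce.
STAGE_OUTPUTS = {
    "transcribe": ["transcript.json", "silence.json"],
    "analyze": ["suggestions.json"],
    "review": ["cuts.json"],
}


def should_run(stage, workdir, redo=False):
    """Run a stage if the user asked to redo it or any of its outputs is missing."""
    if redo:
        return True
    outputs = (Path(workdir) / name for name in STAGE_OUTPUTS[stage])
    return not all(path.is_file() for path in outputs)
```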
Run bash scripts/preflight.sh. Stop on any failure.
Run bash scripts/transcribe.sh <video-path>. This:
- Runs uv tool run mlx-qwen3-asr <video> -f json --timestamps -o <workdir>/ → transcript.json
- Runs ffmpeg -af silencedetect=noise=-30dB:d=0.6 and parses stderr → silence.json (ranges of media-level silence ≥ 0.6s)

Read transcript.json and silence.json. Group words into phrase-level segments (split on sentence terminators and >400ms gaps). For each segment, decide whether it should be a cut candidate and why. Write suggestions.json with this exact shape:
{
  "video": "/abs/path/to/source.mp4",
  "duration": 423.12,
  "suggestions": [
    {
      "start": 12.340,
      "end": 12.780,
      "text": "嗯",
      "reason": "filler",
      "aggressiveness": "safe"
    }
  ]
}
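The grouping step (split on sentence terminators and on gaps over 400ms) could look roughly like this. The word shape used here, a dict with text, start, and end in seconds, is an assumption about transcript.json rather than the documented mlx-qwen3-asr output, and the terminator set is illustrative.

```python
# Assumed sentence terminators; extend as needed for the transcript's language.
TERMINATORS = ("。", "！", "？", ".", "!", "?")
MAX_GAP = 0.4  # seconds; a longer gap between words starts a new segment


def group_segments(words, max_gap=MAX_GAP):
    """Group word dicts ({"text", "start", "end"}) into phrase-level segments."""
    segments, current = [], []
    for word in words:
        # A long silence before this word closes the segment in progress.
        if current and word["start"] - current[-1]["end"] > max_gap:
            segments.append(current)
            current = []
        current.append(word)
        # A sentence terminator closes the segment after this word.
        if word["text"].endswith(TERMINATORS):
            segments.append(current)
            current = []
    if current:
        segments.append(current)
    return [
        {
            "start": seg[0]["start"],
            "end": seg[-1]["end"],
            # Plain concatenation suits Chinese; English text would need spaces.
            "text": "".join(w["text"] for w in seg),
        }
        for seg in segments
    ]
```

Each returned segment carries the same start, end, and text fields that a suggestion needs, so a segment judged to be a cut candidate only has to gain reason and aggressiveness.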
Classification rules (use theory of mind; these are heuristics, not laws):
- filler + safe: standalone 嗯 / 啊 / 呃 / 唉 / um / uh / er shorter than 800ms with speech before and after.
- pause + safe: silence ≥ 1.2s confirmed by both the ASR gap and silence.json; trim to leave 200ms of breathing room on each side.
- repeat + moderate: the speaker says the same phrase twice in a row. Cut the earlier or incomplete one and keep the cleaner one. Be conservative: only mark when the repeated text overlaps by ≥80%.
- restart + moderate: false starts cut off mid-phrase (common pattern: short fragment + self-correction). Look for the "我想说... 我是说..." ("I want to say... I mean...") style.
- padding + aggressive: vague connectors like 然后那个 ("and then, um") / 就是说 ("that is to say") / anyway, so, right. Flag these but do NOT default-check them; let the user decide.

Never propose cuts that would leave <300ms of kept audio between them (the review tool will merge such slivers, but prefer not to generate them).
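As an illustration of the pause rule, the sketch below proposes a cut only where a gap between consecutive ASR words overlaps a detected silence for at least 1.2s, then pulls each edge in by 200ms. The input shapes (word dicts with start and end, silences as (start, end) tuples) are assumptions, not the actual transcript.json or silence.json schema.

```python
MIN_PAUSE = 1.2        # seconds of confirmed silence required
BREATHING_ROOM = 0.2   # seconds of silence kept on each side of the cut


def pause_candidates(words, silences):
    """Return pause suggestions confirmed by both ASR gaps and silence ranges.

    words: list of {"start", "end"} dicts sorted by time (assumed shape).
    silences: list of (start, end) tuples in seconds (assumed shape).
    """
    cuts = []
    for prev, nxt in zip(words, words[1:]):
        gap_start, gap_end = prev["end"], nxt["start"]
        for s_start, s_end in silences:
            # Only the part present in both the ASR gap and the silence counts.
            start, end = max(gap_start, s_start), min(gap_end, s_end)
            if end - start >= MIN_PAUSE:
                cuts.append({
                    "start": round(start + BREATHING_ROOM, 3),
                    "end": round(end - BREATHING_ROOM, 3),
                    "text": "",
                    "reason": "pause",
                    "aggressiveness": "safe",
                })
    return cuts
```

Intersecting the two sources means background noise during a gap in speech, or a silencedetect range that spans quiet speech, does not produce a cut on its own.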
Run python3 scripts/review_server.py <workdir>. This:
- Opens http://127.0.0.1:<port>/review.html in the user's browser
- Serves the video, transcript.json, and suggestions.json to the page
- Waits for a request to /confirm with the user's final cut selection → writes cuts.json → shuts itself down and exits

The page lets the user play the video, see every suggested segment with its reason tag, tick/untick each one, and click Confirm & Cut. Until they click, nothing is cut.
Tell the user plainly: "The review page is open in your browser. Tick the segments you want removed, then click Confirm & Cut. I'm waiting for you here."
Do not proceed until the script exits normally (exit code 0 = user confirmed; non-zero = user cancelled or closed the server).
Run python3 scripts/cut_video.py <workdir>. This:
- Reads cuts.json and normalizes the delete intervals (sort, clamp, merge overlaps, drop <100ms slivers)
- Builds an ffmpeg filter graph from trim/atrim + setpts/asetpts + concat=v=1:a=1. This is a frame-accurate re-encode, not -c copy (keyframe snapping would be visible)
- Writes the output with +faststart
- Verifies that the output duration reported by ffprobe matches original_duration − sum(deleted) within 2 frames

Report the final paths and the time saved (e.g. "original 7:03 → 5:18, removed 1:45").
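The interval normalization and filter graph described above can be sketched as follows. This is not the code in scripts/cut_video.py: the function names, the (start, end) tuple shape, and the [outv]/[outa] labels are assumptions made for illustration.

```python
def normalize_cuts(cuts, duration, min_len=0.1):
    """Sort, clamp to [0, duration], merge overlaps, drop intervals under min_len."""
    clamped = sorted((max(0.0, s), min(duration, e)) for s, e in cuts)
    merged = []
    for start, end in clamped:
        if end <= start:
            continue  # empty or inverted after clamping
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return [(s, e) for s, e in merged if e - s >= min_len]


def keep_intervals(cuts, duration):
    """Invert normalized delete intervals into the intervals to keep."""
    keeps, cursor = [], 0.0
    for start, end in cuts:
        if start > cursor:
            keeps.append((cursor, start))
        cursor = end
    if cursor < duration:
        keeps.append((cursor, duration))
    return keeps


def build_filter_graph(keeps):
    """Build a filter_complex string: trim each kept span, reset timestamps, concat."""
    parts, labels = [], []
    for i, (start, end) in enumerate(keeps):
        parts.append(
            f"[0:v]trim=start={start:.3f}:end={end:.3f},setpts=PTS-STARTPTS[v{i}]"
        )
        parts.append(
            f"[0:a]atrim=start={start:.3f}:end={end:.3f},asetpts=PTS-STARTPTS[a{i}]"
        )
        labels.append(f"[v{i}][a{i}]")
    parts.append(f"{''.join(labels)}concat=n={len(keeps)}:v=1:a=1[outv][outa]")
    return ";".join(parts)
```

The resulting string would be passed to ffmpeg with -filter_complex, mapping [outv] and [outa] to the output. The sum of the kept interval lengths is the expected output duration for the final verification step.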
- transcribe.sh fails loudly on error. Report and stop.
- If the user cancels the review, review_server.py exits non-zero. Don't retry automatically; ask the user whether they want to re-open the review or abort.

See references/workflow.md for deeper notes on ffmpeg filter choices, VFR handling, and SRT re-timing.
npx claudepluginhub pierrelzw/podcast-video-toolkit --plugin podcast-video-toolkit