From video-watcher
Watch a local video file or a video URL and turn it into a smart set of frames plus a timestamped transcript that Claude can read. Works on silent screen recordings as well as narrated videos. Use this when the user asks you to look at, analyze, summarize, or answer questions about a video. Runs ffmpeg with scene-change detection for frames, and optionally tesseract for on-screen text and whisper.cpp for audio.
Slash command: /video-watcher:video-watcher
A pipeline that takes a video and produces a manifest of meaningful frames plus an on-disk `frames/` directory. Use it whenever the user wants you to understand a video.
Use this skill when the user gives you a video file (.mp4, .mov, .webm, .mkv) and asks you to watch, summarize, or answer questions about it, or gives you a video URL (downloaded via yt-dlp).
Do not use this skill for audio-only content or for questions that don't require seeing the video.
Run the pipeline from the directory that contains the skill:
node scripts/watch.mjs <path-to-video-or-url>
URLs (e.g. YouTube links) are downloaded via yt-dlp into out/source/ first, then processed.
Useful flags:
--out <dir>: output directory (default ./out)
--max-frames <n>: cap on frames (default 40)
--scene <0..1>: scene-change threshold (default 0.3; lower for slow UI videos, higher for action-heavy footage)
--ocr: run tesseract on each frame and attach the recognized text to the manifest (skipped if tesseract is not installed)
--transcribe: transcribe the audio track with whisper.cpp and add a transcript array of timestamped segments to the manifest (skipped if the video has no audio, or if whisper-cli / the model are not installed)
Read out/manifest.json. It lists every frame with its timestamp and relative path, plus a transcript array if --transcribe ran.
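As a rough illustration of how the flags combine, the sketch below assembles an argument list for scripts/watch.mjs from an options object. The function name, the options object, and its property names are hypothetical; only the flag names come from the list above. It also assumes the script accepts flags after the input path, which the usage line does not confirm.

```javascript
// Build the argument list for `node scripts/watch.mjs`.
// Hypothetical helper: only the flag names are taken from the skill's docs.
function buildWatchArgs(input, options = {}) {
  const args = ["scripts/watch.mjs", input];
  if (options.out !== undefined) args.push("--out", String(options.out));
  if (options.maxFrames !== undefined) {
    args.push("--max-frames", String(options.maxFrames));
  }
  if (options.scene !== undefined) {
    // The documented range for --scene is 0..1.
    if (!(options.scene >= 0 && options.scene <= 1)) {
      throw new RangeError("--scene expects a value between 0 and 1");
    }
    args.push("--scene", String(options.scene));
  }
  if (options.ocr) args.push("--ocr");
  if (options.transcribe) args.push("--transcribe");
  return args;
}

// A slow UI recording: lower scene threshold, more frames, OCR on.
const args = buildWatchArgs("demo.mp4", { scene: 0.15, maxFrames: 60, ocr: true });
console.log(["node", ...args].join(" "));
// prints "node scripts/watch.mjs demo.mp4 --max-frames 60 --scene 0.15 --ocr"
```

Omitted options fall through to the pipeline's own defaults, so the helper never has to restate them.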
Read individual frames from out/frames/NNN.png using the image-capable Read tool. Do not load every frame at once — pick the ones whose timestamps are relevant to the user's question. The manifest is small; the frames are not.
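One way to pick relevant frames instead of loading them all is to filter the manifest by a time window and thin the result to a small budget. This is a minimal sketch: the manifest field names (frames, timestamp, path) are assumptions, so check the real out/manifest.json before relying on them. The function name and the sample data are hypothetical.

```javascript
// Assumed manifest shape (not confirmed by the skill's docs):
//   { frames: [{ timestamp: <seconds>, path: "frames/NNN.png" }, ...] }
function framesInWindow(manifest, startSec, endSec, limit = 5) {
  if (limit < 1) return [];
  const hits = manifest.frames
    .filter((f) => f.timestamp >= startSec && f.timestamp <= endSec)
    .sort((a, b) => a.timestamp - b.timestamp);
  if (hits.length <= limit) return hits;
  if (limit === 1) return [hits[0]];
  // Thin evenly so the picks still span the whole window.
  const step = (hits.length - 1) / (limit - 1);
  return Array.from({ length: limit }, (_, i) => hits[Math.round(i * step)]);
}

// Made-up sample data for illustration only.
const sampleManifest = {
  frames: [
    { timestamp: 0, path: "frames/001.png" },
    { timestamp: 4.2, path: "frames/002.png" },
    { timestamp: 9.8, path: "frames/003.png" },
    { timestamp: 15, path: "frames/004.png" },
    { timestamp: 31.5, path: "frames/005.png" },
    { timestamp: 47, path: "frames/006.png" },
  ],
};

const picks = framesInWindow(sampleManifest, 5, 35, 2);
console.log(picks.map((f) => f.path));
```

Only the returned paths then need to go through the image-capable Read tool, which keeps the expensive step proportional to the question rather than to the length of the video.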
If a transcript is present, pair it with frames by timestamp. The transcript tells you what is being said; the frames tell you what is on screen. Quote timestamps when you reference specific moments.
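Pairing by timestamp can be sketched as follows: for each transcript segment, take the most recent frame captured at or before the segment starts, on the reasoning that a scene-change frame stays representative until the next one. The segment fields (start, text), the frame fields (timestamp, path), and both function names are assumptions made for this example, not the documented manifest schema.

```javascript
// Format seconds as mm:ss for quoting moments back to the user.
function formatTimestamp(seconds) {
  const total = Math.floor(seconds);
  const mm = String(Math.floor(total / 60)).padStart(2, "0");
  const ss = String(total % 60).padStart(2, "0");
  return `${mm}:${ss}`;
}

// Assumed shapes: frames have `timestamp` (seconds) and `path`;
// transcript segments have `start` (seconds) and `text`.
function pairTranscriptWithFrames(manifest) {
  const frames = [...manifest.frames].sort((a, b) => a.timestamp - b.timestamp);
  return (manifest.transcript ?? []).map((seg) => {
    // Latest frame at or before the segment start; fall back to the first frame.
    let onScreen = frames[0] ?? null;
    for (const f of frames) {
      if (f.timestamp <= seg.start) onScreen = f;
      else break;
    }
    return {
      at: formatTimestamp(seg.start),
      text: seg.text,
      frame: onScreen ? onScreen.path : null,
    };
  });
}

// Made-up sample data for illustration only.
const demoManifest = {
  frames: [
    { timestamp: 0, path: "frames/001.png" },
    { timestamp: 12, path: "frames/002.png" },
    { timestamp: 30, path: "frames/003.png" },
  ],
  transcript: [
    { start: 3, text: "Open the settings page." },
    { start: 12.5, text: "Now toggle dark mode." },
    { start: 75, text: "That is the whole flow." },
  ],
};

const pairs = pairTranscriptWithFrames(demoManifest);
for (const p of pairs) console.log(`[${p.at}] ${p.frame}: ${p.text}`);
```

A manifest without a transcript array yields an empty list, which matches the case where --transcribe did not run.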
If the user pastes their own transcript or captions, treat that as authoritative over the auto-generated one.
For slow UI videos, lower --scene (try 0.15).
… --max-frames rather than --scene.
Use --ocr so on-screen text reaches you cheaply via the manifest rather than through vision.
ffmpeg must be on PATH.
tesseract (optional, for --ocr): brew install tesseract.
yt-dlp (optional, for URL inputs): brew install yt-dlp.
whisper-cpp (optional, for --transcribe): brew install whisper-cpp, then download a ggml model and either place it at ~/.cache/whisper/ggml-base.en.bin or point at it via WHISPER_MODEL_PATH or --whisper-model. Example:
mkdir -p ~/.cache/whisper
curl -L -o ~/.cache/whisper/ggml-base.en.bin \
https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-base.en.bin
npx claudepluginhub sixtusagbo/video-watcher --plugin video-watcher