By sixtusagbo
Lets Claude watch videos by turning them into a smart set of frames plus optional OCR text and an optional audio transcript.
A Claude Skill that lets Claude reason about what's happening in a video — silent or narrated.
Most "let Claude watch a video" workflows mean running ffmpeg by hand, extracting a wall of frames at a fixed interval, and dragging them into the chat. That wastes tokens on stretches of identical screens and skips the moments that actually matter. This Skill does the picking for you: scene-change detection plus a hard cap on frame count, with optional OCR for on-screen text and optional whisper.cpp transcription for the audio track. Frames and transcript share timestamps, so Claude can pair "what's on screen" with "what's being said".
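The sketch below makes the scene-change and frame-cap idea above concrete in Node.js. It assumes ffmpeg's `select` filter with its `scene` score, with `showinfo` logging each kept frame's time. It also assumes an even-spacing rule for the cap. `sceneArgs` and `capFrames` are hypothetical names, and the Skill's own `scripts/watch.mjs` may pick frames differently.

```javascript
// Hypothetical helpers sketching scene-change sampling plus a hard frame cap.
// They are illustrations, not the Skill's actual implementation.

// Build ffmpeg arguments that keep only frames whose scene-change score
// exceeds `threshold` (0..1) and write them as numbered PNGs in `outDir`.
function sceneArgs(input, outDir, threshold = 0.3) {
  if (!(threshold >= 0 && threshold <= 1)) {
    throw new RangeError("scene threshold must be between 0 and 1");
  }
  return [
    "-i", input,
    // The single quotes protect the comma inside gt() from ffmpeg's filtergraph
    // parser; showinfo logs each kept frame's pts_time to stderr.
    "-vf", `select='gt(scene,${threshold})',showinfo`,
    // Write only the selected frames rather than padding to a constant rate
    // (newer ffmpeg releases also accept -fps_mode vfr).
    "-vsync", "vfr",
    `${outDir}/%03d.png`,
  ];
}

// Reduce a time-ordered list of candidate frames to at most `maxFrames`,
// keeping the first and last (when room allows) and spacing the rest evenly.
function capFrames(frames, maxFrames = 40) {
  if (maxFrames < 1) return [];
  if (frames.length <= maxFrames) return frames.slice();
  if (maxFrames === 1) return [frames[0]];
  const step = (frames.length - 1) / (maxFrames - 1);
  const kept = [];
  for (let i = 0; i < maxFrames; i++) {
    kept.push(frames[Math.round(i * step)]);
  }
  return kept;
}
```

In a script, the argument array could be passed to `spawn("ffmpeg", sceneArgs(...))` from `node:child_process`, and the resulting list of PNGs handed to `capFrames` before writing the manifest.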
ffmpeg's select='gt(scene,X)' filter emits a frame only when the visual delta crosses a threshold X. For UI walkthroughs and screen recordings, that drops the frame count by an order of magnitude compared with naive fixed-interval sampling.
manifest.json lists every frame with its timestamp, path, and OCR text, plus a transcript array when transcription ran. Claude reads the manifest first and pulls individual frames on demand.
In Claude Code:
/plugin marketplace add sixtusagbo/video-watcher
/plugin install video-watcher@video-watcher
Or, to use it manually without the plugin system:
git clone https://github.com/sixtusagbo/video-watcher /tmp/video-watcher
cp -r /tmp/video-watcher/skills/video-watcher ~/.claude/skills/
Dependencies:
ffmpeg (required): brew install ffmpeg
tesseract (optional, OCR): brew install tesseract
yt-dlp (optional, URL inputs): brew install yt-dlp
whisper-cpp (optional, audio transcription): brew install whisper-cpp, plus a ggml model (see below)
No npm runtime dependencies.
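Which of these tools a run actually needs depends on its input and flags. The sketch below encodes the mapping from the list above. `requiredPackages` is a hypothetical helper. Treating any `http://` or `https://` input as a URL for yt-dlp is an assumption about how the Skill detects URL inputs.

```javascript
// Hypothetical helper: list the Homebrew formulae a run depends on,
// following the dependency list above.
function requiredPackages(input, { ocr = false, transcribe = false } = {}) {
  const pkgs = ["ffmpeg"]; // always required
  // Assumption: inputs starting with http:// or https:// are fetched with yt-dlp.
  if (/^https?:\/\//i.test(input)) pkgs.push("yt-dlp");
  if (ocr) pkgs.push("tesseract");
  // whisper-cpp also needs a ggml model file on disk.
  if (transcribe) pkgs.push("whisper-cpp");
  return pkgs;
}
```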
--transcribe looks for a model at $WHISPER_MODEL_PATH, then ~/.cache/whisper/ggml-base.en.bin. To grab the default:
mkdir -p ~/.cache/whisper
curl -L -o ~/.cache/whisper/ggml-base.en.bin \
https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-base.en.bin
You can also point at any other ggml model with --whisper-model <path>.
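The model lookup order described above can be sketched as a small resolver. It tries an explicit `--whisper-model` first, then `$WHISPER_MODEL_PATH`, then the default cache path. The function name is hypothetical. Returning an explicit flag path without checking that the file exists is an assumption. The existence check is injected so the logic stays testable.

```javascript
// Hypothetical resolver for the whisper.cpp model path.
// Order: --whisper-model, then $WHISPER_MODEL_PATH, then the default cache file.
function resolveWhisperModel({ flag, env = {}, home = "", exists }) {
  // Assumption: an explicit --whisper-model path wins outright, with no fallback.
  if (flag) return flag;
  const candidates = [];
  if (env.WHISPER_MODEL_PATH) candidates.push(env.WHISPER_MODEL_PATH);
  if (home) candidates.push(`${home}/.cache/whisper/ggml-base.en.bin`);
  // Otherwise take the first candidate that exists on disk.
  for (const p of candidates) {
    if (exists(p)) return p;
  }
  return null; // no model found, so --transcribe cannot run
}
```

In a real script you would pass `process.env`, `os.homedir()` from `node:os`, and `fs.existsSync` from `node:fs`.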
From a Claude session:
Watch ./demo.mp4 and tell me what error appears.
The Skill runs the pipeline and produces a manifest plus a frames/ directory. Claude reads them from there.
Direct CLI use:
node scripts/watch.mjs ./demo.mp4
node scripts/watch.mjs ./demo.mp4 --max-frames 20 --scene 0.2 --ocr --transcribe
Flags:
--out <dir>              output directory (default ./out)
--max-frames <n>         cap on frames in the manifest (default 40)
--scene <0..1>           scene-change threshold (default 0.3; lower values keep more frames)
--ocr                    run tesseract over each frame and attach text to the manifest
--transcribe             transcribe audio with whisper.cpp and attach a timestamped transcript to the manifest
--whisper-model <path>   explicit ggml model path (overrides WHISPER_MODEL_PATH and the default location)
Output:
out/
  frames/
    000.png
    001.png
    ...
  manifest.json
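The flag defaults above can be mirrored in a dependency-free argument parser. The sketch pairs it with a helper for the zero-padded frame names in the output tree. Both `parseFlags` and `framePath` are hypothetical. The real `scripts/watch.mjs` may parse arguments differently, for example with Node's `util.parseArgs`.

```javascript
// Hypothetical parser mirroring the documented flags and their defaults.
function parseFlags(argv) {
  const opts = {
    input: null,
    out: "./out",
    maxFrames: 40,
    scene: 0.3,
    ocr: false,
    transcribe: false,
    whisperModel: null,
  };
  const rest = [...argv];
  while (rest.length > 0) {
    const arg = rest.shift();
    // Take the value that follows a flag, failing loudly if it is missing.
    const value = () => {
      if (rest.length === 0) throw new Error(`${arg} needs a value`);
      return rest.shift();
    };
    switch (arg) {
      case "--out": opts.out = value(); break;
      case "--max-frames": opts.maxFrames = Number(value()); break;
      case "--scene": opts.scene = Number(value()); break;
      case "--ocr": opts.ocr = true; break;
      case "--transcribe": opts.transcribe = true; break;
      case "--whisper-model": opts.whisperModel = value(); break;
      default:
        if (arg.startsWith("--")) throw new Error(`unknown flag: ${arg}`);
        opts.input = arg; // positional: video file or URL
    }
  }
  if (!opts.input) throw new Error("missing input video");
  if (!Number.isInteger(opts.maxFrames) || opts.maxFrames < 1) {
    throw new Error("--max-frames must be a positive integer");
  }
  if (!(opts.scene >= 0 && opts.scene <= 1)) {
    throw new Error("--scene must be between 0 and 1");
  }
  return opts;
}

// Frame files are numbered with three zero-padded digits, as in the tree above.
function framePath(index) {
  return `frames/${String(index).padStart(3, "0")}.png`;
}
```

Calling `parseFlags(process.argv.slice(2))` would then return the options for one run.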
manifest.json:
{
  "video": "demo.mp4",
  "duration_sec": 87.4,
  "frame_count": 12,
  "frames": [
    { "timestamp": "00:00.000", "path": "frames/000.png", "ocr": "Welcome screen\nGet started" },
    { "timestamp": "00:04.512", "path": "frames/001.png", "ocr": "..." }
  ],
  "transcript": [
    { "timestamp": "00:00.000", "start": 0.0, "end": 4.5, "text": "Welcome. Let me show you the dashboard." },
    { "timestamp": "00:04.500", "start": 4.5, "end": 9.2, "text": "First, click the gear icon..." }
  ]
}
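Frames and transcript segments carry comparable times, so pairing them is a lookup. This minimal sketch assumes timestamps always use the `MM:SS.mmm` form shown in the example; longer videos might use a different format. `parseTimestamp` and `pairFramesWithSpeech` are hypothetical helpers, not part of the Skill.

```javascript
// Convert a manifest timestamp such as "00:04.512" (MM:SS.mmm) to seconds.
function parseTimestamp(ts) {
  const m = /^(\d+):(\d{2})\.(\d{3})$/.exec(ts);
  if (!m) throw new Error(`unexpected timestamp: ${ts}`);
  return Number(m[1]) * 60 + Number(m[2]) + Number(m[3]) / 1000;
}

// For each frame, collect the text of transcript segments whose
// [start, end) interval contains the frame's time.
function pairFramesWithSpeech(manifest) {
  const segments = manifest.transcript ?? []; // absent when --transcribe was not used
  return manifest.frames.map((frame) => {
    const t = parseTimestamp(frame.timestamp);
    const said = segments
      .filter((s) => s.start <= t && t < s.end)
      .map((s) => s.text);
    return { path: frame.path, seconds: t, ocr: frame.ocr ?? "", said };
  });
}
```

A frame whose time falls in a gap between segments simply gets an empty `said` list.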
Status: early.
License: MIT.