From watch-video
This skill should be used when the user provides a path to a video file (.mp4, .mov, .webm, .mkv, .avi) and asks Claude to watch, describe, summarize, debug, verify, or analyze its contents. Common cases include screen-recordings of bug reproductions, mobile app demos, tutorials, UI walkthroughs, and short clips. The skill extracts JPEG frames with ffmpeg and instructs Claude to read them through the native Read tool. It auto-handles filenames containing spaces, colons, or other special characters that break ad-hoc ffmpeg commands. Requires ffmpeg on PATH.
How this skill is triggered — by the user, by Claude, or both
Slash command
/watch-video:watch-videoThe summary Claude sees in its skill listing — used to decide when to auto-load this skill
Convert a video file into a sequence of JPEG frames so Claude can analyze its visual content through the native `Read` image tool. Claude cannot read video files directly, but it can read images — this skill bridges the gap.
Convert a video file into a sequence of JPEG frames so Claude can analyze its visual content through the native Read image tool. Claude cannot read video files directly, but it can read images — this skill bridges the gap.
Invoke when any of these match:
.mp4, .mov, .webm, .mkv, .avi, or similar file and asks Claude to look at it, watch it, describe it, summarize it, verify a fix in it, or check what happens in it.Do NOT invoke for: live streams, audio-only files, GIFs (use Read directly), or videos already provided as still frames.
ffmpeg and ffprobe must be available on PATH:
brew install ffmpeg # macOS
sudo apt-get install ffmpeg # Debian/Ubuntu
The script verifies this and emits a clear error if missing.
scripts/extract_frames.sh <video_path> — pass the path verbatim, including spaces and special characters. The script copies the file to a safe temp path internally.output_dir, frames[], frame_count, video_duration_seconds, and strategy.frame_count is large (>30), sanity-check that a smaller frame set isn't sufficient before reading them all — each Read of a JPG consumes vision tokens.frames[] with the Read tool. The frames are timestamp-ordered: frame_001.jpg is earliest, the highest-numbered frame is latest.index * (1 / effective_fps) for fps mode, or use the visible content for scene-detection mode).rm -rf "$OUTPUT_DIR" after the analysis is complete and reported.With no flags, the script picks an adaptive frame rate based on duration so the total stays in a useful range (~8–25 frames):
| Video duration | Default fps | Approx frames |
|---|---|---|
| ≤ 10s | 1.0 | up to 10 |
| ≤ 30s | 0.5 | up to 15 |
| ≤ 60s | 0.333 | up to 20 |
| ≤ 2 min | 0.2 | up to 24 |
| ≤ 5 min | 0.125 | up to ~37 |
| > 5 min | 0.0667 | depends |
Frames are downscaled to 590px wide by default — readable at mobile-screenshot scale without flooding the context window.
| Flag | Purpose | Example |
|---|---|---|
--max-frames N | Cap total frames to N (auto-calculates fps) | --max-frames 20 |
--fps F | Force a specific frame rate | --fps 1 |
--scene-threshold T | Extract one frame per detected scene change. T in 0.0–1.0; lower = more sensitive | --scene-threshold 0.3 |
--width W | Output width in pixels (height auto-scaled) | --width 800 |
--quality Q | JPEG quality 1 (best) – 31 (worst), default 4 | --quality 2 |
--output-dir D | Custom output directory (default: mktemp -d) | --output-dir /tmp/myframes |
scripts/extract_frames.sh "/path/to/ScreenRecording.mp4"
Auto-mode handles it. Read all frames; describe what changes between them.
scripts/extract_frames.sh tutorial.mp4 --scene-threshold 0.3
Returns one frame per visual scene change (e.g., slide transitions). Best for presentations, edited footage.
scripts/extract_frames.sh screencast.mp4 --width 1180 --quality 1 --max-frames 25
Higher resolution + best JPEG quality so small UI text remains legible.
scripts/extract_frames.sh long.mp4 --max-frames 12
The script computes an fps that yields ~12 evenly-spaced frames regardless of duration.
The script prints a single JSON object on stdout. Example:
{
"output_dir": "/tmp/watch-video-frames.AbC123",
"video_duration_seconds": 19.67,
"strategy": "auto",
"effective_fps": 0.5,
"frame_width": 590,
"frame_height": 1278,
"frame_count": 10,
"frames": [
"/tmp/watch-video-frames.AbC123/frame_001.jpg",
"/tmp/watch-video-frames.AbC123/frame_002.jpg"
]
}
On failure, a JSON object with an "error" key prints to stderr and the process exits with code 1.
ffprobe <file> directly to see ffmpeg's diagnostics.--scene-threshold, the threshold is too high; lower it (try 0.2). Otherwise the video may be very short — pass --fps 2 to force extraction.--width (try 1180 for iPhone-portrait recordings) and lower --quality (try 1 or 2).--max-frames 15 to cap.e2 80 af) between the time and "AM"/"PM" — it looks like a regular space but isn't. Get the path with shell completion or find -print0 | xargs -0 rather than typing it; that preserves the exact bytes.Creates, edits, and optimizes skills for Claude Code, including drafting, evaluating with test prompts, iterating on performance, and improving skill descriptions for better triggering accuracy.
npx claudepluginhub terminusapp-ai/watch-video --plugin watch-video