Skill

letmewatch

Extracts key frames from videos using ffmpeg scene detection and optionally transcribes audio with whisper, for analyzing screen recordings, bug reports, tutorials, and demos.

Python

FFmpeg

developer-tools


Invocation

How this skill is triggered — by the user, by Claude, or both

Slash command

/letmewatch:letmewatch

User invocable

Model invocable

Inline context

Default effort

Context Preview

The summary Claude sees in its skill listing — used to decide when to auto-load this skill

Analyze video content by extracting key frames using ffmpeg scene detection and viewing them as images.

Supporting Files

video-extract.py
video-extract.sh

SKILL.md

78 lines · ~739 tokens

Stats

Language: Python

Stars: 8

Forks: 1

Maintenance: Excellent

Last Commit: Jun 13, 2026


Let Me Watch

Analyze video content by extracting key frames using ffmpeg scene detection and viewing them as images.

When to Activate

User asks you to watch, review, or analyze a video
User shares a video file path (.mp4, .mov, .mkv, .webm, .avi)
User wants feedback on a screen recording or bug report video
User runs /letmewatch:video, /letmewatch:video-last, or /letmewatch:video-dir

Prerequisites

ffmpeg (required): brew install ffmpeg (macOS) or apt install ffmpeg (Linux)
whisper (optional, for audio): pip install openai-whisper or pip install mlx-whisper
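Before running the skill, it can help to confirm which of these tools are present. The sketch below is not part of the skill. It assumes that ffmpeg must be on PATH and that the two whisper packages expose the importable modules `whisper` (openai-whisper) and `mlx_whisper` (mlx-whisper). `check_prerequisites` is a hypothetical helper name.

```python
import importlib.util
import shutil


def check_prerequisites() -> dict[str, bool]:
    """Report which of the tools listed under Prerequisites appear to be available.

    Hypothetical helper: not part of the letmewatch scripts.
    """
    return {
        # ffmpeg is required and must be an executable on PATH.
        "ffmpeg": shutil.which("ffmpeg") is not None,
        # openai-whisper installs a module named "whisper" (any module with
        # that name will satisfy this check).
        "openai-whisper": importlib.util.find_spec("whisper") is not None,
        # mlx-whisper installs a module named "mlx_whisper".
        "mlx-whisper": importlib.util.find_spec("mlx_whisper") is not None,
    }
```

If `ffmpeg` is False, the extraction cannot run. If both whisper entries are False, you should expect frames but no transcript.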

How It Works

The extraction script uses ffmpeg scene detection to keep only the frames where the visuals change significantly, much as LogRocket's session replays focus on user interactions
Each frame is timestamped (frame_01m23s.jpg) so you can reference specific moments
Frames are resized to 720p JPEG to stay within context limits
Audio is transcribed if whisper is installed
You view frames in batches of 8, building a complete understanding of the video
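The following sketch shows how ffmpeg scene detection can be driven from Python. It is not the bundled script. It uses ffmpeg's `select` filter with the `scene` score, `showinfo` to log each kept frame's `pts_time`, and `scale=-2:720` to resize frames to 720p height. The output pattern `scene_%03d.jpg` and the helper names are assumptions. The real script names frames by timestamp, and its exact flags may differ. The `-fps_mode` option requires ffmpeg 5.1 or later. Older builds use `-vsync vfr` instead.

```python
import re

# Matches the pts_time field that ffmpeg's showinfo filter logs for each frame.
PTS_TIME = re.compile(r"pts_time:(\d+(?:\.\d+)?)")


def scene_detect_cmd(video: str, out_dir: str, threshold: float = 0.1) -> list[str]:
    """Build an ffmpeg command that keeps frames whose scene score exceeds threshold.

    Hypothetical helper; the output file pattern is a placeholder.
    """
    vf = (
        f"select='gt(scene,{threshold})',"  # keep frames that differ enough from the previous one
        "showinfo,"                          # log pts_time of each kept frame to stderr
        "scale=-2:720"                       # 720p height, aspect ratio preserved, even width
    )
    return [
        "ffmpeg", "-hide_banner", "-i", video,
        "-vf", vf,
        "-fps_mode", "vfr",  # write only the selected frames (ffmpeg 5.1+)
        "-q:v", "3",         # JPEG quality; lower values mean higher quality
        f"{out_dir}/scene_%03d.jpg",
    ]


def parse_pts_times(stderr: str) -> list[float]:
    """Extract the timestamp, in seconds, of every frame showinfo reported."""
    return [float(m.group(1)) for m in PTS_TIME.finditer(stderr)]
```

You could run the command with `subprocess.run(cmd, capture_output=True, text=True)` and pass `result.stderr` to `parse_pts_times`. The timestamps it returns are what a script would need in order to rename frames to the `frame_01m23s.jpg` form.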

Processing a Video

Step 1: Extract frames

Run the extraction script bundled with this skill:

python3 ${CLAUDE_PLUGIN_ROOT}/skills/letmewatch/video-extract.py "<video_path>"

Read the output to find:

WORK_DIR — where frames are stored
TOTAL_FRAMES — how many frames were extracted
TRANSCRIPT — path to audio transcript (or "none")
FRAMES — list of frame file paths
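If you want to consume the script's output programmatically, a sketch like the one below could run it and collect the four fields. The output layout it parses is an assumption and has not been confirmed from the script. It assumes each field is printed as `KEY: value`, and that `FRAMES:` is followed by one frame path per line. Check the actual output and adjust accordingly. `run_extract` and `parse_extract_output` are hypothetical names.

```python
import os
import subprocess


def run_extract(video: str) -> str:
    """Run the bundled extraction script and return its stdout (hypothetical wrapper)."""
    script = os.path.join(
        os.environ["CLAUDE_PLUGIN_ROOT"], "skills", "letmewatch", "video-extract.py"
    )
    result = subprocess.run(
        ["python3", script, video], capture_output=True, text=True, check=True
    )
    return result.stdout


def parse_extract_output(text: str) -> dict:
    """Parse WORK_DIR, TOTAL_FRAMES, TRANSCRIPT and FRAMES from the script's stdout.

    ASSUMED format: "KEY: value" lines, with FRAMES followed by one path per line.
    """
    keys = {"WORK_DIR", "TOTAL_FRAMES", "TRANSCRIPT", "FRAMES"}
    result: dict = {"FRAMES": []}
    in_frames = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        key, sep, value = line.partition(":")
        if sep and key in keys:
            in_frames = key == "FRAMES"  # subsequent lines are frame paths
            if key == "TOTAL_FRAMES":
                result[key] = int(value.strip())
            elif key != "FRAMES":
                result[key] = value.strip()
            continue
        if in_frames:
            result["FRAMES"].append(line)
    return result
```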

Step 2: Read transcript (if available)

If TRANSCRIPT is not "none", read the transcript file first for audio/narration context.
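The check described above amounts to a small guard: treat the literal value "none" as "no transcript" and otherwise read the file. The following is a minimal sketch. It assumes that the transcript is UTF-8 text, and `load_transcript` is a hypothetical name.

```python
from pathlib import Path
from typing import Optional


def load_transcript(transcript: str) -> Optional[str]:
    """Return the transcript text, or None when the script reported "none"."""
    if transcript.strip().lower() == "none":
        return None
    # Assumes the transcript file is plain UTF-8 text.
    return Path(transcript).read_text(encoding="utf-8")
```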

Step 3: View frames in batches

Read frames in batches of 8 using the Read tool (all 8 in parallel). For each batch:

Note the timestamp in each filename (e.g., frame_00m23s.jpg = 0 minutes 23 seconds)
Describe what you observe: UI state, user interactions, changes between frames, errors
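The filename convention and the batching rule can be expressed directly in code. The following sketch converts between seconds and `frame_MMmSSs.jpg` names, and splits a frame list into batches of 8. It assumes timestamps are truncated to whole seconds, which the script may or may not do. All function names here are hypothetical.

```python
import re

# Matches names such as frame_01m23s.jpg; minutes may exceed two digits.
FRAME_NAME = re.compile(r"frame_(\d+)m(\d{2})s\.jpg$")


def frame_name(seconds: float) -> str:
    """Format a timestamp as a frame filename, e.g. 83 -> frame_01m23s.jpg."""
    total = int(seconds)  # truncate to whole seconds (assumption)
    return f"frame_{total // 60:02d}m{total % 60:02d}s.jpg"


def frame_seconds(path: str) -> int:
    """Recover the timestamp in seconds from a frame filename or path."""
    match = FRAME_NAME.search(path)
    if match is None:
        raise ValueError(f"not a timestamped frame: {path}")
    return int(match.group(1)) * 60 + int(match.group(2))


def batches(paths: list[str], size: int = 8) -> list[list[str]]:
    """Split frame paths into consecutive batches of `size` (default 8)."""
    return [paths[i:i + size] for i in range(0, len(paths), size)]
```

For example, `frame_seconds("frame_00m23s.jpg")` gives 23. A 20-frame list splits into batches of 8, 8, and 4.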

Step 4: Synthesize

After viewing all frames, provide a timestamped summary. Tailor your response:

Bug/UI review: Identify the issue and when it occurs, and suggest fixes
Screen recording: Describe the workflow and any issues spotted
Tutorial/walkthrough: Summarize concepts and key takeaways
General video: Describe content and answer questions
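Whatever the video type, the summary reads best as a chronological list of observations in the "At 01m15s, ..." form. The following is a minimal sketch of that formatting. It assumes observations are collected as (seconds, note) pairs, and `timestamped_summary` is a hypothetical helper.

```python
def timestamp_label(seconds: int) -> str:
    """Format seconds as MMmSSs, matching the frame filenames."""
    return f"{seconds // 60:02d}m{seconds % 60:02d}s"


def timestamped_summary(observations: list[tuple[int, str]]) -> str:
    """Render (seconds, note) pairs as chronological 'At MMmSSs, note' lines."""
    return "\n".join(
        f"At {timestamp_label(t)}, {note}" for t, note in sorted(observations)
    )
```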

Step 5: Cleanup

Remove the temp directory:

rm -rf <WORK_DIR>
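Because `rm -rf` acts on whatever path it receives, a guarded cleanup is safer if WORK_DIR was parsed incorrectly. The following sketch refuses to delete anything outside the system temp directory. It assumes that WORK_DIR is created under `tempfile.gettempdir()`, which the skill does not state. If the script uses a different location, such as `/tmp` on macOS, the guard would need adjusting. `cleanup` is a hypothetical name.

```python
import shutil
import tempfile
from pathlib import Path


def cleanup(work_dir: str) -> None:
    """Delete WORK_DIR, refusing any path outside the system temp directory."""
    path = Path(work_dir).resolve()
    temp_root = Path(tempfile.gettempdir()).resolve()
    # Guard: only delete directories nested inside the temp root (assumption).
    if temp_root not in path.parents:
        raise ValueError(f"refusing to delete {path}: not under {temp_root}")
    shutil.rmtree(path)
```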

Important Notes

Always reference timestamps: "At 01m15s, the error dialog appears"
If scene detection yields too few frames, the script automatically falls back to fixed-interval extraction
The scene detection threshold is 0.1 by default (catches most UI changes)
At most 40 frames are extracted per video, viewed in batches of 8
If consecutive frames look similar, note that the video was static during that range
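The frame budget and fallback described in these notes could be planned as shown below. The notes do not say what counts as "few" frames, so the `min_frames=5` cutoff is a hypothetical value. Spreading the interval over the 40-frame cap, and thinning scene frames evenly when there are too many, are also assumptions rather than the script's confirmed behavior. Interval extraction itself could then use ffmpeg's `fps` filter, for example `fps=1/3` for one frame every 3 seconds.

```python
def even_sample(frames: list[str], max_frames: int = 40) -> list[str]:
    """Keep at most max_frames, spread evenly across the input order."""
    if len(frames) <= max_frames:
        return list(frames)
    step = len(frames) / max_frames
    return [frames[int(i * step)] for i in range(max_frames)]


def fallback_interval(duration_s: float, max_frames: int = 40) -> float:
    """Seconds between frames so interval extraction yields about max_frames."""
    return max(duration_s / max_frames, 1.0)  # never sample faster than 1 fps


def plan_frames(scene_frames: list[str], duration_s: float,
                min_frames: int = 5, max_frames: int = 40):
    """Return ("scene", frames) or ("interval", seconds_between_frames).

    min_frames is a hypothetical cutoff for "few frames".
    """
    if len(scene_frames) < min_frames:
        return ("interval", fallback_interval(duration_s, max_frames))
    return ("scene", even_sample(scene_frames, max_frames))
```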
