/watch
Give Claude the ability to watch any video.
Paste a URL or a local file and Claude watches it. Three features do the work: scene-change frame extraction (one frame per cut instead of one every N seconds), a 0-10s hook microscope (dense frames plus word-level Whisper on the opening, where every video earns or loses your attention), and optional Obsidian auto-save, so a watched video becomes a connected wiki entry without copy-paste.
Claude Code:
/plugin marketplace add taoufik123-collab/claude-watch
/plugin install watch@claude-watch
claude.ai (web): download watch.skill and drop it into Settings → Capabilities → Skills.
Codex / generic skills:
git clone https://github.com/taoufik123-collab/claude-watch.git ~/.codex/skills/watch
Zero config to start: yt-dlp and ffmpeg install on first run via brew on macOS (on Linux and Windows the skill prints the exact commands to run). Captions cover most public videos for free. A Whisper API key is only needed when a video has no captions. Set $WATCH_VAULT_DIR to point at your Obsidian vault for auto-save, or leave it unset and the skill skips the ingest step quietly when no vault is auto-detected.
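The vault lookup can be pictured roughly as follows. This is a minimal sketch, not the skill's actual code: the function name `resolve_vault_dir`, the lookup order, and the choice to treat a missing override path as "no vault" are all assumptions. Only `$WATCH_VAULT_DIR` and the three fallback folders come from this README.

```python
import os
from pathlib import Path
from typing import Optional

# Fallback folders named in this README; the order they are tried in is an assumption.
DEFAULT_VAULT_CANDIDATES = ("Second brain", "Documents/Obsidian", "Obsidian")


def resolve_vault_dir(env: Optional[dict] = None, home: Optional[Path] = None) -> Optional[Path]:
    """Return the vault directory, or None so the caller can skip auto-save."""
    env = os.environ if env is None else env
    home = Path.home() if home is None else home

    override = env.get("WATCH_VAULT_DIR", "").strip()
    if override:
        path = Path(override).expanduser()
        # Assumption: an explicit path that does not exist counts as "no vault".
        return path if path.is_dir() else None

    # No override: try the well-known locations under the home directory.
    for relative in DEFAULT_VAULT_CANDIDATES:
        candidate = home / relative
        if candidate.is_dir():
            return candidate
    return None
```

Returning `None` instead of raising keeps the "skip quietly" behavior in one place: the caller only has to check for a missing vault once.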
What's inside
- Scene-change frame extraction: scripts/frames.py grabs one frame per detected shot via ffmpeg's select=gt(scene,...), not a uniform tick every N seconds. Token cost stays flat on long videos because the frame count is bounded by the number of cuts, not the duration.
- 0-10s hook microscope: scripts/hook.py runs a denser 2 fps pass on the opening 10 seconds plus a word-level Whisper transcript, so the report tells you what was on screen as each word landed. The first 10 seconds is where every video either earns your attention or loses it.
- Structured report.md with Claude-fill markers: scripts/report.py emits a fixed-schema report (TL;DR, key moments, hook breakdown, editorial profile, quotable moments, entities, concepts, transcript) where narrative sections are explicit <!-- pending Claude fill: ... --> markers. Claude has a job-list to walk before ingest, not a blank doc.
- Optional Obsidian auto-save: Step 4.4 stages the report into $VAULT_DIR/raw/watched/<slug>/ and opens it via the obsidian:// URL scheme. Step 4.5 offers ingest into the vault's wiki. Both steps skip cleanly when no vault is detected. Vault path is resolved from $WATCH_VAULT_DIR or auto-detected from ~/Second brain/, ~/Documents/Obsidian/, ~/Obsidian/.
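The two frame passes described in the list above could be expressed as ffmpeg invocations along these lines. This is a sketch under stated assumptions, not the contents of scripts/frames.py or scripts/hook.py: the function names, the `0.3` scene threshold, and the output patterns are hypothetical. The `select=gt(scene,...)` filter, the 2 fps rate, and the 10-second window are the parts taken from this README.

```python
from typing import List


def scene_change_command(video: str, out_pattern: str, threshold: float = 0.3) -> List[str]:
    """One frame per detected cut: keep frames whose scene score exceeds the threshold."""
    return [
        "ffmpeg", "-hide_banner",
        "-i", video,
        # Quotes keep the comma inside gt() from being parsed as a filter separator.
        "-vf", f"select='gt(scene,{threshold})'",
        # Write only the selected frames (older ffmpeg builds spell this -vsync vfr).
        "-fps_mode", "vfr",
        out_pattern,
    ]


def hook_command(video: str, out_pattern: str, seconds: float = 10.0, fps: float = 2.0) -> List[str]:
    """Dense uniform sampling of the opening seconds only."""
    return [
        "ffmpeg", "-hide_banner",
        "-i", video,
        "-t", str(seconds),      # stop writing output after this many seconds
        "-vf", f"fps={fps}",     # resample to a fixed frame rate
        out_pattern,
    ]
```

Either list can be passed straight to `subprocess.run(cmd, check=True)`. The scene pass produces a number of frames that depends on how many cuts ffmpeg detects, while the hook pass is bounded by `seconds * fps`, which is why the two are kept as separate passes here.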
The core pipeline — yt-dlp download, ffmpeg frames, Groq/OpenAI Whisper backends, the --start/--end focused mode, the SessionStart hook, the multi-surface install — comes from the original claude-video project and works unchanged (see Credits).
Claude can read a webpage, run a script, browse a repo. What it can't do, out of the box, is watch a video. You paste a YouTube link and it has to either guess from the title or pull a transcript that's missing 90% of what's on screen.
With Claude Video /watch you can paste a URL or a local path, ask a question, and Claude downloads the video, extracts frames at an auto-scaled rate, pulls a timestamped transcript (free captions when available, Whisper API as fallback), and Reads every frame as an image. By the time it answers, it has seen the video and heard the audio.
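The captions-first, Whisper-as-fallback order might look something like this. It is an illustrative sketch only: `caption_command` and `pick_transcript_source` are hypothetical names, and the project may select languages and formats differently. The yt-dlp flags used are standard subtitle options.

```python
from typing import List, Optional, Sequence


def caption_command(url: str, out_template: str, lang: str = "en") -> List[str]:
    """yt-dlp invocation that fetches subtitles only, without downloading the video."""
    return [
        "yt-dlp", "--skip-download",
        "--write-subs", "--write-auto-subs",  # uploaded captions plus auto-generated ones
        "--sub-langs", lang,
        "--sub-format", "vtt",
        "-o", out_template,
        url,
    ]


def pick_transcript_source(caption_files: Sequence[str], whisper_key: Optional[str]) -> str:
    """Prefer free captions; fall back to Whisper only when a key is configured."""
    if caption_files:
        return "captions"
    if whisper_key:
        return "whisper"
    return "none"  # frames only, no transcript
```

Keeping the decision in a small pure function makes the "API key only when there are no captions" rule easy to check on its own.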
/watch https://youtu.be/dQw4w9WgXcQ what happens at the 30 second mark?
Why this exists
I built this because I'm constantly using video to keep up with content. If I see a YouTube video that's blowing up, I want to know how the creator structured the hook — what's on screen in the first 3 seconds, what they said, why it worked. That used to mean watching it myself with a notepad. Now I just paste the URL and ask.
The other half is summarization. Most YouTube videos don't deserve 20 minutes of my attention. I hand the URL to Claude, it pulls the transcript and tells me what actually happened. If the visuals matter, frames come along too. If it's a podcast or a talking head, the transcript is enough.
Claude is great at reading and synthesizing — but until now, video was the one input I couldn't hand it. Pasting a YouTube link got you nothing useful. /watch closes that gap.
What people actually use it for
Analyze someone else's content. /watch https://youtu.be/<viral-video> what hook did they open with? Claude looks at the first frames, reads the opening transcript, and breaks down the structure. The same goes for ad creative, competitor launches, podcast intros, anything where the how matters as much as the what.
Diagnose a bug from a video. Someone sends you a screen recording of something broken. /watch bug-repro.mov what's going wrong? Claude watches the recording, finds the frame where the issue appears, describes what's on screen, and often catches the cause without you ever opening the file.