Extracts scene-change frames, pacing metrics, and transcript from video URLs or local paths; produces structured report for editorial analysis.
Slash command
/watch:watch <video-url-or-path> [why you're watching it]
You don't have a video input; this skill gives you one. A Python script downloads the video, extracts frames as JPEGs (one per detected shot via scene-change), gets a timestamped transcript (native captions first, then Whisper API as fallback), runs editorial pacing metrics, and microscopes the first 10 seconds at higher density. You then Read each frame path to see the images, combine them with the transcript to answer the user, fill the structured report.md, and offer to ingest the analysis into Taoufik's Second Brain.
- report.md: every watch emits an ingest-shaped report at <workdir>/report.md with TL;DR, key moments, hook breakdown, editorial profile, quotable moments, entities, concepts, and transcript. Narrative sections are emitted as <!-- pending Claude fill: ... --> markers; you fill them in before offering ingest.
- $VAULT_DIR/CLAUDE.md (if it exists) and run that vault's Ingest op against the report.

None of the above add new dependencies: pure ffmpeg + stdlib + the existing Whisper backend.
Steps 4.4 and 4.5 stage the report inside an Obsidian vault so the user can read it where they read everything else. Resolve the vault directory in this order — first hit wins, and the result is what $VAULT_DIR refers to everywhere below:
1. $WATCH_VAULT_DIR env var: if set and the path exists, use it. This is the user-controlled override.
2. ~/Second brain/, if it exists as a directory.
3. ~/Documents/Obsidian/, if it exists as a directory.
4. ~/Obsidian/, if it exists as a directory.
5. None of the above: skip vault staging and tell the user 📄 Report (no vault detected): <workdir>/report.md. Suggest they set WATCH_VAULT_DIR if they want auto-ingest.

A quick way to resolve it in bash inside the skill:
VAULT_DIR="${WATCH_VAULT_DIR:-}"
if [ -z "$VAULT_DIR" ] || [ ! -d "$VAULT_DIR" ]; then
  # Discard an override that is unset or not a directory, then try the defaults.
  VAULT_DIR=""
  for candidate in "$HOME/Second brain" "$HOME/Documents/Obsidian" "$HOME/Obsidian"; do
    if [ -d "$candidate" ]; then VAULT_DIR="$candidate"; break; fi
  done
fi
The vault's URL-name (for the obsidian:// URL scheme in Step 4.4) is the final path component — e.g. $HOME/Second brain → Second brain. URL-encode spaces as %20.
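For readers who prefer Python to bash, the same resolution order can be sketched as below. This is an illustration, not code from the skill: `resolve_vault_dir`, `vault_url_name`, and `DEFAULT_VAULTS` are hypothetical names, and `quote` percent-encodes every reserved character, which is slightly broader than the spaces-only rule stated above.

```python
import os
from pathlib import Path
from urllib.parse import quote

# Default vault locations listed above, relative to the home directory.
DEFAULT_VAULTS = ("Second brain", "Documents/Obsidian", "Obsidian")


def resolve_vault_dir(env, home):
    """Return the first existing vault directory, or None if none is found."""
    override = env.get("WATCH_VAULT_DIR", "")
    # The override only wins when it points at a real directory.
    if override and Path(override).is_dir():
        return Path(override)
    for relative in DEFAULT_VAULTS:
        candidate = Path(home) / relative
        if candidate.is_dir():
            return candidate
    return None


def vault_url_name(vault_dir):
    """Percent-encode the final path component for an obsidian:// URL."""
    return quote(Path(vault_dir).name, safe="")


if __name__ == "__main__":
    vault = resolve_vault_dir(os.environ, Path.home())
    print(vault, vault_url_name(vault) if vault else "(no vault detected)")
```

Returning `None` rather than an empty string makes the "no vault detected" branch explicit for the caller.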
Step 0: preflight (every /watch invocation, silent on success)

Python interpreter: every python3 ... command in this skill is for macOS/Linux. On Windows, substitute python; the python3 command on Windows is the Microsoft Store stub and will not run the script.
Before every /watch run, verify that dependencies and an API key are in place:
python3 "${CLAUDE_SKILL_DIR}/scripts/setup.py" --check
This is a <100ms lookup. On exit 0, the script emits nothing — proceed to Step 1 without comment. Do NOT announce "setup is complete" to the user — they don't need a status message on every turn. The only acceptable user-visible output from Step 0 is when remediation is required.
On non-zero exit, follow the table:
| Exit | Meaning | Action |
|---|---|---|
| 2 | Missing binaries (ffmpeg / ffprobe / yt-dlp) | Run installer |
| 3 | No Whisper API key | Run installer to scaffold .env, then ask user for a key |
| 4 | Both missing | Run installer, then ask for a key |
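The table can be read as a lookup from exit code to action. The sketch below is a hypothetical wrapper, not part of the skill: `ACTIONS`, `preflight_action`, and `run_preflight` are illustrative names, and the handling of codes outside the table is an assumption.

```python
import subprocess
import sys

# Exit codes and actions copied from the table above; 0 means "all good".
ACTIONS = {
    0: "proceed silently to Step 1",
    2: "run installer",
    3: "run installer to scaffold .env, then ask user for a key",
    4: "run installer, then ask for a key",
}


def preflight_action(exit_code):
    """Map a setup.py --check exit code to the action the table prescribes."""
    # Assumption: any code not in the table is surfaced instead of guessed at.
    return ACTIONS.get(exit_code, f"unexpected exit code {exit_code}; surface it to the user")


def run_preflight(setup_script):
    """Run `setup.py --check` with the current interpreter and pick the action."""
    completed = subprocess.run([sys.executable, setup_script, "--check"], check=False)
    return preflight_action(completed.returncode)
```

Using `sys.executable` instead of a hard-coded `python3` sidesteps the Windows interpreter naming issue mentioned earlier.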
The installer is idempotent — safe to re-run:
python3 "${CLAUDE_SKILL_DIR}/scripts/setup.py"
On macOS with Homebrew, it auto-installs ffmpeg and yt-dlp. On Linux/Windows, it prints the exact install commands for the user to run. It scaffolds ~/.config/watch/.env with commented placeholders at 0600 perms, and writes SETUP_COMPLETE=true once deps + a key are in place so the next session knows this user has already been through the wizard.
If an API key is still missing after install: use AskUserQuestion to ask the user whether they have a Groq API key (preferred — cheaper, faster) or an OpenAI key. Then write it into ~/.config/watch/.env — set the matching GROQ_API_KEY=... or OPENAI_API_KEY=... line. If they don't want to set up Whisper, proceed with --no-whisper and tell them videos without native captions will come back frames-only.
Structured mode (optional): python3 "${CLAUDE_SKILL_DIR}/scripts/setup.py" --json emits {status, first_run, missing_binaries, whisper_backend, has_api_key, config_file, platform} where status is one of ready | needs_install | needs_key | needs_install_and_key. Use this when you need to branch on specifics (e.g. "is this the user's very first run?" → first_run: true).
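A minimal sketch of branching on the structured output follows. The field names and status values come from the description above; the sample payload, its value types, and the `next_step` helper are assumptions made for illustration.

```python
import json

# Hypothetical payload: keys match the documented fields, values are made up.
SAMPLE = (
    '{"status": "needs_key", "first_run": true, "missing_binaries": [], '
    '"whisper_backend": null, "has_api_key": false, '
    '"config_file": "~/.config/watch/.env", "platform": "darwin"}'
)


def next_step(report):
    """Translate the documented status values into the next orchestration step."""
    status = report["status"]
    if status == "ready":
        return "run watch.py"
    if status == "needs_install":
        return "run installer"
    if status == "needs_key":
        return "ask user for a Groq or OpenAI key"
    if status == "needs_install_and_key":
        return "run installer, then ask for a key"
    raise ValueError(f"unknown status: {status!r}")


if __name__ == "__main__":
    report = json.loads(SAMPLE)
    print(next_step(report), "| first run:", report["first_run"])
```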
Within a single session, you can skip Step 0 on follow-up /watch calls — once --check returned 0, nothing about the environment changes between turns.
- .mp4, .mov, .mkv, .webm, etc.) and asks about it.
- /watch <url-or-path> [question].

Step 1: parse the user input. Separate the video source from any question the user asked. The question (or the user's prior stated interest) IS the intent; pass it to the script via --intent. Example: /watch https://youtu.be/abc what's the hook pattern? → source = https://youtu.be/abc, intent = what's the hook pattern?. If no question is given, use a brief inferred intent ("general summary") so the report's TL;DR has a lens. The intent shapes how the report's TL;DR and entity/concept sections get filled at Step 4: the same video with intent "pricing tactics" vs "editing style" produces different reports.
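The split described in Step 1 can be sketched as follows. This is a simplification under a stated assumption: the source is the first whitespace-delimited token, so an unquoted local path containing spaces would need extra handling. `split_source_and_intent` is a hypothetical helper name.

```python
def split_source_and_intent(arguments, default_intent="general summary"):
    """Split '/watch' arguments into (source, intent).

    Assumes the source contains no spaces; everything after it is the intent.
    """
    parts = arguments.strip().split(maxsplit=1)
    if not parts:
        raise ValueError("no video source given")
    source = parts[0]
    # Fall back to the inferred intent when the user asked no question.
    intent = parts[1].strip() if len(parts) > 1 else default_intent
    return source, intent


if __name__ == "__main__":
    print(split_source_and_intent("https://youtu.be/abc what's the hook pattern?"))
```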
Step 2 — run the watch script. Pass the source verbatim. Do not shell-escape it yourself beyond normal quoting:
python3 "${CLAUDE_SKILL_DIR}/scripts/watch.py" "<source>" --intent "<intent string>"
Pass --intent whenever you have any signal from the user about why they want this video — the question they asked, a stated goal, or a brief inferred summary. Empty --intent works but produces less-targeted report sections.
Optional flags:
- --start T / --end T: focus on a section. Accepts SS, MM:SS, or HH:MM:SS. When either is set, fps auto-scales denser (see "Focusing on a section" below).
- --max-frames N: lower the cap for tighter token budget (e.g. --max-frames 40)
- --resolution W: change frame width in px (default 512; bump to 1024 only if the user needs to read on-screen text)
- --fps F: override auto-fps (clamped to 2 fps max). Setting --fps disables scene-change sampling.
- --out-dir DIR: keep working files somewhere specific (default: an auto-generated tmp dir)
- --whisper groq|openai: force a specific Whisper backend (default: prefer Groq if both keys exist)
- --no-whisper: disable the Whisper fallback entirely (frames-only if no captions)
- --no-scene-change: force uniform frame sampling (debug only; usually leave on)
- --no-hook-microscope: skip the 0-10s dense pass (saves ~1 Whisper call)

When the user asks about a specific moment ("what happens at the 2 minute mark?", "zoom into 0:45 to 1:00", "the first 10 seconds"), pass --start and/or --end. The script switches to focused-mode budgets, which are denser than full-video budgets (still capped at 2 fps).
Focused mode is the right call for:
Transcript is auto-filtered to the same range. Frame timestamps are absolute (real video timeline, not offset-from-start).
Examples:
# Last 10 seconds of a 1 minute video
python3 "${CLAUDE_SKILL_DIR}/scripts/watch.py" video.mp4 --start 50 --end 60
# Zoom into 2:15 → 2:45 at 2 fps (60 frames)
python3 "${CLAUDE_SKILL_DIR}/scripts/watch.py" "$URL" --start 2:15 --end 2:45 --fps 2
# From 1h12m to the end of the video
python3 "${CLAUDE_SKILL_DIR}/scripts/watch.py" "$URL" --start 1:12:00
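To make the timestamp formats and the fps clamp concrete, here is a small sketch. It is not the script's actual parser: `parse_timestamp` and `frames_in_range` are hypothetical names, only whole seconds are handled, and the `--max-frames` cap is ignored.

```python
MAX_FPS = 2.0  # the documented ceiling for --fps


def parse_timestamp(text):
    """Convert SS, MM:SS, or HH:MM:SS into whole seconds."""
    parts = [int(piece) for piece in text.split(":")]
    if not 1 <= len(parts) <= 3:
        raise ValueError(f"unsupported timestamp: {text!r}")
    seconds = 0
    for value in parts:
        # Each colon shifts the running total up by a factor of 60.
        seconds = seconds * 60 + value
    return seconds


def frames_in_range(start, end, fps):
    """Estimate the frame count for a range at a requested fps, after clamping."""
    duration = parse_timestamp(end) - parse_timestamp(start)
    if duration <= 0:
        raise ValueError("end must be after start")
    return int(duration * min(fps, MAX_FPS))


if __name__ == "__main__":
    # A 15 second window requested at 5 fps is clamped to 2 fps.
    print(parse_timestamp("1:12:00"), frames_in_range("0:45", "1:00", 5.0))
```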
Step 3 — Read every frame path the script lists. The Read tool renders JPEGs directly as images for you. Read all frames in a single message (parallel tool calls) so you see them together. The frames are in chronological order with a t=MM:SS timestamp so you can align them to the transcript.
Step 4 — answer the user, then fill the report. You now have three streams of evidence:
- report.md: structured artifact at <workdir>/report.md with <!-- pending Claude fill: ... --> markers

First, answer the user's question in chat, citing timestamps.
Then, fill in the pending markers in report.md using the Edit tool. Walk every <!-- pending Claude fill: ... --> in order:
- [[wikilink]] style.

The fully-filled report.md is what gets ingested at Step 4.5. Do not skip the fill: empty markers won't ingest cleanly.
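One way to confirm that no marker was missed before moving on is to scan the report text for the marker pattern. The sketch below assumes markers look exactly like the `<!-- pending Claude fill: ... -->` form quoted above; `pending_markers` is a hypothetical helper, and the marker label in the usage example is made up.

```python
import re

# Matches the documented marker shape and captures its description.
PENDING = re.compile(r"<!--\s*pending Claude fill:(.*?)-->", re.DOTALL)


def pending_markers(report_text):
    """Return the description of every unfilled marker, in document order."""
    return [match.group(1).strip() for match in PENDING.finditer(report_text)]


if __name__ == "__main__":
    sample = "## TL;DR\n<!-- pending Claude fill: two-sentence summary -->\n## Transcript\ndone\n"
    print(pending_markers(sample))
```

An empty list means the report has no remaining markers.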
Step 4.4 — Stage to the Obsidian vault and open in Obsidian (when a vault is detected). After filling every marker, resolve $VAULT_DIR per the Configuration section. If no vault is detected, skip this step and emit 📄 Report (no vault detected): <workdir>/report.md in chat instead.
When $VAULT_DIR resolves:
1. Build the slug from the report.md frontmatter: slugify (lowercase, ASCII-only, hyphens, max 60 chars), append -YYYY-MM-DD. Example: karpathy-claude-md-43k-installs-2026-05-24.
2. Create the staging directory: mkdir -p "$VAULT_DIR/raw/watched/<slug>".
3. Copy report.md + every hero frame (filenames in the report frontmatter under hero_frames:) into that dir. The report MUST live inside the vault for Obsidian to open it.
4. Open the report in Obsidian. The vault name is the basename of $VAULT_DIR with spaces URL-encoded as %20:
VAULT_NAME=$(basename "$VAULT_DIR" | sed 's/ /%20/g')
open "obsidian://open?vault=${VAULT_NAME}&file=raw/watched/<slug>/report.md"
The file= value is the path relative to the vault root, no leading slash. Don't ask permission; the user has already opted in by running /watch.

Also emit in chat: 📄 Report (open in Obsidian): raw/watched/<slug>/report.md. That way, if Obsidian was closed or the URL handler missed, the user can still navigate to it manually inside the vault.

Rationale: the report is the leverage point of /watch. If the user reads everything in Obsidian, opening in Preview or VS Code defeats the purpose. Staging at 4.4 also means Step 4.5's "Yes / Stage" branches are no-ops on the copy step (the file is already in the vault); they only differ in whether the Ingest op runs.
Cleanup implication for Step 4.5: if the user picks "No, drop it" at 4.5 AND a vault was staged at 4.4, ALSO rm -rf "$VAULT_DIR/raw/watched/<slug>" since we pre-staged. Do NOT drop the vault copy if they picked Yes or Stage.
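The slug rule and the obsidian:// URL from Step 4.4 can be sketched together. This is an illustration under assumptions: the 60-character limit is applied before the date suffix, the input is whatever title-like string the frontmatter provides, and `make_slug` / `obsidian_url` are hypothetical names.

```python
import re
import unicodedata
from datetime import date
from pathlib import Path
from urllib.parse import quote


def make_slug(title, day):
    """Lowercase, ASCII-only, hyphenated, max 60 chars, then -YYYY-MM-DD."""
    # Strip accents by decomposing characters and dropping non-ASCII bytes.
    ascii_title = unicodedata.normalize("NFKD", title).encode("ascii", "ignore").decode("ascii")
    slug = re.sub(r"[^a-z0-9]+", "-", ascii_title.lower()).strip("-")
    # Assumption: the length cap applies to the title part only.
    slug = slug[:60].strip("-")
    return f"{slug}-{day.isoformat()}"


def obsidian_url(vault_dir, slug):
    """Build the obsidian://open URL for the staged report."""
    vault_name = quote(Path(vault_dir).name, safe="")
    # file= is relative to the vault root, with no leading slash.
    file_path = quote(f"raw/watched/{slug}/report.md", safe="/")
    return f"obsidian://open?vault={vault_name}&file={file_path}"


if __name__ == "__main__":
    slug = make_slug("Me at the zoo", date(2026, 5, 24))
    print(obsidian_url("/Users/example/Second brain", slug))
```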
Step 4.5 — Offer ingest into the Obsidian vault. Skip this step entirely if no vault was detected at Step 4.4. Otherwise use AskUserQuestion once, with these options (do NOT skip if a vault was found unless the user explicitly said "don't ingest" before /watch ran):
Question: "Want to ingest this into your Obsidian vault?"
- Yes — same angle ("")
- Yes — different angle (user specifies in the notes field)
- Stage to raw/watched/ for later
- No, drop it
Routing based on response:
A. Yes (same or different angle):
- Build the slug from the report.md frontmatter: slugify (lowercase, ASCII-only, hyphens, max 60 chars), append -YYYY-MM-DD. Example: me-at-the-zoo-2026-05-24.
- The staging directory is $VAULT_DIR/raw/watched/<slug>/ (Step 4.4 already created it).
- If $VAULT_DIR/CLAUDE.md exists: Read it to refresh the Ingest op definition. That file is authoritative; this skill must not duplicate its steps. Execute the Ingest op against raw/watched/<slug>/report.md exactly as $VAULT_DIR/CLAUDE.md defines it.
- If no $VAULT_DIR/CLAUDE.md exists: Run a generic ingest: read the report, identify entities + concepts, append a one-line entry to $VAULT_DIR/log.md (create if missing), and tell the user the report is staged at raw/watched/<slug>/report.md and they can wire up an Ingest op of their own.
- log.md entry written.

B. Stage to raw/watched/ for later:
- log.md.
- $(basename $VAULT_DIR)/raw/watched/<slug>/. Run an Ingest op against it when you're ready."

C. No, drop it: proceed to Step 5 (cleanup) and, per the cleanup-implication note in Step 4.4, rm -rf "$VAULT_DIR/raw/watched/<slug>" to undo the pre-staging.
The "different angle" path is what makes /watch truly plug-and-play — the user can watch a video for one reason, then on the way out decide it's actually more useful for a different concept, and the resulting wiki entry reframes accordingly.
Step 5 — clean up. The script prints a working directory at the end. If you ingested (Step 4.5 path A), the hero frames + report.md are already copied to Second Brain — you can rm -rf the original workdir. If you staged (path B), same — the workdir copy is no longer needed. If the user picked "no, drop it" (path C) and isn't going to ask follow-ups, delete with rm -rf <dir>. If they might ask follow-ups, leave it in place.
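The cleanup rules from Steps 4.4, 4.5, and 5 amount to a small decision table. The sketch below only computes which directories would be removed; it deletes nothing. The choice labels ("yes", "stage", "drop") and the function name are hypothetical.

```python
def cleanup_targets(choice, workdir, staged_dir=None, followups_expected=False):
    """Return the directories that are safe to remove for a given ingest choice.

    choice: "yes" (path A), "stage" (path B), or "drop" (path C).
    staged_dir: the vault copy created at Step 4.4, or None if no vault was found.
    """
    targets = []
    if choice in ("yes", "stage"):
        # The vault already holds the report and hero frames; keep that copy.
        targets.append(workdir)
    elif choice == "drop":
        if staged_dir is not None:
            targets.append(staged_dir)  # undo the pre-staging
        if not followups_expected:
            targets.append(workdir)
    else:
        raise ValueError(f"unknown choice: {choice!r}")
    return targets


if __name__ == "__main__":
    print(cleanup_targets("drop", "/tmp/watch-abc", "/vault/raw/watched/slug"))
```

Passing each returned path to `shutil.rmtree` would be the equivalent of the `rm -rf` calls described above.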
The script gets a timestamped transcript in one of two ways:
1. Native captions, pulled by yt-dlp when the source supports them. This is tried first.
2. Whisper fallback: when no captions are available, the script extracts a mono 16 kHz audio clip (ffmpeg -vn -ac 1 -ar 16000 -b:a 64k, ~0.5 MB/min) and uploads it to whichever Whisper API has a key configured:
   - Groq: whisper-large-v3. Preferred default: cheaper, faster. Get a key at console.groq.com/keys.
   - OpenAI: whisper-1. Fallback. Get a key at platform.openai.com/api-keys.

Both keys live in ~/.config/watch/.env. The script prefers Groq when both are set; override with --whisper openai to force OpenAI. Use --no-whisper to skip the fallback entirely.
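The backend preference and the quoted audio size can be sketched as below. This is not the logic in scripts/transcribe.py: `choose_backend` and `estimated_audio_mb` are hypothetical names, and returning `None` when a forced backend has no key is an assumption about behavior the text does not specify.

```python
def choose_backend(env, forced=None, no_whisper=False):
    """Pick "groq", "openai", or None following the documented preference order."""
    if no_whisper:
        return None
    available = {
        "groq": bool(env.get("GROQ_API_KEY")),
        "openai": bool(env.get("OPENAI_API_KEY")),
    }
    if forced is not None:
        if forced not in available:
            raise ValueError(f"unknown backend: {forced!r}")
        # Assumption: a forced backend without a key yields no transcription.
        return forced if available[forced] else None
    for name in ("groq", "openai"):  # Groq is preferred when both keys exist
        if available[name]:
            return name
    return None


def estimated_audio_mb(minutes, bitrate_bps=64_000):
    """Approximate upload size for audio encoded at a constant bitrate."""
    return minutes * 60 * bitrate_bps / 8 / 1_000_000


if __name__ == "__main__":
    print(choose_backend({"GROQ_API_KEY": "x", "OPENAI_API_KEY": "y"}))
    print(estimated_audio_mb(10))
```

At 64 kbit/s the arithmetic gives 0.48 MB per minute, which matches the "~0.5 MB/min" figure above.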
- python3 "${CLAUDE_SKILL_DIR}/scripts/setup.py" (auto-installs ffmpeg/yt-dlp via brew on macOS, scaffolds the .env). For API key, ask the user via AskUserQuestion and write it to ~/.config/watch/.env.
- --start/--end rather than a sparse full-video scan.
- --whisper openai if Groq failed (or vice versa).
- <!-- pending Claude fill: ... --> markers → you skipped Step 4. Go back, read the report, fill every marker via Edit, then offer ingest. Never ingest a half-filled report; the Second Brain Ingest op will produce sparse/wrong entity pages.
- raw/watched/<slug>/, and they can re-run by saying "ingest the staged report at <slug>".

This skill burns tokens primarily on frames. Order of magnitude:
- Raising --resolution to 1024 roughly quadruples the image tokens per frame. Only do it when necessary.

If you already watched a video this session and the user asks a follow-up, do not re-run the script; you already have the frames and transcript in context. Just answer from what you have.
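The "roughly quadruples" figure follows from pixel count: doubling the width at a fixed aspect ratio doubles the height too. The sketch below assumes image token cost scales with pixel area, which is an approximation rather than a documented billing rule; `relative_frame_cost` is a hypothetical name.

```python
def relative_frame_cost(width, base_width=512):
    """Pixel-area ratio of a frame at `width` versus the default width.

    Assumes the aspect ratio is unchanged, so area scales with width squared.
    """
    return (width / base_width) ** 2


if __name__ == "__main__":
    print(relative_frame_cost(1024))  # four times the pixels of a 512 px frame
```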
What this skill does:
- Runs yt-dlp locally to download the video and pull native captions when the source supports them (public data; the request goes directly to whatever host the URL points at)
- Runs ffmpeg / ffprobe locally to extract frames as JPEGs and, when Whisper is needed, a mono 16 kHz audio clip
- Uploads the audio clip to Groq (api.groq.com/openai/v1/audio/transcriptions) when GROQ_API_KEY is set (preferred: cheaper, faster)
- Uploads the audio clip to OpenAI (api.openai.com/v1/audio/transcriptions) when OPENAI_API_KEY is set and Groq is not, or when --whisper openai is forced
- Writes frames and working files to a temporary directory (or --out-dir if specified) so Claude can Read them
- Reads and writes ~/.config/watch/.env (mode 0600) to store the Whisper API key(s) and a SETUP_COMPLETE marker. As a fallback, also reads .env in the current working directory
- Reads $VAULT_DIR/CLAUDE.md at orchestration time (only when ingest is requested and the file exists) to follow that vault's Ingest operation definition
- Writes report.md plus copies of hero frames into $VAULT_DIR/raw/watched/<slug>/ when a vault is detected at Step 4.4
- Writes to $VAULT_DIR/wiki/ (entities, concepts, sources, index.md) and appends to $VAULT_DIR/log.md, following the actions defined by the vault's Ingest op (or a generic fallback if no CLAUDE.md is present)

What this skill does NOT do:
- --no-whisper
- api.groq.com, OpenAI key only goes to api.openai.com)
- ~/.config/watch/.env (and Second Brain when ingest is consented to); clean up the working directory when you're done (Step 5)

Bundled scripts: scripts/watch.py (entry point), scripts/download.py (yt-dlp wrapper), scripts/frames.py (ffmpeg uniform + scene-change extraction + hero selection), scripts/pacing.py (editorial metrics), scripts/hook.py (0-10s microscope), scripts/report.py (structured report emitter), scripts/transcribe.py (caption selection + Whisper orchestration), scripts/whisper.py (Groq / OpenAI clients, supports word-level timestamps), scripts/setup.py (preflight + installer)
Review scripts before first use to verify behavior.
npx claudepluginhub taoufik123-collab/claude-watch --plugin watch