By devinilabs
Turn any video URL or local file into structured markdown study notes with section-by-section breakdown, embedded screenshots, and a timestamped transcript, optionally focused on a specific topic or question.
Turn any tutorial or lecture video into structured study notes. Paste a URL, walk away, come back to a markdown file with embedded screenshots, timestamped transcript, and Claude's synthesis — saved to a persistent library.
/claude-watch https://youtu.be/<lecture> backprop intuition
| Surface | Command |
|---|---|
| Claude Code | /plugin marketplace add devinilabs/claude-watch then /plugin install claude-watch@claude-watch |
| claude.ai (web) | Download claude-watch.skill from the latest release → Settings → Capabilities → Skills → + |
| Codex | git clone https://github.com/devinilabs/claude-watch ~/.codex/skills/claude-watch |
- Downloads the video with yt-dlp (or accepts a local file).
- Extracts scene-aware frames with ffmpeg. Inserts coverage-floor frames every 45s across long static gaps, so a lecture with one slide for 5 minutes still gets ~7 frames, not 1.
- Reads every frame as an image and writes notes.md to a strict template:
  - `## TLDR`: 3-4 sentence synthesis
  - `## Key Concepts`: bulleted with timestamps
  - `## Notes`: one section per scene with embedded screenshot, on-screen text, what was said, Claude's synthesis
  - `## Code & Commands`: every code-on-screen frame transcribed into a runnable fenced block
  - `## Diagrams Referenced`, `## Open Questions`
- Saves everything to ~/claude-watch/library/<slug>/, so re-running the same URL is a cache hit.

claude-video's uniform frame sampling spends the budget poorly on long lectures with slow-changing slides. Its answers also live in chat, so you can't go back to "the notes from that video." claude-watch is opinionated for the tutorial workflow: scene-aware frames, persistent library, structured notes file.
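The coverage-floor idea described above can be illustrated with a short sketch. This is not the plugin's actual code: the function name `add_coverage_floor`, the choice to always keep a frame at 0s, and the even spacing are assumptions made for illustration. Only the 45s gap and the "~7 frames for a 5-minute static slide" outcome come from the description.

```python
def add_coverage_floor(scene_times, duration, max_gap=45.0):
    """Return sorted frame timestamps (seconds) with no gap longer than max_gap.

    scene_times: timestamps where a scene change was detected (detector not shown).
    duration:    total length of the video or focus range, in seconds.
    max_gap:     hypothetical name for the coverage-floor interval (45s in the README).
    """
    # Assumption: the first frame is always kept, even without a scene change.
    anchors = sorted({0.0, *[t for t in scene_times if 0.0 <= t < duration]})
    frames = []
    for start, end in zip(anchors, anchors[1:] + [duration]):
        frames.append(start)
        t = start + max_gap
        # Fill a long static stretch with evenly spaced floor frames.
        while t < end:
            frames.append(t)
            t += max_gap
    return frames


# One slide shown for 5 minutes, no scene changes detected:
print(len(add_coverage_floor([], duration=300.0)))  # 7
```

With no scene changes in 300 seconds, the sketch emits frames at 0, 45, 90, 135, 180, 225, and 270 seconds, which matches the seven frames mentioned above.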
/claude-watch <url-or-path> [topic]
/claude-watch ~/Lectures/cs231n.mp4 backpropagation derivation
/claude-watch https://youtu.be/<long> --start 5:00 --end 25:00
/claude-watch <url> --resolution 1024 # for slides with tiny code text
Flags: --start/--end, --max-frames, --resolution, --scene-threshold, --max-gap, --whisper groq|openai, --no-whisper, --out-dir.
Captions cover the majority of public videos for free. Whisper only kicks in when a video has no caption track.
| Need | Cost |
|---|---|
| Download + native captions | free (yt-dlp + ffmpeg) |
| Whisper fallback (preferred) | Groq whisper-large-v3 — cheap, fast |
| Whisper fallback (alt) | OpenAI whisper-1 |
| Disable Whisper | --no-whisper (frames-only when no captions) |
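The fallback order in the table can be read as a small decision function. The sketch below is illustrative only: `pick_transcript_source` and its return labels are hypothetical names, and treating Groq as the default is an assumption based on the "preferred" label in the table.

```python
def pick_transcript_source(has_captions, whisper="groq", no_whisper=False):
    """Decide where the transcript comes from, following the table above.

    has_captions: True when the video has a native caption track.
    whisper:      mirrors --whisper groq|openai (default is an assumption).
    no_whisper:   mirrors --no-whisper.
    """
    if has_captions:
        return "captions"  # free path: no Whisper call needed
    if no_whisper:
        return "frames-only"  # no captions and Whisper disabled
    if whisper not in ("groq", "openai"):
        raise ValueError(f"unsupported whisper backend: {whisper!r}")
    return f"whisper-{whisper}"


print(pick_transcript_source(has_captions=True))                    # captions
print(pick_transcript_source(has_captions=False))                   # whisper-groq
print(pick_transcript_source(has_captions=False, no_whisper=True))  # frames-only
```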
Keys go in ~/.config/claude-watch/.env (mode 0600).
The library is keyed on slug = YYYY-MM-DD-<title>-<short-hash> where the short hash is sha1(source + focus_range)[:4]. Re-running the same URL with the same focus range hits the cache — no re-download, no re-transcribe, only frames + notes regenerate. Different focus range = different slug = a separate notes file.
To force a fresh run, delete the meta.json in the library dir.
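As a rough illustration of the slug scheme, the sketch below builds `YYYY-MM-DD-<title>-<short-hash>` from its parts. Only the `sha1(source + focus_range)[:4]` rule is taken from the description above. The function name `make_slug`, the way the title is normalized, the string form of the focus range, and which date is used are all assumptions.

```python
import hashlib
import re
from datetime import date


def make_slug(source, title, focus_range="", day=None):
    """Build a library slug: YYYY-MM-DD-<title>-<short-hash>.

    source:      URL or local path as given on the command line.
    focus_range: hypothetical string form of --start/--end, e.g. "5:00-25:00".
    day:         date to stamp the slug with (assumed to be the run date).
    """
    day = day or date.today()
    # Short hash as described: first 4 hex chars of sha1(source + focus_range).
    digest = hashlib.sha1((source + focus_range).encode("utf-8")).hexdigest()
    short_hash = digest[:4]
    # Assumption: the title is lowercased and non-alphanumerics become hyphens.
    title_part = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    return f"{day.isoformat()}-{title_part}-{short_hash}"


slug = make_slug("https://youtu.be/<lecture>", "Intro to Backprop",
                 focus_range="5:00-25:00", day=date(2024, 1, 15))
print(slug)  # starts with "2024-01-15-intro-to-backprop-"
```

Because the hash covers both the source and the focus range, the same URL with the same range always maps to the same directory, while a different range will almost always produce a different four-character suffix.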
- Use --start/--end to focus on a segment.
- Tune --max-frames to control cost (token cost grows linearly).

git clone https://github.com/devinilabs/claude-watch
cd claude-watch
python3 -m pytest # full suite
bash scripts/build-skill.sh # → dist/claude-watch.skill (claude.ai bundle)
Releasing: tag vX.Y.Z, push the tag — CI builds and attaches claude-watch.skill.
MIT. Built on yt-dlp, ffmpeg, and Claude's multimodal Read tool. Whisper transcription via Groq or OpenAI.
npx claudepluginhub devinilabs/claude-watch --plugin claude-watch