From Claude Eyes
Watch a video (URL or local path) and answer questions about it. Use when the user shares a video URL, asks Claude to look at a video file, or asks about content in a video. Smart scene-aware frame extraction + optional OCR keep token cost low.
How this skill is triggered — by the user, by Claude, or both
Slash command
/claude-eyes:see <video-url-or-path> [question]
You don't have a video input; this skill gives you one. A Python pipeline
downloads the video, picks interesting frames (scene changes + minimum
cadence, after perceptual-dedup), optionally OCRs them, and produces a
markdown report you Read to see the video. The report lists frame paths +
timestamps + transcript; you Read frames in parallel to view them.
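
The selection idea above (dedupe near-identical frames, then keep scene changes plus a minimum cadence) can be sketched roughly as follows. This is an illustration only, not the code in scripts/frames.py: the function name, the thresholds, and the assumption that each candidate frame arrives with a scene-change score and an integer perceptual hash are all hypothetical.

```python
def hamming(a: int, b: int) -> int:
    """Number of differing bits between two integer perceptual hashes."""
    return bin(a ^ b).count("1")


def select_frames(candidates, scene_threshold=0.3, max_gap=10.0, min_hash_distance=5):
    """Pick timestamps worth keeping.

    candidates: chronological list of (timestamp_s, scene_score, phash_int).
    All three thresholds are made-up defaults for illustration.
    """
    kept = []
    for t, score, phash in candidates:
        # Perceptual dedup: skip frames that look like the last kept frame.
        if kept and hamming(phash, kept[-1][2]) < min_hash_distance:
            continue
        is_scene_change = score >= scene_threshold
        # Minimum cadence: keep a frame if too much time passed since the last one.
        cadence_due = not kept or (t - kept[-1][0]) >= max_gap
        if is_scene_change or cadence_due:
            kept.append((t, score, phash))
    return [t for t, _, _ in kept]


frames = [
    (0.0, 0.0, 0x00),       # first frame: kept by cadence
    (2.0, 0.9, 0x00),       # high score, but identical hash: dropped
    (4.0, 0.5, 0xFF),       # scene change with a new look: kept
    (6.0, 0.1, 0xFF00),     # different, but low score and too soon: dropped
    (15.0, 0.0, 0xFFFF00),  # no scene change, but 11 s since last kept: kept
]
print(select_frames(frames))
```

Running dedup before the scene and cadence checks is what keeps static stretches from filling the frame budget.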
Every /see run starts with:
"${CLAUDE_PLUGIN_ROOT}"/scripts/setup.py --check --quiet
Exit 0 → proceed silently. Do NOT announce "setup is complete" — the user doesn't need a status message every turn.
On non-zero exit, follow the table:
| Exit | Meaning | Action |
|---|---|---|
| 2 | Missing required binaries (ffmpeg / ffprobe / yt-dlp) | Run the installer |
| 3 | No Whisper API key (only matters when the user wants Whisper) | Run the installer to scaffold .env, then ask the user for a key, OR proceed with --no-whisper if the user passes --transcript |
| 4 | Both missing | Run the installer first, then handle the key |
Installer (idempotent):
"${CLAUDE_PLUGIN_ROOT}"/scripts/setup.py
For exit 3, also offer to skip Whisper entirely by passing --transcript <file>
on this invocation — many users have their own subtitle file and don't need
Whisper at all.
A SessionStart hook runs this preflight once per session, so within a
single session you can skip Step 0 on follow-up /see calls.
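
As a compact restatement of the exit-code table, the branching could look like the sketch below. The step labels ("run_installer" and so on) and the function name are hypothetical shorthand for the actions in the table, not identifiers from the plugin.

```python
def next_steps(exit_code: int, has_transcript: bool = False) -> list:
    """Map a preflight exit code to an ordered list of actions.

    has_transcript: True when the user passed --transcript on this invocation,
    in which case Whisper (and its API key) is not needed.
    """
    # With a transcript in hand, skip Whisper instead of asking for a key.
    key_step = "proceed_with_no_whisper" if has_transcript else "ask_user_for_key"

    if exit_code == 0:
        return []  # proceed silently, no status message
    if exit_code == 2:
        return ["run_installer"]
    if exit_code == 3:
        # The installer only scaffolds .env; it is unnecessary if Whisper is skipped.
        return [key_step] if has_transcript else ["run_installer", key_step]
    if exit_code == 4:
        return ["run_installer", key_step]
    raise ValueError(f"unexpected preflight exit code: {exit_code}")


print(next_steps(3))
print(next_steps(3, has_transcript=True))
```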
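
Use this skill when:
- The user shares a video URL (any site yt-dlp supports) and asks about it.
- The user points to a local video file (.mp4, .mov, .mkv, .webm, etc.) and asks about it.
- The user types /claude-eyes:see <url-or-path> [question].

Split the argument into:
- Source: the URL or local path.
- Question: the free text that remains.
- Flags: pull --start, --end, --transcript, --resolution, --max-frames, --profile, --ocr, --preview, --no-whisper, --no-cache out of the user input if present.

Examples: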
- https://youtu.be/abc what language is this? → source = URL, question = "what language is this?"
- ~/screencast.mp4 --transcript ~/sub.srt summarize → source = path, --transcript ~/sub.srt, question = "summarize"
- ~/talk.mp4 --start 2:15 --end 2:45 → source = path, --start 2:15 --end 2:45, no question.

For broad questions about long videos, full-resolution frame extraction is overkill. Default to a contact-sheet preview first:
Use --preview when ANY of these are true:
- … --start/--end to scope it.

Skip --preview when:
- … --start/--end themselves.

After reading the contact sheet, decide:
- … --start/--end on a section if they want detail.
- … --preview on the relevant section.

python3 "${CLAUDE_PLUGIN_ROOT}/scripts/see.py" "<source>" [flags]
Flag cheat-sheet (all optional):
| Flag | Meaning |
|---|---|
| --question "…" | Pass the user's question so the script can auto-toggle --ocr when it mentions on-screen text |
| --transcript PATH | User-supplied transcript (.vtt, .srt, .json, .txt). Skips both caption fetch and Whisper, saving API spend and download time. |
| --profile {auto,tutorial,live} | Default auto. tutorial is aggressive about deduping (screencasts); live preserves subtle motion (cameras, gameplay). |
| --start T --end T | Focus on a section. Accepts SS, MM:SS, HH:MM:SS. |
| --max-frames N | Tighter token budget. Default 100. |
| --resolution W | Frame width (default 512). Bump to 768/1024 ONLY if text/visual detail needs it AND OCR didn't capture it. |
| --ocr | Force-enable Tesseract OCR pass; writes <frame>.txt siblings. Auto-on when the user's question mentions text. |
| --no-whisper | Disable Whisper fallback. |
| --whisper {groq,openai} | Force a Whisper backend. |
| --preview | Contact-sheet mode: one image instead of frames. |
| --no-cache | Skip cache lookup and don't write cache. |
| --out-dir DIR | Specific working directory (default: cache dir or tmp). |
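
Splitting the raw /see argument into source, flags, and question might look like the sketch below. The set of value-taking versus boolean flags is read off the cheat-sheet above; the function name and the whitespace-only tokenization (which will not handle paths containing spaces) are simplifying assumptions, not the plugin's own parser.

```python
# Flags that consume the next token as their value (per the cheat-sheet).
VALUE_FLAGS = {
    "--start", "--end", "--transcript", "--resolution",
    "--max-frames", "--profile", "--whisper", "--out-dir",
}
# Flags that stand alone.
BOOL_FLAGS = {"--ocr", "--preview", "--no-whisper", "--no-cache"}


def split_args(user_input: str):
    """Return (source, flags, question) from the text after /claude-eyes:see."""
    tokens = user_input.split()
    if not tokens:
        raise ValueError("expected a video URL or path")
    source, rest = tokens[0], tokens[1:]
    flags, question_words = [], []
    i = 0
    while i < len(rest):
        tok = rest[i]
        if tok in VALUE_FLAGS and i + 1 < len(rest):
            flags += [tok, rest[i + 1]]
            i += 2
        elif tok in BOOL_FLAGS:
            flags.append(tok)
            i += 1
        else:
            # Anything that is not a known flag is treated as part of the question.
            question_words.append(tok)
            i += 1
    return source, flags, " ".join(question_words)


print(split_args("~/screencast.mp4 --transcript ~/sub.srt summarize"))
```

The question can then be forwarded with --question so the script can decide whether to enable OCR.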
By default, re-running /see on the same source with the same settings
re-emits the cached report instantly. If the first line of the report is
[cached], that's why. To force a fresh run, pass --no-cache.
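
One way a "same source, same settings" cache lookup can work is to hash the source together with the settings. The sketch below is an assumption about the general technique, not the scheme used by scripts/cache.py; the function names, the SHA-256 choice, and the 16-character truncation are hypothetical.

```python
import hashlib
import json


def cache_key(source: str, settings: dict) -> str:
    """Stable key for a (source, settings) pair.

    sort_keys makes the key independent of the order the settings were given in.
    """
    payload = json.dumps({"source": source, "settings": settings}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:16]


def is_cached(report: str) -> bool:
    """True when the report's first line carries the [cached] marker."""
    lines = report.splitlines()
    return bool(lines) and lines[0].strip().startswith("[cached]")


a = cache_key("https://youtu.be/abc", {"resolution": 512, "ocr": False})
b = cache_key("https://youtu.be/abc", {"ocr": False, "resolution": 512})
c = cache_key("https://youtu.be/abc", {"resolution": 768, "ocr": False})
print(a == b, a == c)
```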
The script prints a markdown report listing:
- ## Frames (in chronological order with t=MM:SS timestamps).
- <frame>.txt OCR siblings (when --ocr ran).
- ## Transcript (source labeled: captions, user-supplied, or whisper).

If OCR ran, Read the .txt files first; they're cheap text. Only Read
the JPEGs when:
If no OCR, Read all frame JPEGs in parallel (one message, multiple Read calls) so you see them together.
Frame timestamps are absolute (real video timeline, even when --start
was used).
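
If it helps to handle the report programmatically, a generic split on the "## " headings is enough to separate the Frames and Transcript sections. This is a minimal sketch that assumes nothing about the report beyond those headings; the function name is hypothetical, and the exact layout of individual frame lines is deliberately not parsed.

```python
def split_sections(report: str) -> dict:
    """Group report lines under their '## ' headings.

    Lines before the first heading (such as a [cached] marker) are ignored.
    """
    sections = {}
    current = None
    for line in report.splitlines():
        if line.startswith("## "):
            current = line[3:].strip()
            sections[current] = []
        elif current is not None:
            sections[current].append(line)
    return {name: "\n".join(body).strip() for name, body in sections.items()}
```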
You now have:
If the user asked a specific question, answer it directly with timestamp citations. If they didn't ask, give a structured summary: what the video is about, key moments, notable visuals, spoken content.
Use MM:SS timestamps when citing — they match the report and the user can
seek to them in the original video.
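
The time formats accepted by --start/--end (SS, MM:SS, HH:MM:SS) and the MM:SS citation style convert to and from seconds with a few lines of arithmetic. The helper names below are hypothetical, and the sketch assumes whole-number components; whether the script itself accepts fractional seconds is not stated here.

```python
def parse_time(text: str) -> int:
    """Convert SS, MM:SS, or HH:MM:SS into a number of seconds."""
    parts = text.split(":")
    if not 1 <= len(parts) <= 3:
        raise ValueError(f"unsupported time format: {text!r}")
    seconds = 0
    for part in parts:
        # Each colon shifts the running total up by a factor of 60.
        seconds = seconds * 60 + int(part)
    return seconds


def format_mmss(seconds: int) -> str:
    """Format seconds as MM:SS; minutes simply run past 59 for long videos."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes:02d}:{secs:02d}"


print(parse_time("2:15"), format_mmss(135))
```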
The report's last line shows the working directory. If caching was on
(the default), the dir IS the cache entry — don't delete it, so future
re-runs hit the cache. If you passed --no-cache or --out-dir, and the
user isn't going to ask follow-ups, rm -rf <dir> is fine.
- … --transcript as the alternative before asking the user for a key.
- yt-dlp failed → its error goes to stderr. If it's login-required or region-locked, tell the user plainly. Don't retry.
- … --transcript for next time.
- … --whisper openai (or --no-whisper + manual transcript).
- … --preview or --start/--end rather than running a sparse full scan.

The pipeline's whole point is fewer tokens than a fixed-fps sampler:
- --preview = one contact sheet ≈ 1–2 k vision tokens for the whole video.
- tutorial profile dedupes static stretches → typically 20–40 frames where fixed-fps would emit 60–80 for the same coverage.
- --ocr text is near-free; reading 5 text files is cheaper than reading one 1024-px frame.

- … yt-dlp, ffmpeg, ffprobe, and optionally tesseract locally.
- … --transcript wasn't provided, and (c) --no-whisper wasn't set.
- ~/.config/claude-eyes/.env (mode 0600) for API keys.
- ${CLAUDE_PLUGIN_DATA}/cache/<hash>/ (or tmp if --no-cache).

Bundled scripts: scripts/see.py (orchestrator), scripts/frames.py (smart
extractor), scripts/download.py (yt-dlp wrapper), scripts/transcript.py
(parser), scripts/whisper.py (Groq/OpenAI client), scripts/ocr.py
(Tesseract wrapper), scripts/contact.py (preview), scripts/cache.py,
scripts/setup.py. Review before first use if you like.
npx claudepluginhub jain777/claude-eyes