By dzivkovi
YouTube video intelligence via Gemini: channel scanning, rich multimodal transcripts with on-screen content, concept taxonomy, hybrid search, and Bosnian/Croatian/Serbian subtitle translation. Three skills: video-intel (ingest/curate, runs in the plugin repo), video-intel-search (read-only, globally installable), translate-bcs (subtitle translation).
Translate YouTube videos and rich English transcripts into Bosnian/Croatian/Serbian (BCS) subtitles via Gemini. Use this skill whenever the user wants to: translate a YouTube video to Bosnian, Croatian, Serbian, or Serbo-Croatian; produce BCS captions or subtitles for a video; download just the English SRT from a YouTube video with no translation; or translate a context-rich transcript (with on-screen content and speaker labels) into BCS while preserving that context. Trigger phrases include "translate to Bosnian", "translate to Croatian", "translate to Serbian", "Serbo-Croatian subtitles", "BCS subtitles", "BCS captions", "titl na bosanski", "titl na srpski", "titl na hrvatski", "prevedi ovaj video", "prevedi titlove", "download the English SRT", "just give me the SRT", any YouTube URL followed by a request for Bosnian/Croatian/Serbian output, and any request to caption or subtitle a YouTube video in one of those languages. This skill is for subtitle translation only — for English transcription with on-screen content, use the video-intel skill instead. Built for diaspora audiences in Bosnia, Serbia, Croatia, Montenegro, Kosovo, and neighboring regions who want long-form YouTube content in a language they read fluently.
Query a pre-built video corpus (mindmaps, transcripts, concept taxonomy, hybrid search index) produced by the video-intel skill. Use whenever the user wants to: find videos about a topic; look up what a creator said about something; retrieve evidence or quotes from transcripts; browse concepts in the library; synthesize a cross-creator brief ("nugget") grounded in indexed evidence; ask about corpus status (last scan, video counts); summarize a specific video that is already in the corpus; decide if a video is worth watching based on its indexed content. This skill is read-only against an existing corpus - it does not scan, index, or transcribe. Safe to install globally and invoke from any project. Trigger phrases: "find videos about [X]", "search my videos for [X]", "what videos cover [X]", "what did [creator] say about [Y]", "evidence for [claim]", "when did [creator] mention [Z]", "nugget brief on [X]", "consultant brief on [X]", "what do creators say about [X]", "agreements and disagreements on [X]", "synthesize insights across creators", "mental models across creators", "find the nuggets about [X]", "show corpus status", "when was this last scanned", "what concepts are in my library", "what topics recur across channels", "summarize this video", "is this worth watching", "what should I watch", "verify whether [creator] said [paraphrase]", "fact-check this quote against [creator]'s videos", "did [creator] really say [X]", "is this [creator] quote real", "find the source for this [creator] claim", "check the corpus for the quote [paraphrase]", any YouTube URL followed by a question about its content. For scanning new videos, transcribing, generating mindmaps, rebuilding the index, or any write operation on the corpus, use the video-intel skill from the plugin repo instead - those operations require channels configured and API keys the search skill does not need.
Ingest/curate side of the video intelligence plugin. Use whenever the user wants to modify or build the corpus: scan YouTube channels for new videos and generate mind maps via Gemini; transcribe a video (URL or local MP4); run the full mindmap+transcript+concepts pipeline on a local video; extract and normalize concepts into the taxonomy; rebuild the LanceDB hybrid-search index from existing transcripts; clean up title-rotation duplicates; prune YouTube Shorts that polluted the corpus; manage channel configuration. Trigger phrases: "scan channel", "scan all channels", "what's new from [creator]" (with scan intent), "last N days of [creator]", "transcribe this video", "transcribe [creator]'s backlog", "videos I am missing from [creator]", "catch up on [creator]", "fully scan [creator]", "backfill [creator]", "add [channel] to my watchlist", "find duplicate videos", "clean up duplicates", "dedupe my corpus", "the same video got scanned twice", "why do I have two mindmaps for [video]", "creator rotated the title", "prune shorts", "remove shorts from corpus", "delete YouTube Shorts", "clean up shorts", "too many shorts in my corpus", "process this local video", "run the full pipeline on [file]", "mindmap plus transcript plus concepts on one upload", "do everything on this MP4", "extract concepts", "rebuild the taxonomy", "rebuild the index", "build the search index". Requires GEMINI_API_KEY, YOUTUBE_API_KEY, and `channels:` configured in config.yaml - this skill must run from the plugin repo checkout, not a globally-installed cache. Calls Gemini as multimodal proxy (frames + on-screen text + audio). For read-only queries against an already-built corpus (library search, cross-creator synthesis, corpus-freshness reports, summarizing a video that is already indexed), use the video-intel-search skill instead - that one is safe to install globally and invoke from any project.
30 seconds to read a mind map vs. 30 minutes to watch the video. Scanned 15 videos from a single channel in under 2 minutes, ~$0.15-0.25 each. Free tier covers 8 hours of YouTube video per day.
Multimodal video intelligence powered by Gemini. Scan YouTube channels, generate thematic mind maps, and produce enriched transcripts that capture what was said AND what was shown on screen.
The architecture is a narrowing funnel - like fishing, where you look for birds before you cast a line and read the water before you commit to a spot.
┌─────────────────────────────────────────────────────────────────┐
│ SCAN (the birds) Cost: ~$0.20/video │
│ ┌───────────────┐ ┌───────────────────┐ ┌─────────────┐ │
│ │ YouTube Data │───>│ Gemini Multimodal │───>│ mindmap.md │ │
│ │ API: discover │ │ API: watch frames │ │ meta.json │ │
│ │ new videos │ │ + audio (parallel)│ │ per video │ │
│ └───────────────┘ └───────────────────┘ └─────────────┘ │
├─────────────────────────────────────────────────────────────────┤
│ TRIAGE (the drop-off) Cost: $0 (no API) │
│ ┌──────────────────────────────────────────────────────────┐ │
│ │ You + Claude read mind maps. No Gemini needed. │ │
│ │ "Which of these 15 videos matter for agentic patterns?" │ │
│ └──────────────────────────────────────────────────────────┘ │
├─────────────────────────────────────────────────────────────────┤
│ TRANSCRIPT (the catch) Cost: ~$0.50/video │
│ ┌────────────────────┐ ┌─────────────────────────────────┐ │
│ │ Gemini: 3-task │───>│ transcript.md │ │
│ │ decoupled prompt │ │ Diarized speech interleaved │ │
│ │ (audio + vision + │ │ with SCREEN sections describing │ │
│ │ speaker ID) │ │ slides, diagrams, code, demos │ │
│ └────────────────────┘ └─────────────────────────────────┘ │
├─────────────────────────────────────────────────────────────────┤
│ CONCEPTS (the index) Cost: ~$0.001/video │
│ ┌────────────────────┐ ┌─────────────────────────────────┐ │
│ │ Gemini: text-only │───>│ concepts.json per video │ │
│ │ reads mindmap.md + │ │ Canonical IDs + synonyms │ │
│ │ existing taxonomy │ │ │ │
│ └────────────────────┘ │ taxonomy.json (derived master) │ │
│ └─────────────────────────────────┘ │
├─────────────────────────────────────────────────────────────────┤
│ SEARCH (the retrieval) Cost: ~$0.02/query │
│ ┌───────────────────────────────────────────────────────────┐ │
│ │ Concept search: taxonomy.json labels + aliases (free) │ │
│ │ Hybrid search: BM25 keyword + vector semantic + RRF fusion│ │
│ │ Voyage AI embeds (voyage-4-large docs, voyage-4-lite │ │
│ │ queries), LanceDB stores + searches, BM25 matches exact │ │
│ │ words in titles + text. Results include full transcript │ │
│ │ passages + clickable YouTube URLs with timestamp links. │ │
│ └───────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────┘
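The RRF fusion step named in the SEARCH stage can be illustrated with a minimal sketch of reciprocal rank fusion. This is not the plugin's own code: the function name `rrf_fuse`, the chunk IDs, and the constant `k=60` (a commonly used default for RRF) are assumptions made for illustration.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60):
    """Fuse several best-first ranked lists of IDs with reciprocal rank fusion."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # Each list contributes 1 / (k + rank) for every ID it contains.
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID so the order is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical result lists from a keyword (BM25) search and a vector search.
bm25_hits = ["chunk-a", "chunk-b", "chunk-c"]
vector_hits = ["chunk-b", "chunk-d", "chunk-a"]
fused = rrf_fuse([bm25_hits, vector_hits])
```

Items that rank well in both lists rise to the top, which is why a passage matching both the exact words and the meaning of a query tends to beat one that matches only one of the two.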
scan - Fetch new videos from configured channels, generate thematic mind maps in parallel via Gemini. Optionally auto-generate transcripts for channels where you want everything.
transcript - Fused document for a single video: diarized speech interleaved with timestamped SCREEN sections describing slides, diagrams, code, and demos. Speaker names identified from visual cues with evidence.
concepts - Extract and normalize key concepts from each mindmap against a growing canonical vocabulary. One video calls it "Agent-Centric Engineering," another calls it "Multi-Agent Orchestration" — the concept layer resolves them to the same canonical ID.
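As a rough illustration of what resolving two labels to one canonical ID means, here is a minimal sketch. The plugin delegates the matching to Gemini; this sketch swaps in a plain case-insensitive alias lookup. The taxonomy shape (`label` and `aliases` keys), the canonical ID `multi-agent-orchestration`, and both function names are assumptions, not the plugin's actual schema.

```python
def build_alias_index(taxonomy):
    """Map every lowercased label and alias to its canonical concept ID."""
    index = {}
    for canonical_id, entry in taxonomy.items():
        for name in [entry["label"], *entry.get("aliases", [])]:
            index[name.strip().lower()] = canonical_id
    return index


def normalize(term, index):
    """Return the canonical ID for a term, or None if the term is not known yet."""
    return index.get(term.strip().lower())


# Hypothetical taxonomy entry; the real taxonomy.json layout may differ.
taxonomy = {
    "multi-agent-orchestration": {
        "label": "Multi-Agent Orchestration",
        "aliases": ["Agent-Centric Engineering"],
    }
}
index = build_alias_index(taxonomy)
```

With this index, `normalize("agent-centric engineering", index)` and `normalize("Multi-Agent Orchestration", index)` return the same ID, while an unseen term returns `None` and would become a candidate for a new canonical entry.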
taxonomy-build - Rebuild the master vocabulary (taxonomy.json) from all per-video concept files. This is a derived artifact: always rebuildable, never manually edited.
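A minimal sketch of why a derived taxonomy is always rebuildable: it can be recomputed from the per-video files alone. The directory layout (one folder per video containing `concepts.json`), the record fields (`id`, `label`, `aliases`), the `videos` list, and the alias "Agent Swarms" are all assumptions for illustration, not the plugin's real format.

```python
import json
import tempfile
from pathlib import Path


def build_taxonomy(corpus_dir):
    """Merge every concepts.json found under corpus_dir into one taxonomy dict."""
    taxonomy = {}
    for path in sorted(Path(corpus_dir).rglob("concepts.json")):
        for concept in json.loads(path.read_text(encoding="utf-8")):
            entry = taxonomy.setdefault(
                concept["id"],
                {"label": concept["label"], "aliases": [], "videos": []},
            )
            # Collect aliases across videos without duplicating them.
            for alias in concept.get("aliases", []):
                if alias not in entry["aliases"]:
                    entry["aliases"].append(alias)
            # Record which video folder mentioned this concept.
            entry["videos"].append(path.parent.name)
    return taxonomy


# Demo on a throwaway corpus with two hypothetical video folders.
with tempfile.TemporaryDirectory() as tmp:
    for video_id, alias in [("vid1", "Agent-Centric Engineering"), ("vid2", "Agent Swarms")]:
        video_dir = Path(tmp) / video_id
        video_dir.mkdir()
        record = {
            "id": "multi-agent-orchestration",
            "label": "Multi-Agent Orchestration",
            "aliases": [alias],
        }
        (video_dir / "concepts.json").write_text(json.dumps([record]), encoding="utf-8")
    rebuilt = build_taxonomy(tmp)
```

Because the function reads only the per-video files, deleting the master file and running it again yields the same result, which is the property that makes manual edits to the master file pointless.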
triage - After scanning, ask Claude (no Gemini cost):
Install: npx claudepluginhub dzivkovi/video-intel --plugin video-intel