Video Intel

30 seconds to read a mind map vs. 30 minutes to watch the video. Scanned 15 videos from a single channel in under 2 minutes, ~$0.15-0.25 each. Free tier covers 8 hours of YouTube video per day.

Multimodal video intelligence powered by Gemini. Scan YouTube channels, generate thematic mind maps, and produce enriched transcripts that capture what was said AND what was shown on screen.

Key Principles

Multimodal, not transcript-based. Gemini sees video frames at 1 FPS, reads all on-screen text, and hears audio simultaneously. When a presenter says "as you can see here," the output tells you what was actually shown.

Decoupled task prompting. Transcription (audio) and speaker identification (vision) run as separate tasks within a single prompt to preserve attention quality, a technique borrowed from Laurent Picard's research.
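As a rough illustration of decoupling, the sketch below assembles one prompt from three independent task sections, matching the "audio + vision + speaker ID" split shown in the diagram. The task wording and the `build_decoupled_prompt` helper are hypothetical; the plugin's actual prompt is not reproduced in this README.

```python
# Hypothetical task wording; the plugin's real prompt is not shown in this README.
TASKS = {
    "TASK 1 - AUDIO": "Transcribe all speech with timestamps. Label speakers as Speaker A, B, ...",
    "TASK 2 - VISION": "Describe slides, diagrams, code, and demos shown on screen, with timestamps.",
    "TASK 3 - SPEAKER ID": "Using visual cues only (name cards, title slides), "
                           "map each speaker label to a name and cite the evidence.",
}


def build_decoupled_prompt(tasks: dict[str, str]) -> str:
    """Join independent task sections into one prompt, one delimited block per task."""
    header = "Complete each task independently. Do not let one task's output influence another."
    sections = [f"## {name}\n{instruction}" for name, instruction in tasks.items()]
    return "\n\n".join([header, *sections])


prompt = build_decoupled_prompt(TASKS)
```

Keeping each task in its own labeled section is one plausible way to stop the audio and vision instructions from blurring together while still paying for a single request.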

Scan-then-triage funnel. Mind maps are cheap and fast. Read 30-second summaries, then spend transcript budget only on videos worth deep engagement.
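The saving from triage is simple arithmetic. The sketch below uses the approximate per-video costs from the diagram (~$0.20 for a scan, ~$0.50 for a transcript); the choice of 3 videos surviving triage out of 15 is an assumed example, not a measured figure.

```python
SCAN_COST = 0.20        # ~$ per video, SCAN stage in the diagram
TRANSCRIPT_COST = 0.50  # ~$ per video, TRANSCRIPT stage in the diagram


def funnel_cost(scanned: int, transcribed: int) -> float:
    """Cost of scanning every video but transcribing only the selected ones."""
    return scanned * SCAN_COST + transcribed * TRANSCRIPT_COST


triaged = funnel_cost(scanned=15, transcribed=3)       # assumed: 3 of 15 survive triage
everything = funnel_cost(scanned=15, transcribed=15)   # no triage at all
print(f"triaged: ${triaged:.2f}, everything: ${everything:.2f}")
# prints "triaged: $4.50, everything: $10.50"
```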

Idempotent processing. Re-running a scan skips already-processed videos. Safe to interrupt, safe to re-run.
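One common way to get this behavior is to treat the output file as the completion marker. The sketch below assumes one directory per video ID containing `mindmap.md`; that layout and both helper names are illustrative assumptions, not the plugin's confirmed implementation.

```python
from pathlib import Path


def videos_to_process(video_ids: list[str], out_dir: Path) -> list[str]:
    """Return only the videos that have no mindmap.md yet."""
    return [vid for vid in video_ids if not (out_dir / vid / "mindmap.md").exists()]


def write_mindmap(video_id: str, text: str, out_dir: Path) -> None:
    """Write to a temp file, then rename, so an interrupted run never leaves a partial file."""
    target = out_dir / video_id / "mindmap.md"
    target.parent.mkdir(parents=True, exist_ok=True)
    tmp = target.with_suffix(".md.tmp")
    tmp.write_text(text, encoding="utf-8")
    tmp.replace(target)  # rename over the final path once the content is complete
```

Because the final file only appears after a complete write, a re-run can safely use its existence as the skip condition.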

How It Works

The architecture is a narrowing funnel - like fishing, where you look for birds before you cast a line and read the water before you commit to a spot.

┌─────────────────────────────────────────────────────────────────┐
│  SCAN (the birds)                          Cost: ~$0.20/video   │
│  ┌───────────────┐    ┌───────────────────┐    ┌─────────────┐  │
│  │ YouTube Data  │───>│ Gemini Multimodal │───>│ mindmap.md  │  │
│  │ API: discover │    │ API: watch frames │    │ meta.json   │  │
│  │ new videos    │    │ + audio (parallel)│    │ per video   │  │
│  └───────────────┘    └───────────────────┘    └─────────────┘  │
├─────────────────────────────────────────────────────────────────┤
│  TRIAGE (the drop-off)                     Cost: $0 (no API)    │
│  ┌──────────────────────────────────────────────────────────┐   │
│  │ You + Claude read mind maps. No Gemini needed.           │   │
│  │ "Which of these 15 videos matter for agentic patterns?"  │   │
│  └──────────────────────────────────────────────────────────┘   │
├─────────────────────────────────────────────────────────────────┤
│  TRANSCRIPT (the catch)                    Cost: ~$0.50/video   │
│  ┌────────────────────┐    ┌─────────────────────────────────┐  │
│  │ Gemini: 3-task     │───>│ transcript.md                   │  │
│  │ decoupled prompt   │    │ Diarized speech interleaved     │  │
│  │ (audio + vision +  │    │ with SCREEN sections describing │  │
│  │  speaker ID)       │    │ slides, diagrams, code, demos   │  │
│  └────────────────────┘    └─────────────────────────────────┘  │
├─────────────────────────────────────────────────────────────────┤
│  CONCEPTS (the index)                      Cost: ~$0.001/video  │
│  ┌────────────────────┐    ┌─────────────────────────────────┐  │
│  │ Gemini: text-only  │───>│ concepts.json per video         │  │
│  │ reads mindmap.md + │    │ Canonical IDs + synonyms        │  │
│  │ existing taxonomy  │    │                                 │  │
│  └────────────────────┘    │ taxonomy.json (derived master)  │  │
│                            └─────────────────────────────────┘  │
├─────────────────────────────────────────────────────────────────┤
│  SEARCH (the retrieval)                    Cost: ~$0.02/query   │
│  ┌───────────────────────────────────────────────────────────┐  │
│  │ Concept search: taxonomy.json labels + aliases (free)     │  │
│  │ Hybrid search: BM25 keyword + vector semantic + RRF fusion│  │
│  │   Voyage AI embeds (voyage-4-large docs, voyage-4-lite    │  │
│  │   queries), LanceDB stores + searches, BM25 matches exact │  │
│  │   words in titles + text. Results include full transcript │  │
│  │   passages + clickable YouTube URLs with timestamp links. │  │
│  └───────────────────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────────────┘
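The SEARCH stage names RRF (Reciprocal Rank Fusion) as the step that merges the BM25 and vector result lists. The sketch below shows the standard RRF formula in plain Python. The constant `k = 60` is the value commonly used in the literature and is an assumption here, as are the video IDs; the plugin may configure fusion differently or delegate it to LanceDB.

```python
from collections import defaultdict


def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Reciprocal Rank Fusion: score(d) = sum over lists of 1 / (k + rank), rank from 1."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc ID for a stable order.
    return sorted(scores, key=lambda d: (-scores[d], d))


bm25_hits = ["vid_a", "vid_b", "vid_c"]    # hypothetical keyword ranking
vector_hits = ["vid_c", "vid_a", "vid_d"]  # hypothetical semantic ranking
fused = rrf_fuse([bm25_hits, vector_hits])
# fused == ["vid_a", "vid_c", "vid_b", "vid_d"]
```

RRF only looks at ranks, not raw scores, which is why it can combine BM25 scores and vector distances without normalizing either one.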

scan - Fetch new videos from configured channels, generate thematic mind maps in parallel via Gemini. Optionally auto-generate transcripts for channels where you want everything.

transcript - Fused document for a single video: diarized speech interleaved with timestamped SCREEN sections describing slides, diagrams, code, and demos. Speaker names identified from visual cues with evidence.

concepts - Extract and normalize key concepts from each mindmap against a growing canonical vocabulary. One video calls it "Agent-Centric Engineering," another calls it "Multi-Agent Orchestration" — the concept layer resolves them to the same canonical ID.
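The lookup side of that resolution can be pictured as an alias index. In the sketch below the taxonomy shape (`label`, `aliases`), the canonical ID, and the choice of which phrase is canonical are all assumptions made for illustration. In the plugin itself, Gemini performs the normalization against the existing taxonomy.

```python
from typing import Optional

# Hypothetical taxonomy shape: canonical ID -> label and aliases.
TAXONOMY = {
    "multi-agent-orchestration": {
        "label": "Multi-Agent Orchestration",
        "aliases": ["Agent-Centric Engineering"],
    },
}


def build_alias_index(taxonomy: dict) -> dict[str, str]:
    """Map every lowercased label and alias to its canonical ID."""
    index: dict[str, str] = {}
    for canonical_id, entry in taxonomy.items():
        for name in [entry["label"], *entry["aliases"]]:
            index[name.strip().lower()] = canonical_id
    return index


def resolve(term: str, index: dict[str, str]) -> Optional[str]:
    """Return the canonical ID for a term, or None if it is not in the vocabulary yet."""
    return index.get(term.strip().lower())


index = build_alias_index(TAXONOMY)
```

With this index, `resolve("Agent-Centric Engineering", index)` and `resolve("multi-agent orchestration", index)` both return the same canonical ID.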

taxonomy-build - Rebuild the master vocabulary (taxonomy.json) from all per-video concept files. This is a derived artifact — always rebuildable, never manually edited.
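A rebuild of this kind can be sketched as a pure function over the per-video files. The example below assumes each `concepts.json` is a list of objects with `id`, `label`, and optional `aliases` keys, stored in one directory per video. That schema and layout are guesses for illustration, not the plugin's documented format.

```python
import json
from pathlib import Path


def build_taxonomy(root: Path) -> dict:
    """Merge every per-video concepts.json under root into one vocabulary."""
    taxonomy: dict[str, dict] = {}
    for path in sorted(root.glob("*/concepts.json")):
        for concept in json.loads(path.read_text(encoding="utf-8")):
            entry = taxonomy.setdefault(
                concept["id"],
                {"label": concept["label"], "aliases": set(), "videos": set()},
            )
            entry["aliases"].update(concept.get("aliases", []))
            entry["videos"].add(path.parent.name)  # directory name stands in for the video ID
    # Convert sets to sorted lists so the output is deterministic and JSON-serializable.
    return {
        cid: {"label": e["label"], "aliases": sorted(e["aliases"]), "videos": sorted(e["videos"])}
        for cid, e in sorted(taxonomy.items())
    }


def write_taxonomy(root: Path) -> None:
    """Overwrite taxonomy.json from scratch; nothing is read from the previous file."""
    text = json.dumps(build_taxonomy(root), indent=2)
    (root / "taxonomy.json").write_text(text, encoding="utf-8")
```

Because the output depends only on the per-video files, deleting `taxonomy.json` and running the build again yields the same result, which is what makes manual edits unnecessary.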

triage - After scanning, ask Claude (no Gemini cost):
