From brain-os
Transcribes YouTube/podcast/audio URLs to clean text using auto-captions or local whisper-cpp with Silero VAD. Provides verbatim transcripts as source-of-truth artifacts for research and quote extraction.
Slash command
/brain-os:transcribe-video
**Always before any research/finding extraction from a video or audio source.** Aggregator articles and same-cycle summaries hallucinate quotes — they conflate quotes from different talks by the same person. The verbatim transcript is the only trustable primary source.
Trigger conditions:
- /research is given a YouTube / podcast URL as the source
- [paraphrase] or [aggregator]
Skip conditions:
- _transcript-verbatim.md already exists for this URL in the findings folder
The script writes two files:
| File | Content |
|---|---|
| <out>/_transcript-verbatim.md | Cleaned, paragraph-reflowed transcript with frontmatter (source URL, voice, method, capture date) |
| <out>/_transcription-metadata.json | Build metadata: word count, method used, repetition zones detected, model + prompt, runtime |
The verbatim file is the source-of-truth artifact. Findings, reports, and content angles cite only quotes that grep into this file. Everything else is paraphrase and must be source-tagged.
yt-dlp --write-auto-subs --sub-langs en → fetch VTT → strip karaoke tags → dedupe consecutive lines → reflow into paragraphs.
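The middle of that pipeline (strip karaoke tags, dedupe consecutive lines) can be sketched as below. This is a minimal sketch, not the script's actual code: `cleanVtt` is a hypothetical name, and it assumes YouTube auto-caption VTT marks word-level timing with inline tags such as `<c>` and `<00:00:01.230>`. Paragraph reflow is left out.

```typescript
// Sketch of the auto-subs cleanup: VTT text -> deduplicated plain lines.
// Assumption: inline "karaoke" timing uses angle-bracket tags like <c> and <00:00:01.230>.
function cleanVtt(vtt: string): string[] {
  const out: string[] = [];
  for (const raw of vtt.split(/\r?\n/)) {
    const line = raw.trim();
    // Skip the header, metadata lines, cue timings, and blank lines.
    if (
      line === "" ||
      line.startsWith("WEBVTT") ||
      line.startsWith("Kind:") ||
      line.startsWith("Language:") ||
      line.includes("-->")
    ) {
      continue;
    }
    // Strip inline tags, then collapse whitespace left behind.
    const text = line.replace(/<[^>]+>/g, "").replace(/\s+/g, " ").trim();
    if (text === "") continue;
    // Rolling captions repeat the previous line in the next cue; keep one copy.
    if (out[out.length - 1] !== text) out.push(text);
  }
  return out;
}
```

Deduping only against the immediately preceding line keeps genuine repeats that occur later in the talk.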
references/typo-fixes.json); maintain it as new patterns surface
--whisper flag)
Download audio via yt-dlp → convert to 16kHz mono wav → run whisper-cli with Silero VAD + topic-seeded initial prompt.
When to use whisper:
When NOT to use whisper:
Default to auto-subs. Re-run with --whisper if a finding's verbatim quote contains a typo'd proper noun the user will want to cite.
--prompt seeds whisper's context with topic-relevant proper nouns. Without a prompt, whisper hallucinates novel spellings ("OpenClaw" / "CORTCO" / "JSON things"). Pull the prompt from:
- --prompt "..." flag if user provided
- --get-title --get-description)
- /research context
Always include in the prompt: speaker name(s), event name, and 5–10 likely proper nouns / acronyms.
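A prompt with those three ingredients could be assembled as in the sketch below. `buildWhisperPrompt` and its input shape are hypothetical. The "<speakers> at <event>. Topics: ..." layout is borrowed from the example invocation in this document and is not a format whisper requires.

```typescript
// Hypothetical helper: builds a topic-seeded prompt from speaker, event, and proper nouns.
interface PromptParts {
  speakers: string[];
  event: string;
  properNouns: string[];
}

function buildWhisperPrompt(parts: PromptParts): string {
  // Drop duplicate nouns while keeping first-seen order.
  const nouns = parts.properNouns.filter(
    (noun, index, all) => all.indexOf(noun) === index,
  );
  return `${parts.speakers.join(", ")} at ${parts.event}. Topics: ${nouns.join(", ")}.`;
}
```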
Default output path: {vault}/knowledge/research/findings/{slug}/, where slug is derived from the --slug flag, the calling skill's findings dir, or <voice>-<topic> from yt-dlp metadata.
If the calling skill is /research, write to {vault}/knowledge/research/findings/{research-slug}/_transcript-verbatim.md so subsequent finding files in the same folder can [[wiki-link]] it.
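The output-path rules above might resolve as in this sketch. It assumes the three slug sources are tried in the order listed, which the text implies but does not state. `resolveOutDir`, `slugify`, and the input field names are hypothetical.

```typescript
// Hypothetical inputs; only the path layout comes from the document.
interface OutDirInputs {
  vault: string;
  slugFlag?: string;          // value of --slug, if given
  callerFindingsDir?: string; // findings dir of the calling skill, if any
  voice?: string;             // from yt-dlp metadata
  topic?: string;             // from yt-dlp metadata
}

// Assumed slug style: lowercase, hyphen-separated.
function slugify(s: string): string {
  return s.toLowerCase().replace(/[^a-z0-9]+/g, "-").replace(/^-+|-+$/g, "");
}

function resolveOutDir(input: OutDirInputs): string {
  const base = `${input.vault}/knowledge/research/findings`;
  if (input.slugFlag) return `${base}/${input.slugFlag}`;
  if (input.callerFindingsDir) return input.callerFindingsDir;
  if (input.voice && input.topic) {
    return `${base}/${slugify(input.voice)}-${slugify(input.topic)}`;
  }
  throw new Error("cannot derive an output slug");
}
```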
When /research is invoked with a URL that's a YouTube watch link, podcast RSS item, or other audio/video source:
- /transcribe-video first, write to the research's findings folder
- [primary — verbatim] not [paraphrase] or [aggregator]
${CLAUDE_PLUGIN_ROOT}/scripts/transcribe-video.ts (TS+bun per global rule).
Direct invoke:
PLUGIN_ROOT="${CLAUDE_PLUGIN_ROOT:-$(ls -d ~/.claude/plugins/cache/brain-os-marketplace/brain-os/*/ 2>/dev/null | sort -V | tail -1)}"; PLUGIN_ROOT="${PLUGIN_ROOT%/}"
bun "${PLUGIN_ROOT}/scripts/transcribe-video.ts" <URL> --out <DIR> [--whisper] [--prompt "...."]
The script requires:
- yt-dlp (brew install yt-dlp)
- ffmpeg (brew install ffmpeg)
- whisper-cli (brew install whisper-cpp) + model files (auto-downloaded to ~/.cache/whisper-models/ on first whisper run)
If a dependency is missing, the script prints a single-line install command and exits with code 2. Don't try to auto-install; let the user run it.
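The missing-dependency behavior can be sketched as follows. The binaries and install commands come from the list above. `installCommand`, the injected availability check, and joining several commands with `&&` are assumptions, not the script's confirmed behavior.

```typescript
// Binary names and install commands from the requirements list.
const DEPENDENCIES: ReadonlyArray<{ bin: string; install: string }> = [
  { bin: "yt-dlp", install: "brew install yt-dlp" },
  { bin: "ffmpeg", install: "brew install ffmpeg" },
  { bin: "whisper-cli", install: "brew install whisper-cpp" },
];

// Hypothetical helper: returns one install line for everything missing, or null.
// The availability check is injected so the logic can be tested without touching PATH.
function installCommand(isAvailable: (bin: string) => boolean): string | null {
  const missing = DEPENDENCIES.filter((dep) => !isAvailable(dep.bin));
  if (missing.length === 0) return null;
  return missing.map((dep) => dep.install).join(" && ");
}

// Usage inside a Bun script (Bun.which returns null when a binary is not on PATH):
//   const cmd = installCommand((bin) => Bun.which(bin) !== null);
//   if (cmd !== null) { console.error(cmd); process.exit(2); }
```

Printing the command instead of running it matches the rule that the user performs the install.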
# default: auto-subs, save to current research findings dir
bun scripts/transcribe-video.ts "https://www.youtube.com/watch?v=96jN2OCOfLs" --out knowledge/research/findings/karpathy-vibe-to-agentic
# high quality with whisper, with topic-seeded prompt
bun scripts/transcribe-video.ts "https://www.youtube.com/watch?v=96jN2OCOfLs" \
  --out knowledge/research/findings/karpathy-vibe-to-agentic \
  --whisper \
  --prompt "Andrej Karpathy at Sequoia AI Ascent 2026. Topics: vibe coding, agentic engineering, Software 1.0/2.0/3.0, Claude Code, OpenCode, Codex, NanoGPT, jaggedness, verifiability, Menugen, Nano Banana."
| Failure | Cause | Mitigation |
|---|---|---|
| Repetition loop in whisper output | Beam search degenerate state on long silence / repeated content | Script detects + splices auto-sub text into repetition zone |
| Auto-subs unavailable | Channel disabled subs / region-locked | Auto-fall back to whisper |
| URL is not YouTube | Generic audio file | Skip yt-dlp subs path, go straight to ffmpeg + whisper |
| Whisper missing model | First run | Auto-download large-v3-turbo (1.6GB) + Silero VAD (885KB) — happens once, ~30 sec on fast network |
| Wrong proper nouns in auto-subs | Caption upload defaults | Apply references/typo-fixes.json post-process; user can extend the file |
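The table does not say how repetition loops are detected. One plausible heuristic, sketched here as an assumption, is to flag runs of consecutive segments whose normalized text is identical. `findRepetitionZones`, `Zone`, and the `minRepeats` threshold of 3 are hypothetical.

```typescript
// A zone covers segments[start] up to, but not including, segments[end].
interface Zone {
  start: number;
  end: number;
}

// Assumed heuristic: a run of at least minRepeats identical consecutive segments is a loop.
function findRepetitionZones(segments: string[], minRepeats = 3): Zone[] {
  // Compare case-insensitively and ignore punctuation differences.
  const normalize = (s: string): string =>
    s.toLowerCase().replace(/[^a-z0-9]+/g, " ").trim();
  const zones: Zone[] = [];
  let i = 0;
  while (i < segments.length) {
    const key = normalize(segments[i] ?? "");
    let j = i + 1;
    while (j < segments.length && normalize(segments[j] ?? "") === key) j++;
    if (key !== "" && j - i >= minRepeats) zones.push({ start: i, end: j });
    i = j;
  }
  return zones;
}
```

Each returned zone marks the span where auto-sub text for the same time range would be spliced in.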
references/typo-fixes.json — keep current. When a transcript has a new misheard proper noun, add it. The file is sorted by precedence (longer phrases first to avoid partial-match collisions).
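The longer-phrases-first rule can be illustrated with a small sketch. The shape of references/typo-fixes.json is not shown in this document, so a flat wrong-to-right map is assumed. `applyTypoFixes` and the sample entries are hypothetical.

```typescript
// Assumption: fixes is a flat map from misheard spelling to correct spelling.
function applyTypoFixes(text: string, fixes: Record<string, string>): string {
  // Sort longest key first so a short key cannot rewrite part of a longer phrase.
  const keys = Object.keys(fixes).sort((a, b) => b.length - a.length);
  let out = text;
  for (const wrong of keys) {
    const right = fixes[wrong];
    if (right === undefined) continue;
    // split/join replaces every literal occurrence without regex escaping.
    out = out.split(wrong).join(right);
  }
  return out;
}

// Hypothetical entries, deliberately listed shortest first.
const sampleFixes: Record<string, string> = {
  Carpathy: "Karpathy",
  "Andre Carpathy": "Andrej Karpathy",
};
```

If "Carpathy" were applied first, "Andre Carpathy" would become "Andre Karpathy" and the longer fix would never match. Sorting inside the function is a defensive choice for a file that drifts out of order.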
When whisper-cli or yt-dlp behavior shifts (e.g. a new flag default), update the script. Keep the SKILL.md decision rules stable — those are the contract with calling skills.
Follow {vault}/skill-spec.md § 11. After the transcript is written (the skill's own outcome — not after the caller's research/findings step), append to {vault}/daily/skill-outcomes/transcribe-video.log:
{date} | transcribe-video | transcribe | ~/work/brain-os-plugin | {out}/_transcript-verbatim.md | commit:none | {result}
- result: pass if the transcript was delivered (either path) with no repetition-zone splices needed; partial if it was delivered but whisper repetition zones were spliced or new typo-fix entries had to be added; fail if both the auto-subs and whisper paths failed and no transcript was written.
- commit:none: transcripts land in findings dirs that are committed by the calling skill, not by this one.
- args="{url}" (enables replay), method=auto-subs|whisper.
If result != pass, auto-invoke /brain-os:improve transcribe-video.
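The result rules and the log line template above can be sketched together. The function names and the input fields are hypothetical. The optional args and method fields are omitted because their position in the line is not specified here.

```typescript
type Result = "pass" | "partial" | "fail";

// Hypothetical summary of one run.
interface RunSummary {
  transcriptWritten: boolean;
  splicedZones: number;   // whisper repetition zones that were spliced
  newTypoFixes: number;   // typo-fix entries that had to be added
}

function classifyResult(run: RunSummary): Result {
  if (!run.transcriptWritten) return "fail";
  if (run.splicedZones > 0 || run.newTypoFixes > 0) return "partial";
  return "pass";
}

// Fills the log line template shown above.
function formatOutcomeLine(date: string, out: string, result: Result): string {
  return [
    date,
    "transcribe-video",
    "transcribe",
    "~/work/brain-os-plugin",
    `${out}/_transcript-verbatim.md`,
    "commit:none",
    result,
  ].join(" | ");
}
```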
npx claudepluginhub sonthanh/brain-os-plugin