From alemtuzlak-skills
Transcribes video/audio files to text with word-level timestamps using a local whisper.cpp Docker service. Returns transcript.txt, transcript.srt, and transcript.words.json.
Slash command
/alemtuzlak-skills:transcribe-video
Transcribes any video/audio file locally using a bundled whisper.cpp service. No external project or cloud API. Returns word-level timestamps (needed for synced overlays/captions).
/produce-video.
Everything is driven by the bundled runner; never ask the user to manage Docker:
node scripts/transcribe.mjs <path-to-video-or-audio> [--out <dir>] [--port 9111] [--language en] [--no-word-ts] [--task transcribe|translate]
node scripts/transcribe.mjs --stop # stop the warm container
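A caller that launches the runner from code needs to turn its options into that argument list. The following is a minimal sketch, not part of the skill: `buildRunnerArgs` and its option names (`out`, `port`, `language`, `wordTimestamps`, `task`) are hypothetical, and only the flags shown in the usage line above are emitted.

```javascript
// Hypothetical helper (not shipped with the skill): builds the argument list
// for `node scripts/transcribe.mjs` from an options object.
// Only flags that appear in the documented usage line are produced.
function buildRunnerArgs(inputPath, options = {}) {
  if (!inputPath) {
    throw new Error("inputPath is required");
  }
  const args = ["scripts/transcribe.mjs", inputPath];
  if (options.out !== undefined) args.push("--out", String(options.out));
  if (options.port !== undefined) args.push("--port", String(options.port));
  if (options.language !== undefined) args.push("--language", String(options.language));
  // Word timestamps are on unless explicitly disabled.
  if (options.wordTimestamps === false) args.push("--no-word-ts");
  if (options.task !== undefined) {
    if (options.task !== "transcribe" && options.task !== "translate") {
      throw new Error(`unsupported task: ${options.task}`);
    }
    args.push("--task", options.task);
  }
  return args;
}

const exampleArgs = buildRunnerArgs("talk.mp4", {
  out: "out",
  language: "en",
  wordTimestamps: false,
});
console.log(exampleArgs.join(" "));
// prints: scripts/transcribe.mjs talk.mp4 --out out --language en --no-word-ts
```

The returned array can be handed to something like `child_process.spawn("node", args)`; passing arguments as an array rather than a single shell string avoids quoting problems with paths that contain spaces.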
The runner:
1. Builds the transcribe-video-whisper image from assets/whisper-service/ if it is missing (the first build is slow: it compiles whisper.cpp and bakes in the ggml-base.en.bin model).
2. Starts the transcribe-video-whisper container (host port 9111 → container 9001); reuses it if it is already running, and runs docker start on it if it is stopped.
3. Waits for GET /healthz to report ok.
4. Sends the file to /transcribe with word_ts=true.
5. Writes transcript.txt, transcript.srt, and transcript.words.json to --out (default: the input file's directory), and prints a JSON result line to stdout.

transcript.txt: plain text.
transcript.srt: subtitle text.
transcript.words.json: [{ "word", "start", "end" }], in seconds, time-ordered. This is the sync source for overlays/captions.

stdout result line (for programmatic callers like /produce-video):
{ "ok": true, "outDir": "...", "wordCount": 38, "segmentCount": 2, "files": { "txt": "transcript.txt", "srt": "transcript.srt", "words": "transcript.words.json" } }
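A programmatic caller typically does two things with this output: pick the result line out of stdout, then use the word timings for captions. The sketch below illustrates both under stated assumptions. `parseResultLine` and `groupWordsIntoCues` are hypothetical helpers, the idea that other log lines may precede the result line is an assumption, and the 0.6 second gap and 7-word limit are arbitrary illustrative defaults.

```javascript
// Hypothetical helper: returns the last stdout line that parses as a JSON
// object with an "ok" field, or null if there is none.
// Assumption: the runner may print other lines before the result line.
function parseResultLine(stdout) {
  const lines = stdout.split(/\r?\n/).map((line) => line.trim()).filter(Boolean);
  for (let i = lines.length - 1; i >= 0; i--) {
    try {
      const parsed = JSON.parse(lines[i]);
      if (parsed && typeof parsed === "object" && "ok" in parsed) return parsed;
    } catch {
      // Not JSON; keep scanning earlier lines.
    }
  }
  return null;
}

// Hypothetical helper: groups time-ordered { word, start, end } entries
// (the documented transcript.words.json shape) into caption cues.
// A new cue starts after a pause longer than maxGapSeconds or once the
// current cue already holds maxWordsPerCue words.
function groupWordsIntoCues(words, maxGapSeconds = 0.6, maxWordsPerCue = 7) {
  const cues = [];
  let current = null;
  for (const w of words) {
    const startsNewCue =
      current === null ||
      w.start - current.end > maxGapSeconds ||
      current.words.length >= maxWordsPerCue;
    if (startsNewCue) {
      current = { start: w.start, end: w.end, words: [w.word.trim()] };
      cues.push(current);
    } else {
      current.words.push(w.word.trim());
      current.end = w.end;
    }
  }
  return cues.map((c) => ({ start: c.start, end: c.end, text: c.words.join(" ") }));
}

const sampleWords = [
  { word: "Hello", start: 0.0, end: 0.4 },
  { word: "world", start: 0.45, end: 0.9 },
  { word: "Next", start: 2.0, end: 2.3 },
];
console.log(groupWordsIntoCues(sampleWords).map((c) => c.text));
// prints: [ 'Hello world', 'Next' ]
```

Scanning from the last line upward keeps the parser tolerant of any progress output, and grouping by pause length is only one possible cue strategy; a caller that needs frame-accurate overlays can use the per-word `start` and `end` values directly.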
The container is left running (--restart unless-stopped) for warm reuse; stop with --stop.
See references/ for the Docker lifecycle, the API shape, and the vendoring provenance.
npx claudepluginhub alemtuzlak/skills --plugin self-improve