From transcribe
Audio/video transcription with speaker diarization (OpenAI Whisper), AI summarization (GPT-5.1), and infographic generation (Gemini). Use when processing recordings, meeting videos, podcasts, or any audio/video content that needs to be converted to text, summarized, or visualized.
Slash command: `/transcribe:transcribe`
CLI: `npx @krasnoperov/transcribe <command> [args] [options]`
Four operations for audio/video content processing:
transcribe <input> # Audio/Video -> VTT transcript with speaker diarization
summarize <input> # Text/VTT -> Markdown summary
infographic <input> # Text -> Visual infographic image
process <input> # All-in-one pipeline: video -> transcript -> summary -> infographic
# Required: ffmpeg for audio processing
brew install ffmpeg # macOS
apt install ffmpeg # Linux
# API keys
export OPENAI_API_KEY="your-key" # Get at https://platform.openai.com/api-keys
export GOOGLE_AI_STUDIO_KEY="your-key" # Get at https://ai.google.dev/
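A quick preflight check can catch a missing dependency before any audio is uploaded. The sketch below is only an illustration: `missing_prereqs` is a hypothetical helper name, and it treats both API keys as required, which is an assumption (a given command may need only one of them).

```shell
# Hypothetical helper, not part of the CLI: report missing prerequisites.
# Assumption: both API keys are treated as required here.
missing_prereqs() (
  missing=""
  # ffmpeg must be on PATH for audio processing
  command -v ffmpeg >/dev/null 2>&1 || missing="$missing ffmpeg"
  # API keys are read from the environment
  [ -n "${OPENAI_API_KEY:-}" ] || missing="$missing OPENAI_API_KEY"
  [ -n "${GOOGLE_AI_STUDIO_KEY:-}" ] || missing="$missing GOOGLE_AI_STUDIO_KEY"
  if [ -n "$missing" ]; then
    # List everything that is missing and signal failure
    echo "missing:$missing"
    exit 1
  fi
  echo "ok"
)
```

Calling `missing_prereqs && npx @krasnoperov/transcribe transcribe meeting.mp4 -o transcript.vtt` would then run the transcription only when nothing is reported missing.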
# Transcribe a video with speaker diarization (OpenAI, default)
npx @krasnoperov/transcribe transcribe meeting.mp4 -o transcript.vtt
# Transcribe with Gemini (good for long audio, up to ~8 hours)
npx @krasnoperov/transcribe transcribe podcast.mp3 --model gemini-3 -o transcript.vtt
# Generate summary from transcript
npx @krasnoperov/transcribe summarize transcript.vtt -o summary.md
# Create infographic from summary
npx @krasnoperov/transcribe infographic summary.md --style "modern minimal" -o visual.png
# All-in-one: process entire video
npx @krasnoperov/transcribe process recording.mp4 --language en --output-dir ./output
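The three single-step commands above can also be chained by hand when you want to inspect or edit each intermediate file. The following is a rough sketch, not a description of what `process` does internally: `run_steps`, the `DRY_RUN` switch, and the intermediate file names are hypothetical choices made for this example.

```shell
# Hypothetical wrapper: chain transcribe -> summarize -> infographic for one file.
# The function name, DRY_RUN switch and output file names are assumptions.
run_steps() (
  input="$1"
  outdir="${2:-./output}"
  # Derive a base name: recording.mp4 -> recording
  base=$(basename "$input")
  base="${base%.*}"
  # With DRY_RUN=1 each command is printed instead of executed
  run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then echo "$*"; else "$@"; fi
  }
  run mkdir -p "$outdir" &&
  run npx @krasnoperov/transcribe transcribe "$input" -o "$outdir/$base.vtt" &&
  run npx @krasnoperov/transcribe summarize "$outdir/$base.vtt" -o "$outdir/$base.md" &&
  run npx @krasnoperov/transcribe infographic "$outdir/$base.md" -o "$outdir/$base.png"
)
```

Running `DRY_RUN=1 run_steps recording.mp4 ./output` prints the commands without calling the CLI, which is a cheap way to check paths before spending API credits.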
# transcribe options
--model <model> Transcription model:
OpenAI models:
- gpt-4o-transcribe-diarize (default, with speakers)
- gpt-4o-transcribe (no speakers)
- whisper-1 (legacy)
Google models:
- gemini-3 (with speakers, handles long audio up to ~8h)
--language <lang> Language code (e.g., en, es, ru, de)
-o, --output <file> Output VTT file path
# summarize options
--prompt <text> Custom summarization instructions
-o, --output <file> Output markdown file path
# infographic options
--style <text> Style instructions (e.g., "artistic", "corporate", "playful")
--reference <image> Reference image for visual style
-o, --output <file> Output image file path
# process options
--output-dir <dir> Output directory for all files
--language <lang> Language code for transcription
--model <model> Transcription model
--style <text> Style for infographic
# Speaker labels in the VTT output use voice tags
<v Speaker 1>text</v>
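The `<v Speaker 1>text</v>` voice-tag form is easy to post-process with standard tools. The sketch below assumes each cue's text sits on a single line wrapped in exactly one voice tag, which may not hold for every transcript; `vtt_speakers` is a hypothetical helper name.

```shell
# Hypothetical helper: turn voice-tagged cue lines into "Speaker: text" lines.
# Assumes one <v Name>text</v> pair per line; other lines (header, timestamps)
# are dropped.
vtt_speakers() {
  sed -n 's/^<v \([^>]*\)>\(.*\)<\/v>$/\1: \2/p' "$@"
}
```

For example, `vtt_speakers transcript.vtt | cut -d: -f1 | sort | uniq -c` would give a rough count of cues per speaker.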
npx claudepluginhub krasnoperov/claude-plugins --plugin transcribe