By krasnoperov
Audio/video transcription with speaker diarization, AI summarization, and infographic generation. Transform recordings into transcripts, summaries, and visual content.
AI transcription skill for Claude Code - Transform audio/video recordings into transcripts with speaker diarization, AI-powered summaries, and visual infographics.
This skill provides a complete pipeline for processing recordings: transcription with speaker diarization, AI summarization, and infographic generation.
See skills/transcribe/SKILL.md for complete usage guide.
This is a Claude Code skill. Install it from the marketplace:
/plugin marketplace add krasnoperov/claude-plugins
/plugin install transcribe@krasnoperov-plugins
Once installed, use the /transcribe skill in your conversations:
/transcribe transcribe meeting.mp4 to VTT with speaker diarization
/transcribe summarize this transcript into key points
/transcribe create an infographic from this summary
You can also use this package directly via npx:
export OPENAI_API_KEY="your-openai-key"
export GOOGLE_AI_STUDIO_KEY="your-google-key"
# Transcribe audio/video
npx -y @krasnoperov/transcribe@latest transcribe meeting.mp4 -o transcript.vtt
# Generate summary
npx -y @krasnoperov/transcribe@latest summarize transcript.vtt -o summary.md
# Create infographic
npx -y @krasnoperov/transcribe@latest infographic summary.md -o visual.png
# All-in-one pipeline
npx -y @krasnoperov/transcribe@latest process recording.mp4 --output-dir ./output
Get your API keys:
transcribe <input> Audio/Video → VTT transcript with speakers
summarize <input> Text/VTT → Markdown summary
infographic <input> Text → Visual infographic image
process <input> All-in-one: video → transcript → summary → infographic
These operations can be used individually or chained together.
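As a rough sketch of chaining, the three individual commands can be run in sequence, each consuming the previous step's output file. The `chain` function, the `CLI` variable, and the `RUN` dry-run switch are hypothetical names introduced here for illustration. Only the subcommands and the `-o` flag come from the usage shown above, and this is not necessarily what `process` does internally.

```shell
#!/bin/sh
# Sketch only: chain the documented subcommands by hand.
# RUN is a hypothetical dry-run switch: RUN=echo prints the commands
# instead of executing them; leave it empty to run them for real.
RUN=${RUN:-}
CLI="npx -y @krasnoperov/transcribe@latest"

chain() {
  input=$1
  # Assumes the input name has an extension: meeting.mp4 -> meeting
  base=${input%.*}
  # Each step runs only if the previous one succeeded.
  $RUN $CLI transcribe "$input" -o "$base.vtt" &&
  $RUN $CLI summarize "$base.vtt" -o "$base.md" &&
  $RUN $CLI infographic "$base.md" -o "$base.png"
}

# Dry run: print the three commands without calling any API.
RUN=echo
chain meeting.mp4
```

Setting `RUN=` (empty) before calling `chain` executes the commands, which requires the API keys exported earlier.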
See skills/transcribe/examples/ directory:
npx -y @krasnoperov/transcribe@latest transcribe podcast.mp3 \
--language es \
--model gpt-4o-transcribe-diarize \
-o podcast.vtt
npx -y @krasnoperov/transcribe@latest transcribe meeting.mp4 \
--model gemini-3 \
-o meeting.vtt
Gemini 3 provides built-in speaker diarization and can handle very long audio files (up to ~8 hours).
Output (VTT with speaker tags):
WEBVTT
00:00:00.000 --> 00:00:02.450
<v A>Welcome to the podcast...
00:00:02.850 --> 00:00:08.200
<v B>Thanks for having me...
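Because each cue carries a `<v NAME>` voice tag, the transcript can be post-processed with standard text tools. The following is a minimal sketch, assuming every spoken line starts with its voice tag as in the output above and that speaker labels contain no spaces. The helper name `speaker_counts` is hypothetical and not part of the package.

```shell
#!/bin/sh
# Hypothetical helper: count cues per speaker in a VTT file
# whose cue text starts with a <v NAME> voice tag.
speaker_counts() {
  # Pull NAME out of each leading "<v NAME>" tag, then tally per speaker.
  # Assumes single-word speaker labels (e.g. A, B).
  sed -n 's/^<v \([^>]*\)>.*/\1/p' "$1" | sort | uniq -c | awk '{print $2 ": " $1}'
}

# Sample transcript mirroring the output shown above.
sample=$(mktemp)
cat > "$sample" <<'EOF'
WEBVTT

00:00:00.000 --> 00:00:02.450
<v A>Welcome to the podcast...

00:00:02.850 --> 00:00:08.200
<v B>Thanks for having me...
EOF

speaker_counts "$sample"
rm -f "$sample"
```

A per-speaker cue count like this is a quick way to check that diarization separated the voices before generating a summary.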
npx -y @krasnoperov/transcribe@latest summarize transcript.vtt \
--prompt "Focus on action items and decisions" \
-o summary.md
npx -y @krasnoperov/transcribe@latest infographic summary.md \
--style "modern minimal corporate" \
-o infographic.png
transcribe options:
  --model <model>      Transcription model:
                       OpenAI: gpt-4o-transcribe-diarize (default), gpt-4o-transcribe, whisper-1
                       Google: gemini-3
  --language <lang>    Language code (en, es, ru, de, etc.)
  -o, --output <file>  Output VTT file
summarize options:
  --prompt <text>      Custom summarization instructions
  -o, --output <file>  Output markdown file
infographic options:
  --style <text>       Style instructions for visual
  --reference <image>  Reference image for style
  -o, --output <file>  Output image file
process options:
  --output-dir <dir>   Output directory for all files
  --language <lang>    Language for transcription
  --model <model>      Transcription model
  --style <text>       Style for infographic
# macOS
brew install ffmpeg
# Ubuntu/Debian
sudo apt install ffmpeg
npm run build # Build TypeScript
npm run typecheck # Type checking
npm run test # Run tests
npm run dev # Dev mode with type stripping
MIT License - Copyright (c) 2025 Aleksei Krasnoperov
See LICENSE file for details.