From audio-understanding
Transcribe and analyze audio content using Google Gemini. Supports local audio files (mp3, wav, m4a, ogg, flac) and YouTube links up to 9.5 hours long. Use this skill when you need to transcribe, summarize, or extract information from audio content.
Slash command
/audio-understanding:audio-understanding
This skill supports audio analysis using Google Gemini models. Supported formats:
| Category | Extensions |
|---|---|
| Audio | .mp3, .wav, .m4a, .ogg, .flac |
Reference: https://ai.google.dev/gemini-api/docs/audio?example=dialogue
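The supported-format table above can be turned into a quick pre-flight check. The following is a minimal sketch, not part of the skill itself: `is_supported_audio` is a hypothetical helper name, and the YouTube patterns only cover the two URL shapes that appear in this document.

```shell
# Hypothetical helper (not shipped with the skill): succeeds when the
# argument looks like an input this document lists as supported.
is_supported_audio() {
  local input="$1"
  # Accept the two YouTube URL shapes shown in the examples.
  case "$input" in
    https://www.youtube.com/watch\?v=*|https://youtu.be/*) return 0 ;;
  esac
  # Take the text after the last dot and lowercase it,
  # so "TALK.MP3" is treated like "talk.mp3".
  local ext="${input##*.}"
  ext="$(printf '%s' "$ext" | tr '[:upper:]' '[:lower:]')"
  case "$ext" in
    mp3|wav|m4a|ogg|flac) return 0 ;;
    *) return 1 ;;
  esac
}
```

A caller could run `is_supported_audio "$path" || echo "unsupported input" >&2` before invoking the skill, so that an unsupported file is rejected locally instead of after an API call.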
bash ${CLAUDE_PLUGIN_ROOT}/scripts/gemini.sh --file=AUDIO_PATH "YOUR QUESTION ABOUT THE AUDIO"
Arguments:
--file - Required: Local audio file path or YouTube URL
--model - Optional: Model to use (defaults to gemini-3-flash-preview)
Examples:
# Transcribe a local audio file
npx -y superconductor-gemini-skills --file=recording.mp3 "Transcribe this audio"
npx -y superconductor-gemini-skills --file=meeting.wav "Summarize the key points discussed"
# Analyze a podcast or YouTube audio
npx -y superconductor-gemini-skills --file="https://www.youtube.com/watch?v=dQw4w9WgXcQ" "Transcribe this audio"
npx -y superconductor-gemini-skills --file="https://youtu.be/dQw4w9WgXcQ" "What topics are discussed?"
# Extract specific information
npx -y superconductor-gemini-skills --file=interview.m4a "List all the questions asked by the interviewer"
npx -y superconductor-gemini-skills --file=lecture.ogg "Create a bullet-point summary of the main concepts"
The GEMINI_API_KEY environment variable must be set. Get your key at: https://ai.google.dev/gemini-api/docs/api-key
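Since the key is a hard requirement, a wrapper script may want to fail early with a readable message. This is a small sketch under that assumption; `require_gemini_key` is a hypothetical function name and is not provided by the skill.

```shell
# Hypothetical guard (not part of the skill): returns non-zero and prints
# a hint on stderr when GEMINI_API_KEY is unset or empty.
require_gemini_key() {
  if [ -z "${GEMINI_API_KEY:-}" ]; then
    echo "GEMINI_API_KEY is not set; see https://ai.google.dev/gemini-api/docs/api-key" >&2
    return 1
  fi
}
```

It can be chained in front of any of the example commands, for instance `require_gemini_key && npx -y superconductor-gemini-skills --file=recording.mp3 "Transcribe this audio"`.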
| Model ID | Context Window (Input / Output) | Pricing per 1M tokens, USD (Input / Output) |
|---|---|---|
| gemini-3-pro-preview | 1M / 64k | $2 / $12 (<200k), $4 / $18 (>200k) |
| gemini-3-flash-preview | 1M / 64k | $0.50 / $3 |
| gemini-2.5-pro | 1M / 65k | $1.25 / $10 (<200k), $2.50 / $15 (>200k) |
| gemini-2.5-flash | 1M / 65k | $0.30 / $2.50 |
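To get a rough feel for how audio length maps onto token counts and cost, the sketch below assumes the figure of 32 tokens per second of audio given in the Gemini audio documentation, and assumes prices are quoted per 1M tokens. Both function names are hypothetical. Audio input may be billed at a different rate than the one listed in the table, so the price is passed in as a parameter instead of being hard-coded.

```shell
# Assumption: each second of audio counts as 32 input tokens
# (figure taken from the Gemini audio docs; verify before relying on it).
TOKENS_PER_SECOND=32

# Print the estimated input-token count for a duration given in seconds.
estimate_audio_tokens() {
  echo $(( $1 * TOKENS_PER_SECOND ))
}

# Print the estimated input cost in USD.
# $1 = duration in seconds, $2 = price in USD per 1M input tokens.
estimate_audio_cost() {
  awk -v s="$1" -v r="$TOKENS_PER_SECOND" -v p="$2" \
    'BEGIN { printf "%.4f\n", s * r * p / 1000000 }'
}
```

For example, `estimate_audio_tokens 3600` prints `115200` for a one-hour recording, and `estimate_audio_cost 3600 0.50` prints `0.0576`. Output tokens for the transcript or summary are billed separately and are not included in this estimate.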
npx claudepluginhub superconductor/superconductor-plugin-marketplace --plugin audio-understanding