From text-to-speech
Convert text to natural-sounding speech using Google Gemini TTS models. Supports 30 different voices and 24 languages. Use this skill when you need to generate audio narration, voiceovers, or spoken content from text.
Slash command
/text-to-speech:text-to-speech
Convert text to natural-sounding speech using Google Gemini's TTS models. Supports 30 voices and 24 languages.
Reference: https://ai.google.dev/gemini-api/docs/speech-generation
bash ${CLAUDE_PLUGIN_ROOT}/scripts/gemini.sh --model=gemini-2.5-flash-preview-tts "TEXT TO SPEAK"
Arguments:
--model - Required: TTS model to use (see the model table below)
--voice - Optional: Voice name (default: Kore)
Examples:
# Generate speech with default voice
npx -y superconductor-gemini-skills --model=gemini-2.5-flash-preview-tts "Hello, welcome to our application."
# Use a specific voice
npx -y superconductor-gemini-skills --model=gemini-2.5-flash-preview-tts --voice=Puck "The quick brown fox jumps over the lazy dog."
# Generate longer narration
npx -y superconductor-gemini-skills --model=gemini-2.5-flash-preview-tts --voice=Charon "In today's tutorial, we'll explore the fundamentals of machine learning."
# Use higher quality model for professional content
npx -y superconductor-gemini-skills --model=gemini-2.5-pro-preview-tts --voice=Kore "This is a premium quality voice synthesis."
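To narrate several lines in one go, the single-line invocations above could be wrapped in a loop. The following is a minimal sketch, not part of the skill: `speak_lines` is a hypothetical helper name, and `TTS_DRY_RUN` is a switch invented for this sketch (it is not a flag of the CLI) that prints the commands instead of running them.

```shell
# speak_lines FILE [VOICE]
# Hypothetical helper: runs one TTS invocation per non-empty line of FILE.
# TTS_DRY_RUN is an invented switch for this sketch (not a CLI flag):
# when set to 1 the commands are printed instead of executed.
speak_lines() {
  file="$1"
  voice="${2:-Kore}"
  # The "|| [ -n "$line" ]" part keeps a final line that lacks a newline.
  while IFS= read -r line || [ -n "$line" ]; do
    [ -n "$line" ] || continue   # skip blank lines
    if [ "${TTS_DRY_RUN:-0}" = "1" ]; then
      printf 'npx -y superconductor-gemini-skills --model=gemini-2.5-flash-preview-tts --voice=%s "%s"\n' "$voice" "$line"
    else
      # </dev/null stops the command from consuming the loop's input file.
      npx -y superconductor-gemini-skills \
        --model=gemini-2.5-flash-preview-tts --voice="$voice" "$line" </dev/null
    fi
  done < "$file"
}
```

Running it once with `TTS_DRY_RUN=1` is a cheap way to review the commands before spending API tokens.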
| Voice Name | Description |
|---|---|
| Kore | Default voice, clear and professional |
| Puck | Friendly and warm |
| Charon | Deep and authoritative |
| Fenrir | Energetic and dynamic |
| Leda | Soft and gentle |
| Orus | Neutral and balanced |
| Zephyr | Light and airy |
| Aoede | Melodic and expressive |
Additional voices: Callirrhoe, Autonoe, Enceladus, Iapetus, Umbriel, Algieba, Despina, Erinome, Algenib, Rasalgethi, Laomedeia, Achernar, Alnilam, Schedar, Gacrux, Pulcherrima, Achird, Zubenelgenubi, Vindemiatrix, Sadachbia, Sadaltager, and Sulafat.
English, Spanish, French, German, Italian, Portuguese, Dutch, Russian, Japanese, Korean, Chinese (Simplified/Traditional), Arabic, Hindi, Turkish, Polish, Vietnamese, Thai, Indonesian, and more.
Languages are automatically detected from the input text.
Generated audio is saved to the current directory as gemini-speech-{timestamp}.wav.
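Since the file name contains a timestamp, a script that wants to play or move the result has to find it first. One possible approach, sketched below, picks the most recently modified match. `latest_speech_file` is a hypothetical helper, and the sketch relies on file modification time rather than on the timestamp format, which is not specified here.

```shell
# latest_speech_file [DIR]
# Hypothetical helper, not part of the skill: prints the most recently
# modified gemini-speech-*.wav in DIR (default: current directory),
# or returns 1 if there is none.
latest_speech_file() {
  dir="${1:-.}"
  # ls -t sorts by modification time, newest first; errors are silenced
  # so a directory without matches simply yields no output.
  newest=$(ls -t "$dir"/gemini-speech-*.wav 2>/dev/null | head -n 1)
  [ -n "$newest" ] || return 1
  printf '%s\n' "$newest"
}
```

For example, `mv "$(latest_speech_file)" narration.wav` would give the newest output a stable name.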
The GEMINI_API_KEY environment variable must be set. Get your key at: https://ai.google.dev/gemini-api/docs/api-key
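A wrapper script could check for the key up front so that a missing variable produces a clear message instead of a failed API call. This is a minimal sketch; `require_gemini_key` is a hypothetical helper name and not something the skill provides.

```shell
# require_gemini_key
# Hypothetical helper, not part of the skill: returns 0 when
# GEMINI_API_KEY is set and non-empty, otherwise prints a hint to
# stderr and returns 1.
require_gemini_key() {
  if [ -z "${GEMINI_API_KEY:-}" ]; then
    echo "GEMINI_API_KEY is not set; see https://ai.google.dev/gemini-api/docs/api-key" >&2
    return 1
  fi
}
```

A script would typically call it as `require_gemini_key || exit 1` before the first generation command.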
| Model ID | Context Window (Input / Output tokens) | Pricing (Input / Output) |
|---|---|---|
| gemini-2.5-flash-preview-tts | 8k / 16k | $0.50 / $10 per 1M tokens |
| gemini-2.5-pro-preview-tts | 8k / 16k | $1.00 / $20 per 1M tokens |
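The per-token prices in the table translate into a per-request cost with simple arithmetic: (input tokens × input price + output tokens × output price) / 1,000,000. The sketch below hard-codes the prices from the table. `estimate_tts_cost` is a hypothetical helper, and the token counts have to come from elsewhere (for example the API's usage metadata), which this sketch does not cover.

```shell
# estimate_tts_cost MODEL INPUT_TOKENS OUTPUT_TOKENS
# Hypothetical helper: prints the estimated cost in USD, using the
# per-1M-token prices listed in the table above.
estimate_tts_cost() {
  case "$1" in
    gemini-2.5-flash-preview-tts) in_price=0.50; out_price=10 ;;
    gemini-2.5-pro-preview-tts)   in_price=1.00; out_price=20 ;;
    *) echo "unknown model: $1" >&2; return 1 ;;
  esac
  awk -v i="$2" -v o="$3" -v ip="$in_price" -v op="$out_price" \
    'BEGIN { printf "%.4f\n", (i * ip + o * op) / 1000000 }'
}
```

With 200 input tokens and 5,000 output tokens, `estimate_tts_cost gemini-2.5-pro-preview-tts 200 5000` prints `0.1002`, while the same request on the flash model comes to `0.0501`.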
npx claudepluginhub superconductor/superconductor-plugin-marketplace --plugin text-to-speech