From summer
Generates TTS voice lines for NPCs, narrator, and dialogue with voice-id discovery, character-to-voice matching, and stability/style/similarity guidance.
Slash command: `/summer:voice-line`
`.summer/**`, `audio/voice/**`, `scripts/**`, `**/*.tscn`
The hardest part of TTS isn't the line — it's picking the right voice. ElevenLabs ships hundreds; pick wrong and a tough warlord NPC sounds like a podcast host. This skill walks the voice-id discovery flow (`summer_list_models` family `audio-voice`), narrows by gender / accent / age / energy, presents 4–5 finalists with sample URLs for the user to audition, and locks the pick before generating.
Then it generates one line (or a multi-turn dialogue) with the right stability / style / similarity / speed for the delivery — flat for narration, expressive for a bark, fast for an excited shout.
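- Multi-turn dialogue: use the `text_to_dialogue` capability.
- Non-verbal vocal sounds (grunt, scream): use `audio/sound-effect` with a prompt such as `male grunt of pain, mid-thirties, sharp, 400ms`.
- Audio that already exists: `summer_import_from_url` and skip TTS.
Read .summer/audio-bible.md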
Read .summer/memory/casting/voices.md # preferred cast memory, if present
Read .summer/voice-cast.md # legacy cast memory, if present
Read .summer/characters.md # if present
Glob .summer/characters/*.md
Glob .summer/memory/characters/*.md
If a character bible exists, the voice should match the character's age / gender / regional origin / energy. If it doesn't, ask:
Tell me the character: gender, rough age, accent or regional flavor, and energy (calm / measured / excited / gruff). Or just give me a reference — "sounds like the captain in Mass Effect" works.
summer_list_models(family="audio-voice")
Returns a list with id, name, accent, gender, age, description, and previewUrl. Filter mentally to 8–12 candidates by the character's hard constraints (gender + age + accent), then pick 4–5 finalists with maximally distinct character (gruff vs warm vs neutral vs theatrical).
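The narrowing step above can be sketched in code. This is a minimal illustration, not the skill's actual logic: it assumes each entry is a dict with the fields listed above, that `gender`, `age`, and `accent` are plain strings, and it uses a simple word-overlap heuristic on `description` (an assumption of this sketch) to keep the finalists distinct.

```python
def matches(voice, gender, ages, accents):
    """True if a voice meets every hard constraint (case-insensitive)."""
    return (
        voice["gender"].lower() == gender
        and voice["age"].lower() in ages
        and voice["accent"].lower() in accents
    )


def words(voice):
    """Lower-cased word set of the voice description."""
    return set(voice["description"].lower().replace(",", " ").split())


def pick_finalists(voices, gender, ages, accents, shortlist_size=12, finalists=4):
    """Filter by hard constraints, then greedily pick mutually distinct voices."""
    shortlist = [v for v in voices if matches(v, gender, ages, accents)]
    shortlist = shortlist[:shortlist_size]
    chosen = []
    while shortlist and len(chosen) < finalists:
        # Prefer the candidate sharing the fewest description words
        # with any voice that has already been chosen.
        def overlap(v):
            return max((len(words(v) & words(c)) for c in chosen), default=0)

        best = min(shortlist, key=overlap)
        chosen.append(best)
        shortlist.remove(best)
    return chosen
```

The result is only a starting shortlist. The user still auditions the `previewUrl` samples and makes the final call.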
Present them to the user like this:
Four finalists for "old gruff dwarven warlord":
- Brogan: male, 50s, rough Scottish, gravelly. Sample: <previewUrl>
- Hodge: male, 60s, neutral, weathered baritone. Sample: <previewUrl>
- Marrick: male, 40s, rough English, mid-energy. Sample: <previewUrl>
- Old Drum: male, 70s, deep, slow speech. Sample: <previewUrl>
Audition and pick one. Or ask for four more.
Wait for the user. Do not pick for them — voice is the highest-stakes audio decision in the project.
If the user says "you pick", lean on this tree:
| Character archetype | Look for |
|---|---|
| Old gruff warlord | male, 50–70, rough or weathered, low pitch, slow speech, regional accent for color |
| Wise mentor | male or female, 50+, neutral or warm, measured cadence, mid-low pitch |
| Excited young hero | 18–28, high energy, mid-high pitch, your own region's dominant accent |
| Cold villain | 30–50, neutral or theatrical, low-mid pitch, slow with sharp consonants |
| Mischievous trickster | varies age, theatrical, light pitch, fast cadence |
| Calm scholarly NPC | 30–50, neutral, mid pitch, even cadence, no regional color |
| Scared child | child voice if available; otherwise high-pitched young adult with high stability |
| Robotic / AI | flat / synthesized voice; or any voice with stability=1.0 and style=0.0 for monotone |
| Narrator (cinematic) | mid-deep male or female, neutral or theatrical, even cadence, very stable |
| Narrator (intimate) | warmer, mid pitch, lower stability for breathy nuance |
After the pick:
Locked voice: `Hodge` (id `21m00Tcm4TlvDq8ikWAM`). I'll use this for all `<character>` lines unless you say otherwise.
Ask before writing:
May I update `.summer/memory/casting/voices.md` with this locked voice assignment?
Save it in .summer/memory/casting/voices.md:
---
id: casting.voice.main-cast
type: casting
status: active
priority: locked
stable: true
providers:
- elevenlabs
---
# Voice Cast
| Character | Voice name | Voice id | Notes |
|---|---|---|---|
| Captain Hodge | Hodge | 21m00Tcm4TlvDq8ikWAM | Calm, weathered, low-mid |
| Bram the Smith | Brogan | <id> | Gruff Scottish, mid energy |
This is the cast bible. Every future voice line uses these ids — consistency is the point.
If legacy .summer/voice-cast.md already exists, read it first and either keep using it for this project or ask to migrate it into .summer/memory/casting/voices.md. Do not maintain two conflicting cast files.
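Looking up a locked voice id from the cast table can be sketched as follows. This is a minimal illustration that assumes the file keeps the column names shown in the example table (`Character`, `Voice id`). The `parse_cast` helper is hypothetical and not part of the Summer tooling.

```python
def parse_cast(markdown):
    """Return {character: voice_id} from a cast table in the given markdown text."""
    cast = {}
    header = None
    for line in markdown.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skips front matter, headings, and prose
        cells = [c.strip() for c in line.strip("|").split("|")]
        if header is None:
            header = [c.lower() for c in cells]  # first table row is the header
            continue
        if all(set(c) <= set("-: ") for c in cells):
            continue  # separator row such as |---|---|
        row = dict(zip(header, cells))
        cast[row["character"]] = row["voice id"]
    return cast
```

A caller would read the file, call `parse_cast`, and reuse the stored id instead of running discovery again for a character that is already cast.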
ElevenLabs voice options:
| Param | Range | Effect |
|---|---|---|
| `stability` | 0.0–1.0 | Higher = monotone, consistent. Lower = expressive, variable. |
| `similarity_boost` | 0.0–1.0 | Higher = closer to the original sample. Lower = more freedom (can drift). |
| `style` | 0.0–1.0 | Higher = more theatrical / exaggerated. 0 = neutral. |
| `speed` | 0.7–1.2 | Speech speed multiplier. |
Defaults that work as a starting point:
| Delivery | stability | similarity_boost | style | speed |
|---|---|---|---|---|
| Calm narration | 0.75 | 0.75 | 0.0 | 1.0 |
| Cinematic narration | 0.5 | 0.85 | 0.4 | 0.95 |
| NPC bark (combat shout) | 0.3 | 0.75 | 0.6 | 1.1 |
| NPC bark (calm idle) | 0.6 | 0.75 | 0.2 | 1.0 |
| Excited youth | 0.35 | 0.7 | 0.55 | 1.1 |
| Cold villain | 0.65 | 0.85 | 0.3 | 0.92 |
| Robotic | 1.0 | 0.85 | 0.0 | 1.0 |
| Scared / breathy | 0.25 | 0.7 | 0.5 | 1.05 |
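The defaults table can be kept as a small lookup so a delivery label resolves to concrete parameters. This is a sketch only: the preset keys and the `settings_for` helper are hypothetical names, the values are copied from the table above, and overrides are clamped to the ranges in the parameter table.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class VoiceSettings:
    stability: float
    similarity_boost: float
    style: float
    speed: float


# Values copied from the starting-point table; keys are illustrative labels.
PRESETS = {
    "calm narration": VoiceSettings(0.75, 0.75, 0.0, 1.0),
    "cinematic narration": VoiceSettings(0.5, 0.85, 0.4, 0.95),
    "combat shout": VoiceSettings(0.3, 0.75, 0.6, 1.1),
    "calm idle": VoiceSettings(0.6, 0.75, 0.2, 1.0),
    "excited youth": VoiceSettings(0.35, 0.7, 0.55, 1.1),
    "cold villain": VoiceSettings(0.65, 0.85, 0.3, 0.92),
    "robotic": VoiceSettings(1.0, 0.85, 0.0, 1.0),
    "scared": VoiceSettings(0.25, 0.7, 0.5, 1.05),
}


def settings_for(delivery, **overrides):
    """Return the preset for a delivery, with optional per-line overrides."""
    params = asdict(PRESETS[delivery.lower()])
    params.update(overrides)
    # Clamp to the documented ranges: 0.0-1.0, and 0.7-1.2 for speed.
    for key in ("stability", "similarity_boost", "style"):
        params[key] = min(1.0, max(0.0, params[key]))
    params["speed"] = min(1.2, max(0.7, params["speed"]))
    return params
```

For example, `settings_for("combat shout", style=0.7)` starts from the bark preset and pushes the theatricality slightly higher for one line.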
Ask:
Delivery flavor? Calm idle / combat shout / cinematic / robotic / scared / cold? I'll set stability and style to match.
For a single line, the prompt is just the text. Punctuation and formatting matter:
- Capitals for emphasis: `You SHALL not pass.`
- Ellipses for hesitation: `I... I don't know what to say.`
- Stage directions are spoken aloud: `(angrily)` is read as the word "angrily". Use stability/style instead.
Example lines:
"Hold the line. They're coming through the south gate. Ready arrows."
"You shall not pass. Turn back."
"I... I don't know if I can do this."
"BREACH! BREACH! Fall back to the inner courtyard!"
summer_generate_voice(
text="Hold the line. They're coming through the south gate. Ready arrows.",
voiceId="21m00Tcm4TlvDq8ikWAM",
modelId="eleven_multilingual_v2",
stability=0.6,
similarity_boost=0.75,
style=0.2,
speed=1.0,
outputPath="audio/voice/hodge_hold_the_line.mp3"
)
`modelId` defaults to `eleven_multilingual_v2`, which gives the highest fidelity. For very short or latency-sensitive in-game barks, `eleven_turbo_v2_5` is faster.
Use summer_generate_audio with capability: "text_to_dialogue":
summer_generate_audio(
capability="text_to_dialogue",
speakers=[
{ name: "Hodge", voiceId: "21m00Tcm4TlvDq8ikWAM" },
{ name: "Brogan", voiceId: "<brogan-id>" }
],
script=[
{ speaker: "Hodge", text: "They're at the gate." },
{ speaker: "Brogan", text: "Aye. Let 'em come." },
{ speaker: "Hodge", text: "Steady. On my mark." }
],
outputPath="audio/voice/dialogue_gate_warning.mp3"
)
This produces a single MP3 with both voices, with dialogue-aware pacing the model handles internally. For very long scripts, split into chunks of ~6 turns and concatenate.
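Splitting a long script into chunks of about six turns can be sketched like this. The `chunk_script` helper is a hypothetical name, and the turn format mirrors the `script` entries shown above. Concatenating the resulting audio files is left to whatever audio tooling the project already uses.

```python
def chunk_script(script, turns_per_chunk=6):
    """Split a list of dialogue turns into consecutive chunks, preserving order."""
    if turns_per_chunk < 1:
        raise ValueError("turns_per_chunk must be at least 1")
    return [
        script[start:start + turns_per_chunk]
        for start in range(0, len(script), turns_per_chunk)
    ]


# Example: 14 turns become chunks of 6, 6, and 2 turns.
turns = [
    {"speaker": "Hodge" if i % 2 == 0 else "Brogan", "text": f"Line {i}"}
    for i in range(14)
]
chunks = chunk_script(turns)
```

Each chunk would then go into its own `text_to_dialogue` call with the same `speakers` list, so the voice ids stay consistent across chunks.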
AudioStreamPlayer on the Voice bus
summer_add_node(parentPath="/root/Game/NPC", type="AudioStreamPlayer", name="VoiceLine")
summer_set_prop(path="/root/Game/NPC/VoiceLine", property="stream", value="res://audio/voice/hodge_hold_the_line.mp3")
summer_set_prop(path="/root/Game/NPC/VoiceLine", property="bus", value="Voice")
summer_set_prop(path="/root/Game/NPC/VoiceLine", property="volume_db", value=0.0)
summer_set_prop(path="/root/Game/NPC/VoiceLine", property="autoplay", value=false)
For positional NPCs, use AudioStreamPlayer3D with max_distance=15.0 so distant NPCs aren't audible.
The bible's mix rule says voice ducks music: when this player starts, fade the music bus by -6 dB. Wire this in the bus setup or in code with AudioServer.set_bus_volume_db().
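To get a feel for what a -6 dB duck means, the decibel arithmetic can be worked through directly. This sketch assumes "fade by -6 dB" means lowering the music bus 6 dB below its current level. The helper names are illustrative and not engine API.

```python
def db_to_linear(db):
    """Convert a decibel gain to a linear amplitude factor."""
    return 10 ** (db / 20)


def ducked_volume_db(current_db, duck_db=-6.0):
    """Target bus volume while a voice line is playing."""
    return current_db + duck_db


music_db = 0.0
target_db = ducked_volume_db(music_db)   # -6.0
amplitude = db_to_linear(-6.0)           # roughly 0.5, i.e. about half amplitude
```

The target value is what would be passed to `AudioServer.set_bus_volume_db()` for the music bus, with the original level restored once the line finishes.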
After generation, append to .summer/memory/casting/voices.md so future skills find existing lines and don't regenerate. Include line text + filename + voice id.
If the assignment is marked priority: locked, do not replace the voice id automatically. If a scene, script, import config, or user prompt conflicts with the locked memory, stop and ask whether to update memory or fix the implementation.
Old gruff warlord: male, 50-70, rough/weathered, low pitch, slow
Wise mentor: male/female, 50+, warm, measured, mid-low pitch
Excited young hero: 18-28, high energy, mid-high pitch
Cold villain: 30-50, neutral/theatrical, low-mid, slow with sharp consonants
Mischievous trickster: theatrical, light, fast cadence
Calm scholar: 30-50, neutral, mid, even cadence
Scared / breathy: low stability (0.25), style=0.5
Robotic AI: stability=1.0, style=0.0
Narrator cinematic: mid-deep, neutral/theatrical, even, very stable
Narrator intimate: warmer mid pitch, lower stability for breath
- `(angrily) Get out!` is read literally as "angrily get out". Use stability/style.
- Multi-turn exchanges: use `text_to_dialogue` so pacing is dialogue-aware.
- Player left on the `Master` bus. Must be on `Voice` bus so it can duck music per the bible.
- Lines played in sequence: `AudioStreamPlayer.queue` or chained `finished` signals.
- Unusual names: respell phonetically, e.g. Caelthorne → KEL-thorn or KAYL-thorn. Or use SSML if the model supports it.
- Voice drifting on long lines: raise `similarity_boost` toward 0.9; or split.
- `eleven_multilingual_v2` supports 29 languages. Just write in the target language; the same voice id works.
- Do not push `speed` past 1.15 (artifacts).
- Repeated barks: generate variants ("They're at the gate!", "Enemy approaching!", "Look out!") and pick at random in code so the bark doesn't repeat.

Print the call (text, voice id, params, target path). User runs via the Summer dashboard, then `summer_import_from_url` the result.
Line `hodge_hold_the_line.mp3` wired to `NPC/VoiceLine` on the `Voice` bus. Cast bible updated. Next:
- Generate 3 bark variants of the same intent so the NPC doesn't sound like a recording.
- For the upcoming gate scene, run `/voice-line` with `text_to_dialogue` for the multi-turn exchange.
- Set up music ducking on the `Music` bus when `Voice` is active (one Tween or a `BusEffect Compressor` sidechained to `Voice`).
- `audio/audio-direction`: Voice bus and ducking rules
- `audio/sound-effect`: non-verbal vocal SFX (grunt, scream)
- `audio/adaptive-music`: ducking integration

npx claudepluginhub summerengine/summer-engine-agent --plugin summer