From voice-skill
Voice message handling — transcribe audio to text (OpenAI Whisper or Groq Whisper) and reply with voice (OpenAI TTS or Groq PlayAI TTS). MUST use this skill whenever: a Telegram voice message arrives (attachment_kind='voice'), a Live Web Chat voice message arrives, the user asks to reply with voice/audio, the user says 'voice mode', 'answer by voice', or anything about voice messages, dictation, or audio transcription.
Slash command
/voice-skill:voice
Handle voice messages from Telegram / Live Web Chat and generate voice replies. Backed by OpenAI by default; can be switched to Groq.
Requires OPENAI_API_KEY environment variable (or GROQ_API_KEY if using Groq).
Both scripts respect the VOICE_PROVIDER env var:
| Provider | Transcription model | TTS model | Best for |
|---|---|---|---|
| openai (default) | whisper-1 | tts-1 / tts-1-hd | Ukrainian voice replies (Nova/Onyx speak Ukrainian fluently); broad language coverage in Whisper. |
| groq | whisper-large-v3-turbo (faster + cheaper than OpenAI Whisper) | playai-tts (English-only voices) / playai-tts-arabic | Fast transcription; English/Arabic TTS only. Do NOT use Groq TTS for Ukrainian replies: the PlayAI voices do not speak it. |
Switch via env:
export VOICE_PROVIDER=groq
export GROQ_API_KEY=gsk_...
Optional overrides: VOICE_TRANSCRIBE_MODEL, VOICE_TTS_MODEL.
Default policy: leave VOICE_PROVIDER unset (OpenAI). Switch to groq only when the user explicitly asks for it, or for English/Arabic content where Groq's speed/cost wins.
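The provider and model selection described above can be sketched in Python. This is a minimal illustration, not the code inside transcribe.py or tts.py: the function name resolve_voice_config and the returned dictionary shape are hypothetical, and only the environment variable names and default model names come from the table above.

```python
import os

# Default models per provider, copied from the provider table above.
DEFAULTS = {
    "openai": {"transcribe": "whisper-1", "tts": "tts-1"},
    "groq": {"transcribe": "whisper-large-v3-turbo", "tts": "playai-tts"},
}


def resolve_voice_config(env=None):
    """Hypothetical helper: pick provider and models from env-style settings."""
    env = os.environ if env is None else env
    # An unset or empty VOICE_PROVIDER falls back to OpenAI (the default policy).
    provider = (env.get("VOICE_PROVIDER") or "openai").strip().lower()
    if provider not in DEFAULTS:
        raise ValueError(f"unknown VOICE_PROVIDER: {provider!r}")
    defaults = DEFAULTS[provider]
    return {
        "provider": provider,
        # Optional overrides win over the per-provider defaults.
        "transcribe_model": env.get("VOICE_TRANSCRIBE_MODEL") or defaults["transcribe"],
        "tts_model": env.get("VOICE_TTS_MODEL") or defaults["tts"],
    }
```

Passing a plain dictionary instead of os.environ makes the lookup easy to check without touching the real environment.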
This skill ships two Python scripts (transcribe.py, tts.py) in its scripts/ directory. Depending on how the skill was installed, they live at one of:
- Plugin install (refactor-ua/Voice-Skill marketplace): ~/.claude/plugins/cache/refactor-ua/voice-skill/<version>/skills/voice/scripts/
- Standalone skill install: ~/.claude/skills/voice/scripts/

Use whichever path exists on the current machine. A quick way to locate it:
SCRIPTS_DIR="$(dirname "$(find ~/.claude -path '*/voice/scripts/transcribe.py' 2>/dev/null | head -1)")"
In the examples below, <scripts> stands for that resolved directory.
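Where a shell is not convenient, the same lookup can be approximated in Python. This is a sketch under assumptions: find_scripts_dir is a hypothetical helper, and it returns the lexicographically first match rather than the first one in directory traversal order, which is what the find one-liner returns.

```python
from pathlib import Path


def find_scripts_dir(root=None):
    """Hypothetical helper: locate a .../voice/scripts/ directory holding transcribe.py."""
    root = Path.home() / ".claude" if root is None else Path(root)
    if not root.is_dir():
        return None
    # Sorting makes the choice deterministic when several installs exist.
    for candidate in sorted(root.rglob("transcribe.py")):
        scripts_dir = candidate.parent
        if scripts_dir.name == "scripts" and scripts_dir.parent.name == "voice":
            return scripts_dir
    return None
```

A None result means neither install location was found under the given root.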
The reply path depends on where the message came from. Check the <channel source="..."> tag on the incoming message:
| Source | Voice reply tool | Why |
|---|---|---|
| telegram | reply with files: [mp3] | Telegram natively renders MP3 as a voice message bubble. |
| live-web-chat | speak (server-side TTS) | The browser UI plays audio WebSocket frames automatically via enqueueAudio. Sending a file via reply only produces a 📄 download link that the user must click. |
Never use reply + files: [mp3] on Live Web Chat for voice output. It does not auto-play in the browser.
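The routing rule can be written down as a small decision function. This is an illustrative sketch only: choose_voice_reply and the returned dictionary are hypothetical and do not call any real tool. They simply encode which tool the table prescribes for each source.

```python
def choose_voice_reply(source, text, mp3_path=None):
    """Hypothetical helper: describe which tool to use for a voice reply."""
    if source == "live-web-chat":
        # Server-side TTS: the browser plays the audio automatically.
        return {"tool": "speak", "args": {"text": text, "voice": "nova"}}
    if source == "telegram":
        # Telegram needs a pre-generated MP3 attached to a reply.
        if mp3_path is None:
            raise ValueError("Telegram voice replies need an MP3 file path")
        return {"tool": "reply", "args": {"files": [mp3_path]}}
    raise ValueError(f"unsupported channel source: {source!r}")
```

Raising on an unknown source keeps a new channel from silently falling back to the wrong reply path.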
When you receive a message with attachment_kind: "voice" and attachment_file_id:
1. Call download_attachment with the file_id.
2. Run python <scripts>/transcribe.py "<downloaded_file_path>". It returns JSON: {"text": "...", "language": "uk", "duration": 5.2}

Live Web Chat transcribes incoming voice server-side and delivers it as text, so no client-side transcription is needed there.
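The JSON printed by transcribe.py can be consumed as in the sketch below. The field names text, language, and duration come from the example output above. The helper parse_transcription and its handling of empty text are assumptions made for illustration.

```python
import json


def parse_transcription(stdout):
    """Hypothetical helper: extract (text, language, duration) from transcribe.py output."""
    data = json.loads(stdout)
    text = (data.get("text") or "").strip()
    if not text:
        # Treat an empty transcript as a failure instead of replying to nothing.
        raise ValueError("transcription returned no text")
    return text, data.get("language"), data.get("duration")
```

The language field can then guide the language of the reply, and duration gives a rough sense of how long the reply should be.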
Compose your text response (keep it concise for listening). Write in feminine form (жіночий рід) — the voice persona is female (Nova). Use "я зробила", "я перевірила", "готова" etc.
On Live Web Chat, call mcp__plugin_live_web_chat_live_web_chat__speak directly with the text. The Live Web Chat server handles TTS and streams audio to the browser for immediate playback. Do NOT pre-generate MP3 with tts.py and do NOT attach it via reply.
speak(text="<your response>", voice="nova")
Notes:
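- Default voice: nova. Other options: alloy, echo, fable, onyx, shimmer.
- You can also send a reply with the same text, but the audio comes from speak.
- If the user has automatic voicing of reply turned on, plain reply already gets voiced and speak is redundant.

For Telegram, generate the MP3 with tts.py and send it via reply:
python <scripts>/tts.py "<your_text>" "nova"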
- Default voice: nova (OpenAI). Always use it unless the user explicitly requests another.
- Other OpenAI voices: alloy, ash, coral, echo, fable, onyx, sage, shimmer
- Groq voices (only when VOICE_PROVIDER=groq is exported, English/Arabic only): Arista-PlayAI (default), Atlas-PlayAI, Basil-PlayAI, Briggs-PlayAI, Calum-PlayAI, Celeste-PlayAI, Cheyenne-PlayAI, Chip-PlayAI, Cillian-PlayAI, Deedee-PlayAI, Fritz-PlayAI, Gail-PlayAI, Indigo-PlayAI, Mamaw-PlayAI, Mason-PlayAI, Mikail-PlayAI, Mitch-PlayAI, Quinn-PlayAI, Thunder-PlayAI
- Output: an MP3 file in the system temp directory (e.g. C:\Users\...\AppData\Local\Temp\tts_nova_*.mp3)
- Model: optional (tts-1 default for OpenAI, playai-tts default for Groq)
- Speed: optional (speed)
- Send the file with the reply tool with files: ["<output_path>"]

Generate the same phrase with all 9 OpenAI voices for the user to compare:
for voice in alloy ash coral echo fable nova onyx sage shimmer; do
python <scripts>/tts.py "<text>" "$voice"
done
Then send all files so the user can pick their favorite.
- To hint the language, pass a code such as uk as the 2nd arg to transcribe.py
- Telegram voice files are .oga (Opus in OGG); Whisper handles this natively
- If OPENAI_API_KEY is not set, scripts fail with a clear error
npx claudepluginhub refactor-ua/voice-skill --plugin voice-skill