Skill

voice

Voice message handling — transcribe audio to text (OpenAI Whisper or Groq Whisper) and reply with voice (OpenAI TTS or Groq PlayAI TTS). MUST use this skill whenever: a Telegram voice message arrives (attachment_kind='voice'), a Live Web Chat voice message arrives, the user asks to reply with voice/audio, the user says 'voice mode', 'answer by voice', or anything about voice messages, dictation, or audio transcription.

Invocation

How this skill is triggered — by the user, by Claude, or both

Slash command

/voice-skill:voice

User invocable

Model invocable

Inline context

Default effort

Context Preview

The summary Claude sees in its skill listing — used to decide when to auto-load this skill

Handle voice messages from Telegram / Live Web Chat and generate voice replies. Backed by OpenAI by default; can be switched to Groq.

Supporting Files

scripts/transcribe.py, scripts/tts.py

SKILL.md

126 lines · ~1.7k tokens

Stats

Language: Python

Stars: 0

Maintenance: Good

Last Commit: May 11, 2026

Voice Messages: Transcribe & Speak

Handle voice messages from Telegram / Live Web Chat and generate voice replies. Backed by OpenAI by default; can be switched to Groq.

Requires the OPENAI_API_KEY environment variable (or GROQ_API_KEY when using Groq).

Provider selection

Both scripts respect the VOICE_PROVIDER env var:

| Provider | Transcription model | TTS model | Best for |
| --- | --- | --- | --- |
| `openai` (default) | `whisper-1` | `tts-1` / `tts-1-hd` | Ukrainian voice replies (Nova/Onyx speak Ukrainian fluently); broad language coverage in Whisper. |
| `groq` | `whisper-large-v3-turbo` (faster and cheaper than OpenAI Whisper) | `playai-tts` (English-only voices) / `playai-tts-arabic` | Fast transcription; English/Arabic TTS only. Do NOT use Groq TTS for Ukrainian replies: the PlayAI voices do not speak it. |

Switch via env:

export VOICE_PROVIDER=groq
export GROQ_API_KEY=gsk_...

Optional overrides: VOICE_TRANSCRIBE_MODEL, VOICE_TTS_MODEL.

Default policy: leave VOICE_PROVIDER unset (OpenAI). Switch to groq only when the user explicitly asks for it, or for English/Arabic content where Groq's speed/cost wins.
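
The selection rules above can be sketched in Python. This is only an illustration of the documented environment variables and defaults, not the code inside transcribe.py or tts.py; the function name `resolve_voice_config`, the case-insensitive matching, and the error on unknown providers are assumptions.

```python
import os

# Defaults copied from the provider table above.
DEFAULTS = {
    "openai": {"transcribe": "whisper-1", "tts": "tts-1"},
    "groq": {"transcribe": "whisper-large-v3-turbo", "tts": "playai-tts"},
}


def resolve_voice_config(env=None):
    """Return (provider, transcribe_model, tts_model) from environment variables.

    Hypothetical helper: the real scripts may resolve these differently.
    """
    env = os.environ if env is None else env
    # An unset or empty VOICE_PROVIDER falls back to OpenAI (the default policy).
    provider = (env.get("VOICE_PROVIDER") or "openai").strip().lower()
    if provider not in DEFAULTS:
        # Assumption: unknown providers are rejected rather than silently ignored.
        raise ValueError(f"Unknown VOICE_PROVIDER: {provider!r}")
    defaults = DEFAULTS[provider]
    # Optional overrides win over the per-provider defaults.
    transcribe_model = env.get("VOICE_TRANSCRIBE_MODEL") or defaults["transcribe"]
    tts_model = env.get("VOICE_TTS_MODEL") or defaults["tts"]
    return provider, transcribe_model, tts_model
```

Passing a plain dict instead of `os.environ` makes the resolution easy to check without touching the real environment.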

Locating the scripts

This skill ships two Python scripts (transcribe.py, tts.py) in its scripts/ directory. Depending on how the skill was installed, they live at one of:

- Plugin install (via the refactor-ua/Voice-Skill marketplace): ~/.claude/plugins/cache/refactor-ua/voice-skill/<version>/skills/voice/scripts/
- User-scope (manual install): ~/.claude/skills/voice/scripts/

Use whichever path exists on the current machine. A quick way to locate it:

SCRIPTS_DIR="$(dirname "$(find ~/.claude -path '*/voice/scripts/transcribe.py' 2>/dev/null | head -1)")"

In the examples below, <scripts> stands for that resolved directory.
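
When the lookup has to happen from Python rather than a shell, something like the following may work. It is a rough equivalent of the `find` one-liner, not part of the skill; `find_scripts_dir` is a hypothetical name, and it returns the first match in sorted order, which may differ from the order `find | head -1` would produce.

```python
from pathlib import Path


def find_scripts_dir(root=None):
    """Return the first directory under root that holds voice/scripts/transcribe.py.

    Hypothetical helper. Returns None when nothing matches.
    """
    root = Path.home() / ".claude" if root is None else Path(root)
    if not root.is_dir():
        return None
    for candidate in sorted(root.rglob("transcribe.py")):
        scripts_dir = candidate.parent
        # Mirror the find pattern '*/voice/scripts/transcribe.py'.
        if scripts_dir.name == "scripts" and scripts_dir.parent.name == "voice":
            return scripts_dir
    return None
```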

Choosing the right channel

The reply path depends on where the message came from. Check the <channel source="..."> tag on the incoming message:

| Source | Voice reply tool | Why |
| --- | --- | --- |
| `telegram` | `reply` with `files: [mp3]` | Telegram natively renders MP3 as a voice message bubble. |
| `live-web-chat` | `speak` (server-side TTS) | The browser UI plays `audio` WebSocket frames automatically via `enqueueAudio`. Sending a file via `reply` only produces a 📄 download link that the user must click. |

Never use reply + files: [mp3] on Live Web Chat for voice output. It does not auto-play in the browser.
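
The routing rule can be written down as a small dispatch function. This is a sketch only: `voice_reply_plan` and the shape of the returned dict are hypothetical, and the `text` argument name for `reply` is an assumption (the source documents only `files`).

```python
def voice_reply_plan(source, text, mp3_path=None):
    """Describe which tool call to make for a voice reply on a given channel.

    Hypothetical helper: it returns a description of the call, it does not
    invoke any tool itself.
    """
    if source == "live-web-chat":
        # Server-side TTS; never attach an MP3 on this channel.
        return {"tool": "speak", "args": {"text": text, "voice": "nova"}}
    if source == "telegram":
        if mp3_path is None:
            raise ValueError("Telegram voice replies need an MP3 generated by tts.py")
        # Assumption: the reply tool takes the text under a 'text' argument.
        return {"tool": "reply", "args": {"text": text, "files": [mp3_path]}}
    raise ValueError(f"No voice reply path documented for channel {source!r}")
```

Raising on unknown sources keeps a new channel from silently falling back to the wrong reply path.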

When a voice message arrives (Telegram)

When you receive a message with attachment_kind: "voice" and attachment_file_id:

1. Download the audio file using download_attachment with the file_id.
2. Transcribe it by running:
   ```
   python <scripts>/transcribe.py "<downloaded_file_path>"
   ```
   This returns JSON: {"text": "...", "language": "uk", "duration": 5.2}
3. Treat the transcribed text as the user's message and process it directly as their request or command.

Live Web Chat transcribes incoming voice server-side and delivers it as text, so no client-side transcription is needed there.
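
For the Telegram path, a caller could wrap transcribe.py and read its JSON roughly as follows. The sketch assumes the script prints only the JSON object shown above to stdout and exits non-zero on failure; `transcribe` and `parse_transcription` are hypothetical names, not part of the skill.

```python
import json
import subprocess
import sys
from pathlib import Path


def parse_transcription(stdout):
    """Extract (text, language, duration) from the JSON printed by transcribe.py."""
    data = json.loads(stdout)
    text = (data.get("text") or "").strip()
    if not text:
        raise ValueError("transcription returned no text")
    # language and duration are treated as optional in this sketch.
    return text, data.get("language"), data.get("duration")


def transcribe(scripts_dir, audio_path):
    """Run transcribe.py on one audio file and parse its output.

    Assumption: stdout carries nothing but the JSON result.
    """
    result = subprocess.run(
        [sys.executable, str(Path(scripts_dir) / "transcribe.py"), audio_path],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if the script fails
    )
    return parse_transcription(result.stdout)
```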

Replying with voice

Compose your text response (keep it concise for listening). Write in the feminine grammatical form (жіночий рід), because the voice persona is female (Nova). In Ukrainian replies, use forms such as "я зробила" ("I did"), "я перевірила" ("I checked"), and "готова" ("ready").

Live Web Chat — use `speak`

Call mcp__plugin_live_web_chat_live_web_chat__speak directly with the text. The Live Web Chat server handles TTS and streams audio to the browser for immediate playback. Do NOT pre-generate MP3 with tts.py and do NOT attach it via reply.

speak(text="<your response>", voice="nova")

Notes:

- Default voice: nova. Other options: alloy, echo, fable, onyx, shimmer.
- If the user wants text alongside, also send a reply with the same text, but the audio comes from speak.
- First audio on a page load may be blocked by the browser's autoplay policy; the UI shows a "🔇 Натисни, щоб увімкнути озвучку" ("Tap to enable audio") banner, and the first user click unlocks playback for the session.
- The Live Web Chat "Speak replies" toggle (in Settings) automatically voices any plain reply. If the user has it on, a plain reply is already voiced and speak is redundant.

Telegram — generate MP3, attach via `reply`

1. Generate audio (do NOT pass output_path; let tts.py save to the system temp directory automatically):
   ```
   python <scripts>/tts.py "<your_text>" "nova"
   ```
   - Default and primary voice: nova (OpenAI). Always use it unless the user explicitly requests another.
   - Other OpenAI voices if requested: alloy, ash, coral, echo, fable, onyx, sage, shimmer
   - Groq voices (only when VOICE_PROVIDER=groq is exported, English/Arabic only): Arista-PlayAI (default), Atlas-PlayAI, Basil-PlayAI, Briggs-PlayAI, Calum-PlayAI, Celeste-PlayAI, Cheyenne-PlayAI, Chip-PlayAI, Cillian-PlayAI, Deedee-PlayAI, Fritz-PlayAI, Gail-PlayAI, Indigo-PlayAI, Mamaw-PlayAI, Mason-PlayAI, Mikail-PlayAI, Mitch-PlayAI, Quinn-PlayAI, Thunder-PlayAI
   - Output: auto-saved to the system temp dir (e.g. C:\Users\...\AppData\Local\Temp\tts_nova_*.mp3)
   - NEVER save audio files inside the project directory; always use system temp
   - Optional 3rd arg: output_path (only use for non-project locations)
   - Optional 4th arg: model (tts-1 default for OpenAI, playai-tts default for Groq)
   - Optional 5th arg: speed (0.25–4.0, default 1.0; Groq PlayAI ignores speed)
2. Send the audio via the Telegram reply tool with files: ["<output_path>"].
3. Always include text alongside the audio for accessibility.
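
Because tts.py takes positional arguments, a later optional argument can only be given when every earlier one is given too. A possible way to assemble the command from Python is sketched below; `build_tts_command` is a hypothetical helper, it covers only the OpenAI voices listed above, and rejecting skipped arguments is an assumption (the source does not say how tts.py treats empty placeholders).

```python
import sys
from pathlib import Path

# OpenAI voices listed above; Groq voices are out of scope for this sketch.
OPENAI_VOICES = {"alloy", "ash", "coral", "echo", "fable", "nova", "onyx", "sage", "shimmer"}


def build_tts_command(scripts_dir, text, voice="nova", output_path=None, model=None, speed=None):
    """Build the argv list for tts.py in the documented positional order."""
    if voice not in OPENAI_VOICES:
        raise ValueError(f"Unknown OpenAI voice: {voice!r}")
    if speed is not None and not 0.25 <= speed <= 4.0:
        raise ValueError("speed must be between 0.25 and 4.0")
    cmd = [sys.executable, str(Path(scripts_dir) / "tts.py"), text, voice]
    extras = [output_path, model, None if speed is None else str(speed)]
    # Drop trailing optionals that were not supplied.
    while extras and extras[-1] is None:
        extras.pop()
    # Assumption: a gap in the positional list is an error, not an empty string.
    if any(value is None for value in extras):
        raise ValueError("positional arguments cannot be skipped; supply every earlier one")
    return cmd + extras
```

Calling it with just the text and voice yields the two-argument form recommended above, which leaves the output location to tts.py.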

Voice comparison

Generate the same phrase with all 9 OpenAI voices for the user to compare:

for voice in alloy ash coral echo fable nova onyx sage shimmer; do
  python <scripts>/tts.py "<text>" "$voice"
done

Then send all files so the user can pick their favorite.

Tips

- Whisper auto-detects language well. If accuracy is poor with Ukrainian, pass uk as the 2nd argument to transcribe.py.
- Keep TTS text under ~4000 chars. For longer responses, split into chunks.
- Telegram voice messages are .oga (Opus in OGG), which Whisper handles natively.
- If OPENAI_API_KEY is not set, the scripts fail with a clear error.
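
One way to do the chunking mentioned above is to break on sentence boundaries and hard-wrap only when a single sentence exceeds the limit. This is a minimal sketch, not something the skill ships; `split_for_tts` is a hypothetical name and the 4000-character default simply follows the tip.

```python
import re


def split_for_tts(text, limit=4000):
    """Split text into chunks of at most `limit` characters.

    Prefers sentence boundaries; a sentence longer than the limit is cut
    at the limit. Whitespace between sentences is normalised to one space.
    """
    sentences = re.split(r"(?<=[.!?…])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        # Hard-wrap any single sentence that cannot fit in one chunk.
        while len(sentence) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        if not sentence:
            continue
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= limit:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be passed to tts.py separately and the resulting files sent in order.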
