Sets up and tests text-to-speech and transcription backends (sag, OpenAI TTS, macOS say, Whisper). Run `/agent:voice` for status, setup, or test.
Slash command: `/agent:voice status|setup|test`
Set up and test the agent's voice backends. Voice is OPTIONAL — off by default. See `docs/voice.md` for the full reference including channel-plugin precedence.
| User says | Action |
|---|---|
| `/agent:voice` (no arg) or `/agent:voice status` | Call `voice_status` and print the card |
| `/agent:voice setup` | Guided setup (see the setup flow below) |
| `/agent:voice test` | Call `voice_speak({ text: "Hi, I'm <name>. This is a test.", ... })` in the user's language and report the output path |
The agent invokes `voice_speak` / `voice_transcribe` directly when it needs to produce or consume audio during regular conversation. This skill is only for setup and diagnostics.
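The `test` row comes down to a status check followed by a single `voice_speak` call. The TypeScript sketch below models that decision as plain functions. The `VoiceStatus` and `SpeakResult` shapes and the `planVoiceTest` / `reportVoiceTest` helpers are hypothetical; they are not the plugin's actual API or the real payloads returned by `voice_status` and `voice_speak`.

```typescript
// Hypothetical result shapes; the real voice_status / voice_speak payloads may differ.
interface VoiceStatus {
  enabled: boolean;          // mirrors voice.enabled in agent-config.json
  ttsBackend: string | null; // e.g. "sag", "openai", "say"; null if none is usable
}

interface SpeakResult {
  path: string;    // where the audio file was written
  backend: string; // which backend produced it
}

type TestStep =
  | { kind: "redirect-to-setup"; reason: string }
  | { kind: "speak"; args: { text: string } };

// Check first: only speak if voice is enabled AND a backend is available.
function planVoiceTest(status: VoiceStatus, greeting: string): TestStep {
  if (!status.enabled) {
    return { kind: "redirect-to-setup", reason: "voice.enabled is false" };
  }
  if (status.ttsBackend === null) {
    return { kind: "redirect-to-setup", reason: "no TTS backend available" };
  }
  return { kind: "speak", args: { text: greeting } };
}

// After speaking: tell the user where the audio went and which backend made it.
function reportVoiceTest(result: SpeakResult): string {
  return `Audio written to ${result.path} using backend ${result.backend}. ` +
    "Play it or attach it in a messaging channel.";
}

// Example: a greeting in the user's language (English here).
const step = planVoiceTest({ enabled: true, ttsBackend: "say" }, "Hi, I'm Ada. This is a test.");
console.log(step.kind); // prints: speak
```

Keeping the decision pure (no tool calls inside) makes both redirect cases easy to check without any audio backend installed.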
Setup flow (`/agent:voice setup`):

1. Call `voice_status({ format: "json" })` to see the current state.
2. Offer the text-to-speech backends that are missing:
   - sag (ElevenLabs): "Install it with `brew install steipete/tap/sag`. It's a small wrapper around ElevenLabs with good voice-prompting conventions."
   - OpenAI TTS: "Add `export OPENAI_API_KEY=...` to your shell rc (`~/.zshrc` or `~/.bashrc`). Then restart the agent."
   - macOS `say`: "Built in on macOS. Sounds robotic but needs zero setup. No action needed; it will be used as the fallback."
3. If sag is installed but `ELEVENLABS_API_KEY` is missing, instruct: "Get a key from https://elevenlabs.io. Add `export ELEVENLABS_API_KEY=sk_...` to your shell rc. Restart the agent."
4. For transcription, offer `whisper-cli` (`brew install whisper-cpp`; offline, free) or the OpenAI Whisper API (same `OPENAI_API_KEY`).
5. Enable voice: "Set `voice.enabled: true` in your config. Run `agent_config(action='set', key='voice.enabled', value='true')` or edit `agent-config.json` directly." (Or run `agent_config` for them after they confirm.)
6. If the sag skill is in an OpenClaw workspace (`~/.openclaw/workspace*/skills/sag/`), offer: "I see you have the sag skill in an OpenClaw workspace. Want me to install it into this agent? Run `/agent:skill install <that path>`."
7. Check channel-plugin precedence:
   - If `voice_status` reports `whatsapp.audioEnabled: true`: "Your WhatsApp plugin already transcribes voice notes locally. For inbound WhatsApp audio you don't need `voice_transcribe`. Setting this up is for WebChat uploads, iMessage audio, outbound voice notes, etc."
   - If it is `false`: "Your WhatsApp plugin doesn't transcribe by default. Either turn that on with `/whatsapp:configure audio` (local Whisper, free), or use our `voice_transcribe` per message."

Test flow (`/agent:voice test`):

1. Call `voice_status` to confirm voice is enabled AND a backend is available. If not, redirect to setup.
2. Call `voice_speak({ text: "<greeting in user's language>" })`.
3. Report the output path and backend, e.g. "Saved to `/tmp/...` using backend X. Play it or attach it in a messaging channel."
4. In a messaging channel, send the file through the channel's media mechanism (e.g. `MEDIA:/tmp/...` or a dedicated `send_media` tool).

Secrets:

Never put API keys in `agent-config.json`. They are SECRETS, and the config file may end up in a git repo. Use environment variables:
```bash
export ELEVENLABS_API_KEY=sk_...
export OPENAI_API_KEY=sk-...
```

Add them to `~/.zshrc`, `~/.bashrc`, or `~/.config/fish/config.fish` for persistence.

Non-secret settings (default backend, voice ID, output dir) go in `agent-config.json`.
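Taken together, the setup flow and the secrets rule suggest a "next missing step" check that reads API keys only from the environment and treats `agent-config.json` as non-secret. A minimal TypeScript sketch, assuming a hypothetical `SetupSnapshot` gathered beforehand; the field names and the order of checks are assumptions for illustration, not what `lib/voice.ts` actually does:

```typescript
// Hypothetical snapshot; field names are illustrative, not the plugin's real status schema.
interface SetupSnapshot {
  env: Record<string, string | undefined>; // API keys are read from the environment only
  hasSag: boolean;        // `sag` CLI found on PATH
  hasWhisperCli: boolean; // `whisper-cli` found on PATH (from whisper-cpp)
  isMacOS: boolean;       // macOS `say` is usable as the fallback
  voiceEnabled: boolean;  // voice.enabled from agent-config.json (non-secret)
}

// Returns the single next instruction to show the user, or null when nothing is missing.
// The check order is an assumption made for this sketch.
function nextSetupStep(s: SetupSnapshot): string | null {
  const hasElevenLabsKey = Boolean(s.env["ELEVENLABS_API_KEY"]);
  const hasOpenAIKey = Boolean(s.env["OPENAI_API_KEY"]);

  if (s.hasSag && !hasElevenLabsKey) {
    return "Get a key from https://elevenlabs.io, add `export ELEVENLABS_API_KEY=sk_...` to your shell rc, and restart the agent.";
  }
  // Any one usable TTS backend is enough; `say` covers macOS with zero setup.
  const hasTts = (s.hasSag && hasElevenLabsKey) || hasOpenAIKey || s.isMacOS;
  if (!hasTts) {
    return "Install sag (`brew install steipete/tap/sag`) or add `export OPENAI_API_KEY=...` to your shell rc, then restart the agent.";
  }
  // Transcription: local whisper-cli, or the OpenAI Whisper API with the same key.
  const hasTranscription = s.hasWhisperCli || hasOpenAIKey;
  if (!hasTranscription) {
    return "For transcription, install whisper-cli (`brew install whisper-cpp`) or set OPENAI_API_KEY.";
  }
  if (!s.voiceEnabled) {
    return "Set voice.enabled: true, e.g. agent_config(action='set', key='voice.enabled', value='true').";
  }
  return null;
}
```

Returning one instruction at a time matches the one-step-at-a-time guidance for `/agent:voice setup`, and the messages only tell the user what to install; nothing in the sketch runs `brew install` itself.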
Channel-plugin precedence:

A channel plugin that already transcribes audio (like the WhatsApp plugin with audio on) is authoritative for THAT channel. Do not call `voice_transcribe` on an audio file that arrived through such a plugin; you'd just be redoing work the plugin already did, and you'd end up with two different transcriptions.
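The precedence rule reduces to a per-channel check before calling `voice_transcribe`. A minimal sketch, assuming a hypothetical `ChannelAudioState` in which channel names such as `"whatsapp"` and `"webchat"` are illustrative strings rather than identifiers the plugin is known to use:

```typescript
// Hypothetical state: channels whose plugin already transcribes inbound audio,
// e.g. "whatsapp" when voice_status reports whatsapp.audioEnabled: true.
interface ChannelAudioState {
  pluginTranscribes: ReadonlySet<string>;
}

// The plugin is authoritative for its own channel: re-transcribing would duplicate
// its work and could yield a second, different transcript for the same message.
function shouldCallVoiceTranscribe(channel: string, state: ChannelAudioState): boolean {
  return !state.pluginTranscribes.has(channel);
}

// Example: WhatsApp plugin with audio on, plus an audio file uploaded through WebChat.
const waState: ChannelAudioState = { pluginTranscribes: new Set(["whatsapp"]) };
console.log(shouldCallVoiceTranscribe("whatsapp", waState)); // prints: false
console.log(shouldCallVoiceTranscribe("webchat", waState));  // prints: true
```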
`voice_transcribe` is for audio that did not arrive through such a plugin, for example WebChat uploads or iMessage audio.
Output format:

- `/agent:voice status` on the CLI: full card.
- `/agent:voice setup`: step by step, one instruction at a time. Don't dump all 7 steps; guide the user through what's missing for their situation.

Rules:

- Never put API keys in `agent-config.json`; they're env-only.
- Never run `brew install` yourself; always let the user do it.
- Never call `voice_transcribe` on audio from a channel plugin that has audio on.

References:

- `docs/voice.md`: full doc (backends, precedence, secrets, troubleshooting)
- `lib/voice.ts`: routing + backends
- `skills/skill-manager/SKILL.md`: for installing the sag skill from an OpenClaw path

Install: `npx claudepluginhub crisandrews/clawcode --plugin agent`