By Generovo
Three Claude Code skills for building voice agents with OpenAI's May 2026 Realtime API: conversational voicebots (gpt-realtime-2), live translation (gpt-realtime-translate), and streaming transcription (gpt-realtime-whisper). Includes runnable TypeScript and Python examples for SIP/Twilio, WhatsApp, Web/WebRTC, mobile, and meeting bots.
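The SIP/Twilio examples have to relay audio between Twilio Media Streams and a Realtime session. Below is a minimal sketch of that per-message translation. It makes two assumptions:

- The session's input and output audio format is set to G.711 μ-law (the 8 kHz encoding Twilio streams), so base64 payloads pass through without resampling.
- The server uses GA event names such as `response.output_audio.delta`.

Both points, and the helper names `twilio_to_realtime` / `realtime_to_twilio`, are illustrative. Check them against the skill's `references/` event catalog.

```python
import json
from typing import Optional


def twilio_to_realtime(raw: str) -> Optional[dict]:
    """Map one Twilio Media Streams message to a Realtime client event.

    Only 'media' messages carry caller audio; the base64 μ-law payload is
    forwarded unchanged (assumes a μ-law input format on the session)."""
    msg = json.loads(raw)
    if msg.get("event") == "media":
        return {"type": "input_audio_buffer.append", "audio": msg["media"]["payload"]}
    return None  # 'connected', 'start', 'mark', 'stop' need no forwarding here


def realtime_to_twilio(event: dict, stream_sid: str) -> Optional[dict]:
    """Map one Realtime server event to a Twilio Media Streams message."""
    kind = event.get("type")
    if kind == "response.output_audio.delta":
        # Assumed GA event name; 'delta' is base64 audio in the session's output format.
        return {"event": "media", "streamSid": stream_sid, "media": {"payload": event["delta"]}}
    if kind == "input_audio_buffer.speech_started":
        # Caller barged in: tell Twilio to drop audio it has buffered but not yet played.
        return {"event": "clear", "streamSid": stream_sid}
    return None
```

The `clear` message is how the bridge handles interruptions on the phone side. Without it, Twilio would keep playing the bot's already-queued audio after the caller starts talking.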
Use when building conversational voice agents (voicebots, callbots, voice assistants) that need reasoning and tool calling with OpenAI's GPT-Realtime-2 model, including telephony (SIP/Twilio), WhatsApp calls, web (WebRTC), mobile, and meeting bots. Covers session config, audio formats, parallel function calling, preambles, interruption handling, reasoning levels, and the WebSocket/WebRTC event protocol.
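Parallel function calling means one model response can contain several `function_call` items. Below is a minimal sketch of answering them. It follows the Realtime event shapes (`response.done`, `conversation.item.create` with a `function_call_output` item, then `response.create`), assuming V2 keeps them unchanged. The `tools` registry and the `check_slot` tool in the usage example are hypothetical.

```python
import json
from typing import Any, Callable


def handle_function_calls(response_done: dict,
                          tools: dict[str, Callable[..., Any]]) -> list[dict]:
    """Turn every function_call item of a response.done event into client events.

    Each call gets its own function_call_output item (matched by call_id), then a
    single response.create lets the model continue with all results available."""
    events = []
    for item in response_done.get("response", {}).get("output", []):
        if item.get("type") != "function_call":
            continue  # skip message/audio items
        args = json.loads(item.get("arguments") or "{}")
        fn = tools.get(item["name"])
        result = fn(**args) if fn else {"error": f"unknown tool {item['name']}"}
        events.append({
            "type": "conversation.item.create",
            "item": {
                "type": "function_call_output",
                "call_id": item["call_id"],
                "output": json.dumps(result),
            },
        })
    if events:
        events.append({"type": "response.create"})  # one follow-up response for all calls
    return events
```

For simplicity the tools run one after another here. With slow I/O-bound tools, running them concurrently (for example with `asyncio.gather`) before emitting the outputs keeps the caller from waiting on the sum of all latencies.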
Use when building streaming speech-to-text with OpenAI's GPT-Realtime-Whisper model: live captions, meeting transcription, voice notes, low-latency transcripts. Covers progressive deltas, latency tuning profiles (0.4s / 0.8-1.2s / 1.5-2s), and the streaming protocol. Do NOT use for conversational agents (see voice-agent-realtime) or translation (see voice-translate-live).
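Progressive deltas let captions appear word by word, then snap to a final transcript. Below is a minimal caption buffer. It assumes the Realtime transcription event names `conversation.item.input_audio_transcription.delta` (with `item_id`, `delta`) and `...completed` (with `transcript`), and that GPT-Realtime-Whisper emits the same shapes. That assumption is unverified, and `CaptionBuffer` is a hypothetical helper.

```python
class CaptionBuffer:
    """Accumulate streaming transcription deltas into per-utterance captions."""

    DELTA = "conversation.item.input_audio_transcription.delta"
    DONE = "conversation.item.input_audio_transcription.completed"

    def __init__(self) -> None:
        self.order: list[str] = []          # item_ids in arrival order
        self.partial: dict[str, str] = {}   # text built from deltas so far
        self.final: dict[str, str] = {}     # transcript from the completed event

    def feed(self, event: dict) -> None:
        kind = event.get("type")
        if kind not in (self.DELTA, self.DONE):
            return  # ignore unrelated server events
        item_id = event["item_id"]
        if item_id not in self.order:
            self.order.append(item_id)
        if kind == self.DELTA:
            # Progressive caption: append the new fragment.
            self.partial[item_id] = self.partial.get(item_id, "") + event.get("delta", "")
        else:
            # Final transcript replaces the partial text for that utterance.
            self.final[item_id] = event.get("transcript", "")
            self.partial.pop(item_id, None)

    def text(self) -> str:
        """Current caption line: final text where available, partial otherwise."""
        pieces = (self.final.get(i, self.partial.get(i, "")) for i in self.order)
        return " ".join(p for p in pieces if p)
```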
Use when building live voice translation features with OpenAI's GPT-Realtime-Translate model: interpreters, multilingual calls, conferences, live video translation. Covers the dedicated `/v1/realtime/translations` endpoint, multi-speaker session architecture, and language support (70+ input / 13 output languages). Do NOT use for conversational agents (see voice-agent-realtime).
Claude Code plugin by Generovo: three skills for building voicebots with the OpenAI Realtime API (May 7, 2026 release).
| Skill | OpenAI model | Use case | Price |
|---|---|---|---|
| voice-agent-realtime | gpt-realtime-2 | Voicebots, callbots, voice assistants with tool calling | $32 in / $64 out per 1M audio tokens |
| voice-translate-live | gpt-realtime-translate | Live voice translation (70+ input / 13 output languages) | $0.034/min |
| voice-transcribe-stream | gpt-realtime-whisper | Low-latency streaming transcription | $0.017/min |
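These prices translate directly into per-call estimates. Below is a minimal sketch that uses only the figures from the table. Audio tokens per minute are not given here, so `realtime_cost` takes measured token counts as input, and only the listed audio-token prices are modeled. The helper names are hypothetical.

```python
# Prices from the table above (USD). Re-check OpenAI's pricing page before relying on them.
REALTIME_AUDIO_IN_PER_M = 32.0    # gpt-realtime-2, per 1M input audio tokens
REALTIME_AUDIO_OUT_PER_M = 64.0   # gpt-realtime-2, per 1M output audio tokens
TRANSLATE_PER_MIN = 0.034         # gpt-realtime-translate
TRANSCRIBE_PER_MIN = 0.017        # gpt-realtime-whisper


def realtime_cost(audio_in_tokens: int, audio_out_tokens: int) -> float:
    """Audio cost of a gpt-realtime-2 session, billed per million audio tokens."""
    return (audio_in_tokens * REALTIME_AUDIO_IN_PER_M
            + audio_out_tokens * REALTIME_AUDIO_OUT_PER_M) / 1_000_000


def per_minute_cost(minutes: float, rate: float) -> float:
    """Cost of a per-minute model (translation or transcription)."""
    return minutes * rate
```

For example, a call that consumed 100,000 input and 50,000 output audio tokens costs `realtime_cost(100_000, 50_000)` = $6.40. An hour of live translation is 60 × $0.034 = $2.04.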
Each skill ships:

- SKILL.md: when to use, when not to, session lifecycle, connection patterns
- references/: event catalog, pricing, latency tuning
- examples/: TypeScript code (Next.js 16 + React 19 + WebRTC) + Python (FastAPI + Twilio / WhatsApp / Recall.ai)
- scripts/: latency and eval utilities, developed test-first (TDD)
- templates/ (voice-agent only): parameterized bilingual system prompt

```
# Add the Generovo marketplace (once)
/plugin marketplace add Generovo/claude-voice-skills
# Install the plugin
/plugin install claude-voice-skills@generovo-voice
```
All 3 skills are then available in every Claude Code session. No restart needed.

Type a request in Claude Code:

"Build a WhatsApp voicebot that takes reservations"

Claude should announce that it is loading voice-agent-realtime. If you ask for live translation or subtitles, it loads the other skills instead.
To update the plugin:

```
/plugin update claude-voice-skills
```
The code examples use these environment variables (put them in your voicebot project's .env, never in the plugin):
```bash
# OpenAI authentication
OPENAI_API_KEY=sk-proj-...
OPENAI_MODEL=gpt-realtime-2 # default
# Twilio (if you use the telephony bridges)
TWILIO_AUTH_TOKEN=...
PUBLIC_WS_URL=wss://<tunnel>/media
# WhatsApp Cloud API (if you use whatsapp-call.py)
WHATSAPP_ACCESS_TOKEN=...
WHATSAPP_PHONE_NUMBER_ID=...
WHATSAPP_VERIFY_TOKEN=...
WHATSAPP_APP_SECRET=...
# Recall.ai (if you use meeting-bot.py)
RECALL_API_KEY=...
```
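A missing token usually surfaces only mid-call. Below is a minimal sketch that checks the variables listed above at startup, before any connection is opened. The feature keys (`"twilio"`, `"whatsapp"`, `"recall"`) and both helpers are hypothetical groupings of the variables in the `.env` list. The `gpt-realtime-2` fallback mirrors the documented default.

```python
import os
from typing import Mapping

# Variables each integration needs, grouped as in the .env list above.
REQUIRED = {
    "openai": ["OPENAI_API_KEY"],
    "twilio": ["TWILIO_AUTH_TOKEN", "PUBLIC_WS_URL"],
    "whatsapp": ["WHATSAPP_ACCESS_TOKEN", "WHATSAPP_PHONE_NUMBER_ID",
                 "WHATSAPP_VERIFY_TOKEN", "WHATSAPP_APP_SECRET"],
    "recall": ["RECALL_API_KEY"],
}


def missing_env(features: list[str], env: Mapping[str, str] = os.environ) -> list[str]:
    """Names of unset or empty variables for OpenAI plus the requested integrations."""
    needed = ["openai", *features]
    return [name for f in needed for name in REQUIRED[f] if not env.get(name)]


def model_name(env: Mapping[str, str] = os.environ) -> str:
    """OPENAI_MODEL if set, else the documented default."""
    return env.get("OPENAI_MODEL") or "gpt-realtime-2"
```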
macOS only: if you run the Python scripts from a Claude Code session on a Mac, export:

```bash
export SSL_CERT_FILE=$(python3 -m certifi)
```

(Otherwise the SSL handshake fails with the system Python.)
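The same fix can be scoped to a single process instead of the shell. Below is a minimal sketch that trusts certifi's CA bundle when the package is installed. Otherwise it falls back to OpenSSL's default verify paths, which already honor an exported `SSL_CERT_FILE`. `make_ssl_context` is a hypothetical helper name.

```python
import ssl


def make_ssl_context() -> ssl.SSLContext:
    """TLS context for outgoing wss:// connections, preferring certifi's CA bundle."""
    try:
        import certifi  # third-party: pip install certifi
        cafile = certifi.where()  # same path that `python3 -m certifi` prints
    except ImportError:
        cafile = None  # OpenSSL defaults (these read SSL_CERT_FILE if it is set)
    return ssl.create_default_context(cafile=cafile)
```

The resulting context can be passed as the `ssl` argument of clients that accept one, such as `websockets.connect`.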
The 3 V2 models (announced May 7, 2026) are GA-only. Do NOT send the `OpenAI-Beta: realtime=v1` header: it routes to the beta API, which does not have V2. The plugin's examples have already been cleaned up. Details in docs/research/smoke-findings-2026-05-11.md.
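Below is a minimal sketch of GA connection parameters that leaves the beta header out, plus a guard for older code that still sets it. It assumes the standard `wss://api.openai.com/v1/realtime?model=...` WebSocket URL. For the translate skill, it assumes the `/v1/realtime/translations` endpoint accepts the same `model` query parameter, which is unverified. Both function names are hypothetical.

```python
from urllib.parse import urlencode

API_BASE = "wss://api.openai.com/v1"


def realtime_connection(api_key: str, model: str = "gpt-realtime-2",
                        endpoint: str = "/realtime") -> tuple[str, dict[str, str]]:
    """Return (url, headers) for a GA Realtime WebSocket connection."""
    url = f"{API_BASE}{endpoint}?{urlencode({'model': model})}"
    # Bearer token only: no "OpenAI-Beta: realtime=v1", which would route the
    # connection to the beta API where the V2 models are not available.
    headers = {"Authorization": f"Bearer {api_key}"}
    return url, headers


def check_ga_headers(headers: dict[str, str]) -> None:
    """Fail fast if legacy code still sets the beta header (case-insensitive)."""
    if any(name.lower() == "openai-beta" for name in headers):
        raise ValueError("OpenAI-Beta header routes to the beta Realtime API (no V2 models)")
```

For example, `realtime_connection(key, "gpt-realtime-translate", "/realtime/translations")` targets the translation endpoint under the assumption above.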
| You ask for... | Skill loaded |
|---|---|
| "a phone callbot that books appointments" | voice-agent-realtime |
| "a WhatsApp voice assistant with function calling" | voice-agent-realtime |
| "live FR↔AR translation for a sales call" | voice-translate-live |
| "an automatic interpreter for a live YouTube video" | voice-translate-live |
| "live subtitles for a Zoom meeting" | voice-transcribe-stream |
| "real-time transcription of call-center calls" | voice-transcribe-stream |
The skill descriptions are written to avoid overlap. Each SKILL.md has an explicit "When NOT to use" section that points to the appropriate neighboring skill.
To work on the plugin locally:

```bash
git clone https://github.com/Generovo/claude-voice-skills.git
cd claude-voice-skills
# Local validation
bash scripts/validate-skills.sh
# → ALL SKILLS VALID
# Unit tests
for s in voice-agent-realtime voice-translate-live voice-transcribe-stream; do
  (cd "skills/$s/scripts" && python3 -m pytest -q)
done
# → 28 passed total
```
Repository structure:
```
claude-voice-skills/
├── .claude-plugin/
│   ├── plugin.json          # Plugin manifest
│   └── marketplace.json     # generovo-voice marketplace
├── skills/                  # The 3 skills
│   ├── voice-agent-realtime/
│   ├── voice-translate-live/
│   └── voice-transcribe-stream/
├── scripts/
│   └── validate-skills.sh   # Validation harness
├── docs/
│   ├── plans/               # Design + implementation plan
│   └── research/            # OpenAI API snapshot 2026-05-11
└── README.md
```
```bash
npx claudepluginhub generovo/claude-voice-skills --plugin claude-voice-skills
```