From togetherai-skills
Text-to-speech and speech-to-text via Together AI: REST, streaming, realtime WebSocket TTS, transcription, translation, diarization, and live STT.
Slash command
/togetherai-skills:together-audio
Use Together AI audio APIs for:
For other tasks, use these skills instead:
- together-chat-completions for text-only generation
- together-video or together-images for visual generation workflows
- together-dedicated-endpoints only when the audio model itself must be hosted on dedicated infrastructure

- Requires the Together Python SDK (together>=2.0.0). If the user is on an older version, they must upgrade first: uv pip install --upgrade "together>=2.0.0".
- Use client.audio.speech.create() for TTS.
- Non-streaming TTS returns a BinaryAPIResponse; call response.write_to_file(path) to save it. Do NOT use stream_to_file (it does not exist on this object).
- Streaming TTS (stream=True) returns a Stream of AudioSpeechStreamChunk objects. Iterate the chunks, check chunk.type, and decode the audio data with base64.b64decode(chunk.delta). There is no file-writing helper on the stream object.
- Use client.audio.transcriptions.create() for transcription and client.audio.translations.create() for translation.
- Pass the audio via file=; for longer audio, split into ≤ 4 h chunks. See the Limits section of references/stt-models.md.

npx claudepluginhub togethercomputer/skills --plugin togetherai-skills
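The two TTS paths described above (saving a non-streaming response, and decoding a streamed one) might be sketched as follows. This is a minimal illustration, not the skill's own code. The helper names save_speech, collect_stream_audio, and stream_speech are hypothetical. The model, input, and voice keyword arguments are assumed to be what client.audio.speech.create() accepts. The chunk.type value that marks an audio chunk is not given in the notes, so it is passed in as delta_type and should be checked against the SDK.

```python
import base64
from typing import Any, Iterable


def save_speech(client: Any, text: str, path: str, *, model: str, voice: str) -> None:
    """Non-streaming TTS: request audio and write it to `path`."""
    # Keyword names (model, input, voice) are assumptions about the SDK signature.
    response = client.audio.speech.create(model=model, input=text, voice=voice)
    # write_to_file is the helper named in the notes; stream_to_file does not exist here.
    response.write_to_file(path)


def collect_stream_audio(chunks: Iterable[Any], delta_type: str) -> bytes:
    """Concatenate the base64-decoded audio carried by streamed chunks.

    `delta_type` is the chunk.type value that marks an audio chunk; its exact
    string is an assumption the caller must supply.
    """
    audio = bytearray()
    for chunk in chunks:
        # Skip non-audio chunks and chunks without a payload.
        if chunk.type == delta_type and getattr(chunk, "delta", None):
            audio.extend(base64.b64decode(chunk.delta))
    return bytes(audio)


def stream_speech(client: Any, text: str, path: str, *, model: str, voice: str,
                  delta_type: str) -> int:
    """Streaming TTS: decode every audio chunk and write the bytes manually."""
    stream = client.audio.speech.create(model=model, input=text, voice=voice, stream=True)
    audio = collect_stream_audio(stream, delta_type)
    # The stream object has no file-writing helper, so write the bytes directly.
    with open(path, "wb") as fh:
        fh.write(audio)
    return len(audio)
```

With a real client, usage would look like save_speech(Together(), "Hello", "hello.mp3", model=..., voice=...), where the model and voice values come from the skill's reference files. The helpers take the client as an argument so they can be exercised with a stub before any API call is made.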