From togetherai-skills
Handles text-to-speech (REST, streaming, and realtime WebSocket) and speech-to-text (transcription, translation, diarization, timestamps, and live STT) using Together AI APIs.
How this skill is triggered — by the user, by Claude, or both
Slash command
/togetherai-skills:together-audio
The summary Claude sees in its skill listing, used to decide when to auto-load this skill
Use Together AI audio APIs for text-to-speech and speech-to-text tasks.
Use together-chat-completions for text-only generation.
Use together-video or together-images for visual generation workflows.
Use together-dedicated-endpoints only when the audio model itself must be hosted on dedicated infrastructure.
Python usage requires the Together v2 SDK (together>=2.0.0). If the user is on an older version, they must upgrade first: uv pip install --upgrade "together>=2.0.0".
Use client.audio.speech.create() for TTS.
Non-streaming TTS returns a BinaryAPIResponse; call response.write_to_file(path) to save it. Do NOT use stream_to_file (it does not exist on this object).
Streaming TTS (stream=True) returns a Stream of AudioSpeechStreamChunk objects. Iterate over the chunks, check chunk.type, and decode the audio data with base64.b64decode(chunk.delta). There is no file-writing helper on the stream object.
Use client.audio.transcriptions.create() for transcription and client.audio.translations.create() for translation.
npx claudepluginhub zainhas/skills
Text-to-speech and speech-to-text via Together AI: REST, streaming, realtime WebSocket TTS, transcription, translation, diarization, and live STT.
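The TTS rules for this skill (write_to_file for non-streaming responses, base64-decoded deltas for streamed chunks) can be sketched roughly as below. This is a minimal illustration, not the skill's own code: the helper names save_speech and collect_streamed_audio are hypothetical, the model, input, and voice keyword arguments are assumed to be what client.audio.speech.create() accepts, and because the listing does not give the possible chunk.type values, the streaming helper keys on the presence of chunk.delta instead.

```python
import base64
from pathlib import Path


def save_speech(client, text, path, model, voice):
    """Non-streaming TTS: request audio, then save it with write_to_file.

    `client` is assumed to be a Together v2 SDK client; `model` and `voice`
    are caller-supplied values (no specific names are assumed here).
    """
    response = client.audio.speech.create(model=model, input=text, voice=voice)
    # The response is a BinaryAPIResponse; write_to_file is the save helper.
    response.write_to_file(path)
    return Path(path)


def collect_streamed_audio(chunks):
    """Streaming TTS: join the base64-encoded audio deltas from a stream.

    `chunks` is any iterable of objects with `type` and `delta` attributes,
    such as the Stream returned when stream=True is passed. The chunk.type
    values are not listed here, so this sketch skips chunks without a delta;
    filter on chunk.type instead once the values are known.
    """
    audio = bytearray()
    for chunk in chunks:
        delta = getattr(chunk, "delta", None)
        if delta:
            audio.extend(base64.b64decode(delta))
    return bytes(audio)
```

With a real client (from together import Together; client = Together()), save_speech(client, "Hello", "out.wav", model=..., voice=...) would write the audio file, and collect_streamed_audio would take the object returned by client.audio.speech.create(..., stream=True). Both helpers take the client or stream as an argument so they can be exercised with stand-in objects and no API key.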
Implements ElevenLabs TTS with voice settings, instant voice cloning from audio samples, and WebSocket streaming. For building voice generation features.
ElevenLabs Speech-to-Text transcription workflows with Scribe v1 supporting 99 languages, speaker diarization, and Vercel AI SDK integration. Use when implementing audio transcription, building STT features, integrating speech-to-text, setting up Vercel AI SDK with ElevenLabs, or when user mentions transcription, STT, Scribe v1, audio-to-text, speaker diarization, or multi-language transcription.