Skill

together-audio

Text-to-speech and speech-to-text via Together AI: REST, streaming, realtime WebSocket TTS, transcription, translation, diarization, and live STT.

ai-ml

api-development

Popularity

Stars

Forks

Invocation

How this skill is triggered — by the user, by Claude, or both

Slash command

/togetherai-skills:together-audio

User invocable

Model invocable

Inline context

Default effort

Context Preview

The summary Claude sees in its skill listing — used to decide when to auto-load this skill

Use Together AI audio APIs for:

Supporting Files

agents/openai.yamlreferences/stt-models.mdreferences/tts-models.mdscripts/stt_realtime.pyscripts/stt_transcribe.pyscripts/stt_transcribe.tsscripts/tts_generate.pyscripts/tts_generate.tsscripts/tts_websocket.py

SKILL.md

85 lines · ~1.2k tokens

Stats

LanguagePython

Stars32

Forks4

MaintenanceExcellent

Last CommitJun 10, 2026

Actions

View Source View Plugin View on GitHub View README

Together Audio

Overview

Use Together AI audio APIs for:

text-to-speech generation
streaming or realtime voice output
speech-to-text transcription
translation, diarization, and timestamps
live captioning and realtime transcription

When This Skill Wins

Generate spoken audio from text
Transcribe uploaded audio files or URLs
Add realtime voice or captioning to an app
Extract speaker segments or word timings

Hand Off To Another Skill

Use together-chat-completions for text-only generation
Use together-video or together-images for visual generation workflows
Use together-dedicated-endpoints only when the audio model itself must be hosted on dedicated infrastructure

Quick Routing

REST TTS or streaming TTS
- Read references/tts-models.md
- Start with scripts/tts_generate.py or scripts/tts_generate.ts
Realtime TTS over WebSocket
- Read references/tts-models.md
- Start with scripts/tts_websocket.py
File transcription, translation, diarization, or timestamps
- Read references/stt-models.md
- Start with scripts/stt_transcribe.py or scripts/stt_transcribe.ts
Realtime STT
- Read references/stt-models.md
- Start with scripts/stt_realtime.py

Workflow

Confirm whether the task is TTS or STT.
Choose REST, streaming, or realtime transport based on latency and interaction needs.
Pick the model and response format from the relevant reference file.
Start from the matching script instead of rebuilding the request contract from memory.
For Python STT uploads, open audio files in binary mode and pass the file handle rather than a bare path string.

High-Signal Rules

Python scripts require the Together v2 SDK (together>=2.0.0). If the user is on an older version, they must upgrade first: uv pip install --upgrade "together>=2.0.0".
Use client.audio.speech.create() for TTS.
REST TTS returns a BinaryAPIResponse; call response.write_to_file(path) to save it. Do NOT use stream_to_file (it does not exist on this object).
Streaming TTS (stream=True) returns a Stream of AudioSpeechStreamChunk objects. Iterate chunks, check chunk.type, and decode base64.b64decode(chunk.delta) for audio data. There is no file-writing helper on the stream object.
Use client.audio.transcriptions.create() for transcription and client.audio.translations.create() for translation.
Batch transcription and translation share hard limits: 500 MB direct upload, 1 GB URL-fetch, 4 hours of audio per request. For larger payloads, pass a public HTTPS URL on file=; for longer audio, split into ≤ 4 h chunks. See the Limits section of references/stt-models.md.
Realtime APIs require audio-format discipline; confirm PCM expectations before streaming bytes.
Diarization and word timestamps change response shape; code for the richer verbose output explicitly.

Resource Map

TTS reference: references/tts-models.md
STT reference: references/stt-models.md
Python TTS workflow: scripts/tts_generate.py
TypeScript TTS workflow: scripts/tts_generate.ts
Python realtime TTS workflow: scripts/tts_websocket.py
Python STT workflow: scripts/stt_transcribe.py
TypeScript STT workflow: scripts/stt_transcribe.ts
Python realtime STT workflow: scripts/stt_realtime.py

together-audio

Popularity

Invocation

Context Preview

Supporting Files

SKILL.md

together-audio

Popularity

Invocation

Context Preview

Supporting Files

SKILL.md

Together Audio

Overview

When This Skill Wins

Hand Off To Another Skill

Quick Routing

Workflow

High-Signal Rules

Resource Map

Official Docs

Similar Skills

Together Audio

Overview

When This Skill Wins

Hand Off To Another Skill

Quick Routing

Workflow

High-Signal Rules

Resource Map

Official Docs

Similar Skills