From Cartesia Skills
Integrate Cartesia speech APIs (TTS, STT, voices) in application code or coding-agent workflows. Use when the user asks about Cartesia REST/WebSocket APIs, SDKs, API keys, Sonic TTS, Ink STT, voice IDs, access tokens, or embedding voice in an app. For Cartesia Line deployed agents, CLI deploy, and telephony, use the line-voice-agent skill instead.
Slash command
/cartesia-skills:cartesia-api
Cartesia provides **text-to-speech (Sonic)**, **speech-to-text (Ink)**, **voices** (library, clone, localize), and related **HTTPS** and **WebSocket** APIs. This skill covers **application integration** and **agent-assisted coding**. For **Cartesia Line** (managed voice agents, `cartesia` CLI, telephony, Line SDK), use **[line-voice-agent](../line-voice-agent/SKILL.md)**.
Base URL: https://api.cartesia.ai. WebSockets use wss://
The API is versioned by date (Cartesia-Version: YYYY-MM-DD). Pin one version across your services and treat bumping it like a major dependency upgrade, because a new version can carry breaking changes. It is a fixed choice, not a runtime or per-call variable.
With an SDK: the SDK pins the Cartesia-Version it was built and tested against. Let it manage that; do not override it (e.g. via default_headers) with your own date, since the SDK may not work against an arbitrary version. To move to a newer API version, upgrade the SDK.
With raw HTTP/WebSocket calls: send Cartesia-Version yourself. Browser WebSockets can't set handshake headers, so pass it as ?cartesia_version=... (query wins if both are present).
GET https://api.cartesia.ai/ returns {"ok":true,"version":"..."} (the gateway's current default), useful when wiring a new client.
Authorization: Bearer <api_key> with your Cartesia API key (sk_car_...).
Authorization: Bearer <access_token>
?access_token=<token> (headers are not available on WS handshake)
With Cartesia-Version 2026-03-01 and newer, errors are structured JSON (error_code, title, message, request_id, optional doc_url).
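The query-parameter fallbacks above (cartesia_version and access_token for browser WebSockets) can be sketched as a small URL builder. This is a minimal illustration using only the Python standard library: the function name `browser_ws_url` and the `/tts/websocket` path in the example call are assumptions made for illustration, so confirm the real WebSocket path against the API reference.

```python
from urllib.parse import urlencode, urlsplit, urlunsplit

# Pinned once for the whole service; same date as the curl example in this document.
API_VERSION = "2026-03-01"


def browser_ws_url(base_ws_url: str, access_token: str, version: str = API_VERSION) -> str:
    """Return base_ws_url with cartesia_version and access_token as query parameters.

    Hypothetical helper: it only assembles the URL and opens no connection.
    """
    parts = urlsplit(base_ws_url)
    if parts.scheme != "wss":
        raise ValueError("expected a wss:// URL")
    query = urlencode({"cartesia_version": version, "access_token": access_token})
    # Keep any query string that is already present on the base URL.
    merged = f"{parts.query}&{query}" if parts.query else query
    return urlunsplit((parts.scheme, parts.netloc, parts.path, merged, ""))


# The path below is an assumed placeholder, not taken from this document.
print(browser_ws_url("wss://api.cartesia.ai/tts/websocket", "tok_example"))
```

Server-side callers that can set headers would normally send Authorization and Cartesia-Version as headers instead and skip the query parameters.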
Append .md to any docs.cartesia.ai page to get the agent-readable Markdown (e.g. https://docs.cartesia.ai/api-reference/stt/transcribe.md); the bare URL is the human HTML page. Prefer the .md form when fetching docs programmatically.
llms.txt and llms-full.txt
sonic-2 is no longer current.
Default to POST /tts/bytes: it streams audio back as it's generated and is simpler. Reach for the WebSocket only when the input text arrives incrementally (e.g. piping an LLM's token stream), which is the continuations case. Don't default to WebSockets; compare them in TTS endpoints.
wav + pcm_s16le @ 44.1 kHz is a safe, self-contained default; mp3 carries no separate encoding.
.wav carry their own header)
.strip(), no normalization)
pip install cartesia
Replace /heads/main/ with /tags/vX.X.X/ (e.g. /tags/v3.2.0/) to source code specific to your SDK version.
npm i @cartesia/cartesia-js
Replace /heads/main/ with /tags/vX.X.X/ (e.g. /tags/v3.2.0/) to source code specific to your SDK version.
| Goal | Path |
|---|---|
| App or backend calling REST/WebSocket | Python SDK, JS/TS SDK, or native API requests / fetch |
| IDE agent with MCP | cartesia-mcp + docs fallback |
| Deployed voice agent, Line, telephony | line-voice-agent |
| OpenClaw bootstrap | https://cartesia.sh/openclaw.md then docs / llms.txt |
Shape only: confirm field names and enums against the API reference for your Cartesia-Version:
curl -X POST "https://api.cartesia.ai/tts/bytes" \
  -H "Authorization: Bearer $CARTESIA_API_KEY" \
  -H "Cartesia-Version: 2026-03-01" \
  -H "Content-Type: application/json" \
  -d '{
    "model_id": "sonic-3",
    "transcript": "Hello from Cartesia.",
    "voice": { "id": "f786b574-daa5-4673-aa0c-cbe3e8534c02" },
    "output_format": {
      "container": "wav",
      "encoding": "pcm_s16le",
      "sample_rate": 44100
    },
    "language": "en"
  }' \
  --output /tmp/out.wav
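The same request can be sketched in Python with only the standard library, for callers who use native HTTP rather than the SDK. This mirrors the curl call field for field; the helper names `build_tts_request` and `synthesize_to_file` are hypothetical, and as with the curl example the field names should be confirmed against the API reference for your Cartesia-Version.

```python
import json
import urllib.request

# Raw HTTP callers send the version themselves; keep one date everywhere.
API_VERSION = "2026-03-01"
TTS_BYTES_URL = "https://api.cartesia.ai/tts/bytes"


def build_tts_request(api_key: str, transcript: str, voice_id: str) -> urllib.request.Request:
    """Build (but do not send) a POST /tts/bytes request mirroring the curl example."""
    payload = {
        "model_id": "sonic-3",
        "transcript": transcript,
        "voice": {"id": voice_id},
        "output_format": {
            "container": "wav",
            "encoding": "pcm_s16le",
            "sample_rate": 44100,
        },
        "language": "en",
    }
    return urllib.request.Request(
        TTS_BYTES_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Cartesia-Version": API_VERSION,
            "Content-Type": "application/json",
        },
        method="POST",
    )


def synthesize_to_file(request: urllib.request.Request, path: str, chunk_size: int = 8192) -> int:
    """Send the request and write audio to path chunk by chunk; return bytes written."""
    written = 0
    with urllib.request.urlopen(request) as response, open(path, "wb") as out:
        while chunk := response.read(chunk_size):
            out.write(chunk)
            written += len(chunk)
    return written
```

A caller would read the key from the CARTESIA_API_KEY environment variable, pass it with the transcript and voice ID from the curl example to build_tts_request, and hand the result to synthesize_to_file with /tmp/out.wav as the path. Splitting request construction from sending keeps the payload easy to inspect without touching the network.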
429-class and structured concurrency_limited / quota_exceeded per API errors.
Cartesia-Version: with an SDK, don't set or override the version; it pins one it's tested against, so upgrade the SDK to move versions. Only raw HTTP/WS callers send the date themselves, and then keep one date everywhere (wrong date → subtle breakage or legacy error shapes).
Authorization: Bearer; match current docs, not old snippets.
Old snippets may use outdated model IDs (e.g. sonic-2), and the JS SDK's named import { CartesiaClient } is deprecated and won't run in browsers; use import Cartesia from "@cartesia/cartesia-js" (or the default import). Confirm models against the models docs and SDK usage against the linked README / examples.
cartesia deploy / Line: switch to line-voice-agent.
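The structured error fields and limit codes mentioned in this document (error_code, title, message, request_id, optional doc_url; concurrency_limited and quota_exceeded) can be handled with a small parser. This is a sketch under stated assumptions: the fields are assumed to sit at the top level of the JSON body, the names `ApiError`, `parse_api_error`, and `is_limit_error` are hypothetical, and the sample payload is made up for illustration.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ApiError:
    status: int
    error_code: Optional[str]
    title: Optional[str]
    message: str
    request_id: Optional[str]
    doc_url: Optional[str] = None


def parse_api_error(status: int, body: bytes) -> ApiError:
    """Parse a structured error body, falling back to raw text for other shapes."""
    text = body.decode("utf-8", errors="replace")
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        data = None
    if isinstance(data, dict) and "error_code" in data:
        return ApiError(
            status=status,
            error_code=data["error_code"],
            title=data.get("title"),
            message=data.get("message", ""),
            request_id=data.get("request_id"),
            doc_url=data.get("doc_url"),
        )
    # Legacy or unrecognized shape: keep the raw body so nothing is lost.
    return ApiError(status=status, error_code=None, title=None, message=text, request_id=None)


def is_limit_error(err: ApiError) -> bool:
    """True for 429 responses or the limit codes named in this document."""
    return err.status == 429 or err.error_code in {"concurrency_limited", "quota_exceeded"}


# Made-up sample payload, for illustration only.
sample = b'{"error_code": "concurrency_limited", "title": "Limit", "message": "Too many requests in flight.", "request_id": "req_example"}'
err = parse_api_error(429, sample)
print(err.error_code, err.request_id, is_limit_error(err))
```

Keeping request_id on the parsed error makes it easy to include in logs and support requests. Whether a given limit error should be retried is left to the API errors documentation.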
npx claudepluginhub cartesia-ai/skills --plugin cartesia-skills