From oatda
Use when the user wants to transcribe audio to text using OATDA's unified audio API. Supports speech-to-text (STT), meetings, podcasts, voice notes, Whisper-style transcription, and the transcribe_audio MCP capability.
Slash command: /oatda:oatda-transcribe-audio
Transcribe audio files to text through OATDA's unified audio API.
Use this skill when the user wants to:
- use the transcribe_audio MCP capability

The user needs an OATDA API key. Check in this order:
1. $OATDA_API_KEY environment variable
2. ~/.oatda/credentials.json config file

If neither exists, tell the user:
You need an OATDA API key. Get one at https://oatda.com, then set it:
export OATDA_API_KEY=your_key_here
# Check env var first; if empty, auto-load from credentials file
if [[ -z "$OATDA_API_KEY" ]]; then
export OATDA_API_KEY=$(cat ~/.oatda/credentials.json 2>/dev/null | jq -r '.profiles[.defaultProfile].apiKey' 2>/dev/null)
fi
# Verify key exists (show first 8 chars only)
echo "${OATDA_API_KEY:0:8}"
If the output is empty or null, stop and ask the user to configure their API key.
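The check above can also be wrapped in a small guard so later commands stop early. This is a minimal sketch: the function name `require_oatda_key` is hypothetical, and it assumes a missing key shows up either as an empty string or as the literal `null` that `jq -r` prints.

```bash
# Hypothetical helper (not part of OATDA): fail early when the key is missing.
require_oatda_key() {
  # jq -r prints the literal string "null" when the profile has no apiKey.
  if [ -z "${OATDA_API_KEY:-}" ] || [ "$OATDA_API_KEY" = "null" ]; then
    echo "OATDA_API_KEY is not configured" >&2
    return 1
  fi
  # Show only the first 8 characters, never the full key.
  printf 'Using key %.8s...\n' "$OATDA_API_KEY"
}
```

Calling `require_oatda_key || exit 1` before the first curl command keeps a misconfigured key from surfacing later as a 401.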
IMPORTANT: All curl commands must run in the same shell session. Each separate bash/terminal invocation starts with an isolated environment where previously exported variables are lost. Either run all commands in one session, or chain them.

Map common aliases:
| User says | Provider | Model |
|---|---|---|
| whisper, whisper-1, openai whisper (default) | openai | whisper-1 |
| transcription, speech to text, stt | openai | whisper-1 |
Default: openai / whisper-1 if no model is specified.
If the user provides provider/model format directly (e.g., openai/whisper-1), split on / to get separate provider and model values.
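The alias table and the provider/model split can be sketched as one shell function. The name `resolve_model` and the choice to reject unknown aliases are assumptions for illustration; the aliases and the default come from the table above.

```bash
# Hypothetical helper: set PROVIDER and MODEL from a user-supplied model string.
resolve_model() {
  case "$1" in
    */*)
      # provider/model format: split on the first slash.
      PROVIDER=${1%%/*}
      MODEL=${1#*/}
      return 0
      ;;
  esac
  # Aliases are matched case-insensitively (an assumption).
  alias_lc=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  case "$alias_lc" in
    ""|whisper|whisper-1|"openai whisper"|transcription|"speech to text"|stt)
      # Default when no model is specified, and the target of every alias.
      PROVIDER=openai
      MODEL=whisper-1
      ;;
    *)
      echo "Unknown model alias: $1" >&2
      return 1
      ;;
  esac
}
```

After `resolve_model "openai/whisper-1"`, the variables `$PROVIDER` and `$MODEL` can be substituted into the curl commands in place of the placeholders.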
The endpoint supports:
- multipart/form-data with a local file upload
- JSON body with the audio passed in the file field (for example, as a base64 data URL)
- file_base64 for providers that support direct base64 payloads

Maximum audio file size is 25MB.
For local files, prefer multipart upload because it avoids manually building large JSON bodies.
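The size limit can be checked locally before uploading. This is a sketch under one assumption: "25MB" is read as 25 * 1024 * 1024 bytes, and the API may count it differently. The function name `check_audio_size` is hypothetical.

```bash
# Assumption: 25MB means 25 * 1024 * 1024 bytes; adjust if the API counts differently.
MAX_AUDIO_BYTES=$((25 * 1024 * 1024))

# Hypothetical helper: succeed only if the file exists and fits under the limit.
check_audio_size() {
  if [ ! -f "$1" ]; then
    echo "File not found: $1" >&2
    return 1
  fi
  # wc -c pads its output with spaces on some systems, so strip them.
  size=$(wc -c < "$1" | tr -d ' ')
  if [ "$size" -gt "$MAX_AUDIO_BYTES" ]; then
    echo "File is $size bytes; limit is $MAX_AUDIO_BYTES" >&2
    return 1
  fi
}
```

Running `check_audio_size "<AUDIO_FILE>" || exit 1` before the upload avoids a round trip that would end in a 413 response.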
curl -s -X GET "https://oatda.com/api/v1/llm/models?type=audio" \
-H "Authorization: Bearer $OATDA_API_KEY" | jq '.audio_models[] | {id, supported_params}'
Use supported_params to confirm whether the model supports transcription and optional fields such as timestamps or diarization.
curl -s -X POST "https://oatda.com/api/v1/llm/transcriptions" \
-H "Authorization: Bearer $OATDA_API_KEY" \
-F "provider=<PROVIDER>" \
-F "model=<MODEL>" \
-F "file=@<AUDIO_FILE>" \
-F "response_format=json"
Replace <PROVIDER>, <MODEL>, and <AUDIO_FILE> with actual values.
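# Build a base64 data URL. -w 0 disables line wrapping in GNU base64; on macOS use: base64 -i audio.mp3
AUDIO_DATA_URL="data:audio/mpeg;base64,$(base64 -w 0 audio.mp3)"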
curl -s -X POST "https://oatda.com/api/v1/llm/transcriptions" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $OATDA_API_KEY" \
-d "$(jq -n \
--arg provider "<PROVIDER>" \
--arg model "<MODEL>" \
--arg file "$AUDIO_DATA_URL" \
'{provider: $provider, model: $model, file: $file, response_format: "json"}')"
Optional fields:
- language: ISO-639-1 language code, e.g. en, de, fr
- prompt: Context for names, acronyms, or domain-specific terms
- response_format: json, text, srt, verbose_json, vtt, or diarized_json
- temperature: 0 to 1
- timestamp_granularities: word and/or segment
- chunking_strategy: auto
- hotwords: Provider-specific keyword hints
- stream: true for streaming transcription if supported

The API returns JSON like:
{
"text": "The transcribed text...",
"language": "en",
"duration": 42.5,
"segments": [],
"words": [],
"costs": {
"inputCost": 0,
"outputCost": 0.0001,
"totalCost": 0.0001,
"currency": "USD"
},
"metadata": {
"provider": "openai",
"model": "whisper-1",
"latency": 1200
}
}
Present the text field to the user. Include segments, words, or subtitles if the user requested a timestamped format.
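Pulling the text field out of a saved response might look like the sketch below. The function name `print_transcript` and the file name `response.json` are hypothetical; only the `text` field shown in the sample response is assumed.

```bash
# Hypothetical helper: read a transcription response on stdin and print its text.
print_transcript() {
  # "// empty" prints nothing (instead of "null") when the field is missing.
  jq -r '.text // empty'
}

# Usage sketch, assuming the response was saved with: curl ... > response.json
# print_transcript < response.json
```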
| HTTP Status | Meaning | Action |
|---|---|---|
| 401 | Invalid API key | Tell user to check their key |
| 402 | Insufficient credits | Tell user to check balance |
| 400 | Bad request / model not supported | Check model format, file format, and use /oatda:oatda-list-models with type=audio |
| 413 | File too large | Keep audio under 25MB or split it |
| 429 | Rate limited or monthly cap | Wait briefly and retry once |
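The table above can be turned into a lookup that a script consults after each request. This is a minimal sketch: `action_for_status` is a hypothetical name, and the treatment of 200 and of unlisted codes is an assumption.

```bash
# Hypothetical helper: map an HTTP status code to the action from the table.
action_for_status() {
  case "$1" in
    200) echo "ok" ;;
    400) echo "check model format and file format" ;;
    401) echo "check the API key" ;;
    402) echo "check the credit balance" ;;
    413) echo "keep audio under 25MB or split it" ;;
    429) echo "wait briefly and retry once" ;;
    *)   echo "unexpected status $1" ;;
  esac
}

# Usage sketch: -o saves the body, -w prints only the status code.
# status=$(curl -s -o response.json -w '%{http_code}' -X POST \
#   "https://oatda.com/api/v1/llm/transcriptions" ...)
# action_for_status "$status"
```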
User asks: "Transcribe this recording with Whisper"
curl -s -X POST "https://oatda.com/api/v1/llm/transcriptions" \
-H "Authorization: Bearer $OATDA_API_KEY" \
-F "provider=openai" \
-F "model=whisper-1" \
-F "file=@<AUDIO_FILE>" \
-F "response_format=json"
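The same request can be wrapped so that optional fields such as language or response_format are passed as extra form fields. This is a sketch, not part of the skill: the function name `transcribe_with_whisper` and the file name `talk.mp3` are hypothetical, and whether an srt response arrives as raw subtitle text is an assumption to verify against the API.

```bash
# Hypothetical wrapper: first argument is the audio file; any further
# arguments are handed to curl unchanged, e.g. extra -F form fields.
transcribe_with_whisper() {
  audio_file=$1
  shift
  curl -s -X POST "https://oatda.com/api/v1/llm/transcriptions" \
    -H "Authorization: Bearer $OATDA_API_KEY" \
    -F "provider=openai" \
    -F "model=whisper-1" \
    -F "file=@${audio_file}" \
    "$@"
}

# Example call (not run here): German audio, subtitles in SRT format.
# transcribe_with_whisper talk.mp3 -F "language=de" -F "response_format=srt" > talk.srt
```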
- Endpoint: /api/v1/llm/transcriptions.
- Use response_format=srt or vtt when the user wants subtitles.
- Set language to improve recognition for known source-language audio.
- MCP capability: transcribe_audio.
- Related skills: /oatda:oatda-generate-speech, /oatda:oatda-translate-audio, /oatda:oatda-list-models.
npx claudepluginhub devcsde/oatda-skills --plugin oatda