From obul-media
USE THIS SKILL WHEN: the user wants to generate speech audio from text (TTS) or transcribe audio files to text. Provides pay-per-use text-to-speech and transcription via x402engine through the Obul proxy.
Slash command: /obul-media:x402engine-audio
x402engine provides pay-per-call text-to-speech and audio transcription endpoints. Convert text to speech using OpenAI or ElevenLabs voices, or transcribe audio files to text with speaker diarization. No API key needed — payment is handled automatically via `obulx`.
All requests use the obulx CLI, which handles x402 payment automatically.
Generate speech audio from text using OpenAI's TTS models.
Pricing: $0.01
Request:
obulx -X POST -H "Content-Type: application/json" \
-d '{"text": "Hello, this is a test of text-to-speech generation.", "voice": "alloy"}' \
"https://x402engine.app/api/tts/openai" \
-o output.mp3
Voices: alloy, echo, fable, onyx, nova, shimmer
Response: Returns audio data (MP3 or other format). Save to a file with -o filename.mp3.
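The inline JSON in the request above breaks as soon as the text contains a double quote, a backslash, or an apostrophe (because the body is single-quoted in the shell). A minimal sketch of one way around that, assuming the endpoint accepts exactly the `text` and `voice` fields shown above; `tts_body` is a hypothetical helper name, not part of `obulx`, and it does not escape newlines or other control characters:

```shell
# tts_body TEXT VOICE
# Prints a JSON request body with backslashes and double quotes in TEXT escaped.
# Hypothetical helper: newlines and other control characters are NOT handled.
tts_body() {
  local escaped
  # Escape backslashes first, then double quotes, so added backslashes are not re-escaped.
  escaped=$(printf '%s' "$1" | sed 's/\\/\\\\/g; s/"/\\"/g')
  printf '{"text": "%s", "voice": "%s"}\n' "$escaped" "$2"
}

# Usage, reusing the flags from the request above:
#   obulx -X POST -H "Content-Type: application/json" \
#     -d "$(tts_body 'She said "hello" and left.' alloy)" \
#     "https://x402engine.app/api/tts/openai" \
#     -o output.mp3
```

Building the body in a function keeps the quoting in one place, so the text can come from a variable or a file without being edited by hand.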
Generate ultra-realistic speech using ElevenLabs voices.
Pricing: $0.02
Request:
obulx -X POST -H "Content-Type: application/json" \
-d '{"text": "Welcome to the future of AI-generated speech.", "voice": "rachel"}' \
"https://x402engine.app/api/tts/elevenlabs" \
-o output.mp3
Response: Returns audio data. ElevenLabs voices are more natural and expressive than OpenAI's, but each request costs twice as much ($0.02 versus $0.01).
Transcribe audio files to text with speaker diarization.
Pricing: $0.10
Request:
obulx -X POST -H "Content-Type: multipart/form-data" \
-F "file=@/path/to/audio-file" \
"https://x402engine.app/api/transcribe"
Response: JSON with transcribed text, speaker diarization labels, and timestamps.
| Endpoint | Price | Purpose |
|---|---|---|
| POST /api/tts/openai | $0.01 | Text-to-speech with OpenAI voices |
| POST /api/tts/elevenlabs | $0.02 | Text-to-speech with ElevenLabs voices |
| POST /api/transcribe | $0.10 | Audio transcription with Deepgram |
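Because each endpoint is billed per call, the cost of a batch is a simple sum. The sketch below assumes the prices in the table are flat per-request charges that do not vary with text or audio length, which the source does not state explicitly; `estimate_cost` is a hypothetical helper name. Prices are kept in cents so the shell's integer arithmetic is exact.

```shell
# estimate_cost OPENAI_CALLS ELEVENLABS_CALLS TRANSCRIBE_CALLS
# Prints the estimated total in US dollars.
# Assumed flat per-call prices in cents: OpenAI TTS 1, ElevenLabs TTS 2, transcription 10.
estimate_cost() {
  local cents
  cents=$(( ${1:-0} * 1 + ${2:-0} * 2 + ${3:-0} * 10 ))
  printf '$%d.%02d\n' $(( cents / 100 )) $(( cents % 100 ))
}
```

For example, `estimate_cost 50 20 3` covers 50 OpenAI TTS calls, 20 ElevenLabs TTS calls, and 3 transcriptions: 50 + 40 + 30 = 120 cents.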
Use `-o filename.mp3` to save output.

| Error | Cause | Solution |
|---|---|---|
| 402 Payment Required | Payment not processed or insufficient | Verify your obulx setup is correct and your account has sufficient balance at my.obul.ai. |
| 400 Bad Request | Missing or invalid request body | Ensure text is present for TTS or file is provided for transcription. |
| 415 Unsupported Media | Invalid audio format for transcription | Ensure the audio file is in a supported format. |
| 429 Too Many Requests | Rate limit exceeded | Add a short delay between requests. |
| 500 Internal Server Error | x402engine service issue | Wait a few seconds and retry. If persistent, the service may be experiencing downtime. |
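The "add a short delay" and "wait and retry" advice for 429 and 500 can be sketched as a generic retry wrapper with a doubling delay. This is only an illustration: `retry` is a hypothetical helper, and it assumes the wrapped command exits non-zero on failure. Whether `obulx` does so for HTTP error responses is not stated in the source, so check that before relying on it. It also retries on any failure, so it is not appropriate for 400, 402, or 415, which will not succeed on a second attempt.

```shell
# retry MAX_ATTEMPTS INITIAL_DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it exits 0 or MAX_ATTEMPTS is reached, doubling the delay each time.
# Returns the exit status of the last attempt.
retry() {
  local max_attempts
  local delay
  local attempt
  local status
  max_attempts=$1
  delay=$2
  shift 2
  attempt=1
  while true; do
    status=0
    "$@" || status=$?
    if [ "$status" -eq 0 ]; then
      return 0
    fi
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "giving up after $attempt attempts (last exit status $status)" >&2
      return "$status"
    fi
    sleep "$delay"
    delay=$(( delay * 2 ))
    attempt=$(( attempt + 1 ))
  done
}

# Usage (assumes obulx signals HTTP errors through its exit status):
#   retry 4 2 obulx -X POST -H "Content-Type: application/json" \
#     -d '{"text": "Hello", "voice": "alloy"}' \
#     "https://x402engine.app/api/tts/openai" -o output.mp3
```

With `retry 4 2`, the waits between attempts are 2, 4, and 8 seconds. Since every call is paid, keeping the attempt limit small bounds the worst-case spend.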
npx claudepluginhub polymerdao/pay-plugin --plugin obul-media