Skill

obul-x402engine-audio

USE THIS SKILL WHEN: the user wants to generate speech audio from text (TTS) or transcribe audio files to text. Provides pay-per-use text-to-speech and transcription via x402engine through the Obul proxy.

Slash command

/obul-media:x402engine-audio

x402engine provides pay-per-call text-to-speech and audio transcription endpoints. Convert text to speech using OpenAI or ElevenLabs voices, or transcribe audio files to text with speaker diarization. No API key needed — payment is handled automatically via `obulx`.

Stats

Parent stars: 0
Parent forks: 1
Maintenance: Good
Last commit: Mar 6, 2026

x402engine Audio

Authentication

All requests use the obulx CLI, which handles x402 payment automatically.

Common Operations

Text-to-Speech (OpenAI)

Generate speech audio from text using OpenAI's TTS models.

Pricing: $0.01 per call

Request:

obulx -X POST -H "Content-Type: application/json" \
  -d '{"text": "Hello, this is a test of text-to-speech generation.", "voice": "alloy"}' \
  "https://x402engine.app/api/tts/openai" \
  -o output.mp3

Voices: alloy, echo, fable, onyx, nova, shimmer

Response: Returns audio data (MP3 or other format). Save to a file with -o filename.mp3.
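Because every call is billed, it can help to reject a bad voice name and build a well-formed JSON body before invoking obulx. The following is a minimal sketch, not part of the skill itself: `is_openai_voice` and `tts_body` are hypothetical helper names, the voice list is the one given above, and the escaping covers only backslashes and double quotes (not newlines or other control characters).

```shell
# Voices listed in this section for the OpenAI endpoint.
OPENAI_VOICES="alloy echo fable onyx nova shimmer"

# Hypothetical helper: succeeds only if $1 is exactly one of the listed voices.
is_openai_voice() {
  case "$1" in
    ""|*" "*) return 1 ;;   # reject empty names and names containing spaces
  esac
  case " $OPENAI_VOICES " in
    *" $1 "*) return 0 ;;
    *)        return 1 ;;
  esac
}

# Hypothetical helper: tts_body <text> <voice> prints the JSON request body.
# Escapes backslashes and double quotes only; other control characters
# in the text are NOT handled by this sketch.
tts_body() {
  escaped=$(printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g')
  printf '{"text": "%s", "voice": "%s"}' "$escaped" "$2"
}
```

The output of `tts_body "$text" "$voice"` can then be passed as the `-d` argument of the request shown above, after `is_openai_voice "$voice"` has succeeded.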

Text-to-Speech (ElevenLabs)

Generate ultra-realistic speech using ElevenLabs voices.

Pricing: $0.02 per call

Request:

obulx -X POST -H "Content-Type: application/json" \
  -d '{"text": "Welcome to the future of AI-generated speech.", "voice": "rachel"}' \
  "https://x402engine.app/api/tts/elevenlabs" \
  -o output.mp3

Response: Returns audio data. ElevenLabs voices are more natural and expressive than OpenAI's, at twice the price ($0.02 vs. $0.01 per call).
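Since the two TTS endpoints take the same request shape in the examples above, a script can switch providers by changing only the URL. This is a rough sketch under that assumption: `tts_endpoint` and `speak` are hypothetical names, and only the obulx flags already shown in this document (`-X`, `-H`, `-d`, `-o`) are used.

```shell
# Hypothetical helper: maps a provider name to the endpoint URL documented here.
tts_endpoint() {
  case "$1" in
    openai)     echo "https://x402engine.app/api/tts/openai" ;;
    elevenlabs) echo "https://x402engine.app/api/tts/elevenlabs" ;;
    *)          echo "unknown provider: $1" >&2; return 1 ;;
  esac
}

# Hypothetical wrapper: speak <provider> <json-body> <output-file>
# Refuses to call obulx (and therefore to pay) for an unknown provider.
speak() {
  url=$(tts_endpoint "$1") || return 1
  obulx -X POST -H "Content-Type: application/json" \
    -d "$2" \
    "$url" \
    -o "$3"
}
```

For example, `speak elevenlabs '{"text": "Hello", "voice": "rachel"}' hello.mp3` would issue the same request as the ElevenLabs example above with a different text.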

Audio Transcription (Deepgram Nova-3)

Transcribe audio files to text with speaker diarization.

Pricing: $0.10 per call

Request:

obulx -X POST -H "Content-Type: multipart/form-data" \
  -F "file=@audio.mp3" \
  "https://x402engine.app/api/transcribe"

Response: JSON with transcribed text, speaker diarization labels, and timestamps.
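Transcription is the most expensive call in this skill, so checking the input file locally before uploading avoids paying for a request that is bound to fail. A minimal sketch follows; `transcribe` is a hypothetical wrapper name, and the multipart field name `file` is taken from the Error Handling table rather than from a confirmed API reference.

```shell
# Hypothetical wrapper: transcribe <audio-file>
# Verifies the file is a readable, non-empty regular file before uploading.
transcribe() {
  if [ ! -f "$1" ] || [ ! -r "$1" ]; then
    echo "transcribe: cannot read '$1'" >&2
    return 1
  fi
  if [ ! -s "$1" ]; then
    echo "transcribe: '$1' is empty" >&2
    return 1
  fi
  # Field name "file" is an assumption based on the error table.
  obulx -X POST -H "Content-Type: multipart/form-data" \
    -F "file=@$1" \
    "https://x402engine.app/api/transcribe"
}
```

Redirecting the output, as in `transcribe meeting.mp3 > meeting.json`, keeps the JSON response for later inspection.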

Endpoint Pricing Reference

| Endpoint | Price | Purpose |
| --- | --- | --- |
| `POST /api/tts/openai` | $0.01 | Text-to-speech with OpenAI voices |
| `POST /api/tts/elevenlabs` | $0.02 | Text-to-speech with ElevenLabs voices |
| `POST /api/transcribe` | $0.10 | Audio transcription with Deepgram |
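For batch jobs, the prices in the table above can be turned into a quick cost estimate before any request is sent. The sketch below assumes a flat per-call price with no volume discounts or extra fees; `price_cents` and `estimate_cost` are hypothetical helper names.

```shell
# Hypothetical helper: price per call in cents, from the table above.
price_cents() {
  case "$1" in
    /api/tts/openai)     echo 1 ;;
    /api/tts/elevenlabs) echo 2 ;;
    /api/transcribe)     echo 10 ;;
    *) echo "unknown endpoint: $1" >&2; return 1 ;;
  esac
}

# Hypothetical helper: estimate_cost <endpoint> <calls> prints the total in dollars.
# Works in integer cents to avoid floating-point arithmetic in the shell.
estimate_cost() {
  cents=$(price_cents "$1") || return 1
  total=$(( cents * $2 ))
  printf '$%d.%02d\n' $(( total / 100 )) $(( total % 100 ))
}
```

For example, `estimate_cost /api/transcribe 25` prints `$2.50`, and `estimate_cost /api/tts/openai 7` prints `$0.07`.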

When to Use

Text-to-speech: User wants to convert text to speech or generate audio from text, or asks for TTS or voice generation
Transcription: User wants to transcribe an audio file to text, or asks for speech-to-text or audio-to-text
Provider names: User mentions ElevenLabs, OpenAI TTS, or Deepgram

Best Practices

OpenAI TTS — $0.01, reliable, 6 voices, HD quality, good for most use cases
ElevenLabs TTS — $0.02, ultra-realistic, multilingual, best voice quality
Transcription — $0.10 includes speaker diarization with labels and timestamps
Save TTS output — TTS endpoints return audio binary; always use -o filename.mp3 to save output
Multipart for transcription — For transcription, send the audio file as multipart form data

Error Handling

| Error | Cause | Solution |
| --- | --- | --- |
| `402 Payment Required` | Payment not processed or insufficient | Verify your obulx setup is correct and your account has sufficient balance at my.obul.ai. |
| `400 Bad Request` | Missing or invalid request body | Ensure `text` is present for TTS or `file` is provided for transcription. |
| `415 Unsupported Media` | Invalid audio format for transcription | Ensure the audio file is in a supported format. |
| `429 Too Many Requests` | Rate limit exceeded | Add a short delay between requests. |
| `500 Internal Server Error` | x402engine service issue | Wait a few seconds and retry. If persistent, the service may be experiencing downtime. |
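The "add a delay" and "wait and retry" advice above can be sketched as a small retry loop with a doubling delay. This document does not say how obulx reports the HTTP status, so the sketch assumes only that the wrapped command exits non-zero on failure; `is_retryable` and `retry` are hypothetical helper names. Because an exit code alone cannot distinguish a 429 from a 402, keep the attempt limit low so that errors a retry cannot fix (402, 400, 415) fail quickly.

```shell
# Hypothetical helper: succeeds if the table above suggests retrying this status.
# Only useful when the HTTP status is available to the caller.
is_retryable() {
  case "$1" in
    429|500) return 0 ;;
    *)       return 1 ;;
  esac
}

# Hypothetical helper: retry <max-attempts> <initial-delay-seconds> <command...>
# Assumes the command exits non-zero on failure; doubles the delay after each failure.
retry() {
  attempts=$1
  delay=$2
  shift 2
  n=1
  while ! "$@"; do
    if [ "$n" -ge "$attempts" ]; then
      echo "retry: giving up after $n attempts" >&2
      return 1
    fi
    sleep "$delay"
    delay=$(( delay * 2 ))
    n=$(( n + 1 ))
  done
  return 0
}
```

A call such as `retry 3 2 obulx -X POST ...` would try the request up to three times, sleeping 2 and then 4 seconds between attempts.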
