From obul-media
USE THIS SKILL WHEN: the user wants to transcribe audio or speech to text in real-time. Provides production-grade speech-to-text with dual-engine architecture, 99+ languages, VAD, noise reduction, and hallucination filtering via dTelecom through the Obul proxy.
Slash command: /obul-media:dtelecom
Production-grade real-time speech-to-text for AI agents via dTelecom's x402-enabled STT service. Features a dual-engine architecture (Parakeet-TDT for 25 European languages at 3-4x speed, Whisper for 99+ languages) with smart routing, voice activity detection, neural noise reduction, speech validation, and hallucination filtering. Through the Obul proxy, each session is paid per minute at $0.005/min — no dTelecom account or API key required.
All requests use the obulx CLI, which handles proxy routing and authentication automatically.
Install and log in (one-time setup):
npm install -g @obul.ai/obulx
obulx login
Base URL: https://x402stt.dtelecom.org
Create and pay for a new STT session. Specify the number of minutes and target language. The response includes a session key and WebSocket URL for streaming audio.
Pricing: $0.005 per minute (minimum 5 minutes = $0.025, maximum 120 minutes = $0.60)
obulx -X POST -H "Content-Type: application/json" \
-d '{"minutes": 5, "language": "en"}' \
"https://x402stt.dtelecom.org/v1/session"
Response:
{
"session_id": "abc-123-uuid",
"session_key": "eyJ...",
"ws_url": "wss://x402stt.dtelecom.org/v1/stream",
"remaining_seconds": 300,
"minutes": 5,
"price_usd": "0.025000"
}
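The pricing rule above is simple enough to check client-side before paying for a session. The following is a minimal sketch, not part of the dTelecom or Obul tooling: the names `session_price` and `price_matches` are hypothetical, and it assumes `minutes` must be a whole number.

```python
from decimal import Decimal

RATE_PER_MINUTE = Decimal("0.005")  # rate quoted in the pricing line
MIN_MINUTES = 5                     # documented minimum
MAX_MINUTES = 120                   # documented maximum


def session_price(minutes: int) -> Decimal:
    """Return the expected USD price for a new session (hypothetical helper)."""
    # Assumption: minutes is a whole number; bool is rejected explicitly
    # because it is a subclass of int in Python.
    if not isinstance(minutes, int) or isinstance(minutes, bool):
        raise TypeError("minutes must be an integer")
    if not MIN_MINUTES <= minutes <= MAX_MINUTES:
        raise ValueError(
            f"minutes must be between {MIN_MINUTES} and {MAX_MINUTES}"
        )
    return RATE_PER_MINUTE * minutes


def price_matches(response: dict) -> bool:
    """Compare price_usd in a create-session response with the quoted rate."""
    return Decimal(response["price_usd"]) == session_price(response["minutes"])
```

`Decimal` is used instead of `float` so that a string such as "0.025000" compares equal to the computed price without rounding noise.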
Add more time to an active session without interrupting the audio stream.
Pricing: $0.005 per minute added
obulx -X POST -H "Content-Type: application/json" \
-d '{"session_id": "abc-123-uuid", "minutes": 5}' \
"https://x402stt.dtelecom.org/v1/session/extend"
Response:
{
"session_id": "abc-123-uuid",
"remaining_seconds": 600,
"minutes_added": 5,
"price_usd": "0.025000"
}
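When deciding how much time to add, one option is to compare the audio still to be streamed with the time left on the session. This is a sketch under stated assumptions: `minutes_to_add` and `extend_payload` are hypothetical helpers, and whether the extend endpoint enforces a minimum number of minutes is not stated here, so no minimum is applied.

```python
import math


def minutes_to_add(audio_seconds_left: float, session_seconds_left: float) -> int:
    """Whole minutes needed so the session outlasts the remaining audio."""
    shortfall = audio_seconds_left - session_seconds_left
    if shortfall <= 0:
        return 0  # the session already covers the remaining audio
    return math.ceil(shortfall / 60)  # round up to a whole minute


def extend_payload(session_id: str, minutes: int) -> dict:
    """Build the JSON body in the shape shown for /v1/session/extend."""
    return {"session_id": session_id, "minutes": minutes}
```

The resulting dictionary can be serialized with `json.dumps` and passed to the `-d` flag of the request shown above.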
Check the remaining time and usage of an active session. This endpoint is free.
Pricing: $0.00
obulx "https://x402stt.dtelecom.org/v1/session/{session_id}/status"
Response:
{
"session_id": "abc-123-uuid",
"status": "connected",
"remaining_seconds": 245.3,
"used_seconds": 54.7,
"balance_seconds": 300,
"language": "en"
}
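A status response can be reduced to a few values that are convenient for monitoring. The sketch below is illustrative only: `summarize_status` is a hypothetical helper, the 60-second "low" threshold is an arbitrary choice, and the relation `used_seconds + remaining_seconds = balance_seconds` is inferred from the sample response rather than documented.

```python
import math


def summarize_status(status: dict, low_threshold_seconds: float = 60.0) -> dict:
    """Derive usage figures from a /status response (hypothetical helper)."""
    used = float(status["used_seconds"])
    remaining = float(status["remaining_seconds"])
    balance = float(status["balance_seconds"])
    return {
        # Share of the purchased time already consumed.
        "fraction_used": used / balance if balance > 0 else 0.0,
        # Assumption inferred from the sample: used + remaining == balance.
        "consistent": math.isclose(used + remaining, balance, abs_tol=0.5),
        # True when the session is close to running out.
        "low": remaining <= low_threshold_seconds,
    }
```

A caller could poll the status endpoint and extend the session once `low` becomes true.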
After creating a session, connect to the WebSocket URL with the session key. Send audio as PCM16 (16-bit little-endian, 16kHz, mono) in 20ms chunks (640 bytes). The server returns real-time transcription results.
Pricing: Included in session cost
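The 640-byte figure follows from the audio format: 16,000 samples per second × 2 bytes per sample × 0.020 s = 640 bytes. A minimal sketch of splitting a raw PCM16 buffer into chunks of that size is shown below. The function name `iter_chunks` is hypothetical, and padding the final chunk with silence is an assumption, since it is not stated whether the server requires every chunk to be exactly 640 bytes.

```python
from typing import Iterator

SAMPLE_RATE_HZ = 16_000   # 16 kHz mono
BYTES_PER_SAMPLE = 2      # PCM16 is 16 bits per sample
CHUNK_MS = 20             # chunk duration in milliseconds
CHUNK_BYTES = SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * CHUNK_MS // 1000


def iter_chunks(pcm: bytes, pad_last: bool = True) -> Iterator[bytes]:
    """Yield consecutive 20 ms chunks from a raw PCM16 byte buffer."""
    for offset in range(0, len(pcm), CHUNK_BYTES):
        chunk = pcm[offset:offset + CHUNK_BYTES]
        if pad_last and len(chunk) < CHUNK_BYTES:
            # Assumption: zero bytes (digital silence in signed PCM16)
            # are an acceptable way to fill the final short chunk.
            chunk += b"\x00" * (CHUNK_BYTES - len(chunk))
        yield chunk
```

Each yielded chunk would then be sent as one binary WebSocket frame, ideally paced at roughly one chunk every 20 ms when streaming live audio.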
WebSocket handshake (client sends):
{
"type": "config",
"language": "en",
"session_key": "eyJ..."
}
Server ready response:
{
"type": "ready",
"remaining_seconds": 300
}
Transcription response (server sends for each utterance):
{
"type": "transcription",
"text": "Hello, how are you?",
"start": 1.5,
"end": 2.8,
"confidence": 0.95,
"is_final": true
}
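The message shapes above can be built and consumed with plain JSON handling, independent of any particular WebSocket library. The sketch below is an illustration, not a client implementation: `config_message` and `handle_message` are hypothetical names, and treating `is_final: false` as an interim result to skip is an assumption about the service's behavior.

```python
import json
from typing import List, Optional


def config_message(session_key: str, language: str = "en") -> str:
    """Serialize the config message sent right after connecting."""
    return json.dumps(
        {"type": "config", "language": language, "session_key": session_key}
    )


def handle_message(raw: str, transcript: List[str]) -> Optional[str]:
    """Parse one server message, collect final text, return its type."""
    msg = json.loads(raw)
    kind = msg.get("type")
    # Assumption: only messages with is_final set to true carry settled text.
    if kind == "transcription" and msg.get("is_final"):
        transcript.append(msg["text"])
    return kind
```

In use, a client would send `config_message(...)` as the first text frame, wait until `handle_message` returns `"ready"`, and only then start streaming audio.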
Verify the STT service is available. No payment required.
Pricing: $0.00
obulx "https://x402stt.dtelecom.org/health"
| Endpoint | Price | Purpose |
|---|---|---|
| POST /v1/session | $0.005/min | Create and pay for a new STT session |
| POST /v1/session/extend | $0.005/min | Add time to an active session |
| GET /v1/session/{id}/status | $0.00 | Check remaining session time and usage |
| WS /v1/stream | Included | Stream audio and receive transcriptions |
| GET /health | $0.00 | Service availability check |
| GET /pricing | $0.00 | Retrieve current rate information |
The server sends session_expiring messages at 60 and 10 seconds remaining. Use these to trigger automatic session extension.

| Error | Cause | Solution |
|---|---|---|
| 402 Payment Required | x402 payment not processed | Verify your account has sufficient balance at my.obul.ai. Run obulx login if not authenticated. |
| 400 Bad Request | Missing or invalid parameters | Ensure minutes (5-120) and language are provided and correctly typed. |
| 404 Not Found | Session ID does not exist | Verify the session ID is correct. Sessions expire after their purchased time runs out. |
| WS 4001 | Config timeout or invalid config | Send the config message with a valid session_key immediately after the WebSocket connection opens. |
| WS 4002 | Session expired or not found | Create a new session. The previous session's time has been exhausted. |
| WS 4003 | Authentication failure | Verify the session_key matches the one returned from session creation. |
| WS 4004 | Session already connected elsewhere | Only one WebSocket connection per session is allowed. Close the other connection first. |
| 500 Internal Server Error | Upstream dTelecom service issue | Wait a few seconds and retry. If the error persists, check /health for service status. |
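The WebSocket rows of the error table can be turned into a small lookup so that a client reacts consistently when a connection drops. This is a sketch under assumptions: it treats the WS 4001 to 4004 values as WebSocket close codes, and the `Action` names and `action_for_close` function are hypothetical.

```python
from enum import Enum
from typing import Optional


class Action(Enum):
    RESEND_CONFIG = "reconnect and send a valid config message immediately"
    NEW_SESSION = "create a new session"
    CHECK_SESSION_KEY = "verify the session_key from session creation"
    CLOSE_OTHER_CONNECTION = "close the other connection first"


# Assumption: the WS 4001-4004 entries arrive as WebSocket close codes.
WS_CLOSE_ACTIONS = {
    4001: Action.RESEND_CONFIG,           # config timeout or invalid config
    4002: Action.NEW_SESSION,             # session expired or not found
    4003: Action.CHECK_SESSION_KEY,       # authentication failure
    4004: Action.CLOSE_OTHER_CONNECTION,  # already connected elsewhere
}


def action_for_close(code: int) -> Optional[Action]:
    """Return the suggested action for a close code, or None if unlisted."""
    return WS_CLOSE_ACTIONS.get(code)
```

Codes outside this table return `None`, leaving the caller to fall back to its own reconnect policy.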
npx claudepluginhub polymerdao/pay-plugin --plugin obul-media