From elevenlabs-pack
Implements ElevenLabs rate limiting with plan-aware concurrency queuing, exponential backoff, and 429 error handling for TTS requests.
How this skill is triggered — by the user, by Claude, or both
Slash command
/elevenlabs-pack:elevenlabs-rate-limitsThis skill is limited to the following tools:
The summary Claude sees in its skill listing — used to decide when to auto-load this skill
Handle ElevenLabs rate limits with plan-aware concurrency queuing, exponential backoff, and quota monitoring. ElevenLabs uses two rate limit mechanisms: concurrent request limits (per plan) and system-level throttling.
Handle ElevenLabs rate limits with plan-aware concurrency queuing, exponential backoff, and quota monitoring. ElevenLabs uses two rate limit mechanisms: concurrent request limits (per plan) and system-level throttling.
p-queue package (recommended): npm install p-queueElevenLabs returns HTTP 429 for two different reasons:
| 429 Variant | Response Body | Cause | Strategy |
|---|---|---|---|
too_many_concurrent_requests | {"detail":{"status":"too_many_concurrent_requests"}} | Exceeded plan concurrency | Queue requests, don't backoff |
system_busy | {"detail":{"status":"system_busy"}} | Server overload | Exponential backoff |
| Plan | Max Concurrent Requests | Characters/Month |
|---|---|---|
| Free | 2 | 10,000 |
| Starter | 3 | 30,000 |
| Creator | 5 | 100,000 |
| Pro | 10 | 500,000 |
| Scale | 15 | 2,000,000 |
| Business | 15 | Custom |
// src/elevenlabs/rate-limiter.ts
import PQueue from "p-queue";
type ElevenLabsPlan = "free" | "starter" | "creator" | "pro" | "scale" | "business";
const CONCURRENCY_LIMITS: Record<ElevenLabsPlan, number> = {
free: 2,
starter: 3,
creator: 5,
pro: 10,
scale: 15,
business: 15,
};
export function createRequestQueue(plan: ElevenLabsPlan) {
const concurrency = CONCURRENCY_LIMITS[plan];
const queue = new PQueue({
concurrency,
// Each queued request adds ~50ms to response time
// so keep queue depth reasonable
timeout: 120_000, // 2 minute timeout per request
throwOnTimeout: true,
});
queue.on("error", (error) => {
console.error("[ElevenLabs Queue] Request failed:", error.message);
});
return queue;
}
// Usage
const queue = createRequestQueue("pro"); // 10 concurrent
async function generateWithQueue(voiceId: string, text: string) {
return queue.add(async () => {
return client.textToSpeech.convert(voiceId, {
text,
model_id: "eleven_flash_v2_5",
});
});
}
// All 20 requests run with max 10 concurrent
const results = await Promise.all(
texts.map(text => generateWithQueue("21m00Tcm4TlvDq8ikWAM", text))
);
// src/elevenlabs/backoff.ts
export async function withBackoff<T>(
operation: () => Promise<T>,
config = {
maxRetries: 5,
baseDelayMs: 1000,
maxDelayMs: 32_000,
jitterMs: 500,
}
): Promise<T> {
for (let attempt = 0; attempt <= config.maxRetries; attempt++) {
try {
return await operation();
} catch (error: any) {
const status = error.statusCode || error.status;
const errorType = error.body?.detail?.status;
// Don't retry non-retryable errors
if (status === 401 || status === 400 || status === 404) throw error;
// For concurrent limit, retry immediately (queue handles spacing)
if (errorType === "too_many_concurrent_requests") {
if (attempt === config.maxRetries) throw error;
// Short pause — the queue is managing concurrency
await new Promise(r => setTimeout(r, 50 * (attempt + 1)));
continue;
}
// For system_busy or 5xx, exponential backoff with jitter
if (attempt === config.maxRetries) throw error;
const exponentialDelay = config.baseDelayMs * Math.pow(2, attempt);
const jitter = Math.random() * config.jitterMs;
const delay = Math.min(exponentialDelay + jitter, config.maxDelayMs);
console.warn(`[ElevenLabs] ${errorType || status}. Retry ${attempt + 1}/${config.maxRetries} in ${delay.toFixed(0)}ms`);
await new Promise(r => setTimeout(r, delay));
}
}
throw new Error("Unreachable");
}
// src/elevenlabs/quota-monitor.ts
import { ElevenLabsClient } from "@elevenlabs/elevenlabs-js";
export class QuotaMonitor {
private characterCount = 0;
private characterLimit = 0;
private lastCheck = 0;
constructor(
private client: ElevenLabsClient,
private warningThresholdPct = 80,
private checkIntervalMs = 60_000
) {}
async check(): Promise<{
used: number;
limit: number;
remaining: number;
pctUsed: number;
warning: boolean;
}> {
const now = Date.now();
if (now - this.lastCheck > this.checkIntervalMs) {
const user = await this.client.user.get();
this.characterCount = user.subscription.character_count;
this.characterLimit = user.subscription.character_limit;
this.lastCheck = now;
}
const remaining = this.characterLimit - this.characterCount;
const pctUsed = (this.characterCount / this.characterLimit) * 100;
return {
used: this.characterCount,
limit: this.characterLimit,
remaining,
pctUsed: Math.round(pctUsed * 10) / 10,
warning: pctUsed >= this.warningThresholdPct,
};
}
async guardRequest(textLength: number): Promise<void> {
const quota = await this.check();
if (textLength > quota.remaining) {
throw new Error(
`Insufficient quota: need ${textLength} chars, have ${quota.remaining} remaining (${quota.pctUsed}% used)`
);
}
if (quota.warning) {
console.warn(`[ElevenLabs] Quota warning: ${quota.pctUsed}% used (${quota.remaining} chars remaining)`);
}
}
}
// src/elevenlabs/resilient-client.ts
import { ElevenLabsClient } from "@elevenlabs/elevenlabs-js";
import { createRequestQueue } from "./rate-limiter";
import { withBackoff } from "./backoff";
import { QuotaMonitor } from "./quota-monitor";
export function createResilientClient(plan: "free" | "starter" | "creator" | "pro" | "scale" = "pro") {
const client = new ElevenLabsClient({ maxRetries: 0 }); // We handle retries
const queue = createRequestQueue(plan);
const quota = new QuotaMonitor(client);
return {
async generateSpeech(voiceId: string, text: string, modelId = "eleven_multilingual_v2") {
await quota.guardRequest(text.length);
return queue.add(() =>
withBackoff(() =>
client.textToSpeech.convert(voiceId, {
text,
model_id: modelId,
})
)
);
},
getQueueStats() {
return {
pending: queue.pending,
size: queue.size,
};
},
checkQuota: () => quota.check(),
};
}
| Model | Credits per Character | 10,000 Chars Cost |
|---|---|---|
eleven_v3 | 1.0 | 10,000 credits |
eleven_multilingual_v2 | 1.0 | 10,000 credits |
eleven_flash_v2_5 | 0.5 | 5,000 credits |
eleven_turbo_v2_5 | 0.5 | 5,000 credits |
Use Flash/Turbo models during development to conserve quota.
| Scenario | Detection | Response |
|---|---|---|
| Concurrent limit hit | 429 + too_many_concurrent_requests | Queue; retry after ~50ms per queued request |
| System busy | 429 + system_busy | Exponential backoff (1s, 2s, 4s, 8s...) |
| Quota exhausted | 401 + quota_exceeded | Stop requests; alert; wait for reset |
| Server error | 500-599 | Exponential backoff; max 5 retries |
For security configuration, see elevenlabs-security-basics.
npx claudepluginhub jeremylongshore/claude-code-plugins-plus-skills --plugin elevenlabs-packProduction deployment patterns for ElevenLabs API including rate limiting, error handling, monitoring, and testing. Use when deploying to production, implementing rate limiting, setting up monitoring, handling errors, testing concurrency, or when user mentions production deployment, rate limits, error handling, monitoring, ElevenLabs production.
Applies production-ready ElevenLabs SDK patterns for TypeScript and Python, including singleton clients, type-safe TTS with voice presets, retries, and error handling for audio AI integrations.
Implements concurrency queues with p-limit, stats, and backoff for Deepgram API to handle 429 rate limits and quotas.