From retellai-pack
Choose and implement Retell AI validated architecture blueprints for different scales. Use when designing new Retell AI integrations, choosing between monolith/service/microservice architectures, or planning migration paths for Retell AI applications. Trigger with phrases like "retellai architecture", "retellai blueprint", "how to structure retellai", "retellai project layout", "retellai microservice".
Slash command: /retellai-pack:retellai-architecture-variants
Deployment architectures for Retell AI voice agents at different scales. Voice AI systems require real-time processing with strict latency budgets -- architecture choices directly impact call quality.
Single Server. Best for: prototyping, fewer than 10 concurrent calls, single agent.
Retell Platform -> WebSocket -> Your Webhook Server -> LLM API
                                         |
                               Local State (memory)
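import express from 'express';

const app = express();
app.use(express.json()); // parse JSON bodies so req.body is populated

// In-memory per-call state: lost on restart and not shared across processes
const callState = new Map();

// Placeholder: replace with your actual LLM call
async function generateResponse(state) {
  return `You said: ${state.history[state.history.length - 1]}`;
}

app.post('/retell-webhook', async (req, res) => {
  const { call_id, transcript } = req.body;
  const state = callState.get(call_id) || { history: [] };
  state.history.push(transcript);
  const response = await generateResponse(state);
  callState.set(call_id, state);
  res.json({ response }); // Must respond in < 1 second
});

app.listen(3000); // port is an arbitrary example value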
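The single-server handler has to answer within its latency budget even when the LLM is slow. Below is a minimal sketch of one way to enforce that. The `withTimeout` helper, the `fakeLlm` stand-in, and the fallback phrase are hypothetical and are not part of the Retell SDK.

```typescript
// Hypothetical helper: resolve with `fallback` if `work` takes longer than `ms`.
// Illustrative only; not part of the Retell SDK.
async function withTimeout<T>(work: Promise<T>, ms: number, fallback: T): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<T>((resolve) => {
    timer = setTimeout(() => resolve(fallback), ms);
  });
  try {
    // Whichever settles first wins; a late LLM answer is ignored.
    return await Promise.race([work, timeout]);
  } finally {
    if (timer !== undefined) clearTimeout(timer);
  }
}

// Stand-in for an LLM call that takes `delayMs` to answer.
function fakeLlm(reply: string, delayMs: number): Promise<string> {
  return new Promise((resolve) => setTimeout(() => resolve(reply), delayMs));
}
```

In the webhook handler this would wrap the `generateResponse(state)` call, with a short filler phrase as the fallback so the caller hears something while the slow path is abandoned.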
Distributed. Best for: 10-100 concurrent calls, multiple agents, production.
Retell Platform -> Load Balancer -> Webhook Server 1
                                 -> Webhook Server 2
                                 -> Webhook Server 3
                                           |
                                 Redis (shared state)
                                           |
                                   LLM API (cached)
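import { createHash } from 'node:crypto';
import Redis from 'ioredis'; // assumption: ioredis client; any client with get/setex works

// Minimal shape of the LLM client this handler relies on
interface LLMClient {
  generate(context: { history: string[] }): Promise<string>;
}

class DistributedCallHandler {
  constructor(private redis: Redis, private llm: LLMClient) {}

  // Stable cache key component derived from the transcript text
  private hash(text: string): string {
    return createHash('sha256').update(text).digest('hex');
  }

  async handleTurn(callId: string, transcript: string): Promise<string> {
    const state = await this.redis.get(`call:${callId}`);
    const context = JSON.parse(state || '{"history":[]}');
    context.history.push(transcript);

    // Cache common responses for < 100ms latency
    const cacheKey = `response:${this.hash(transcript)}`;
    let response = await this.redis.get(cacheKey);
    if (!response) {
      response = await this.llm.generate(context);
      await this.redis.setex(cacheKey, 3600, response); // expires after 1 hour
    }

    await this.redis.setex(`call:${callId}`, 3600, JSON.stringify(context)); // expires after 1 hour
    return response;
  }
}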
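The `hash` call in the handler decides which transcripts share a cache entry. Below is a minimal sketch of one possible key function, assuming transcripts are normalized before being hashed with SHA-256. The function names and the ASCII-only normalization rules are illustrative assumptions.

```typescript
import { createHash } from "node:crypto";

// Illustrative normalization (ASCII only): lowercase, drop punctuation,
// collapse runs of whitespace, trim the ends.
function normalizeTranscript(transcript: string): string {
  return transcript
    .toLowerCase()
    .replace(/[^a-z0-9\s]/g, "")
    .replace(/\s+/g, " ")
    .trim();
}

// Builds the `response:<digest>` key used for the shared response cache.
function responseCacheKey(transcript: string): string {
  const digest = createHash("sha256")
    .update(normalizeTranscript(transcript))
    .digest("hex");
  return `response:${digest}`;
}
```

Keying on the transcript alone ignores conversation history, so this kind of cache only suits questions whose answer does not depend on earlier turns (opening hours, addresses, and similar).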
Event-Driven. Best for: 100+ concurrent calls, complex flows, analytics.
Retell Platform -> API Gateway -> Webhook Service -> Redis (state)
                                                  -> Event Bus (Kafka)
                                                             |
                                              +--------------+------------+
                                              |              |            |
                                          Analytics    Transcription  Escalation
                                           Service        Archive      Handler
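// Assumed shapes; adapt to your actual Retell payload and event bus client
interface RetellEvent { call_id: string; transcript: string; }
interface EventBus { emit(topic: string, payload: unknown): Promise<void>; }

class VoicePipeline {
  constructor(
    private eventBus: EventBus,
    private generateFast: (event: RetellEvent) => Promise<string>,
  ) {}

  async handleCall(event: RetellEvent): Promise<string> {
    // Fast response path (< 500ms budget)
    const response = await this.generateFast(event);

    // Async: emit without awaiting so downstream processing never delays the reply
    void this.eventBus
      .emit('call.turn', { callId: event.call_id, transcript: event.transcript, response })
      .catch((err) => console.error('call.turn emit failed', err));

    return response;
  }
}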
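The fast path should not wait on downstream consumers. Below is a minimal sketch of that separation, with a hypothetical in-memory bus standing in for Kafka. `InMemoryBus`, `handleTurn`, and the echo reply are illustrative names and are not part of any Retell or Kafka API.

```typescript
type TurnEvent = { callId: string; transcript: string; response: string };
type Subscriber = (event: TurnEvent) => Promise<void>;

// Hypothetical in-memory stand-in for an event bus such as Kafka.
class InMemoryBus {
  private subscribers: Subscriber[] = [];

  subscribe(fn: Subscriber): void {
    this.subscribers.push(fn);
  }

  // Resolves once every subscriber has finished processing the event.
  async emit(event: TurnEvent): Promise<void> {
    await Promise.all(this.subscribers.map((fn) => fn(event)));
  }
}

async function handleTurn(bus: InMemoryBus, callId: string, transcript: string): Promise<string> {
  const response = `Echo: ${transcript}`; // placeholder for the fast LLM path

  // Fire and forget: do not await, but log failures so they are not silently lost.
  void bus.emit({ callId, transcript, response }).catch((err) => {
    console.error("event emit failed", err);
  });

  return response;
}
```

The trade-off is durability: if the process dies before the emit completes, that turn never reaches analytics or the archive, so flows that cannot tolerate loss need a more durable hand-off.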
| Factor | Single Server | Distributed | Event-Driven |
|---|---|---|---|
| Concurrent Calls | < 10 | 10-100 | 100+ |
| Latency Budget | 800ms | 500ms | 300ms |
| State | In-memory | Redis | Redis + Events |
| Scaling | Vertical | Horizontal | Auto-scaling |
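
The thresholds in the decision matrix can be sketched as a small lookup. The function and type names are illustrative. Because the matrix lists both "10-100" and "100+", this sketch assumes exactly 100 concurrent calls falls into the event-driven tier.

```typescript
type Variant = "single-server" | "distributed" | "event-driven";

interface VariantProfile {
  variant: Variant;
  latencyBudgetMs: number;
  state: string;
  scaling: string;
}

// Values copied from the decision matrix; the boundary at 100 is an assumption.
function chooseVariant(concurrentCalls: number): VariantProfile {
  if (concurrentCalls < 10) {
    return { variant: "single-server", latencyBudgetMs: 800, state: "in-memory", scaling: "vertical" };
  }
  if (concurrentCalls < 100) {
    return { variant: "distributed", latencyBudgetMs: 500, state: "redis", scaling: "horizontal" };
  }
  return { variant: "event-driven", latencyBudgetMs: 300, state: "redis + events", scaling: "auto-scaling" };
}
```

Concurrency is only one input. Complex flows or analytics requirements may justify the event-driven variant at lower call volumes.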

Common issues and fixes:

| Issue | Cause | Solution |
|---|---|---|
| Calls drop under load | Single server bottleneck | Scale to distributed architecture |
| Lost call state | Server restart | Move state to Redis |
| High latency | LLM response too slow | Pre-cache common responses |
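
Moving state to Redis is easier when handlers depend on a small store interface rather than on a `Map` directly. Below is a minimal sketch, assuming a hypothetical `CallStateStore` interface. Only an in-memory implementation is shown; a Redis-backed one would implement the same two methods.

```typescript
interface CallState {
  history: string[];
}

// Hypothetical interface; not part of any Retell or Redis library.
interface CallStateStore {
  load(callId: string): Promise<CallState>;
  save(callId: string, state: CallState): Promise<void>;
}

class InMemoryCallStateStore implements CallStateStore {
  private states = new Map<string, string>();

  async load(callId: string): Promise<CallState> {
    // Stored as JSON so behavior matches a serialized backend such as Redis.
    const raw = this.states.get(callId);
    return raw ? (JSON.parse(raw) as CallState) : { history: [] };
  }

  async save(callId: string, state: CallState): Promise<void> {
    this.states.set(callId, JSON.stringify(state));
  }
}

// Handler logic depends only on the interface, so the backend can be swapped.
async function recordTurn(store: CallStateStore, callId: string, transcript: string): Promise<number> {
  const state = await store.load(callId);
  state.history.push(transcript);
  await store.save(callId, state);
  return state.history.length;
}
```

The load-modify-save sequence is not atomic, so two servers handling turns for the same call at the same moment could overwrite each other's update. A shared backend may need per-call ordering or an atomic append.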
Basic usage: start with the single-server variant for a prototype, using in-memory state and default configuration.
Advanced scenario: for production, move to the distributed or event-driven variant as concurrent calls grow, keeping call state in Redis so webhook servers can scale horizontally.
npx claudepluginhub nickloveinvesting/nick-love-plugins --plugin retellai-pack
Provides Retell AI SDK patterns, architecture variants, and error handling for voice agents and phone call automation.