retellai-architecture-variants | retellai-pack

Retell AI Architecture Variants

Overview

This guide covers three deployment architectures for Retell AI voice agents at different scales. Voice AI systems require real-time processing with strict latency budgets, so architecture choices directly affect call quality.

Prerequisites

Retell AI account with agent configured
Understanding of WebSocket real-time communication
Infrastructure that can meet voice-processing latency requirements

Instructions

Step 1: Single Webhook Server (Simple)

Best for: Prototyping, < 10 concurrent calls, single agent.

Retell Platform -> WebSocket -> Your Webhook Server -> LLM API
                                       |
                                  Local State (memory)

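import express from 'express';

const app = express();
app.use(express.json());  // Parse JSON bodies so req.body is populated
// In-memory state: lost on restart and not shared across processes
const callState = new Map();

// Placeholder: replace with a call to your LLM provider
async function generateResponse(state) {
  return `You said: ${state.history[state.history.length - 1]}`;
}

app.post('/retell-webhook', async (req, res) => {
  const { call_id, transcript } = req.body;
  const state = callState.get(call_id) || { history: [] };
  state.history.push(transcript);
  const response = await generateResponse(state);
  callState.set(call_id, state);
  res.json({ response });  // Must respond < 1 second
});

app.listen(3000);  // Example port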
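
The in-memory Map in this variant grows for as long as the process runs, because nothing removes finished calls. One way to bound it is sketched below, assuming a time-to-live per call. `InMemoryCallStore`, its method names, and the injectable clock are hypothetical and not part of any Retell AI SDK.

```typescript
// Hypothetical helper: an in-memory store that forgets calls after a TTL.
interface CallState {
  history: string[];
}

class InMemoryCallStore {
  private entries = new Map<string, { state: CallState; expiresAt: number }>();
  private ttlMs: number;
  private now: () => number;

  // `now` is injectable so the expiry logic can be exercised without waiting.
  constructor(ttlMs: number, now: () => number = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  // Returns the stored state, or a fresh one if the call is unknown or expired.
  get(callId: string): CallState {
    const entry = this.entries.get(callId);
    if (entry && entry.expiresAt > this.now()) return entry.state;
    return { history: [] };
  }

  // Each write pushes the expiry forward by the TTL.
  set(callId: string, state: CallState): void {
    this.entries.set(callId, { state, expiresAt: this.now() + this.ttlMs });
  }

  // Call when a call ends so its memory is freed immediately.
  end(callId: string): void {
    this.entries.delete(callId);
  }

  // Drops expired entries and returns how many were removed.
  sweep(): number {
    let removed = 0;
    this.entries.forEach((entry, callId) => {
      if (entry.expiresAt <= this.now()) {
        this.entries.delete(callId);
        removed++;
      }
    });
    return removed;
  }

  get size(): number {
    return this.entries.size;
  }
}
```

In the handler, `store.get(call_id)` and `store.set(call_id, state)` would replace the direct Map calls, with `sweep()` run periodically, for example from `setInterval`.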

Step 2: Distributed Webhook with Shared State (Production)

Best for: 10-100 concurrent calls, multiple agents, production.

Retell Platform -> Load Balancer -> Webhook Server 1
                                 -> Webhook Server 2
                                 -> Webhook Server 3
                                         |
                                    Redis (shared state)
                                         |
                                    LLM API (cached)

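import { createHash } from 'node:crypto';
import Redis from 'ioredis';

// Placeholder interface: substitute your own LLM client
interface LLMClient {
  generate(context: { history: string[] }): Promise<string>;
}

class DistributedCallHandler {
  constructor(private redis: Redis, private llm: LLMClient) {}

  async handleTurn(callId: string, transcript: string): Promise<string> {
    const state = await this.redis.get(`call:${callId}`);
    const context = JSON.parse(state || '{"history":[]}');
    context.history.push(transcript);

    // Cache common responses for < 100ms latency.
    // The key ignores conversation history, so cache only context-free replies.
    const cacheKey = `response:${this.hash(transcript)}`;
    let response = await this.redis.get(cacheKey);
    if (!response) {
      response = await this.llm.generate(context);
      await this.redis.setex(cacheKey, 3600, response);  // TTL: 1 hour
    }
    await this.redis.setex(`call:${callId}`, 3600, JSON.stringify(context));  // TTL: 1 hour
    return response;
  }

  // Stable cache key for a transcript
  private hash(text: string): string {
    return createHash('sha256').update(text).digest('hex');
  }
}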
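
A response cache keyed on the raw transcript misses whenever two utterances differ only in case, punctuation, or spacing. The sketch below shows one possible key scheme that normalizes the transcript first and scopes the key by agent. `normalizeTranscript`, `fnv1a`, `responseCacheKey`, and the `agentId` parameter are illustrative assumptions, and FNV-1a stands in here for whatever hash the handler actually uses.

```typescript
// Hypothetical helpers; the names and the choice of FNV-1a are illustrative.

// Collapse case, common punctuation, and whitespace so near-identical
// utterances produce the same string.
function normalizeTranscript(text: string): string {
  return text
    .toLowerCase()
    .replace(/[.,!?;:'"]/g, '')
    .replace(/\s+/g, ' ')
    .trim();
}

// 32-bit FNV-1a over UTF-16 code units: a small non-cryptographic hash,
// adequate for cache keys but not for security purposes.
function fnv1a(text: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return ('0000000' + hash.toString(16)).slice(-8);
}

// Scope by agent so agents with different prompts never share cached replies.
function responseCacheKey(agentId: string, transcript: string): string {
  return `response:${agentId}:${fnv1a(normalizeTranscript(transcript))}`;
}
```

With this scheme, "What are your hours?" and "what are  your hours" map to the same key for a given agent. The key still ignores conversation history, so it suits only replies that do not depend on earlier turns.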

Step 3: Event-Driven Voice Pipeline (Scale)

Best for: 100+ concurrent calls, complex flows, analytics.

Retell Platform -> API Gateway -> Webhook Service -> Redis (state)
                                                  -> Event Bus (Kafka)
                                                         |
                                          +--------------+------------+
                                          |              |            |
                                    Analytics      Transcription   Escalation
                                     Service        Archive       Handler

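// Placeholder types: adapt them to your Kafka client and Retell payload
interface RetellEvent { call_id: string; transcript: string; }
interface EventBus { emit(topic: string, payload: unknown): Promise<void>; }

class VoicePipeline {
  constructor(
    private eventBus: EventBus,
    private generateFast: (event: RetellEvent) => Promise<string>,
  ) {}

  async handleCall(event: RetellEvent): Promise<string> {
    // Fast response path (< 500ms budget)
    const response = await this.generateFast(event);
    // Async: emit without awaiting so downstream processing never delays the reply
    void this.eventBus.emit('call.turn', {
      callId: event.call_id,
      transcript: event.transcript,
      response,
    }).catch((err) => console.error('call.turn emit failed', err));
    return response;
  }
}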
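
The fan-out in the diagram, where one `call.turn` event reaches analytics, archive, and escalation consumers independently, can be tried locally before introducing Kafka. What follows is a minimal in-process sketch of that pattern. `InProcessEventBus` and `TurnHandler` are hypothetical names, and a real deployment would replace `emit` with a Kafka producer call.

```typescript
// Hypothetical in-process stand-in for the event bus.
type TurnHandler = (payload: unknown) => void;

class InProcessEventBus {
  private handlers = new Map<string, TurnHandler[]>();

  subscribe(topic: string, handler: TurnHandler): void {
    const list = this.handlers.get(topic) ?? [];
    list.push(handler);
    this.handlers.set(topic, list);
  }

  // Deliver to every subscriber. A throwing consumer is counted and skipped
  // so it cannot block the other consumers or the caller.
  emit(topic: string, payload: unknown): { delivered: number; failed: number } {
    let delivered = 0;
    let failed = 0;
    for (const handler of this.handlers.get(topic) ?? []) {
      try {
        handler(payload);
        delivered++;
      } catch {
        failed++;
      }
    }
    return { delivered, failed };
  }
}

// Usage: three consumers mirroring the diagram; the escalation one fails.
const bus = new InProcessEventBus();
const archived: unknown[] = [];
let turnsSeen = 0;
bus.subscribe('call.turn', () => { turnsSeen++; });
bus.subscribe('call.turn', (payload) => { archived.push(payload); });
bus.subscribe('call.turn', () => { throw new Error('escalation handler down'); });
const result = bus.emit('call.turn', { callId: 'call_123', transcript: 'hello' });
```

The failing escalation consumer does not stop the analytics and archive consumers from receiving the event, which is the isolation property the event-driven variant relies on.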

Decision Matrix

Factor	Single Server	Distributed	Event-Driven
Concurrent Calls	< 10	10-100	100+
Latency Budget	800ms	500ms	300ms
State	In-memory	Redis	Redis + Events
Scaling	Vertical	Horizontal	Auto-scaling
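
The matrix can also be expressed as a small lookup, which is handy for capacity-planning scripts. The sketch below encodes the table's values as given. `chooseArchitecture` and `ArchitectureProfile` are hypothetical names, and treating exactly 100 concurrent calls as event-driven is an assumption, since the table lists 100 in both ranges.

```typescript
// Hypothetical encoding of the decision matrix above.
type Architecture = 'single-server' | 'distributed' | 'event-driven';

interface ArchitectureProfile {
  architecture: Architecture;
  latencyBudgetMs: number;
  state: string;
  scaling: string;
}

// Picks a variant from the expected peak number of concurrent calls.
// Assumption: exactly 100 calls falls into the "100+" column.
function chooseArchitecture(concurrentCalls: number): ArchitectureProfile {
  if (concurrentCalls < 10) {
    return { architecture: 'single-server', latencyBudgetMs: 800, state: 'In-memory', scaling: 'Vertical' };
  }
  if (concurrentCalls < 100) {
    return { architecture: 'distributed', latencyBudgetMs: 500, state: 'Redis', scaling: 'Horizontal' };
  }
  return { architecture: 'event-driven', latencyBudgetMs: 300, state: 'Redis + Events', scaling: 'Auto-scaling' };
}
```

For example, `chooseArchitecture(40)` selects the distributed variant with its 500 ms budget.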

Error Handling

Issue	Cause	Solution
Calls drop under load	Single server bottleneck	Scale to distributed architecture
Lost call state	Server restart	Move state to Redis
High latency	LLM response too slow	Pre-cache common responses
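
Pre-caching helps only with utterances seen before. For everything else, a complementary guard is to cap how long a turn waits on the LLM and fall back to a short filler reply. The sketch below assumes that approach; `withLatencyBudget` and the filler text are hypothetical, and whether a filler reply is acceptable depends on the agent's conversation design.

```typescript
// Hypothetical helper: resolve with `fallback` if `work` has not settled
// within `budgetMs`, or if it rejects. The returned promise never rejects.
function withLatencyBudget<T>(work: Promise<T>, budgetMs: number, fallback: T): Promise<T> {
  return new Promise<T>((resolve) => {
    const timer = setTimeout(() => resolve(fallback), budgetMs);
    work.then(
      (value) => {
        clearTimeout(timer);
        resolve(value);
      },
      () => {
        clearTimeout(timer);
        resolve(fallback);
      },
    );
  });
}

// Usage: a reply that is already available wins over the fallback.
const quickReply = withLatencyBudget(Promise.resolve('We open at 9am.'), 500, 'One moment, please.');
```

The slow LLM call is not cancelled by this wrapper; it only stops the caller from waiting on it, so a late result is discarded.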

Output

Configuration files or code changes applied to the project
Validation report confirming correct implementation
Summary of changes made and their rationale

Examples

Basic usage: Apply a Retell AI architecture variant to a standard project setup with default configuration options.

Advanced scenario: Customize a Retell AI architecture variant for a production environment with multiple constraints and team-specific requirements.