agent-runtime-config | adk-deployment

agent-runtime-config

Configure the ADK 2.0 runtime for production: concurrency, retries, timeouts, callbacks, custom services.

Runner with full config

from google.adk.runners import Runner, RunnerConfig
from google.adk.sessions import VertexAiSessionService
from google.adk.artifacts import GcsArtifactService

session_service = VertexAiSessionService(project=..., location=...)
artifact_service = GcsArtifactService(bucket_name="my-artifacts")

runner = Runner(
    agent=root_agent,
    session_service=session_service,
    artifact_service=artifact_service,
    config=RunnerConfig(
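        max_concurrent_invocations=50,       # cap on invocations in flight at once
        per_invocation_timeout_seconds=120,  # budget for one full invocation
        tool_call_timeout_seconds=30,        # budget for a single tool call
        model_call_timeout_seconds=60,       # budget for a single model call
        retry_policy={
            "max_retries": 3,
            "backoff_factor": 2.0,  # multiplier applied to the wait between retries
            "retryable_errors": ["RateLimitError", "ServiceUnavailable"],
        },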
    ),
)
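
The retry_policy values above can be read as exponential backoff. The following is a minimal standalone sketch of that reading, not the runtime's implementation: it assumes the wait before retry n is base_delay * backoff_factor ** n, and the names RateLimitError, backoff_delays, call_with_retries, and base_delay are hypothetical stand-ins introduced for illustration.

```python
import asyncio


class RateLimitError(Exception):
    """Hypothetical stand-in for a transient provider error."""


def backoff_delays(max_retries: int, backoff_factor: float, base_delay: float = 1.0) -> list:
    """Wait before each retry, assuming delay = base_delay * backoff_factor ** attempt."""
    return [base_delay * backoff_factor ** attempt for attempt in range(max_retries)]


async def call_with_retries(fn, *, max_retries=3, backoff_factor=2.0, base_delay=1.0,
                            retryable=(RateLimitError,), sleep=asyncio.sleep):
    """Await fn(); on a retryable error, wait and try again, up to max_retries retries."""
    delays = backoff_delays(max_retries, backoff_factor, base_delay)
    for attempt in range(max_retries + 1):
        try:
            return await fn()
        except retryable:
            if attempt == max_retries:
                raise  # retries exhausted: surface the last error
            await sleep(delays[attempt])
```

Under this assumption, max_retries=3 with backoff_factor=2.0 waits 1, 2, and 4 seconds before the three retries, so the worst-case added latency is 7 seconds. That total should fit inside per_invocation_timeout_seconds, or the invocation times out before the last retry runs.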

Custom FastAPI service registration

ADK 2.0 supports injecting custom services into the FastAPI server:

from google.adk.web import create_app

class AuditService:
    def log(self, msg: str) -> None:
        print(f"[AUDIT] {msg}")

app = create_app(
    agent=root_agent,
    session_service=session_service,
    custom_services={"audit": AuditService()},
)

Tools and callbacks can then request the `audit` service, under the key it was registered with, via dependency injection.
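
How the injection is wired is not shown here. As a rough illustration of the idea only, the sketch below resolves services by parameter name using the standard library. The helper call_with_services and the tool lookup_order are hypothetical, and this AuditService variant records entries in a list instead of printing so the effect is easy to inspect. The framework's actual mechanism may differ.

```python
import inspect


class AuditService:
    """Illustrative variant that stores entries instead of printing them."""

    def __init__(self):
        self.entries = []

    def log(self, msg: str) -> None:
        self.entries.append(msg)


def call_with_services(func, services: dict, **kwargs):
    """Call func, filling any parameter whose name matches a registered service key."""
    params = inspect.signature(func).parameters
    injected = {name: svc for name, svc in services.items()
                if name in params and name not in kwargs}
    return func(**kwargs, **injected)


def lookup_order(order_id: str, audit: AuditService) -> str:
    """Hypothetical tool that asks for the audit service by parameter name."""
    audit.log(f"lookup_order({order_id})")
    return f"order {order_id}: shipped"
```

Calling call_with_services(lookup_order, {"audit": AuditService()}, order_id="42") passes the registered instance as the audit argument without the caller naming it explicitly.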

Callback registration

from google.adk.callbacks import (
    on_before_model_call,
    on_after_model_call,
    on_before_tool_call,
    on_after_tool_call,
    on_session_created,
)

@on_before_model_call
async def trim_long_history(ctx, request):
    # See context-cache-compress skill
    return request

@on_after_tool_call
async def log_tool_usage(ctx, tool_name, args, result):
    print(f"{tool_name}({args}) -> {result}")

# Register the callbacks with the runner
runner = Runner(
    agent=root_agent,
    session_service=session_service,
    callbacks=[trim_long_history, log_tool_usage],
)
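
The trim_long_history callback above returns the request unchanged. One possible trimming rule is sketched below on a plain list of message dicts. The message shape ("role" and "content" keys), the function name trim_history, and the max_messages parameter are assumptions made for illustration; the real request object is not shown in this document and may look different.

```python
def trim_history(messages: list, max_messages: int = 20) -> list:
    """Keep all system messages plus the most recent max_messages other messages."""
    system = [m for m in messages if m.get("role") == "system"]
    rest = [m for m in messages if m.get("role") != "system"]
    if max_messages <= 0:
        return system  # rest[-0:] would return everything, so handle zero explicitly
    return system + rest[-max_messages:]
```

Keeping system messages while dropping the oldest turns preserves the agent's instructions and bounds prompt size, at the cost of losing early conversation detail.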

Code executor (Vertex sandbox)

from google.adk.agents import LlmAgent
from google.adk.tools import AgentEngineSandboxCodeExecutor

code_exec = AgentEngineSandboxCodeExecutor(
    project="my-project",
    location="us-central1",
)

root_agent = LlmAgent(
    name="data_analyst",
    model="gemini-2.5-pro",
    instruction="Use the code executor to run analysis on uploaded CSVs.",
    code_executor=code_exec,
)

The executor runs LLM-generated code inside a Vertex sandbox, so untrusted code never executes on the machine hosting the agent.

Concurrency profiles

Workload	`max_concurrent_invocations`	`per_invocation_timeout_seconds`
Interactive chat (single user)	5	120
Multi-tenant API	50-200	60
Batch processing	10-20	600
Streaming voice	depends on connection count	n/a (long-lived)
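
One way to keep these profiles in code is a small lookup table. This is a sketch, not part of the runtime: ConcurrencyProfile, PROFILES, and profile_for are hypothetical names, and where the table gives a range the sketch takes the low end as a conservative starting point. Streaming voice is omitted because the table gives no fixed numbers for it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConcurrencyProfile:
    max_concurrent_invocations: int
    per_invocation_timeout_seconds: int


# Values copied from the table above; ranges use their low end.
PROFILES = {
    "interactive_chat": ConcurrencyProfile(5, 120),
    "multi_tenant_api": ConcurrencyProfile(50, 60),  # table range: 50-200
    "batch_processing": ConcurrencyProfile(10, 600),  # table range: 10-20
}


def profile_for(workload: str) -> ConcurrencyProfile:
    """Return the profile for a workload, with a readable error for unknown names."""
    try:
        return PROFILES[workload]
    except KeyError:
        raise ValueError(f"unknown workload {workload!r}; expected one of {sorted(PROFILES)}")
```

The two fields can then be passed to the runner configuration shown earlier, and raised toward the top of each range once load tests show headroom.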

Validation

A concurrent load test confirms that max_concurrent_invocations is respected
Timeouts trigger cleanly, with no leaked tasks, connections, or sessions
The retry policy fires on transient errors (test by mocking RateLimitError and ServiceUnavailable)
Callbacks execute in the right order (before vs. after; sync vs. async)
Custom services are accessible from tools
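
The first two checks can be prototyped without the real runtime. The sketch below uses a hypothetical BoundedRunner stand-in built on asyncio.Semaphore and asyncio.wait_for; it records peak concurrency so a test can assert the cap held, and it releases its slot in a finally block so a timeout does not leak capacity. Against a real deployment, the same assertions would be made on metrics from the service under load.

```python
import asyncio


class BoundedRunner:
    """Hypothetical stand-in that caps concurrency and enforces a per-call timeout."""

    def __init__(self, max_concurrent: int, timeout_seconds: float):
        self._sem = asyncio.Semaphore(max_concurrent)
        self._timeout = timeout_seconds
        self.active = 0  # invocations currently running
        self.peak = 0    # highest value of active seen so far

    async def invoke(self, work):
        async with self._sem:
            self.active += 1
            self.peak = max(self.peak, self.active)
            try:
                return await asyncio.wait_for(work(), self._timeout)
            finally:
                self.active -= 1  # runs on success, error, and timeout alike


async def load_test(n_requests: int, max_concurrent: int, timeout_seconds: float = 0.2):
    # Create the runner inside the running event loop.
    runner = BoundedRunner(max_concurrent, timeout_seconds)

    async def fast():
        await asyncio.sleep(0.01)
        return "ok"

    async def slow():
        await asyncio.sleep(timeout_seconds * 10)  # always exceeds the timeout

    jobs = [runner.invoke(fast) for _ in range(n_requests)] + [runner.invoke(slow)]
    results = await asyncio.gather(*jobs, return_exceptions=True)
    return runner, results
```

Running asyncio.run(load_test(20, 5)) returns the runner and the per-job results; a passing run has runner.peak no greater than 5, runner.active back at 0, and a timeout error as the result of the slow job.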

See also

cloud-run-deployer / gke-deployer / vertex-agent-engine-deployer for the host
logging-callback-setup for observability callbacks