From adk-evaluation
Use this skill to run user simulations against an ADK 2.0 agent — an LLM plays the role of a user with a goal, exercising multi-turn behavior. Triggers on: "ADK user simulation", "simulate users for ADK", "ADK red team simulator", "ADK conversation tester", "synthetic user ADK", "multi-turn eval ADK", "persona-based testing ADK". Generates a user-simulator agent paired with the system-under-test plus a harness that runs N personas and scores success.
Slash command
/adk-evaluation:user-simulation-runner
Run synthetic users against your ADK 2.0 agent to test multi-turn behavior. An LLM plays each user with a goal, persona, and termination criterion.
Use this skill when a static .evalset.json is too rigid.
```python
from google.adk.agents import LlmAgent


def make_user_simulator(persona: str, goal: str, max_turns: int = 10):
    """Build an LLM agent that role-plays a user with a persona and a goal."""
    return LlmAgent(
        name="simulated_user",
        model="gemini-2.5-flash",
        instruction=(
            f"You are roleplaying a user. Persona: {persona}\n"
            f"Goal: {goal}\n"
            f"Stop after {max_turns} turns or when your goal is met. "
            "Speak naturally — DO NOT mention you are an AI or that this is a simulation. "
            "Output ONLY what the user would type. When done, output exactly 'END'."
        ),
    )
```
```python
import asyncio

from google.adk.runners import Runner


async def simulate(system_agent, user_agent, max_turns=10):
    """Alternate user and system turns until the user emits END or max_turns is hit."""
    # NOTE: first_turn(), next(), and run() are shorthand for "get this agent's
    # reply". ADK agents are typically driven through a Runner, so implement
    # them as thin adapters for your ADK version.
    transcript = []
    user_msg = await user_agent.first_turn()  # opener
    for _ in range(max_turns):
        if user_msg.strip() == "END":
            break
        transcript.append(("user", user_msg))
        sys_resp = await system_agent.run(user_msg)
        transcript.append(("system", sys_resp))
        user_msg = await user_agent.next(system_response=sys_resp)
    return transcript


# Run N personas
personas = [
    {"persona": "elderly first-time user", "goal": "book a doctor's appointment"},
    {"persona": "frustrated expert", "goal": "cancel a subscription"},
    {"persona": "non-native English speaker", "goal": "reset password"},
]


async def run_all():
    results = []
    for p in personas:
        user = make_user_simulator(**p)
        # system_under_test is the agent being evaluated, defined elsewhere.
        transcript = await simulate(system_under_test, user)
        # Keep the goal next to the transcript so the judge can score against it.
        results.append({"persona": p["persona"], "goal": p["goal"], "transcript": transcript})
    return results


results = asyncio.run(run_all())
```
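The turn loop above can be tried without any model calls. The sketch below is a minimal, ADK-free version in which the simulated user and the system under test are plain async callables. `scripted_user`, `echo_system`, and `is_end` are hypothetical stand-ins, not ADK classes. It makes the two termination rules easy to check: the run stops when the user emits END or when `max_turns` is reached. Tolerating case and trailing punctuation on the END token is an added assumption, not something the skill specifies.

```python
import asyncio

END_TOKEN = "END"


def is_end(message: str) -> bool:
    """True when the simulated user signals completion.

    Assumption: tolerate surrounding whitespace, case, and a trailing '.' or '!'.
    """
    return message.strip().strip(".!").upper() == END_TOKEN


async def simulate(system_fn, user_fn, max_turns=10):
    """Alternate user and system turns; return a list of (role, text) tuples."""
    transcript = []
    user_msg = await user_fn(None)  # None asks the user for an opening message
    for _ in range(max_turns):
        if is_end(user_msg):
            break
        transcript.append(("user", user_msg))
        sys_resp = await system_fn(user_msg)
        transcript.append(("system", sys_resp))
        user_msg = await user_fn(sys_resp)
    return transcript


def scripted_user(lines):
    """Hypothetical stub user: replays fixed lines, then emits END."""
    remaining = list(lines)

    async def user_fn(_system_response):
        return remaining.pop(0) if remaining else END_TOKEN

    return user_fn


async def echo_system(user_msg):
    """Hypothetical stub for the system under test."""
    return f"You said: {user_msg}"


transcript = asyncio.run(simulate(echo_system, scripted_user(["hi", "book me a slot"])))
```

Swapping the stubs for adapters around real agents keeps the loop itself unchanged, which is useful when checking that a harness bug is not being mistaken for an agent bug.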
After each simulation, run a judge:
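```python
from google.adk.models.lite_llm import LiteLlm


async def score_goal_completion(transcript, goal):
    """Ask a judge model whether the simulated user reached the goal."""
    judge = LiteLlm(model="gemini-2.5-pro")
    prompt = (
        f"Goal: {goal}\nTranscript:\n{format_transcript(transcript)}\n"
        'Was the goal achieved? Output JSON: {"achieved": bool, "reasoning": str}.'
    )
    # NOTE: complete() is shorthand for "send the prompt, return the text", and
    # format_transcript is a helper you supply. Adapt both to your model wrapper.
    return await judge.complete(prompt)
```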
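Judge models often wrap their JSON in prose or a Markdown fence, so the raw reply is worth parsing defensively. The following is a minimal sketch, assuming the judge reply is available as a string and that transcripts are lists of (role, text) tuples as in the harness. `format_transcript` and `parse_verdict` are illustrative helpers, not part of ADK.

```python
import json
import re


def format_transcript(transcript):
    """Render (role, text) tuples as one 'role: text' line per turn."""
    return "\n".join(f"{role}: {text}" for role, text in transcript)


def parse_verdict(raw: str) -> dict:
    """Extract {"achieved": bool, "reasoning": str} from a judge reply.

    Falls back to achieved=False when no valid JSON object with a boolean
    'achieved' field is found, so a malformed reply counts as a failure
    instead of crashing the run.
    """
    match = re.search(r"\{.*\}", raw, flags=re.DOTALL)
    if match:
        try:
            data = json.loads(match.group(0))
            if isinstance(data.get("achieved"), bool):
                return {
                    "achieved": data["achieved"],
                    "reasoning": str(data.get("reasoning", "")),
                }
        except json.JSONDecodeError:
            pass
    return {"achieved": False, "reasoning": f"unparseable judge output: {raw[:80]!r}"}
```

Treating an unparseable reply as a failure is a conservative design choice: it keeps success rates from being inflated by judge formatting errors, at the cost of some false negatives that are visible in the `reasoning` field.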
Maintain personas.json:
```json
[
  {"id": "p001", "persona": "elderly first-time user", "goal": "..."},
  {"id": "p002", "persona": "frustrated expert", "goal": "..."}
]
```
Iterate on it as bugs surface — bug → new persona → eval coverage.
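A persona library is easier to maintain when it is validated on load and when per-persona verdicts roll up into a single success rate. The sketch below assumes the personas.json layout shown above and a hypothetical per-run result dict carrying an `achieved` flag. The function names and the required-key set are illustrative.

```python
import json

REQUIRED_KEYS = {"id", "persona", "goal"}


def load_personas(text: str) -> list[dict]:
    """Parse persona JSON text; reject entries with missing keys or duplicate ids."""
    personas = json.loads(text)
    seen = set()
    for entry in personas:
        missing = REQUIRED_KEYS - entry.keys()
        if missing:
            raise ValueError(f"persona {entry.get('id', '?')} is missing {sorted(missing)}")
        if entry["id"] in seen:
            raise ValueError(f"duplicate persona id: {entry['id']}")
        seen.add(entry["id"])
    return personas


def success_rate(results: list[dict]) -> float:
    """Fraction of runs whose verdict has achieved=True (0.0 when there are no runs)."""
    if not results:
        return 0.0
    return sum(1 for r in results if r.get("achieved")) / len(results)
```

In practice the text would come from the file, for example `load_personas(pathlib.Path("personas.json").read_text())`. Note that each entry carries an `id`, so it should be dropped (or accepted as a parameter) before the entry is unpacked into `make_user_simulator`.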
Checklist
END token reliably emitted when goal met
Related skills
environment-simulation for stateful environments (mock APIs, mock DBs)
custom-metric-builder to score along your rubric
Install
npx claudepluginhub healthcare-ai-consulting-llc/adk-2-toolkit --plugin adk-evaluation