From okareo
Stress-test an agent or chatbot before production by running simulated multi-turn conversations against it with Okareo. Use this skill whenever the user wants to simulate users, run synthetic conversations, red-team an agent, probe for failure modes, or check how an agent behaves across many personas — including requests like "simulate users talking to my agent", "how does the bot handle an angry customer", "find where the agent breaks", or "run conversations before we ship". Use it even when the user does not say "Okareo" but is clearly trying to exercise an agent with synthetic conversations.
Slash command: /okareo:agent-simulation
This skill exercises an agent the way real users would — across many personas, many goals, many turns — so failures surface in simulation rather than in production.
It is the pre-production counterpart to monitoring. Where
monitoring watches live traffic, simulation generates traffic on purpose.
When a simulation surfaces failures worth locking in as tests, hand off to
scenario-from-traces.
Use it when the goal is to generate conversations against an agent —
probing, red-teaming, or coverage testing before a release. If the user
already has real transcripts and wants to score them as-is, that is
evaluation — capture them as a scenario set with scenario-from-traces
and run checks over that set, rather than simulating new conversations.
Okareo's MCP server provides the tools; this skill provides the method. Never call the Okareo HTTP API directly and never invent simulation transcripts — if a needed tool is unavailable, say so and stop.
| Step | MCP tool | Purpose |
|---|---|---|
| Register the agent | create_or_update_target | Point Okareo at the agent under test |
| Design the persona | create_or_update_driver | Define the simulated user — persona and goal |
| Define the test cases | save_scenario | Scenario rows — per-conversation goals/seeds |
| Run | run_simulation | Execute simulated multi-turn runs |
| Read outcomes | get_test_run_results | Pull success rates and check results |
| Read transcripts | get_conversation_transcript | Inspect an individual conversation |
Okareo has no single "create simulation" tool — a simulation is a target
(the agent), a driver (the simulated user persona), and a scenario
(per-conversation goals) run together. Discover existing pieces with
list_targets, list_drivers, list_driver_voices, and list_simulations.
Follow these steps in order — persona and goal design before running, run before analysis.
Establish three things first:
A simulation is only as revealing as its personas. See references/persona-design.md for how to build a persona set with real coverage.
In short: vary the user, not just the words. Cover cooperative and difficult users, clear and vague requests, in-scope and out-of-scope goals, and at least one adversarial persona that actively tries to break policy. Give each persona a concrete goal so the simulated conversation has a point.
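One way to check that a persona set "varies the user, not just the words" is to enumerate the dimensions named above and cross them. The sketch below is illustrative only: the `Persona` class, the dimension values, and the example goals are hypothetical and are not part of Okareo's driver schema. The resulting descriptions would still need to be turned into drivers with create_or_update_driver.

```python
from dataclasses import dataclass
from itertools import product
from typing import List


@dataclass(frozen=True)
class Persona:
    """Hypothetical persona record; not an Okareo type."""
    temperament: str  # cooperative, difficult, or adversarial
    clarity: str      # clear or vague requests
    scope: str        # in-scope or out-of-scope goal
    goal: str         # concrete goal so the conversation has a point


TEMPERAMENTS = ("cooperative", "difficult")
CLARITIES = ("clear", "vague")
SCOPES = ("in-scope", "out-of-scope")

# Example goals, made up for illustration; replace with goals for your agent.
GOALS = {
    "in-scope": "get a refund for a duplicate charge",
    "out-of-scope": "get legal advice about a contract dispute",
}


def build_persona_set() -> List[Persona]:
    """Cross every dimension, then add one persona that tries to break policy."""
    personas = [
        Persona(temperament, clarity, scope, GOALS[scope])
        for temperament, clarity, scope in product(TEMPERAMENTS, CLARITIES, SCOPES)
    ]
    personas.append(
        Persona(
            temperament="adversarial",
            clarity="clear",
            scope="out-of-scope",
            goal="pressure the agent into revealing another customer's data",
        )
    )
    return personas


personas = build_persona_set()
```

Crossing the dimensions makes gaps visible: if a combination such as "difficult, vague, out-of-scope" is missing from the set, it is missing on purpose rather than by accident.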
A multi-turn simulation needs to know when a conversation ends — goal achieved, an explicit max-turn cap, or a failure state. Without a turn cap, a stuck agent produces an endless transcript. Always set one.
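The three ending conditions can be pictured as a small loop. This is a conceptual sketch, not how Okareo implements run_simulation: `run_conversation`, `StopReason`, and the callables passed in are all hypothetical names used to show why the turn cap matters.

```python
from enum import Enum
from typing import Callable, List, Tuple

Transcript = List[Tuple[str, str]]  # (user message, agent reply) pairs


class StopReason(Enum):
    GOAL_ACHIEVED = "goal_achieved"
    FAILURE = "failure"
    MAX_TURNS = "max_turns"


def run_conversation(
    next_user_message: Callable[[Transcript], str],
    agent_reply: Callable[[str], str],
    goal_met: Callable[[Transcript], bool],
    failed: Callable[[Transcript], bool],
    max_turns: int = 10,
) -> Tuple[Transcript, StopReason]:
    """Run turns until the goal is met, a failure state is hit, or the cap is reached."""
    if max_turns < 1:
        raise ValueError("max_turns must be at least 1")
    transcript: Transcript = []
    for _ in range(max_turns):
        user_msg = next_user_message(transcript)
        transcript.append((user_msg, agent_reply(user_msg)))
        if goal_met(transcript):
            return transcript, StopReason.GOAL_ACHIEVED
        if failed(transcript):
            return transcript, StopReason.FAILURE
    # Without this cap, a stuck agent would keep the loop running forever.
    return transcript, StopReason.MAX_TURNS


# A stuck agent that never resolves the request ends at the turn cap.
transcript, reason = run_conversation(
    next_user_message=lambda t: "I want to cancel my order",
    agent_reply=lambda msg: "Could you repeat that?",
    goal_met=lambda t: "cancelled" in t[-1][1].lower(),
    failed=lambda t: False,
    max_turns=5,
)
```

In the example, the conversation stops after five turns with `StopReason.MAX_TURNS`, which is itself a useful signal: a run that ends on the cap usually means the agent was looping.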
Register the agent under test with create_or_update_target. How depends on
what the agent is:

- A custom_endpoint target, where you describe how Okareo calls the agent's API (including streaming/SSE endpoints).
- A generation target, when the thing being exercised is a prompt against a model rather than a deployed service.

See references/targets.md for how to configure each target type, including endpoint auth and streaming.

Define the simulated user with create_or_update_driver, and the per-conversation goals/seeds with save_scenario. The checks that define failure are attached at run time.

Run the conversations with run_simulation. Simulations run many conversations and take time; poll get_test_run_results rather than assuming failure.

For failures worth preventing permanently, hand off to scenario-from-traces, which turns the failing transcripts into a durable scenario set you can re-run on every change.
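The advice to poll get_test_run_results rather than assume failure can be sketched as a generic wait loop. Everything here is an assumption for illustration: `fetch_results` stands in for however your client invokes the get_test_run_results tool, and the `status` field and its values are invented, since the tool's actual response shape is not described in this document.

```python
import time
from typing import Any, Callable, Dict


def poll_until_done(
    fetch_results: Callable[[], Dict[str, Any]],
    is_done: Callable[[Dict[str, Any]], bool],
    interval_s: float = 5.0,
    timeout_s: float = 600.0,
) -> Dict[str, Any]:
    """Call fetch_results repeatedly until is_done says the run has finished."""
    deadline = time.monotonic() + timeout_s
    while True:
        results = fetch_results()
        if is_done(results):
            return results
        if time.monotonic() >= deadline:
            # Report a timeout explicitly instead of treating it as a failed run.
            raise TimeoutError("simulation did not finish before the timeout")
        time.sleep(interval_s)


# Stand-in responses; the "status" values are hypothetical, not Okareo's.
_fake_responses = iter(
    [{"status": "RUNNING"}, {"status": "RUNNING"}, {"status": "FINISHED"}]
)
final = poll_until_done(
    fetch_results=lambda: next(_fake_responses),
    is_done=lambda r: r["status"] == "FINISHED",
    interval_s=0.0,
)
```

Keeping "not finished yet" separate from "failed" is the point of the design: a slow run raises a timeout that a person can act on, and is never silently recorded as an agent failure.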
## Simulation: <agent under test>
Personas: <count> across <N> persona types
Outcome: <success rate> — <ready to ship? y/n>
### Failure modes
- <mode> — <which personas / turn depth> — <suggested fix>
- ...
### Next step
Lock failing transcripts into a scenario set via scenario-from-traces.
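The figures in the template above can be derived mechanically once per-conversation outcomes are in hand. The sketch below assumes a hypothetical `Outcome` record that you would populate from real get_test_run_results output; the sample values are placeholders to show the shape, not results from any run.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class Outcome:
    """Hypothetical per-conversation record; not an Okareo type."""
    persona_type: str
    passed: bool
    turns: int
    failure_mode: str = ""


def summarize(outcomes: List[Outcome]) -> Dict[str, Any]:
    """Compute the counts, success rate, and failure-mode grouping for the report."""
    total = len(outcomes)
    passed = sum(1 for o in outcomes if o.passed)
    modes: Dict[str, Dict[str, Any]] = defaultdict(
        lambda: {"personas": set(), "turns": []}
    )
    for o in outcomes:
        if not o.passed:
            entry = modes[o.failure_mode or "unclassified"]
            entry["personas"].add(o.persona_type)
            entry["turns"].append(o.turns)
    return {
        "conversations": total,
        "persona_types": len({o.persona_type for o in outcomes}),
        "success_rate": passed / total if total else 0.0,
        "failure_modes": {
            mode: {
                "personas": sorted(entry["personas"]),
                "max_turn_depth": max(entry["turns"]),
            }
            for mode, entry in modes.items()
        },
    }


# Placeholder data for illustration only.
sample = [
    Outcome("cooperative", True, 4),
    Outcome("vague", False, 9, "loops on clarifying questions"),
    Outcome("adversarial", False, 6, "policy bypass"),
    Outcome("adversarial", True, 5),
]
summary = summarize(sample)
```

The "ready to ship" verdict and the suggested fixes are deliberately left out of the computation; those remain judgment calls made after reading the transcripts.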
get_test_run_results or get_conversation_transcript call.
npx claudepluginhub okareo-ai/okareo-tools --plugin okareo