From okareo
Design a synthetic test scenario set from scratch with Okareo — diverse, edge-case inputs covering real workflows, user roles, and stress conditions. Use this skill whenever the user wants to create a test set from scratch, expand coverage, or generate scenarios — including requests like "create a test set for my agent", "generate scenarios to test this", "we need more test coverage", "build evals from scratch", or "what cases should I test". Use it even when the user does not say "Okareo" but is clearly trying to build synthetic test cases for an LLM app or agent. Not for converting production traffic into a test set (use `scenario-from-traces`) and not for running the set (use `evaluation`).
Slash command: /okareo:scenario-design
This skill builds a synthetic scenario set — test cases composed from scratch — so an LLM app or agent can be exercised against deliberate, balanced coverage instead of whatever inputs happen to be lying around.
It is the front half of a testing workflow: this skill builds the set.
Running it — registering a target, choosing checks, scoring — is the
separate evaluation step that follows. For test cases drawn from real
production traffic instead of composed from scratch, use
scenario-from-traces.
Use it when the raw material is intent, not data: the user describes what
the system should do and wants a test set that covers it. If the user
already has production traces, logs, or incidents to turn into tests, that
is scenario-from-traces. If the set already exists and they want to score
it, that is evaluation.
Okareo's MCP server provides the tools; this skill provides the method. Never call the Okareo HTTP API directly — if a needed tool is unavailable, say so and stop.
| Step | MCP tool | Purpose |
|---|---|---|
| Find existing sets | list_scenarios | Check for a set to extend rather than fork |
| Inspect a set | get_scenario | Read an existing set's rows and shape |
| Create a set | save_scenario | Persist the composed rows as a set |
| Extend a set | create_scenario_version | Append rows as a new version of a set |
There is no MCP tool that generates scenario rows. Generation is this
skill's job: you compose diverse, realistic rows from the user's
description of the system, and save_scenario persists them. Treat the
rows as authored test data — design them deliberately, do not pad the set.
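As a minimal sketch of treating rows as authored data, the snippet below represents each row as an input plus an expected result and flags rows that only pad the set. The `Row` class, its field names, and the normalization rule are illustrative assumptions, not part of Okareo's tool schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    """One authored test case. Field names are hypothetical."""
    input: str
    result: str


def normalize(text: str) -> str:
    # Collapse case and whitespace so copies that differ only in
    # capitalization or spacing compare equal.
    return " ".join(text.lower().split())


def find_padding(rows: list[Row]) -> list[Row]:
    """Return rows whose normalized input repeats an earlier row."""
    seen: set[str] = set()
    padding: list[Row] = []
    for row in rows:
        key = normalize(row.input)
        if key in seen:
            padding.append(row)
        else:
            seen.add(key)
    return padding


# Hypothetical rows for a support agent.
rows = [
    Row("Cancel my subscription", "Confirms cancellation and states the end date"),
    Row("cancel  my subscription", "Confirms cancellation and states the end date"),
    Row("Cancel a subscription I do not have", "Explains that no active subscription exists"),
]
duplicates = find_padding(rows)
print(len(duplicates))  # prints 1
```

A check like this only catches literal repeats. Rows that reword the same case still need a human read before the set is saved.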
Follow these steps in order — scope before coverage, coverage before composing rows, composing before persisting.
Establish, asking only what you cannot infer from the conversation or repo:
- What one input looks like (a question, a task, a JSON object) and what
  the expected result should capture (a correct answer, a property the
  output must satisfy, a target end state).

A scenario set is only as good as its spread. Decide the coverage axes (workflows, user roles, input difficulty, edge and stress conditions) before writing any rows, so the set is balanced by design rather than by accident.
See references/coverage.md for the axes to cover and how to keep the set balanced across them.
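One way to make "balanced by design" concrete is to enumerate the coverage cells up front and check which ones have no rows planned yet. The sketch below assumes three axes with made-up values; the axis values and the `unbalanced_cells` helper are hypothetical, and references/coverage.md remains the authority on which axes apply.

```python
from collections import Counter
from itertools import product

# Axis values are hypothetical examples for a support agent.
axes = {
    "workflow": ["refund", "order_status"],
    "role": ["customer", "admin"],
    "difficulty": ["typical", "edge", "stress"],
}

# Every combination of axis values is one coverage cell.
cells = list(product(*axes.values()))


def unbalanced_cells(planned, cells, min_rows=1):
    """Return the cells that have fewer than min_rows planned rows."""
    counts = Counter(planned)
    return [cell for cell in cells if counts[cell] < min_rows]


# The cell each drafted row belongs to, one entry per row.
planned = [
    ("refund", "customer", "typical"),
    ("refund", "customer", "edge"),
    ("order_status", "admin", "stress"),
]

gaps = unbalanced_cells(planned, cells)
print(len(cells), len(gaps))  # prints 12 9
```

Not every cell in a full cross product deserves rows. Dropping an impossible or irrelevant combination is fine as long as it is a deliberate choice rather than an oversight.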
Write each row as an input and an expected result:
- State the expected result as the correct behavior: the right answer,
  or the property the output must satisfy. Be specific enough that a check
  can score against it.

Call list_scenarios first. Prefer extending one coherent set for a
given system with create_scenario_version over scattering near-duplicate
sets, because evaluation runs are only comparable when they run against a
stable set. Persist a new set with save_scenario. Keep the returned
scenario ID.
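The extend-or-create decision can be sketched as a small pure function. It does not call any tool. It only picks which of the tools named above to use, given what list_scenarios returned. The `{"id", "name"}` entry shape, the exact-name match, and the sample ID are assumptions made for illustration, not the documented response format.

```python
def choose_persist_action(existing, system_name):
    """Pick which tool to call for a new batch of rows.

    existing: entries from list_scenarios. The {"id", "name"} shape
    used here is an assumption.
    """
    for scenario in existing:
        if scenario["name"] == system_name:
            # A set for this system already exists: add a new version.
            return {"tool": "create_scenario_version", "scenario_id": scenario["id"]}
    # No matching set: create one.
    return {"tool": "save_scenario", "name": system_name}


# Hypothetical listing with a single existing set.
existing = [{"id": "abc-123", "name": "support-agent"}]

print(choose_persist_action(existing, "support-agent")["tool"])  # prints create_scenario_version
print(choose_persist_action(existing, "billing-agent")["tool"])  # prints save_scenario
```

In practice, matching on name alone may be too strict or too loose. Inspecting a candidate with get_scenario before extending it confirms that its rows have the same shape.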
The set exists but has not been run. Report what was built — row count,
coverage cells — and hand off to evaluation: register a target, choose
checks that target the behaviors this set exercises, and run it.
## Scenario set: <name> (scenario id: <id>)
Built from: <the system description>
Rows: <count> across <N> coverage cells
### Coverage
- <axis / cell> — <row count> — <what these rows probe>
- ...
### Next step
Run as an evaluation; suggested checks: <...>
Install: `npx claudepluginhub okareo-ai/okareo-tools --plugin okareo`