By okareo-ai
Okareo MCP server plus Skills for evaluating LLM apps and agents. Bundles the connector and the playbooks that drive it.
Configure or review Okareo production monitoring — checks on live traffic, baselines, drift, and alerts, for a text or voice agent.
Guided onboarding to Okareo — verify the connection, explain the model, and walk through a first simulation or evaluation. For users new to Okareo or asking how to get started.
Build an Okareo scenario set — either composed synthetically from scratch or captured from production traces, logs, and incidents.
Start an Okareo simulation against an agent — simulated multi-turn conversations against a text agent, or simulated calls against a voice agent.
Stress-test an agent or chatbot before production by running simulated multi-turn conversations against it with Okareo. Use this skill whenever the user wants to simulate users, run synthetic conversations, red-team an agent, probe for failure modes, or check how an agent behaves across many personas — including requests like "simulate users talking to my agent", "how does the bot handle an angry customer", "find where the agent breaks", or "run conversations before we ship". Use it even when the user does not say "Okareo" but is clearly trying to exercise an agent with synthetic conversations.
Design, run, and analyze evaluations of an LLM app's generated output with Okareo — scoring a model or prompt against a scenario set with checks. Use this skill whenever the user wants to test or benchmark output quality, catch regressions, or compare prompt or model versions — including requests like "evaluate my model", "benchmark output quality", "did this prompt change make things worse", "compare GPT-4 and Claude on our test set", or "run our evals". Use it even when the user does not say "Okareo" but is clearly trying to score an LLM's output against expected results. For generating synthetic conversations against an agent use `agent-simulation`; to build the scenario set first use `scenario-design` or `scenario-from-traces`.
Set up and interpret production monitoring of an LLM app, agent, or voice agent with Okareo — checks running on live traffic, quality baselines, and alerts on regressions or drift. Use this skill whenever the user wants to monitor production, watch live quality, catch regressions in the wild, set up alerts, or investigate a drift in behavior — including requests like "monitor my agent in production", "watch my voice agent's live calls", "alert me when quality drops", "why did responses get worse this week", or "track our model in the wild". Use it even when the user does not say "Okareo" but is clearly trying to observe a live LLM or voice system.
Onboard a developer to Okareo — verify the MCP connection works, explain the Target / Driver / Scenario model, and walk through a first simulation or evaluation. Use this skill whenever the user is new to Okareo, asks how to get started or set up, or asks what Okareo can do — including requests like "get me started with Okareo", "how do I set up Okareo", "what can Okareo do", "I want to try Okareo", or "help me run my first test". Use it even when the user does not say "Okareo" but is clearly new to AI evaluation and trying to test an agent or model for the first time. Not for a user who already has a configured target and a specific task — route straight to the matching skill instead.
Design a synthetic test scenario set from scratch with Okareo — diverse, edge-case inputs covering real workflows, user roles, and stress conditions. Use this skill whenever the user wants to create a test set from scratch, expand coverage, or generate scenarios — including requests like "create a test set for my agent", "generate scenarios to test this", "we need more test coverage", "build evals from scratch", or "what cases should I test". Use it even when the user does not say "Okareo" but is clearly trying to build synthetic test cases for an LLM app or agent. Not for converting production traffic into a test set (use `scenario-from-traces`) and not for running the set (use `evaluation`).
External network access
Connects to servers outside your machine
Official Okareo tooling for Claude: the Okareo MCP server plus a set of Agent Skills that teach Claude how to simulate, evaluate, and monitor LLM apps and agents with Okareo.
The MCP server gives Claude the tools (the callable actions against Okareo). The skills give Claude the method — when to reach for those tools and how to run a real workflow with them. They are designed to be installed together.
Seven skills and four slash commands, one MCP server, bundled as a single installable plugin:
| Skill | What it does |
|---|---|
| quickstart | Onboard a new user; verify the connection; first run |
| scenario-design | Compose a synthetic test scenario set from scratch |
| scenario-from-traces | Turn production traces and issues into a test set |
| agent-simulation | Stress-test a text agent with simulated multi-turn users |
| voice-simulation | Run simulated calls against a voice agent |
| evaluation | Score a model or prompt against a scenario set |
| monitoring | Monitor live text or voice traffic; catch drift |
The commands — /okareo:quickstart, /okareo:scenario, /okareo:simulate,
/okareo:monitor — are thin entry points that frame a task and route to the
skill that does the work.
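
As a rough illustration of "thin entry points", the routing can be pictured as a lookup table. This is only a sketch: the pairings below are inferred from the command and skill descriptions on this page, and `COMMAND_ROUTES`, `skills_for`, and `skills_without_command` are hypothetical names, not part of the plugin.

```python
# Hypothetical routing table. The command-to-skill pairings are inferred
# from the descriptions on this page, not read from the plugin itself.
COMMAND_ROUTES = {
    "/okareo:quickstart": ["quickstart"],
    "/okareo:scenario": ["scenario-design", "scenario-from-traces"],
    "/okareo:simulate": ["agent-simulation", "voice-simulation"],
    "/okareo:monitor": ["monitoring"],
}

ALL_SKILLS = {
    "quickstart", "scenario-design", "scenario-from-traces",
    "agent-simulation", "voice-simulation", "evaluation", "monitoring",
}


def skills_for(command: str) -> list[str]:
    """Return the candidate skills for a slash command, or [] if unknown."""
    return COMMAND_ROUTES.get(command.strip(), [])


def skills_without_command() -> set[str]:
    """Skills that no command in the table above routes to."""
    routed = {skill for skills in COMMAND_ROUTES.values() for skill in skills}
    return ALL_SKILLS - routed
```

Under this assumed mapping, `evaluation` has no dedicated command and would be reached through its skill description alone.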
quickstart is the on-ramp. The rest compose into a lifecycle: build a
scenario set (scenario-design or scenario-from-traces), exercise an agent
before release (agent-simulation, voice-simulation), score it
(evaluation), and watch it in production (monitoring) — where any failure
flows back into a scenario set that is re-run on every change. More skills
and commands are planned — see ROADMAP.md.
okareo-tools/
│
├── .claude-plugin/
│   └── marketplace.json          Claude Code marketplace catalog. Lists
│                                 the okareo plugin and where to find it.
│
├── plugins/
│   └── okareo/                   ONE installable plugin = MCP + skills.
│       ├── .claude-plugin/
│       │   └── plugin.json       Plugin manifest. The release version
│       │                         (semver) lives here.
│       ├── .mcp.json             Okareo MCP server config. Auto-loaded
│       │                         by Claude Code when the plugin installs.
│       ├── commands/             Slash commands (/okareo:<name>). Thin
│       │                         entry points that route to a skill.
│       └── skills/               One folder per skill. Each is a
│           │                     self-contained Agent Skill.
│           ├── agent-simulation/
│           │   ├── SKILL.md      Instructions + YAML frontmatter.
│           │   └── references/   Extra docs, loaded only when needed.
│           ├── evaluation/
│           ├── monitoring/
│           ├── quickstart/
│           ├── scenario-design/
│           ├── scenario-from-traces/
│           └── voice-simulation/
│
├── skill-template/               Copy-to-author scaffold for a new
│                                 skill. Lives outside skills/ so it is
│                                 never packaged.
├── command-template.md           Copy-to-author scaffold for a new
│                                 slash command.
│
├── scripts/
│   ├── build.sh                  Packages each skill into a .skill file.
│   ├── release.sh                Builds, then publishes to all 3 surfaces.
│   ├── install.sh                Consumer-side installer.
│   └── validate_skills.py        Checks every skill before packaging.
│
├── .github/
│   └── workflows/
│       └── release.yml           CI: validate + build + publish on a v* tag.
│
├── dist/                         Build output (.skill files). Gitignored.
├── skill-ids.json                Claude API skill ids, managed by release.sh.
├── CONTRIBUTING.md               How to author a skill.
├── ROADMAP.md                    Shipping and planned skills.
├── CLAUDE.md.snippet             Drop-in dependency hint for consuming repos.
├── LICENSE
├── .gitignore
└── README.md
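
The tree says each skill folder carries a SKILL.md with YAML frontmatter and that validate_skills.py checks every skill before packaging. The sketch below shows what such a check might look like. It is not the repository's validate_skills.py, whose contents are not shown here. The required keys `name` and `description` are an assumption based on the usual Agent Skills frontmatter, and the parser is a deliberately simple stand-in for a real YAML library that only recognizes top-level `key: value` lines.

```python
from pathlib import Path

# Assumed required frontmatter keys; the real validator may check more.
REQUIRED_KEYS = ("name", "description")


def read_frontmatter(text: str) -> dict[str, str]:
    """Parse top-level `key: value` lines between the leading '---' fences."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    fields: dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return fields
        key, sep, value = line.partition(":")
        # Skip indented continuation lines and lines without a colon.
        if sep and key.strip() and not line.startswith((" ", "\t")):
            fields[key.strip()] = value.strip()
    return {}  # closing '---' never found, so treat as no frontmatter


def check_skill(skill_dir: Path) -> list[str]:
    """Return a list of problems for one skill folder (empty means OK)."""
    skill_md = skill_dir / "SKILL.md"
    if not skill_md.is_file():
        return [f"{skill_dir.name}: missing SKILL.md"]
    fields = read_frontmatter(skill_md.read_text(encoding="utf-8"))
    return [
        f"{skill_dir.name}: frontmatter missing '{key}'"
        for key in REQUIRED_KEYS
        if key not in fields
    ]


def check_all(skills_root: Path) -> list[str]:
    """Check every folder under skills_root, in name order."""
    problems: list[str] = []
    for skill_dir in sorted(p for p in skills_root.iterdir() if p.is_dir()):
        problems.extend(check_skill(skill_dir))
    return problems
```

Run against a checkout, `check_all(Path("plugins/okareo/skills"))` would return an empty list when every skill folder passes these two checks.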
Two structural rules to keep in mind:

1. plugin.json and marketplace.json go inside .claude-plugin/. Everything else in a plugin (skills/, .mcp.json) sits in the plugin root, not in .claude-plugin/.
2. Each skill folder has a SKILL.md at its top level. Adding a skill is just adding a folder under plugins/okareo/skills/; the build and release scripts pick it up automatically.

npx claudepluginhub okareo-ai/okareo-tools --plugin okareo
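
The placement rule for manifests can be checked mechanically. The following is a minimal sketch, assuming the layout shown in the tree above. `check_layout` is a hypothetical helper and not something the repository is known to ship.

```python
from pathlib import Path


def check_layout(repo_root: Path, plugin: str = "okareo") -> list[str]:
    """Return layout problems for one plugin (an empty list means OK)."""
    plugin_root = repo_root / "plugins" / plugin
    meta_dir = plugin_root / ".claude-plugin"
    problems: list[str] = []

    # The two manifests belong inside a .claude-plugin/ directory.
    if not (repo_root / ".claude-plugin" / "marketplace.json").is_file():
        problems.append("marketplace.json missing from .claude-plugin/")
    if not (meta_dir / "plugin.json").is_file():
        problems.append("plugin.json missing from the plugin's .claude-plugin/")

    # Everything else belongs in the plugin root, never inside .claude-plugin/.
    for name in ("skills", "commands", ".mcp.json"):
        if (meta_dir / name).exists():
            problems.append(f"{name} must sit in the plugin root, not in .claude-plugin/")

    return problems
```

Calling `check_layout(Path("."))` from the repository root would list any manifest that is missing or any plugin component that was placed inside .claude-plugin/ by mistake.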