From observo-qa-toolkit
Generate test cases for a feature, requirement, or PRD and push them to the Observo test management platform with sensible defaults — status=IN_REVIEW and assignee = current user. Use when the user asks to "create test cases", "write test cases", "push test cases to Observo", "test cases for <module/requirement>", or similar. If the user did not explicitly say whether they want Observo records or local Jest/RTL/Playwright code, ask first via AskUserQuestion before doing anything.
Slash command: /observo-qa-toolkit:observo-test-cases
Workflow for creating test cases in Observo (a test management platform — see https://observoai.co). The deliverable is records in Observo, created through the Observo MCP server. If the user's intent is ambiguous between Observo records and local test code (Jest / RTL / Playwright / Vitest / etc.), this skill asks first rather than assuming — see Disambiguation below. Local test code generation is out of scope for this skill — the user can invoke whatever QA / test-scaffolding skill they normally use for that.
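The user asks to write, create, or push test cases for a feature, requirement, PRD, or module, with phrases like "create test cases", "write test cases", "push test cases to Observo", "test cases for <module/requirement>", or similar.
"Test cases" can mean two different deliverables:
- records in Observo, created through the Observo MCP server (this skill);
- local test code (Jest / RTL / Playwright / Vitest / etc.) in the repo, which is out of scope here.

Before touching anything, if the user did NOT explicitly say which one, ask once via AskUserQuestion with those two deliverables as the two options.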
Skip the question only if the user already named the target explicitly. Strong signals for this skill: "Observo", "test management", "IN_REVIEW", "assign to me", "push to platform". Strong signals for local-code path: "Jest", "Playwright", "Vitest", "spec file", ".test.ts", ".spec.ts", "scaffold tests", "unit tests in the repo".
Do NOT assume the default. Asking is cheap; pushing 50 cases to Observo when the user wanted local Jest stubs (or vice versa) is expensive to undo.
The "feature/requirement" usually lives under your repo's requirements / PRD folder — configurable via requirements_dir in .observo-toolkit.json (typical examples: ./docs/requirements/, ./docs/PRDs/, ./spec/). Read the doc end-to-end. The Acceptance Criteria list is the primary source of test scenarios.
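A minimal sketch of that lookup, assuming .observo-toolkit.json is a flat JSON object whose optional requirements_dir key holds a path relative to the repo root. The function name and the fallback order (taken from the examples above) are illustrative assumptions, not part of the toolkit.

```python
import json
from pathlib import Path
from typing import Optional

# Fallback locations, in order; taken from the examples in the text above.
DEFAULT_CANDIDATES = ("docs/requirements", "docs/PRDs", "spec")


def resolve_requirements_dir(repo_root: str) -> Optional[Path]:
    """Return the requirements folder for repo_root, or None if none is found.

    Assumes .observo-toolkit.json (if present) is a JSON object with an
    optional "requirements_dir" key, relative to the repo root.
    """
    root = Path(repo_root)
    config_path = root / ".observo-toolkit.json"
    if config_path.is_file():
        config = json.loads(config_path.read_text(encoding="utf-8"))
        configured = config.get("requirements_dir")
        if configured:
            # Trust an explicit setting even if the folder is missing, so
            # the caller can report a misconfiguration instead of guessing.
            return (root / configured).resolve()
    # No config or no key: probe the typical locations in order.
    for candidate in DEFAULT_CANDIDATES:
        path = root / candidate
        if path.is_dir():
            return path.resolve()
    return None
```

Returning the configured path even when it does not exist is a deliberate choice here: a wrong setting should surface to the user rather than be papered over by a fallback.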
A weak / ambiguous / incomplete requirement produces weak test cases — "garbage in, garbage out". Before designing scenarios, evaluate whether the source itself needs work:
If the skill requirements-testing appears in this session's available-skills list, and either:
then invoke requirements-testing via the Skill tool before step 2, passing as handoff the source path / Jira key / inline text.
If requirements-testing finds blocker defects (conflicts, fundamental clarity gaps), stop and surface them — ask the user via AskUserQuestion whether to:
If only major / minor defects are found, surface them in passing but continue to step 2 — they're polish, not blockers.
If requirements-testing is NOT in the available-skills list, skip this gate and proceed; mention in the final summary that the gate was skipped and the plugin's requirements-testing skill can be installed to enable it.
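The gate's outcomes can be summarized as a small decision function. This is a hypothetical sketch: it assumes requirements-testing reports defects as dicts with a "severity" of "blocker", "major" or "minor", a shape this skill does not itself specify.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class GateResult:
    proceed: bool    # continue to scenario design (step 2)
    ask_user: bool   # blockers found: stop and ask via AskUserQuestion
    notes: list[str] = field(default_factory=list)  # lines for the final summary


def requirements_gate(available_skills: set[str], defects: list[dict] | None) -> GateResult:
    """Decide what happens after the optional requirements-testing gate.

    defects is None when the gate did not run; otherwise each defect is
    assumed to carry a "severity" key ("blocker", "major" or "minor").
    """
    if "requirements-testing" not in available_skills:
        return GateResult(True, False, [
            "Requirements gate skipped: install the plugin's "
            "requirements-testing skill to enable it."])
    if defects is None:
        # Skill is available but its trigger conditions were not met.
        return GateResult(True, False)
    blockers = [d for d in defects if d.get("severity") == "blocker"]
    if blockers:
        return GateResult(False, True, [f"{len(blockers)} blocker defect(s) in the requirement."])
    notes = [f"{len(defects)} major/minor requirement defect(s) noted, not blocking."] if defects else []
    return GateResult(True, False, notes)
```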
Cover every distinct Acceptance Criterion as its own scenario. Heuristics:
pre_conditions if needed.

One scenario = one test case. Atomic. If a scenario has multiple ordered steps, those go into the case's steps[] array; they are NOT separate cases. The test of "atomic" is: each case should fail for exactly one reason.
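As a rough illustration of "one AC, one scenario", the sketch below pulls list items out of an Acceptance Criteria section and turns each distinct one into a scenario stub whose ordered steps stay inside that single case. It assumes the requirement is markdown with a heading containing "Acceptance Criteria"; real PRDs vary, so treat the parser as a starting point.

```python
from __future__ import annotations

import re


def acceptance_criteria(markdown: str) -> list[str]:
    """Return the list items under an 'Acceptance Criteria' heading.

    Accepts '-', '*' or numbered items; collection stops at the next heading.
    """
    items, in_section = [], False
    for line in markdown.splitlines():
        stripped = line.strip()
        if stripped.startswith("#"):
            in_section = "acceptance criteria" in stripped.lower()
            continue
        if in_section:
            m = re.match(r"^(?:[-*]|\d+[.)])\s+(.*)$", stripped)
            if m:
                items.append(m.group(1).strip())
    return items


def scenarios_from_acs(acs: list[str]) -> list[dict]:
    """One distinct AC becomes one scenario; exact repeats are dropped."""
    seen, scenarios = set(), []
    for ac in acs:
        key = ac.lower()
        if key not in seen:
            seen.add(key)
            # Ordered steps are filled in later and stay inside this case.
            scenarios.append({"source_ac": ac, "steps": []})
    return scenarios
```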
Caveat: PRD ≠ code. As-built / reverse-engineered docs drift from the real handler code over time, especially in literal error message strings, endpoint paths, and validation thresholds. Step 2a below grounds the design in code when possible. Without it, scenarios reflect what the PRD claims rather than what the code actually does, and any automation built on them will break on each mismatch.
After sketching scenarios in step 2 and BEFORE pushing to Observo, try to ground them in the implementation:
observo-code-verifier appears in this session's available-skills list → invoke it via the Skill tool, passing as handoff:
ok / string-drift / endpoint-mismatch / missing / skipped) and suggested_corrections with was / now / evidence per drift.
- ok → no change.
- string-drift (error message wording differs) → apply the suggested correction silently (it's a literal fact pulled from code). Mention the count in the summary.
- endpoint-mismatch (path doesn't exist) → DON'T silently fix. Ask the user once via AskUserQuestion whether to use the verifier's suggestion or drop the scenario.
- missing (claim couldn't be located in code) → keep the scenario but flag it for the user in the summary so they can verify manually.
- skipped (no code access) → proceed with the original scenarios; flag in the summary that verification was skipped and what would help next time (e.g. "run from project root, or connect a repo MCP server").

observo-code-verifier is NOT in the available-skills list → proceed without verification, but add to the summary: "Code verification skipped — install the observo-code-verifier skill (ships with this plugin) for grounded test cases."

Never block creation on verifier output. Worst case (skipped + many missing) → create the cases anyway and flag everything in the summary so the user can review.
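The verdict handling above can be expressed as a dispatch over the five statuses. The data shapes here (scenarios keyed by id, a correction dict with was / now / evidence) are assumptions for illustration; the real output format is whatever observo-code-verifier returns.

```python
def apply_verdicts(scenarios: dict, verdicts: dict):
    """Apply verifier verdicts; returns (scenarios, ids_to_ask_about, notes).

    Hypothetical shapes: scenarios maps id -> {"expected": str, ...};
    verdicts maps id -> {"status": str, "correction": {"was", "now", "evidence"}}.
    Verdicts only adjust or flag scenarios; none of them blocks creation.
    """
    to_ask, notes, fixed = [], [], 0
    for sid, verdict in verdicts.items():
        scenario = scenarios.get(sid)
        status = verdict.get("status", "skipped")
        if scenario is None or status == "ok":
            continue
        if status == "string-drift" and "correction" in verdict:
            fix = verdict["correction"]
            # Literal wording pulled from code: apply silently, count it.
            expected = scenario.get("expected", "")
            scenario["expected"] = expected.replace(fix["was"], fix["now"])
            fixed += 1
        elif status == "endpoint-mismatch":
            to_ask.append(sid)  # ask once: use the suggestion or drop it
        elif status == "missing":
            notes.append(f"{sid}: not located in code, verify manually")
        else:  # skipped or unrecognized: keep the scenario, flag it
            notes.append(f"{sid}: not verified ({status})")
    if fixed:
        notes.insert(0, f"{fixed} error-string drift(s) corrected from code")
    return scenarios, to_ask, notes
```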
- mcp__observo__list_projects. Default to "Observo E2E" (OB) unless told otherwise.
- mcp__observo__list_suites for the project. If a suite matching the module already exists, reuse its suite_id. Otherwise call mcp__observo__create_suite with name = module title (e.g. "Auth & Accounts") and a short description linking to the source doc.
- mcp__observo__list_test_cases with suite_id, pulling the names and descriptions of all existing cases in that suite. For each newly designed scenario, compare intent (not just the literal name string) against the existing set. Two cases are duplicates if they verify the same behavior on the same area, even when worded differently (e.g. "Login with unknown email returns 'email not found'" ≈ "Reject login when email is not registered"). Rules:
Every case must be assigned to a user. The default assignee is the current user themselves (they want to review what's generated). Resolution chain:
- mcp__observo__list_account_users with search=<email>; take the matching user's id. Use the resolved UUID for both owner_id and reviewer_id.
- AskUserQuestion for their Observo email, then save it to a memory entry so future sessions don't need to ask again.
- If list_account_users is not registered in this session (older MCP server build) and the user can't paste their UUID, create the cases without owner_id / reviewer_id and clearly flag in the summary that the assignee was skipped. Do not silently drop the requirement.

Use mcp__observo__bulk_create_test_cases (a single call for the whole batch). Per-case payload:
| Field | Value |
|---|---|
| name | short scenario title (≤80 chars). Format: "<area> — <expected outcome>" |
| description | 1–2 sentences on what's tested and why |
| suite_id | from step 3 |
| status | STATUS_IN_REVIEW by default. If the user explicitly named a different status (STATUS_DRAFT / STATUS_APPROVED / STATUS_CHANGES_REQUESTED / STATUS_DEPRECATED), use that. If status was not mentioned at all, ask once via AskUserQuestion with STATUS_IN_REVIEW as the recommended option; see Status policy below |
| owner_id, reviewer_id | UUID from step 4 |
| severity | SEVERITY_BLOCKER / SEVERITY_CRITICAL / SEVERITY_NORMAL / SEVERITY_MINOR / SEVERITY_TRIVIAL, judged from AC criticality |
| priority | PRIORITY_HIGH / PRIORITY_MEDIUM / PRIORITY_LOW |
| layer | LAYER_E2E / LAYER_API / LAYER_UNIT (mostly E2E or API for Observo records) |
| type | CASE_TYPE_FUNCTIONAL / CASE_TYPE_SECURITY / CASE_TYPE_REGRESSION / CASE_TYPE_INTEGRATION / CASE_TYPE_SMOKE / CASE_TYPE_ACCEPTANCE / CASE_TYPE_USABILITY / CASE_TYPE_PERFORMANCE / CASE_TYPE_COMPATIBILITY / CASE_TYPE_EXPLORATORY / CASE_TYPE_OTHER |
| behavior | BEHAVIOR_POSITIVE / BEHAVIOR_NEGATIVE / BEHAVIOR_DESTRUCTIVE |
| automation_status | AUTOMATION_STATUS_MANUAL by default; AUTOMATION_STATUS_AUTOMATED once a spec covers it |
| pre_conditions, post_conditions | when material |
| steps | ordered array of {action, data, expected} (Action / Data / Expected). Atomic. |
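A sketch of assembling one per-case payload from the table, with a guard that rejects unprefixed enum values and an optional assignee. The helper is hypothetical: it only builds plain dicts in the field layout above, and the resulting list is what would be handed to mcp__observo__bulk_create_test_cases (the tool's exact argument names are not given here).

```python
# Prefixed enum values copied from the field table; anything else is rejected.
STATUSES = {"STATUS_IN_REVIEW", "STATUS_DRAFT", "STATUS_APPROVED",
            "STATUS_CHANGES_REQUESTED", "STATUS_DEPRECATED"}
SEVERITIES = {"SEVERITY_BLOCKER", "SEVERITY_CRITICAL", "SEVERITY_NORMAL",
              "SEVERITY_MINOR", "SEVERITY_TRIVIAL"}
PRIORITIES = {"PRIORITY_HIGH", "PRIORITY_MEDIUM", "PRIORITY_LOW"}
LAYERS = {"LAYER_E2E", "LAYER_API", "LAYER_UNIT"}
CASE_TYPES = {"CASE_TYPE_" + t for t in (
    "FUNCTIONAL", "SECURITY", "REGRESSION", "INTEGRATION", "SMOKE", "ACCEPTANCE",
    "USABILITY", "PERFORMANCE", "COMPATIBILITY", "EXPLORATORY", "OTHER")}
BEHAVIORS = {"BEHAVIOR_POSITIVE", "BEHAVIOR_NEGATIVE", "BEHAVIOR_DESTRUCTIVE"}


def build_case(name, description, suite_id, steps, *, severity, priority,
               layer, case_type, behavior, owner_id=None,
               status="STATUS_IN_REVIEW"):
    """Return one case payload as a dict (hypothetical helper).

    steps: iterable of (action, data, expected) tuples, in order.
    owner_id: resolved user UUID, or None when assignee lookup failed.
    """
    if len(name) > 80:
        raise ValueError(f"name exceeds 80 chars: {name!r}")
    checks = ((status, STATUSES), (severity, SEVERITIES), (priority, PRIORITIES),
              (layer, LAYERS), (case_type, CASE_TYPES), (behavior, BEHAVIORS))
    for value, allowed in checks:
        if value not in allowed:
            raise ValueError(f"not a prefixed enum value: {value!r}")
    case = {
        "name": name,
        "description": description,
        "suite_id": suite_id,
        "status": status,
        "severity": severity,
        "priority": priority,
        "layer": layer,
        "type": case_type,
        "behavior": behavior,
        "automation_status": "AUTOMATION_STATUS_MANUAL",
        "steps": [{"action": a, "data": d, "expected": e} for a, d, e in steps],
    }
    if owner_id:
        # Same UUID for owner and reviewer: the current user reviews by default.
        case["owner_id"] = case["reviewer_id"] = owner_id
    return case
```

When owner_id is None the assignee fields are simply omitted, which matches the fallback in the resolution chain; that case still has to be flagged in the summary.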
Use the prefixed enum forms above (e.g. PRIORITY_HIGH, not HIGH) — they are the canonical proto names accepted by the backend. The MCP layer also accepts short forms via a normalizer (added in OB-241), but the prefixed form is what the API persists, so passing it directly avoids any normalization edge case and matches what you'll see when you read a case back.
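If a value arrives in short form (from user input, say), it can be prefixed client-side so the payload never depends on the MCP normalizer. This is a hypothetical helper, not the normalizer code in the MCP layer; the prefix for each field is read off the table above.

```python
# Field -> enum prefix, as used in the field table (hypothetical helper).
PREFIXES = {
    "status": "STATUS_",
    "severity": "SEVERITY_",
    "priority": "PRIORITY_",
    "layer": "LAYER_",
    "type": "CASE_TYPE_",
    "behavior": "BEHAVIOR_",
    "automation_status": "AUTOMATION_STATUS_",
}


def to_prefixed(field_name: str, value: str) -> str:
    """Turn 'high', 'HIGH' or 'PRIORITY_HIGH' into 'PRIORITY_HIGH'."""
    prefix = PREFIXES[field_name]
    token = value.strip().upper().replace("-", "_").replace(" ", "_")
    return token if token.startswith(prefix) else prefix + token


def prefix_case(case: dict) -> dict:
    """Return a copy of a payload with every enum field in prefixed form."""
    out = dict(case)
    for field_name in PREFIXES:
        if isinstance(out.get(field_name), str):
            out[field_name] = to_prefixed(field_name, out[field_name])
    return out
```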
Default status for every freshly generated case is STATUS_IN_REVIEW — the user needs to review batched output before promoting it. But this is a default, not a hard rule:
- status=STATUS_APPROVED, "create as draft", "put it in approved", etc.) → honour it.
- AskUserQuestion with options:
  - STATUS_IN_REVIEW (Recommended) — needs my review before approval
  - STATUS_DRAFT — work-in-progress, not ready for review
  - STATUS_APPROVED — skip review (use sparingly)

Ask the status question only when intent is otherwise unambiguous (i.e. you already know it's Observo records, not Jest code). If you're already disambiguating Observo-vs-Jest, you can combine both questions into a single AskUserQuestion call with two questions, to avoid two pop-ups in a row.
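The status policy, including folding it into the Observo-vs-local-code question, might look like the sketch below. The question dicts assume AskUserQuestion takes entries with question, header, options (label and description) and multiSelect; check the tool schema in the current session before relying on that shape. All names here are hypothetical.

```python
from __future__ import annotations

STATUS_OPTIONS = [
    ("STATUS_IN_REVIEW (Recommended)", "Needs my review before approval"),
    ("STATUS_DRAFT", "Work-in-progress, not ready for review"),
    ("STATUS_APPROVED", "Skip review (use sparingly)"),
]
TARGET_OPTIONS = [
    ("Observo records", "Create test cases in Observo via the MCP server"),
    ("Local test code", "Jest / RTL / Playwright / Vitest files in the repo"),
]


def question(text: str, header: str, options: list[tuple[str, str]]) -> dict:
    # Assumed AskUserQuestion entry shape; verify against the live tool schema.
    return {"question": text, "header": header, "multiSelect": False,
            "options": [{"label": label, "description": desc}
                        for label, desc in options]}


def plan_status(explicit_status: str | None, target_known: bool):
    """Return (status, questions); status stays None until the user answers.

    explicit_status: a status the user already named, in prefixed form.
    target_known: False while Observo-vs-local-code is undecided; then both
    questions go into one AskUserQuestion call instead of two pop-ups.
    """
    questions = []
    if not target_known:
        questions.append(question("Create Observo records or local test code?",
                                  "Target", TARGET_OPTIONS))
    if explicit_status:
        return explicit_status, questions
    questions.append(question("Which status should the new test cases get?",
                              "Status", STATUS_OPTIONS))
    return None, questions
```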
OB-241 (silent drop of priority / type / behavior on create, invalid parameters on update) was fixed on 2026-05-11 by adding short→prefixed normalization in the MCP layer. With the field table above using prefixed forms, the call goes straight through without relying on normalization.
After bulk_create_test_cases / create_test_case, read back one or two cases and confirm priority, type, behavior came back as the values you sent (not PRIORITY_NOT_SET / CASE_TYPE_OTHER / BEHAVIOR_NOT_SET). If they didn't, either:
mcp/internal/tools/enum_normalize.go → flag for fix.

Never imply success on a field that didn't apply.
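A read-back check along those lines, assuming the case read back uses the same field names as the payload that was sent (a hypothetical helper):

```python
# Defaults a field falls back to when the backend silently drops it.
DROPPED_DEFAULTS = {
    "priority": "PRIORITY_NOT_SET",
    "type": "CASE_TYPE_OTHER",
    "behavior": "BEHAVIOR_NOT_SET",
}


def readback_mismatches(sent: dict, returned: dict) -> list:
    """List the checked fields whose read-back value differs from what was sent."""
    problems = []
    for field_name, dropped in DROPPED_DEFAULTS.items():
        want, got = sent.get(field_name), returned.get(field_name)
        if want is None or got == want:
            continue  # not sent, or persisted correctly
        note = " (fell back to default)" if got == dropped else ""
        problems.append(f"{field_name}: sent {want}, got {got}{note}")
    return problems
```

A case deliberately sent as CASE_TYPE_OTHER passes, because the comparison is against the sent value first; only a mismatch is reported.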
After creation, report:
Skip any prose narration about what you did internally — the user sees the diff in Observo. Focus the summary on what they need to review.
The natural follow-up after Observo cases land is to automate them as local test code. This skill actively offers and invokes that handoff — but lets the user choose, and stays portable.
Steps:
Ask once via AskUserQuestion whether to automate now. Three options:
Skip this question (default to "wait") if the chosen status was DRAFT (cases aren't ready for automation).
Group the created cases by layer (and type for special cases) — different layers need different scaffolders. Don't route everything to one skill.
From the bulk_create_test_cases response, count cases per layer (LAYER_E2E / LAYER_API / LAYER_UNIT). Also flag any case where type is CASE_TYPE_PERFORMANCE / CASE_TYPE_SECURITY / CASE_TYPE_USABILITY / CASE_TYPE_COMPATIBILITY / CASE_TYPE_EXPLORATORY — those typically need a non-Playwright path.
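A sketch of the grouping step, assuming each entry in the bulk_create_test_cases response exposes the case key (e.g. "OB-12"), layer and type under the hypothetical names code, layer and type. Special types get their own bucket so they can follow the type rows of the routing table; the DRAFT rule for skipping the automation question is included too.

```python
from __future__ import annotations

from collections import defaultdict

# Types that usually need a non-Playwright path.
SPECIAL_TYPES = {"CASE_TYPE_PERFORMANCE", "CASE_TYPE_SECURITY",
                 "CASE_TYPE_USABILITY", "CASE_TYPE_COMPATIBILITY",
                 "CASE_TYPE_EXPLORATORY"}


def should_offer_automation(status: str) -> bool:
    # DRAFT cases aren't ready for automation: default to "wait".
    return status != "STATUS_DRAFT"


def group_for_automation(created: list[dict]) -> dict[str, list[str]]:
    """Bucket created cases by layer; special types get their own bucket."""
    groups: dict[str, list[str]] = defaultdict(list)
    for case in created:
        key = case["type"] if case["type"] in SPECIAL_TYPES else case["layer"]
        groups[key].append(case["code"])
    return dict(groups)
```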
Pick a scaffolding skill per layer-group, with the candidate ordered preferred → fallback. Always check availability against the current session's available-skills list before suggesting — never assume any skill exists.
| Layer / type group | Preferred candidate | Fallback candidates |
|---|---|---|
| LAYER_E2E (UI flows) | pw-generate: same plugin as this one, Observo-aware (@observo:<code> tag), repo-agnostic discovery, knows the .observo-pw.json config | engineering-skills:senior-qa (Playwright + Jest + RTL), or pw:generate from the external pw plugin if installed |
| LAYER_API (backend endpoint / contract tests) | engineering-advanced-skills:api-test-suite-builder, built for REST/contract testing | engineering-skills:senior-qa (covers API testing), or engineering-skills:senior-backend |
| LAYER_UNIT (component / function unit tests) | engineering-skills:senior-qa (Jest + RTL focus) | engineering-skills:tdd-guide (TDD-focused, multi-framework) |
| type=CASE_TYPE_PERFORMANCE | no plugin scaffolder; usually manual k6 / JMeter / custom bench setup | offer "skip — wire performance tests manually" |
| type=CASE_TYPE_SECURITY | engineering-skills:senior-security if available | offer "skip" |
| type=CASE_TYPE_USABILITY / EXPLORATORY | no automation candidate; these are manual by nature | offer "skip" |
| Other / unrecognized | no preferred | AskUserQuestion with text input for the skill name, or "skip this group" |
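The table can be encoded as data so that only skills present in the session are ever offered. Skill names are copied from the table; the exact names in a given session's skill list may carry a plugin prefix, so treat the mapping as an assumption to adjust.

```python
from __future__ import annotations

# group -> (preferred, fallbacks); names copied from the routing table.
ROUTES = {
    "LAYER_E2E": ("pw-generate", ["engineering-skills:senior-qa", "pw:generate"]),
    "LAYER_API": ("engineering-advanced-skills:api-test-suite-builder",
                  ["engineering-skills:senior-qa", "engineering-skills:senior-backend"]),
    "LAYER_UNIT": ("engineering-skills:senior-qa", ["engineering-skills:tdd-guide"]),
    "CASE_TYPE_SECURITY": ("engineering-skills:senior-security", []),
    "CASE_TYPE_PERFORMANCE": (None, []),  # manual k6 / JMeter setup
    "CASE_TYPE_USABILITY": (None, []),    # manual by nature
    "CASE_TYPE_EXPLORATORY": (None, []),  # manual by nature
}


def candidate_options(group: str, available: set[str]) -> list[str]:
    """Options for one group: available skills (preferred first), then 'skip'.

    Unrecognized groups get only 'skip' here; per the table, their question
    should also accept a typed-in skill name.
    """
    preferred, fallbacks = ROUTES.get(group, (None, []))
    offered = [s for s in (preferred, *fallbacks) if s and s in available]
    return offered + ["skip"]
```

When a group comes back with only "skip", there is nothing to choose between, so that group can be reported as skipped without a question.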
Decision flow:
- AskUserQuestion with preferred + fallback + "skip" options.
- AskUserQuestion per group, OR a single batched AskUserQuestion with up to 4 sub-questions (one per group, since AskUserQuestion allows 1–4 questions). Prefer batched to minimize prompts.

Invoke the chosen skill(s) via the Skill tool, one invocation per non-skipped group:
Skill(skill="<chosen-skill-name>")
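Since AskUserQuestion allows 1 to 4 questions per call, more than four groups need more than one call. A minimal sketch of that batching and of the one-invocation-per-non-skipped-group rule (helper names and the answer format are hypothetical):

```python
from __future__ import annotations

MAX_QUESTIONS_PER_CALL = 4  # AskUserQuestion accepts 1-4 questions per call


def batch_questions(questions: list[dict]) -> list[list[dict]]:
    """Split per-group questions into the fewest AskUserQuestion calls."""
    return [questions[i:i + MAX_QUESTIONS_PER_CALL]
            for i in range(0, len(questions), MAX_QUESTIONS_PER_CALL)]


def plan_invocations(answers: dict[str, str]) -> list[tuple[str, str]]:
    """Map {group: chosen skill or 'skip'} to (skill, group) pairs.

    Each pair becomes one Skill(skill=...) call with that group's brief.
    """
    return [(skill, group) for group, skill in answers.items() if skill != "skip"]
```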
Before each invocation, prepare a tight handoff brief in plain text — keep it under ~10 lines:
- layer), e.g. OB-12..OB-23 for E2E, OB-24..OB-31 for API
- pw-generate: mention .observo-pw.json exists if it does (skill will auto-detect, just informational)
- create_run / update_case_in_run / update_step_in_run): pw-generate's sibling pw-run skill (when it lands) won't help these layers, so the chosen scaffolder needs its own writeback path

After all chosen scaffolders return, summarize per group:
- E2E layer: 12 cases → 4 spec files via pw-generate (paths: ...)
- API layer: 8 cases → 2 spec files via senior-qa (paths: ...)
- UNIT layer: 5 cases → skipped (user choice, will wire manually)
- Performance cases: 2 → skipped (no plugin scaffolder; manual k6 setup needed)

Confirm the user can run / lint the generated code in each group.
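Both the handoff brief and the per-group summary cite case keys as ranges such as OB-12..OB-23. A small hypothetical helper for that, plus a brief builder that enforces the ~10-line limit (the brief's layout is an assumption, not a format the scaffolders require):

```python
from __future__ import annotations

import re


def compress_keys(codes: list[str]) -> str:
    """Collapse keys such as OB-12, OB-13, OB-14 into 'OB-12..OB-14'.

    Assumes PREFIX-NUMBER keys; gaps start a new comma-separated run and
    keys that do not match the pattern are ignored.
    """
    matches = (re.fullmatch(r"([A-Z]+)-(\d+)", c) for c in codes)
    keys = sorted({(m.group(1), int(m.group(2))) for m in matches if m})
    runs, i = [], 0
    while i < len(keys):
        prefix, start = keys[i]
        end = start
        # Extend the run while the next key has the same prefix and number + 1.
        while i + 1 < len(keys) and keys[i + 1] == (prefix, end + 1):
            i += 1
            end += 1
        runs.append(f"{prefix}-{start}" if start == end
                    else f"{prefix}-{start}..{prefix}-{end}")
        i += 1
    return ", ".join(runs)


def handoff_brief(group: str, codes: list[str], notes: list[str]) -> str:
    """Plain-text brief for one scaffolder invocation, kept to <= 10 lines."""
    lines = [f"Cases ({group}): {compress_keys(codes)}", *notes]
    if len(lines) > 10:
        raise ValueError("handoff brief exceeds ~10 lines; trim the notes")
    return "\n".join(lines)
```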
If the user chose "wait" or "no" in step 1: end with one short line — "Automation skipped; re-trigger when you're ready by asking to 'automate the approved cases'. The skill will then group by layer (E2E / API / UNIT / etc.) and route to the right scaffolder per group."
- APPROVED or DRAFT without checking with the user. Default is IN_REVIEW; other statuses require either an explicit user instruction or an answer from the disambiguation question.
- priority / type / behavior persisted when they didn't. Always sanity-check by reading one or two cases back (see step 6).
- HIGH / FUNCTIONAL / POSITIVE in payloads. They work today via the MCP normalizer (OB-241), but the prefixed form (PRIORITY_HIGH, CASE_TYPE_FUNCTIONAL, BEHAVIOR_POSITIVE) is the canonical wire format and avoids any failure if the deployed MCP build hasn't picked up the normalizer yet.
npx claudepluginhub observo-ai/claude-plugins --plugin observo-qa-toolkit