From observo-qa-toolkit
Review one or more Observo test cases against a 15-criteria quality checklist, post per-issue review comments via MCP (scope CASE / FIELD / STEP), assign a compact 0–10 score, and flip status to STATUS_CHANGES_REQUESTED when comments were actually created. Use when the user asks to "review test case", "rate the quality of cases in suite X", "score test cases", or otherwise wants a systematic QA pass on an Observo case.
How this skill is triggered — by the user, by Claude, or both
Slash command
/observo-qa-toolkit:observo-review-test-caseThe summary Claude sees in its skill listing — used to decide when to auto-load this skill
Systematic quality review of test cases stored in **Observo** (a test management platform — see https://observoai.co). The deliverable is *review comments in Observo plus a status bump*, posted through the Observo MCP server. The skill is read-mostly: it never edits the case body, never resolves comments, never approves cases — those remain human actions.
Systematic quality review of test cases stored in Observo (a test management platform — see https://observoai.co). The deliverable is review comments in Observo plus a status bump, posted through the Observo MCP server. The skill is read-mostly: it never edits the case body, never resolves comments, never approves cases — those remain human actions.
The user asks to QA-review one or more test cases. Phrases like:
If the target is unclear, ask once via AskUserQuestion. Options:
Skip the question only when the user explicitly named a single case or a suite.
mcp__observo__get_test_case with the supplied short code (e.g. DEMO-37). On UUID, call the same tool — short-code resolution is server-side and handles both shapes. Keep the returned id (UUID) and full case body (name, description, pre_conditions, post_conditions, priority, type, layer, behavior, severity, status, steps[], owner_id, reviewer_id) in memory for the rest of the workflow.mcp__observo__list_test_cases with suite_id. Collect every case. Cap batch size at 10–20 cases per pass — if the suite has more, ask the user once whether to continue with all of them (token cost can be material).Before evaluating anything, call mcp__observo__list_review_comments for the case. Keep the returned comments[] keyed by (scope, field_name, step_id) so step 7 can deduplicate. A re-run of the skill on the same case must NOT produce duplicate OPEN comments.
Score each criterion as pass / fail. Map every fail to a planned comment with the correct scope. Be specific — say what's wrong AND give a concrete reformulation.
| # | Criterion | Comment scope | field_name / target |
|---|---|---|---|
| 1 | Clear, behavior-describing title (not "Check login") | FIELD | name |
| 2 | Atomic — one case, one main check | CASE | — |
| 3 | Preconditions explicit (user role, system state, test data) | FIELD | pre_conditions |
| 4 | Steps are concrete user actions (no "verify system") | STEP | per offending step |
| 5 | Test data explicit (or templated with {{variable}}) | STEP (offending step) | — |
| 6 | Expected result is specific (UI message / status / state) | STEP (or FIELD expected at case-level) | per offending step |
| 7 | Action / Data / Expected structure per step | STEP | per offending step |
| 8 | Covers positive, negative, edge cases (suite-level signal — fix as comment on the case) | CASE | — |
| 9 | Not a duplicate of another case — see §3.9 below for the embedding-based procedure | CASE | reference the other short code |
| 10 | Automation-ready (deterministic, stable selectors/fields) | STEP or CASE | per offending area |
| 11 | priority and type are set correctly | FIELD | priority / type |
| 12 | Traceability to requirement / PRD / ticket | CASE | — |
| 13 | No hidden assumptions (e.g. "admin" referenced but not in preconditions) | FIELD | pre_conditions |
| 14 | For complex flows: expected covers UI and backend state | STEP or FIELD expected | — |
| 15 | Not flaky by design (no bare wait N seconds, no shared state, no order dependency) | STEP | per offending step |
Call mcp__observo__find_similar_cases with:
test_case_id: the case being reviewedscope: "project" (default — duplicates from another project are almost never actionable)min_similarity: 0.72 (calibrated from OB-250 smoke against text-embedding-3-small — real paraphrase pairs sit around cosine 0.65–0.80; 0.85 missed near-duplicates)How to interpret the response:
Empty results list → criterion #9 passes. No comment.
Non-empty list → criterion #9 fails. Take the top 3 hits (already ordered by cosine desc) and post one CASE-scoped comment per hit using this exact format:
Possible duplicate of
{{short_code}}(cosine{{similarity}}). Decide manually whether this is a true dup before merging.
Do not collapse multiple hits into a single comment — the user resolves them independently.
Failure / fallback path (do NOT surface an error to the user):
results is empty AND the source case is freshly created (< 30s old) → embedding worker is likely still catching up. Skip the criterion this run; the next review will catch it.Weights (from PRD):
Clear title 1
Atomic scenario 1
Clear preconditions 1
Executable steps 1
Explicit test data 1
Specific expected result 2 ← doubled
Covers relevant risk 1
Automation-ready 1
Linked to requirement 1
─────────────────────────────
Maximum 10
Interpretation:
For each failed criterion, draft a single comment with:
scope = from the table in step 3field_name when scope=FIELDstep_id when scope=STEP (the UUID of the specific offending step from the case body)message formatted as:
Default language for the comment text is English (matches the case content convention). Default tone is imperative + concrete reformulation, never "I suggest" or "maybe consider".
For each planned comment, check the pre-read list from step 2: if there is already an OPEN comment in the same (scope, field_name, step_id) tuple whose message substantially overlaps the new draft, skip it. Don't post the same finding twice. Track the skipped count for the summary.
If the existing comment is RESOLVED but the issue still applies → post a new one (the human marked the previous discussion done, so this is a regression signal).
For each surviving comment in the plan, call mcp__observo__add_review_comment. One call per comment — don't batch. Capture the returned comment.id from each call so the chat summary can list them.
status ≠ STATUS_CHANGES_REQUESTED → call mcp__observo__update_test_case with payload.status = "STATUS_CHANGES_REQUESTED". Do NOT change any other field.STATUS_APPROVED no matter the score. Approval is a human action.Markdown structure:
Score: <X>/10 — <strong | acceptable | weak | not review-ready>
| # | Criterion | Pass/Fail | Comment ID (if posted) |
|---|---|---|---|
| 1 | Clear title | … | … |
…
Comments posted:
- <comment-id>: <short quote of the message>
- …
Comments skipped (duplicates of existing OPEN):
- <count>
Status:
- Before: <STATUS_…>
- After: <STATUS_… or unchanged>
For suite-mode, prepend an aggregate header:
Suite <SHORT-CODE>: N cases reviewed
Strong (≥9): <count>
Acceptable (7–8): <count>
Weak (5–6): <count>
Not review-ready (<5): <count>
Skip prose narration about what you did internally — the user sees the diff in Observo.
mcp__observo__delete_review_comment ever. Delete is a human action; this skill doesn't invoke it.mcp__observo__update_review_comment to flip status → RESOLVED on existing comments. Resolve is the author's call in the UI.mcp__observo__update_test_case with any field other than status. The skill rewrites nothing in the case body.STATUS_APPROVED. Approval is a human action.list_review_comments). Re-runs would otherwise spam duplicate OPEN comments and look broken.short_code reference. A bare cosine score is not actionable feedback.| Tool | When |
|---|---|
mcp__observo__get_test_case | Step 1 (single case) |
mcp__observo__list_test_cases | Step 1 (suite mode) |
mcp__observo__list_review_comments | Step 2 (idempotency) |
mcp__observo__find_similar_cases | Step 3.9 (semantic dedup, criterion #9) |
mcp__observo__add_review_comment | Step 7 (post each finding) |
mcp__observo__update_test_case | Step 8 (status bump only) |
AskUserQuestion | Disambiguation, suite-size confirm |
Never invoked by this skill: delete_review_comment, update_review_comment, delete_test_case, update_steps, add_steps, delete_step, bulk_create_test_cases, create_test_case.
mcp__observo__find_similar_cases (OB-250). The skill falls back to the legacy title-similarity heuristic when the embedding row is missing or the RPC fails — see §3.9.npx claudepluginhub observo-ai/claude-plugins --plugin observo-qa-toolkitSearches MemPalace before answering questions about past work, people, projects, or prior decisions. Returns verbatim stored content instead of guessing from model memory.
Guides Payload CMS config (payload.config.ts), collections, fields, hooks, access control, APIs. Debugs validation errors, security, relationships, queries, transactions, hook behavior.
Implements vector databases with Pinecone, Weaviate, Qdrant, Milvus, pgvector for semantic search, RAG, recommendations, and similarity systems. Optimizes embeddings, indexing, and hybrid search.