From kagura-memory
Executes a smoke test of all registered MCP tools via live connection, verifying context, memory, edge, analysis, and sleep tools plus optional PRO-tier features with coverage cross-check.
Slash command: /kagura-memory:smoke-test
Verify MCP tools work correctly by executing them in sequence against temporary test contexts.
Exercises the core memory/edge/context/tag/analysis/sleep tools, the agent-memory-substrate
lane (delivery_mode pinning + load_pinned, the agent session-state lane, retrieval feedback,
and the trust_tier recall filter), and owner-scoped binding introspection.
Optionally exercises PRO-only resource rows (setup_resource, ingest_events, get_resource_impact, get_resource_schema, list_resource_tokens, plus delete_context cleanup) if the workspace has a PRO plan.
The canonical tool registry is backend/src/mcp_server/tools/__init__.py (45 tools). The
Coverage cross-check section near the end mirrors that registry so the "all MCP tools" claim
stays honest — every registered tool is either exercised here or listed there with a reason. This
is a live runbook, not a pytest suite, so the cross-check is a manual reconciliation step: when a
tool is added to the registry, this skill must gain a row or a documented exclusion.
Excluded by design (each documented in the Coverage cross-check):
- analyze_context: requires billing, BYOK, workspace owner role, and Pro-tier feature access.
- File tools (init_file_upload, complete_file_upload, get_file_download_url, delete_file, list_files): require multipart S3/R2 upload flows that can't be exercised inline; cover them separately.
- setup_connector: provisions an external connector (Slack/Discord/Teams) that needs platform credentials and a real target; gated-skip inline.

Use this after deployments, tool description changes, or MCP server updates.
Prerequisite: MCP server must be running and connected.
Create a temporary test context for isolation:
list_contexts()
-> Verify: returns a list with count >= 0
create_context(name="smoke-test-{unix_timestamp}", description="Temporary context for MCP smoke test. Safe to delete.")
-> Verify: returns context with id (UUID format)
-> Save returned context_id for all subsequent steps
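The following minimal Python sketch covers the naming and "UUID format" checks above. `smoke_context_name` and `is_uuid` are hypothetical helpers and are not part of the MCP server. `is_uuid` accepts only the canonical hyphenated form, which is an assumption about what "UUID format" means here.

```python
import time
import uuid
from typing import Optional


def smoke_context_name(prefix: str = "smoke-test", ts: Optional[int] = None) -> str:
    """Build a temporary context name such as "smoke-test-1700000000"."""
    if ts is None:
        ts = int(time.time())  # the runbook's {unix_timestamp}
    return f"{prefix}-{ts}"


def is_uuid(value: object) -> bool:
    """True if value is a UUID string in canonical 8-4-4-4-12 hyphenated form."""
    if not isinstance(value, str):
        return False
    try:
        # uuid.UUID also accepts braces/urn forms, so compare against the canonical string
        return str(uuid.UUID(value)) == value.lower()
    except ValueError:
        return False
```

The same helpers cover the merge-target name (`smoke_context_name("smoke-test-merge")`) and every later "(UUID format)" check.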
get_context_info(context_id=..., include_details=true)
-> Verify: returns context.name matching "smoke-test-*", context.id matches
update_context(context_id=..., display_name="Smoke Test", summary="Temporary smoke test context")
-> Verify: success response
update_search_config(context_id=..., semantic_weight=0.6, bm25_weight=0.4)
-> Verify: success response
remember(
context_id=...,
summary="MCP smoke test memory — testing remember tool",
content="This is a test memory created by smoke-test. If you see this, the remember tool is working correctly.",
type="note",
importance=0.5,
tags=["smoke-test", "automated"],
context_summary="Created during automated MCP smoke test for verification purposes.",
source_uri="file:///smoke-test/test-memory.md",
source_type="file"
)
-> Verify: returns a success response containing memory_id (UUID format)
-> Save returned memory_id
-> Note: source_uri/source_type are persisted but not echoed in the remember response; they are validated via the recall filters in step 4
remember(
context_id=...,
summary="MCP smoke test — time memory for recall_upcoming",
content="Scheduled event seeded by smoke-test to verify recall_upcoming.",
type="time",
importance=0.5,
tags=["smoke-test", "automated"],
details={"trigger": {"year": 2099, "month": 1}}
)
-> Verify: returns a success response containing memory_id (UUID format)
-> Save returned time_memory_id
remember(
context_id=...,
summary="MCP smoke test — pinned memory for load_pinned (delivery_mode=always)",
content="Goal/guardrail-style memory that must load deterministically every turn.",
type="note",
importance=0.6,
tags=["smoke-test", "automated"],
delivery_mode="always"
)
-> Verify: returns a success response containing memory_id (UUID format)
-> Verify: scope is "persistent" (delivery_mode="always" pins to persistent on write, no Sleep wait)
-> Save returned memory_id as pinned_memory_id
-> Note: this memory is unpinned then deleted in Cleanup so it does not leak into later runs
recall(context_id=..., query="smoke test memory", k=5)
-> Verify: returns results array with length >= 1
-> Verify: at least one result matches the memory created in step 3
recall(context_id=..., query="smoke test memory", k=5, include_explore_hints=true)
-> Verify: response contains explore_hints field (array)
-> Verify: if explore_hints is non-empty, at least one hint has reason "top_result"
-> Verify: empty explore_hints is acceptable (best-effort generation) and should not fail the smoke test
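The three explore_hints checks can be expressed as one predicate. This sketch assumes that each hint is a dict carrying a `reason` key, which is a hypothetical shape. The helper name is also illustrative.

```python
from typing import Any, Mapping


def explore_hints_ok(response: Mapping[str, Any]) -> bool:
    """Apply the explore_hints checks: field is an array; if non-empty, one hint is "top_result"."""
    hints = response.get("explore_hints")
    if not isinstance(hints, list):
        return False  # field missing or not an array: FAIL
    if not hints:
        return True  # empty is acceptable (best-effort generation)
    return any(
        isinstance(hint, dict) and hint.get("reason") == "top_result" for hint in hints
    )
```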
recall(context_id=..., query="smoke test memory", k=5, filters={"source_uri_prefix": "file:///smoke-test/"})
-> Verify: results contain the memory_id from step 3 (confirms source_uri filter works)
recall(context_id=..., query="smoke test memory", k=5, filters={"source_type": "file"})
-> Verify: results contain the memory_id from step 3 (confirms source_type filter works)
recall(context_id=..., query="smoke test memory", k=5, filters={"trust_tier": "trusted"})
-> Verify: status=success; results array returned
-> Verify: the memory from step 3 is present — manual writes are trusted-tier, and the filter
excludes external/connector-ingested memories (this context has none, so trusted == all here)
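Each filtered recall above reduces to one question: is the step-3 memory_id among the results? The sketch below assumes each result dict exposes its ID under `memory_id` or `id`. Both key names are guesses and should be adjusted against the real payload.

```python
from typing import Any, Iterable, Mapping, Set


def result_ids(
    response: Mapping[str, Any], id_keys: Iterable[str] = ("memory_id", "id")
) -> Set[str]:
    """Collect memory IDs from a recall response's results array."""
    ids: Set[str] = set()
    for item in response.get("results", []):
        for key in id_keys:  # first matching key wins
            if key in item:
                ids.add(item[key])
                break
    return ids


def recall_contains(response: Mapping[str, Any], memory_id: str) -> bool:
    """True if the given memory_id appears among the recall results."""
    return memory_id in result_ids(response)
```

The same predicate serves the unfiltered recall, the source_uri_prefix and source_type filters, and the trust_tier="trusted" filter.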
reference(memory_id=..., context_id=...)
-> Verify: returns full memory with summary, content, tags matching step 3
explore(memory_id=..., context_id=..., depth=2, min_weight=0.0)
-> Verify: returns response (total_activated >= 0, no error)
recall_upcoming(context_id=..., from="now")
-> Verify: status=success
-> Verify: results array contains the time memory seeded in step 3 (time_memory_id present)
-> Verify: all returned results have type="time"
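Here is a sketch of the recall_upcoming assertions. It assumes results are dicts with a `type` key and an ID under `memory_id` or `id`; these key names are hypothetical.

```python
from typing import Any, Mapping


def upcoming_ok(response: Mapping[str, Any], time_memory_id: str) -> bool:
    """status=success, the seeded time memory is present, and every result is type="time"."""
    if response.get("status") != "success":
        return False
    results = response.get("results", [])
    ids = {r.get("memory_id", r.get("id")) for r in results}
    return time_memory_id in ids and all(r.get("type") == "time" for r in results)
```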
load_pinned(context_id=...)
-> Verify: status=success; returns the COMPLETE unranked set for delivery_mode="always"
(deterministic counterpart to recall — no semantic search, no ranking, no rerank)
-> Verify: pinned_memory_id (from step 3) is present
-> Verify: truncated=false and total_available matches the returned count (small pinned set)
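The load_pinned checks are sketched below in Python. This runbook does not name the key for the returned list, so `memories` is a hypothetical default and is passed as a parameter.

```python
from typing import Any, Mapping


def load_pinned_ok(
    response: Mapping[str, Any], pinned_memory_id: str, list_key: str = "memories"
) -> bool:
    """Pinned memory present, nothing truncated, and total_available equals the returned count."""
    if response.get("status") != "success":
        return False
    items = response.get(list_key, [])
    ids = {m.get("memory_id", m.get("id")) for m in items}
    return (
        pinned_memory_id in ids
        and response.get("truncated") is False
        and response.get("total_available") == len(items)
    )
```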
feedback(context_id=..., memory_id=<memory_id>, helpful=true, query="smoke test memory")
-> Verify: success response (append-only usefulness signal accepted)
-> Note: feedback is NOT embedded and is structurally excluded from recall(), so rating a result
never pollutes the knowledge search space — there is nothing to assert in a later recall
Agent session-state lane (set_state / get_state): TTL-bounded run-state that is structurally excluded from recall(). Round-trip it with a short TTL:
set_state(context_id=..., key="smoke-test-step", value={"phase": "running", "n": 1}, ttl_seconds=300)
-> Verify: success response (upsert accepted)
get_state(context_id=..., key="smoke-test-step")
-> Verify: returns value {"phase": "running", "n": 1} (round-trip intact)
set_state(context_id=..., key="smoke-test-step", value={"phase": "running", "n": 2}, ttl_seconds=300)
-> Verify: success (re-using the key overwrites the value)
get_state(context_id=...)
-> Verify: omitting key lists all live entries; "smoke-test-step" present with value n=2 (overwrite confirmed)
-> Verify: no expired entries are returned
-> Note: state is scoped to the context — it is removed when the context is deleted in Cleanup,
and the 300s TTL expires it regardless; it never appears in recall()
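This small local model makes the expected set_state/get_state semantics concrete. It covers the contract this step asserts: upsert on an existing key, TTL expiry, and listing only live entries when `key` is omitted. It is an illustrative model, not the server's implementation. It assumes, for example, that an overwrite restarts the TTL.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class TTLStateModel:
    """Local model of the set_state/get_state contract (illustrative only)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._entries: Dict[str, Tuple[Any, float]] = {}  # key -> (value, expires_at)

    def set_state(self, key: str, value: Any, ttl_seconds: float) -> None:
        # Upsert: re-using a key overwrites the value (and, by assumption, restarts the TTL).
        self._entries[key] = (value, self._clock() + ttl_seconds)

    def get_state(self, key: Optional[str] = None) -> Any:
        now = self._clock()
        live = {k: v for k, (v, expires_at) in self._entries.items() if expires_at > now}
        if key is None:
            return live  # omitting key lists all live entries
        return live.get(key)  # expired or missing -> None
```

Injecting the clock keeps the model deterministic, so the 300s expiry can be checked without sleeping.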
update_memory(memory_id=..., context_id=..., summary="MCP smoke test memory — UPDATED", importance=0.7)
-> Verify: success response
recall(context_id=..., query="smoke test UPDATED", k=5)
-> Verify: returns updated memory with new summary
First, create a second test memory for edge testing (self-loops are not allowed, so the edge tests need two distinct memories). Pass linked_memory_ids to create a declared link at creation time. Since #741 this link is stored as origin="declared" rather than as a declared_link edge_type; see the note on the verify step below:
remember(
context_id=...,
summary="MCP smoke test memory 2 — edge target",
content="Second test memory for edge CRUD testing.",
type="note",
importance=0.5,
tags=["smoke-test", "automated"],
linked_memory_ids=[<memory_id>],
linked_source_uris=["file:///smoke-test/test-memory.md"]
)
-> Save returned memory_id as memory_id_2
-> Verify: list_edges(context_id=..., memory_id=<memory_id_2>) returns an outgoing edge whose
target_id == <memory_id> (the linked memory), with weight=1.0 and confidence=1.0
-> Note (#741/#925): the linked_memory_ids declared link is stored as origin="declared" with
edge_type="neural_association" — NOT a "declared_link" edge_type (that discriminator was removed
in #741, which pivoted to the relation/origin two-axis model). MCP list_edges does not expose the
origin axis, so the post-#741 verifiable signal is the edge to the linked target carrying the fixed
declared weight/confidence of 1.0/1.0. A freshly created memory has had no recall co-activation, so
this declared edge is the only edge present on memory_id_2 at this point. (The full declared-link
reference surface — outgoing_links/incoming_links from #440 — is REST-only and not carried by the
MCP reference() tool, so it cannot be asserted from this runbook.)
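The declared-link verification can be sketched as a scan over the list_edges result. The `source_id`, `target_id`, `weight`, and `confidence` keys mirror the names used in this runbook, but the exact edge shape is an assumption.

```python
import math
from typing import Any, Iterable, Mapping


def has_declared_link(
    edges: Iterable[Mapping[str, Any]], memory_id_2: str, memory_id: str
) -> bool:
    """True if memory_id_2 has an outgoing edge to memory_id at weight/confidence 1.0."""
    for edge in edges:
        if (
            edge.get("source_id") == memory_id_2
            and edge.get("target_id") == memory_id
            and math.isclose(edge.get("weight", 0.0), 1.0)
            and math.isclose(edge.get("confidence", 0.0), 1.0)
        ):
            return True
    return False
```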
create_edge(context_id=..., source_id=<memory_id>, target_id=<memory_id_2>, edge_type="related_to")
-> Verify: returns edge with weight=0.5, edge_type="related_to"
list_edges(context_id=..., memory_id=<memory_id>)
-> Verify: returns edges array with count >= 1
update_edge(context_id=..., source_id=<memory_id>, target_id=<memory_id_2>, weight=0.8)
-> Verify: returns updated edge with weight=0.8
delete_edge(context_id=..., source_id=<memory_id>, target_id=<memory_id_2>)
-> Verify: success response (edge deleted)
list_tags(context_id=...)
-> Verify: returns {status: "success", tags: [...], total: N} with N >= 1
-> Verify: at least one entry has tag="smoke-test" (created via remember in step 3)
list_tags(context_id=..., prefix="smoke")
-> Verify: every returned tag, lowercased, starts with "smoke" (the prefix
filter is case-insensitive per the MCP schema, so this stays correct even
if a tag was stored as "Smoke-Foo")
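The two list_tags checks are sketched below as predicates over the documented `{status, tags, total}` shape. Requiring a non-empty result for the prefix check is an added assumption. `all()` over an empty list is vacuously true, and the step-3 "smoke-test" tag should always match "smoke".

```python
from typing import Any, Mapping


def list_tags_ok(response: Mapping[str, Any], required_tag: str = "smoke-test") -> bool:
    """status=success, total >= 1, and the smoke-test tag is present."""
    tags = response.get("tags", [])
    return (
        response.get("status") == "success"
        and response.get("total", 0) >= 1
        and any(entry.get("tag") == required_tag for entry in tags)
    )


def prefix_filter_ok(response: Mapping[str, Any], prefix: str) -> bool:
    """Every returned tag, lowercased, starts with the lowercased prefix."""
    want = prefix.lower()
    tags = response.get("tags", [])
    return bool(tags) and all(entry["tag"].lower().startswith(want) for entry in tags)
```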
Owner-scoped API-key binding introspection. No resource setup required — these are read-only:
list_my_bindings()
-> Verify: status=success; returns a bindings array (may be empty; count >= 0)
-> Save the first binding's id as binding_id, if any
describe_binding(binding_id=<binding_id from list_my_bindings>)
-> Verify: if list_my_bindings returned >= 1 binding, describing the first one returns success with its details
-> Otherwise (no bindings exist): call describe_binding with the fake UUID
("00000000-0000-0000-0000-000000000000") instead and verify a not_found / permission_denied error
(no side effects either way)
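The describe_binding branch can be sketched as a target choice plus a pass/fail rule. The `bindings[*].id` path and the `error` field carrying the error code are hypothetical response shapes.

```python
from typing import Any, Mapping

NIL_UUID = "00000000-0000-0000-0000-000000000000"
EXPECTED_FAKE_ID_ERRORS = {"not_found", "permission_denied"}


def describe_target(list_response: Mapping[str, Any]) -> str:
    """First binding's id if any exist, else the fake nil UUID."""
    bindings = list_response.get("bindings") or []
    return bindings[0]["id"] if bindings else NIL_UUID


def describe_ok(binding_id: str, response: Mapping[str, Any]) -> bool:
    """A real binding must succeed; the fake UUID must yield an expected error."""
    if binding_id == NIL_UUID:
        return response.get("error") in EXPECTED_FAKE_ID_ERRORS
    return response.get("status") == "success"
```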
Create a second temporary context, then test merge and usage:
create_context(name="smoke-test-merge-{unix_timestamp}", description="Merge target for smoke test.")
-> Save returned context_id as merge_target_id
merge_contexts(source_context_id=<context_id>, target_context_id=<merge_target_id>)
-> Verify: success response with merged memory count
get_usage()
-> Verify: returns plan, memories.used, contexts.used (no error)
Note: analyze_context is not included because it requires workspace owner role, Pro-tier feature access, billing, a configured BYOK key, and per-day quota availability.
Pre-condition: list_analyses and get_active_analysis require the workspace owner role and the analysis feature to be available. Valid responses include gate errors (permission_denied, feature_not_available) — the smoke test should treat these as acceptable outcomes, not failures.
list_analyses(context_id=...)
-> Verify: returns items array, or gate error (`permission_denied`, `feature_not_available`) — treat all as PASS
get_active_analysis(context_id=...)
-> Verify: returns analysis run object, `no_succeeded_run`, or gate error (`permission_denied`, `feature_not_available`) — treat all as PASS
get_analysis(run_id="00000000-0000-0000-0000-000000000000")
-> Verify: returns `run_not_found` error (expected — fake run_id)
get_analysis(run_id="this-is-not-a-uuid")
-> Verify: returns error response (invalid UUID format)
get_cluster(run_id="00000000-0000-0000-0000-000000000000", cluster_index=0)
-> Verify: returns `cluster_not_found` error (expected — fake run_id)
get_cluster(run_id="this-is-not-a-uuid", cluster_index=0)
-> Verify: returns error response (invalid UUID format)
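Because gate errors count as PASS here, it helps to encode the expected outcome for each call in one place. The step labels and the `error` field are hypothetical. The accepted codes come from the steps above.

```python
from typing import Any, Mapping

GATE_ERRORS = {"permission_denied", "feature_not_available"}


def analysis_step_passes(step: str, response: Mapping[str, Any]) -> bool:
    """PASS/FAIL rule for each analysis call in this section."""
    error = response.get("error")  # hypothetical field holding the error code
    if step == "list_analyses":
        return isinstance(response.get("items"), list) or error in GATE_ERRORS
    if step == "get_active_analysis":
        return error is None or error == "no_succeeded_run" or error in GATE_ERRORS
    if step == "get_analysis:nil":
        return error == "run_not_found"
    if step == "get_cluster:nil":
        return error == "cluster_not_found"
    if step.endswith(":bad_uuid"):
        return error is not None  # any error response is a PASS
    raise ValueError(f"unknown step {step!r}")
```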
Note: get_sleep_history and get_sleep_report are read-only inspection tools. rollback_sleep_run is mutating — fake IDs verify error handling without side effects.
get_sleep_history(context_id=...)
-> Verify: returns `{reports: [...], count: ...}` (no error; may be empty)
get_sleep_history(context_id=..., limit=3)
-> Verify: returns at most 3 reports in the `reports` array (no error)
get_sleep_report(report_id="00000000-0000-0000-0000-000000000000")
-> Verify: returns `report_not_found` error (expected — fake report_id)
get_sleep_report(report_id="this-is-not-a-uuid")
-> Verify: returns `invalid_report_id` error (invalid UUID format)
rollback_sleep_run(report_id="00000000-0000-0000-0000-000000000000")
-> Verify: returns `report_not_found` error (expected — fake report_id)
rollback_sleep_run(report_id="this-is-not-a-uuid")
-> Verify: returns `invalid_report_id` error (invalid UUID format)
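The sleep-tool expectations fit a lookup table keyed by tool and fake ID. As elsewhere, the `error` field holding the code is an assumed response shape.

```python
from typing import Any, Mapping, Optional

NIL = "00000000-0000-0000-0000-000000000000"
BAD = "this-is-not-a-uuid"

# Expected error codes for the fake-ID calls, taken from the steps above.
EXPECTED_ERRORS = {
    ("get_sleep_report", NIL): "report_not_found",
    ("get_sleep_report", BAD): "invalid_report_id",
    ("rollback_sleep_run", NIL): "report_not_found",
    ("rollback_sleep_run", BAD): "invalid_report_id",
}


def sleep_history_ok(response: Mapping[str, Any], limit: Optional[int] = None) -> bool:
    """{reports: [...], count: ...} with at most `limit` reports."""
    reports = response.get("reports")
    if not isinstance(reports, list) or "count" not in response:
        return False
    return limit is None or len(reports) <= limit


def fake_id_call_ok(tool: str, report_id: str, response: Mapping[str, Any]) -> bool:
    """The fake-ID call returned exactly the expected error code."""
    return response.get("error") == EXPECTED_ERRORS[(tool, report_id)]
```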
Pre-check: Call get_usage() and check the plan. If the plan is free or basic, skip this section entirely and note "Resource tools skipped — PRO plan required" in the report.
setup_resource(name="smoke-test-resource-{unix_timestamp}", resource_id="smoke_test_{unix_timestamp}")
-> Verify: returns context_id (UUID), resource_id, token (plaintext), token_id
-> Save context_id as resource_context_id, resource_id, and token
ingest_events(resource_id=<resource_id>, events=[
{"op": "upsert", "doc_id": "TEST-001", "version": 1, "payload": {"name": "Test Product", "price": 1000}},
{"op": "upsert", "doc_id": "TEST-002", "version": 1, "payload": {"name": "Test Product 2", "price": 2000}}
])
-> Verify: created_count=2, failed_count=0, event_ids has 2 entries
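The PRO pre-check and the ingest_events assertion are sketched together below. Two assumptions apply: that `plan` is read as a plain string, and that the run skips when the plan is missing. The event payloads are copied from the step above.

```python
from typing import Any, Dict, List, Mapping

SKIP_PLANS = {"free", "basic"}


def resource_tools_enabled(usage: Mapping[str, Any]) -> bool:
    """Run the resource rows unless the plan is free/basic (or unknown, conservatively)."""
    plan = usage.get("plan")
    if not isinstance(plan, str) or not plan:
        return False
    return plan.lower() not in SKIP_PLANS


def build_events() -> List[Dict[str, Any]]:
    """The two upsert events used by the ingest_events step."""
    return [
        {"op": "upsert", "doc_id": "TEST-001", "version": 1,
         "payload": {"name": "Test Product", "price": 1000}},
        {"op": "upsert", "doc_id": "TEST-002", "version": 1,
         "payload": {"name": "Test Product 2", "price": 2000}},
    ]


def ingest_ok(response: Mapping[str, Any], sent: List[Dict[str, Any]]) -> bool:
    """created_count == len(sent), failed_count == 0, one event_id per event."""
    return (
        response.get("created_count") == len(sent)
        and response.get("failed_count") == 0
        and len(response.get("event_ids", [])) == len(sent)
    )
```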
get_resource_impact(resource_id=<resource_id>)
-> Verify: token_count >= 1, current_schema_version is null (no schema created)
get_resource_schema(resource_id=<resource_id>)
-> Verify: returns schema_not_found error (expected — no schema exists yet)
list_resource_tokens(resource_id=<resource_id>)
-> Verify: returns tokens array with at least 1 token matching resource_id
Resource cleanup (runs even if some steps failed):
delete_context(context_id=<resource_context_id>)
-> Verify: success response (resource context soft-deleted)
setup_connector provisions an external connector (Slack/Discord/Teams) and requires platform
credentials plus a real connector target, so it is not exercised inline (it would create
external side effects):
setup_connector — SKIP (documented)
-> Reason: needs external connector platform credentials + a live target; cannot be exercised
inline without side effects. Covered separately in connector integration tests.
Unpin and delete the pinned memory first (so delivery_mode="always" state does not survive the run), then tear down the remaining artifacts. The agent-state entry (set_state) needs no explicit delete — it is removed with its context below and also expires via its TTL.
update_memory(memory_id=<pinned_memory_id>, context_id=..., delivery_mode="on_recall")
-> Verify: success response (pinned memory unpinned — no longer deterministically loaded)
forget(memory_id=<pinned_memory_id>, context_id=...)
-> Verify: success response (pinned memory deleted)
delete_context(context_id=<merge_target_id>)
-> Verify: success response (merge target soft-deleted, along with its memories)
forget(memory_id=<memory_id_2>, context_id=...)
-> Verify: success response (memory 2 deleted from source)
delete_context(context_id=...)
-> Verify: success response (source context deleted — also removes its agent-state entries)
Reconcile this skill against the canonical registry so the "all MCP tools" claim cannot silently
rot. The source of truth is backend/src/mcp_server/tools/__init__.py (45 tools). Every
registered tool must be in exactly one column below. If tools/__init__.py and this table
disagree, the skill is out of date — add a row (or a documented exclusion) before merging.
Optional live assertion: count the entries in the registry and confirm the count equals 45 (the number this table is built for); if it differs, a tool was added or removed and this skill needs updating:
grep -cE '^[[:space:]]*"[a-z_]+"[[:space:]]*:[[:space:]]*handle_' backend/src/mcp_server/tools/__init__.py
-> Verify: equals 45 (else: reconcile this section with the registry)
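Here is a Python counterpart to the grep, extended to check the three coverage columns for overlaps and gaps. It uses the same line pattern as the grep, so it assumes one `"name": handle_...` entry per line in the registry. The function names are illustrative.

```python
import re
from typing import Iterable, List, Set

# Same shape the grep counts: lines like    "tool_name": handle_tool_name,
ENTRY = re.compile(r'^\s*"([a-z_]+)"\s*:\s*handle_', re.MULTILINE)


def registry_tools(source: str) -> Set[str]:
    """Tool names registered in the tools/__init__.py source text."""
    return set(ENTRY.findall(source))


def reconcile(
    registry: Set[str],
    inline: Iterable[str],
    pro: Iterable[str],
    excluded: Iterable[str],
) -> List[str]:
    """Return problems; an empty list means each registered tool is in exactly one column."""
    columns = {"inline": set(inline), "pro": set(pro), "excluded": set(excluded)}
    names = list(columns)
    problems: List[str] = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            for tool in sorted(columns[a] & columns[b]):
                problems.append(f"{tool} listed in both {a} and {b}")
    covered = set().union(*columns.values())
    for tool in sorted(registry - covered):
        problems.append(f"{tool} registered but not covered")
    for tool in sorted(covered - registry):
        problems.append(f"{tool} covered but not registered")
    return problems
```

Against the real file, you would pass `registry_tools(Path("backend/src/mcp_server/tools/__init__.py").read_text())` (with `from pathlib import Path`) together with the 33 / 5 / 7 lists below. An empty result and `len(registry) == 45` together mean the cross-check holds.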
Exercised inline (33 core tools): list_contexts, create_context, get_context_info,
update_context, update_search_config, remember (incl. delivery_mode="always"), recall (incl.
source_uri_prefix / source_type / trust_tier filters), recall_upcoming, load_pinned,
feedback, set_state, get_state, reference, explore, update_memory, forget, list_edges, create_edge,
update_edge, delete_edge, list_tags, list_my_bindings, describe_binding, merge_contexts, get_usage,
list_analyses, get_active_analysis, get_analysis, get_cluster, get_sleep_history, get_sleep_report,
rollback_sleep_run, delete_context.
Exercised only on PRO plan, else SKIP (5 tools): setup_resource, ingest_events, get_resource_impact, get_resource_schema, list_resource_tokens.
Documented exclusions / gated-skip (7 tools) — with reasons:
| Tool | Why not exercised inline |
|---|---|
| analyze_context | Requires billing, BYOK key, workspace owner role, and Pro-tier feature access. |
| setup_connector | Provisions an external connector (Slack/Discord/Teams); needs platform credentials + a live target. Covered in connector integration tests. |
| init_file_upload | Multipart S3/R2 upload flow; can't be exercised inline. |
| complete_file_upload | Completes a multipart upload begun by init_file_upload. |
| get_file_download_url | Requires an uploaded object to produce a presigned URL. |
| delete_file | Requires an uploaded object to delete. |
| list_files | Grouped with the file-upload flow; covered in the file-tools test suite. |
33 + 5 + 7 = 45 — the full registry. The gap between "exercised inline" and the registry is only the 5 PRO-gated rows (run when PRO) and the 7 documented exclusions above.
Print a summary table (numbers are illustrative; the executed order follows the sections above):
## MCP Smoke Test Results
| # | Tool | Action | Result |
|---|------|--------|--------|
| 1 | list_contexts | List contexts | PASS/FAIL |
| 2 | create_context | Create test context | PASS/FAIL |
| 3 | get_context_info | Get context details | PASS/FAIL |
| 4 | update_context | Update display name | PASS/FAIL |
| 5 | update_search_config | Update search weights | PASS/FAIL |
| 6 | remember | Create test memory (source_uri, source_type) | PASS/FAIL |
| 7 | remember | Create time memory | PASS/FAIL |
| 8 | remember | Create pinned memory (delivery_mode="always") | PASS/FAIL |
| 9 | recall | Search for memory | PASS/FAIL |
| 10 | recall | Search with include_explore_hints=true | PASS/FAIL |
| 11 | recall | Search with source_uri_prefix filter | PASS/FAIL |
| 12 | recall | Search with source_type filter | PASS/FAIL |
| 13 | recall | Search with trust_tier="trusted" filter | PASS/FAIL |
| 14 | reference | Get full memory | PASS/FAIL |
| 15 | explore | Graph traversal | PASS/FAIL |
| 16 | recall_upcoming | List upcoming time memories | PASS/FAIL |
| 17 | load_pinned | Deterministic load of pinned set | PASS/FAIL |
| 18 | feedback | Record helpful signal on a recall result | PASS/FAIL |
| 19 | set_state | Set + overwrite agent run-state (TTL) | PASS/FAIL |
| 20 | get_state | Read one key + list all live keys | PASS/FAIL |
| 21 | update_memory | Update memory | PASS/FAIL |
| 22 | recall (verify) | Verify update | PASS/FAIL |
| 23 | remember | Create 2nd memory (linked_memory_ids, linked_source_uris) | PASS/FAIL |
| 24 | list_edges (verify) | Verify declared link (linked_memory_ids → target, weight/confidence 1.0) | PASS/FAIL |
| 25 | create_edge | Create test edge | PASS/FAIL |
| 26 | list_edges | List edges | PASS/FAIL |
| 27 | update_edge | Update edge weight | PASS/FAIL |
| 28 | delete_edge | Delete edge | PASS/FAIL |
| 29 | list_tags | List tags in context | PASS/FAIL |
| 30 | list_tags | List tags with prefix filter | PASS/FAIL |
| 31 | list_my_bindings | List owner-scoped API-key bindings | PASS/FAIL |
| 32 | describe_binding | Describe a binding (or fake-id error) | PASS/FAIL |
| 33 | create_context | Create merge target context | PASS/FAIL |
| 34 | merge_contexts | Merge source into target | PASS/FAIL |
| 35 | get_usage | Get workspace usage | PASS/FAIL |
| 36 | list_analyses | List analysis runs | PASS/FAIL |
| 37 | get_active_analysis | Get latest succeeded analysis | PASS/FAIL |
| 38 | get_analysis | Get analysis by run_id (fake ID, error handling) | PASS/FAIL |
| 39 | get_cluster | Get cluster detail (fake run_id, error handling) | PASS/FAIL |
| 40 | get_sleep_history | Get sleep maintenance history | PASS/FAIL |
| 41 | get_sleep_report | Get sleep report (fake ID, error handling) | PASS/FAIL |
| 42 | rollback_sleep_run | Rollback sleep run (fake ID, error handling) | PASS/FAIL |
| 43 | update_memory | Unpin pinned memory (delivery_mode="on_recall") | PASS/FAIL |
| 44 | forget | Delete pinned memory | PASS/FAIL |
| 45 | delete_context | Soft-delete merge target and its memories | PASS/FAIL |
| 46 | forget | Delete memory 2 | PASS/FAIL |
| 47 | delete_context | Delete source context (+ its agent-state) | PASS/FAIL |
| P1 | setup_resource | Create resource context + token (PRO only) | PASS/FAIL/SKIP |
| P2 | ingest_events | Batch ingest 2 test events (PRO only) | PASS/FAIL/SKIP |
| P3 | get_resource_impact | Get resource stats (PRO only) | PASS/FAIL/SKIP |
| P4 | get_resource_schema | Get schema (expect not_found) (PRO only) | PASS/FAIL/SKIP |
| P5 | list_resource_tokens | List tokens for resource (PRO only) | PASS/FAIL/SKIP |
| P6 | delete_context | Delete resource context (PRO only) | PASS/FAIL/SKIP |
**Result: N/47 core rows passed** (+ N/6 PRO resource rows passed, or SKIP if not PRO)
Note: the 47 rows are test *steps*, not distinct tools — several tools (remember, recall,
update_memory, forget, delete_context, list_edges, list_tags) are exercised in multiple rows.
Distinct-tool coverage is reconciled in the Coverage cross-check (33 core + 5 PRO + 7 documented-skip = 45).
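The final result line can be computed from the per-row outcomes. The dict-of-row-results input is a hypothetical shape. The output wording follows the result line in the report template.

```python
from typing import Mapping


def result_line(core: Mapping[str, str], pro: Mapping[str, str]) -> str:
    """Build the summary line from row id -> "PASS" / "FAIL" / "SKIP"."""
    core_passed = sum(1 for outcome in core.values() if outcome == "PASS")
    line = f"Result: {core_passed}/{len(core)} core rows passed"
    if not pro or all(outcome == "SKIP" for outcome in pro.values()):
        return line + " (PRO resource rows: SKIP, not PRO)"
    pro_passed = sum(1 for outcome in pro.values() if outcome == "PASS")
    return line + f" (+ {pro_passed}/{len(pro)} PRO resource rows passed)"
```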
Test context: smoke-test-{timestamp} (cleaned up)
Documented exclusions / gated-skip (see Coverage cross-check) — not counted as FAIL:
- analyze_context (billing + BYOK + owner + Pro-tier)
- setup_connector (external connector credentials + live target)
- File tools: init_file_upload, complete_file_upload, get_file_download_url, delete_file, list_files (multipart S3/R2)
Registry reconciliation: 33 core + 5 PRO + 7 documented-skip = 45 tools in tools/__init__.py.
If any step fails: