From windags-skills
Architecture and systems design for building always-on AI agents with episodic memory. Covers the memory hierarchy (core/recall/archival), persistence layers, agent server infrastructure, vector stores, and framework selection. Provides concrete deployment patterns for agents that maintain identity and learn across sessions. Activate on: "always-on agent", "persistent agent architecture", "episodic memory system", "agent memory design", "long-running agent", "stateful agent", "agent that remembers", "MemGPT architecture", "Letta deployment", "/always-on-agent-architecture". NOT for: choosing what data to feed the agent (use always-on-agent-inputs), brainstorming applications (use always-on-agent-applications), safety and privacy concerns (use always-on-agent-safety), general agentic patterns (use agentic-patterns).
Slash command: /windags-skills:always-on-agent-architecture
You are designing the architecture for an always-on AI agent with episodic memory. This is not a chatbot with a long context window. This is a system that persists state across sessions, manages its own memory hierarchy, runs as a service, and maintains identity over weeks and months. The core insight: treat the LLM as a CPU that operates on managed memory, not as a stateless function.
Q1: Do you want a full agent runtime (server, APIs, tools)?
├─ Yes → Use Letta (most complete, production-ready)
└─ No, I have my own agent loop
   ├─ Q2: Do you need temporal/relationship tracking?
   │  ├─ Yes → Use Zep/Graphiti (best temporal knowledge graph)
   │  └─ No → Go to Q3
   ├─ Q3: Do you need graph + vector hybrid?
   │  ├─ Yes → Use Mem0 (graph mode)
   │  └─ No → Go to Q4
   ├─ Q4: Already on LangGraph?
   │  ├─ Yes → Use LangMem
   │  └─ No → Use pgvector or Chroma
   └─ Want zero dependencies? → Custom SQLite + local embeddings
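The framework decision tree can be encoded as a plain function so the choice is recorded and testable. This is a minimal sketch: the parameter names are hypothetical, and checking "zero dependencies" before Q2 to Q4 is an assumption, since the tree lists it as a sibling branch without giving it an order.

```python
def select_memory_framework(
    wants_full_runtime: bool,
    wants_zero_dependencies: bool = False,
    needs_temporal_tracking: bool = False,
    needs_graph_vector_hybrid: bool = False,
    on_langgraph: bool = False,
) -> str:
    """Walk the framework decision tree and return a recommendation."""
    # Q1: a full runtime (server, APIs, tools) points straight to Letta.
    if wants_full_runtime:
        return "Letta"
    # Everything below assumes you run your own agent loop.
    # Assumption: the zero-dependency branch is checked first.
    if wants_zero_dependencies:
        return "Custom SQLite + local embeddings"
    # Q2: temporal/relationship tracking.
    if needs_temporal_tracking:
        return "Zep/Graphiti"
    # Q3: graph + vector hybrid.
    if needs_graph_vector_hybrid:
        return "Mem0 (graph mode)"
    # Q4: already on LangGraph.
    if on_langgraph:
        return "LangMem"
    return "pgvector or Chroma"
```

For the research assistant in the worked example below (own loop, no temporal or graph needs, not on LangGraph), this returns "pgvector or Chroma".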
| Trigger | Threshold | Action |
|---|---|---|
| Size Overflow | Core memory > 4KB | Summarize least-recent block, move summary to archival |
| Age Decay | Data unused > 30 days | Mark for compaction review |
| Relevance Drop | Access score < 0.3 | Move to archival memory with decay tag |
| User Override | User says "forget X" | Immediate removal + archival tombstone |
| Conflict Detection | Contradictory facts stored | Prompt agent to reconcile or ask user |
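The size, age, and relevance triggers in this table can be evaluated in a periodic maintenance pass. The sketch below is a hypothetical implementation: `MemoryItem`, `access_score`, and the action strings are illustrative names, "4KB" is assumed to mean 4096 bytes of UTF-8 text, and the event-driven triggers (user override, conflict detection) are left out because they fire on specific inputs rather than on a schedule.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

CORE_MEMORY_LIMIT_BYTES = 4 * 1024  # "Core memory > 4KB" (assumed 4096 bytes)
MAX_IDLE = timedelta(days=30)       # "Data unused > 30 days"
MIN_ACCESS_SCORE = 0.3              # "Access score < 0.3"


@dataclass
class MemoryItem:
    key: str
    text: str
    last_accessed: datetime
    access_score: float  # hypothetical 0..1 usage/relevance score


def compaction_actions(items: list[MemoryItem], now: datetime) -> list[str]:
    """Return the compaction actions the table's scheduled triggers call for."""
    actions: list[str] = []
    # Size overflow: summarize the least-recently-used block.
    total = sum(len(item.text.encode("utf-8")) for item in items)
    if items and total > CORE_MEMORY_LIMIT_BYTES:
        lru = min(items, key=lambda item: item.last_accessed)
        actions.append(f"summarize:{lru.key}")
    for item in items:
        if item.access_score < MIN_ACCESS_SCORE:
            # Relevance drop: move to archival with a decay tag.
            actions.append(f"archive:{item.key}")
        elif now - item.last_accessed > MAX_IDLE:
            # Age decay: flag for compaction review.
            actions.append(f"review:{item.key}")
    return actions
```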
If query_latency_requirement < 10ms AND data_size > 100M vectors:
→ Use Qdrant (optimized for speed)
Else if already_using_postgresql:
→ Use pgvector (single DB, simpler ops)
Else if need_hybrid_search (keyword + semantic):
→ Use Weaviate (best hybrid)
Else if zero_ops_preferred:
→ Use Pinecone (fully managed)
Else:
→ Use Chroma (local-first, simple API)
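The vector-store rules above translate directly into an ordered chain of checks. This is a sketch with hypothetical parameter names. Order matters: a Postgres shop with extreme latency and scale requirements still lands on Qdrant, because that check comes first.

```python
def choose_vector_store(
    latency_requirement_ms: float,
    num_vectors: int,
    already_using_postgresql: bool = False,
    need_hybrid_search: bool = False,
    zero_ops_preferred: bool = False,
) -> str:
    """Apply the vector-store selection rules in the order they are listed."""
    if latency_requirement_ms < 10 and num_vectors > 100_000_000:
        return "Qdrant"    # optimized for speed at scale
    if already_using_postgresql:
        return "pgvector"  # single database, simpler operations
    if need_hybrid_search:
        return "Weaviate"  # keyword + semantic search
    if zero_ops_preferred:
        return "Pinecone"  # fully managed
    return "Chroma"        # local-first default
```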
Input: User message or agent observation
│
├─ Contains identity/preference update?
│ └─ Yes → Update core memory, persist immediately
├─ Requires conversation context?
│ └─ Yes → Search recall memory (conversation history)
├─ Needs factual knowledge?
│ └─ Yes → Search archival memory (vector store)
└─ External data needed?
└─ Yes → Use external tools (APIs, files, etc.)
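The routing tree's branches are independent checks, not mutually exclusive ones: a single message can update a preference and also need archival facts. A minimal sketch, assuming the input has already been classified into boolean signals (by the LLM itself or a separate classifier). `InputSignals` and the operation names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class InputSignals:
    # Hypothetical flags produced by an upstream classification step.
    identity_or_preference_update: bool = False
    needs_conversation_context: bool = False
    needs_factual_knowledge: bool = False
    needs_external_data: bool = False


def route_memory_ops(signals: InputSignals) -> list[str]:
    """Return every memory operation the input calls for, in tree order."""
    ops: list[str] = []
    if signals.identity_or_preference_update:
        ops.append("update_core_and_persist")  # core memory, written immediately
    if signals.needs_conversation_context:
        ops.append("search_recall")            # conversation history
    if signals.needs_factual_knowledge:
        ops.append("search_archival")          # vector store
    if signals.needs_external_data:
        ops.append("call_external_tools")      # APIs, files, etc.
    return ops
```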
Symptoms: Agent personality drift, contradictory responses, core memory conflicts
Root Cause: Concurrent writes to core memory without locking, or failed partial updates
Detection Rule: Core memory size suddenly drops by more than 50%, or it contains malformed JSON/YAML
Recovery Procedure:
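Both root causes (unlocked concurrent writes and partial updates) can be guarded against, and the detection rule can be checked. The sketch below covers prevention and detection only, not recovery. It assumes core memory is a single JSON file: `CoreMemoryStore` and `looks_corrupted` are hypothetical names, and `threading.Lock` only serializes writers inside one process. Multiple processes need a database transaction or a file lock instead.

```python
import json
import os
import tempfile
import threading


class CoreMemoryStore:
    """Hypothetical JSON-file store for core memory with locked, atomic updates."""

    def __init__(self, path: str):
        self.path = path
        self._lock = threading.Lock()  # serializes writers within this process

    def read(self) -> dict:
        with self._lock:
            with open(self.path, encoding="utf-8") as f:
                return json.load(f)

    def update(self, key: str, value) -> None:
        # Hold the lock across read-modify-write so concurrent updates can't interleave.
        with self._lock:
            try:
                with open(self.path, encoding="utf-8") as f:
                    data = json.load(f)
            except FileNotFoundError:
                data = {}
            data[key] = value
            # Write to a temp file in the same directory, then atomically swap it in,
            # so a crash mid-write never leaves a half-written core memory file.
            fd, tmp_path = tempfile.mkstemp(dir=os.path.dirname(self.path) or ".")
            with os.fdopen(fd, "w", encoding="utf-8") as f:
                json.dump(data, f)
            os.replace(tmp_path, self.path)


def looks_corrupted(previous_size: int, raw: str) -> bool:
    """Detection rule: size dropped by more than 50%, or content is not valid JSON."""
    if previous_size > 0 and len(raw.encode("utf-8")) < previous_size * 0.5:
        return True
    try:
        json.loads(raw)
    except json.JSONDecodeError:
        return True
    return False
```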
Symptoms: Increasingly irrelevant search results; the agent can't find recently stored facts
Root Cause: Embedding model drift, index corruption, or no memory compaction
Detection Rule: Average cosine similarity of the top-3 results for known queries falls below 0.7
Recovery Procedure:
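The detection rule for this failure can run as a scheduled probe. The approach: keep a handful of known "canary" queries, fetch what the store returns for each, and compare the average top-3 similarity with the 0.7 threshold. This sketch covers detection only. It assumes you already have the query embedding and the returned result embeddings as plain float lists. `retrieval_degraded` is a hypothetical helper, and per-query averages are averaged again across queries.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieval_degraded(
    probes: List[Tuple[Vector, List[Vector]]],
    top_k: int = 3,
    threshold: float = 0.7,
) -> bool:
    """probes: (query embedding, embeddings the store returned) for known queries."""
    per_query = []
    for query_vec, results in probes:
        sims = sorted((cosine(query_vec, r) for r in results), reverse=True)[:top_k]
        if sims:
            per_query.append(sum(sims) / len(sims))
    if not per_query:
        return False  # nothing to judge
    return sum(per_query) / len(per_query) < threshold
```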
Symptoms: Agent hangs on memory operations, database connection timeouts
Root Cause: Simultaneous reads and writes to the same memory blocks, insufficient connection pooling
Detection Rule: A memory operation takes longer than 30s, or the database reports lock wait timeouts
Recovery Procedure:
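In an asyncio agent, the 30s detection rule becomes a hard timeout around every memory call, and per-block locks stop reads and writes to the same block from overlapping. This sketch covers detection and prevention only. It is limited to a single process: `block_lock`, `run_memory_op`, and `MemoryOpTimeout` are hypothetical names, and connection pooling and database-level lock timeouts still have to be configured in the database client.

```python
import asyncio
from typing import Awaitable, Dict, TypeVar

T = TypeVar("T")
MEMORY_OP_TIMEOUT_S = 30.0  # detection threshold from this failure mode


class MemoryOpTimeout(Exception):
    """Raised when a memory operation exceeds the timeout (likely lock contention)."""


_block_locks: Dict[str, asyncio.Lock] = {}


def block_lock(block: str) -> asyncio.Lock:
    """Return one lock per memory block so access to a block is serialized."""
    lock = _block_locks.get(block)
    if lock is None:
        lock = asyncio.Lock()
        _block_locks[block] = lock
    return lock


async def run_memory_op(op: Awaitable[T], timeout: float = MEMORY_OP_TIMEOUT_S) -> T:
    """Await a memory operation, failing loudly instead of hanging forever."""
    try:
        return await asyncio.wait_for(op, timeout=timeout)
    except asyncio.TimeoutError as exc:
        raise MemoryOpTimeout(f"memory operation exceeded {timeout}s") from exc
```

Raising a dedicated exception lets the agent loop log the event and back off instead of stalling the whole session.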
Symptoms: API costs spike, response latency increases, token limit errors
Root Cause: Core memory bloat, or too many archival chunks retrieved per query
Detection Rule: Average tokens per request exceed 80% of the model's context limit
Recovery Procedure:
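A sketch of the 80% detection rule, plus one way to address the "too many archival chunks" root cause: keep the highest-scoring chunks that fit within an explicit token budget. The 4-characters-per-token estimate is a crude assumption, and a real deployment should count tokens with the model's own tokenizer. `estimate_tokens`, `over_budget`, and `fit_archival_chunks` are hypothetical helpers.

```python
from typing import List, Tuple


def estimate_tokens(text: str) -> int:
    # Crude heuristic (~4 characters per token for English); use the model's tokenizer in practice.
    return max(1, len(text) // 4)


def over_budget(prompt_tokens: int, context_limit: int, ratio: float = 0.8) -> bool:
    """Detection rule: the request uses more than 80% of the context window."""
    return prompt_tokens > ratio * context_limit


def fit_archival_chunks(chunks: List[Tuple[float, str]], budget_tokens: int) -> List[str]:
    """chunks: (retrieval score, text). Keep the best-scoring chunks that fit the budget."""
    kept: List[str] = []
    used = 0
    for _score, text in sorted(chunks, key=lambda c: c[0], reverse=True):
        cost = estimate_tokens(text)
        if used + cost > budget_tokens:
            continue  # skip chunks that would overflow; a smaller one may still fit
        kept.append(text)
        used += cost
    return kept
```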
Symptoms: Database size grows linearly, search performance degrades over time
Root Cause: No memory compaction, duplicate fact insertion, missing garbage collection
Detection Rule: Total memory size grows by more than 100MB/month under normal usage
Recovery Procedure:
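The growth-rate rule can be checked from two size samples. Duplicate fact insertion, one of the listed root causes, can be blocked at write time by hashing a normalized form of each fact. This is a sketch under stated assumptions: a month is taken as 30 days, MB as 1024 × 1024 bytes, and "normalized" means lowercased with collapsed whitespace. `growth_alarm` and `DedupingFactStore` are hypothetical names.

```python
import hashlib

MB = 1024 * 1024  # assumption: MB means MiB here


def growth_alarm(size_then: int, size_now: int, days_elapsed: float,
                 limit_mb_per_month: float = 100.0) -> bool:
    """Detection rule: projected growth over a 30-day month exceeds the limit."""
    if days_elapsed <= 0:
        return False
    monthly_bytes = (size_now - size_then) / days_elapsed * 30
    return monthly_bytes > limit_mb_per_month * MB


class DedupingFactStore:
    """Hypothetical in-memory fact store that rejects near-verbatim duplicates."""

    def __init__(self) -> None:
        self._facts: dict[str, str] = {}

    @staticmethod
    def _key(fact: str) -> str:
        # Lowercase and collapse whitespace so trivial variants hash identically.
        normalized = " ".join(fact.lower().split())
        return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

    def add(self, fact: str) -> bool:
        """Store the fact; return False if an equivalent fact already exists."""
        key = self._key(fact)
        if key in self._facts:
            return False
        self._facts[key] = fact
        return True

    def __len__(self) -> int:
        return len(self._facts)
```

Hashing only catches exact duplicates after normalization. Paraphrased duplicates need a similarity check against the vector store.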
Scenario: Design the architecture for an agent that helps with technical research, remembers your preferences, and builds up knowledge over months.
Step 1 - Memory Tier Design
Core Memory (2KB):
- User name: "Sarah"
- Research domains: ["machine learning", "distributed systems"]
- Preferred paper sources: ["arxiv", "acm digital library"]
- Writing style: "detailed with code examples"
- Current project: "distributed training optimization"
Recall Memory:
- All conversations in PostgreSQL with full-text search
- 30-day retention window, then summarized
Archival Memory:
- Paper summaries, extracted insights, code snippets
- pgvector on PostgreSQL (already using it for recall)
- nomic-embed-text for local embedding (privacy + cost)
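It helps to check the 2KB core-memory budget mechanically instead of by eye, because core memory is injected into every prompt. A minimal sketch: the key names are hypothetical, and measuring the budget as UTF-8 bytes of the JSON serialization is an assumption.

```python
import json

CORE_MEMORY_BUDGET_BYTES = 2 * 1024  # "Core Memory (2KB)", assumed 2048 bytes

# Core memory from Step 1; key names are illustrative.
core_memory = {
    "user_name": "Sarah",
    "research_domains": ["machine learning", "distributed systems"],
    "preferred_paper_sources": ["arxiv", "acm digital library"],
    "writing_style": "detailed with code examples",
    "current_project": "distributed training optimization",
}


def core_memory_size(core: dict) -> int:
    """Size of core memory as UTF-8 bytes of its JSON serialization."""
    return len(json.dumps(core, ensure_ascii=False).encode("utf-8"))


def fits_core_budget(core: dict, budget: int = CORE_MEMORY_BUDGET_BYTES) -> bool:
    return core_memory_size(core) <= budget
```

The example profile uses only a small fraction of the budget, which leaves room for the agent to add facts before size-overflow compaction kicks in.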
Step 2 - Framework Selection Decision
Following the decision tree:
Step 3 - Agent Loop Implementation
async def research_step(user_query: str):
    # load_core_memory, search_archival, save_to_recall and llm are the agent's
    # own memory and model helpers; they are assumed here, not defined.
    # Load core memory (user preferences, active project)
    core = load_core_memory()
    # Check whether the query relates to the current project
    if "optimization" in user_query.lower():
        # Search archival memory for project-specific knowledge
        relevant_papers = search_archival(core["current_project"])
        context = f"Current project context: {relevant_papers}"
    else:
        # Search archival memory for general domain knowledge
        context = search_archival(user_query)
    # Build the prompt from core memory plus retrieved context
    system_prompt = f"""
    You are Sarah's research assistant.
    User preferences: {core['preferences']}
    Current project: {core['current_project']}
    Retrieved context: {context}
    """
    response = await llm.chat([
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_query},
    ])
    # Persist the interaction to recall memory
    save_to_recall(user_query, response)
    return response
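The loop calls `save_to_recall` but does not show the recall tier behind it. Below is one hypothetical implementation using the standard-library `sqlite3` module as a stand-in for the PostgreSQL recall store from Step 1. It uses `LIKE` matching in place of real full-text search, and it selects rows older than the 30-day window so a separate job can summarize them. Unlike the two-argument call in the loop, these helpers take an explicit connection so they are easy to test. All names are illustrative.

```python
import sqlite3
from datetime import datetime, timedelta, timezone
from typing import List, Optional, Tuple

RETENTION = timedelta(days=30)  # "30-day retention window, then summarized"


def open_recall(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS recall ("
        " id INTEGER PRIMARY KEY,"
        " ts TEXT NOT NULL,"          # UTC ISO-8601, so string order = time order
        " user_query TEXT NOT NULL,"
        " response TEXT NOT NULL)"
    )
    return conn


def save_to_recall(conn: sqlite3.Connection, user_query: str, response: str,
                   ts: Optional[datetime] = None) -> None:
    ts = ts or datetime.now(timezone.utc)
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO recall (ts, user_query, response) VALUES (?, ?, ?)",
            (ts.astimezone(timezone.utc).isoformat(), user_query, response),
        )


def search_recall(conn: sqlite3.Connection, term: str, limit: int = 5) -> List[Tuple[str, str]]:
    # Substring match as a stand-in for PostgreSQL full-text search; newest first.
    like = f"%{term}%"
    return conn.execute(
        "SELECT user_query, response FROM recall"
        " WHERE user_query LIKE ? OR response LIKE ? ORDER BY ts DESC LIMIT ?",
        (like, like, limit),
    ).fetchall()


def due_for_summary(conn: sqlite3.Connection, now: datetime) -> List[Tuple[int, str, str]]:
    """Rows older than the retention window, oldest first, ready to summarize."""
    cutoff = (now - RETENTION).astimezone(timezone.utc).isoformat()
    return conn.execute(
        "SELECT id, user_query, response FROM recall WHERE ts < ? ORDER BY ts",
        (cutoff,),
    ).fetchall()
```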
What a novice would miss:
What an expert catches:
Do NOT use this skill for:
- Use /always-on-agent-inputs instead: that skill covers what data to feed the agent; this one covers how to store and retrieve it.
- Use /always-on-agent-applications instead: that skill covers use-case ideation; this one covers technical implementation.
- Use /always-on-agent-safety instead: that skill covers data governance, consent, and security; this one assumes those are already designed.
- Use /agentic-patterns instead: that skill covers ReAct loops, tool use, and planning; this one covers the persistence layer underneath.
- Use /agent-creator instead if the agent doesn't need to remember across sessions; without cross-session memory you don't need an always-on architecture.
npx claudepluginhub curiositech/windags-skills --plugin windags-skills