By victoriakaey
15 engineering skills covering the full LLM agent development lifecycle: problem exploration, architecture decisions, prompt design, experiment-driven development, prompt change management, regression testing, critic/judge design, full-stack integration, AI system design, database design, model selection, memory systems, harness design, DevOps, and code review
Use at the start of any LLM agent project, or when reconsidering an existing architecture. Guides decisions across four layers: workflow vs agent, single-agent vs multi-agent, tool-use vs specialized nodes, and retrieval strategy. Each decision has concrete tradeoffs and a recommended default.
Use when connecting an LLM agent to a full-stack application, external API, or third-party platform. Covers four integration patterns (REST, WebSocket/SSE, Webhook, Message Queue), interface design, reliability, security, and observability. Framework-agnostic — guides you to the right pattern for your situation, then gives concrete implementation direction for your chosen stack.
Use when a user has an idea for a product or feature that might involve AI, but doesn't know where to start or how to design the system. Guides non-technical users through a conversational process to clarify their idea, decide where AI fits, and produce a system design they can understand — and that Claude can use to start building. One question at a time. Never assume technical knowledge.
Use when reviewing code — either Claude reviews your code and produces a structured report, or Claude guides you through reviewing someone else's code. Default mode: Claude performs the review and produces a report organized by severity. Second mode: guided self-review with a structured checklist and probing questions. Covers correctness, security, performance, maintainability, and AI-specific concerns for LLM applications.
Use when designing any LLM-as-Judge, Critic, or Evaluator node. Covers input structure, output schema, chain-of-thought ordering, single-pass vs multi-stage tradeoffs, and known failure modes. Prevents the most common design mistakes that cause Critic nodes to be unreliable.
An engineering operating system for building reliable LLM agents.
15 skills for Claude Code covering the full agent development lifecycle. Each skill was extracted from building a production LangGraph agent with a Critic-driven retrieval loop. Framework-agnostic — the principles apply to any agent architecture.
These are real failure modes that led to the skills in this collection.
Critic produced confident but wrong judgments. Input was organized per sub-question. When the answer appeared in a different sub-question's results, the Critic concluded "not found" because the input structure primed per-section reasoning. The fix was restructuring the input into a flat evidence pool, not changing the prompt. → critic-judge-design
Prompt change improved one case, regressed three others. No baseline existed. No regression batch ran before committing. The process failure wasn't the prompt edit — it was the lack of a safety net. → prompt-change-management, regression-testing
Agent looped because tool observations were too lossy. Search returned 50 raw rows. The LLM couldn't find the signal, so it retried with rephrased queries — indefinitely. The tool's output format made the correct next action unrecoverable. → harness-design
Verdict-before-reasoning caused post-hoc justification. The output schema put the boolean verdict before the reasoning field. LLMs generate left-to-right — the model committed to a verdict first, then rationalized. Moving one field fixed it. → critic-judge-design
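Three of the failures above were fixed by changing structure rather than prompt wording. The sketch below illustrates those fixes under stated assumptions: every name in it (`Evidence`, `build_flat_pool`, `render_critic_input`, `summarize_rows`, `CRITIC_OUTPUT_SCHEMA`) is hypothetical and not taken from the skills themselves, and the keyword scoring is a deliberately naive stand-in for whatever relevance signal a real tool has.

```python
from dataclasses import dataclass


@dataclass
class Evidence:
    sub_question: str  # kept for tracing only; never shown to the Critic
    text: str


def build_flat_pool(results_by_sub_question: dict[str, list[str]]) -> list[Evidence]:
    """Merge per-sub-question results into one de-duplicated evidence pool."""
    pool: list[Evidence] = []
    seen: set[str] = set()
    for sub_q, snippets in results_by_sub_question.items():
        for text in snippets:
            if text not in seen:
                seen.add(text)
                pool.append(Evidence(sub_question=sub_q, text=text))
    return pool


def render_critic_input(question: str, pool: list[Evidence]) -> str:
    """Render the pool as one numbered list with no per-section grouping."""
    lines = [f"Question: {question}", "Evidence:"]
    lines += [f"[{i}] {ev.text}" for i, ev in enumerate(pool, start=1)]
    return "\n".join(lines)


def summarize_rows(rows: list[dict], query_terms: list[str], keep: int = 5) -> dict:
    """Turn raw search rows into a compact observation the model can act on."""

    def score(row: dict) -> int:
        # Naive relevance: count query terms that appear anywhere in the row.
        text = " ".join(str(v) for v in row.values()).lower()
        return sum(term.lower() in text for term in query_terms)

    ranked = sorted(rows, key=score, reverse=True)
    top = [row for row in ranked[:keep] if score(row) > 0]
    return {
        "total_rows": len(rows),
        "matching_rows": sum(score(row) > 0 for row in rows),
        "top_rows": top,
        # An explicit next-step hint, so a retry is a choice and not a guess.
        "hint": "Use top_rows." if top else "No rows matched; rephrasing is unlikely to help.",
    }


# Field order is the point: reasoning comes first, the verdict comes last,
# so the verdict is generated after the reasoning it should depend on.
CRITIC_OUTPUT_SCHEMA = {
    "type": "object",
    "properties": {
        "reasoning": {"type": "string"},
        "cited_evidence": {"type": "array", "items": {"type": "integer"}},
        "verdict": {"type": "boolean"},
    },
    "required": ["reasoning", "cited_evidence", "verdict"],
}
```

Whether a given model API preserves property order from a schema is worth checking for your stack; if it does not, the same ordering can be requested in the prompt's output template.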
Skills activate automatically based on what you're doing.
| Skill | When to use |
|---|---|
| ai-system-design | Have an idea but don't know where to start |
| problem-exploration | Facing a problem with multiple possible approaches |
| agent-architecture | Deciding architecture: workflow vs agent, single vs multi |
| prompt-design | Writing a new prompt, or diagnosing one producing wrong output |
| experiment-driven-development | Starting any implementation task |
| prompt-change-management | About to change a prompt |
| regression-testing | Comparing two system versions |
| critic-judge-design | Designing a Judge, Critic, or Evaluator component |
| harness-design | Agent misbehaves: wrong tools, missed data, loops |
| agent-integration | Connecting agent to a web app, API, or third-party platform |
| database-design | Designing a schema (general or AI-specific) |
| model-selection | Choosing a model, API vs local, fine-tuning decisions |
| memory-system | Designing how an agent remembers across sessions |
| devops | Deploying to production, CI/CD, monitoring |
| code-review | Reviewing code with structured severity levels |
All skills read from and write to PROJECT.md — a shared state contract at the repo root that keeps decisions, progress, and known issues in one place across sessions.
Treat every change as an experiment, not a fix. Every change needs before/after measurement. Regressions are expected.
Structure beats instructions. Input organization and output schema shape LLM reasoning more than prompt rules. Fix structure before adding rules.
Evidence before code. Start with real system output. Write down the failure mode and hypothesis before touching any file.
One job per component. If it does two things, split it.
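As a rough illustration of "before/after measurement", the sketch below runs a baseline and a candidate version over the same fixed cases and reports which cases improved and which regressed. It assumes each version can be called as a function from input string to output string and that exact match is an acceptable score; both are simplifications, and `compare_versions` and the case format are hypothetical names, not part of the skills.

```python
from typing import Callable, Optional


def compare_versions(
    cases: list[dict],
    baseline: Callable[[str], Optional[str]],
    candidate: Callable[[str], Optional[str]],
) -> dict:
    """Run both versions on the same cases and report per-case changes."""
    report: dict[str, list[str]] = {"improved": [], "regressed": [], "unchanged": []}
    for case in cases:
        # Exact match stands in for whatever scoring the real system uses.
        before = baseline(case["input"]) == case["expected"]
        after = candidate(case["input"]) == case["expected"]
        if after and not before:
            report["improved"].append(case["id"])
        elif before and not after:
            report["regressed"].append(case["id"])
        else:
            report["unchanged"].append(case["id"])
    return report


if __name__ == "__main__":
    cases = [
        {"id": "c1", "input": "2+2", "expected": "4"},
        {"id": "c2", "input": "capital of France", "expected": "Paris"},
    ]
    # Canned outputs standing in for two prompt versions.
    old_outputs = {"2+2": "4", "capital of France": "Lyon"}
    new_outputs = {"2+2": "5", "capital of France": "Paris"}
    print(compare_versions(cases, old_outputs.get, new_outputs.get))
```

Reporting per-case changes instead of a single aggregate score is deliberate: a change that fixes one case and breaks another leaves the total unchanged while still being a regression.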
Install from the Claude Code plugin marketplace:
/plugin marketplace add Victoriakaey/build-reliable-agents
/plugin install build-reliable-agents@build-reliable-agents
Manual install:
git clone https://github.com/Victoriakaey/build-reliable-agents.git
cp -r build-reliable-agents/skills/ /mnt/skills/user/build-reliable-agents/
Update:
/plugin update build-reliable-agents
Contributions welcome. Each skill should be grounded in real observed failures, not theoretical best practices. Follow the existing SKILL.md structure.
MIT License. See LICENSE for details.