By pebeto
Agent-building practices from Anthropic's engineering blog, packaged as installable Claude Code skills, MCP servers, and a tool-gating hook. Covers the gaps no existing skill pack fills.
/autonomous-loop: Set up and explain the lock-file autonomy loop for running unsupervised agents on one shared repo.
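A common way to make a lock-file claim safe is an atomic create-if-absent. An agent claims a task by creating `<task>.lock` with `O_CREAT | O_EXCL`, and any other agent attempting the same claim fails immediately. The sketch below assumes that scheme on a local filesystem. The directory layout and the names `try_claim` and `release` are hypothetical; they are not taken from scripts/locks.py.

```python
import os
import tempfile
from pathlib import Path


def try_claim(lock_dir: Path, task: str, agent_id: str) -> bool:
    """Atomically claim `task` for `agent_id`; return False if someone else holds it."""
    lock_dir.mkdir(parents=True, exist_ok=True)
    lock_path = lock_dir / f"{task}.lock"
    try:
        # O_CREAT | O_EXCL fails if the file already exists, so exactly one claimant wins.
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
    except FileExistsError:
        return False
    with os.fdopen(fd, "w") as f:
        f.write(agent_id + "\n")  # record the owner so stale locks can be traced
    return True


def release(lock_dir: Path, task: str, agent_id: str) -> None:
    """Delete the lock, but only if `agent_id` owns it."""
    lock_path = lock_dir / f"{task}.lock"
    try:
        owner = lock_path.read_text().strip()
    except FileNotFoundError:
        return  # nothing to release
    if owner == agent_id:
        lock_path.unlink()


if __name__ == "__main__":
    locks = Path(tempfile.mkdtemp()) / "locks"
    print(try_claim(locks, "parse-structs", "agent-a"))  # True
    print(try_claim(locks, "parse-structs", "agent-b"))  # False: agent-a holds it
    release(locks, "parse-structs", "agent-a")
    print(try_claim(locks, "parse-structs", "agent-b"))  # True
```

On a shared git remote, pushing the lock file can produce a similar effect: the agent that pushes a conflicting claim second hits a conflict and has to pick another task.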
/research: Run an orchestrator-worker research pass that decomposes a question into parts, dispatches parallel research-worker subagents, and synthesizes a cited answer.
build-agent-evals: Build automated evaluations for an AI agent from scratch: collecting tasks from real failures, choosing code/model/human graders, picking pass@k vs pass^k, building an isolated harness, and keeping the suite honest over time. Use this whenever someone wants to measure, benchmark, or regression-test an agent, write an eval harness for an LLM agent, decide how to grade non-deterministic output, set up an LLM-as-judge, or asks any version of "how do I know if my agent is actually getting better." Trigger even when they say "tests for my agent," "eval set," or "agent benchmark" rather than the word "evals." Not for container or resource limits that make scores flaky across runs; that's calibrate-eval-infrastructure.
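pass@k asks whether at least one of k attempts succeeds, which fits tasks where one good answer is enough. pass^k asks whether all k attempts succeed, which fits agents that must be reliable every time. The sketch below estimates both from n recorded trials with c passes, using the combinatorial estimators common in code-generation benchmarks. The function names are hypothetical, and the skill may compute these differently.

```python
from math import comb


def _check(n: int, c: int, k: int) -> None:
    if not (0 <= c <= n and 1 <= k <= n):
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")


def pass_at_k(n: int, c: int, k: int) -> float:
    """P(at least one of k attempts passes), estimated from n trials with c passes."""
    _check(n, c, k)
    # comb(n - c, k) counts k-subsets made only of failures; it is 0 when k > n - c.
    return 1.0 - comb(n - c, k) / comb(n, k)


def pass_pow_k(n: int, c: int, k: int) -> float:
    """P(all k attempts pass), estimated from n trials with c passes."""
    _check(n, c, k)
    # comb(c, k) counts k-subsets made only of passes; it is 0 when k > c.
    return comb(c, k) / comb(n, k)


if __name__ == "__main__":
    # 10 trials of one task, 7 passed: strong under pass@3, shaky under pass^3.
    print(round(pass_at_k(10, 7, 3), 3))   # 0.992
    print(round(pass_pow_k(10, 7, 3), 3))  # 0.292
```

The same 70% per-trial success rate looks excellent or poor depending on which question you ask, so pick the metric before reading the numbers.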
calibrate-eval-infrastructure: Stop the machine from deciding your benchmark. Configure and validate the container and runtime resources for an agentic coding eval so infrastructure noise stays inside statistical bounds instead of swinging scores more than the models do. Use this whenever someone runs SWE-bench or any agentic coding benchmark in containers, sees scores jump between runs for no code reason, suspects OOM kills or flaky infra are skewing results, sets container memory or CPU limits for an eval harness, or wants to trust a leaderboard delta. Trigger on "my benchmark scores are inconsistent," "OOM during eval," "how much memory should the eval container get," and similar. Not for designing the eval tasks or graders themselves; that's build-agent-evals.
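A standard two-proportion comparison can check whether a score delta is real or falls within infrastructure noise. The sketch below is one such check. It assumes each task is a pass/fail trial and treats the two runs as independent. `score_delta_is_signal` and the default 1.96 threshold (roughly a 95% interval) are illustrative choices, not the skill's procedure.

```python
from math import sqrt


def score_delta_is_signal(
    passes_a: int, passes_b: int, n_tasks: int, z: float = 1.96
) -> tuple[float, float, bool]:
    """Return (delta, margin, exceeds) for two runs scored on the same n_tasks."""
    p_a, p_b = passes_a / n_tasks, passes_b / n_tasks
    # Standard error of a difference of two independent proportions. Treating the
    # runs as independent usually overstates noise when they share tasks.
    se = sqrt(p_a * (1 - p_a) / n_tasks + p_b * (1 - p_b) / n_tasks)
    delta = p_b - p_a
    margin = z * se
    return delta, margin, abs(delta) > margin


if __name__ == "__main__":
    # Same model, two container configs, 500 tasks: 70.0% vs 73.0%.
    delta, margin, exceeds = score_delta_is_signal(350, 365, 500)
    print(f"delta={delta:.3f} margin={margin:.3f} signal={exceeds}")
    # delta=0.030 margin=0.056 signal=False
```

In practice, rerun the same model twice under a single container config first. If that run-to-run delta already approaches the margin, fix the resource limits before comparing models.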
coding-agent-scaffold: Design the tool interface for a coding agent so the model stops misusing it. Covers the minimal two-tool scaffold (a bash tool plus a file editor), exact single-match string replacement, absolute-path rules, and error-proofing the tool descriptions so common model mistakes become impossible. Use this whenever someone is building a coding agent or SWE-bench-style harness, designing a bash or file-edit tool for an agent, deciding how much scaffolding to impose, or debugging an agent that keeps editing the wrong place, fumbling multi-line edits, or escaping shell commands wrong. Trigger on "build a coding agent," "str_replace tool," "agent keeps breaking the file," and similar. Not for general MCP or service tool design; this is the bash plus file-editor interface specifically.
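The single-match rule and the absolute-path rule can be enforced directly in the edit tool, so a bad call fails loudly instead of editing the wrong place. The sketch below assumes a `str_replace`-style tool taking `path`, `old_str`, and `new_str`. Treat these names, and the `EditError` class, as an illustrative convention rather than a fixed API.

```python
from pathlib import Path


class EditError(Exception):
    """Raised with a message meant to go back to the model verbatim so it can retry."""


def str_replace(path: str, old_str: str, new_str: str) -> str:
    p = Path(path)
    if not p.is_absolute():
        raise EditError(f"path must be absolute, got {path!r}")
    if not p.is_file():
        raise EditError(f"no such file: {path}")
    if not old_str:
        raise EditError("old_str must not be empty")
    text = p.read_text()
    count = text.count(old_str)  # non-overlapping occurrences
    if count == 0:
        raise EditError("old_str not found; copy it exactly, including whitespace")
    if count > 1:
        raise EditError(
            f"old_str matches {count} places; include more surrounding lines so it is unique"
        )
    p.write_text(text.replace(old_str, new_str, 1))
    return f"edited {path}"
```

Each rejection is phrased as an instruction, so the error message itself tells the model how to fix its next call.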
durable-agent-architecture: Structure a long-lived agent service so any part can crash and resume. Decompose it into brain (model plus harness), hands (ephemeral sandbox and tools), and session (a durable event log), each replaceable on its own, with wake/resume semantics and credentials kept out of the execution environment. Use this whenever someone designs a production or long-running agent backend, asks how to make agents crash-recoverable or resumable, worries about losing session state when a container dies, needs to scale agents as a service, or asks where to keep credentials for an agent that runs code. Trigger on "agent infrastructure," "resume an agent after a crash," "agent runs for hours," "where do tokens live," and similar. Not for parallelizing work across agents or coordinating a shared repo; see multi-agent-orchestration and parallel-autonomous-agents.
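Only the session piece is sketched below: an append-only event log that a fresh harness can replay after a crash. It assumes a local JSONL file stands in for whatever durable store a production service would use. The event schema (`type`, `call_id`) and the helper `pending_tool_calls` are hypothetical. On wake, the brain would call `replay()`, and a new sandbox (the hands) would redo only the calls that never recorded a result.

```python
import json
import os
import tempfile
from pathlib import Path


class SessionLog:
    """Append-only JSONL event log that outlives any single harness or sandbox."""

    def __init__(self, path: Path) -> None:
        self.path = path
        self.path.parent.mkdir(parents=True, exist_ok=True)

    def append(self, event: dict) -> None:
        with self.path.open("a", encoding="utf-8") as f:
            f.write(json.dumps(event) + "\n")
            f.flush()
            os.fsync(f.fileno())  # make the event durable before acting on it

    def replay(self) -> list:
        if not self.path.exists():
            return []
        events = []
        with self.path.open(encoding="utf-8") as f:
            for line in f:
                if not line.strip():
                    continue
                try:
                    events.append(json.loads(line))
                except json.JSONDecodeError:
                    break  # a torn final write from a crash; drop the tail
        return events


def pending_tool_calls(events: list) -> list:
    """Tool calls that were issued but never got a result: work to redo after resume."""
    done = {e["call_id"] for e in events if e["type"] == "tool_result"}
    return [e for e in events if e["type"] == "tool_call" and e["call_id"] not in done]


if __name__ == "__main__":
    path = Path(tempfile.mkdtemp()) / "session.jsonl"
    log = SessionLog(path)
    log.append({"type": "user", "text": "fix the failing test"})
    log.append({"type": "tool_call", "call_id": "c1", "command": "pytest -x"})
    log.append({"type": "tool_result", "call_id": "c1", "exit_code": 1})
    log.append({"type": "tool_call", "call_id": "c2", "command": "pytest -x"})
    # Simulated crash: a brand-new harness wakes up with only the log path.
    resumed = SessionLog(path)
    print([e["call_id"] for e in pending_tool_calls(resumed.replay())])  # ['c2']
```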
multi-agent-orchestration: Run an orchestrator-worker system for breadth-first research: a lead agent plans, spawns three to five subagents with their own context windows, and synthesizes their findings. Covers when multi-agent actually beats a single agent and when it just burns tokens, how to delegate so subagents do not overlap, broad-to-narrow search, writing findings to a filesystem, and how to evaluate the system. Use this whenever someone wants to parallelize research or exploration across agents, asks how to coordinate a lead and subagents, considers a multi-agent setup, or asks whether multi-agent is worth it for their task. Trigger on "orchestrator and workers," "parallel research agents," "lead agent spawns subagents," "should this be multi-agent," and similar.
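The orchestrator-worker flow reduces to three steps: split the question into non-overlapping subtopics, fan out one worker per subtopic concurrently, then merge the findings into one answer with numbered citations. The sketch below uses `asyncio` for the fan-out. `research_worker` is a hypothetical stand-in that returns canned notes where a real subagent would call a model with search tools, and `Finding` and `orchestrate` are illustrative names, not how the /research command is implemented.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class Finding:
    subtopic: str
    summary: str
    sources: list = field(default_factory=list)


async def research_worker(subtopic: str) -> Finding:
    """Hypothetical stand-in for a subagent with its own context window."""
    await asyncio.sleep(0)  # placeholder for model and tool latency
    slug = subtopic.replace(" ", "-")
    return Finding(subtopic, f"notes on {subtopic}", [f"https://example.com/{slug}"])


async def orchestrate(question: str, subtopics: list, max_subagents: int = 5) -> str:
    """Fan out one worker per subtopic, then synthesize a single cited answer."""
    # The skill suggests three to five subagents; this sketch only enforces the cap.
    if not 1 <= len(subtopics) <= max_subagents:
        raise ValueError(f"expected 1 to {max_subagents} subtopics, got {len(subtopics)}")
    if len(set(subtopics)) != len(subtopics):
        raise ValueError("duplicate subtopics would make workers overlap")
    # gather runs the workers concurrently and returns results in input order.
    findings = await asyncio.gather(*(research_worker(s) for s in subtopics))

    sources = []  # each URL gets one citation number
    lines = [f"Q: {question}"]
    for f in findings:
        refs = []
        for url in f.sources:
            if url not in sources:
                sources.append(url)
            refs.append(f"[{sources.index(url) + 1}]")
        lines.append(f"- {f.subtopic}: {f.summary} {' '.join(refs)}")
    lines += [f"[{i}] {url}" for i, url in enumerate(sources, start=1)]
    return "\n".join(lines)


if __name__ == "__main__":
    print(asyncio.run(orchestrate(
        "How do agent evals handle non-determinism?",
        ["pass@k metrics", "LLM-as-judge graders", "harness isolation"],
    )))
```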
Admin access level
Server config contains admin-level keywords
Executes bash commands
Hook triggers when Bash tool is used
Uses power tools
Uses Bash, Write, or Edit tools
A standard library for building agents.
Anthropic's engineering blog documents how to build, evaluate, and run agents in production. Most of that knowledge never ships as something you can install; it stays as prose you reopen whenever you hit the problem it solves. agent-stdlib packages the parts nobody else has packaged: Claude Code skills, a few MCP servers, and a tool-gating hook.
Each component names the article it comes from and says how it differs from any skill that already covers similar ground. The pack ships only what was missing. Topics that strong community skills already handle stay out, with pointers below.
| Skill | What it gives you | Source article |
|---|---|---|
| build-agent-evals | Build automated evals for an agent: pick a grader, choose pass@k vs pass^k, run the zero-to-one roadmap | Demystifying evals for AI agents |
| calibrate-eval-infrastructure | Stop container resource limits from swinging benchmark scores more than the models do | Quantifying infrastructure noise in agentic coding evals |
| coding-agent-scaffold | Design the two-tool (bash + file editor) interface for a coding agent so the model stops misusing it | Raising the bar on SWE-bench Verified |
| durable-agent-architecture | Split an agent service into brain, hands, and session so any part can crash and resume | Scaling Managed Agents |
| sandboxing-agentic-systems | Contain an agent that runs code or reads untrusted content, layer by layer | How we contain Claude |
| using-the-think-step | Decide when a mid-task reasoning step helps and how to prompt for it | The "think" tool |
| multi-agent-orchestration | Run an orchestrator-worker research system with parallel subagents | How we built our multi-agent research system |
| parallel-autonomous-agents | Coordinate unsupervised agents on one git repo with lock files and an autonomy loop | Building a C compiler with parallel Claudes |
Add the marketplace and install the plugin:
/plugin marketplace add pebeto/agent-stdlib
/plugin install agent-stdlib@agent-stdlib
Skills trigger themselves when a task matches their description. You can also load one explicitly with the Skill tool.
Three servers live under mcp-servers/, each paired with a skill. They need uv, which installs each server's one dependency from the script header on first run.
- think: exposes the think tool, paired with using-the-think-step.
- tool-gateway: exposes search_tools and call_tool over a larger catalog, so the agent reaches many tools through two. Paired with the tool-scaling guidance in advanced-tool-use.
- code-execution: see mcp-servers/code-execution/README.md and the sandboxing-agentic-systems skill.

think and tool-gateway are wired into the plugin's .mcp.json. To enable code-execution, point your client's MCP config at uv run .../mcp-servers/code-execution/server.py.
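The two-tool gateway pattern keeps the model's context at two tool definitions however large the catalog grows: the agent searches for a tool, then calls it by name. The sketch below shows only that dispatch logic in plain Python and leaves out the MCP server wiring. The catalog entries, the keyword-overlap scoring, and the lambda bodies are hypothetical placeholders, not what the tool-gateway server actually ships.

```python
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical catalog: name -> (description, implementation).
CATALOG: Dict[str, Tuple[str, Callable[..., Any]]] = {
    "get_weather": ("Current weather for a city", lambda city: f"sunny in {city}"),
    "convert_currency": (
        "Convert an amount between two currencies",
        lambda amount, src, dst: f"{amount} {src} -> {dst}",
    ),
    "list_issues": ("List open issues in a GitHub repository", lambda repo: [f"{repo}#1"]),
}


def search_tools(query: str, limit: int = 5) -> List[Dict[str, str]]:
    """Return the best-matching tools so the model loads only what it needs."""
    words = set(query.lower().split())
    scored = []
    for name, (desc, _) in CATALOG.items():
        haystack = f"{name.replace('_', ' ')} {desc}".lower()
        score = sum(w in haystack for w in words)  # crude keyword overlap
        if score:
            scored.append((score, name, desc))
    scored.sort(key=lambda t: (-t[0], t[1]))  # best score first, then by name
    return [{"name": n, "description": d} for _, n, d in scored[:limit]]


def call_tool(name: str, arguments: Dict[str, Any]) -> Any:
    """Dispatch to a catalog tool by name with keyword arguments."""
    if name not in CATALOG:
        raise KeyError(f"unknown tool {name!r}; call search_tools first")
    _, fn = CATALOG[name]
    return fn(**arguments)


if __name__ == "__main__":
    print(search_tools("weather city"))
    print(call_tool("get_weather", {"city": "Lima"}))
```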
- /research <question> runs the orchestrator-worker flow: it decomposes the question, dispatches research-worker subagents in parallel, and synthesizes a cited answer. Paired with multi-agent-orchestration.
- /autonomous-loop sets up lock-file coordination for unsupervised agents on one repo, using scripts/locks.py and scripts/autonomy_loop.sh. Paired with parallel-autonomous-agents.

The tool-gating hook is a PreToolUse hook that tiers Bash commands by risk and denies or asks on the dangerous ones. It stays off until you set AGENT_STDLIB_GATING to warn or enforce, and it only ever adds friction. See hooks/README.md.

Most of this pack is not Claude-specific. The MCP servers speak the open MCP protocol, the scripts are plain Python and Bash, and the skill content is harness-neutral procedural knowledge. To use it in OpenCode, Cursor, Cline, or a custom agent on any model, see AGENTS.md, which maps each component to its portable form.
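A risk-tiering Bash hook fits in one short script: read the hook payload, classify the command, and emit a decision only when gating is on. The sketch below assumes the PreToolUse JSON shape described in Claude Code's hooks documentation (`tool_name` and `tool_input.command` on stdin, `hookSpecificOutput.permissionDecision` on stdout); check it against the current docs. The regex tiers and the warn/enforce mapping are hypothetical, not the patterns in hooks/. `decide` never returns "allow", which is what keeps a hook like this friction-only.

```python
import json
import os
import re
import sys
from typing import Optional

# Illustrative patterns only; the real hook's tiers are documented in hooks/README.md.
DANGEROUS = [
    r"\brm\s+-(?=\S*[rR])(?=\S*f)",  # rm with both recursive and force flags
    r"\bgit\s+push\b.*--force",
    r"\bmkfs\b",
    r"\bcurl\b[^|]*\|\s*(ba)?sh\b",  # piping a download straight into a shell
]
RISKY = [r"\bsudo\b", r"\bgit\s+reset\s+--hard\b", r"\bchmod\s+-R\b"]


def classify(command: str) -> str:
    if any(re.search(p, command) for p in DANGEROUS):
        return "dangerous"
    if any(re.search(p, command) for p in RISKY):
        return "risky"
    return "safe"


def decide(tier: str, mode: str) -> Optional[str]:
    """Map a tier and gating mode to a decision; None leaves normal permissions alone."""
    if mode == "enforce":
        return {"dangerous": "deny", "risky": "ask"}.get(tier)
    if mode == "warn":
        return "ask" if tier == "dangerous" else None
    return None  # gating is off


def main() -> None:
    mode = os.environ.get("AGENT_STDLIB_GATING", "")
    if mode not in ("warn", "enforce"):
        return  # off by default: exit without reading input or printing anything
    payload = json.load(sys.stdin)
    if payload.get("tool_name") != "Bash":
        return
    tier = classify(payload.get("tool_input", {}).get("command", ""))
    decision = decide(tier, mode)
    if decision:
        json.dump({"hookSpecificOutput": {
            "hookEventName": "PreToolUse",
            "permissionDecision": decision,
            "permissionDecisionReason": f"{tier} Bash command (agent-stdlib gating)",
        }}, sys.stdout)


if __name__ == "__main__":
    main()
```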
These topics from the same blog have solid community skills, so they stay out of this pack. Reach for these instead: