agent-stdlib

Name: agent-stdlib
Author: pebeto


Agent-building practices from Anthropic's engineering blog, packaged as installable Claude Code skills, MCP servers, and a tool-gating hook. Covers the gaps no existing skill pack fills.

npx claudepluginhub pebeto/agent-stdlib --plugin agent-stdlib

Popularity

Stars: Above avg (Med: 0 · Avg: 285)

Installs: Med: 0 · Avg: 1

What's Inside

Slash Commands (2)

Autonomous Loop

/autonomous-loop

Set up and explain the lock-file autonomy loop for running unsupervised agents on one shared repo.
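The core of a lock-file scheme can be sketched in a few lines. This is an illustration only, not the contents of scripts/locks.py: the names `try_acquire` and `release`, the `.lock` suffix, and the one-directory layout are all assumptions, and it covers agents sharing a single filesystem. It relies on `O_CREAT | O_EXCL`, which makes file creation fail atomically when the lock already exists.

```python
import os
from pathlib import Path


def try_acquire(lock_dir: Path, task: str, owner: str) -> bool:
    """Claim `task` by atomically creating its lock file; False if already held."""
    lock_dir.mkdir(parents=True, exist_ok=True)
    path = lock_dir / f"{task}.lock"  # hypothetical naming scheme
    try:
        # O_CREAT | O_EXCL fails if the file exists, so only one agent can win.
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    with os.fdopen(fd, "w") as handle:
        handle.write(owner)  # record the holder so stale locks can be traced
    return True


def release(lock_dir: Path, task: str, owner: str) -> bool:
    """Remove the lock only if `owner` holds it; False otherwise."""
    path = lock_dir / f"{task}.lock"
    try:
        if path.read_text() != owner:
            return False
    except FileNotFoundError:
        return False
    path.unlink()
    return True
```

An agent would call `try_acquire` before starting a task, skip the task when it returns False, and call `release` once its work is committed.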

Research

/research

Run an orchestrator-worker research pass that decomposes a question into parts, dispatches parallel research-worker subagents, and synthesizes a cited answer.

Agents (1)

research-worker

/research-worker

Worker subagent for an orchestrated research run. Takes one objective and a notes-file path, searches broad-to-narrow, writes cited findings to the file, and returns a short summary. Dispatched by the /research command, not for direct use.

Skills (8)

build-agent-evals

/build-agent-evals

Build automated evaluations for an AI agent from scratch: collecting tasks from real failures, choosing code/model/human graders, picking pass@k vs pass^k, building an isolated harness, and keeping the suite honest over time. Use this whenever someone wants to measure, benchmark, or regression-test an agent, write an eval harness for an LLM agent, decide how to grade non-deterministic output, set up an LLM-as-judge, or asks any version of "how do I know if my agent is actually getting better." Trigger even when they say "tests for my agent," "eval set," or "agent benchmark" rather than the word "evals." Not for container or resource limits making scores flaky across runs; that's calibrate-eval-infrastructure.
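For readers unfamiliar with the two metrics, here is a minimal sketch of how they might be estimated, assuming pass@k means "at least one of k trials succeeds" and pass^k means "all k trials succeed". The first function is the usual combinatorial estimator from code-generation benchmarks, and the second is its all-success counterpart. The function names are illustrative and not taken from the skill.

```python
from math import comb


def _check(n: int, c: int, k: int) -> None:
    if not (0 <= c <= n and 1 <= k <= n):
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate P(at least one of k trials passes) from c passes in n trials."""
    _check(n, c, k)
    # 1 - P(all k trials drawn without replacement are failures).
    return 1.0 - comb(n - c, k) / comb(n, k)


def pass_pow_k(n: int, c: int, k: int) -> float:
    """Estimate P(all k trials pass) from c passes in n trials."""
    _check(n, c, k)
    # P(all k trials drawn without replacement are passes).
    return comb(c, k) / comb(n, k)
```

With 5 passes in 10 trials, `pass_at_k(10, 5, 2)` is about 0.78 while `pass_pow_k(10, 5, 2)` is about 0.22: the same agent looks strong when one success is enough and weak when every run must succeed.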

calibrate-eval-infrastructure

/calibrate-eval-infrastructure

Stop the machine from deciding your benchmark. Configure and validate the container and runtime resources for an agentic coding eval so infrastructure noise stays inside statistical bounds instead of swinging scores more than the models do. Use this whenever someone runs SWE-bench or any agentic coding benchmark in containers, sees scores jump between runs for no code reason, suspects OOM kills or flaky infra are skewing results, sets container memory or CPU limits for an eval harness, or wants to trust a leaderboard delta. Trigger on "my benchmark scores are inconsistent," "OOM during eval," "how much memory should the eval container get," and similar. Not for designing the eval tasks or graders themselves; that's build-agent-evals.
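One rough way to check whether run-to-run swings are larger than sampling alone would explain is to compare the observed spread of repeated runs against the binomial standard error. This is a heuristic sketch, not the skill's procedure; the function names and the `tolerance` factor of 1.5 are assumptions.

```python
from math import sqrt
from statistics import mean, stdev


def expected_se(pass_rate: float, n_tasks: int) -> float:
    """Binomial standard error of a pass rate measured over n_tasks tasks."""
    return sqrt(pass_rate * (1.0 - pass_rate) / n_tasks)


def infra_noise_suspected(
    run_scores: list[float], n_tasks: int, tolerance: float = 1.5
) -> bool:
    """True if repeated runs of one model spread wider than sampling predicts.

    run_scores: pass rates from at least two runs of the same model and config.
    tolerance: hypothetical safety factor; tune it for your own harness.
    """
    observed = stdev(run_scores)
    expected = expected_se(mean(run_scores), n_tasks)
    return observed > tolerance * expected
```

For example, four runs on 500 tasks scoring 0.50, 0.51, 0.49, and 0.50 stay well inside the bound, while runs scoring 0.40, 0.55, 0.47, and 0.60 do not.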

coding-agent-scaffold

/coding-agent-scaffold

Design the tool interface for a coding agent so the model stops misusing it. Covers the minimal two-tool scaffold (a bash tool plus a file editor), exact single-match string replacement, absolute-path rules, and error-proofing the tool descriptions so common model mistakes become impossible. Use this whenever someone is building a coding agent or SWE-bench-style harness, designing a bash or file-edit tool for an agent, deciding how much scaffolding to impose, or debugging an agent that keeps editing the wrong place, fumbling multi-line edits, or escaping shell commands wrong. Trigger on "build a coding agent," "str_replace tool," "agent keeps breaking the file," and similar. Not for general MCP or service tool design; this is the bash plus file-editor interface specifically.
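The exact single-match rule and the absolute-path rule can be illustrated with a short sketch. It is not the skill's implementation: `EditError`, the function name, and the error wording are assumptions. The edit is refused unless the target string occurs exactly once, and each refusal says what to change.

```python
from pathlib import Path


class EditError(Exception):
    """Raised with a message the model can act on in its next call."""


def str_replace(path: str, old: str, new: str) -> str:
    """Replace `old` with `new` in the file at `path`, only if `old` occurs once."""
    file_path = Path(path)
    if not file_path.is_absolute():
        raise EditError(f"path must be absolute, got {path!r}")
    if not old:
        raise EditError("old string must not be empty")
    text = file_path.read_text(encoding="utf-8")
    count = text.count(old)
    if count == 0:
        raise EditError("old string not found; copy it exactly, including whitespace")
    if count > 1:
        raise EditError(f"old string matches {count} places; add surrounding lines")
    file_path.write_text(text.replace(old, new, 1), encoding="utf-8")
    return f"edited {path}"
```

Rejecting ambiguous matches outright, rather than editing the first one, is what keeps the agent from silently changing the wrong place.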

durable-agent-architecture

/durable-agent-architecture

Structure a long-lived agent service so any part can crash and resume. Decompose it into brain (model plus harness), hands (ephemeral sandbox and tools), and session (a durable event log), each replaceable on its own, with wake/resume semantics and credentials kept out of the execution environment. Use this whenever someone designs a production or long-running agent backend, asks how to make agents crash-recoverable or resumable, worries about losing session state when a container dies, needs to scale agents as a service, or asks where to keep credentials for an agent that runs code. Trigger on "agent infrastructure," "resume an agent after a crash," "agent runs for hours," "where do tokens live," and similar. Not for parallelizing work across agents or coordinating a shared repo; see multi-agent-orchestration and parallel-autonomous-agents.
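The session part of this split, a durable event log, might look like the append-only JSON Lines sketch below. The file format, function names, and event fields are assumptions for illustration, not the skill's design. Each event is flushed to disk before the call returns, and replay tolerates a final line torn by a crash.

```python
import json
import os
from pathlib import Path


def append_event(log_path: Path, event: dict) -> None:
    """Append one event as a JSON line and force it to disk before returning."""
    with log_path.open("a", encoding="utf-8") as log:
        log.write(json.dumps(event) + "\n")
        log.flush()
        os.fsync(log.fileno())


def replay(log_path: Path) -> list[dict]:
    """Rebuild session history, stopping at a line left incomplete by a crash."""
    if not log_path.exists():
        return []
    events = []
    for line in log_path.read_text(encoding="utf-8").splitlines():
        try:
            events.append(json.loads(line))
        except json.JSONDecodeError:
            break  # a partial write; everything before it is still valid
    return events
```

A restarted brain would call `replay` to recover the history and continue from the last complete event. A fuller version would also truncate the torn tail before appending again.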

multi-agent-orchestration

/multi-agent-orchestration

Run an orchestrator-worker system for breadth-first research: a lead agent plans, spawns three to five subagents with their own context windows, and synthesizes their findings. Covers when multi-agent actually beats a single agent and when it just burns tokens, how to delegate so subagents do not overlap, broad-to-narrow search, writing findings to a filesystem, and how to evaluate the system. Use this whenever someone wants to parallelize research or exploration across agents, asks how to coordinate a lead and subagents, considers a multi-agent setup, or asks whether multi-agent is worth it for their task. Trigger on "orchestrator and workers," "parallel research agents," "lead agent spawns subagents," "should this be multi-agent," and similar.
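The control flow of the pattern, stripped of any model calls, can be sketched as follows. Everything here is a stand-in: `orchestrate`, `stub_worker`, and the `worker-N.md` file names are hypothetical, and a real worker would be a subagent with its own context window rather than a Python function. The sketch shows only the shape: the lead assigns one non-overlapping objective per worker, each worker writes findings to its own notes file, and only short summaries come back for synthesis.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Callable

# A worker takes one objective and a notes-file path and returns a short summary.
Worker = Callable[[str, Path], str]


def stub_worker(objective: str, notes_path: Path) -> str:
    """Placeholder worker: writes a notes file and returns a one-line summary."""
    notes_path.write_text(f"# {objective}\n(findings would go here)\n")
    return f"{objective}: see {notes_path.name}"


def orchestrate(objectives: list[str], worker: Worker, notes_dir: Path) -> list[str]:
    """Run one worker per objective in parallel; return summaries in order."""
    notes_dir.mkdir(parents=True, exist_ok=True)
    with ThreadPoolExecutor(max_workers=max(1, len(objectives))) as pool:
        futures = [
            pool.submit(worker, objective, notes_dir / f"worker-{index}.md")
            for index, objective in enumerate(objectives)
        ]
        return [future.result() for future in futures]
```

Keeping full findings on disk and returning only summaries means the lead's context holds pointers rather than every worker's raw output.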

Hooks (1)

Event Hooks

Bash

1 hook across 1 event

MCP Servers (2)

agent-stdlib-think

admin

agent-stdlib-tool-gateway

admin

Stats

Version: 0.3.0

Language: Python

Stars: 1

Maintenance: Excellent

License: MIT

Last Commit: Jun 16, 2026

Added: Jun 16, 2026


Available In

agent-stdlib

Safety Signals

Critical

Admin access level

Server config contains admin-level keywords

Caution

Executes bash commands

Hook triggers when Bash tool is used

README

agent-stdlib

A standard library for building agents.

Anthropic's engineering blog documents how to build, evaluate, and run agents in production. Most of that knowledge never ships as something you can install; it stays prose you reopen when you hit the problem it solves. agent-stdlib packages the parts nobody else has: Claude Code skills, a few MCP servers, and a tool-gating hook.

Each component names the article it comes from and says how it differs from any skill that already covers similar ground. The pack ships only what was missing. Topics that strong community skills already handle stay out, with pointers below.

Skills

Skill	What it gives you	Source article
`build-agent-evals`	Build automated evals for an agent: pick a grader, choose pass@k vs pass^k, run the zero-to-one roadmap	Demystifying evals for AI agents
`calibrate-eval-infrastructure`	Stop container resource limits from swinging benchmark scores more than the models do	Quantifying infrastructure noise in agentic coding evals
`coding-agent-scaffold`	Design the two-tool (bash + file editor) interface for a coding agent so the model stops misusing it	Raising the bar on SWE-bench Verified
`durable-agent-architecture`	Split an agent service into brain, hands, and session so any part can crash and resume	Scaling Managed Agents
`sandboxing-agentic-systems`	Contain an agent that runs code or reads untrusted content, layer by layer	How we contain Claude
`using-the-think-step`	Decide when a mid-task reasoning step helps and how to prompt for it	The "think" tool
`multi-agent-orchestration`	Run an orchestrator-worker research system with parallel subagents	How we built our multi-agent research system
`parallel-autonomous-agents`	Coordinate unsupervised agents on one git repo with lock files and an autonomy loop	Building a C compiler with parallel Claudes

Install

Add the marketplace and install the plugin:

/plugin marketplace add pebeto/agent-stdlib
/plugin install agent-stdlib@agent-stdlib

Skills trigger themselves when a task matches their description. You can also load one explicitly with the Skill tool.

MCP servers

Three servers live under mcp-servers/, each paired with a skill. They require uv, which on first run installs each server's single dependency as declared in its script header.

think (enabled). The no-op think tool, paired with using-the-think-step.
tool-gateway (enabled). search_tools and call_tool over a larger catalog, so the agent reaches many tools through two. Paired with the tool-scaling guidance in advanced-tool-use.
code-execution (opt-in). Presents tools as importable code and runs composed Python in a subprocess. It executes model-written code, so it is not enabled by default. Turn it on once you have wrapped it in real isolation; see mcp-servers/code-execution/README.md and the sandboxing-agentic-systems skill.

think and tool-gateway are wired into the plugin's .mcp.json. To enable code-execution, point your client's MCP config at uv run .../mcp-servers/code-execution/server.py.
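The two-tool idea behind tool-gateway can be shown without any MCP machinery. The sketch below is plain Python, not the server's code: the catalog entries, the keyword matching, and the return shapes are all assumptions. It shows how an agent could discover tools with `search_tools` and invoke any of them through a single `call_tool` entry point.

```python
from typing import Any, Callable

# Hypothetical catalog: tool name -> (description, implementation).
CATALOG: dict[str, tuple[str, Callable[..., Any]]] = {
    "add": ("Add two numbers a and b", lambda a, b: a + b),
    "upper": ("Upper-case a string s", lambda s: s.upper()),
}


def search_tools(query: str) -> list[dict[str, str]]:
    """Return catalog entries whose name or description contains a query term."""
    terms = query.lower().split()
    return [
        {"name": name, "description": description}
        for name, (description, _) in CATALOG.items()
        if any(term in name.lower() or term in description.lower() for term in terms)
    ]


def call_tool(name: str, arguments: dict[str, Any]) -> Any:
    """Dispatch to a catalog tool by name, passing arguments as keywords."""
    if name not in CATALOG:
        raise KeyError(f"unknown tool {name!r}; use search_tools to find one")
    _, implementation = CATALOG[name]
    return implementation(**arguments)
```

The agent's context then carries two tool definitions regardless of how large the catalog grows.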

Commands, agent, and gate

/research <question> runs the orchestrator-worker flow: it decomposes the question, dispatches research-worker subagents in parallel, and synthesizes a cited answer. Paired with multi-agent-orchestration.
/autonomous-loop sets up lock-file coordination for unsupervised agents on one repo, using scripts/locks.py and scripts/autonomy_loop.sh. Paired with parallel-autonomous-agents.
action-gating is a PreToolUse hook that tiers Bash commands by risk and denies or asks on the dangerous ones. It stays off until you set AGENT_STDLIB_GATING=warn or enforce, and it only ever adds friction. See hooks/README.md.
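Risk tiering for Bash commands could be sketched as below. The patterns, tier names, and the behavior of each mode are assumptions chosen for illustration; in particular, treating warn as "ask instead of deny" is a guess, so consult hooks/README.md for what the hook actually does. Only the AGENT_STDLIB_GATING variable and its warn and enforce values come from the README.

```python
import os
import re

# Hypothetical patterns; a real gate would need a far more careful list.
DENY_PATTERNS = [re.compile(r"\brm\s+-rf\s+/(\s|$)"), re.compile(r"\bmkfs\b")]
ASK_PATTERNS = [re.compile(r"\bsudo\b"), re.compile(r"\bgit\s+push\b.*--force")]


def tier(command: str) -> str:
    """Classify a Bash command as 'deny', 'ask', or 'allow'."""
    if any(pattern.search(command) for pattern in DENY_PATTERNS):
        return "deny"
    if any(pattern.search(command) for pattern in ASK_PATTERNS):
        return "ask"
    return "allow"


def current_mode() -> str:
    """Read the gating mode; unset or unknown values mean the gate is off."""
    return os.environ.get("AGENT_STDLIB_GATING", "")


def decide(command: str, mode: str) -> str:
    """Map a command and gating mode to a decision (mode semantics assumed)."""
    if mode not in ("warn", "enforce"):
        return "allow"  # gate is off
    result = tier(command)
    if mode == "warn" and result == "deny":
        return "ask"  # assumption: warn mode prompts rather than blocks
    return result
```

A hook script would call `decide(command, current_mode())` and translate the result into whatever response format the harness expects.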

Beyond Claude Code

Most of this pack is not Claude-specific. The MCP servers speak the open MCP protocol, the scripts are plain Python and Bash, and the skill content is harness-neutral procedural knowledge. To use it in OpenCode, Cursor, Cline, or a custom agent on any model, see AGENTS.md, which maps each component to its portable form.

Already covered elsewhere

These topics from the same blog have solid community skills, so they stay out of this pack. Reach for these instead:

View full README on GitHub
