Build MCP servers that AI agents actually want to use. Covers the full lifecycle — tool design (naming, schemas, descriptions), resource design (URIs, templates, subscriptions), project structure, transport selection (stdio vs Streamable HTTP), security, error handling, and testing. Use this skill when building a new MCP server, adding tools or resources to an existing one, reviewing an MCP server for quality, choosing between stdio and HTTP transport, designing tool schemas for LLM consumption, or hardening an MCP server for production. Also activates for questions about tool naming conventions, Pydantic Field descriptions, Zod validation for MCP, resource URI schemes, or MCP server security patterns.
Slash command: `/tooling:mcp-server-craft`
Build MCP servers that LLMs and AI agents can use reliably. A good MCP server makes the agent feel competent — clear tool names, helpful descriptions, structured errors, and predictable behavior.
This skill covers the full lifecycle:
| Phase | What you do | Key question |
|---|---|---|
| Design | Tool names, schemas, descriptions, resource URIs | Can the LLM understand what to call and why? |
| Build | Project structure, transport, implementation | Is the server clean and maintainable? |
| Harden | Security, error handling, validation | Can the server handle malicious or unexpected input? |
| Test | Functional, integration, agent workflow tests | Does it work when a real agent calls it? |
State which phase you need, or describe what you're building.
For expanded tool and resource design patterns with examples, load `references/tool-design.md`.
The most important thing about an MCP server is whether the LLM can figure out how to use it. Tool names, descriptions, and schemas are your API — the LLM reads them to decide what to call and how.
Names should start with a verb that tells the model exactly what happens:
| Pattern | Examples | Why it works |
|---|---|---|
| `verb_noun` | `read_file`, `search_issues`, `create_bucket` | Action is clear, noun scopes it |
| `get_status` | `get_build_status`, `get_user_profile` | Read-only intent is obvious |
| `list_*` | `list_repositories`, `list_connections` | Signals pagination/collection |
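Rules:
- Keep names to 64 characters or fewer, using letters, digits, `_`, or `-`.
- `snake_case` and `kebab-case` both fit those characters; keep one style per server.
- Put the verb first: `search_code`, not `code_search`.

Descriptions are documentation the LLM reads at inference time. They directly affect whether the agent picks the right tool.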
Good: "Search for code across repositories using a text query.
Returns matching file paths and line numbers.
Use this when the user wants to find where something is defined or used."
Bad: "Code search functionality."
Include in every description: what the tool does, what it returns, and when to use it.
Define input schemas with rich metadata. The LLM reads field descriptions to fill in parameters correctly.
TypeScript (Zod):
```typescript
server.tool(
  "search_issues",
  "Search for issues by query text, label, or status.",
  {
    query: z.string().describe("Search text to match against issue title and body"),
    status: z.enum(["open", "closed", "all"]).default("open")
      .describe("Filter by issue status. Default: open"),
    limit: z.number().min(1).max(100).default(20)
      .describe("Maximum results to return. Default: 20"),
  },
  async ({ query, status, limit }) => { /* ... */ }
);
```
Python (Pydantic Field):
```python
from typing import Literal

from pydantic import Field

# `mcp` (the server instance) and `Issue` (the result model) are defined elsewhere.
@mcp.tool()
async def search_issues(
    query: str = Field(..., description="Search text to match against issue title and body"),
    status: Literal["open", "closed", "all"] = Field("open", description="Filter by issue status"),
    limit: int = Field(20, ge=1, le=100, description="Maximum results to return"),
) -> list[Issue]:
    """Search for issues by query text, label, or status.

    Returns matching issues with title, body preview, and metadata."""
    ...
```
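Both examples above end up as a JSON Schema in the tool listing, and that schema is what the model reads. The sketch below writes out roughly what it might look like for `search_issues` and applies it by hand. `SEARCH_ISSUES_SCHEMA` and `apply_schema` are illustrative names, not SDK APIs, and the schema an SDK actually generates will differ in detail.

```python
from typing import Any

# Hand-written approximation of a generated input schema (an assumption).
SEARCH_ISSUES_SCHEMA: dict[str, Any] = {
    "type": "object",
    "properties": {
        "query": {
            "type": "string",
            "description": "Search text to match against issue title and body",
        },
        "status": {
            "type": "string",
            "enum": ["open", "closed", "all"],
            "default": "open",
            "description": "Filter by issue status. Default: open",
        },
        "limit": {
            "type": "integer",
            "minimum": 1,
            "maximum": 100,
            "default": 20,
            "description": "Maximum results to return. Default: 20",
        },
    },
    "required": ["query"],
}


def apply_schema(schema: dict[str, Any], args: dict[str, Any]) -> dict[str, Any]:
    """Fill defaults and check required/enum/range constraints.

    Handles only the keywords used above; not a general JSON Schema validator.
    """
    props = schema["properties"]
    unknown = set(args) - set(props)
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    result: dict[str, Any] = {}
    for name, spec in props.items():
        if name in args:
            value = args[name]
        elif "default" in spec:
            value = spec["default"]
        elif name in schema.get("required", []):
            raise ValueError(f"missing required parameter: {name}")
        else:
            continue
        if "enum" in spec and value not in spec["enum"]:
            raise ValueError(f"{name} must be one of {spec['enum']}")
        if "minimum" in spec and value < spec["minimum"]:
            raise ValueError(f"{name} must be >= {spec['minimum']}")
        if "maximum" in spec and value > spec["maximum"]:
            raise ValueError(f"{name} must be <= {spec['maximum']}")
        result[name] = value
    return result
```

Seeing the schema spelled out makes it easier to judge whether the defaults, ranges, and descriptions give the model enough to fill in each parameter.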
Key patterns:
- `Field(...)` (required) vs `Field(default)` (optional): never leave this ambiguous.
- Add constraints (`ge`, `le`, `min`, `max`, `Literal`, `enum`) so the LLM knows valid ranges.
- Put critical usage notes in the description itself, for example "IMPORTANT: Provide the full absolute path, not relative".

Resources expose read-only data via URIs. Use them for context the agent needs before picking a tool.
resource://connections → List available service connections
resource://schema/users → Database schema for users table
file:///workspace/config.yaml → Project configuration
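For parameterized resources such as `resource://schema/{table_name}`, the server has to map an incoming URI back to its parameters. The following is a minimal sketch of that matching with the standard library. `compile_uri_template` and `match_uri` are hypothetical helpers, and the sketch handles only simple `{name}` placeholders, not full URI Template syntax. An SDK would normally do this for you.

```python
import re
from typing import Optional


def compile_uri_template(template: str) -> "re.Pattern[str]":
    """Turn a template like 'resource://schema/{table_name}' into a regex.

    Each {name} placeholder matches exactly one path segment (no '/').
    """
    # With one capture group, re.split alternates literal text and placeholder names.
    parts = re.split(r"\{(\w+)\}", template)
    pattern = ""
    for index, part in enumerate(parts):
        if index % 2 == 0:
            pattern += re.escape(part)
        else:
            pattern += f"(?P<{part}>[^/]+)"
    return re.compile(pattern)


def match_uri(template: str, uri: str) -> Optional[dict]:
    """Return the extracted parameters, or None if the URI does not fit the template."""
    match = compile_uri_template(template).fullmatch(uri)
    return match.groupdict() if match else None
```

Restricting each placeholder to a single segment means a request for `resource://schema/users/extra` is rejected and never reaches the handler with an unexpected value.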
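Rules:
- Use descriptive URI schemes (`postgres://`, `jira://`).
- Declare a MIME type (`application/json`, `text/markdown`).
- Use URI templates for parameterized resources: `resource://schema/{table_name}`.

One server, one domain. A `github-mcp-server` with 8 focused tools beats an `everything-server` with 50 tools where the LLM can't tell `search_code` from `search_files`.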
TypeScript:
```
mcp-server-myservice/
├── src/
│   ├── index.ts          # Entry point, transport setup
│   ├── server.ts         # MCP server, tool/resource registration
│   ├── tools/            # Tool implementations
│   │   ├── search.ts
│   │   └── create.ts
│   ├── resources/        # Resource implementations
│   │   └── schema.ts
│   ├── types.ts          # Shared types
│   └── utils/            # Helpers (http client, validation)
├── tests/
├── package.json
├── tsconfig.json
└── README.md
```
Python:
```
mcp-server-myservice/
├── src/
│   └── myservice_mcp/
│       ├── __init__.py   # __version__
│       ├── server.py     # MCP server, main() entry point
│       ├── models.py     # Pydantic models
│       ├── consts.py     # Constants (UPPER_SNAKE_CASE)
│       └── tools/        # Tool implementations
├── tests/
├── pyproject.toml
└── README.md
```
Key rules:
- Provide a `main()` that creates the server and starts the transport.

| Transport | When to use | Client examples |
|---|---|---|
| stdio | Local tools, desktop clients | Claude Desktop, local dev |
| Streamable HTTP | Remote access, cloud deployment | Cursor, cloud agents, multi-tenant |
| HTTP/SSE (legacy) | Backward compatibility only | Older MCP clients |
Prefer Streamable HTTP for anything deployed. Use stdio for local-only tools. Support both by keeping server logic transport-agnostic.
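One way to keep server logic transport-agnostic is to put all behaviour in a function that maps a parsed message to a response, and make each transport a thin adapter around it. The sketch below assumes that shape. `handle_message`, `serve_stdio`, and `serve_http_body` are hypothetical names, and only a `ping` request is handled. A real server would lean on the SDK's transports instead of hand-rolling them.

```python
import json
from typing import Any, TextIO


def handle_message(message: dict[str, Any]) -> dict[str, Any]:
    """Transport-agnostic core: parsed JSON-RPC request in, response out."""
    if message.get("method") == "ping":
        return {"jsonrpc": "2.0", "id": message.get("id"), "result": {}}
    return {
        "jsonrpc": "2.0",
        "id": message.get("id"),
        "error": {"code": -32601, "message": "Method not found"},
    }


def serve_stdio(reader: TextIO, writer: TextIO) -> None:
    """stdio adapter: one JSON message per input line, one response per output line."""
    for line in reader:
        if not line.strip():
            continue  # ignore blank lines between messages
        response = handle_message(json.loads(line))
        writer.write(json.dumps(response) + "\n")
        writer.flush()


def serve_http_body(body: bytes) -> bytes:
    """HTTP adapter sketch: same core, request body in, response body out."""
    return json.dumps(handle_message(json.loads(body))).encode("utf-8")
```

Because neither adapter contains tool logic, the same tests can exercise `handle_message` once and cover both transports.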
Every tool and resource handler should be async. Use concurrency for independent operations:
```python
# Good: concurrent fetches
results = await asyncio.gather(
    fetch_issues(repo_a),
    fetch_issues(repo_b),
    fetch_issues(repo_c),
)

# Bad: sequential when independent
result_a = await fetch_issues(repo_a)
result_b = await fetch_issues(repo_b)
result_c = await fetch_issues(repo_c)
```
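With plain `asyncio.gather`, the first exception propagates and the other results are lost. If partial results are useful to the agent, one option is `return_exceptions=True`, sketched below. The `fetch_issues` here is a stand-in that fails for one hypothetical repository, and `fetch_all` is an illustrative name.

```python
import asyncio


async def fetch_issues(repo: str) -> list[str]:
    """Stand-in for a real API call (hypothetical); fails for one repo."""
    await asyncio.sleep(0.01)
    if repo == "broken/repo":
        raise RuntimeError(f"cannot reach {repo}")
    return [f"{repo}#1"]


async def fetch_all(repos: list[str]) -> tuple[dict[str, list[str]], dict[str, str]]:
    """Fetch concurrently; collect per-repo failures instead of failing the batch."""
    outcomes = await asyncio.gather(
        *(fetch_issues(repo) for repo in repos),
        return_exceptions=True,
    )
    issues: dict[str, list[str]] = {}
    errors: dict[str, str] = {}
    # gather preserves input order, so outcomes line up with repos.
    for repo, outcome in zip(repos, outcomes):
        if isinstance(outcome, Exception):
            errors[repo] = str(outcome)
        else:
            issues[repo] = outcome
    return issues, errors
```

Returning both dictionaries lets the tool report what it found alongside which repositories failed, so the agent can decide whether to retry.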
For expanded security patterns, input validation, sandboxing, and testing strategies, load `references/security-and-testing.md`.
Return errors inside tool results so the LLM can react — don't throw protocol-level exceptions that crash the conversation.
```typescript
// Good: structured error the LLM can interpret
return {
  content: [{ type: "text", text: JSON.stringify({
    error: "Repository not found",
    suggestion: "Check the repository name. Use list_repositories to see available repos."
  })}],
  isError: true,
};

// Bad: raw exception that kills the tool call
throw new Error("ENOENT");
```
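The same result shape can be built in Python with a small helper. This is a sketch that mirrors the TypeScript object above as a plain dictionary. `error_result` and `get_repository` are hypothetical names, and how your SDK expects the result to be returned is an assumption to check against its documentation.

```python
import json
from typing import Any


def error_result(message: str, suggestion: str) -> dict[str, Any]:
    """Build a tool result that reports a recoverable error to the model."""
    payload = {"error": message, "suggestion": suggestion}
    return {
        "content": [{"type": "text", "text": json.dumps(payload)}],
        "isError": True,
    }


def get_repository(name: str, known: dict[str, str]) -> dict[str, Any]:
    """Hypothetical handler: look up a repo, or explain how to recover."""
    if name not in known:
        return error_result(
            "Repository not found",
            "Check the repository name. Use list_repositories to see available repos.",
        )
    body = json.dumps({"name": name, "url": known[name]})
    return {"content": [{"type": "text", "text": body}], "isError": False}
```

Keeping the suggestion next to the error gives the model a concrete next step, not just a failure to report.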
Error handling rules:
- Set `isError: true` in results for recoverable errors.

Validate everything at the boundary. The LLM generates parameters, so they will be wrong sometimes.
- Reject path traversal (`../../../etc/passwd`), oversized inputs, and invalid types.

If your server executes user-provided code (diagram generators, script runners), sandbox the execution.
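For the path traversal case in particular, a minimal standard-library sketch might look like this. `resolve_inside` and `require_max_length` are hypothetical helper names, and scoping file access to a single workspace root is an assumption about how the server is laid out.

```python
from pathlib import Path


def resolve_inside(root: Path, user_path: str) -> Path:
    """Resolve user_path under root, rejecting anything that escapes it."""
    root = root.resolve()
    # An absolute user_path replaces root when joined, so it is caught below too.
    candidate = (root / user_path).resolve()
    if not candidate.is_relative_to(root):  # requires Python 3.9+
        raise ValueError(f"path escapes workspace: {user_path!r}")
    return candidate


def require_max_length(value: str, limit: int) -> str:
    """Reject oversized string inputs before doing any work with them."""
    if len(value) > limit:
        raise ValueError(f"input too long: {len(value)} > {limit} characters")
    return value
```

Resolving before comparing matters: checking the raw string for `..` misses symlinks and absolute paths, whereas comparing resolved paths covers all three.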
| Layer | What to test | How |
|---|---|---|
| Unit | Individual tool logic, validation, error paths | Mock external dependencies |
| Integration | Tool → real service round-trip | Use test accounts or sandboxes |
| Contract | Protocol compliance, schema correctness | Validate against MCP spec |
| Agent workflow | End-to-end with a real LLM client | Call tools from an agent, check results |
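As an illustration of the unit layer in the table, the sketch below tests a handler's error path with a mocked dependency. The handler, its `client` argument, and the error wording are all hypothetical. The point is only that the external service is replaced by `unittest.mock.AsyncMock`.

```python
import asyncio
import json
from typing import Any
from unittest.mock import AsyncMock


async def search_issues(client: Any, query: str) -> dict[str, Any]:
    """Hypothetical tool handler; `client` is whatever talks to the real service."""
    try:
        issues = await client.search(query)
    except ConnectionError:
        payload = {"error": "Issue tracker unreachable", "suggestion": "Retry in a few seconds."}
        return {"content": [{"type": "text", "text": json.dumps(payload)}], "isError": True}
    return {"content": [{"type": "text", "text": json.dumps(issues)}], "isError": False}


def test_search_issues_reports_outage() -> None:
    client = AsyncMock()
    client.search.side_effect = ConnectionError("timeout")  # simulate the outage
    result = asyncio.run(search_issues(client, "login bug"))
    assert result["isError"] is True
    client.search.assert_awaited_once_with("login bug")


def test_search_issues_returns_matches() -> None:
    client = AsyncMock()
    client.search.return_value = [{"id": 1, "title": "Login bug"}]
    result = asyncio.run(search_issues(client, "login bug"))
    assert result["isError"] is False
    assert json.loads(result["content"][0]["text"]) == [{"id": 1, "title": "Login bug"}]
```

Passing the client in as an argument is what makes the mock possible. A handler that builds its own client internally is much harder to test at this layer.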
Agent workflow testing is the most important and most neglected. Your tools may pass unit tests but confuse the LLM because the descriptions are ambiguous or the return format is unexpected.
- Failures return `isError: true` and a suggestion.

Set the `instructions` field. The LLM reads it before using any tool:
```typescript
const server = new McpServer(
  // Server identity
  { name: "github-server", version: "1.0.0" },
  // Server options: `instructions` belongs here, not in the identity object
  {
    instructions: "Read-only access to GitHub repos. Use search_code to find definitions, list_issues to browse bugs, get_file to read files. Always provide full repo name (owner/repo)."
  }
);
```
Design → Build: Every tool has a verb-noun name ≤ 64 chars? Descriptions include what/returns/when? Schemas have constraints and field descriptions?
Build → Harden: Server starts cleanly on both stdio and HTTP? All handlers are async? Tool responses are structured JSON the LLM can parse?
Harden → Test: Input validation covers path traversal, oversized inputs, invalid types? Errors use isError: true with suggestions? Rate limiting in place for external API calls?
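Parts of the Design → Build gate can be checked mechanically. The sketch below lints a tool definition for the name limit, a minimally informative description, and per-parameter descriptions. `lint_tool` is a hypothetical helper, the allowed character set is an assumption based on the naming rules, and the eight-word threshold is arbitrary.

```python
import re
from typing import Any

# 1-64 characters of letters, digits, '_' or '-' (character set is an assumption).
NAME_PATTERN = re.compile(r"[A-Za-z0-9_-]{1,64}")


def lint_tool(tool: dict[str, Any]) -> list[str]:
    """Return the problems found in one tool definition (empty list means clean)."""
    problems: list[str] = []
    name = tool.get("name", "")
    if not NAME_PATTERN.fullmatch(name):
        problems.append(f"{name!r}: name must be 1-64 chars of letters, digits, '_' or '-'")
    description = tool.get("description", "")
    if len(description.split()) < 8:  # arbitrary threshold, tune to taste
        problems.append(f"{name!r}: description too short to cover what/returns/when")
    properties = tool.get("inputSchema", {}).get("properties", {})
    for field, spec in properties.items():
        if not spec.get("description"):
            problems.append(f"{name!r}: parameter {field!r} has no description")
    return problems
```

A check like this catches missing metadata but cannot judge whether a description is actually clear, so it complements agent workflow testing without replacing it.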
- Errors use `isError: true` with suggestions, not raw exceptions.

Install: `npx claudepluginhub saif-shines/devex-kit --plugin tooling`