From debug-agent
Design a strategic logging, metrics, and tracing plan for production debugging and observability. Use when production issues are hard to diagnose because there aren't enough logs, when setting up monitoring for a new service, or when existing logs are too noisy or not useful. Trigger for: "we can't tell what's happening in production", "add logging to this service", "design observability for X", "we need better metrics", or after a production incident where lack of visibility was a problem.
How this skill is triggered — by the user, by Claude, or both
Slash command
/debug-agent:debug-instrumentThis skill is limited to the following tools:
The summary Claude sees in its skill listing — used to decide when to auto-load this skill
Good instrumentation makes production invisible problems visible. The challenge is adding enough signal without creating noise or performance overhead.
Good instrumentation makes production invisible problems visible. The challenge is adding enough signal without creating noise or performance overhead.
Ask the user which component or service to instrument if not already specified. Then analyze the code and design a comprehensive instrumentation strategy.
Before deciding where to add instrumentation, map what matters:
For each path: what does success look like? What does failure look like? What intermediate states matter?
Three levels — choose based on the path's criticality:
Entry/Exit Points:
Error Conditions:
Performance Markers:
State Changes:
Business Events:
Detailed Tracing:
Remove Level 3 instrumentation once the investigation is complete, or gate it behind a feature flag.
Structured logging makes logs queryable. Use JSON format with these standard fields:
{
"timestamp": "ISO-8601 format",
"level": "DEBUG | INFO | WARN | ERROR",
"service": "service-name",
"trace_id": "correlation ID for distributed tracing",
"span_id": "ID for this specific operation",
"user_id": "if request is user-scoped",
"operation": "what is happening",
"duration_ms": "for completed operations",
"status": "success | failure | timeout",
"error": "error details if applicable",
"metadata": {
"custom": "contextual fields"
}
}
Sampling strategy (to control volume):
Metrics (unlike logs) are aggregatable. Design metrics that answer operational questions:
RED Metrics (for every service):
USE Metrics (for resources):
Business Metrics (per domain):
For microservices, design trace context propagation:
Create spans for:
Propagate context via:
X-Trace-Id, X-Span-Id, X-Parent-Span-Id# Instrumentation Plan
**Service/Component:** [name]
**Current coverage:** [Low/Medium/High]
**Performance budget:** [acceptable overhead %]
---
## Priority 1 — Implement This Week
### [Component/Endpoint name]
**Entry point logging:**
```[language]
// Location: [file:line]
logger.info("Operation started", {
operation: "name",
trace_id: ctx.traceId,
user_id: ctx.userId,
// key inputs
});
Exit/error logging:
// On success
logger.info("Operation completed", { duration_ms: Date.now() - start, status: "success" });
// On error
logger.error("Operation failed", { error: err.message, stack: err.stack, duration_ms: ... });
Metric:
operation_duration_ms{operation="name", status="success|failure"}
[Repeat for each critical path]
// Recommended metric definitions
[concrete metric code for the target language/framework]
- name: high_error_rate
condition: error_rate > 1%
severity: critical
- name: high_latency
condition: p95_latency > 1000ms
severity: warning
| Level | Development | Staging | Production |
|---|---|---|---|
| DEBUG | On | On | Off |
| INFO | On | On | Sampled (10%) |
| WARN | On | On | On |
| ERROR | On | On | On (100%) |
| Type | CPU overhead | Memory overhead | I/O impact |
|---|---|---|---|
| Structured logging | <1% | <10MB | Async buffered |
| Metrics | <0.5% | <5MB | Batched |
| Tracing | <2% | <20MB | Sampled |
npx claudepluginhub mkellerman/bmad-skills --plugin debug-agentGuides creation, editing, and verification of skills for AI coding agents using test-driven development with subagent scenarios. Use when authoring or debugging skills.