Use when you need to review error handling, logging, metrics, tracing, and user-safe diagnostics.
Slash command: /skillry-backend-and-api:16-error-handling-observability
Review the observability posture of a backend service: whether errors are caught and propagated correctly, whether logs carry enough context to diagnose a production failure with no debugger attached, whether metrics exist to detect anomalies before users report them, whether distributed traces connect a request across service boundaries, and whether internal error details ever leak to end users. The deliverable is a severity-rated findings list with concrete fixes — the goal is a service that is observable from day one, not after the first un-diagnosable incident.
Review steps:

- Structured logging: confirm a structured logger is used rather than `console.log`. Every line needs `timestamp`, level, message, and context fields. Unstructured `console.log("User " + id)` cannot be queried or alerted on.
- Correlation IDs: every log line carries a `requestId`/`traceId`. Confirm the ID is generated at the edge (or read from `X-Request-ID`/`traceparent`), stored in async context (`AsyncLocalStorage`, `contextvars`, `context.Context`), and attached to every logger call.
- Swallowed errors: look for empty catches (`catch (e) {}` / log-only with no re-throw), fire-and-forget async (missing `await`), uninspected `Promise.allSettled` rejections, and background-job handlers that only log without retry/alert/DLQ.
- Client error exposure: look for `res.json({ error: err.message })`, `{ stack: err.stack }`, or raw DB errors in responses. In production the body must be a user-safe message plus a reference ID; full detail goes to logs.
- Metrics: confirm `http_request_duration_ms` (histogram by method/route/status), `http_requests_total` (counter), error rate by status family, per-integration error rate, and queue lag where applicable. No metrics library present is a critical gap.
- Tracing: confirm that the `traceparent` header is propagated outbound and extracted inbound, and that DB calls, queue ops, and external calls are wrapped in spans.
- Log levels: `DEBUG` lines logging full request/response bodies (PII, tokens) must not run in production; level is env-controlled (`LOG_LEVEL=info`), and high-throughput routes log at `debug` with sampling to avoid a flood.
- Health checks: `/health` (and any `/ready` probe) checks real dependencies (DB, cache) rather than returning 200 unconditionally, so the load balancer and alerts react to actual degradation.
- Redaction: the logger redacts `password`, `authorization`, tokens, and card data before any log line is written, so a future `log.info({ req })` cannot leak secrets.

Checklist:

- Structured logger in use (no `console.log`); every line has timestamp, level, message.
- `Promise.allSettled` results explicitly inspected for rejections.
- Error responses carry a `requestId` the user can quote in a support ticket.
- `traceparent` propagated on all outbound calls; extracted on inbound.
- `DEBUG` disabled in production; log level controlled by env var.
- Log levels match severity (failures at `error`, query noise at `debug`).
- One global error handler instead of scattered `res.status(500)` calls.
- Logger redacts `password`, `authorization`, tokens, and card data before any line is written.

```bash
# Unstructured logging that cannot be queried/alerted
rg -n "console\.(log|error|warn)\(" src/ | wc -l
rg -n "console\.(log|error)\(" src/ | head -20

# Swallowed errors and fire-and-forget async
rg -nU "catch\s*\([^)]*\)\s*\{\s*(\}|//|console\.log)" src/
rg -nU "allSettled\([\s\S]{0,200}" src/ | rg -v "status|reason"

# Internal detail leaking to the client
rg -n "err\.message|err\.stack|error\.stack|\.stack\b" src/ | rg "res\.|reply\.|json\("

# Correlation ID plumbing present?
rg -n "AsyncLocalStorage|contextvars|requestId|traceId|X-Request-Id|traceparent" src/

# Metrics + tracing libraries present?
rg -n "prom-client|statsd|@opentelemetry|histogram|Counter\(" src/ | head
rg -n "traceparent|propagation|startSpan|tracer\." src/ | head

# Log level controlled by env, not hardcoded debug
rg -n "LOG_LEVEL|level:\s*['\"](debug|trace)" src/

# High-cardinality metric labels (kills the metrics backend)
rg -nU "(labels|tags)\([\s\S]{0,80}(userId|email|id|url|path)\b" src/

# Bodies/headers being logged (PII + token leak)
rg -n "log.*(req\.body|req\.headers|request\.body|password|authorization)" -i src/

# Sensitive fields making it into responses
rg -n "res\.(json|send)\(" src/ | rg -i "password|token|secret|stack|hash"

# Is there a single global error handler, or scattered ad-hoc 500s?
rg -n "app\.use\(.*err|errorHandler|@Catch\(|exceptionHandler|setErrorHandler" src/
rg -n "res\.status\(500\)" src/ | wc -l  # many scattered 500s = no central mapping

# Redaction configured at the logger boundary?
rg -n "redact|censor|sanitize|paths:\s*\[|filterKeys" src/ | rg -i "password|authorization|token" \
  || echo "no logger redaction list found"

# Key HTTP metrics actually emitted (duration histogram, error counter)?
rg -n "http_request_duration|requestDuration|http_requests_total|observe\(|inc\(" src/ || echo "no HTTP metrics found"
```
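The correlation-ID review step (ID generated at the edge or read from `X-Request-ID`, stored in async context, attached to every logger call) might look roughly like the sketch below in Node.js. `AsyncLocalStorage` and `randomUUID` are Node built-ins; `resolveRequestId`, `withRequestContext`, and `logLine` are hypothetical names used only for illustration, not part of any logging library.

```javascript
const { AsyncLocalStorage } = require('node:async_hooks');
const { randomUUID } = require('node:crypto');

// One store per process; each request gets its own context via run().
const requestContext = new AsyncLocalStorage();

// Hypothetical helper: reuse an inbound X-Request-ID if present, else mint one.
// Node's http module lowercases incoming header names, hence the lowercase key.
function resolveRequestId(headers) {
  const inbound = headers['x-request-id'];
  return typeof inbound === 'string' && inbound.length > 0 ? inbound : randomUUID();
}

// Hypothetical helper: run a handler inside a context that holds the request ID.
function withRequestContext(headers, handler) {
  const requestId = resolveRequestId(headers);
  return requestContext.run({ requestId }, handler);
}

// Hypothetical structured logger: builds one JSON line with the current ID attached.
// A real logger would write the line to stdout instead of returning it.
function logLine(level, message, fields = {}) {
  const store = requestContext.getStore();
  return JSON.stringify({
    timestamp: new Date().toISOString(),
    level,
    message,
    requestId: store ? store.requestId : undefined,
    ...fields,
  });
}
```

In a review, the thing to verify is that no call site has to pass the ID by hand: any `logLine` call made inside `withRequestContext`, including after an `await`, should pick it up from the store.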
```javascript
// WRONG: swallows the error and leaks internals to the client
try {
  await charge(order);
} catch (e) {
  console.log(e);
  res.status(500).json({ error: e.message, stack: e.stack });
}

// RIGHT: structured log with context, user-safe response with a reference ID
try {
  await charge(order);
} catch (e) {
  log.error({ err: e, requestId: ctx.requestId, orderId: order.id }, 'charge failed');
  res.status(502).json({ error: 'Payment provider unavailable', requestId: ctx.requestId });
}
```
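The same concern applies to `Promise.allSettled`, which never rejects and so hides failures unless each result is inspected. A minimal sketch of explicit inspection follows; `splitSettled` and `runAll` are hypothetical helper names, and `onError` stands in for whatever structured logger or alert hook the service uses.

```javascript
// Hypothetical helper: split Promise.allSettled results so rejections are never ignored.
function splitSettled(results) {
  const values = [];
  const errors = [];
  for (const result of results) {
    if (result.status === 'fulfilled') {
      values.push(result.value);
    } else {
      errors.push(result.reason);
    }
  }
  return { values, errors };
}

// Hypothetical usage: run tasks in parallel and report every failure via onError.
async function runAll(tasks, onError) {
  const results = await Promise.allSettled(tasks.map((task) => task()));
  const { values, errors } = splitSettled(results);
  for (const err of errors) {
    onError(err); // e.g. a structured log.error call plus retry or alert
  }
  return values;
}
```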
| Error class | Status | Client body |
|---|---|---|
| ValidationError | 400 | field-level messages |
| AuthenticationError | 401 | "authentication required" |
| AuthorizationError | 403 | "not permitted" |
| NotFoundError | 404 | "resource not found" |
| ConflictError | 409 | "already exists / version conflict" |
| RateLimitError | 429 | "slow down" + Retry-After |
| Unhandled / unknown | 500 | "something went wrong" + requestId |
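
One way to keep this mapping in a single place is a function that a global error handler calls for every error. The sketch below assumes error classes named after the table rows; the class definitions, `toClientResponse`, and the response shape are all illustrative assumptions, not an existing API.

```javascript
// Hypothetical error classes named after the rows of the status table.
class ValidationError extends Error {
  constructor(message, fields) { super(message); this.fields = fields; }
}
class AuthenticationError extends Error {}
class AuthorizationError extends Error {}
class NotFoundError extends Error {}
class ConflictError extends Error {}
class RateLimitError extends Error {
  constructor(message, retryAfterSeconds) { super(message); this.retryAfterSeconds = retryAfterSeconds; }
}

// Fixed, user-safe messages; err.message is never copied into the response body.
const SIMPLE_MAPPINGS = [
  [AuthenticationError, 401, 'authentication required'],
  [AuthorizationError, 403, 'not permitted'],
  [NotFoundError, 404, 'resource not found'],
  [ConflictError, 409, 'already exists / version conflict'],
];

// Hypothetical central mapper: error in, { status, headers, body } out.
function toClientResponse(err, requestId) {
  if (err instanceof ValidationError) {
    return { status: 400, headers: {}, body: { error: 'validation failed', fields: err.fields, requestId } };
  }
  if (err instanceof RateLimitError) {
    const headers = { 'Retry-After': String(err.retryAfterSeconds) };
    return { status: 429, headers, body: { error: 'slow down', requestId } };
  }
  for (const [ErrorClass, status, message] of SIMPLE_MAPPINGS) {
    if (err instanceof ErrorClass) return { status, headers: {}, body: { error: message, requestId } };
  }
  // Anything unrecognized gets a generic body; full detail belongs in the logs.
  return { status: 500, headers: {}, body: { error: 'something went wrong', requestId } };
}
```

With a mapper like this, scattered `res.status(500)` calls become a review finding by construction: route handlers throw, and only the global handler writes error responses.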
- `console.error(err)` instead of a structured call. Produces an unparseable multi-line string in the aggregator that cannot be searched or alerted on.
- `userId` or a full URL as a label creates millions of time series and kills the metrics backend. Use route templates (`/api/orders/:id`), never raw values.
- ``log.info(`failed: ${err}`)`` discards the stack and structure. Pass the error as a field (`{ err }`) so the logger serializes it properly.
- Logging everything at `info` floods the aggregator and hides real signals; logging errors at `info` means alerts never fire. Map severity to level deliberately.
- A `/health` that returns 200 unconditionally tells the load balancer the instance is fine while the DB is down. Reflect real dependencies.

Report must include: logging framework assessment (structured vs unstructured, fields present/missing); correlation-ID coverage (propagated / partial / missing with gaps); error-swallowing findings (empty catches, fire-and-forget, unhandled rejections by location); HTTP status-mapping assessment (types mapped to wrong codes); client error-exposure findings (response bodies leaking internals); metric coverage (present vs missing); trace propagation status (inbound extraction + outbound injection); alert coverage (existing + gaps); and a severity rating per finding.
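
The advice to label metrics with route templates rather than raw values can be sketched as below. This assumes an Express-style request object, where `req.route.path` holds the matched pattern and `req.baseUrl` the router mount path; `metricLabels` is a hypothetical helper name.

```javascript
// Hypothetical helper: build low-cardinality labels for an HTTP metric.
// Assumes an Express-style request: req.route is set only after a route matched.
function metricLabels(req, statusCode) {
  const matched = req.route && typeof req.route.path === 'string';
  return {
    method: req.method,
    // Use the route template, never the raw URL; unmatched paths share one bucket
    // so that scanners probing random URLs cannot create new time series.
    route: matched ? (req.baseUrl || '') + req.route.path : 'unmatched',
    status: String(statusCode),
  };
}
```

The raw URL is deliberately never read, so the number of distinct label sets is bounded by the number of routes times methods times status codes.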
Never add `console.log(req.body)` for debugging and leave it in; it logs credentials and PII on every request.

Done means the logging framework, correlation-ID coverage, error-propagation paths, status mapping, client-exposure surface, metrics, traces, and alerts are each assessed from evidence, every gap has a file:line (or "absent") and a severity, and no sensitive value appears unredacted in the report.
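
Redaction at the logger boundary, which guards against exactly this kind of leftover debug line, could be approximated by the sketch below. The key list and the `redact` helper are illustrative assumptions; the sketch assumes plain JSON-like data with no cycles, and established loggers ship their own redaction options that should be preferred where available.

```javascript
// Illustrative key list (an assumption); extend it to match the service's own field names.
const SENSITIVE_KEYS = new Set(['password', 'authorization', 'token', 'cardnumber']);

// Hypothetical helper: return a deep copy with sensitive values masked.
// Assumes plain JSON-like input (objects, arrays, primitives) without cycles.
function redact(value) {
  if (Array.isArray(value)) {
    return value.map(redact);
  }
  if (value !== null && typeof value === 'object') {
    const copy = {};
    for (const [key, inner] of Object.entries(value)) {
      // Keys are compared case-insensitively, so "Authorization" is caught too.
      copy[key] = SENSITIVE_KEYS.has(key.toLowerCase()) ? '[REDACTED]' : redact(inner);
    }
    return copy;
  }
  return value;
}
```

Applying `redact` inside the logger, rather than at each call site, is what makes a later `log.info({ req })` safe by default.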
npx claudepluginhub fluxonlab/skillry --plugin skillry-backend-and-api