From fullstack-dev-skills
Configures monitoring systems, logging pipelines, Prometheus/Grafana dashboards, alerting rules, and distributed tracing. Use for adding observability, debugging production issues, load testing, profiling, and capacity planning.
Slash command: /fullstack-dev-skills:monitoring-expert
Observability and performance specialist implementing comprehensive monitoring, alerting, tracing, and performance testing systems.
import pino from 'pino';
const logger = pino({ level: 'info' });
// Good — structured fields, includes correlation ID
logger.info({ requestId: req.id, userId: req.user.id, durationMs: elapsed }, 'order.created');
// Bad — string interpolation, no correlation
console.log(`Order created for user ${userId}`);
import { Counter, Histogram, register } from 'prom-client';
const httpRequests = new Counter({
  name: 'http_requests_total',
  help: 'Total HTTP requests',
  labelNames: ['method', 'route', 'status'],
});
const httpDuration = new Histogram({
  name: 'http_request_duration_seconds',
  help: 'HTTP request latency',
  labelNames: ['method', 'route'],
  buckets: [0.05, 0.1, 0.3, 0.5, 1, 2, 5],
});
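// Instrument every request. Label with the matched route template (for example
// '/orders/:id') rather than the raw path, so label cardinality stays bounded.
app.use((req, res, next) => {
  const end = httpDuration.startTimer({ method: req.method });
  res.on('finish', () => {
    // req.route is only set once Express has matched a route
    const route = req.route?.path ?? 'unmatched';
    httpRequests.inc({ method: req.method, route, status: res.statusCode });
    end({ route });
  });
  next();
});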
// Expose scrape endpoint
app.get('/metrics', async (req, res) => {
  res.set('Content-Type', register.contentType);
  res.end(await register.metrics());
});
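To make the `buckets` option above more concrete, here is a minimal dependency-free sketch of how cumulative histogram buckets count observations. It is not prom-client's implementation; `makeHistogram` is a hypothetical helper used only for illustration, and the observed values are made up.

```javascript
// Illustrative sketch only: mimics the cumulative-bucket idea behind a
// Prometheus histogram. makeHistogram is a hypothetical helper, not a prom-client API.
function makeHistogram(upperBounds) {
  const bounds = [...upperBounds].sort((a, b) => a - b);
  const counts = new Array(bounds.length).fill(0); // one counter per "le" bound
  let count = 0; // total observations (the implicit +Inf bucket)
  let sum = 0; // sum of all observed values

  return {
    observe(seconds) {
      count += 1;
      sum += seconds;
      // Cumulative: an observation increments every bucket whose bound is >= the value.
      bounds.forEach((le, i) => {
        if (seconds <= le) counts[i] += 1;
      });
    },
    snapshot() {
      const buckets = {};
      bounds.forEach((le, i) => {
        buckets[String(le)] = counts[i];
      });
      buckets['+Inf'] = count;
      return { buckets, count, sum };
    },
  };
}

// Same bucket bounds as the Histogram above; made-up latencies in seconds.
const histogram = makeHistogram([0.05, 0.1, 0.3, 0.5, 1, 2, 5]);
[0.02, 0.08, 0.25, 0.4, 3].forEach((s) => histogram.observe(s));
console.log(histogram.snapshot().buckets);
```

Each bucket is exported as its own time series per label combination, so a shorter bucket list keeps the metric cheaper at the cost of coarser latency resolution.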
import { NodeSDK } from '@opentelemetry/sdk-node';
import { OTLPTraceExporter } from '@opentelemetry/exporter-trace-otlp-http';
// SpanStatusCode is needed for the setStatus calls in processOrder below
import { trace, SpanStatusCode } from '@opentelemetry/api';
const sdk = new NodeSDK({
  traceExporter: new OTLPTraceExporter({ url: 'http://jaeger:4318/v1/traces' }),
});
sdk.start();
// Manual span around a critical operation
const tracer = trace.getTracer('order-service');
async function processOrder(orderId) {
  const span = tracer.startSpan('order.process');
  span.setAttribute('order.id', orderId);
  try {
    const result = await db.saveOrder(orderId);
    span.setStatus({ code: SpanStatusCode.OK });
    return result;
  } catch (err) {
    span.recordException(err);
    span.setStatus({ code: SpanStatusCode.ERROR });
    throw err;
  } finally {
    span.end();
  }
}
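groups:
  - name: api.rules
    rules:
      - alert: HighErrorRate
        # Aggregate both sides by route: without this, the 5xx series only match
        # themselves in the division and the ratio is always 1.
        expr: |
          sum by (route) (rate(http_requests_total{status=~"5.."}[5m]))
            / sum by (route) (rate(http_requests_total[5m])) > 0.05
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "Error rate above 5% on {{ $labels.route }}"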
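The HighErrorRate rule is meant to fire when more than 5% of the requests on a route return a 5xx status. The arithmetic behind that per-route ratio can be sketched in plain JavaScript as follows. The function name `errorRatioByRoute` and the per-second rates are hypothetical, chosen only to illustrate the calculation; Prometheus itself evaluates the expression, not application code.

```javascript
// Illustrative arithmetic only; the per-second rates below are made-up sample values.
function errorRatioByRoute(samples) {
  const totals = new Map(); // route -> { errors, all }
  for (const { route, status, perSecond } of samples) {
    const entry = totals.get(route) ?? { errors: 0, all: 0 };
    entry.all += perSecond;
    // Same match as status=~"5.." (PromQL regexes are fully anchored)
    if (/^5..$/.test(String(status))) entry.errors += perSecond;
    totals.set(route, entry);
  }
  const ratios = {};
  for (const [route, { errors, all }] of totals) {
    // Simplification: a route with no traffic is reported as 0 here
    ratios[route] = all > 0 ? errors / all : 0;
  }
  return ratios;
}

const ratios = errorRatioByRoute([
  { route: '/orders', status: 200, perSecond: 90 },
  { route: '/orders', status: 500, perSecond: 10 },
  { route: '/health', status: 200, perSecond: 5 },
]);
// Routes whose error ratio exceeds the 5% threshold used by the rule
const firing = Object.keys(ratios).filter((route) => ratios[route] > 0.05);
console.log(ratios, firing);
```

In this made-up sample, `/orders` has 10 of 100 requests per second failing, so only that route crosses the threshold. In the real rule, the condition must also hold for the full `for: 2m` window before the alert fires.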
import http from 'k6/http';
import { check, sleep } from 'k6';
export const options = {
  stages: [
    { duration: '1m', target: 50 }, // ramp up
    { duration: '5m', target: 50 }, // sustained load
    { duration: '1m', target: 0 }, // ramp down
  ],
  thresholds: {
    http_req_duration: ['p(95)<500'], // 95th percentile < 500 ms
    http_req_failed: ['rate<0.01'], // error rate < 1%
  },
};
export default function () {
  const res = http.get('https://api.example.com/orders');
  check(res, { 'status is 200': (r) => r.status === 200 });
  sleep(1);
}
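To show what the two thresholds above are checking, here is a rough sketch of evaluating `p(95)<500` and `rate<0.01` against a list of request results. It uses a simple nearest-rank percentile, which may differ in detail from how k6 computes percentiles, and `percentile`, `evaluateThresholds`, and the sample data are hypothetical names and values used for illustration.

```javascript
// Illustrative only: nearest-rank percentile; k6's own calculation may differ in detail.
function percentile(values, p) {
  if (values.length === 0) throw new Error('no samples');
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100); // 1-based rank
  return sorted[Math.max(rank, 1) - 1];
}

// results: array of { durationMs: number, failed: boolean } (hypothetical shape)
function evaluateThresholds(results) {
  const p95 = percentile(results.map((r) => r.durationMs), 95);
  const failureRate = results.filter((r) => r.failed).length / results.length;
  return {
    p95,
    failureRate,
    passed: p95 < 500 && failureRate < 0.01,
  };
}

// Made-up sample: 100 requests, 96 fast, 4 slow, none failed.
const sample = [];
for (let i = 0; i < 96; i++) sample.push({ durationMs: 120, failed: false });
for (let i = 0; i < 4; i++) sample.push({ durationMs: 900, failed: false });
console.log(evaluateThresholds(sample));
```

With only 4% of requests slow, the 95th percentile still falls among the fast requests, so both thresholds pass. If more than 5% were slow, the p95 check would fail even though the average might look healthy.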
Load detailed guidance based on context:
| Topic | Reference | Load When |
|---|---|---|
| Logging | references/structured-logging.md | Pino, JSON logging |
| Metrics | references/prometheus-metrics.md | Counter, Histogram, Gauge |
| Tracing | references/opentelemetry.md | OpenTelemetry, spans |
| Alerting | references/alerting-rules.md | Prometheus alerts |
| Dashboards | references/dashboards.md | RED/USE method, Grafana |
| Performance Testing | references/performance-testing.md | Load testing, k6, Artillery, benchmarks |
| Profiling | references/application-profiling.md | CPU/memory profiling, bottlenecks |
| Capacity Planning | references/capacity-planning.md | Scaling, forecasting, budgets |
npx claudepluginhub jeffallan/claude-skills --plugin fullstack-dev-skills