From kibana-infrastructure-ops-tools
Add monitoring and observability to Kibana features including APM tracing, custom metrics, structured logging, dashboards, and alerting.
Slash command: /kibana-infrastructure-ops-tools:monitoring-setup
Kibana has built-in APM support. Instrument critical paths:
// server-side tracing
import apm from 'elastic-apm-node';
import type { ElasticsearchClient, Logger } from '@kbn/core/server';
export async function executeQuery(
esClient: ElasticsearchClient,
logger: Logger,
params: QueryParams
) {
// Start APM span
const span = apm.startSpan('execute_query', 'db.elasticsearch');
try {
logger.debug('Executing query', { params });
// Your code here
const result = await esClient.search({
index: params.index,
body: params.query,
});
// Add metadata to span
span?.addLabels({
index: params.index,
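// hits.total may be a plain number or an object with a `value` field
result_count: typeof result.hits.total === 'number' ? result.hits.total : result.hits.total?.value ?? 0,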
});
return result;
} catch (error) {
// Capture error in APM
apm.captureError(error);
logger.error('Query execution failed', { error, params });
throw error;
} finally {
// End span
span?.end();
}
}
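The start/try/finally pattern above repeats for every traced path, so it can be factored into a small wrapper. The following is a minimal sketch, not part of Kibana or the APM agent: `withSpan`, `SpanLike`, and `TracerLike` are hypothetical names, and the interfaces mirror only the two calls used above (`startSpan` and `end`) so that the snippet compiles on its own.

```typescript
// Minimal structural types so the sketch compiles without the APM agent installed.
// They cover only the calls used in the example above: startSpan() and end().
interface SpanLike {
  end(): void;
}

interface TracerLike {
  // As in the example above, the result may be null when no span is created.
  startSpan(name: string, type: string): SpanLike | null;
}

// Hypothetical helper: runs `fn` inside a span and always ends the span,
// whether `fn` resolves or throws.
async function withSpan<T>(
  tracer: TracerLike,
  name: string,
  type: string,
  fn: () => Promise<T>
): Promise<T> {
  const span = tracer.startSpan(name, type);
  try {
    return await fn();
  } finally {
    span?.end();
  }
}
```

Assuming the agent object satisfies `TracerLike`, a call would look like `await withSpan(apm, 'execute_query', 'db.elasticsearch', () => esClient.search(...))`. Error capture and logging stay with the caller, as in the example above.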
Key tracing points:
Kibana uses @kbn/core-metrics-server for custom metrics:
// In plugin setup
import type { CoreSetup, Plugin, PluginInitializerContext } from '@kbn/core/server';
import { Subject } from 'rxjs';
interface MyPluginMetrics {
requests_total: number;
request_duration_ms: number[];
active_connections: number;
errors_total: number;
}
export class MyPlugin implements Plugin {
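private metrics$ = new Subject<MyPluginMetrics>();
// Counters and gauges updated by the route handlers
private requestCounter = 0;
private errorCounter = 0;
private activeConnections = 0;
private durations: number[] = [];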
constructor(private readonly context: PluginInitializerContext) {}
public setup(core: CoreSetup) {
// Register custom metrics
core.metrics.getOpsMetrics$().subscribe((metrics) => {
// Emit custom metrics
this.metrics$.next({
requests_total: this.requestCounter,
request_duration_ms: this.durations,
active_connections: this.activeConnections,
errors_total: this.errorCounter,
});
});
// Expose metrics for collection
return {
metrics$: this.metrics$.asObservable(),
};
}
}
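// In route handler (assumes `import { schema } from '@kbn/config-schema';` and `const router = core.http.createRouter();` inside setup())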
router.post(
{
path: '/api/my-plugin/action',
validate: { body: schema.object({ data: schema.string() }) },
},
async (context, request, response) => {
const startTime = Date.now();
this.requestCounter++;
this.activeConnections++;
try {
const result = await doSomething(request.body.data);
// Record duration
const duration = Date.now() - startTime;
this.durations.push(duration);
return response.ok({ body: result });
} catch (error) {
this.errorCounter++;
return response.customError({
statusCode: 500,
body: { message: error.message },
});
} finally {
this.activeConnections--;
}
}
);
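The handler above pushes every request duration into an array that is never trimmed, so it grows for the lifetime of the process. One possible way to bound it is sketched below: a fixed-size window of recent samples with a nearest-rank percentile. `DurationWindow` is a hypothetical helper, not a Kibana API, and the window size is an assumption to tune per feature.

```typescript
// Hypothetical helper: keeps only the most recent `capacity` durations.
class DurationWindow {
  private readonly samples: number[] = [];
  private readonly capacity: number;

  constructor(capacity: number) {
    if (!Number.isInteger(capacity) || capacity <= 0) {
      throw new Error('capacity must be a positive integer');
    }
    this.capacity = capacity;
  }

  // Adds a sample and drops the oldest one once the window is full.
  record(durationMs: number): void {
    this.samples.push(durationMs);
    if (this.samples.length > this.capacity) {
      this.samples.shift();
    }
  }

  // Nearest-rank percentile over the retained samples; undefined when empty.
  percentile(p: number): number | undefined {
    if (this.samples.length === 0) return undefined;
    const sorted = this.samples.slice().sort((a, b) => a - b);
    const rank = Math.ceil((p / 100) * sorted.length);
    const index = Math.min(sorted.length - 1, Math.max(0, rank - 1));
    return sorted[index];
  }

  get size(): number {
    return this.samples.length;
  }
}
```

In the handler, `this.durations.push(duration)` would become `this.durationWindow.record(duration)`, and the emitted metrics could report `percentile(95)` instead of the raw array.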
Common metrics to track: total requests, request durations, active connections, and total errors.
Use the Logger from @kbn/core:
import type { Logger } from '@kbn/core/server';
export class MyService {
constructor(private readonly logger: Logger) {}
public async processItem(item: Item) {
// Debug: Verbose details for troubleshooting
this.logger.debug('Processing item', {
item_id: item.id,
item_type: item.type,
});
try {
const result = await this.doWork(item);
// Info: Important business events
this.logger.info('Item processed successfully', {
item_id: item.id,
duration_ms: result.duration,
});
return result;
} catch (error) {
// Error: Failures that need attention
this.logger.error('Failed to process item', {
item_id: item.id,
error: error.message,
stack: error.stack,
});
throw error;
}
}
// Warn: Potential issues (not failures)
public validateConfig(config: Config) {
if (config.timeout > 30000) {
this.logger.warn('Timeout is very high', {
timeout: config.timeout,
recommended: 30000,
});
}
}
}
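The error branch above reads `error.message` and `error.stack` directly, which assumes the thrown value is an `Error`. In strict TypeScript a caught value is `unknown`, so a small normalizing step helps. The sketch below is one way to do it; `toErrorMeta` and `ErrorMeta` are hypothetical names, and the output shape simply matches the `error` and `stack` fields logged above.

```typescript
// Shape of the error fields logged in the example above.
interface ErrorMeta {
  error: string;
  stack?: string;
}

// Hypothetical helper: normalizes anything thrown into { error, stack }.
function toErrorMeta(thrown: unknown): ErrorMeta {
  if (thrown instanceof Error) {
    const meta: ErrorMeta = { error: thrown.message };
    if (thrown.stack !== undefined) {
      meta.stack = thrown.stack;
    }
    return meta;
  }
  // Non-Error values (strings, numbers, plain objects) have no stack trace.
  return { error: String(thrown) };
}
```

A call site could then spread the result into the log metadata, for example `{ item_id: item.id, ...toErrorMeta(error) }`.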
Logging levels:
- error: Failures requiring immediate attention
- warn: Potential issues, deprecated features
- info: Important business events (user actions, state changes)
- debug: Detailed troubleshooting information
- trace: Very verbose (usually disabled)
Best practices:
Generate a Kibana dashboard for your feature:
// generate_monitoring_dashboard.ts
import type { SavedObjectsClientContract } from '@kbn/core/server';
export async function createMonitoringDashboard(
soClient: SavedObjectsClientContract,
pluginId: string
) {
const dashboardId = `${pluginId}-monitoring`;
// Create index pattern (for logs)
const indexPattern = await soClient.create('index-pattern', {
title: `logs-${pluginId}*`,
timeFieldName: '@timestamp',
});
// Create visualizations
const visualizations = [
// 1. Request rate (line chart)
await soClient.create('visualization', {
title: `${pluginId} - Request Rate`,
visState: JSON.stringify({
type: 'line',
params: {
type: 'line',
grid: { categoryLines: false },
categoryAxes: [{ id: 'CategoryAxis-1', type: 'category', position: 'bottom', show: true }],
valueAxes: [{ id: 'ValueAxis-1', name: 'LeftAxis-1', type: 'value', position: 'left', show: true }],
seriesParams: [{ data: { id: '1', label: 'Requests/min' }, type: 'line', mode: 'normal' }],
},
aggs: [
{ id: '1', enabled: true, type: 'count', schema: 'metric', params: {} },
{
id: '2',
enabled: true,
type: 'date_histogram',
schema: 'segment',
params: { field: '@timestamp', interval: '1m', timeRange: { from: 'now-15m', to: 'now' } },
},
],
}),
kibanaSavedObjectMeta: {
searchSourceJSON: JSON.stringify({
index: indexPattern.id,
query: { query: `log.logger:${pluginId}`, language: 'kuery' },
filter: [],
}),
},
}),
// 2. Error rate (line chart with threshold)
await soClient.create('visualization', {
title: `${pluginId} - Error Rate`,
visState: JSON.stringify({
type: 'line',
params: {
type: 'line',
addLegend: true,
addTooltip: true,
thresholdLine: { show: true, value: 5, width: 2, style: 'dashed', color: '#E7664C' },
},
aggs: [
{
id: '1',
enabled: true,
type: 'count',
schema: 'metric',
params: {},
},
{
id: '2',
enabled: true,
type: 'date_histogram',
schema: 'segment',
params: { field: '@timestamp', interval: '1m' },
},
],
}),
kibanaSavedObjectMeta: {
searchSourceJSON: JSON.stringify({
index: indexPattern.id,
query: { query: `log.logger:${pluginId} AND log.level:error`, language: 'kuery' },
filter: [],
}),
},
}),
// 3. Latency percentiles (area chart)
await soClient.create('visualization', {
title: `${pluginId} - Latency Percentiles`,
visState: JSON.stringify({
type: 'area',
params: {
type: 'area',
addLegend: true,
addTooltip: true,
},
aggs: [
{
id: '1',
enabled: true,
type: 'percentiles',
schema: 'metric',
params: {
field: 'duration_ms',
percents: [50, 95, 99],
},
},
{
id: '2',
enabled: true,
type: 'date_histogram',
schema: 'segment',
params: { field: '@timestamp', interval: 'auto' },
},
],
}),
}),
// 4. Recent errors (data table)
await soClient.create('visualization', {
title: `${pluginId} - Recent Errors`,
visState: JSON.stringify({
type: 'table',
params: {
perPage: 10,
showPartialRows: false,
showMetricsAtAllLevels: false,
showTotal: false,
totalFunc: 'sum',
},
aggs: [
{
id: '1',
enabled: true,
type: 'count',
schema: 'metric',
params: {},
},
{
id: '2',
enabled: true,
type: 'terms',
schema: 'bucket',
params: {
field: 'error.message.keyword',
size: 10,
order: 'desc',
orderBy: '1',
},
},
],
}),
kibanaSavedObjectMeta: {
searchSourceJSON: JSON.stringify({
index: indexPattern.id,
query: { query: `log.logger:${pluginId} AND log.level:error`, language: 'kuery' },
filter: [],
}),
},
}),
];
// Create dashboard
await soClient.create('dashboard', {
title: `${pluginId} - Monitoring`,
hits: 0,
description: `Monitoring dashboard for ${pluginId}`,
panelsJSON: JSON.stringify([
{ gridData: { x: 0, y: 0, w: 24, h: 12, i: '1' }, panelIndex: '1', version: '7.0.0', panelRefName: 'panel_0' },
{ gridData: { x: 24, y: 0, w: 24, h: 12, i: '2' }, panelIndex: '2', version: '7.0.0', panelRefName: 'panel_1' },
{ gridData: { x: 0, y: 12, w: 24, h: 12, i: '3' }, panelIndex: '3', version: '7.0.0', panelRefName: 'panel_2' },
{ gridData: { x: 24, y: 12, w: 24, h: 12, i: '4' }, panelIndex: '4', version: '7.0.0', panelRefName: 'panel_3' },
]),
optionsJSON: JSON.stringify({
darkTheme: false,
useMargins: true,
hidePanelTitles: false,
}),
version: 1,
timeRestore: false,
kibanaSavedObjectMeta: {
searchSourceJSON: JSON.stringify({
query: { query: '', language: 'kuery' },
filter: [],
}),
},
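}, {
// Use the computed ID and resolve each panelRefName to its visualization
id: dashboardId,
references: visualizations.map((vis, i) => ({
name: `panel_${i}`,
type: 'visualization',
id: vis.id,
})),
});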
return dashboardId;
}
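The four panel entries in `panelsJSON` above follow a regular pattern: two 24-wide panels per row, each 12 high, with `panel_N` reference names. A sketch of generating that layout for any number of panels follows. `buildPanels` is a hypothetical helper, and the 48-column grid width is an assumption inferred from the hand-written layout above.

```typescript
interface GridData {
  x: number;
  y: number;
  w: number;
  h: number;
  i: string;
}

interface PanelEntry {
  gridData: GridData;
  panelIndex: string;
  version: string;
  panelRefName: string;
}

// Assumed grid width, inferred from the two 24-wide panels per row used above.
const GRID_COLUMNS = 48;

// Hypothetical helper: lays out `count` equally sized panels left to right, top to bottom.
function buildPanels(
  count: number,
  panelWidth = 24,
  panelHeight = 12,
  version = '7.0.0'
): PanelEntry[] {
  const perRow = Math.max(1, Math.floor(GRID_COLUMNS / panelWidth));
  const panels: PanelEntry[] = [];
  for (let n = 0; n < count; n++) {
    const index = String(n + 1);
    panels.push({
      gridData: {
        x: (n % perRow) * panelWidth,
        y: Math.floor(n / perRow) * panelHeight,
        w: panelWidth,
        h: panelHeight,
        i: index,
      },
      panelIndex: index,
      version,
      panelRefName: `panel_${n}`,
    });
  }
  return panels;
}
```

With this helper, `panelsJSON` could be written as `JSON.stringify(buildPanels(visualizations.length))`, which for four visualizations yields the same entries as the hand-written array.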
Dashboard should include: request rate, error rate, latency percentiles, and recent errors.
Set up alerts for critical conditions:
// create_monitoring_alerts.ts
import type { RulesClient } from '@kbn/alerting-plugin/server';
export async function createMonitoringAlerts(
rulesClient: RulesClient,
pluginId: string
) {
// Alert 1: High error rate
await rulesClient.create({
data: {
name: `${pluginId} - High Error Rate`,
tags: ['monitoring', pluginId],
alertTypeId: '.es-query',
consumer: 'alerts',
schedule: { interval: '1m' },
actions: [
{
group: 'query matched',
id: 'webhook-action-id', // Pre-configured webhook
params: {
message: `High error rate detected in ${pluginId}`,
},
},
],
params: {
index: [`logs-${pluginId}*`],
timeField: '@timestamp',
esQuery: JSON.stringify({
query: {
bool: {
must: [
{ match: { 'log.level': 'error' } },
{ match: { 'log.logger': pluginId } },
],
},
},
}),
size: 0,
thresholdComparator: '>',
threshold: [10], // More than 10 errors in 1 minute
timeWindowSize: 1,
timeWindowUnit: 'm',
},
throttle: '5m', // Don't alert more than once per 5 min
notifyWhen: 'onActionGroupChange',
},
});
// Alert 2: High latency
await rulesClient.create({
data: {
name: `${pluginId} - High Latency`,
tags: ['monitoring', pluginId],
alertTypeId: '.es-query',
consumer: 'alerts',
schedule: { interval: '5m' },
actions: [
{
group: 'query matched',
id: 'webhook-action-id',
params: {
message: `High latency detected in ${pluginId}: {{context.value}}ms`,
},
},
],
params: {
index: [`apm-*`],
timeField: '@timestamp',
esQuery: JSON.stringify({
query: {
bool: {
must: [
{ match: { 'service.name': 'kibana' } },
{ match: { 'transaction.name': `/${pluginId}/*` } },
],
},
},
aggs: {
avg_duration: {
avg: { field: 'transaction.duration.us' },
},
},
}),
size: 0,
thresholdComparator: '>',
threshold: [5000000], // 5 seconds in microseconds
timeWindowSize: 5,
timeWindowUnit: 'm',
},
throttle: '15m',
notifyWhen: 'onActionGroupChange',
},
});
// Alert 3: No data (service down)
await rulesClient.create({
data: {
name: `${pluginId} - No Data`,
tags: ['monitoring', pluginId],
alertTypeId: '.es-query',
consumer: 'alerts',
schedule: { interval: '5m' },
actions: [
{
group: 'query matched',
id: 'webhook-action-id',
params: {
message: `No data received from ${pluginId} in last 5 minutes`,
},
},
],
params: {
index: [`logs-${pluginId}*`],
timeField: '@timestamp',
esQuery: JSON.stringify({
query: {
match: { 'log.logger': pluginId },
},
}),
size: 0,
thresholdComparator: '<',
threshold: [1], // Less than 1 log in 5 minutes
timeWindowSize: 5,
timeWindowUnit: 'm',
},
throttle: '10m',
notifyWhen: 'onActionGroupChange',
},
});
}
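The error-rate and no-data rules above share almost all of their `params`: same index pattern, same time field, and a query on `log.logger`, differing only in the optional `log.level` filter, the comparator, the threshold, and the window. A sketch of a builder for that shared shape follows. `buildLogCountRuleParams` and `LogCountRuleOptions` are hypothetical names, and the function only assembles the fields already used above; it does not call the alerting API.

```typescript
type Comparator = '>' | '<';

interface LogCountRuleOptions {
  pluginId: string;
  comparator: Comparator;
  threshold: number;
  windowMinutes: number;
  // Optional log level filter, e.g. 'error'; omit to count all logs from the plugin.
  level?: string;
}

// Hypothetical helper: builds the `params` object for a log-count rule
// using the same fields as the rules above.
function buildLogCountRuleParams(options: LogCountRuleOptions) {
  const must: Array<Record<string, unknown>> = [
    { match: { 'log.logger': options.pluginId } },
  ];
  if (options.level !== undefined) {
    must.push({ match: { 'log.level': options.level } });
  }
  return {
    index: [`logs-${options.pluginId}*`],
    timeField: '@timestamp',
    esQuery: JSON.stringify({ query: { bool: { must } } }),
    size: 0,
    thresholdComparator: options.comparator,
    threshold: [options.threshold],
    timeWindowSize: options.windowMinutes,
    timeWindowUnit: 'm',
  };
}
```

The high error rate rule would then pass `{ pluginId, comparator: '>', threshold: 10, windowMinutes: 1, level: 'error' }`, and the no-data rule `{ pluginId, comparator: '<', threshold: 1, windowMinutes: 5 }`.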
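Recommended alerts: high error rate (more than 10 errors per minute), high latency (more than 5 seconds), and no data (fewer than 1 log in 5 minutes).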
// my_feature.monitoring.test.ts
import { FtrProviderContext } from '../../ftr_provider_context';
export default function ({ getService }: FtrProviderContext) {
const supertest = getService('supertest');
const es = getService('es');
describe('My Feature - Monitoring', () => {
it('should emit metrics for successful requests', async () => {
// Make request
await supertest
.post('/api/my-plugin/action')
.send({ data: 'test' })
.expect(200);
// Wait for metrics to be indexed
await new Promise((resolve) => setTimeout(resolve, 1000));
// Verify metrics in Elasticsearch
const result = await es.search({
index: '.monitoring-kibana-*',
body: {
query: {
bool: {
must: [
{ match: { 'kibana.plugin': 'myPlugin' } },
{ range: { 'kibana.metrics.requests_total': { gte: 1 } } },
],
},
},
},
});
expect(result.hits.total.value).toBeGreaterThan(0);
});
it('should log errors with proper context', async () => {
// Trigger error
await supertest
.post('/api/my-plugin/action')
.send({ data: 'invalid' })
.expect(500);
// Wait for logs to be indexed
await new Promise((resolve) => setTimeout(resolve, 1000));
// Verify error logs
const result = await es.search({
index: 'logs-kibana*',
body: {
query: {
bool: {
must: [
{ match: { 'log.logger': 'myPlugin' } },
{ match: { 'log.level': 'error' } },
],
},
},
},
});
expect(result.hits.total.value).toBeGreaterThan(0);
// Verify error has context
const errorLog = result.hits.hits[0]._source;
expect(errorLog).toHaveProperty('error.message');
expect(errorLog).toHaveProperty('error.stack');
});
it('should create APM traces', async () => {
// Make request
await supertest
.post('/api/my-plugin/action')
.send({ data: 'test' })
.expect(200);
// Wait for APM to index
await new Promise((resolve) => setTimeout(resolve, 2000));
// Verify APM transaction
const result = await es.search({
index: 'apm-*',
body: {
query: {
bool: {
must: [
{ match: { 'service.name': 'kibana' } },
{ match: { 'transaction.name': '/api/my-plugin/action' } },
],
},
},
},
});
expect(result.hits.total.value).toBeGreaterThan(0);
// Verify trace has spans
const transaction = result.hits.hits[0]._source;
expect(transaction).toHaveProperty('transaction.duration.us');
expect(transaction).toHaveProperty('transaction.result', 'success');
});
});
}
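The tests above wait a fixed 1 or 2 seconds before searching, which can fail on a slow machine and wastes time on a fast one. A polling helper is one alternative: it re-runs a check until it passes or a deadline is reached. The sketch below is a generic version; `waitFor` is a hypothetical name, and the default timeout and interval are assumptions.

```typescript
// Hypothetical helper: re-runs `check` until it returns true or the timeout elapses.
async function waitFor(
  check: () => Promise<boolean>,
  timeoutMs = 10000,
  intervalMs = 250
): Promise<void> {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    if (await check()) return;
    if (Date.now() >= deadline) {
      throw new Error(`Condition not met within ${timeoutMs} ms`);
    }
    // Pause before the next attempt.
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
}
```

In the tests above, the fixed sleep and the single search could be replaced by a `waitFor` call whose check runs the search and returns whether at least one hit came back.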
Step 1: Add APM tracing to critical paths
// Identify critical code paths (slow, called frequently, error-prone)
// Add tracing:
import apm from 'elastic-apm-node';
const span = apm.startSpan('feature_name', 'custom');
try {
// ... code ...
} finally {
span?.end();
}
Step 2: Add custom metrics
// Add counters, gauges, histograms
private requestCounter = 0;
private errorCounter = 0;
private durations: number[] = [];
// In request handler:
this.requestCounter++;
const start = Date.now();
try {
// ... code ...
this.durations.push(Date.now() - start);
} catch (error) {
this.errorCounter++;
throw error;
}
Step 3: Add structured logging
// Add logs at appropriate levels
this.logger.info('Feature executed', { user_id, duration_ms });
this.logger.error('Feature failed', { error: error.message, context });
Step 4: Create monitoring dashboard
# Generate dashboard JSON
node scripts/generate_monitoring_dashboard.js --plugin myPlugin
# Import to Kibana
curl -X POST "http://localhost:5601/api/saved_objects/_import" \
-H "kbn-xsrf: true" \
--form file=@monitoring_dashboard.ndjson
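The import command above uploads an `.ndjson` file, meaning one JSON document per line. A sketch of serializing saved objects into that form follows. `toNdjson` and `SavedObjectExport` are hypothetical names, and the object shape is a simplified assumption of what the generator script would emit.

```typescript
// Simplified, assumed shape of one exported saved object.
interface SavedObjectExport {
  id: string;
  type: string;
  attributes: Record<string, unknown>;
  references: Array<{ name: string; type: string; id: string }>;
}

// Hypothetical helper: serializes each object on its own line,
// with a trailing newline at the end of the file.
function toNdjson(objects: SavedObjectExport[]): string {
  return objects.map((obj) => JSON.stringify(obj)).join('\n') + '\n';
}
```

Listing visualizations before the dashboard that references them keeps the file readable, although that ordering is a convention chosen here rather than something the source specifies.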
Step 5: Set up alerts
// Create alerts for critical conditions
await createMonitoringAlerts(rulesClient, 'myPlugin');
Step 6: Document monitoring
# My Feature - Monitoring
## Metrics
- `myPlugin.requests_total`: Total requests
- `myPlugin.errors_total`: Total errors
- `myPlugin.duration_ms`: Request duration
## Logs
- Logger: `myPlugin`
- Index: `logs-kibana*`
## Dashboard
- URL: /app/dashboards#/view/myPlugin-monitoring
- Panels: Request rate, Error rate, Latency, Recent errors
## Alerts
- High error rate: >10 errors/min
- High latency: average >5s
- No data: <1 log in 5min
## Troubleshooting
1. Check dashboard for anomalies
2. Search logs: `log.logger:myPlugin AND log.level:error`
3. Check APM traces: `service.name:kibana AND transaction.name:/api/myPlugin/*`
npx claudepluginhub patrykkopycinski/patryks-treadmill-claude-plugins --plugin kibana-infrastructure-ops-tools