From harness-claude
Implements /health and /ready endpoints for liveness and readiness probes in container orchestration platforms such as Kubernetes. Includes an example with Express, Prisma, and Redis.
Slash command: /harness-claude:microservices-health-check
Implement /health and /ready endpoints for liveness and readiness probes in containers.
Always implement both endpoints:
GET /health → liveness probe: "Is the process alive?"
- Returns 200 if the process is running (even if dependencies are down)
- Kubernetes restarts the container after failureThreshold consecutive failures
- Should almost never fail (only if the process is deadlocked or its event loop is blocked)
GET /ready → readiness probe: "Can this instance handle traffic?"
- Returns 200 only if all critical dependencies are healthy
- While it fails, Kubernetes removes the pod from the Service's endpoints so it receives no traffic; the container is not restarted
- Common reasons it returns 503: database not connected, cache unavailable, still starting
Full implementation with Express:
```typescript
import express from 'express';
import { PrismaClient } from '@prisma/client';
import { Redis } from 'ioredis';

const app = express();
const prisma = new PrismaClient();
const redis = new Redis(process.env.REDIS_URL!);

type CheckResult = { status: 'ok' | 'error'; latencyMs?: number; error?: string };

// Liveness: is the process alive? No dependency checks here.
app.get('/health', (_req, res) => {
  res.status(200).json({
    status: 'ok',
    timestamp: new Date().toISOString(),
    uptime: process.uptime(),
    pid: process.pid,
  });
});

// Readiness: can we handle traffic?
app.get('/ready', async (_req, res) => {
  const checks: Record<string, CheckResult> = {};
  let allHealthy = true;

  // Check database
  const dbStart = Date.now();
  try {
    await prisma.$queryRaw`SELECT 1`;
    checks.database = { status: 'ok', latencyMs: Date.now() - dbStart };
  } catch (err) {
    checks.database = { status: 'error', error: (err as Error).message };
    allHealthy = false;
  }

  // Check Redis
  const redisStart = Date.now();
  try {
    await redis.ping();
    checks.redis = { status: 'ok', latencyMs: Date.now() - redisStart };
  } catch (err) {
    checks.redis = { status: 'error', error: (err as Error).message };
    allHealthy = false; // or set false only if Redis is required
  }

  // Check critical external dependencies (bounded so the probe does not time out)
  try {
    const response = await fetch(`${process.env.PAYMENT_SERVICE_URL}/health`, {
      signal: AbortSignal.timeout(2_000),
    });
    checks.paymentService = {
      status: response.ok ? 'ok' : 'error',
      error: response.ok ? undefined : `HTTP ${response.status}`,
    };
    if (!response.ok) allHealthy = false;
  } catch (err) {
    checks.paymentService = { status: 'error', error: (err as Error).message };
    allHealthy = false;
  }

  res.status(allHealthy ? 200 : 503).json({
    status: allHealthy ? 'ready' : 'not ready',
    timestamp: new Date().toISOString(),
    checks,
  });
});

// Listen on the port the probes target (8080); keep the handle for graceful shutdown.
const server = app.listen(8080);

// Example readiness response (healthy):
// {
//   "status": "ready",
//   "timestamp": "2024-01-15T10:30:00.000Z",
//   "checks": {
//     "database": { "status": "ok", "latencyMs": 3 },
//     "redis": { "status": "ok", "latencyMs": 1 },
//     "paymentService": { "status": "ok" }
//   }
// }
```
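The readiness handler above runs its checks one after another. Only the payment-service call is bounded by a timeout. A slow database query, or a Redis command waiting on a reconnect, can therefore push the response past the probe's timeoutSeconds.

One alternative is to run the checks concurrently and cap each one with its own timeout. This is a minimal sketch under a few assumptions:
- `withTimeout` and `runChecks` are hypothetical names. They are not part of Express, Prisma, or ioredis.
- The timeout only stops waiting. It does not cancel the underlying query.

```typescript
// Hypothetical helpers: run named async checks concurrently, each capped by a timeout.
type CheckResult = { status: 'ok' | 'error'; latencyMs: number; error?: string };

// Rejects if `promise` has not settled within `ms`; does not cancel the underlying work.
function withTimeout<T>(promise: Promise<T>, ms: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`timed out after ${ms}ms`)), ms);
  });
  // Clear the timer either way so it cannot keep the event loop busy.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

async function runChecks(
  checks: Record<string, () => Promise<unknown>>,
  timeoutMs: number,
): Promise<{ healthy: boolean; results: Record<string, CheckResult> }> {
  // All checks start at once, so total latency is roughly the slowest check (capped by timeoutMs).
  const settled = await Promise.all(
    Object.entries(checks).map(async ([name, check]): Promise<[string, CheckResult]> => {
      const start = Date.now();
      try {
        await withTimeout(check(), timeoutMs);
        return [name, { status: 'ok', latencyMs: Date.now() - start }];
      } catch (err) {
        const message = err instanceof Error ? err.message : String(err);
        return [name, { status: 'error', latencyMs: Date.now() - start, error: message }];
      }
    }),
  );
  const results: Record<string, CheckResult> = Object.fromEntries(settled);
  return { healthy: Object.values(results).every((r) => r.status === 'ok'), results };
}
```

In the Express handler, each existing check would become a closure, such as `redis: () => redis.ping()`. Choose a per-check timeout comfortably below the readiness probe's timeoutSeconds, for example 1,000 ms against `timeoutSeconds: 3`.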
Kubernetes probe configuration:
```yaml
containers:
  - name: order-service
    livenessProbe:
      httpGet:
        path: /health
        port: 8080
      initialDelaySeconds: 10 # with a startupProbe defined, counted after it succeeds
      periodSeconds: 10 # check every 10s
      failureThreshold: 3 # restart after 3 consecutive failures
      timeoutSeconds: 5
    readinessProbe:
      httpGet:
        path: /ready
        port: 8080
      initialDelaySeconds: 5 # start checking earlier (just connectivity)
      periodSeconds: 5
      failureThreshold: 2 # remove from LB after 2 consecutive failures
      successThreshold: 1 # put back on LB after 1 success
      timeoutSeconds: 3
    startupProbe:
      # For apps with slow startup (loading large ML models, etc.)
      httpGet:
        path: /health
        port: 8080
      failureThreshold: 30 # 30 × 5s = 150 seconds to start
      periodSeconds: 5
```
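The threshold comments in this configuration are simple products of failureThreshold and periodSeconds. The sketch below makes that arithmetic explicit. The function names are hypothetical, and the results are approximations that ignore probe jitter and the time each probe itself takes.

```typescript
// Approximate Kubernetes probe timing; ignores jitter and probe execution time.
interface ProbeTiming {
  periodSeconds: number;
  failureThreshold: number;
  initialDelaySeconds?: number;
}

// Roughly how long a startupProbe lets the container initialize
// before the kubelet kills and restarts it.
function startupBudgetSeconds(p: ProbeTiming): number {
  return (p.initialDelaySeconds ?? 0) + p.failureThreshold * p.periodSeconds;
}

// Rough worst-case time from "endpoint starts failing" to the kubelet acting
// (restart for liveness, removal from Service endpoints for readiness).
function detectionSeconds(p: ProbeTiming): number {
  return p.failureThreshold * p.periodSeconds;
}

// Smallest failureThreshold that gives a startupProbe at least `maxStartupSeconds`.
function startupFailureThreshold(maxStartupSeconds: number, periodSeconds: number): number {
  return Math.ceil(maxStartupSeconds / periodSeconds);
}
```

For the values above, this gives:
- about 150 s of startup budget;
- about 10 s before a failing /ready removes the pod from Service endpoints;
- about 30 s before a failing /health triggers a restart.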
Graceful shutdown with readiness:
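```typescript
let isReady = true;

process.on('SIGTERM', async () => {
  console.log('SIGTERM received: starting graceful shutdown');
  // The whole sequence must finish within terminationGracePeriodSeconds (30 s by default).

  // 1. Stop receiving new traffic: the readiness probe starts returning 503
  isReady = false;
  // 2. Give the load balancer / kube-proxy time to stop routing new requests here
  await new Promise<void>((resolve) => setTimeout(() => resolve(), 5_000));
  // 3. Stop accepting connections and wait for in-flight requests to finish
  //    (`server` is the http.Server returned by app.listen(8080))
  await new Promise<void>((resolve) => server.close(() => resolve()));
  // 4. Close dependency connections
  await prisma.$disconnect();
  await redis.quit();
  process.exit(0);
});

// Put this guard at the top of the existing /ready handler.
// Do not register a second /ready route: Express would never reach it.
app.get('/ready', async (_req, res) => {
  if (!isReady) {
    res.status(503).json({ status: 'shutting down' });
    return;
  }
  // ... dependency checks as above
});
```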
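The same shutdown steps can be expressed as a small coordinator whose side effects are injected, which makes the ordering explicit and testable. This is a sketch under assumptions:
- `createShutdown` and its option names are hypothetical.
- `drainDelayMs` plays the role of the 5-second wait.

```typescript
// Hypothetical, framework-agnostic shutdown coordinator with injected side effects.
interface ShutdownOptions {
  drainDelayMs: number; // time for the LB / kube-proxy to stop routing to this pod
  closeServer: () => Promise<void>; // stop accepting connections, wait for in-flight requests
  closeDependencies: () => Promise<void>; // e.g. database and cache clients
  exit: (code: number) => void;
}

function createShutdown(opts: ShutdownOptions) {
  let ready = true;
  let shuttingDown = false;

  async function shutdown(): Promise<void> {
    if (shuttingDown) return; // ignore repeated signals
    shuttingDown = true;
    ready = false; // /ready reports 503 from this point on
    await new Promise<void>((resolve) => setTimeout(() => resolve(), opts.drainDelayMs));
    try {
      await opts.closeServer();
      await opts.closeDependencies();
      opts.exit(0);
    } catch {
      opts.exit(1); // something failed to close cleanly
    }
  }

  return { isReady: () => ready, shutdown };
}
```

Wiring it up could look like `process.on('SIGTERM', () => void shutdown())`:
- `closeServer` wraps `server.close()`.
- `closeDependencies` calls `prisma.$disconnect()` and `redis.quit()`.
- The /ready handler checks `isReady()` before running its dependency checks.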
What to check in readiness vs. liveness:
| | /health (liveness) | /ready (readiness) |
|---|---|---|
| Purpose | Is the process alive? | Can it handle traffic? |
| DB connectivity | No | Yes |
| Redis connectivity | No | Yes (if required) |
| External services | No | Critical ones only |
| Response time | Must be fast (< 50 ms) | Can check dependencies (< 3 s) |
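The table's "Yes (if required)" and "Critical ones only" entries imply that not every dependency should be able to fail readiness. Here is a minimal sketch of that aggregation rule, under two assumptions:
- Each check is tagged with a hypothetical `required` flag.
- Optional failures are reported with a hypothetical `'degraded'` status instead of a 503.

```typescript
type Status = 'ok' | 'error';

interface DependencyCheck {
  name: string;
  required: boolean; // hypothetical flag, e.g. database: true, optional cache: false
  status: Status;
}

// Only required dependencies can take the instance out of rotation;
// optional failures are surfaced in the body but still return 200.
function readiness(checks: DependencyCheck[]): {
  httpStatus: 200 | 503;
  status: 'ready' | 'degraded' | 'not ready';
} {
  const requiredDown = checks.some((c) => c.required && c.status === 'error');
  if (requiredDown) return { httpStatus: 503, status: 'not ready' };
  const optionalDown = checks.some((c) => !c.required && c.status === 'error');
  return { httpStatus: 200, status: optionalDown ? 'degraded' : 'ready' };
}
```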
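Startup probes: Use a startupProbe for services with long startup times (model loading, schema migrations). Until it succeeds, Kubernetes does not run the liveness or readiness probes. A slow-starting service therefore gets time to initialize without being restarted by the liveness probe. Once the startupProbe succeeds, the regular probes take over.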
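A startupProbe that targets /health only helps if /health fails while the service is still initializing. A service may start listening before slow work such as model loading has finished. In that case, one option is to track a lifecycle phase and derive each probe's status from it. This is an assumption here, not part of the original skill. The `Phase` type and `probeStatus` function are hypothetical:

```typescript
// Hypothetical lifecycle phases for one service instance.
type Phase = 'starting' | 'running' | 'shutting-down';

// Status code each probe endpoint returns for a given phase.
// Assumes a startupProbe targets /health, so Kubernetes does not run the
// liveness probe until /health has succeeded once.
function probeStatus(path: string, phase: Phase, dependenciesHealthy: boolean): number {
  switch (path) {
    case '/health':
      // Fail only until initialization completes; stay 200 while draining
      // so the kubelet does not restart a pod that is shutting down cleanly.
      return phase === 'starting' ? 503 : 200;
    case '/ready':
      // Route traffic only to a fully started instance with healthy dependencies.
      return phase === 'running' && dependenciesHealthy ? 200 : 503;
    default:
      return 404;
  }
}
```

An Express handler could then respond with `res.sendStatus(probeStatus(req.path, phase, ok))`. The `phase` value would be set to `'running'` after initialization and to `'shutting-down'` in the SIGTERM handler.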
Anti-patterns:
Security: Health endpoints should not require authentication, because they are called by infrastructure (the kubelet, load balancers). They should still not expose sensitive information such as connection strings or internal IPs.
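One way to follow this advice in the readiness handler: log the full error message server-side and return only a coarse label. This sketch assumes a short error code is safe to expose. `toPublicError` and `publicCheckError` are hypothetical helpers. The `code` property they read is the one Node.js sets on system errors such as ECONNREFUSED.

```typescript
// Return a coarse, non-sensitive label for an error: a short code if present, else a fixed string.
function toPublicError(err: unknown): string {
  const code = (err as { code?: unknown } | null | undefined)?.code;
  // Only accept short uppercase codes like "ECONNREFUSED"; never echo the raw message.
  if (typeof code === 'string' && /^[A-Z][A-Z0-9_]*$/.test(code)) return code;
  return 'check failed';
}

// Log the full detail (which may contain hosts, ports, or credentials) and
// build the sanitized entry that goes into the health response body.
function publicCheckError(
  name: string,
  err: unknown,
  log: (msg: string) => void = console.error,
): { status: 'error'; error: string } {
  log(`health check "${name}" failed: ${err instanceof Error ? err.message : String(err)}`);
  return { status: 'error', error: toPublicError(err) };
}
```

In the handler above, `checks.database = publicCheckError('database', err)` would replace the object built from `(err as Error).message`.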
Install: npx claudepluginhub intense-visions/harness-engineering --plugin harness-claude