From rhyanz46-devops
General playbook for running microservices and their web frontends on shared Linux hosts, using PM2 for development and HashiCorp Nomad (with Consul, Vault, and Traefik) for production. Server- and domain-agnostic — no hardcoded IPs, hostnames, or tokens. Use this skill whenever the user asks to deploy, operate, harden, or troubleshoot services in this dev-PM2 / prod-Nomad hybrid model. Typical signals: mentions of PM2 + Nomad together, `ecosystem.config.js`, Nomad job specs with `template` stanzas, Consul service discovery, Vault secret rendering, Traefik tags, host-MySQL/Postgres bind for containers, systemd drop-ins for consul/nomad/docker ordering, "strip source keep binary" production hygiene, or unattended-upgrades / kernel-reboot maintenance.
Slash command
/rhyanz46-devops:microservice-devops
This skill is the canonical doctrine for operating microservices and their
web frontends in a hybrid model: PM2 for development, HashiCorp Nomad
(with Consul, Vault, Traefik) for production, frequently colocated on a single
shared host. This SKILL.md is the entry point — it tells future Claude what
to read, in what order, and what hard rules to enforce. Deep procedures live in
references/.
This skill is intentionally server- and domain-agnostic. Never bake in a
specific IP address, hostname, domain, token, or credential. Always use
placeholders: <host>, <service>, <domain>, <bridge-gateway-ip>,
<token>. When you operate on a real server, read its actual state first — do
not assume values from this doctrine.
Trigger when the user is deploying, operating, hardening, or debugging services that use — or want to use — this stack:
- PM2 for development (pm2 start, ecosystem.config.js, pm2 save/resurrect).

Strong signals: an ecosystem.config.js; a *.nomad / *.nomad.hcl job spec
with a template stanza using {{ range service "..." }} (Consul) or
{{ with secret "..." }} (Vault); Consul/Nomad systemd units; host databases
that containers reach via the container bridge gateway; or the user explicitly
invoking this playbook.
If the project is unrelated to this stack, do not apply this skill — ask first.
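
The template signals described above can also be spotted mechanically. The following is a minimal sketch, assuming the job spec is available as plain text: templateDependencies is a hypothetical helper, and its regular expressions only cover the two literal forms quoted in this document ({{ range service "..." }} and {{ with secret "..." }}). Variants such as range with variable assignment are not matched.

```javascript
// Hypothetical helper: extract Consul service names and Vault secret paths
// referenced by template stanzas in a Nomad job spec.
function templateDependencies(jobSpec) {
  // Collect the first capture group of every match, de-duplicated.
  const collect = (pattern) =>
    [...new Set([...jobSpec.matchAll(pattern)].map((match) => match[1]))];
  return {
    consulServices: collect(/\{\{-?\s*range\s+service\s+"([^"]+)"/g),
    vaultSecrets: collect(/\{\{-?\s*with\s+secret\s+"([^"]+)"/g),
  };
}
```

Every name this returns is something that has to resolve in Consul or Vault before the task can start, which makes the list a useful first stop when a task is stuck on a template error.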
              ┌─────────────────── shared host ───────────────────┐
DEV (PM2)     │  pm2 ─┬─ svc-a  (bind 127.0.0.1:PORT)             │
              │       └─ svc-b  (bind 127.0.0.1:PORT)             │
              │                                                   │
PROD (Nomad)  │  systemd ─ docker                                 │
              │  systemd ─ consul ─┐  (service discovery)         │
              │  systemd ─ vault ──┤  (secrets)                   │
              │  systemd ─ nomad ──┴─ alloc ─ task (docker) ──┐   │
              │                        │ reads Consul+Vault   │   │
              │                        └─ registers in Consul │   │
              │  Traefik / nginx ── ingress ── :443 public ───┘   │
              │  host db (mysql/postgres) bind 127.0.0.1,<bridge> │
              └───────────────────────────────────────────────────┘
Two control planes share one box. The dominant risks all stem from that: resource contention, port/exposure collisions, and blast radius from dev into prod. The hard rules below exist to contain those.
Global numbering so you can cite "rule #N".
- Bind dev services to 127.0.0.1 (or a private interface), never 0.0.0.0, unless the user explicitly wants public exposure. Every PM2 app MUST set max_memory_restart so a leak cannot starve prod. See references/pm2-dev-workflow.md.
- Restarting consul/nomad, reloading ingress, or rebinding a database is prod-affecting. Read current state, back up the file, validate (nginx -t, nomad job validate), then act. Localhost-only reversible reads need no confirmation.
- [...] 0.0.0.0.
- Nomad template dependencies are hard runtime dependencies. A task whose template renders {{ range service "X" }} (Consul) or {{ with secret "Y" }} (Vault) will NOT start until Consul/Vault are up and X/Y resolve. If a task is stuck "pending"/"failed" with a "Template failed" / "Missing: service" message, the root cause is upstream (Consul down, dependency service not healthy, Vault sealed/no token), not the task itself. See references/incident-playbook.md.
- consul, vault, nomad, and docker MUST be enabled via systemctl enable (verify with systemctl is-enabled), and Nomad MUST start after Consul and Docker. Use systemd drop-ins for ordering. If a Type=notify daemon flaps (systemd kills it every TimeoutStartUSec because it never signals READY), pin Type=exec via a drop-in. See references/platform-systemd.md.
- Host databases that containers reach must bind 127.0.0.1,<bridge-gateway-ip> (e.g. Docker's default bridge gateway), not 127.0.0.1 alone; otherwise containers get "connection refused". Add a systemd ordering drop-in so the DB starts after docker (so the bridge interface exists before the DB binds). See references/host-service-binding.md.
- [...] ecosystem.config.js / Dockerfile) + its .env/secret/config files. Delete source, build caches, go.mod/go.sum/package.json dev manifests, and never leave a .git folder on prod. See references/prod-hygiene.md.
- [...] references/pm2-dev-workflow.md.
- [...] reboot-required flag is set; confirm timing with the user first. See references/security-maintenance.md.
- [...] references/security-maintenance.md.

| Situation | Read |
|---|---|
| Setting up / managing dev processes | references/pm2-dev-workflow.md |
| Writing or operating a Nomad prod job | references/nomad-prod-workflow.md |
| Service discovery or secret rendering issues | references/consul-vault.md |
| Boot ordering, flapping daemon, enablement | references/platform-systemd.md |
| Container can't reach host DB/cache | references/host-service-binding.md |
| Cleaning a prod host / shipping a deploy | references/prod-hygiene.md |
| Patching, kernel reboot, CVE mitigation | references/security-maintenance.md |
| A service is down — diagnose root cause | references/incident-playbook.md |
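
To illustrate the dev-side hard rule (loopback bind plus max_memory_restart), a minimal ecosystem.config.js might look like the sketch below. The app names, script paths, ports, and the 300M limit are placeholders. Passing the bind address through HOST/PORT environment variables assumes the services read them; adapt this to however each service actually takes its listen address.

```javascript
// ecosystem.config.js (sketch): every value here is a placeholder.
const config = {
  apps: [
    {
      name: 'svc-a',
      script: './svc-a/server.js', // hypothetical entry point
      env: {
        HOST: '127.0.0.1', // loopback only, never 0.0.0.0 (assumes the app reads HOST)
        PORT: '3001',
      },
      // PM2 restarts the app once it exceeds this much memory,
      // so a leak in dev cannot starve prod on the shared host.
      max_memory_restart: '300M',
    },
    {
      name: 'svc-b',
      script: './svc-b/server.js', // hypothetical entry point
      env: {
        HOST: '127.0.0.1',
        PORT: '3002',
      },
      max_memory_restart: '300M',
    },
  ],
};

module.exports = config;
```

With a file like this in place, pm2 start ecosystem.config.js launches both apps and pm2 save records the process list for a later pm2 resurrect.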
- Validate before acting: the nginx -t, nomad job validate, consul validate, and vault operator ... dry-runs exist for a reason.
- Back up each file to .bak before changing it; record what you changed and how to revert.
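
The back-up-then-validate habit can be scripted. This is a rough sketch using only Node's standard library; backupFile and failedChecks are hypothetical helper names, and the commands passed in are whatever validators apply on the host being operated.

```javascript
const { spawnSync } = require('child_process');
const fs = require('fs');

// Copy a config file to <path>.bak before it is edited; returns the backup path.
function backupFile(path) {
  const backup = `${path}.bak`;
  fs.copyFileSync(path, backup);
  return backup;
}

// Run each [command, args] validator and return the ones that failed,
// either by exiting non-zero or by not being runnable at all.
function failedChecks(checks) {
  return checks.filter(([command, args]) => {
    const result = spawnSync(command, args, { stdio: 'ignore' });
    return result.error !== undefined || result.status !== 0;
  });
}
```

A call such as failedChecks([['nginx', ['-t']], ['nomad', ['job', 'validate', '<job-file>']]]) returns an empty array only when every validator passed; anything else is a reason to stop before touching prod.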