From chogos
Kubernetes operator management best practices. Use when deploying, configuring, upgrading, or troubleshooting Kubernetes operators including Strimzi (Kafka), Keycloak, kube-prometheus-stack (Prometheus, Grafana, Alertmanager), cert-manager, or external-dns.
Slash command: `/chogos:managing-k8s-operators`
| Operator | Primary CRDs | Typical Namespace | Project |
|---|---|---|---|
| Strimzi | Kafka, KafkaTopic, KafkaUser, KafkaConnect, KafkaMirrorMaker2 | strimzi-system | strimzi.io |
| Keycloak | Keycloak, KeycloakRealmImport | keycloak-system | keycloak.org |
| kube-prometheus-stack | Prometheus, Alertmanager, ServiceMonitor, PodMonitor, PrometheusRule | monitoring | prometheus-operator |
| cert-manager | Issuer, ClusterIssuer, Certificate, CertificateRequest | cert-manager | cert-manager.io |
| external-dns | (no CRDs — annotation-driven) | external-dns | kubernetes-sigs |
Read the pattern file(s) matching the user's intent. Load only what's needed.
| User Intent | Load |
|---|---|
| Install / compare installation methods | installation-methods.md |
| Deploy Kafka cluster | patterns/strimzi/cluster-setup.md |
| Configure Kafka listeners or storage | patterns/strimzi/listeners-and-storage.md |
| Create Kafka topics or users | patterns/strimzi/topics-and-users.md |
| Set up Kafka Connect / connectors | patterns/strimzi/kafka-connect.md |
| Replicate Kafka across clusters | patterns/strimzi/mirror-maker.md |
| Configure Keycloak realm | patterns/keycloak/realm-configuration.md |
| Set up SSO / identity providers | patterns/keycloak/identity-providers.md |
| Create Keycloak clients | patterns/keycloak/client-setup.md |
| Customize Keycloak themes or SPIs | patterns/keycloak/customization.md |
| Add Prometheus scraping for a service | patterns/kube-prometheus-stack/service-monitors.md |
| Create alerting rules | patterns/kube-prometheus-stack/alerting-rules.md |
| Provision Grafana dashboards | patterns/kube-prometheus-stack/grafana-dashboards.md |
| Set up Thanos for long-term metrics | patterns/kube-prometheus-stack/thanos-integration.md |
| Set up TLS certificates | patterns/cert-manager/issuers.md + patterns/cert-manager/certificates.md |
| Configure ACME / challenge solvers | patterns/cert-manager/issuers.md + patterns/cert-manager/webhook-solvers.md |
| Troubleshoot certificate renewal | patterns/cert-manager/certificates.md |
| Auto-manage DNS records | patterns/external-dns/provider-setup.md + patterns/external-dns/source-configuration.md |
| Configure DNS record ownership | patterns/external-dns/ownership-and-policy.md |
| Upgrade any operator | installation-methods.md + cross-cutting section below |
| Monitor operator health | Cross-cutting section below (no extra file) |
Three installation methods exist: OLM, Helm, and raw manifests. See installation-methods.md for a full comparison.
| Operator | Recommended Method | Notes |
|---|---|---|
| Strimzi | Helm or OLM | strimzi/strimzi-kafka-operator chart |
| Keycloak | Helm or manifests | keycloak/keycloak-operator chart |
| kube-prometheus-stack | Helm only | Complex subchart dependencies |
| cert-manager | Helm | cert-manager/cert-manager chart |
| external-dns | Helm | kubernetes-sigs/external-dns chart |
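
As a rough illustration of the Helm route in the table above, the sketch below wraps the usual repo-add and install steps in a shell function. The function name `install_operator` and its argument order are invented for this example, and the Strimzi repository URL in the usage comment is an assumption to check against the Strimzi documentation.

```shell
# Sketch: install (or upgrade) an operator chart into its own namespace.
# install_operator is a hypothetical helper, not part of Helm.
install_operator() {
  local release="$1" repo_name="$2" repo_url="$3" chart="$4" namespace="$5"

  # Register the chart repository and refresh the local index.
  helm repo add "$repo_name" "$repo_url"
  helm repo update

  # "upgrade --install" installs on first run and upgrades afterwards.
  # --create-namespace creates the namespace if it is missing;
  # --wait blocks until the release's resources report ready.
  helm upgrade --install "$release" "$repo_name/$chart" \
    --namespace "$namespace" --create-namespace --wait
}

# Usage (assumed repo URL, verify before use):
#   install_operator strimzi strimzi https://strimzi.io/charts/ \
#     strimzi-kafka-operator strimzi-system
```

Keeping the release, chart, and namespace as arguments lets the same function cover any of the Helm-installed operators listed above.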
CRD API versions progress: v1alpha1 → v1beta1 → v1. Before upgrading:

    # check stored versions
    kubectl get crd kafkas.kafka.strimzi.io -o jsonpath='{.status.storedVersions}'
    # list all CRDs for an operator
    kubectl get crd | grep strimzi.io
    # back up a CRD definition
    kubectl get crd <name> -o yaml > backup.yaml

General operator upgrade checklist:

1. Back up custom resources: `kubectl get <crd> -A -o yaml > backup.yaml`
2. Upgrade the operator (Helm: `helm upgrade --atomic --wait`, OLM: update subscription channel)
3. Verify the operator pods: `kubectl get pods -n <operator-ns>`
4. Check `.status.conditions` on the custom resources

Rolling vs recreate:

Canary upgrades:

- `WATCH_NAMESPACE` scoping

Deploy the Strimzi Drain Cleaner for graceful pod eviction during node maintenance. It ensures brokers are rolled safely when nodes are drained.
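
The upgrade checklist above could be strung together roughly as follows for a Helm-managed operator. This is a sketch under assumptions: `upgrade_operator`, its arguments, and the backup file naming are hypothetical, and an OLM-managed operator would update its subscription channel instead of calling `helm`.

```shell
# Sketch: back up CRs, upgrade the operator with Helm, then verify.
# upgrade_operator is a hypothetical helper; all arguments are placeholders.
upgrade_operator() {
  local release="$1" chart="$2" namespace="$3" cr_kind="$4"
  local backup="backup-${cr_kind}-$(date +%Y%m%d%H%M%S).yaml"

  # Back up every custom resource of this kind across all namespaces.
  kubectl get "$cr_kind" -A -o yaml > "$backup" || return 1
  echo "backup written to $backup"

  # --atomic rolls the release back if the upgrade fails;
  # --wait blocks until the upgraded resources report ready.
  helm upgrade "$release" "$chart" --namespace "$namespace" --atomic --wait || return 1

  # Confirm the operator pods came back.
  kubectl get pods -n "$namespace"

  # Print the condition types reported by each custom resource.
  kubectl get "$cr_kind" -A \
    -o jsonpath='{range .items[*]}{.metadata.namespace}/{.metadata.name}: {.status.conditions[*].type}{"\n"}{end}'
}

# Usage:
#   upgrade_operator strimzi strimzi/strimzi-kafka-operator strimzi-system kafkas
```

Stopping at the first failed step keeps a failed backup from being followed by an upgrade.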
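Operators built on controller-runtime expose a standard set of metrics. Operators built on other frameworks (the Strimzi and Keycloak operators are Java-based) publish their own metric names, so check the operator's `/metrics` output before reusing these:

    # key metrics to watch
    controller_runtime_reconcile_total{result="error"}   # reconciliation failures
    controller_runtime_reconcile_time_seconds            # reconciliation latency
    workqueue_depth                                      # pending reconciliations
    workqueue_longest_running_processor_seconds          # stuck reconciliations
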
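ServiceMonitor for operator pods:

    apiVersion: monitoring.coreos.com/v1
    kind: ServiceMonitor
    metadata:
      name: strimzi-operator
      namespace: monitoring
    spec:
      namespaceSelector:
        matchNames: [strimzi-system]
      selector:
        matchLabels:
          # must match the labels on a Service in front of the operator;
          # if the operator exposes no Service, use a PodMonitor instead
          app: strimzi-cluster-operator
      endpoints:
        - port: http          # named port on that Service
          path: /metrics
          interval: 30s
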
Alert on reconciliation failures:

    apiVersion: monitoring.coreos.com/v1
    kind: PrometheusRule
    metadata:
      name: operator-health
      namespace: monitoring
    spec:
      groups:
        - name: operator-health
          rules:
            - alert: OperatorReconcileErrors
              expr: rate(controller_runtime_reconcile_total{result="error"}[5m]) > 0
              for: 10m
              labels:
                severity: warning
              annotations:
                summary: "Operator {{ $labels.controller }} has reconciliation errors"

Use Guaranteed QoS (requests == limits for CPU and memory) for all operator-managed workloads in production. Under node memory pressure, BestEffort and then Burstable pods are evicted or OOM-killed before Guaranteed pods, and losing them can trigger cascading failures.
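
One way to check which QoS class pods actually received is to read `.status.qosClass` from the pod status. The sketch below is a minimal example; the helper names `list_qos_classes` and `non_guaranteed_pods` and the `kafka` namespace in the usage comment are hypothetical.

```shell
# Sketch: list each pod in a namespace with its assigned QoS class,
# one "name<TAB>class" pair per line.
list_qos_classes() {
  local namespace="$1"
  kubectl get pods -n "$namespace" \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.qosClass}{"\n"}{end}'
}

# Print only the pods whose class is not Guaranteed, i.e. pods where at
# least one container has requests that differ from limits (or none set).
non_guaranteed_pods() {
  list_qos_classes "$1" | awk -F'\t' '$2 != "Guaranteed" { print $1 }'
}

# Usage (hypothetical namespace):
#   non_guaranteed_pods kafka
```

An empty result means every pod in the namespace is running as Guaranteed.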
Use the `WATCH_NAMESPACE` env var to restrict scope.

    # audit what an operator service account can do
    kubectl auth can-i --list --as=system:serviceaccount:strimzi-system:strimzi-cluster-operator

- `strimzi-system`, `cert-manager`, `monitoring`, `external-dns`
- `WATCH_NAMESPACE`

Generic operator troubleshooting checklist:

1. `kubectl get pods -n <operator-ns>`: check Ready, restarts
2. `kubectl logs -n <operator-ns> deploy/<operator-name> --tail=100`
3. `kubectl get <cr> <name> -o yaml`: check `.status.conditions` for errors
4. `kubectl get events -n <ns> --sort-by='.lastTimestamp' --field-selector=reason!=Pulled`
5. `kubectl auth can-i --as=system:serviceaccount:<ns>:<sa> <verb> <resource>`
6. `kubectl get crd | grep <operator-domain>`
7. `kubectl get validatingwebhookconfigurations,mutatingwebhookconfigurations`
8. `.status` → fix → repeat

Pattern files:

- `patterns/strimzi/cluster-setup.md`: KRaft cluster provisioning, node pools
- `patterns/strimzi/listeners-and-storage.md`: Listener types, authentication, JBOD, volume expansion
- `patterns/strimzi/topics-and-users.md`: KafkaTopic, KafkaUser, ACLs
- `patterns/strimzi/kafka-connect.md`: Connectors, plugin builds, Connect ACLs
- `patterns/strimzi/mirror-maker.md`: MirrorMaker2, cross-cluster replication
- `patterns/keycloak/realm-configuration.md`: Realm CRs, export/import, settings
- `patterns/keycloak/identity-providers.md`: OIDC, SAML, social providers
- `patterns/keycloak/client-setup.md`: Clients, scopes, service accounts
- `patterns/keycloak/customization.md`: Themes, SPIs, custom providers
- `patterns/kube-prometheus-stack/service-monitors.md`: ServiceMonitor, PodMonitor
- `patterns/kube-prometheus-stack/alerting-rules.md`: PrometheusRule, routing
- `patterns/kube-prometheus-stack/grafana-dashboards.md`: Dashboard provisioning
- `patterns/kube-prometheus-stack/thanos-integration.md`: Long-term storage
- `patterns/cert-manager/issuers.md`: Issuer, ClusterIssuer, ACME, CA, Vault
- `patterns/cert-manager/certificates.md`: Certificate lifecycle, renewal
- `patterns/cert-manager/webhook-solvers.md`: HTTP01, DNS01 solvers
- `patterns/external-dns/provider-setup.md`: DNS provider configuration
- `patterns/external-dns/source-configuration.md`: Ingress/Service/Gateway sources
- `patterns/external-dns/ownership-and-policy.md`: TXT records, policy modes
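
The first passes of the generic troubleshooting checklist can be gathered into a single triage helper. This is only a sketch: `triage_operator` is a hypothetical function, and it covers the namespace-level checks (pods, logs, events, webhooks) while leaving the per-resource `.status.conditions` and RBAC checks to be run by hand.

```shell
# Sketch: collect the namespace-level troubleshooting output in one pass.
# triage_operator is a hypothetical helper; arguments are placeholders.
triage_operator() {
  local namespace="$1" deployment="$2"

  echo "== pods =="
  kubectl get pods -n "$namespace"

  echo "== recent operator logs =="
  kubectl logs -n "$namespace" "deploy/$deployment" --tail=100

  echo "== events (image pulls filtered out) =="
  kubectl get events -n "$namespace" \
    --sort-by='.lastTimestamp' --field-selector='reason!=Pulled'

  echo "== admission webhooks =="
  kubectl get validatingwebhookconfigurations,mutatingwebhookconfigurations
}

# Usage:
#   triage_operator strimzi-system strimzi-cluster-operator
```

The section headers make the combined output easier to scan or paste into an issue.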