From ship-it-ops
Apply DevOps and CI/CD review principles (CI pipelines, infrastructure-as-code, container images, secrets/config sourcing, observability, release management, schema migrations, health/readiness, SLO/performance, incident hygiene, and flow/batch signals) when writing or reviewing pipeline YAML, IaC, Dockerfiles, k8s manifests, deploy scripts, and the application code that integrates with them. Invoke explicitly for DevOps/CI/CD reviews, or as the delegation target from the ship-reviewed-prs IN (Senior Infra / SRE / DevOps) persona. Do not invoke for pure styling, application-only logic, one-off prototypes, or test-design depth (use ship-clean-code, ship-secure-code, or ship-tested-code respectively).
Slash command: /ship-it-ops:ship-devops
This skill applies DevOps and CI/CD principles to help you write and review the pipeline, infrastructure, container, and deploy-adjacent code that turns a working commit into a safe production change. It operates in **review mode** only; it does not auto-remediate. Sibling skills handle non-DevOps concerns: `ship-clean-code` (file quality), `ship-secure-code` (appsec), `ship-tested-code` (test design), `ship-debugged-code` (root cause), `ship-reviewed-prs` (PR-level orchestration).
The rubric draws on three canonical DevOps texts: The DevOps Handbook (Kim/Humble/Debois/Willis), The Phoenix Project (Kim/Behr/Spafford), and Effective DevOps (Davis/Daniels). Every finding ID traces to one of: the Three Ways (Flow, Feedback, Continual Learning), the Four Types of Work, or the CAMS pillars (Culture, Automation, Measurement, Sharing). Sources are cited per category in reference.md.
Start with these 3 rules and internalize them before learning the rest:
The detailed reference files (reference.md, reference-categories.md, the platform files) assume familiarity with these three and with the OWASP-of-ops surface: pipeline YAML, IaC, containers, k8s, observability stacks.
Triggers: /ship-devops <path|file>, "devops review", "ci review", "deploy review", "infra review", "pipeline review", or invocation from ship-reviewed-prs IN-persona delegation.

If asked to write deploy-adjacent code (e.g., a new GitHub Actions workflow, a Terraform module, a Dockerfile), the skill does not apply directly: write the code with ship-clean-code, then run this skill to review it. The write/review split is intentional; a single mode that does both tends to produce pipelines that look defended (lots of steps, lots of `if` guards) without actually being safe.
These 12 rules apply to ALL DevOps review:
Identify the pipeline before reading the changed file. What gates run? What artifacts are produced? Where does this PR's change land in the pipeline graph? Findings hang off this map. If the repo doesn't have a pipeline at all, that's the first finding.
Running the deploy/IaC twice must produce the same result. Hand-edited resources, non-idempotent shell scripts, and `if [ -f x ]; then mv` patterns all fail this rule. DEV3 owns the deeper rubric.
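As a rough illustration of the idempotency rule, the sketch below describes a desired end state and converges to it, so a second run is a no-op instead of a failure or a different result. The function name `ensure_moved` and the file names are hypothetical and not part of the skill.

```python
import tempfile
from pathlib import Path


def ensure_moved(src: Path, dst: Path) -> str:
    """Converge to 'dst exists, src does not'; safe to run repeatedly."""
    if dst.exists() and not src.exists():
        return "unchanged"  # desired state already reached
    if src.exists() and not dst.exists():
        dst.parent.mkdir(parents=True, exist_ok=True)
        src.replace(dst)  # rename; atomic on POSIX when on one filesystem
        return "moved"
    if src.exists() and dst.exists():
        # Ambiguous state: refuse rather than silently overwrite.
        raise RuntimeError(f"both {src} and {dst} exist")
    raise FileNotFoundError(f"neither {src} nor {dst} exists")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "app.conf.new"       # hypothetical file names
        dst = Path(tmp) / "live" / "app.conf"
        src.write_text("port=8080\n")
        print(ensure_moved(src, dst))  # first run performs the move
        print(ensure_moved(src, dst))  # second run changes nothing
```

The design choice is to report "unchanged" rather than raise when the work is already done, which is what lets a deploy script be re-run safely after a partial failure.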
Container images, lambda zips, and AMIs are built once and promoted across environments. Editing the running container, hot-patching prod, or rebuilding per-environment all fail this rule.
Every change ships with a rollback. Two-phase migrations, feature flags, blue-green slots, or canary cohorts — the skill flags changes that cannot be undone within minutes.
CI signals failure as early as possible (cheap stages before expensive ones, fail-fast matrix). Prod failure denies access / refuses traffic / falls back to safe-mode rather than serving partial state.
If the change touches a user-impacting path, the PR must add or reuse a log line, a metric, and (where the stack supports it) a trace span. Dashboards live in code. DEV6 owns the deeper rubric.
Action references, base images, package versions, Terraform providers — everything that changes silently can break silently. Pin to a digest (containers, actions) or a lockfile (npm/pip/cargo). Float only in dev.
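One way to picture the pinning rule for GitHub Actions is a small scanner that lists every `uses:` reference not pinned to a full 40-character commit SHA. This is only a sketch: the regular expressions, the function name `floating_action_refs`, and the placeholder SHA in the sample are assumptions for illustration, and the skill's actual detection logic is not shown in this document.

```python
import re

# Matches `uses: owner/repo@ref`, with or without a leading list dash.
USES_RE = re.compile(r"^\s*(?:-\s*)?uses:\s*['\"]?([^\s@'\"]+)@([^\s'\"#]+)")
FULL_SHA_RE = re.compile(r"^[0-9a-f]{40}$")


def floating_action_refs(workflow_text: str) -> list[tuple[int, str, str]]:
    """Return (line_number, action, ref) for each `uses:` not pinned to a digest."""
    findings = []
    for lineno, line in enumerate(workflow_text.splitlines(), start=1):
        match = USES_RE.match(line)
        if not match:
            continue  # not a `uses:` line, or a local action without `@ref`
        action, ref = match.groups()
        # A full commit SHA or an image digest counts as pinned; tags and branches do not.
        pinned = bool(FULL_SHA_RE.match(ref)) or ref.startswith("sha256:")
        if not pinned:
            findings.append((lineno, action, ref))
    return findings


if __name__ == "__main__":
    sample = """\
jobs:
  build:
    steps:
      - uses: actions/checkout@main
      - uses: actions/setup-node@v4
      - uses: actions/cache@0123456789abcdef0123456789abcdef01234567
"""
    for lineno, action, ref in floating_action_refs(sample):
        print(f"line {lineno}: {action}@{ref} is not pinned to a commit SHA")
```

Version tags such as `v4` are reported as well, because a tag can be moved to a different commit after the workflow was reviewed.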
Production secrets enter the process at runtime via env var, vault client, or platform-managed identity. SEC7 owns the leak surface (hardcoded literal in code/CI); DEV5 owns the sourcing discipline. See "Related Skills" for the boundary.
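A minimal sketch of the sourcing discipline, assuming the platform injects secrets as environment variables: the process reads the value at runtime and refuses to start when it is missing, rather than falling back to a silent default. `require_env`, `MissingConfigError`, and the variable names are hypothetical.

```python
import os


class MissingConfigError(RuntimeError):
    """Raised when a required setting is absent from the environment."""


def require_env(name: str) -> str:
    """Read a required setting at runtime; never fall back to a baked-in default."""
    value = os.environ.get(name)
    if not value:
        raise MissingConfigError(f"required environment variable {name} is not set")
    return value


if __name__ == "__main__":
    # Stand-in for the platform injecting the value; real code would not set it here.
    os.environ["DEMO_DB_PASSWORD"] = "injected-by-platform"
    os.environ.pop("DEMO_API_TOKEN", None)

    password = require_env("DEMO_DB_PASSWORD")
    print(f"loaded a {len(password)}-character secret")  # log the fact, not the value

    try:
        require_env("DEMO_API_TOKEN")
    except MissingConfigError as exc:
        print(f"startup aborted: {exc}")
```

Failing at startup makes a misconfigured environment visible during the deploy instead of at the first request that needs the secret.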
CI tokens scoped to the minimum repo/permission set, container USER is non-root, IAM roles are per-service, k8s service accounts mount only what they need. Cross-cuts with SEC1.4.
Big PRs and long-lived branches turn deploys into events. DEV12 flags batch-size signals: file count, line count, branch age, "WIP" commit messages. The goal is not to gate large PRs, but to surface the risk.
DEVn.1 findings (must-fix) block merge. DEVn.2 findings (should-fix) ship with mitigation plans. DEVn.3-5 are advisory. The skill computes tier from the finding ID and the surrounding context (touches prod path vs. dev-only); no LLM negotiation.
Include a Confidence section naming what was reviewed, what was not (binaries, generated manifests, vendored modules), and what's the residual risk. A confident "must fix" pairs with the specific pipeline/infra path that drove the finding.
| ID | Label | Covers | Tier-1 examples (must-fix) |
|---|---|---|---|
| DEV1 | CI-PIPELINE | Workflow YAML quality, build/test parallelism, caching, fast feedback (<10 min), pinned action versions, fail-fast vs. continue-on-error, matrix coverage | New workflow uses actions/checkout@main (floating tag); merge gate runs no tests; continue-on-error: true on the test step |
| DEV2 | DEPLOYMENT-SAFETY | Rollback path, blue-green/canary/feature-flag presence, big-bang detection, deploy idempotency, missing pre-deploy validation | New deploy script overwrites prod with no rollback; rollout strategy is Recreate on a stateful service; deploy job has no health-gate |
| DEV3 | IAC-IMMUTABILITY | Terraform/Pulumi/CloudFormation/Ansible hygiene, drift signals, environment parity, missing terraform plan gate, hand-edited resources | local-exec provisioner running aws cli with mutating verbs; no remote state; resources renamed in .tf without moved {} block |
| DEV4 | CONTAINER-IMAGE | Dockerfile hygiene: non-root USER, multi-stage build, pinned base, .dockerignore present, no secrets in layers, healthcheck, minimal surface | FROM ubuntu:latest; no USER; ARG SECRET=... in build; COPY . /app with no .dockerignore |
| DEV5 | CONFIG-MGMT | Env-var/vault sourcing, 12-factor compliance, per-env overrides, default-value handling, runtime mutability, no committed .env | New service reads secrets from a checked-in config.json; required env var has a silent fallback default; same config used for dev and prod |
| DEV6 | OBSERVABILITY | Structured logging, golden signals (latency/traffic/errors/saturation), correlation IDs, distributed tracing, dashboard-as-code, metrics on user paths | New endpoint logs nothing; metric only reports success counter, no error/latency; dashboard added in the UI, not the repo |
| DEV7 | RELEASE-MGMT | Versioning (semver), CHANGELOG, release-tag policy, conventional commits where adopted, breaking-change signaling, lockfile drift | Major-version bump with no CHANGELOG entry; breaking API change in a fix: commit; lockfile diff conflicts with package.json |
| DEV8 | SCHEMA-MIGRATION | Backward-compatible migrations (N and N-1 readers), reversible default, no DROP COLUMN without two-phase, online-DDL when needed, canary-safe ordering | ALTER TABLE adds NOT NULL with no default to a hot table; migration removes column still read by previous version; long-running lock |
| DEV9 | HEALTH-READINESS | `/healthz` and `/readyz` endpoints, liveness vs. readiness distinction, smoke test post-deploy, k8s startup-probe correctness, dependency-health propagation | New service ships with no health endpoint; liveness probe also checks dependencies (cascading failure); deploy job has no smoke step |
| DEV10 | SLO-PERFORMANCE | Perf test in pipeline, latency budget acknowledged, regression detection, resource limits/requests on workloads, timeouts/circuit breakers, DORA signals | k8s Deployment has no resources.limits; HTTP client has no timeout; new perf-sensitive endpoint has no load test |
| DEV11 | INCIDENT-HYGIENE | Runbook in repo, on-call doc freshness, post-mortem link on fix PRs, alert quality (actionable, non-noisy), CODEOWNERS coverage for prod paths | New prod service has no runbook; alert thresholds copy/pasted from an unrelated service; production-touching path missing from CODEOWNERS |
| DEV12 | FLOW-BATCH | PR size (lines/files/surface), trunk-based vs. long-lived branch, WIP signals (commits-in-flight), feature-flag wrapping for partial work, atomic-commit hygiene | PR touches 80 files across 5 services with no flag; branch age > 30 days behind main; "WIP" / "tmp" commit messages without squash |
Full per-category rubric — antipatterns, canonical fixes, false-positive notes, cross-references — lives in reference-categories.md.
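The DEV9 row in the table above separates liveness from readiness. As a hedged sketch of that distinction (the function names, status strings, and dependency checks are illustrative assumptions, not the skill's rubric), liveness answers without touching dependencies while readiness consults them:

```python
from typing import Callable, Dict, Tuple

DependencyCheck = Callable[[], bool]


def liveness() -> Tuple[int, str]:
    """Liveness: can this process respond at all? It never calls dependencies,
    so a database outage does not cause the orchestrator to restart healthy pods."""
    return 200, "alive"


def readiness(checks: Dict[str, DependencyCheck]) -> Tuple[int, str]:
    """Readiness: should traffic be routed here? It does consult dependencies."""
    failing = sorted(name for name, check in checks.items() if not check())
    if failing:
        return 503, "not ready: " + ", ".join(failing)
    return 200, "ready"


if __name__ == "__main__":
    # Hypothetical dependency probes; real ones would ping the actual services.
    checks = {"database": lambda: True, "cache": lambda: False}
    print(liveness())         # unaffected by the failing cache
    print(readiness(checks))  # stays 503 until the cache check passes
```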
Each finding ID has a tier sub-tag computed from the deployment context:
The full tier definitions per finding ID are in reference-categories.md.
| State | Decision |
|---|---|
| Any unsuppressed *.1 (must-fix) finding | REQUEST_CHANGES |
| Only *.2 findings | COMMENT |
| Only *.3-*.5 findings | COMMENT |
| Zero findings | APPROVE (or NO_FINDINGS when run standalone) |
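Read as code, the matrix above might look like the following sketch. It assumes finding IDs follow the `DEVn.t-LABEL` shape used elsewhere in this document and that a mix of tier-2 and lower-tier findings also yields COMMENT, which the matrix implies but does not state. The function names are hypothetical.

```python
import re

# DEVn.t-LABEL, e.g. DEV2.1-NO-ROLLBACK; the tier `t` is the digit after the dot.
FINDING_ID_RE = re.compile(r"^DEV(\d+)\.([1-5])-[A-Z0-9-]+$")


def tier_of(finding_id: str) -> int:
    """Extract the tier from a finding ID, rejecting malformed IDs."""
    match = FINDING_ID_RE.match(finding_id)
    if not match:
        raise ValueError(f"not a DEVn.t-LABEL finding ID: {finding_id!r}")
    return int(match.group(2))


def decide(unsuppressed_ids: list[str], standalone: bool = False) -> str:
    """Map the unsuppressed findings of one review to a decision."""
    tiers = [tier_of(finding_id) for finding_id in unsuppressed_ids]
    if not tiers:
        return "NO_FINDINGS" if standalone else "APPROVE"
    if 1 in tiers:
        return "REQUEST_CHANGES"  # any must-fix finding blocks
    return "COMMENT"              # tier 2 and advisory tiers never block


if __name__ == "__main__":
    print(decide(["DEV2.1-NO-ROLLBACK", "DEV1.2-FLOATING-ACTION"]))
    print(decide(["DEV11.4-MISSING-RUNBOOK"]))
    print(decide([], standalone=True))
```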
ship-devops does not have its own submission semantics — when run standalone, it produces a structured report. When run as the delegation target from ship-reviewed-prs IN persona, the parent skill maps the report to its own decision matrix (DEVn.1 → IN priority-1, DEVn.2 → IN priority-3, DEVn.3-5 → IN priority-5+) and renders findings with compound tags [INn / DEVm.t-LABEL] so the depth-target's category surfaces alongside the orchestrator's priority code.
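The tier-to-priority mapping and the compound tag format described above could be sketched as follows. The mapping of tiers 3-5 to priority 5 is an assumption that treats "priority-5+" as "5 or lower"; the parent skill may choose a lower priority. Both function names are hypothetical.

```python
def in_priority_for_tier(tier: int) -> int:
    """Map a DEV tier to the IN priority named in the text.

    Assumption: for tiers 3-5 this returns 5, the highest priority allowed
    by "priority-5+"; the orchestrator may assign a lower one.
    """
    if tier == 1:
        return 1
    if tier == 2:
        return 3
    if 3 <= tier <= 5:
        return 5
    raise ValueError(f"tier must be between 1 and 5, got {tier}")


def compound_tag(in_priority: int, finding_id: str) -> str:
    """Render the [INn / DEVm.t-LABEL] tag used in delegated reports."""
    return f"[IN{in_priority} / {finding_id}]"


if __name__ == "__main__":
    print(compound_tag(in_priority_for_tier(1), "DEV2.1-NO-ROLLBACK"))
    print(compound_tag(in_priority_for_tier(2), "DEV1.2-FLOATING-ACTION"))
```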
## DevOps Review: [scope]
### Confidence
<2-4 sentences: pipeline identified, what was reviewed, what was not
reviewed (binaries, generated manifests, vendored modules, autogenerated
lockfiles), residual risk.>
### Critical (must fix before merge)
- **[DEV2.1-NO-ROLLBACK] deploy/release.sh:14**: <deploy path: trigger → action → blast radius>. → <fix>.
- **[DEV4.1-IMAGE-ROOT-USER] services/api/Dockerfile:8**: <description>. → <fix>.
### Important (should fix)
- **[DEV1.2-FLOATING-ACTION] .github/workflows/ci.yml:22**: <action ref>. → <fix>.
### Advisory (hygiene)
- **[DEV11.4-MISSING-RUNBOOK] services/api/**: <description>. → <fix>.
### What's Good
- <substantive observation about a discipline done well — not boilerplate>
Rules for the output:
- …DEVn.t-LABEL): tier is part of the ID, not a separate field.
- …`scripts/`, `tools/`, `dev/`, `local/`, or marked "dev only" in the file header get advisory-tier findings only; no blocking findings on convenience scripts that never touch prod.
- …`tests/fixtures/` or an example Terraform module under `examples/` with intentional smells is not flagged.
- …`ci-github-actions.md` don't fire.
- …`internal-only: true` in its package metadata does not fire DEV11 runbook/CODEOWNERS findings.
- …`(pre-existing)` marker; do not block.

If the code under review already contains an old DevOps antipattern not introduced by this PR/diff:

- …`(pre-existing)` marker.

Newly introduced antipatterns (this PR adds them) are full-tier per the matrix.
Before applying DevOps rules, check for override files in this order:
1. `overrides.md` next to this SKILL.md (team-wide overrides bundled with the skill)
2. `.claude/ship-devops-overrides.md` in the user's project root (project-specific overrides)

Use overrides for:

- …`docs/runbooks/` directory).

A template is at overrides.example.md.
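The lookup order for override files can be sketched as below. The two paths come from the text above; the function name and the directory arguments are hypothetical. The sketch only reports which files exist and in what order, and does not assume how their contents are merged or which one takes precedence.

```python
import tempfile
from pathlib import Path


def find_override_files(skill_dir: Path, project_root: Path) -> list[Path]:
    """Return the override files that exist, in the documented lookup order."""
    candidates = [
        skill_dir / "overrides.md",                             # team-wide
        project_root / ".claude" / "ship-devops-overrides.md",  # project-specific
    ]
    return [path for path in candidates if path.is_file()]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        skill_dir = Path(tmp) / "skill"
        project_root = Path(tmp) / "project"
        skill_dir.mkdir()
        (project_root / ".claude").mkdir(parents=True)
        (project_root / ".claude" / "ship-devops-overrides.md").write_text("# overrides\n")
        for path in find_override_files(skill_dir, project_root):
            print(path.relative_to(tmp))
```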
Phased rollout recommended:
Track: tier-1 findings per PR (should trend toward zero); false-positive rate per category (if any category fires noisily, demote it via overrides).
- `ship-reviewed-prs`: PR-level orchestrator. Its IN persona (Senior Infra / SRE / DevOps) delegates depth here, exactly as SC delegates to ship-secure-code. The orchestrator emits direct IN1–IN7 findings for high-precision single-line hits and "Run /ship-devops on <file>" delegation bullets for multi-file pipeline review. Compound finding tags [IN1 / DEV2.1-NO-ROLLBACK] surface this skill's category alongside the orchestrator's priority code. See ship-reviewed-prs/reference-personas.md § IN → Delegation to ship-devops for the full direct-emit-vs-delegate rubric.
- `ship-secure-code`: SEC7 owns hardcoded-secret-literal-in-code (the data leak). DEV5 owns sourcing discipline (vault client, 12-factor, default-on-missing). On the same line both could fire; ship-secure-code wins for the user-facing finding, and ship-devops adds a cross-reference. SEC1.4 (over-privileged service account) cross-cuts DEV4 (container USER) and DEV3 (IaC IAM); the same tier-1 finding fires only once via the delegation parent. See reference.md § Anti-overlap for the full boundary.
- `ship-clean-code`: File-level code quality. DEV reviews operability, not style. A poorly named Terraform variable is ship-clean-code; a Terraform module that mutates state without terraform plan is DEV3.
- `ship-tested-code`: Test design. DEV1 reviews whether tests run in CI, gate merge, and fail fast, not whether they're well-designed. The two are non-overlapping by intent.
- `ship-debugged-code`: Use after an incident to design the regression test, then run this skill to review the pipeline change that lands the fix.

For deeper analysis, load supporting reference files alongside this SKILL.md:
- `reference.md`: Methodology, sources (Three Ways, Four Types, CAMS), cross-cutting principles, anti-overlap with sibling skills, output schema for delegation.
- `reference-categories.md`: DEV1-DEV12 deep rubric: antipatterns, canonical fixes, false-positive notes, cross-references.
- `ci-github-actions.md`: GitHub Actions specific patterns (workflow YAML, action pinning, secret usage, gating).
- `iac-terraform.md`: Terraform-specific patterns (state, plan-gate, modules, drift, moved {}).
- `container-docker.md`: Dockerfile + compose patterns (USER, multi-stage, digest pinning, healthcheck, .dockerignore).
- `k8s.md`: Kubernetes manifest patterns (probes, resources, securityContext, PDB, rollout strategy, HPA).
- `observability.md`: Logging/metrics/tracing patterns (structured logs, golden signals, correlation, dashboards as code).
- `overrides.example.md`: Template for team overrides.
- `examples/review-example.md`: End-to-end review on a sample diff.
- `examples/fix-example.md`: One finding walked from identification through fix and verification.
- `tests/`: Self-test fixtures (sample input + expected report).

Paths are relative to this SKILL.md. Load on-demand when doing thorough reviews or when the user asks for detailed guidance on a specific topic.