From Mind-Vault
Deploy Docker Compose web applications across any project — branch strategy, change-aware scripts, database backup + rollback safety, screen-session remote execution, Let's Encrypt SSL, health checks, and CI/CD wiring.
Slash command: /mv:deployment
Production deployment pattern for containerised web applications using Docker Compose. Prioritises **safety** (automatic backups before destructive steps, rollback-ready state, mandatory screen sessions for remote runs) over raw speed. **This is not a zero-downtime pattern** — it's a stop/rebuild/start cycle with health verification. Brief outages (5–30 s) are expected on updates.
Files in this skill:
- README.md
- VERSION
- references/CICD.md
- references/CONTAINER_DNS_NSS.md
- references/DJANGO_DEPLOYMENT.md
- references/HARDENING.md
- references/MONITORING.md
- references/ROOTLESS_DOCKER.md
- references/SCREEN_SESSIONS.md
- references/SHELL_INSTALLERS.md
- scripts/README.md
- scripts/backup_db.sh
- scripts/deploy.sh
- scripts/deploy_first_time.sh
- scripts/deploy_update.sh
- scripts/harden_server.sh
- scripts/restore_db.sh
- scripts/setup_server.sh
- scripts/verify_deployment.sh
This skill covers:
- .env handling

TRIGGER when: preparing or executing a production deploy for a Docker Compose application; setting up the initial deploy.sh + backup_db.sh + verify_deployment.sh toolchain; wiring a CI/CD pipeline that invokes the deploy scripts; handling rollback after a failed deploy.
SKIP for: local dev-container startup (docker compose up), single-container toy apps, Kubernetes-based deploys (different contract — use Helm/ArgoCD), PaaS targets (Heroku/Fly/Railway — they own their own deploy verb and these scripts will conflict).
Separate production from development:
- Use a dedicated deployment (or production) branch for releases.
- Merge main into deployment explicitly; no auto-fast-forward.
- Protect deployment: require review, disallow force-push.

git checkout deployment
git merge <stable-sha>
git push origin deployment
Typical Docker Compose stack:
| Service | Role |
|---|---|
| web | Application container (Django, Rails, Node, …) |
| nginx | Reverse proxy, SSL termination, static file serving |
| db | PostgreSQL / MySQL |
| cache | Redis / Memcached |
| worker | Background jobs (Celery / Sidekiq / BullMQ) |
| certbot | Let's Encrypt renewals |
| storage (opt.) | MinIO / S3-compatible object store |
Principles:
Canonical toolchain lives in scripts/ (or tools/) at the repo root:
- deploy.sh: auto-detecting wrapper (first-time vs update)
- deploy_first_time.sh: initial deploy + data seed
- deploy_update.sh: change-aware update
- backup_db.sh: database snapshot
- restore_db.sh: counterpart restore
- verify_deployment.sh: post-deploy health checks

Reference implementations live in this skill's scripts/ directory; copy them into your project and customise.
Auto-detecting wrapper:
#!/bin/bash
# scripts/deploy.sh
SERVICES_RUNNING=$(docker compose ps | grep -qE "Up|running" && echo true || echo false)
if [ "$SERVICES_RUNNING" = "true" ]; then
exec ./scripts/deploy_update.sh "$@"
else
exec ./scripts/deploy_first_time.sh "$@"
fi
Project-root detection (survives symlinks and arbitrary cwd):
if [ -n "$PROJECT_ROOT" ]; then
cd "$PROJECT_ROOT"
elif git rev-parse --git-dir > /dev/null 2>&1; then
cd "$(git rev-parse --show-toplevel)"
else
echo "Warning: using current directory as project root"
fi
Deploy scripts must support non-interactive invocation for CI and screen sessions:
DEPLOY_NON_INTERACTIVE=1 ./tools/deploy.sh
# or
./tools/deploy.sh --yes
Why: screen allocates a TTY, so [ -t 0 ] returns true inside the session. A prompt in the deploy script then blocks forever, because nobody is attached to the detached session to answer it. Explicit non-interactive mode forces safe defaults regardless of TTY presence. CI runners have the same problem.
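One way to honour both switches is a small prompt helper that falls back to a caller-supplied default when the script runs non-interactively. This is a minimal sketch, not the reference implementation: the ask function, its yes/no default argument, and the NON_INTERACTIVE variable are hypothetical names introduced here for illustration.

```shell
#!/bin/sh
# Hypothetical helper: NON_INTERACTIVE and ask() are illustrative names.

# Either DEPLOY_NON_INTERACTIVE=1 or a --yes argument disables prompting.
NON_INTERACTIVE="${DEPLOY_NON_INTERACTIVE:-0}"
for arg in "$@"; do
    if [ "$arg" = "--yes" ]; then
        NON_INTERACTIVE=1
    fi
done

# ask PROMPT DEFAULT
#   DEFAULT is "yes" or "no". In non-interactive mode the default is used
#   without reading stdin, so a detached screen session can never block here.
ask() {
    if [ "$NON_INTERACTIVE" = "1" ]; then
        echo "$1 -> $2 (non-interactive default)"
        [ "$2" = "yes" ]
        return
    fi
    printf '%s [y/n] ' "$1"
    read -r answer || answer=""
    case "$answer" in
        y|Y|yes) return 0 ;;
        n|N|no)  return 1 ;;
        *)       [ "$2" = "yes" ] ;;   # empty or unknown input: use the default
    esac
}
```

A deploy script could then write `ask "Back up the database first?" yes && ./scripts/backup_db.sh`, choosing for each prompt the default that is safe when nobody is watching.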
The update script decides what work is needed by diffing current and previous deploy:
PREVIOUS=$(git rev-parse HEAD@{1} 2>/dev/null || echo "")
# Default every flag to "false" so later tests never see an unset variable.
HAS_MIGRATIONS=false
HAS_DEPENDENCIES=false
HAS_STATIC=false
if [ -n "$PREVIOUS" ]; then
    CHANGED=$(git diff "$PREVIOUS" HEAD --name-only)
    HAS_MIGRATIONS=$(echo "$CHANGED" | grep -q "migrations/" && echo true || echo false)
    HAS_DEPENDENCIES=$(echo "$CHANGED" | grep -qE "requirements.*\.txt|Dockerfile|package(-lock)?\.json|Gemfile(\.lock)?" && echo true || echo false)
    # Anchored with "$" so package.json is not mistaken for a .js asset.
    HAS_STATIC=$(echo "$CHANGED" | grep -qE '\.(css|js|scss|png|jpg|svg)$' && echo true || echo false)
fi
# Dependency rebuild may touch static assets the grep missed: force a static rebuild.
[ "$HAS_DEPENDENCIES" = "true" ] && HAS_STATIC=true
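The classification step is easier to test when it is separated from git. The sketch below assumes a hypothetical classify_changes function that reads changed paths on stdin and prints the three flags. The patterns follow the script above, with the static-asset pattern anchored to the end of the path.

```shell
#!/bin/sh
# classify_changes: hypothetical helper. Reads changed paths on stdin
# (one per line) and prints the three flags on a single line.
classify_changes() (
    changed=$(cat)

    # has REGEX: prints "true" if any changed path matches, else "false".
    has() {
        printf '%s\n' "$changed" | grep -qE "$1" && echo true || echo false
    }

    migrations=$(has 'migrations/')
    dependencies=$(has 'requirements.*\.txt|Dockerfile|package(-lock)?\.json|Gemfile(\.lock)?')
    static=$(has '\.(css|js|scss|png|jpg|svg)$')

    # A dependency rebuild forces a static rebuild as well.
    if [ "$dependencies" = "true" ]; then
        static=true
    fi

    echo "HAS_MIGRATIONS=$migrations HAS_DEPENDENCIES=$dependencies HAS_STATIC=$static"
)
```

In a deploy script the input would come from `git diff "$PREVIOUS" HEAD --name-only | classify_changes`; feeding it a hand-written path list makes the rules checkable without a repository.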
File-diff change detection is necessary but not sufficient when an expensive post-deploy operation (full corpus reindex, CDN purge, search-engine wipe-and-rebuild, vector-index rebuild, ML-model warm-up) depends on a subset of the files that flip the diff. The first layer answers "did anything in this area change?"; you need a second layer to answer "did the thing the expensive op actually depends on change?"
Pattern:
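- Compute a fingerprint (the "shape") of the inputs the op actually depends on (with .env already in scope).
- Persist it to a marker file (.deploy_<topic>_shape) after the op succeeds.
- Provide a FORCE_<TOPIC>=1 env override for the operator-driven cases the gate can't predict (env-only fixes, manual recovery).

SHAPE_FILE="$PROJECT_ROOT/.deploy_embedding_shape"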
compute_embedding_shape() {
# Fingerprint the inputs the expensive op depends on — provider, model,
# dimension, API base, index version. Empty values included verbatim so
# unset → set transitions register as a change.
printf '%s|%s|%s|%s|%s|%s\n' \
"${EMBEDDING_PROVIDER:-}" "${EMBEDDING_DIMENSION:-}" \
"${EMBEDDING_MODEL_NAME:-}" "${GEMINI_EMBEDDING_MODEL:-}" \
"${EMBEDDING_API_BASE_URL:-}" "${INDEX_VERSION:-}"
}
CURRENT=$(compute_embedding_shape)
PREVIOUS=$([ -f "$SHAPE_FILE" ] && cat "$SHAPE_FILE" || echo "")
if [ "$FORCE_REINDEX" = "1" ] || [ -z "$PREVIOUS" ] || [ "$CURRENT" != "$PREVIOUS" ]; then
# Gate the marker write on the op's exit status so a failed reindex
# leaves the PREVIOUS marker intact — next deploy retries. Without
# explicit gating (or `set -e`), `echo > $SHAPE_FILE` on the next
# line would run unconditionally and silently mask the failure.
if run_expensive_reindex; then
echo "$CURRENT" > "$SHAPE_FILE"
else
echo "❌ reindex failed — marker NOT written; next deploy will retry." >&2
exit 1
fi
else
echo "🔍 Shape unchanged — skipping reindex."
fi
Anti-pattern this teaches against: gating an expensive post-deploy op on raw dependency-file diff (grep -qE "<subsystem>/" diff). The subsystem changed (new endpoint, refactor, test added) but the thing the op depends on didn't. Running anyway burns whatever the op costs — for a vector-index rebuild against a pay-per-token embedding API, that can be a rate-limit cliff and a real bill. The gate moves the decision from "did any file in the subsystem change?" to "did the op's actual contract change?", which is a much smaller surface.
Failure-mode discipline: write the marker only after the op succeeds. The if run_expensive_reindex; then ... fi shape above does this explicitly — without it (run_expensive_reindex; echo "$CURRENT" > "$SHAPE_FILE"), a failed reindex would still write the marker, and the next deploy would skip the retry, silently masking the failure. If the op is non-atomic (multi-stage), wrap only the final stage in the if so partial-success doesn't poison the marker.
Database backup is mandatory before any schema change:
if [ "$HAS_MIGRATIONS" = "true" ]; then
./scripts/backup_db.sh # pre-migration snapshot
fi
docker compose exec -T web python manage.py migrate
./scripts/backup_db.sh # post-migration snapshot (clean rollback point)
Backup naming convention:
data/db_backup_YYYYMMDD_HHMMSS_<commit-sha>.sql.tar.gz
The commit sha in the filename lets you pair any backup with the exact code that produced the schema — critical for restore-plus-revert.
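A minimal sketch of building and parsing names in this convention follows. The function names backup_name and backup_sha are hypothetical, and the choice of a UTC timestamp and a short sha in the usage comment is an assumption, not something the convention prescribes.

```shell
#!/bin/sh
# backup_name TIMESTAMP SHA: prints a path that follows the naming convention.
backup_name() {
    echo "data/db_backup_${1}_${2}.sql.tar.gz"
}

# backup_sha PATH: extracts the commit sha back out of a backup filename.
backup_sha() {
    name=${1##*/}              # strip the directory part
    name=${name%.sql.tar.gz}   # strip the extension
    echo "${name##*_}"         # the sha is the last underscore-separated field
}

# Typical call inside a backup script (assumed inputs, not run here):
#   backup_name "$(date -u +%Y%m%d_%H%M%S)" "$(git rev-parse --short HEAD)"
```

Keeping the sha as the final field is what lets a restore script recover it with plain parameter expansion.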
Remote deploys must run inside a screen session so they survive SSH disconnects, network blips, and terminal closures.
Canonical form:
ssh <user>@<host> << 'EOF'
cd /opt/myapp
SESSION="myapp-deploy-$(date -u +%Y%m%d-%H%M%S)"
LOG="deploy-$(date -u +%Y%m%d-%H%M%S).log"
screen -dmS "$SESSION" bash -c \
"DEPLOY_NON_INTERACTIVE=1 ./tools/deploy.sh 2>&1 | tee $LOG"
echo "Session: $SESSION | Log: $LOG"
sleep 3 && tail -n 50 "$LOG"
EOF
Full recipe book — naming conventions, monitoring, attaching/detaching, cleanup, long-rebuild handling, troubleshooting — is in references/SCREEN_SESSIONS.md.
Host-key verification: never use -o StrictHostKeyChecking=no in automated paths. A MITM can then silently substitute for the deploy target and steal the deploy key. Populate known_hosts in advance (ssh-keyscan -H host >> ~/.ssh/known_hosts), or pin the expected fingerprint as a secret and verify it before connecting.
Record each production deploy in the repo for later forensics:
- Path: docs/deployment/sessions/{host}_yyyymmdd-hhmm.md (e.g. myapp_com_20260417-1415.md).
- Create the file with status: PENDING before the run and flip it to ✅ / ❌ at the end. Do not fork the plan into a separate ROLLOUT_PLAN_*.md: the session file is the single artefact, and the PENDING status handles the pre-rollout phase. A plan-doc-plus-session-file split produces two-narrative drift on every release; the single file stays in sync by construction.
- Keep a template at docs/deployment/sessions/_template.md with PENDING / SUCCESSFUL / FAILED status slots; copy it per run.

This is the paper trail humans need when something goes wrong three months later and they're reconstructing the release timeline.
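The session-record lifecycle can be sketched in a few lines of shell. The function names and the two-line file body below are hypothetical; a real project would copy its own _template.md instead of writing the body inline. The only assumption about the record format is that the status sits on a line of its own beginning with "status: ".

```shell
#!/bin/sh
# session_file HOST TIMESTAMP: prints the record path; dots in HOST become
# underscores, matching the myapp_com_20260417-1415.md example.
session_file() {
    host=$(echo "$1" | tr '.' '_')
    echo "docs/deployment/sessions/${host}_${2}.md"
}

# start_session ROOT HOST TIMESTAMP: creates the record with status PENDING
# and prints its path. The file body here is a placeholder, not the template.
start_session() (
    path="$1/$(session_file "$2" "$3")"
    mkdir -p "$(dirname "$path")"
    printf 'status: PENDING\nhost: %s\n' "$2" > "$path"
    echo "$path"
)

# finish_session PATH STATUS: rewrites the status line in place
# (STATUS is SUCCESSFUL or FAILED in this sketch).
finish_session() {
    sed "s/^status: .*/status: $2/" "$1" > "$1.tmp" && mv "$1.tmp" "$1"
}
```

A deploy wrapper would call start_session before the run and finish_session with the outcome at the end, so the same file carries both the plan and the result.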
All configuration comes from environment variables loaded from .env:
set -a
source .env
set +a
When .env changes, force-recreate containers that consume it:
docker compose up -d --force-recreate web worker
.env is never committed. Provide .env.example with all keys and safe defaults. Rotate secrets on staff changes.
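Because .env.example is meant to list every key, a pre-deploy check can compare it against the real .env. The sketch below assumes simple KEY=value lines (no export prefix, no multi-line values); missing_env_keys is a hypothetical helper name.

```shell
#!/bin/sh
# missing_env_keys EXAMPLE_FILE ENV_FILE
#   Prints every key defined in EXAMPLE_FILE that has no KEY= line in ENV_FILE.
#   Comment lines and blank lines in EXAMPLE_FILE are ignored.
missing_env_keys() {
    sed -n 's/^\([A-Za-z_][A-Za-z0-9_]*\)=.*/\1/p' "$1" |
    while read -r key; do
        grep -q "^${key}=" "$2" || echo "$key"
    done
}
```

A deploy script could abort when the output is non-empty, for example `[ -z "$(missing_env_keys .env.example .env)" ] || exit 1`, so a newly introduced setting cannot reach production unset.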
docker-compose.yml:
certbot:
  image: certbot/certbot
  # ${VAR} is interpolated by Compose from .env; $$VAR would reach certbot as a literal "$VAR".
  command: >
    certonly --webroot --webroot-path=/var/www/certbot
    --email ${CERTBOT_EMAIL} --agree-tos --no-eff-email -d ${DOMAIN}
  volumes:
    - certbot-etc:/etc/letsencrypt
    - certbot-var:/var/lib/letsencrypt
    - ./nginx/www:/var/www/certbot
nginx vhost:
server {
listen 443 ssl http2;
server_name ${DOMAIN};
ssl_certificate /etc/letsencrypt/live/${DOMAIN}/fullchain.pem;
ssl_certificate_key /etc/letsencrypt/live/${DOMAIN}/privkey.pem;
# proxy configuration…
}
Renewal runs on a cron/timer or via docker compose run --rm certbot renew — do not couple renewal to the main deploy flow.
After every deploy, verify. HEALTH_URL below is a project-supplied variable that points at a cheap JSON health endpoint when one exists (/api/health, /healthz, etc.) and falls back to / (landing page 200) when the project doesn't expose a dedicated probe. Never hard-code /api/health into a project's verify script without confirming the endpoint actually exists — a 404 from the liveness probe masks real signal.
docker compose ps # containers healthy
curl -fsI https://${DOMAIN} # external HTTPS reachable
curl -fsS https://${DOMAIN}${HEALTH_URL:-/} # project health endpoint (JSON if available; else landing page)
docker compose exec -T web python manage.py check # framework-level sanity
Wrap these in verify_deployment.sh with a retry loop — services routinely take 10–30 s after container start to accept traffic, and failing on the first curl produces false-alarm rollbacks.
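A retry wrapper of the kind described might look like the following. It is a sketch under stated assumptions: the retry function name and its argument order are hypothetical, and the attempt count and delay in the usage note are illustrative values chosen to cover the 10–30 s warm-up window.

```shell
#!/bin/sh
# retry ATTEMPTS DELAY_SECONDS COMMAND...
#   Runs COMMAND until it succeeds or ATTEMPTS runs have failed.
#   Returns 0 on the first success, 1 if every attempt failed.
retry() {
    attempts=$1
    delay=$2
    shift 2
    n=1
    while [ "$n" -le "$attempts" ]; do
        if "$@"; then
            return 0
        fi
        echo "attempt $n/$attempts failed: $*" >&2
        # No point sleeping after the final attempt.
        if [ "$n" -lt "$attempts" ]; then
            sleep "$delay"
        fi
        n=$((n + 1))
    done
    return 1
}
```

In verify_deployment.sh this could wrap each probe, for example `retry 10 5 curl -fsS "https://${DOMAIN}${HEALTH_URL:-/}"`, which allows roughly 45 seconds of waiting before the check is reported as failed.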
⚠️ Human-operated only. Rollback involves destructive git operations (reset --hard, force-push) and potentially destructive database operations (restoring a backup over live data). Per RULE_git-safety, AI agents must not execute these steps. Agents prepare the commands, show them, and wait for the human to run them.
Code rollback (requires temporarily relaxed branch protection on deployment):
git checkout deployment
git reset --hard <last-good-sha>
git push origin deployment --force-with-lease # human-initiated only
--force-with-lease refuses to overwrite remote changes the operator hasn't seen, which is the minimum safety net on a destructive push.
Redeploy from rolled-back state:
git pull origin deployment
./scripts/deploy.sh
Database rollback:
./scripts/restore_db.sh data/db_backup_YYYYMMDD_HHMMSS_<sha>.sql.tar.gz
docker compose restart web worker
Paired rollback (code and data together): restore the DB backup whose <sha> suffix matches the code you're reverting to. Mismatched pairs cause schema drift and the app will fail in surprising ways.
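Picking the matching backup by hand is error-prone, so the lookup can be scripted. The sketch below assumes the backup naming convention described earlier and that the sha is passed in the same form (short or full) that the backup script used. The name find_backup_for_sha is hypothetical. It only prints a path; running the restore stays with the human operator.

```shell
#!/bin/sh
# find_backup_for_sha DIR SHA
#   Prints the newest backup in DIR whose filename carries SHA, or returns 1
#   if there is none. "Newest" relies on the timestamp in the filename:
#   globs expand in sorted order, so the last match has the latest timestamp.
find_backup_for_sha() (
    newest=""
    for f in "$1"/db_backup_*_"$2".sql.tar.gz; do
        [ -e "$f" ] || continue   # the glob matched nothing
        newest=$f
    done
    [ -n "$newest" ] && echo "$newest"
)
```

An operator could run `find_backup_for_sha data <last-good-sha>` and pass the printed path to restore_db.sh after checking it.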
.PHONY: deploy deploy-first deploy-update backup-db restore-db
.PHONY: start stop restart logs health-check

deploy: ## Auto-detect and run appropriate deploy
	./tools/deploy.sh

deploy-first: ## First-time deploy + seeding
	./tools/deploy_first_time.sh

deploy-update: ## Update existing deploy (change-aware)
	./tools/deploy_update.sh

backup-db: ## Snapshot database
	./tools/backup_db.sh

restore-db: ## Restore from backup: make restore-db FILE=...
	@test -n "$(FILE)" || { echo "Usage: make restore-db FILE=backup.tar.gz"; exit 1; }
	./tools/restore_db.sh "$(FILE)"

start: ## Start all services
	docker compose up -d

stop: ## Stop all services
	docker compose down

restart: ## Restart all services
	docker compose restart

logs: ## Tail logs
	docker compose logs -f

health-check: ## Run verification suite
	./tools/verify_deployment.sh
For automated pipelines (GitHub Actions, GitLab CI), see references/CICD.md. Short rules:
- The pipeline calls deploy.sh; it does not reimplement the logic.
- … main push.
- … migrations/ diff.
- Host keys are pinned in known_hosts, not bypassed with StrictHostKeyChecking=no.
- … docker compose pull && up -d suffices.

- docker compose ps db: is the container up and healthy?
- docker compose exec web nc -zv db 5432
- docker compose logs db
- docker compose exec web python manage.py showmigrations.

- docker compose logs certbot: renewal output.
- dig +short $DOMAIN.
- certbot certificates shows expiry; Let's Encrypt enforces renewal limits.

- ls -la /opt/myapp.
- chmod 600 ~/.ssh/id_rsa.
- … root? Apply the ownership-bypass pattern from the django skill (chown from inside the container using the host UID/GID).

- docker compose logs <service>.
- docker compose exec web curl -f localhost:8000${HEALTH_URL:-/}.
- docker stats and dmesg | tail on the host.

.env changes don't take effect

Containers cache env vars at start. After editing .env:
docker compose up -d --force-recreate <service>
Plain restart does not reload the environment.
Applies to any Docker Compose web application because the pattern is rooted in:
- … git diff (language-agnostic)

The deploy shape is identical for Django, Rails, Express, FastAPI, Phoenix; only the service names in docker-compose.yml and the test/migration commands differ.
Framework-specific notes below are illustrative, not prescriptive.
Django — web (Daphne ASGI) + db (Postgres) + redis + celery + certbot. Migration safety and static collection are the high-risk steps; both are covered by change detection. See references/DJANGO_DEPLOYMENT.md for specifics.
Rails — web (Puma) + db (Postgres) + redis + sidekiq + certbot. rails db:migrate replaces manage.py migrate; the asset pipeline runs on dependency change.
Node.js / Next.js — web (node process manager) + db + certbot. Build step (npm run build) runs on dependency change; static output served via nginx.
- scripts/: reference implementations (deploy.sh, backup_db.sh, verify_deployment.sh, setup_server.sh, harden_server.sh)
- scripts/README.md: per-script usage and customisation
- references/CONTAINER_DNS_NSS.md: … getaddrinfo shadowing public DNS inside containers; anchor case: sync_domains silent drop on fresh Debian VPS when hostname matches domain
- references/SHELL_INSTALLERS.md: … install/install-*.sh; installer-specific catalog (chown, marker blocks, opt-out sweep, target-user resolution) distilled from review-loop cycles across PRs #55/#58/#59, with the language-general entries stubbed down into the shell layer
- references/ROOTLESS_DOCKER.md: … screen … bash -c, cron, systemd) miss the profile DOCKER_HOST and hit the dead rootful socket → idempotent deploy misreads "first-time". Fix with docker context use rootless (shell-independent) + a DOCKER_HOST auto-detect helper
npx claudepluginhub infohata/mind-vault --plugin mv