Use when creating or modifying a Dockerized service - guides the deps+app two-stage Dockerfile pattern, docker-compose structure, entrypoint conventions, and env-driven configuration
Slash command: `/research-skills:docker-split-service`
Build Docker services using a **two-stage split**: a heavy `Dockerfile.deps` for dependencies and a thin `Dockerfile.app` for application code. App-only changes rebuild in seconds, not minutes.
Every Dockerized service MUST follow the conventions below — no deviations on naming, layout, or version handling. Consistency across services is the whole point. If an existing service in the codebase looks different, fix it to match before adding more on top.
Framework note: examples below use PyTorch as the concrete framework, since most services in this codebase are PyTorch-based. The pattern applies identically to other frameworks (TensorFlow, JAX, …): substitute the relevant `<vendor>/<image>` and version axes.

Placeholder note: `<project>` = the service/repo identifier (e.g. the folder name), `<repo>` = the image registry namespace, `<host-port>` = the service-specific host port chosen per R9. All snippets use `/<project>` for the in-container app home; substitute consistently and do not leave literal `<project>` tokens in committed files.
Do NOT pick PYTORCH_VER / CUDA_VER / CUDNN_VER (or their equivalents) from your head. Guessing silently produces images that won't run on the user's hardware, and the failure only shows up after a long build.
Ask the user for versions up front. Single question:

"Which PyTorch / CUDA / cuDNN versions should this service target? If you don't have specific pins, tell me the target GPU and the host CUDA driver version (`nvidia-smi`, top-right `CUDA Version:`) and I'll pick a compatible stack."
If the user gives versions, verify the triple before writing a file:
- The `pytorch/pytorch:${PYTORCH_VER}-cuda${CUDA_VER}-cudnn${CUDNN_VER}-devel` tag actually exists on Docker Hub (check, don't assume).
- `TORCH_CUDA_ARCH_LIST` (with `+PTX` fallback) is set for the target GPU when building custom CUDA extensions.

If the user gave only a GPU, derive the stack from it, then confirm with the user before writing files. Consult the current PyTorch and NVIDIA compatibility docs; do not rely on internalized tables. Special-hardware images (Jetson / L4T, ROCm, etc.) require the R8 fallback path.
If any check fails, stop and propose the nearest working triple. Do not silently downgrade or swap versions — surface the conflict and let the user confirm.
Record the decision in docker/serve/README.md under a "Version pins" heading (target GPU, host driver constraint, chosen triple) so the next agent doesn't re-derive it.
Only after these steps succeed do you start writing Dockerfiles per the rules below.
R1. Directory layout:

```text
<repo-root>/
  docker/
    serve/
      Dockerfile.deps          # Heavy deps layer
      Dockerfile.app           # Thin app layer
      docker-compose.yaml      # Local compose (context resolves to repo root)
      entrypoint.sh
      app.py                   # Application code (e.g. FastAPI)
      README.md
  docker-compose.server.yml    # Root-level compose, REQUIRED
  .dockerignore                # REQUIRED at repo root
  .env.example                 # REQUIRED, documents every env var the compose reads
```
- `docker-compose.server.yml` is required: it is the primary entry point. Do NOT skip it and put compose only in `docker/serve/`.
- `docker/serve/docker-compose.yaml` is also required, for symmetry and convenience when working inside that directory. Both files build identical images and use a `context` that resolves to the repo root.
- Do not use other file names: `docker-compose.yml` at root, `serve.yml`, `compose.server.yaml`, etc.

Monorepo layout. If the repo contains multiple independent services (each with its own framework/deps), treat each subproject as its own `<repo-root>`: each subproject directory gets its own `docker/serve/`, its own root-level `docker-compose.server.yml` (at the subproject root, not the monorepo root), its own `.dockerignore`, and its own `.env.example`. Do not hoist a single compose file to the monorepo root to "cover" them all; do not bury the only compose file inside `docker/serve/` of a subproject.
Required companion files (content contract):
.dockerignore (repo root) — minimum content:
```text
.git
.gitignore
.venv
__pycache__
*.pyc
.pytest_cache
.mypy_cache
.ruff_cache
.ipynb_checkpoints
node_modules
.env
.env.*
!.env.example
# DO NOT ignore checkpoints/configs/data if Dockerfile.deps COPYs them
```
If Dockerfile.deps copies ./checkpoints or ./data, those paths must NOT be ignored — verify after writing.
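The "verify after writing" step can be partly automated. The following is a minimal sketch, not Docker's actual matcher: it assumes simple shell-form `COPY <src>... <dest>` lines, uses Python's `fnmatchcase` as an approximation of `.dockerignore` pattern rules, and applies "last matching rule wins" with `!` exceptions. The helper names `copy_sources` and `is_ignored` are hypothetical.

```python
from fnmatch import fnmatchcase


def copy_sources(dockerfile_text: str) -> list[str]:
    """Collect source paths from shell-form COPY lines (JSON-form COPY is not handled)."""
    sources = []
    for line in dockerfile_text.splitlines():
        tokens = line.split()
        if len(tokens) >= 3 and tokens[0].upper() == "COPY":
            # Drop flags such as --chown=...; the last remaining token is the destination.
            args = [t for t in tokens[1:] if not t.startswith("--")]
            sources.extend(src.removeprefix("./").rstrip("/") for src in args[:-1])
    return sources


def is_ignored(path: str, dockerignore_text: str) -> bool:
    """Approximate .dockerignore evaluation: the last rule that matches decides."""
    parts = path.split("/")
    # A rule can hit the path itself or any of its parent directories.
    prefixes = ["/".join(parts[: i + 1]) for i in range(len(parts))]
    ignored = False
    for raw in dockerignore_text.splitlines():
        rule = raw.strip()
        if not rule or rule.startswith("#"):
            continue
        negate = rule.startswith("!")
        pattern = rule.lstrip("!").strip("/")
        if any(fnmatchcase(prefix, pattern) for prefix in prefixes):
            ignored = not negate
    return ignored


dockerfile = "COPY ./configs /app/configs\nCOPY ./checkpoints /app/checkpoints\n"
dockerignore = ".git\n*.pyc\n.env\n.env.*\n!.env.example\ncheckpoints\n"

blocked = [src for src in copy_sources(dockerfile) if is_ignored(src, dockerignore)]
print("COPY sources excluded by .dockerignore:", blocked)
```

A non-empty list means the deps build would fail or silently miss files; a real `docker compose build` remains the authoritative check.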
.env.example (repo root) — lists every variable the compose files read, with the same defaults, as documentation:
```bash
# Image / build
SERVER_IMAGE=<repo>/<project>
SERVER_DEPS_DOCKERFILE=docker/serve/Dockerfile.deps
SERVER_APP_DOCKERFILE=docker/serve/Dockerfile.app

# Versions (see R2)
PYTORCH_VER=2.7.1
CUDA_VER=12.8
CUDNN_VER=9

# Runtime
SERVER_PORT=<host-port>   # per R9
SERVER_CONTAINER_PORT=8080
SERVER_GPUS=all
DEVICE=cuda:0
```
Agents never commit a .env — only .env.example.
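One way to check that `.env.example` really lists every variable the compose files read is to compare the two sets of names. This is a rough sketch under stated assumptions: variables are found with a regex over `${VAR...}` interpolations (escaped `$${...}` is skipped), and the function names are illustrative, not part of any tool.

```python
import re

# Matches ${VAR}, ${VAR:-default}, ${VAR:?error}, ... but not the escaped form $${VAR}.
VAR_RE = re.compile(r"(?<!\$)\$\{([A-Za-z_][A-Za-z0-9_]*)")


def compose_vars(compose_text: str) -> set[str]:
    """Names of all variables interpolated in a compose file."""
    return set(VAR_RE.findall(compose_text))


def env_example_vars(env_text: str) -> set[str]:
    """Names defined in a dotenv-style file, ignoring comments and blank lines."""
    names = set()
    for line in env_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        names.add(line.split("=", 1)[0].strip())
    return names


def missing_from_env_example(compose_text: str, env_text: str) -> list[str]:
    """Variables the compose file reads that .env.example does not document."""
    return sorted(compose_vars(compose_text) - env_example_vars(env_text))


# Sample inputs; acme/demo and 18080 are placeholder values for illustration only.
compose = (
    "image: ${SERVER_IMAGE:-acme/demo}:pytorch${PYTORCH_VER:-2.7.1}-base\n"
    "ports:\n"
    '  - "${SERVER_PORT:-18080}:${SERVER_CONTAINER_PORT:-8080}"\n'
)
env_example = "# Image\nSERVER_IMAGE=acme/demo\nPYTORCH_VER=2.7.1\nSERVER_PORT=18080  # per R9\n"

print(missing_from_env_example(compose, env_example))
```

In a real repo the two strings would come from reading `docker-compose.server.yml` and `.env.example`; an empty result means the documentation contract holds.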
`app.py` contract: the FastAPI (or equivalent) application must expose at least:

- `GET /health` → `200 {"status": "ok"}`. Used by smoke tests and orchestrators.
- An inference endpoint (e.g. `POST /predict`) documented in `README.md`.

`README.md` (under `docker/serve/`) must document, at minimum:

- the service's purpose,
- the `SERVER_PORT` default and what it maps to,
- non-standard env vars read by `app.py` (model paths, device, etc.),
- the `docker compose` commands to build and run (copy from the Build Workflow section, adjusted for this project).

R2. Each version component is its own `ARG`. Compose the base image tag from those parts inside the `FROM` line.
WRONG (bumping any axis forces a string edit and breaks tag derivation):

```dockerfile
ARG BASE_TAG="2.7.1-cuda12.8-cudnn9-devel"
FROM pytorch/pytorch:${BASE_TAG}
```

RIGHT (each axis pinned independently; PyTorch shown as the typical case):

```dockerfile
ARG PYTORCH_VER="2.7.1"
ARG CUDA_VER="12.8"
ARG CUDNN_VER="9"
FROM pytorch/pytorch:${PYTORCH_VER}-cuda${CUDA_VER}-cudnn${CUDNN_VER}-devel
```
This applies to every dependency: framework version, CUDA, cuDNN, library pins, Python pin if non-default, etc. Each gets its own ARG <NAME>_VER="<x.y.z>" with a default. Override via compose args: from env vars.
For non-PyTorch projects, swap PYTORCH_VER for the equivalent (TF_VER, JAX_VER, …) and the base image (tensorflow/tensorflow:..., etc.). The structure is identical.
Never hardcode pip index URLs that bake in a CUDA version either — derive from the CUDA arg or pass INDEX_URL as its own ARG.
R3. Derive image tags from the version `ARG`s.

WRONG:

```yaml
image: ${SERVER_IMAGE:-<repo>/<project>}:${IMAGE_TAG:-pytorch2.7-cuda12.8}-base
```

(The version part is a fixed string: bumping `PYTORCH_VER` doesn't change the tag, so two different builds collide on the same image.)

RIGHT:

```yaml
image: ${SERVER_IMAGE:-<repo>/<project>}:pytorch${PYTORCH_VER}-cuda${CUDA_VER}-cudnn${CUDNN_VER}-base
```
The tag MUST include every version axis that affects the image. Bumping any version automatically produces a new tag — no collisions, no need to remember to bump IMAGE_TAG separately.
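The collision argument can be illustrated outside of compose. The sketch below mirrors the `pytorch<ver>-cuda<ver>-cudnn<ver>-<role>` shape in plain Python; the `image_tag` helper is hypothetical and only demonstrates that a tag built from every axis changes whenever any axis changes.

```python
def image_tag(axes: dict[str, str], role: str) -> str:
    """Join every version axis into one tag, e.g. pytorch2.7.1-cuda12.8-cudnn9-base."""
    if role not in ("base", "server"):
        raise ValueError(f"unknown role: {role!r}")
    parts = [f"{name}{version}" for name, version in axes.items()]
    return "-".join(parts + [role])


axes = {"pytorch": "2.7.1", "cuda": "12.8", "cudnn": "9"}
bumped = {**axes, "pytorch": "2.8.0"}  # bump a single axis

print(image_tag(axes, "base"))
print(image_tag(bumped, "base"))
```

Because the tag is a pure function of the axes, two builds with different versions can never share a tag, which is the property the fixed-string variant lacks.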
R4. Use exactly these names. Do not introduce parallel vocabulary (`HOST_PORT` vs `SERVER_PORT`, `IMAGE_REPO` vs `SERVER_IMAGE`, `DEPS_DOCKERFILE` vs `SERVER_DEPS_DOCKERFILE`, etc.).

| Variable | Purpose | Default |
|---|---|---|
| `SERVER_IMAGE` | Image repo (no tag) | `<repo>/<project>` |
| `SERVER_DEPS_DOCKERFILE` | Path to `Dockerfile.deps` | `docker/serve/Dockerfile.deps` |
| `SERVER_APP_DOCKERFILE` | Path to `Dockerfile.app` | `docker/serve/Dockerfile.app` |
| `SERVER_PORT` | Host port | service-specific |
| `SERVER_CONTAINER_PORT` | Container port | `8080` |
| `SERVER_GPUS` | GPU spec | `all` |
| `PYTORCH_VER`, `CUDA_VER`, `CUDNN_VER` | Base image axes (PyTorch case) | per project |
| `<DEP>_VER` | Pinned dependency versions | per project |
| `DEVICE` | Runtime device (`cuda:0`, `cpu`, …) | `cuda:0` |
R5. Services are named `<project>-server-base` and `<project>-server`.

```yaml
services:
  <project>-server-base:   # deps image, no runtime config
    ...
  <project>-server:        # app image: ports, env, gpus
    ...
```

Not `<project>-base`. Not `<project>-app`. Always `-server-base` and `-server`. The `container_name` on the runtime service matches: `container_name: <project>-server`.
R6. Use `gpus`, not `deploy.resources`.

WRONG (verbose, swarm-oriented `deploy` semantics, harder to override):

```yaml
deploy:
  resources:
    reservations:
      devices:
        - driver: nvidia
          count: 1
          capabilities: [gpu]
```

RIGHT:

```yaml
gpus: ${SERVER_GPUS:-all}
```
R7. Keep `Dockerfile.app` thin. `Dockerfile.app` does only: `FROM ${BASE_IMAGE}`, `WORKDIR`, `COPY` of code/entrypoint, `chmod +x`, `ENTRYPOINT`. Any `RUN apt-get`, `RUN pip install`, model downloads, etc. belong in `Dockerfile.deps`. If the app layer takes more than a few seconds to rebuild, it is wrong.
R8. If the project uses a framework with a maintained image (e.g. `pytorch/pytorch:...` for PyTorch), base on it: interpreter, package manager, runtime libs, and CUDA are already wired up.

Falling back to a raw `nvidia/cuda:...-devel` (or any non-framework base) is allowed only when the framework image genuinely doesn't fit. When you do, you MUST leave a comment block at the top of `Dockerfile.deps` explaining why, so the next agent reading the file can verify the reason still holds before preserving the fallback.

The comment must state the rejected framework image, the blocker, and a re-evaluation hint.

Example:

```dockerfile
# BASE IMAGE FALLBACK — using nvidia/cuda instead of pytorch/pytorch.
# Reason: this project requires Python 3.12, but pytorch/pytorch only
# publishes 3.11 images as of 2026-04. Re-evaluate after upstream
# ships 3.12 tags and switch back to pytorch/pytorch:${PYTORCH_VER}-...
# NOTE: confirm this exact tag exists on Docker Hub before building; some
# nvidia/cuda tags use "cudnn" without a version number.
ARG CUDA_VER="12.8"
ARG CUDNN_VER="9"
FROM nvidia/cuda:${CUDA_VER}.0-cudnn${CUDNN_VER}-devel-ubuntu22.04
```
No comment, no fallback — default back to the official framework image.
R9. Every service's `SERVER_PORT` default MUST be unique across the monorepo. Allocate sequentially in the 180xx range (e.g. 18080, 18081, 18082, …) and record the mapping in a top-level `PORTS.md` (one line per service: `<project> — 180xx — one-line purpose`). When adding a service, append the next free number; do not reuse or guess.
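Picking "the next free number" can be done mechanically from `PORTS.md`. The sketch below is one possible reading of the rule: it scans each line for a 180xx number, takes the highest one seen, and returns the next, so earlier slots are never reused. The `next_free_port` helper, the 18080 starting point taken from the example above, and the sample service names are all assumptions for illustration.

```python
import re

PORT_RE = re.compile(r"\b(180\d{2})\b")  # any 180xx number on the line


def next_free_port(ports_md: str, first: int = 18080, last: int = 18099) -> int:
    """Return the port after the highest recorded one (no reuse of earlier slots)."""
    used = set()
    for line in ports_md.splitlines():
        match = PORT_RE.search(line)
        if match:
            used.add(int(match.group(1)))
    candidate = max(used) + 1 if used else first
    if candidate > last:
        raise RuntimeError("180xx range exhausted; extend the scheme before adding services")
    return candidate


# Hypothetical PORTS.md content; the regex ignores whatever separator the file uses.
ports_md = """\
segmenter - 18080 - mask prediction service
captioner - 18081 - image captioning service
"""

print(next_free_port(ports_md))
```

The new service's line should be appended to `PORTS.md` in the same commit that introduces its `SERVER_PORT` default.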
R10. Default is always GPU. The bare `docker compose up` path must launch the GPU service; no flag required.
A CPU variant is added ONLY when the underlying model genuinely supports CPU inference (small models, pure-Python pipelines, debug workflows). For GPU-only models (large diffusion, FlashAttention kernels, etc.), do not add a CPU profile — document "GPU required" in the README and move on.
When CPU IS supported, split into compose profiles so the runtime is explicit:
```yaml
  <project>-server:
    profiles: ["gpu"]   # default profile
    gpus: ${SERVER_GPUS:-all}
    environment:
      DEVICE: ${DEVICE:-cuda:0}
    # …
  <project>-server-cpu:
    profiles: ["cpu"]   # opt-in
    extends:
      service: <project>-server
    # no gpus key: CPU host compatible
    # (confirm with `docker compose config` that none is inherited via extends)
    environment:
      DEVICE: cpu
```
Default profile resolution: set `COMPOSE_PROFILES=gpu` in `.env.example` (and document it in README) so that, once it is copied to a local `.env`, `docker compose up` runs GPU without extra flags. Compose reads `.env`, not `.env.example`. CPU use is an explicit opt-in: `docker compose --profile cpu up -d`. Never rely on unsetting `SERVER_GPUS` to "disable" GPUs: compose still renders the `gpus:` key and errors on hosts without an nvidia runtime.
`docker/serve/Dockerfile.deps`:

```dockerfile
# Version axes: declared before FROM so they can build the base image tag.
ARG PYTORCH_VER="2.7.1"
ARG CUDA_VER="12.8"
ARG CUDNN_VER="9"
FROM pytorch/pytorch:${PYTORCH_VER}-cuda${CUDA_VER}-cudnn${CUDNN_VER}-devel

# Dependency pins and build parallelism, overridable via compose args.
ARG DEP_A_VER="<x.y.z>"
ARG DEP_B_VER="<x.y.z>"
ARG NUM_WORKERS="4"

ENV DEBIAN_FRONTEND=noninteractive
ENV PYTHONUNBUFFERED=1
ENV APP_HOME="/<project>"
ENV MAX_JOBS=${NUM_WORKERS}

RUN apt-get update && apt-get install -y --no-install-recommends \
        git build-essential \
    && apt-get clean && rm -rf /var/lib/apt/lists/*

RUN pip install --no-cache-dir -U pip setuptools wheel
RUN pip install --no-cache-dir "dep-a==${DEP_A_VER}" "dep-b==${DEP_B_VER}"

# Heavy, rarely changing assets live in the deps layer.
COPY ./configs /<project>/configs
COPY ./checkpoints /<project>/checkpoints

WORKDIR /<project>
```
`docker/serve/Dockerfile.app`:

```dockerfile
ARG BASE_IMAGE
FROM ${BASE_IMAGE}

WORKDIR /<project>

COPY ./docker/serve/app.py /<project>/docker/serve/app.py
COPY ./docker/serve/entrypoint.sh /usr/local/bin/entrypoint.sh
RUN chmod +x /usr/local/bin/entrypoint.sh

ENTRYPOINT ["/usr/local/bin/entrypoint.sh"]
```
BASE_IMAGE has no default — compose always supplies it. This guarantees the app layer is built against the exact deps tag we just produced.
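`docker/serve/entrypoint.sh`:

```bash
#!/usr/bin/env bash
set -euo pipefail

cd /<project>

# "serve" (or no argument) starts the API; anything else is run as a command.
if [[ "${1:-serve}" == "serve" ]]; then
  shift || true   # tolerate the no-argument case under `set -e`
  exec uvicorn docker.serve.app:app \
    --host 0.0.0.0 --port "${SERVER_CONTAINER_PORT:-8080}" "$@"
fi

exec "$@"
```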
Both docker-compose.server.yml (root) and docker/serve/docker-compose.yaml follow this shape. Adjust only context so each resolves to repo root.
```yaml
services:
  <project>-server-base:
    build:
      context: .
      dockerfile: ${SERVER_DEPS_DOCKERFILE:-docker/serve/Dockerfile.deps}
      args:
        PYTORCH_VER: ${PYTORCH_VER:-2.7.1}
        CUDA_VER: ${CUDA_VER:-12.8}
        CUDNN_VER: ${CUDNN_VER:-9}
        DEP_A_VER: ${DEP_A_VER:-<x.y.z>}
        DEP_B_VER: ${DEP_B_VER:-<x.y.z>}
        NUM_WORKERS: ${NUM_WORKERS:-4}
    image: ${SERVER_IMAGE:-<repo>/<project>}:pytorch${PYTORCH_VER:-2.7.1}-cuda${CUDA_VER:-12.8}-cudnn${CUDNN_VER:-9}-base

  <project>-server:
    build:
      context: .
      dockerfile: ${SERVER_APP_DOCKERFILE:-docker/serve/Dockerfile.app}
      args:
        BASE_IMAGE: ${SERVER_IMAGE:-<repo>/<project>}:pytorch${PYTORCH_VER:-2.7.1}-cuda${CUDA_VER:-12.8}-cudnn${CUDNN_VER:-9}-base
    image: ${SERVER_IMAGE:-<repo>/<project>}:pytorch${PYTORCH_VER:-2.7.1}-cuda${CUDA_VER:-12.8}-cudnn${CUDNN_VER:-9}-server
    container_name: <project>-server
    gpus: ${SERVER_GPUS:-all}
    ports:
      - "${SERVER_PORT:-<host-port>}:${SERVER_CONTAINER_PORT:-8080}"
    environment:
      SERVER_CONTAINER_PORT: ${SERVER_CONTAINER_PORT:-8080}
      DEVICE: ${DEVICE:-cuda:0}
      # service-specific env vars below
    entrypoint: ["/usr/local/bin/entrypoint.sh"]
    command: ["serve"]
    restart: unless-stopped
```
```bash
# Full build (first time or deps changed):
docker compose -f docker-compose.server.yml build

# Fast iteration (app code only):
docker compose -f docker-compose.server.yml build <project>-server
docker compose -f docker-compose.server.yml up -d --no-build <project>-server

# Bump versions without editing files:
PYTORCH_VER=2.8.0 CUDA_VER=12.9 docker compose -f docker-compose.server.yml build

# CPU-only (only if the service defines a cpu profile per R10):
docker compose -f docker-compose.server.yml --profile cpu up -d

# Stop:
docker compose -f docker-compose.server.yml down
```
The conformance checklist is self-attested; these commands produce objective evidence. Run all four from the repo root before declaring a service conformant. All intermediate artifacts stay inside the workspace — never write to /tmp, ~, or anywhere outside the repo.
```bash
# Workspace-scoped scratch dir (add to .gitignore and .dockerignore).
mkdir -p .build

# 1. Compose renders without errors and resolves interpolation.
docker compose -f docker-compose.server.yml config > .build/rendered.yml

# 2. The rendered image tag reflects current version ARGs (should contain
#    pytorch<ver>-cuda<ver>-cudnn<ver>-base and -server). Fail if fixed strings.
grep -E "image: .*pytorch[0-9].*cuda[0-9].*cudnn[0-9].*-(base|server)" .build/rendered.yml

# 3. Build base + server; server layer should finish in seconds on cache hit.
docker compose -f docker-compose.server.yml build

# 4. Health check on the running server (retry while the server starts up).
docker compose -f docker-compose.server.yml up -d <project>-server
curl -fsS --retry 10 --retry-delay 2 --retry-connrefused \
  "http://localhost:${SERVER_PORT:-<host-port>}/health"
docker compose -f docker-compose.server.yml down
```
If step 2's grep returns nothing, the tag is hardcoded — fix R3 before proceeding. If step 4 fails, app.py is missing the required /health endpoint — fix the app.py contract before proceeding.
Add .build/ to both .gitignore and .dockerignore so verification artifacts never leak into commits or image contexts.
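For scripted checks, the step 4 probe can also be written in Python. This is a minimal sketch using only the standard library: `health_ok` is a hypothetical helper that applies the `/health` contract (`200` plus `{"status": "ok"}`), and the stub server exists only so the example runs without a container. Against a real service, call `health_ok` with `http://localhost:<SERVER_PORT>`.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Bypass any proxy environment variables so localhost requests go direct.
_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def health_ok(base_url: str, timeout: float = 5.0) -> bool:
    """True when GET <base_url>/health answers 200 with {"status": "ok"}."""
    try:
        with _OPENER.open(f"{base_url}/health", timeout=timeout) as resp:
            body = json.load(resp)
            return resp.status == 200 and isinstance(body, dict) and body.get("status") == "ok"
    except (OSError, ValueError):  # connection errors, HTTP errors, invalid JSON
        return False


class _StubHandler(BaseHTTPRequestHandler):
    """Stand-in for app.py: serves /health only, 404 for everything else."""

    def do_GET(self):
        ok = self.path == "/health"
        payload = json.dumps({"status": "ok"}).encode() if ok else b"{}"
        self.send_response(200 if ok else 404)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

    def log_message(self, *args):  # keep the demo quiet
        pass


def demo() -> tuple[bool, bool]:
    """Probe the stub at a good and a bad base URL; returns (good, bad)."""
    server = HTTPServer(("127.0.0.1", 0), _StubHandler)  # port 0: OS picks a free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    base = f"http://127.0.0.1:{server.server_port}"
    try:
        return health_ok(base), health_ok(base + "/missing")
    finally:
        server.shutdown()
        server.server_close()


print(demo())
```

The strict body check is a design choice: a reverse proxy that answers 200 with an HTML error page would pass a status-only probe but fails this one.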
Run through this every time you touch a service. Any "no" is a bug to fix, not a style preference.
- `docker-compose.server.yml` exists.
- `docker/serve/docker-compose.yaml` exists and matches the root one.
- `docker/serve/` contains `Dockerfile.deps`, `Dockerfile.app`, `entrypoint.sh`, `app.py`, `README.md`.
- `.dockerignore` exists and does NOT ignore paths that `Dockerfile.deps` COPYs (`checkpoints/`, `configs/`, `data/`, …).
- `.env.example` exists, lists every var the compose files read, and is committed (real `.env` is gitignored).
- In a monorepo, each subproject has its own `docker-compose.server.yml`, `.dockerignore`, `.env.example` at the subproject root; no single hoisted compose at monorepo root.
- `app.py` exposes `GET /health` returning 200.
- `docker/serve/README.md` documents purpose, `SERVER_PORT`, non-standard env vars, and the build/run commands.
- Each version axis is its own `ARG`.
- Image tags are derived from `ARG` values; bumping a version automatically yields a new tag.
- Env var names follow R4, with no parallel vocabulary such as `IMAGE_REPO`, `HOST_PORT`, `IMAGE_TAG`, `DEPS_DOCKERFILE`.
- Services are named `<project>-server-base` and `<project>-server`.
- GPUs are requested with top-level `gpus:`, not `deploy.resources.reservations.devices`.
- `Dockerfile.app` contains zero `RUN apt-get` / `RUN pip install` / download steps.
- Every pinned dependency is an `ARG <NAME>_VER` with a default, plumbed through compose `args:`.
- If a non-framework base image is used, `Dockerfile.deps` has a `# BASE IMAGE FALLBACK — ...` comment with the rejected image, the blocker, and a re-evaluation hint.
- `SERVER_PORT` default is unique across the monorepo and recorded in `PORTS.md` (R9).
- The verification commands pass (`/health` returns 200).

| Mistake | Fix |
|---|---|
| Monolithic `BASE_TAG` lump | Split into per-axis `ARG`s (R2) |
| Image tag with hardcoded version substring | Derive from version `ARG`s (R3) |
| Mixed naming (`HOST_PORT` here, `SERVER_PORT` there) | Use the R4 vocabulary everywhere |
| Compose only at root, or only in `docker/serve/` | Both files required (R1) |
| Service named `-base` / `-app` | Use `-server-base` / `-server` (R5) |
| `deploy.resources.reservations.devices` for GPUs | Top-level `gpus:` (R6) |
| Pip installs in `Dockerfile.app` | Move to `Dockerfile.deps` (R7) |
| `nvidia/cuda` for a PyTorch project with no fallback comment | Use `pytorch/pytorch:...` or document the blocker (R8) |
| Forgetting `chmod +x` on entrypoint | `RUN chmod +x` in `Dockerfile.app` |
| `CMD` instead of `ENTRYPOINT` + `command` | `ENTRYPOINT` for the script, compose `command` for the mode |
| Rebuilding everything after app change | Only rebuild the `-server` service, not the base |
| `.dockerignore` accidentally excludes `checkpoints/` that deps copies | Whitelist or remove the pattern; verify with `docker compose build` |
| Two services defaulting to the same `SERVER_PORT` | Allocate a new 180xx slot and record in `PORTS.md` (R9) |
| Emptying `SERVER_GPUS` to "go CPU" | Use the `cpu` compose profile (R10) |
| Writing verification artifacts to `/tmp` or `~` | Use workspace-local `.build/`, gitignored |
| Monorepo with one hoisted compose covering multiple subprojects | Each subproject gets its own `docker-compose.server.yml` (R1) |
| `app.py` without `/health` | Add it; verification step 4 depends on it |
When you encounter an existing service that doesn't match, do not extend it as-is. Migrate first:
1. Split `BASE_TAG` into per-axis `ARG`s (`PYTORCH_VER`, `CUDA_VER`, `CUDNN_VER`, …).
2. Derive image tags from those `ARG`s.
3. Rename services to `-server-base` / `-server`.
4. Replace the `deploy.resources` GPU block with top-level `gpus:`.
5. Assign a unique `SERVER_PORT` and add the service to `PORTS.md` (R9).
6. Move any CPU variant to a `cpu` compose profile (R10), or drop CPU support entirely if the model doesn't actually run on CPU.
7. Add any missing companion files: `.dockerignore`, `.env.example`, `app.py` `/health` endpoint, `docker/serve/README.md`.