A Dockerfile is a text file containing the instructions Docker uses to build an image layer by layer. Instructions that change the filesystem (RUN, COPY, ADD) each create a new layer, while the others only record metadata, and the final image is a complete, portable snapshot of your application with all its dependencies. Efficient Dockerfiles reduce image size, improve build speed, and enhance security, all of which matter for production deployments.
Docker images aren't monolithic blobs. They're built as stacked layers, and understanding this is fundamental to writing efficient Dockerfiles. When you run docker build, Docker executes the instructions in order. If an instruction and its inputs are unchanged since the previous build, Docker reuses the cached layer instead of rebuilding it, which saves significant time.
Here's the catch: cache invalidation is strict. If you modify line 5 in your Dockerfile, Docker discards the cache for line 5 and all subsequent lines. This means the order of instructions matters enormously. Place frequently changing commands (like copying source code) near the end, and stable commands (like installing system packages) near the top.
A practical example: if you install dependencies in line 3 and copy your code in line 10, changing your code invalidates only layers 10 and beyond. But if you reverse this order, changing code invalidates the entire dependency layer—forcing a complete reinstall every build.
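The ordering effect can be reproduced with a toy model of the cache. The script below is a simplified sketch, not Docker's actual algorithm: it assumes each layer's cache key is a checksum of the parent key plus the instruction text, with a version string standing in for the contents of the copied files. The function names layer_keys and count_reused and the version labels are made up for this illustration.

```sh
#!/bin/sh
# Toy model of a layer cache (an illustration, not Docker's real algorithm).
# Each layer key is a checksum of: parent key + instruction text + input content.

# layer_keys CODE_VERSION ORDER
#   CODE_VERSION: stand-in for the contents of the source tree
#   ORDER: "deps-first" or "code-first"
# Prints one cache key per instruction.
layer_keys() {
    code_version=$1
    order=$2
    if [ "$order" = "deps-first" ]; then
        steps="FROM node:18-alpine|COPY package.json:pkg-v1|RUN npm ci|COPY src:$code_version"
    else
        steps="FROM node:18-alpine|COPY src:$code_version|COPY package.json:pkg-v1|RUN npm ci"
    fi
    parent=""
    old_ifs=$IFS
    IFS='|'                       # split the step list on "|"
    for step in $steps; do
        # Chain the checksum so a change in one layer alters every later key.
        parent=$(printf '%s\n%s' "$parent" "$step" | cksum | cut -d' ' -f1)
        printf '%s\n' "$parent"
    done
    IFS=$old_ifs
}

# count_reused ORDER
# Prints how many leading layers keep their key when only the code changes.
count_reused() {
    before=$(layer_keys code-v1 "$1")
    after=$(layer_keys code-v2 "$1")
    reused=0
    n=1
    while [ "$n" -le 4 ]; do
        a=$(printf '%s\n' "$before" | sed -n "${n}p")
        b=$(printf '%s\n' "$after" | sed -n "${n}p")
        [ "$a" = "$b" ] || break
        reused=$((reused + 1))
        n=$((n + 1))
    done
    printf '%s\n' "$reused"
}
```

Running count_reused deps-first prints 3, because only the final COPY gets a new key. Running count_reused code-first prints 1, because every layer after FROM gets a new key, including the dependency install.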
Here's a production-grade Dockerfile pattern for a Node.js application:
FROM node:18-alpine
WORKDIR /app
# Copy package files first (stable layer)
COPY package*.json ./
# Install production dependencies only (--omit=dev replaces the deprecated --only=production)
RUN npm ci --omit=dev
# Copy application code (changes frequently)
COPY . .
# Expose port
EXPOSE 3000
# Health check
HEALTHCHECK --interval=30s --timeout=3s --start-period=5s --retries=3 \
    CMD node healthcheck.js
# Run application
CMD ["node", "server.js"]
This structure follows the principle of least volatility. Package files rarely change, so they're copied first. Application code changes frequently, so it comes later. When you update code, only the layers from COPY . . onward are rebuilt, and the dependency layer stays cached.
Your choice of base image significantly impacts final image size and security surface. Alpine Linux variants (alpine, alpine:3.18) are minimal—often 5-10MB. Full distributions like Ubuntu are larger but include more tools and libraries.
For most applications, Alpine or distroless images are ideal. Distroless images contain only your application and runtime—no shell, no package manager. They're incredibly small and secure since attackers have minimal tools to work with.
Compare these base images for Node.js:
node:18 (about 900MB): Debian-based, includes build tools
node:18-alpine (about 160MB): minimal, Alpine-based
A distroless Node.js 18 image such as gcr.io/distroless/nodejs18-debian12 (about 120MB): no shell, no package manager
If you need build tools (compilers, git), use a multistage build to keep the final image small:
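# Build stage: same Alpine base as the final stage, plus compilers for native modules
FROM node:18-alpine AS builder
RUN apk add --no-cache python3 make g++
WORKDIR /app
COPY package*.json ./
RUN npm ci --omit=dev
# Final stage: no compilers, only the installed modules and the application code
FROM node:18-alpine
WORKDIR /app
COPY --from=builder /app/node_modules ./node_modules
COPY . .
CMD ["node", "server.js"]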
In this pattern, the builder stage has everything needed to compile dependencies. The final stage copies only the compiled artifacts, discarding build tools. The result is a small, production-ready image.
Dockerfiles create security vulnerabilities if not written carefully. Here's what to prioritize:
By default, containers run as the root user. If an attacker compromises a process in your container, they hold root privileges inside it, which makes escaping to the host or tampering with mounted volumes far easier. Always create a non-root user:
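FROM node:18-alpine
# Create an unprivileged system user and group
RUN addgroup -g 1001 -S nodejs && \
    adduser -S nodejs -u 1001 -G nodejs
WORKDIR /app
# Install dependencies before copying code so this layer stays cached
COPY package*.json ./
RUN npm ci --omit=dev
# Copy application code, owned by the non-root user
COPY --chown=nodejs:nodejs . .
USER nodejs
CMD ["node", "server.js"]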
The --chown flag ensures the nodejs user owns the copied files. The USER instruction switches to this non-root user before running the application.
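A missing USER instruction is easy to overlook in review, so a small static check can help. The following is a minimal sketch under stated assumptions: the helper names final_user and runs_as_root are hypothetical, the check only reads the Dockerfile text, and it cannot see a USER that a base image already sets.

```sh
#!/bin/sh
# final_user DOCKERFILE
# Prints the user named by the last USER instruction of the final stage,
# or "root" if that stage has none. A new FROM starts a new stage.
final_user() {
    awk '
        toupper($1) == "FROM" { user = "" }
        toupper($1) == "USER" { user = $2 }
        END { print (user == "" ? "root" : user) }
    ' "$1"
}

# runs_as_root DOCKERFILE
# Succeeds (exit status 0) when the final stage would start as root.
runs_as_root() {
    case $(final_user "$1") in
        root|0|root:*|0:*) return 0 ;;
        *) return 1 ;;
    esac
}
```

In a CI job this could be used as: if runs_as_root Dockerfile; then echo "image runs as root"; exit 1; fi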
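Every RUN instruction adds a layer, and files deleted in a later layer still take up space in the layer that created them. Combine related RUN commands so that cleanup happens in the same layer: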
# Bad
RUN apt-get update
RUN apt-get install -y curl git
RUN apt-get clean
# Good
RUN apt-get update && \
    apt-get install -y curl git && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/*
The good approach combines updates, installation, and cleanup into one layer. It also explicitly removes package manager caches, reducing image bloat.
ADD can extract tar files and fetch remote URLs. This is often unnecessary and unpredictable. Stick with COPY for explicit, transparent file operations:
# Prefer this
COPY app.js ./
# Avoid this
ADD https://example.com/app.js ./
Use Docker Scout (the docker scout CLI plugin, bundled with Docker Desktop and installable separately for Docker Engine) or Trivy to scan images for known vulnerabilities:
docker build -t myapp:1.0 .
docker scout cves myapp:1.0
Beyond caching strategy, several techniques speed up builds and reduce image size:
Similar to .gitignore, .dockerignore prevents unnecessary files from being copied:
node_modules
npm-debug.log
.git
.env
dist
build
.DS_Store
*.md
This reduces build context size, so less data is sent to the builder at the start of every build.
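Because a missing .dockerignore entry fails silently, a quick check can confirm that the important entries are present. This is a rough sketch, not a parser for .dockerignore pattern syntax: the function name missing_ignores is hypothetical, and it only looks for exact line matches.

```sh
#!/bin/sh
# missing_ignores FILE ENTRY...
# Prints each ENTRY that does not appear as an exact line in FILE.
# Exact matching only: wildcard patterns that would cover an entry are not considered.
missing_ignores() {
    file=$1
    shift
    for entry in "$@"; do
        grep -qxF -- "$entry" "$file" 2>/dev/null || printf '%s\n' "$entry"
    done
}
```

For example, missing_ignores .dockerignore node_modules .git .env prints whichever of the three entries is absent, and prints nothing when all are listed.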
Docker's BuildKit engine (enabled by default in modern Docker) supports inline caching and external cache sources:
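# --cache-to needs the containerd image store or a builder using the docker-container driver,
# for example one created with: docker buildx create --driver docker-container --use
docker buildx build -t myapp:1.0 \
  --cache-from=type=local,src=/tmp/docker-cache \
  --cache-to=type=local,dest=/tmp/docker-cache \
  --load \
  .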
This preserves the cache between builds, which matters in CI/CD pipelines where each job often starts on a fresh runner, provided the cache directory is saved and restored between jobs.
Never use latest in production Dockerfiles. Always pin versions:
# Bad
FROM node
# Good
FROM node:18.17.1-alpine3.18
Pinning ensures reproducible builds. latest changes unpredictably, breaking your builds and potentially introducing vulnerabilities.
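Unpinned base images can also be caught mechanically. The sketch below is one possible check, not an official tool: the function name unpinned_bases is hypothetical, it treats a reference as pinned when it has a tag other than latest or a sha256 digest, and it does not resolve ARG variables used in FROM lines.

```sh
#!/bin/sh
# unpinned_bases DOCKERFILE
# Prints FROM image references that have no tag or use :latest.
# References to earlier build stages, "scratch", and digests are skipped.
unpinned_bases() {
    awk '
        toupper($1) == "FROM" {
            # Skip an optional flag such as --platform=linux/amd64
            img = ($2 ~ /^--/) ? $3 : $2
            known = (img in stages)
            # Remember stage names declared with "AS name"
            if (toupper($(NF - 1)) == "AS") stages[$NF] = 1
            if (known || img == "scratch" || img ~ /@sha256:/) next
            # Drop the registry and path so a registry port is not read as a tag
            name = img
            sub(/.*\//, "", name)
            if (name !~ /:/ || name ~ /:latest$/) print img
        }
    ' "$1"
}
```

Running unpinned_bases Dockerfile in CI and failing the job when it prints anything keeps untagged images out of the main branch.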
Even experienced developers slip up. Here are frequent pitfalls:
RUN set -e && command1 | command2.
Here's a complete Dockerfile for a Python Flask app that applies these practices:
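FROM python:3.11-slim AS builder
WORKDIR /build
COPY requirements.txt .
# Install into /root/.local so the packages can be copied as a single directory
RUN pip install --user --no-cache-dir -r requirements.txt
FROM python:3.11-slim
# -m creates /home/appuser, which receives the copied packages below
RUN groupadd -r appuser && useradd -r -m -g appuser appuser
WORKDIR /app
COPY --from=builder --chown=appuser:appuser /root/.local /home/appuser/.local
COPY --chown=appuser:appuser . .
ENV PATH=/home/appuser/.local/bin:$PATH
USER appuser
EXPOSE 5000
# python:3.11-slim ships without curl, so probe with the standard library instead
HEALTHCHECK --interval=30s --timeout=3s \
    CMD python -c "import urllib.request; urllib.request.urlopen('http://localhost:5000/health')" || exit 1
# flask run starts the development server; use a WSGI server such as gunicorn in production
CMD ["python", "-m", "flask", "run", "--host=0.0.0.0"]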
The builder stage installs Python dependencies into a user-level directory, separate from the final image. The final stage copies only the installed packages and leaves behind anything else the build stage produced. The result is a lean, secure image.
Before pushing to production, validate your Dockerfile:
# Build the image
docker build -t myapp:test .
# Run interactively to verify
docker run -it myapp:test /bin/sh
# Check image size
docker images myapp:test
# Scan for vulnerabilities
docker scout cves myapp:test
# Inspect layers
docker history myapp:test
docker history shows each layer's size and creation command. Use it to identify bloated layers and opportunities for optimization.
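Scanning docker history output by eye gets tedious for images with many layers, so a filter can pick out the large ones. The sketch below assumes input in the shape produced by docker history --format '{{.Size}}\t{{.CreatedBy}}', with sizes in Docker's human-readable units (B, kB, MB, GB). The function name large_layers is hypothetical.

```sh
#!/bin/sh
# large_layers THRESHOLD_MB
# Reads "SIZE<TAB>CREATED_BY" lines on stdin and prints the layers whose
# size is at or above THRESHOLD_MB, with the size normalised to MB.
large_layers() {
    awk -F '\t' -v limit="$1" '
        {
            size = $1
            unit = size
            sub(/^[0-9.]+/, "", unit)      # keep the unit, e.g. "MB"
            sub(/[^0-9.]+$/, "", size)     # keep the number, e.g. "45.2"
            mb = size + 0
            if (unit == "B")       mb = size / 1000000
            else if (unit == "kB") mb = size / 1000
            else if (unit == "GB") mb = size * 1000
            if (mb >= limit + 0) printf "%.1fMB\t%s\n", mb, $2
        }
    '
}
```

For example, docker history --format '{{.Size}}\t{{.CreatedBy}}' myapp:test | large_layers 50 lists only the layers of 50MB or more.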
Tools like Hadolint catch Dockerfile issues before build time:
docker run --rm -i hadolint/hadolint < Dockerfile
Hadolint catches common mistakes: unset shell options, unnecessary sudo, using latest tags, and more. Integrate it into your CI/CD pipeline to enforce standards across your team.
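One of the shell-option mistakes Hadolint reports is a pipeline in a RUN instruction without pipefail. The demonstration below shows why it matters, using false | cat as a stand-in for a pipeline whose first command fails. The function names are hypothetical, and bash is assumed to be installed.

```sh
#!/bin/sh
# Compare how a failing pipeline behaves with and without pipefail.
# "false | cat" stands in for "command1 | command2" where command1 fails.

# Without pipefail the pipeline's status is that of cat (0),
# so set -e does not stop the script and the echo still runs.
without_pipefail() {
    bash -c 'set -e; false | cat; echo "build continued"'
}

# With pipefail the pipeline reports the failure of false,
# so set -e aborts before the echo.
with_pipefail() {
    bash -c 'set -e -o pipefail; false | cat; echo "build continued"'
}
```

In a Dockerfile, the equivalent is to declare SHELL ["/bin/bash", "-o", "pipefail", "-c"] before the RUN instruction, on images that include bash.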
COPY copies files from the build context into the image. ADD does the same but also supports extracting tar files and downloading remote URLs. For most use cases, COPY is clearer and safer. Use ADD only when you specifically need tar extraction.
Slow builds are usually caused by cache misses. Check the build output for CACHED markers (or "Using cache" with the legacy builder). If your frequently changing code is copied near the top of the Dockerfile, every code change invalidates all the layers after it. Reorganize instructions to place stable dependencies first and code later. Also make sure your .dockerignore file excludes large directories like node_modules.
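To see which steps missed the cache without reading the whole log, the build output can be filtered. This sketch assumes the line format of BuildKit's plain progress output, where a step appears as "#7 [3/5] COPY . ." and later finishes with "#7 CACHED" or "#7 DONE 0.2s". The function name rebuilt_steps is hypothetical.

```sh
#!/bin/sh
# rebuilt_steps
# Reads a BuildKit plain-progress build log on stdin and prints the
# Dockerfile steps that were executed rather than served from cache.
rebuilt_steps() {
    awk '
        # Remember each numbered Dockerfile step, e.g. "#7 [3/5] COPY . ."
        $1 ~ /^#[0-9]+$/ && $2 ~ /^\[/ && $2 != "[internal]" {
            id = $1
            sub(/^#[0-9]+ /, "")
            step[id] = $0
            next
        }
        # A remembered step that ends with DONE (not CACHED) was run again
        ($1 in step) && $2 == "DONE" { print step[$1] }
    '
}
```

A typical invocation would be docker build --progress=plain -t myapp:test . 2>&1 | rebuilt_steps. If the dependency install shows up in the output after a code-only change, the instruction order is the likely cause.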
To reduce image size, use Alpine or distroless base images, employ multistage builds to discard build tools, remove package manager caches, and keep unnecessary files out of the build context with .dockerignore. Use docker history to identify large layers, then trim what they install or copy. Reductions of 50-80% are achievable with these techniques, depending on the starting image.
Containers should not run as root. If an attacker compromises a container running as root, they hold root privileges inside it and have a much easier path to the host. Create a non-root user in a RUN instruction (for example with adduser) and switch to it with the USER instruction. This limits the damage if the container is breached.