How to Build a Docker Image: Dockerfile Best Practices

A Dockerfile is a text file containing the instructions Docker follows to build an image. Instructions that change the filesystem (RUN, COPY, ADD) each add a layer, the others record metadata, and the final image is a portable snapshot of your application with all its dependencies. Efficient Dockerfiles reduce image size, speed up builds, and shrink the attack surface, all of which matter for production deployments.

Understanding Docker Image Layers and Caching

Docker images aren't monolithic blobs. They're built as stacked layers, and understanding this is fundamental to writing efficient Dockerfiles. When you run docker build, Docker executes each instruction in order. If an instruction and its inputs are unchanged since the last build, Docker reuses the cached result instead of rebuilding it, which saves significant time.

Here's the catch: cache invalidation is strict. For RUN, Docker compares the instruction text; for COPY and ADD, it also compares checksums of the files being copied. If you modify line 5 in your Dockerfile, Docker discards the cache for line 5 and all subsequent lines. This means the order of instructions matters enormously. Place stable commands (like installing system packages) near the top, and frequently changing commands (like copying source code) near the end.

A practical example: if you install dependencies in line 3 and copy your code in line 10, changing your code invalidates only layers 10 and beyond. But if you reverse this order, changing code invalidates the entire dependency layer—forcing a complete reinstall every build.
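
To make the invalidation rule concrete, here is a toy model in shell. It is only a sketch: layer_keys is a hypothetical helper, and chaining cksum values is a stand-in for Docker's real cache keys, which are computed differently. The bracketed marker on the last step stands for the checksum of the copied files.

```shell
#!/bin/sh
# Toy model: each step's key is derived from the previous key plus the step's
# own text, so a change at one step alters that key and every key after it.
# This is NOT Docker's actual cache-key algorithm.
layer_keys() {
    prev="base"
    while IFS= read -r step; do
        prev=$(printf '%s\n%s' "$prev" "$step" | cksum | cut -d' ' -f1)
        printf '%s  %s\n' "$prev" "$step"
    done
}

before=$(printf '%s\n' \
    'FROM node:18-alpine' \
    'COPY package*.json ./' \
    'RUN npm ci' \
    'COPY . . [files: v1]' | layer_keys)

after=$(printf '%s\n' \
    'FROM node:18-alpine' \
    'COPY package*.json ./' \
    'RUN npm ci' \
    'COPY . . [files: v2]' | layer_keys)

echo "before the code change:"; echo "$before"
echo "after the code change:";  echo "$after"
```

The first three keys are identical in both runs and only the final COPY key differs, which mirrors why a code change leaves the dependency layer cached.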

The Optimal Dockerfile Structure

Here's a production-grade Dockerfile pattern for a Node.js application:

FROM node:18-alpine

WORKDIR /app

# Copy package files first (stable layer)
COPY package*.json ./

# Install production dependencies only (--omit=dev replaces the deprecated --only=production)
RUN npm ci --omit=dev

# Copy application code (changes frequently)
COPY . .

# Expose port
EXPOSE 3000

# Health check
HEALTHCHECK --interval=30s --timeout=3s --start-period=5s --retries=3 \
  CMD node healthcheck.js

# Run application
CMD ["node", "server.js"]

This structure follows the principle of least volatility: the inputs that change least often come first. Package files rarely change, so they're copied first. Application code changes frequently, so it comes later. When you update code, only the layers from COPY . . onward are rebuilt, and the dependency layer stays cached.

Base Image Selection and Size Optimization

Your choice of base image significantly impacts final image size and attack surface. Alpine Linux images (alpine, alpine:3.18) are minimal, with a base under 10 MB. Full distributions like Ubuntu are larger but include more tools and libraries, and they use glibc where Alpine uses musl, which matters for some precompiled binaries.

For most applications, Alpine or distroless images are a good default. Distroless images contain only your application and its runtime: no shell, no package manager. They're small, and an attacker who gets in has few tools to work with.

Compare these base images for Node.js:
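
Exact sizes vary by tag and CPU architecture, so it can be more reliable to measure than to quote numbers. The sketch below defines image_size_mb, a hypothetical helper (not a Docker command) that reads the size reported by docker image inspect. It assumes the image has already been pulled.

```shell
#!/bin/sh
# Print the size of a local image in whole megabytes (1 MB = 1,000,000 bytes).
# image_size_mb is a hypothetical helper name, not part of the Docker CLI.
image_size_mb() {
    bytes=$(docker image inspect --format '{{.Size}}' "$1") || return 1
    echo "$1: $((bytes / 1000000)) MB"
}

# Usage, comparing three official Node.js variants:
#   for img in node:18 node:18-slim node:18-alpine; do
#       docker pull -q "$img" >/dev/null && image_size_mb "$img"
#   done
```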

If you need build tools (compilers, git), use a multistage build to keep the final image small:

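FROM node:18-alpine AS builder
WORKDIR /app
# Build tools for native addons; they exist only in this stage
RUN apk add --no-cache python3 make g++
COPY package*.json ./
RUN npm ci --omit=dev

FROM node:18-alpine
WORKDIR /app
COPY --from=builder /app/node_modules ./node_modules
# Exclude node_modules in .dockerignore so this does not overwrite the copied modules
COPY . .
CMD ["node", "server.js"]

In this pattern, the builder stage has everything needed to compile dependencies. The final stage copies only the installed node_modules, leaving the build tools behind. Both stages use the same Alpine base because native addons compiled against glibc (as in node:18) generally fail to load on Alpine's musl. The result is a small, production-ready image.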

Security Best Practices in Dockerfiles

Dockerfiles create security vulnerabilities if not written carefully. Here's what to prioritize:

Never Run as Root

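By default, containers run as the root user. If an attacker compromises the application, they hold root inside the container, which makes it far easier to tamper with its files or to exploit a container escape and reach the host. Always run as a non-root user. The official node images already ship a node user you can switch to; the example here creates a dedicated one to show the general pattern: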

FROM node:18-alpine

RUN addgroup -g 1001 -S nodejs && \
    adduser -S nodejs -u 1001 -G nodejs

WORKDIR /app
# Install dependencies before copying source so this layer stays cached
COPY package*.json ./
RUN npm ci --omit=dev
COPY --chown=nodejs:nodejs . .

USER nodejs

CMD ["node", "server.js"]

The --chown flag ensures the nodejs user owns the copied files. The USER instruction switches to this non-root user before running the application.
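
A quick way to confirm the USER instruction took effect is to read it back from the image configuration. This is a minimal sketch: has_nonroot_user is a hypothetical helper name, and it inspects only the image's default user, which docker run --user can still override at runtime.

```shell
#!/bin/sh
# Succeed only if the image's configured default user is not root.
# An empty Config.User means no USER instruction was set, so the default is root.
has_nonroot_user() {
    user=$(docker image inspect --format '{{.Config.User}}' "$1") || return 2
    case "$user" in
        ''|root|0|root:*|0:*) echo "$1 runs as root"; return 1 ;;
        *) echo "$1 runs as $user"; return 0 ;;
    esac
}

# Usage: has_nonroot_user myapp:1.0 || echo "fix the Dockerfile before shipping"
```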

Minimize Layers and Clean Up

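Layers are additive: a file deleted by a later instruction still occupies space in the layer that created it. Install and clean up in the same RUN instruction so temporary files never reach the image: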

# Bad
RUN apt-get update
RUN apt-get install -y curl git
RUN apt-get clean

# Good
RUN apt-get update && \
    apt-get install -y curl git && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/*

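The good approach combines the package index update, installation, and cleanup into one layer, so the package lists and caches are removed before the layer is committed. It also prevents a stale cached apt-get update layer from being reused with a changed install command.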

Use COPY Instead of ADD

ADD can also extract local tar archives and fetch remote URLs. That implicit behavior is easy to trigger by accident, and a downloaded file is not verified unless you ask for it. Stick with COPY for explicit, transparent file operations:

# Prefer this
COPY app.js ./

# Avoid this
ADD https://example.com/app.js ./

Scan for Vulnerabilities

Use Docker Scout (bundled with Docker Desktop and installable as a CLI plugin elsewhere) or Trivy to scan images for known vulnerabilities:

docker build -t myapp:1.0 .
docker scout cves myapp:1.0
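
With Trivy, a similar scan can double as a CI gate. A minimal sketch, assuming the trivy CLI is installed; scan_image is a hypothetical wrapper name, and the severity threshold is a choice you should adapt to your own policy.

```shell
#!/bin/sh
# Exit non-zero when the image has HIGH or CRITICAL findings,
# so a CI job that calls this function fails on serious vulnerabilities.
scan_image() {
    trivy image --severity HIGH,CRITICAL --exit-code 1 "$1"
}

# Usage: scan_image myapp:1.0 || echo "blocking release"
```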

Performance Optimization Techniques

Beyond caching strategy, several techniques speed up builds and reduce image size:

Use .dockerignore Files

Similar to .gitignore, .dockerignore excludes files from the build context, so they are never sent to the builder or picked up by COPY . .:

node_modules
npm-debug.log
.git
.env
dist
build
.DS_Store
*.md

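This reduces build context size, which speeds up every build, and keeps local artifacts and secrets such as .env out of the image. Exclude dist and build only if those directories are regenerated inside the image.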

Leverage BuildKit for Advanced Caching

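Docker's BuildKit engine (the default builder since Docker Engine 23.0) can export the build cache to external storage and import it again on a later build:

docker buildx build -t myapp:1.0 \
  --cache-from=type=local,src=/tmp/docker-cache \
  --cache-to=type=local,dest=/tmp/docker-cache \
  .

This preserves the cache between builds, which matters in CI/CD pipelines where images are built repeatedly on fresh runners. The local cache exporter does not work with the default docker driver unless the containerd image store is enabled. Otherwise, create a builder first with docker buildx create --use, and add --load to keep the built image in the local image store.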

Use Specific Base Image Tags

Never use latest in production Dockerfiles. Always pin versions:

# Bad
FROM node

# Good
FROM node:18.17.1-alpine3.18

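Pinning makes builds repeatable. An untagged FROM resolves to latest, which moves whenever a new release is published, so the same Dockerfile can produce a different image tomorrow, breaking your build or introducing vulnerabilities. Version tags can be re-pushed too, so pin by digest when you need a stronger guarantee.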
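
A check like this is easy to automate. The sketch below defines unpinned_from, a hypothetical helper that lists FROM lines with no tag or with :latest. It is deliberately simple: it skips scratch, earlier build stages, and digest-pinned images, and it does not handle ARG-based tags or registries with a port number.

```shell
#!/bin/sh
# Print every FROM line whose image has no tag or uses :latest.
# Output format: <file>:<line number>: <offending line>
unpinned_from() {
    awk 'toupper($1) == "FROM" {
        i = 2
        if ($i ~ /^--platform=/) i++          # skip an optional --platform flag
        image = $i
        known = (image in stages)             # refers to an earlier build stage
        if (toupper($(i + 1)) == "AS") stages[$(i + 2)] = 1
        if (known || image == "scratch" || image ~ /@sha256:/) next
        if (image !~ /:/ || image ~ /:latest$/) print FILENAME ":" NR ": " $0
    }' "$1"
}

# Usage: unpinned_from Dockerfile
```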

Common Dockerfile Mistakes to Avoid

Even experienced developers slip up. Here are frequent pitfalls:

Multi-Stage Build Example: Python Application

Here's a complete multi-stage Dockerfile for a Python Flask app:

FROM python:3.11-slim AS builder

WORKDIR /build
COPY requirements.txt .
RUN pip install --user --no-cache-dir -r requirements.txt

FROM python:3.11-slim

RUN groupadd -r appuser && useradd -r -g appuser -m appuser

WORKDIR /app
COPY --from=builder --chown=appuser:appuser /root/.local /home/appuser/.local
COPY --chown=appuser:appuser . .

ENV PATH=/home/appuser/.local/bin:$PATH

USER appuser

EXPOSE 5000

# python:3.11-slim does not include curl, so probe with the standard library
HEALTHCHECK --interval=30s --timeout=3s \
  CMD python -c "import urllib.request; urllib.request.urlopen('http://localhost:5000/health', timeout=2)" || exit 1

CMD ["python", "-m", "flask", "run", "--host=0.0.0.0"]

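The builder stage installs the Python dependencies into a user-level directory (/root/.local). The final stage copies only those installed packages, leaving pip's cache and any build leftovers behind. The result is a lean image that runs as a non-root user. Note that flask run starts Flask's development server; for production traffic, swap the CMD for a WSGI server such as gunicorn.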

Testing and Validating Your Dockerfile

Before pushing to production, validate your Dockerfile:

# Build the image
docker build -t myapp:test .

# Run interactively to verify
docker run -it myapp:test /bin/sh

# Check image size
docker images myapp:test

# Scan for vulnerabilities
docker scout cves myapp:test

# Inspect layers
docker history myapp:test

docker history shows each layer's size and creation command. Use it to identify bloated layers and opportunities for optimization.

Dockerfile Linting and Automation

Tools like Hadolint catch Dockerfile issues before build time:

docker run --rm -i hadolint/hadolint < Dockerfile

Hadolint catches common mistakes: missing pipefail in piped RUN commands, unnecessary sudo, latest tags, and more. Integrate it into your CI/CD pipeline to enforce standards across your team.
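
One way to wire this into a pipeline is a small gate that lints first and builds only if the lint passes. A sketch, assuming Docker is available on the CI runner; lint_and_build is a hypothetical name, and the second argument defaults to a file called Dockerfile in the current directory.

```shell
#!/bin/sh
# Lint the Dockerfile with Hadolint, then build only if the lint passed.
lint_and_build() {
    image="$1"
    dockerfile="${2:-Dockerfile}"
    if ! docker run --rm -i hadolint/hadolint < "$dockerfile"; then
        echo "lint failed: $dockerfile" >&2
        return 1
    fi
    docker build -t "$image" -f "$dockerfile" .
}

# Usage: lint_and_build myapp:test
```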

Frequently Asked Questions

What's the difference between COPY and ADD?

COPY simply copies files from the host to the image. ADD does the same but also supports extracting tar files and downloading remote URLs. For most use cases, COPY is clearer and safer. Use ADD only when you specifically need tar extraction.

Why does my Docker build take so long?

Cache misses are usually responsible. Check the build output for CACHED markers (BuildKit) or "Using cache" messages (the legacy builder); any step without one was rebuilt. If your frequently changing code is copied near the top of the Dockerfile, every build invalidates the layers after it. Reorganize instructions to place stable dependencies first and code later. Also make sure your .dockerignore file excludes large directories like node_modules so they aren't sent with the build context.

How do I reduce my Docker image size?

Use Alpine base images or distroless variants, employ multistage builds to discard build tools, remove package manager caches, and avoid copying unnecessary files with .dockerignore. Use docker history to identify large layers, then optimize or split them. Reductions of 50-80% are realistic when starting from a full distribution base image, though results vary by application.

Is it safe to run containers as root?

No. If an attacker compromises a container running as root, they hold root inside the container, and a container escape or misconfigured mount can then expose the host. Always create a non-root user (for example with adduser in a RUN instruction) and switch to it with the USER instruction. This limits the damage if the container is breached.