Our CI pipeline was embarrassing. Every PR took 8+ minutes to build a Docker image for a Next.js app. Developers complained. I ignored them for two months because “it’s just CI, ship faster code.”

Then we hit 50+ PRs per day and our CI bill hit $400/month. Time to actually fix it.

The original Dockerfile (the bad one)

FROM node:18

WORKDIR /app
COPY . .
RUN npm install
RUN npm run build

CMD ["npm", "start"]

Looks innocent. Builds every time though. Every. Single. Time.

The problem: Docker caching works layer by layer. When we COPY . ., we copy EVERYTHING - source code, package.json, node_modules from local dev (if you forgot .dockerignore), your .git directory, random .DS_Store files. Any change to any of those files invalidates the COPY layer and every layer after it, so npm install and npm run build rerun from scratch on every single commit.
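
Here's the same Dockerfile annotated with what the cache does the moment you touch any file:

FROM node:18         # cached
WORKDIR /app         # cached
COPY . .             # busted by ANY file change
RUN npm install      # reruns, re-downloads every package
RUN npm run build    # reruns, full rebuild
CMD ["npm", "start"]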

Attempt 1: The “obvious” fix

FROM node:18

WORKDIR /app
COPY package*.json ./
RUN npm install
COPY . .
RUN npm run build

CMD ["npm", "start"]

This is what every tutorial shows you. Copy package files first, install deps, then copy code.

Build time: 6 minutes.

Better, but still slow. Why? Because npm install re-downloads everything whenever package.json changes, even if you just bumped the version number. And the Next.js build still runs from scratch on every code change, which is slow as hell.
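
Annotated the same way, here's what a code-only change does to it:

FROM node:18              # cached
WORKDIR /app              # cached
COPY package*.json ./     # cached unless package.json changes
RUN npm install           # cached on code-only changes; full re-download otherwise
COPY . .                  # busted by any code change
RUN npm run build         # always reruns, no Next.js build cache
CMD ["npm", "start"]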

What actually worked

Multi-stage builds + better caching + BuildKit.

# syntax=docker/dockerfile:1.4

# Stage 1: production dependencies only (changes rarely)
FROM node:18-alpine AS deps
WORKDIR /app
COPY package.json package-lock.json ./
RUN --mount=type=cache,target=/root/.npm \
    npm ci --only=production

# Stage 2: full install (incl. devDependencies) + build
FROM node:18-alpine AS builder
WORKDIR /app
COPY package.json package-lock.json ./
RUN --mount=type=cache,target=/root/.npm \
    npm ci
COPY . .
RUN --mount=type=cache,target=/app/.next/cache \
    npm run build

# Stage 3: minimal runtime image, non-root user
FROM node:18-alpine AS runner
WORKDIR /app
ENV NODE_ENV=production

RUN addgroup --system --gid 1001 nodejs
RUN adduser --system --uid 1001 nextjs

COPY --from=builder /app/public ./public
COPY --from=builder --chown=nextjs:nodejs /app/.next/standalone ./
COPY --from=builder --chown=nextjs:nodejs /app/.next/static ./.next/static

USER nextjs
EXPOSE 3000
CMD ["node", "server.js"]

Build time: 40 seconds on cache hit, 2 minutes on cache miss.

Let me break down what’s actually happening here.

The BuildKit magic

RUN --mount=type=cache,target=/root/.npm is the secret sauce. BuildKit (Docker’s newer build system) keeps the npm cache between builds. So when package.json changes, npm doesn’t re-download unchanged packages.

Enable it with:

export DOCKER_BUILDKIT=1
docker build .
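
On Docker Engine 23+ BuildKit is already the default builder, so the export only matters on older engines. You can also turn it on permanently in /etc/docker/daemon.json and restart the daemon:

{
  "features": {
    "buildkit": true
  }
}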

Or in CI:

# GitHub Actions
- name: Build
  run: docker build .
  env:
    DOCKER_BUILDKIT: 1

The multi-stage breakdown

Stage 1 (deps): Install only production dependencies. These rarely change.

Stage 2 (builder): Install all dependencies (including devDependencies), then build. This layer gets invalidated most often, but we cache the npm downloads and Next.js build cache.

Stage 3 (runner): Copy only what we need to run the app. Final image is ~150MB instead of 1.2GB.
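
A nice side effect of named stages: you can build and inspect a single stage when something breaks, and quickly check the size win (myapp:debug is just an example tag):

# Build only the builder stage, e.g. to poke around in a failed build
docker build --target builder -t myapp:debug .

# Compare image sizes across tags
docker image ls myapp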

Next.js specific tricks

Add this to next.config.js:

module.exports = {
  output: 'standalone',
}

This makes Next.js bundle only the files needed to run, not the entire node_modules. Our final image dropped from 850MB to 150MB.
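
The standalone output is also why the runner stage copies three paths: .next/standalone (a minimal server plus only the node_modules it actually needs), public, and .next/static. Next.js doesn't copy static assets into the standalone folder, so you have to place them next to it yourself. You can sanity-check the output locally before touching Docker:

npm run build
# the standalone server lands in .next/standalone
node .next/standalone/server.js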

Also cache the Next.js build:

RUN --mount=type=cache,target=/app/.next/cache \
    npm run build

Subsequent builds reuse unchanged pages. Huge win for large apps.

The .dockerignore nobody writes

node_modules
.next
.git
.gitignore
README.md
.dockerignore
.env*.local
*.log
.DS_Store
coverage
.vscode

We forgot this initially. We were literally copying 800MB of local node_modules into the build context, then running npm install on top of it. Just sending that context to the daemon took ages, and build times were inconsistent and weird. This file fixed that.
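
A quick gut check on what the ignore file keeps out of the build context:

du -sh node_modules .next .git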

Cache invalidation strategy

The order matters:

  1. Copy package files → Install deps (cached unless package.json changes)
  2. Copy source code → Build app (deps cache still valid)
  3. Copy build artifacts → Final image (smallest possible)

We also tag our images by git commit:

docker build -t myapp:$(git rev-parse --short HEAD) .

CI can check if an image already exists before building:

if docker pull myapp:$GIT_COMMIT 2>/dev/null; then
  echo "Image exists, skipping build"
else
  docker build -t myapp:$GIT_COMMIT .
fi

Saves ~30% of builds in our workflow.
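
In GitHub Actions, for example, $GIT_COMMIT can come straight from the built-in github.sha context (shorten it if you tag with short SHAs):

# GitHub Actions: take the commit SHA from the built-in context
env:
  GIT_COMMIT: ${{ github.sha }}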

Layer caching in CI

GitHub Actions example:

- name: Set up Docker Buildx
  uses: docker/setup-buildx-action@v2

- name: Cache Docker layers
  uses: actions/cache@v3
  with:
    path: /tmp/.buildx-cache
    key: ${{ runner.os }}-buildx-${{ github.sha }}
    restore-keys: |
      ${{ runner.os }}-buildx-

- name: Build
  uses: docker/build-push-action@v4
  with:
    context: .
    push: false
    cache-from: type=local,src=/tmp/.buildx-cache
    cache-to: type=local,dest=/tmp/.buildx-cache-new,mode=max

- name: Move cache
  run: |
    rm -rf /tmp/.buildx-cache
    mv /tmp/.buildx-cache-new /tmp/.buildx-cache

This persists BuildKit cache between CI runs. First build: 4 minutes. Subsequent builds: 45 seconds.
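
If you're on a recent Buildx, the GitHub Actions cache backend gets the same result without the move-cache workaround; a sketch using the same build-push-action:

- name: Build
  uses: docker/build-push-action@v4
  with:
    context: .
    push: false
    cache-from: type=gha
    cache-to: type=gha,mode=max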

What we saved

  • CI time: 8min → 40sec (~92% reduction)
  • CI cost: $400/month → $150/month
  • Image size: 1.2GB → 150MB
  • Developer happiness: measurably improved

The image size reduction also made deployments faster. Pulling a 150MB image vs 1.2GB adds up when you’re deploying 20 times a day.

Would I do anything differently?

Yeah, I’d fix it two months earlier. But seriously, the only thing I’d add now is better layer caching in our Kubernetes clusters. We’re looking at using a registry proxy to cache layers closer to the cluster.

Also considering moving from npm to pnpm. Its caching story is better, and pnpm fetch is designed for Docker layer caching: it works from the lockfile alone, so the deps layer only busts when the lockfile changes.
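
A rough sketch of what the install steps could look like with pnpm (untested on our app; assumes pnpm comes via corepack and a pnpm-lock.yaml is checked in):

FROM node:18-alpine AS deps
WORKDIR /app
RUN corepack enable                 # Node 18 ships corepack, which provides pnpm
COPY pnpm-lock.yaml ./
RUN pnpm fetch                      # downloads packages using only the lockfile
COPY . .
RUN pnpm install --offline          # installs from the local store, no network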

But for now, this works. Builds are fast, images are small, developers stopped complaining. That’s a win in my book.