Our CI pipeline was embarrassing. Every PR took 8+ minutes to build a Docker image for a Next.js app. Developers complained. I ignored them for two months because “it’s just CI, ship faster code.”
Then we hit 50+ PRs per day and our CI bill jumped to $400/month. Time to actually fix it.
The original Dockerfile (the bad one)
FROM node:18
WORKDIR /app
COPY . .
RUN npm install
RUN npm run build
CMD ["npm", "start"]
Looks innocent. Builds every time though. Every. Single. Time.
The problem: Docker caching works layer by layer. When we COPY . ., we copy EVERYTHING - source code, package.json, node_modules from local dev (if you forgot .dockerignore), your .git directory, random .DS_Store files. Any change to any of those files invalidates the COPY layer and every layer after it, so npm install and the build run again from scratch.
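You can watch this happen locally. A quick sketch (the image tag is just an example):
# first build: every layer runs
docker build -t myapp:test .
# immediate rebuild: every step shows up as CACHED
docker build -t myapp:test .
# touch any file that COPY . . picks up...
touch README.md
# ...and the COPY layer, npm install, and the build all rerun
docker build -t myapp:test .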
Attempt 1: The “obvious” fix
FROM node:18
WORKDIR /app
COPY package*.json ./
RUN npm install
COPY . .
RUN npm run build
CMD ["npm", "start"]
This is what every tutorial shows you. Copy package files first, install deps, then copy code.
Build time: 6 minutes.
Better, but still slow. Why? Because npm install still re-downloads every package whenever package.json or package-lock.json changes, even if you just bumped a version number. And the Next.js build itself is still slow as hell, because nothing from previous builds gets reused.
What actually worked
Multi-stage builds + better caching + BuildKit.
# syntax=docker/dockerfile:1.4

# Stage 1: install dependencies (only reruns when the lockfile changes)
FROM node:18-alpine AS deps
WORKDIR /app
COPY package.json package-lock.json ./
RUN --mount=type=cache,target=/root/.npm \
    npm ci

# Stage 2: build, reusing node_modules from deps
FROM node:18-alpine AS builder
WORKDIR /app
COPY --from=deps /app/node_modules ./node_modules
COPY . .
RUN --mount=type=cache,target=/app/.next/cache \
    npm run build

# Stage 3: minimal runtime image
FROM node:18-alpine AS runner
WORKDIR /app
ENV NODE_ENV=production
RUN addgroup --system --gid 1001 nodejs
RUN adduser --system --uid 1001 nextjs
COPY --from=builder /app/public ./public
COPY --from=builder --chown=nextjs:nodejs /app/.next/standalone ./
COPY --from=builder --chown=nextjs:nodejs /app/.next/static ./.next/static
USER nextjs
EXPOSE 3000
CMD ["node", "server.js"]
Build time: 40 seconds on cache hit, 2 minutes on cache miss.
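If you want to try it locally, the build and run look like this (the myapp:local tag and port mapping are just examples):
# BuildKit is required for the --mount=type=cache lines
DOCKER_BUILDKIT=1 docker build -t myapp:local .
# the standalone server listens on 3000 by default
docker run --rm -p 3000:3000 myapp:local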
Let me break down what’s actually happening here.
The BuildKit magic
RUN --mount=type=cache,target=/root/.npm is the secret sauce. BuildKit (Docker’s newer build system) keeps the npm cache between builds. So when package.json changes, npm doesn’t re-download unchanged packages.
Enable it with:
export DOCKER_BUILDKIT=1
docker build .
Or in CI:
# GitHub Actions
- name: Build
  run: docker build .
  env:
    DOCKER_BUILDKIT: 1
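On newer Docker versions you can skip the env var entirely by going through buildx, which always uses BuildKit. The tag below is just an example:
# buildx always runs through BuildKit, no DOCKER_BUILDKIT needed
docker buildx build -t myapp:dev .
# see how much build cache (including the npm cache mounts) has accumulated
docker buildx du
# reclaim it when it gets too big
docker builder prune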
The multi-stage breakdown
Stage 1 (deps): Install all dependencies (including devDependencies) straight from the lockfile. This stage only reruns when package.json or package-lock.json changes, and even then the npm downloads come from the mounted cache.
Stage 2 (builder): Reuse node_modules from deps, copy the source, and build. This stage gets invalidated most often, but the mounted Next.js build cache keeps rebuilds fast.
Stage 3 (runner): Copy only what we need to run the app. Final image is ~150MB instead of 1.2GB.
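A nice side effect of naming the stages (not something the timings above depend on): --target builds just one of them, which is handy for running tests against the builder image. Tag names here are made up:
# stop at the builder stage, which still has devDependencies installed
docker build --target builder -t myapp:ci .
# run the test suite inside it
docker run --rm myapp:ci npm test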
Next.js specific tricks
Add this to next.config.js:
module.exports = {
  output: 'standalone',
}
This makes Next.js bundle only the files needed to run, not the entire node_modules. Our final image dropped from 850MB to 150MB.
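If you want to see what standalone mode actually emits, a quick local check (paths are the Next.js defaults):
npm run build
# .next/standalone holds server.js plus a pruned node_modules
ls .next/standalone
# it runs on its own; public/ and .next/static still need to be copied
# alongside it, which is exactly what the runner stage above does
node .next/standalone/server.js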
Also cache the Next.js build:
RUN --mount=type=cache,target=/app/.next/cache \
    npm run build
Subsequent builds reuse unchanged pages. Huge win for large apps.
The .dockerignore nobody writes
node_modules
.next
.git
.gitignore
README.md
.dockerignore
.env*.local
*.log
.DS_Store
coverage
.vscode
We forgot this initially. Literally copied 800MB of local node_modules into the build context, then ran npm install on top of it. Build times were inconsistent and weird. This file fixed that.
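A cheap way to confirm .dockerignore is doing its job is to watch the context size BuildKit reports at the start of a build. A rough sketch:
# with plain progress output, the context transfer size is easy to grep for
docker build --progress=plain . 2>&1 | grep "transferring context"
# before .dockerignore this was hundreds of MB; after, a few MB of source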
Cache invalidation strategy
The order matters:
- Copy package files → Install deps (cached unless package.json changes)
- Copy source code → Build app (deps cache still valid)
- Copy build artifacts → Final image (smallest possible)
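To double-check that the layers actually stack this way, docker history lists each layer with the instruction that created it and its size (use whatever tag you built):
# inspect layer-by-layer sizes of the final image
docker history myapp:latest
# compare total image sizes across tags
docker image ls myapp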
We also tag our images by git commit:
docker build -t myapp:$(git rev-parse --short HEAD) .
CI can check if an image already exists before building:
if docker pull myapp:$GIT_COMMIT 2>/dev/null; then
  echo "Image exists, skipping build"
else
  docker build -t myapp:$GIT_COMMIT .
fi
Saves ~30% of builds in our workflow.
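One refinement worth considering (a sketch, not something we run yet): docker manifest inspect checks the registry without pulling the whole image, so the existence check doesn't cost a full download. Older Docker CLIs need experimental features enabled for this command.
# look for the manifest only; the exit code tells us whether the tag exists
if docker manifest inspect myapp:$GIT_COMMIT > /dev/null 2>&1; then
  echo "Image exists, skipping build"
else
  docker build -t myapp:$GIT_COMMIT .
fi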
Layer caching in CI
GitHub Actions example:
- name: Set up Docker Buildx
  uses: docker/setup-buildx-action@v2

- name: Cache Docker layers
  uses: actions/cache@v3
  with:
    path: /tmp/.buildx-cache
    key: ${{ runner.os }}-buildx-${{ github.sha }}
    restore-keys: |
      ${{ runner.os }}-buildx-

- name: Build
  uses: docker/build-push-action@v4
  with:
    context: .
    push: false
    cache-from: type=local,src=/tmp/.buildx-cache
    cache-to: type=local,dest=/tmp/.buildx-cache-new,mode=max

- name: Move cache
  run: |
    rm -rf /tmp/.buildx-cache
    mv /tmp/.buildx-cache-new /tmp/.buildx-cache
This persists BuildKit cache between CI runs. First build: 4 minutes. Subsequent builds: 45 seconds.
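The rm/mv step at the end is the usual workaround for the local cache growing without bound. If that feels clunky, buildx can also export the cache to a registry tag instead (the buildcache ref below is hypothetical), and docker/build-push-action additionally supports type=gha to use GitHub's own cache backend:
# export/import the BuildKit cache via a registry tag instead of local files
docker buildx build \
  --cache-to type=registry,ref=registry.example.com/myapp:buildcache,mode=max \
  --cache-from type=registry,ref=registry.example.com/myapp:buildcache \
  -t myapp:$GIT_COMMIT .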
What we saved
- CI time: 8 min → 40 sec (~92% reduction)
- CI cost: $400/month → $150/month
- Image size: 1.2GB → 150MB
- Developer happiness: measurably improved
The image size reduction also made deployments faster. Pulling a 150MB image vs 1.2GB adds up when you’re deploying 20 times a day.
Would I do anything differently?
Yeah, I’d fix it two months earlier. But seriously, the only thing I’d add now is better layer caching in our Kubernetes clusters. We’re looking at using a registry proxy to cache layers closer to the cluster.
Also considering moving from npm to pnpm. Its cache strategy is better, and its fetch command can install from the lockfile alone, which fits Docker layer caching nicely.
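If we do switch, the rough shape (a sketch, not something we've tested in CI yet) is that pnpm fetch needs only the lockfile, so dependency downloads get their own cheap layer:
# pnpm fetch fills the package store from pnpm-lock.yaml alone, so a Dockerfile
# can COPY just the lockfile before running it
pnpm fetch
# later, after COPY . ., install entirely from that store without the network
pnpm install --offline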
But for now, this works. Builds are fast, images are small, developers stopped complaining. That’s a win in my book.