
I’m a little new to Docker and trying to wrap my head around some of the concepts.

In a lot of tutorials and articles (actually, almost all of them), this is a typical Dockerfile for a create-react-app and nginx configuration:

# CRA
FROM node:alpine as build-deps
WORKDIR /usr/src/app
COPY package.json package-lock.json ./
RUN npm install
COPY . ./
RUN npm run build

# Nginx
FROM nginx:1.12-alpine
COPY --from=build-deps /usr/src/app/build /usr/share/nginx/html
EXPOSE 80
CMD ["nginx", "-g", "daemon off;"]

Assuming everything works as expected, the image would be huge.

I had the idea of taking a slightly different approach. Run npm install && npm run build locally, and then have this as the Dockerfile:

FROM nginx:1.12-alpine
WORKDIR /opt/app-root

COPY ./nginx/nginx.conf /etc/nginx/
COPY ./build ./src/dist/

COPY ./node_modules .

USER 1001
EXPOSE 8080
ENTRYPOINT ["nginx", "-g", "daemon off;"]

Which approach is better? Whenever I run docker build -t app-test:0.0.1 ., it seems to me that the second approach is always faster.
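For reference, a quick way to compare the two approaches is to time each build and check the reported image size afterwards, for example (the tag is just the one from my build command above):

time docker build -t app-test:0.0.1 .
docker images app-test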

5 Answers


  1. Should the Dockerfile execute “npm install” and “npm run build”, or should it only copy those files over?

    TL;DR: It should always execute all necessary build commands in the "build stage" of the multi-stage image!

    Long answer:

    In the first, "tutorial-style" Dockerfile that you posted, a multi-stage build is used. With multi-stage builds you can discard artifacts created in previous stages and keep only the files and changes in the final image that you really need. In this case, the installed "dev" packages are never copied into the final image and thus consume no space there. The final image only contains the build folder, i.e. the compiled output required at runtime, without any of the dev dependencies that were needed in the first stage to compile the project.

    In your second approach, you run npm install && npm run build outside your Dockerfile and then copy the results into the final image. While this works, from a DevOps perspective it is not a good idea, since you want to keep all required build instructions consistently in one place (preferably in one Dockerfile), so the next person building your image does not have to figure out how the compilation process works. Another problem with copying the build results from your local machine is that you may be running a different OS with a different Node version etc., which can affect the build result. If you instead, as in the "tutorial" Dockerfile, perform the build within the Dockerfile, you have full control over the OS and the environment (Node version, node-sass libraries etc.), and everybody executing docker build will get the same compilation result (provided that you pinned the Node version of your base image, i.e. FROM node:14.15.4-alpine as build-deps instead of merely FROM node:alpine as build-deps).
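    For illustration, a pinned variant of the tutorial Dockerfile could look roughly like this (the exact Node tag is just an example, and swapping npm install for npm ci is merely a suggestion so the install follows package-lock.json exactly):

    # Build stage: pinned Node version so every build uses the same toolchain
    FROM node:14.15.4-alpine as build-deps
    WORKDIR /usr/src/app
    COPY package.json package-lock.json ./
    # npm ci installs exactly what package-lock.json specifies
    RUN npm ci
    COPY . ./
    RUN npm run build

    # Runtime stage: only the compiled static files are carried over
    FROM nginx:1.12-alpine
    COPY --from=build-deps /usr/src/app/build /usr/share/nginx/html
    EXPOSE 80
    CMD ["nginx", "-g", "daemon off;"]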

    One last note on the evolution of Dockerfiles. In the past, the usual way was indeed to perform the compilation outside the Dockerfile (or in a separate Dockerfile) and then copy the results into the final image, which matches the second approach in your OP. But because of all the shortcomings mentioned above, Docker introduced multi-stage builds in 2017. Here are some enlightening quotes from the Docker blog:

    Before multi-stage build, Docker users would use a script to compile the applications on the host machine, then use Dockerfiles to build the images. Multi-stage builds, however, facilitate the creation of small and significantly more efficient containers since the final image can be free of any build tools. [And] External scripts are no longer needed to orchestrate a build.

    The same idea is reiterated in the official docs:

    It was actually very common to have one Dockerfile to use for development (which contained everything needed to build your application), and a slimmed-down one to use for production, which only contained your application and exactly what was needed to run it. This has been referred to as the “builder pattern”. Maintaining two Dockerfiles is not ideal. […] Multi-stage builds vastly simplify this situation! […] You only need the single Dockerfile. You don’t need a separate build script, either. Just run docker build. The end result is the same tiny production image as before, with a significant reduction in complexity. You don’t need to create any intermediate images and you don’t need to extract any artifacts to your local system at all.

  2. If you don’t need the node_modules tree (and for Nginx hosting a browser application, you don’t), the second approach of just copying in the built application is fine.

    There are a couple of reasons to specifically want the first approach, i.e. to run the build in Docker. If there are architecture-specific details in your build (Node packages with native extensions, for example), the Docker image can be a different OS and library stack than your host system, so you might not be able to directly copy in a node_modules directory. And if your build depends on very specific patch-level behavior of the language runtime, you can force an exact version of Node in the Dockerfile.

    Almost every browser application I’ve worked on builds fine with whatever node binary I have lying around, and once you’ve done that, the dist tree is platform-independent static files.
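    If you do go the "build locally, copy the result" route, a trimmed version of your second Dockerfile, without the node_modules copy, might look roughly like this (paths follow your example and the port/user settings are omitted for simplicity; the config and served root are assumptions on my part):

    FROM nginx:1.12-alpine
    # Optional custom config; the root it serves must match the COPY destination below
    COPY ./nginx/nginx.conf /etc/nginx/nginx.conf
    # Only the locally built static files are needed at runtime
    COPY ./build /usr/share/nginx/html
    EXPOSE 80
    CMD ["nginx", "-g", "daemon off;"]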

    "Normal" style seems to vary across languages. Java applications in particular seem to generally build the application outside Docker, and then COPY the (platform-independent) final .jar file into the image. Go tends to use a multi-stage build, copying a built binary into an extremely minimal final image. If I was writing a Node browser application it’d probably look like your first form, RUN yarn build in Docker, but I’ve seen a lot of variations on the theme.

  3. IMO the first option is better from a containerization perspective, because you don't need npm (or Node) installed on your laptop to build and run your application. You just need Docker.

  4. For that you could use a multi-stage Docker build.

    In the first stage you would install all dependencies (including the dev dependencies) and then run npm run build. This builds your app, but leaves useless dev dependencies inside node_modules; you don't have to copy that node_modules folder into the final image.

    In the second stage you would run npm install --production --no-audit and copy the dist directory from the first stage. Now you have your compiled code and a node_modules folder with only production modules.

    This makes the image lighter, but the build takes a bit longer.
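    A rough sketch of that layout, assuming a Node-based runtime image and an entry point under dist/ (both are illustrative, and this pattern is for apps that actually need node_modules at runtime):

    # Stage 1: install everything (including dev dependencies) and compile
    FROM node:14-alpine as builder
    WORKDIR /app
    COPY package.json package-lock.json ./
    RUN npm install
    COPY . .
    RUN npm run build

    # Stage 2: production dependencies only, plus the compiled output
    FROM node:14-alpine
    WORKDIR /app
    COPY package.json package-lock.json ./
    RUN npm install --production --no-audit
    COPY --from=builder /app/dist ./dist
    CMD ["node", "dist/index.js"]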

  5. Building inside a container guarantees a predictable and reproducible build artifact. Running npm install on macOS and on Linux can produce different node_modules, for example when packages with native addons are compiled via node-gyp.

    People often build node_modules in a multi-stage build even if the actual container you're trying to build is not a Node.js application. That is to say, your nginx application per se does not depend on Node.js, but it does depend on node_modules and the files built from them. So you generate node_modules (and the build output) in a node container, and copy the result into the new container (nginx).

    Thus, everyone building with the multi-stage Dockerfile will produce the exact same image. If you instead copy your local node_modules into the container during the build, your coworkers cannot predict what that node_modules actually contains.
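    One quick way to see the difference: the host's Node version is whatever happens to be installed, while a pinned build image always reports the same one (the tag below is just an example):

    # On the host: depends on the machine
    node --version
    # Inside the pinned build image: identical for everyone
    docker run --rm node:14.15.4-alpine node --version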
