I am trying to create an Apache Spark Docker container, mainly for self-learning purposes.
I know (though not in much detail) that there are vulnerability issues with the log4j-1.2.17 jar.
With that in mind, I have the following Dockerfile (some contents have been stripped):
# ubuntu:20.04 provides at least Python 3.8
FROM ubuntu:20.04
# avoid stuck build due to user prompt
ARG DEBIAN_FRONTEND=noninteractive
# Some apt-get commands here
# Install Spark
ENV SPARK_HOME=/opt/spark
ENV PATH="${SPARK_HOME}/bin:${PATH}"
ENV PATH="${SPARK_HOME}/python:${PATH}"
ENV PYSPARK_PYTHON=python3
ENV JAVA_HOME="/usr/lib/jvm/java-8-openjdk-amd64/jre"
# Download apache spark
# A "RUN cd /" does not persist to later instructions; WORKDIR does
WORKDIR /
RUN wget https://dlcdn.apache.org/spark/spark-3.5.0/spark-3.5.0-bin-hadoop3.tgz
RUN tar zxvf spark-3.5.0-bin-hadoop3.tgz
RUN mv /spark-3.5.0-bin-hadoop3 $SPARK_HOME
RUN rm $SPARK_HOME/jars/log4j-1.2.17.jar <---- FAILS on this line
RUN wget -O $SPARK_HOME/jars/log4j-core-2.17.0.jar https://repo1.maven.org/maven2/org/apache/logging/log4j/log4j-core/2.17.0/log4j-core-2.17.0.jar
RUN wget -O $SPARK_HOME/jars/log4j-api-2.17.0.jar https://repo1.maven.org/maven2/org/apache/logging/log4j/log4j-api/2.17.0/log4j-api-2.17.0.jar
RUN wget -O $SPARK_HOME/jars/log4j-1.2-api-2.17.0.jar https://repo1.maven.org/maven2/org/apache/logging/log4j/log4j-1.2-api/2.17.0/log4j-1.2-api-2.17.0.jar
# Some more commands after this...
I am running the above Dockerfile via a Jenkins build. The build fails with:
13:47:45 #14 0.296 rm: cannot remove '/opt/spark/jars/log4j-1.2.17.jar': No such file or directory
13:47:45 #14 ERROR: process "/bin/sh -c rm $SPARK_HOME/jars/log4j-1.2.17.jar" did not complete successfully: exit code: 1
13:47:45 ------
13:47:45 > [ 9/28] RUN rm /opt/spark/jars/log4j-1.2.17.jar:
13:47:45 0.296 rm: cannot remove '/opt/spark/jars/log4j-1.2.17.jar': No such file or directory
So,
- should I just remove the RUN rm line? or
- run ls first, check in the output whether the jar actually exists, and only then use the rm command?
Any thoughts or pointers on what I might be missing?
2 Answers
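Since Spark 3.3, Log4j 1.x is no longer used (cf. upgrade notes); Spark ships Log4j 2 instead, which is why log4j-1.2.17.jar does not exist in the 3.5.0 distribution. You can therefore remove the four lines that deal with logging (the rm line and the three wget lines).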
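If the same Dockerfile also has to build older Spark releases that may still ship the jar, the removal could be made tolerant of a missing file rather than deleted outright. This is only a minimal sketch, assuming a POSIX shell; the helper name remove_if_present is hypothetical, and the demo uses a temporary directory standing in for $SPARK_HOME/jars rather than a real Spark install:

```shell
#!/bin/sh
# Hypothetical helper: delete a file if it exists, succeed either way.
remove_if_present() {
    # With -f, rm does not treat a nonexistent file as an error.
    rm -f -- "$1"
}

# Demo against a throwaway directory standing in for $SPARK_HOME/jars.
jars_dir=$(mktemp -d)
touch "$jars_dir/log4j-1.2.17.jar"

remove_if_present "$jars_dir/log4j-1.2.17.jar"   # file exists: it is removed
remove_if_present "$jars_dir/log4j-1.2.17.jar"   # file already gone: still succeeds
```

Inside the Dockerfile the equivalent would be a single RUN rm -f $SPARK_HOME/jars/log4j-1.2.17.jar line, which no longer aborts the build when the jar is absent.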
There is no need to replace log4j, because Spark is already using a more recent (and secure) version. See screenshot below. That version does not have the CVE-2021-44228 vulnerability.
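One way to check which Log4j jars an image actually ships is to list them and flag any Log4j 1.x jar. The following is a rough sketch under stated assumptions: the function name check_log4j_jars is hypothetical, the classification relies only on file names, and the demo directory is a stand-in populated with two of the jar names from the question rather than a real Spark install:

```shell
#!/bin/sh
# Hypothetical helper: print the log4j jars found in a directory and
# return non-zero if a Log4j 1.x jar (such as log4j-1.2.17.jar) is present.
check_log4j_jars() {
    dir=$1
    found_v1=0
    for jar in "$dir"/log4j-*.jar; do
        [ -e "$jar" ] || continue        # the glob matched nothing
        name=$(basename "$jar")
        echo "$name"
        case $name in
            log4j-1.2-api-*) ;;          # Log4j 2 bridge for the 1.x API, not Log4j 1.x itself
            log4j-1.*) found_v1=1 ;;     # an actual Log4j 1.x jar
        esac
    done
    return "$found_v1"
}

# Demo with a throwaway directory standing in for $SPARK_HOME/jars.
demo=$(mktemp -d)
touch "$demo/log4j-core-2.17.0.jar" "$demo/log4j-1.2-api-2.17.0.jar"
check_log4j_jars "$demo" && echo "no Log4j 1.x jar found"
```

Against a real image the call would be check_log4j_jars "$SPARK_HOME/jars", for example from a RUN step or an interactive shell in the container.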