
I am trying to create an Apache Spark Docker container, mainly for self-learning purposes.
I know (not in much detail) that there are vulnerability issues with the log4j 1.2.17 jar.
With that in mind, I have the following Dockerfile (some contents have been stripped out):

# for having at least Python 3.8 (inline comments are not valid in Dockerfiles)
FROM ubuntu:20.04

# avoid stuck build due to user prompt
ARG DEBIAN_FRONTEND=noninteractive

# Some apt-get commands here

# Install Spark
ENV SPARK_HOME=/opt/spark
ENV PATH="${SPARK_HOME}/bin:${PATH}"
ENV PATH="${SPARK_HOME}/python:${PATH}"
ENV PYSPARK_PYTHON=python3
ENV JAVA_HOME="/usr/lib/jvm/java-8-openjdk-amd64/jre"

# Download apache spark
# each RUN starts in a fresh shell, so use WORKDIR instead of "RUN cd /"
WORKDIR /
RUN wget https://dlcdn.apache.org/spark/spark-3.5.0/spark-3.5.0-bin-hadoop3.tgz
RUN tar zxvf spark-3.5.0-bin-hadoop3.tgz
RUN mv /spark-3.5.0-bin-hadoop3 $SPARK_HOME
RUN rm $SPARK_HOME/jars/log4j-1.2.17.jar <---- FAILS on this line
RUN wget -O $SPARK_HOME/jars/log4j-core-2.17.0.jar https://repo1.maven.org/maven2/org/apache/logging/log4j/log4j-core/2.17.0/log4j-core-2.17.0.jar
RUN wget -O $SPARK_HOME/jars/log4j-api-2.17.0.jar https://repo1.maven.org/maven2/org/apache/logging/log4j/log4j-api/2.17.0/log4j-api-2.17.0.jar
RUN wget -O $SPARK_HOME/jars/log4j-1.2-api-2.17.0.jar https://repo1.maven.org/maven2/org/apache/logging/log4j/log4j-1.2-api/2.17.0/log4j-1.2-api-2.17.0.jar
# Some more commands after this...

I am building the above Dockerfile via a Jenkins job. The build fails with:

13:47:45  #14 0.296 rm: cannot remove '/opt/spark/jars/log4j-1.2.17.jar': No such file or directory
13:47:45  #14 ERROR: process "/bin/sh -c rm $SPARK_HOME/jars/log4j-1.2.17.jar" did not complete successfully: exit code: 1
13:47:45  ------
13:47:45   > [ 9/28] RUN rm /opt/spark/jars/log4j-1.2.17.jar:
13:47:45  0.296 rm: cannot remove '/opt/spark/jars/log4j-1.2.17.jar': No such file or directory

So:

  1. Should I just remove the RUN rm line? Or
  2. should I do an ls first, check in its output whether the jar actually exists, and only then run rm? (Roughly the sketch below.)
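
For option 2, this is roughly what I have in mind (an untested sketch; rm -f would also work, since it ignores a missing file):

# guard the removal so the build does not abort if the jar is absent
RUN if [ -f $SPARK_HOME/jars/log4j-1.2.17.jar ]; then rm $SPARK_HOME/jars/log4j-1.2.17.jar; fi
# or, more simply:
RUN rm -f $SPARK_HOME/jars/log4j-1.2.17.jar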

Any thoughts or pointers on what I might be missing?

2 Answers


  1. Since Spark 3.3, Log4j 1.x is no longer used; Spark switched to Log4j 2.x (cf. the upgrade notes). The jar you are trying to delete is simply not in the distribution, which is why rm fails. You can therefore remove all four lines that deal with log4j (the rm and the three wget downloads).
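
    To double-check what the distribution actually ships, you can list the jars during the build (a sketch; note that log4j-1.2-api-2.x.x.jar is the Log4j 2 compatibility bridge, not Log4j 1, despite the "1.2" in its name):

    RUN ls $SPARK_HOME/jars/ | grep -i log4j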

  2. FROM ubuntu:20.04
    
    ARG DEBIAN_FRONTEND=noninteractive
    
    ENV SPARK_HOME=/opt/spark
    ENV PATH="${SPARK_HOME}/bin:${PATH}"
    ENV PATH="${SPARK_HOME}/python:${PATH}"
    ENV PYSPARK_PYTHON=python3
    ENV JAVA_HOME="/usr/lib/jvm/java-8-openjdk-amd64/jre"
    
    RUN apt-get update -q && apt-get install -y -qq wget
    
    RUN wget -q https://dlcdn.apache.org/spark/spark-3.5.0/spark-3.5.0-bin-hadoop3.tgz && \
        tar zxvf spark-3.5.0-bin-hadoop3.tgz && \
        mv spark-3.5.0-bin-hadoop3 $SPARK_HOME && \
        rm spark-3.5.0-bin-hadoop3.tgz
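
    Combining the download, extraction, move, and cleanup in a single RUN also keeps the tarball out of the image layers, so the final image stays smaller.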
    

    No need to replace log4j: Spark 3.5.0 already ships a more recent (and secure) Log4j 2.x in $SPARK_HOME/jars, and that version is not affected by CVE-2021-44228 (Log4Shell).

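    If you want to confirm from the built image itself (a sketch; the tag my-spark is just an example name):

    docker run --rm my-spark ls /opt/spark/jars | grep log4j

    You should see only log4j-*-2.x.x jars; there is no log4j-1.2.17.jar to remove.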
