
SparkContext: Error initializing SparkContext while running Spark job via Google Dataproc

After I upgraded the Google Dataproc image version from 1.3.62-debian9 to 1.4-debian, all Spark Dataproc jobs started failing with an error:

22/01/09 00:36:50 INFO org.spark_project.jetty.server.Server: Started 3339ms
22/01/09 00:36:50 INFO org.spark_project.jetty.server.AbstractConnector: Started 
22/01/09 00:36:50 WARN org.apache.spark.scheduler.FairSchedulableBuilder: Fair Scheduler configuration file not found so jobs will be scheduled in FIFO order. To use fair scheduling, configure pools in fairscheduler.xml or set spark.scheduler.allocation.file to a file that contains the configuration.

ERROR org.apache.spark.SparkContext: Error initializing SparkContext.

java.lang.NumberFormatException: For input string: "30s"
at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
at java.lang.Long.parseLong(Long.java:589)
at java.lang.Long.parseLong(Long.java:631)
at org.apache.hadoop.conf.Configuration.getLong(Configuration.java:1441)

I don't set '30s' in the Spark configuration file or in the SparkConf object.

This is how I initialize the SparkContext in my code:

val conf = new SparkConf().setAppName(getMainName().toString)
val sc = new SparkContext(conf)

Spark version: 2.3.0

I saw that the default value of 'spark.scheduler.maxRegisteredResourcesWaitingTime' is the same "30s" (https://spark.apache.org/docs/latest/configuration.html#spark-configuration), but I did not change or update it.

I do not understand where this value comes from and why it is related to the Dataproc upgrade.

2 Answers


  1. Chosen as BEST ANSWER

    It's related to Apache Hadoop: they added a time unit to duration values in hdfs-default.xml.

    More info about the issue: https://issues.apache.org/jira/browse/HDFS-12920 (Apache Tez Job fails due to java.lang.NumberFormatException for input string: "30s").
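
    A minimal sketch of the mismatch, assuming dfs.client.datanode-restart.timeout is the offending property (its default in newer hdfs-default.xml carries the "30s" unit suffix): a newer Hadoop client reads such values with getTimeDuration, while an older Hadoop 2.7-era client, which a Spark 2.3 application often pulls in, still reads them with Configuration.getLong. That getLong call is exactly the Long.parseLong failure in the stack trace above:

    import java.util.concurrent.TimeUnit
    import org.apache.hadoop.conf.Configuration

    val hadoopConf = new Configuration()
    // Newer hdfs-default.xml ships this duration with a unit suffix
    hadoopConf.set("dfs.client.datanode-restart.timeout", "30s")

    // Old client code path: plain Long parsing of "30s" fails
    try {
      hadoopConf.getLong("dfs.client.datanode-restart.timeout", 30L)
    } catch {
      case e: NumberFormatException => println(s"old getLong path fails: ${e.getMessage}")
    }

    // New client code path: getTimeDuration understands the "s" suffix
    val seconds = hadoopConf.getTimeDuration("dfs.client.datanode-restart.timeout", 30L, TimeUnit.SECONDS)
    println(s"getTimeDuration reads: $seconds seconds")  // 30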


  2. The 1.4 image is approaching EOL too; could you try 1.5 instead?

    That said, the problem is probably in your app, which likely brings some old Hadoop/Spark jars and/or configs that break Spark, because when you SSH into the 1.4 cluster's main node and run your code in spark-shell, it works:

    val conf = new SparkConf().setAppName("test-app-name")
    val sc = new SparkContext(conf)
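
    One quick sanity check, sketched here, is to print the Hadoop version of the classes the driver actually loads, both in spark-shell and from inside the packaged job; if the two differ, the job is likely bundling its own (older) Hadoop jars. This uses org.apache.hadoop.util.VersionInfo from hadoop-common:

    import org.apache.hadoop.util.VersionInfo

    // Hadoop version of the classes on the driver classpath
    println(s"Hadoop on classpath: ${VersionInfo.getVersion}")
    // Spark version the running context reports
    println(s"Spark version: ${sc.version}")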
    