SparkContext: Error initializing SparkContext while Running Spark Job via Google Dataproc
After I upgraded the Google Dataproc image version from 1.3.62-debian9 to 1.4-debian, all Spark Dataproc jobs started failing with an error:
22/01/09 00:36:50 INFO org.spark_project.jetty.server.Server: Started 3339ms
22/01/09 00:36:50 INFO org.spark_project.jetty.server.AbstractConnector: Started
22/01/09 00:36:50 WARN org.apache.spark.scheduler.FairSchedulableBuilder: Fair
Scheduler configuration file not found so jobs will be scheduled in FIFO order. To use
fair scheduling, configure pools in fairscheduler.xml or set
spark.scheduler.allocation.file to a file that contains the configuration.
ERROR org.apache.spark.SparkContext: Error initializing SparkContext.
java.lang.NumberFormatException: For input string: "30s"
at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
at java.lang.Long.parseLong(Long.java:589)
at java.lang.Long.parseLong(Long.java:631)
at org.apache.hadoop.conf.Configuration.getLong(Configuration.java:1441)
I don't set '30s' anywhere in the Spark configuration file or on the SparkConf object.
This is how I initialize the SparkContext in my code:
val conf = new SparkConf().setAppName(getMainName().toString)
val sc = new SparkContext(conf)
Spark version: 2.3.0
I saw that the default of 'spark.scheduler.maxRegisteredResourcesWaitingTime' is the same value, 30s
(https://spark.apache.org/docs/latest/configuration.html#spark-configuration), but I did not change or update it.
I do not understand where this value comes from and why it is related to upgrading the Dataproc image.
2 Answers
It's related to Apache Hadoop: newer versions add a time unit to several defaults in hdfs-default.xml.
More info about the issue: https://issues.apache.org/jira/browse/HDFS-12920 and "Apache Tez Job fails due to java.lang.NumberFormatException for input string: "30s"".
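The stack trace fits that explanation: newer hdfs-default.xml ships values such as "30s", but an older Hadoop client jar on the job's classpath still reads them with Configuration.getLong, which accepts only a plain number. A minimal Scala sketch of the mismatch, assuming hadoop-common is on the classpath and using dfs.client.datanode-restart.timeout purely as an illustrative property name:

import java.util.concurrent.TimeUnit
import org.apache.hadoop.conf.Configuration

object TimeUnitParsingDemo {
  def main(args: Array[String]): Unit = {
    // Empty config; this property and its "30s" value stand in for the
    // HDFS defaults that gained a time-unit suffix.
    val conf = new Configuration(false)
    conf.set("dfs.client.datanode-restart.timeout", "30s")

    // Newer Hadoop clients parse the suffix with getTimeDuration:
    val secs = conf.getTimeDuration("dfs.client.datanode-restart.timeout", 30L, TimeUnit.SECONDS)
    println(s"getTimeDuration -> $secs seconds")

    // Older clients still call getLong, which cannot parse "30s":
    try {
      conf.getLong("dfs.client.datanode-restart.timeout", 30L)
    } catch {
      case e: NumberFormatException => println(s"getLong -> $e") // For input string: "30s"
    }
  }
}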
The 1.4 image is approaching EOL too; could you try 1.5 instead?
That said, the problem is probably in your app, which likely brings in some old Hadoop/Spark jars and/or configs that break Spark, because when you SSH into the 1.4 cluster's main node and execute your code in spark-shell, it works:
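A rough sketch of that kind of check in the REPL (the app name is a placeholder; the shell's pre-created context is stopped first so a fresh one can be built the same way the app does):

// On the cluster's main node: run `spark-shell`, then at the Scala prompt:
sc.stop() // the shell pre-creates a SparkContext named sc
val conf = new org.apache.spark.SparkConf().setAppName("test-app") // placeholder app name
val sc2 = new org.apache.spark.SparkContext(conf)
sc2.parallelize(1 to 10).count() // simple action to confirm the new context works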