
Amazon Web Services – Why does AWS EMR PySpark get stuck when I try to aggregate a dataframe?

I'm running a Spark application in AWS EMR. The code is like this:

    with SparkSession.builder.appName(f"Spark App").getOrCreate() as spark:
        dataframe = spark.read.format('jdbc').options( ... ).load()
        print("Log A")
        max_date_result = dataframe.agg(max_(date_format('date', 'yyyy-MM-dd')).alias('max_date')).collect()[0]
        print("Log B")

This application always gets stuck for a long time…
