
I am working on a project in Databricks using Apache Spark (PySpark). While doing some data manipulation, I ran into a "DATATYPE_MISMATCH.UNEXPECTED_INPUT_TYPE" error.
The code snippet is as follows:

from pyspark.sql.functions import year, current_date, col

player_match_df = player_match_df.withColumn(
    "years_since_debut",
    year(current_date()) - col("season_year")
)

More details:

Cannot resolve "(year(current_date()) - season_year)" due to data type mismatch: parameter 1 requires "DATE" type, however, "year(current_date())" is of "INT" type.;
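Which fix applies depends on the type of season_year, so it helps to check it first. The sketch below is a hypothetical helper (needs_year_extraction is not part of PySpark) that takes the list of (column, type) pairs returned by DataFrame.dtypes and reports whether the column is a date or timestamp. It is plain Python, so it can be tried without a Spark session, using hand-written dtypes.

```python
# Hypothetical helper, not part of PySpark.
# `dtypes` is assumed to be the list of (name, type) pairs returned by
# DataFrame.dtypes, e.g. [("season_year", "date")].
DATE_LIKE_TYPES = {"date", "timestamp", "timestamp_ntz"}


def needs_year_extraction(dtypes, column):
    """Return True if `column` holds dates/timestamps and must be wrapped in year()."""
    types_by_name = dict(dtypes)
    if column not in types_by_name:
        raise KeyError(f"column {column!r} not found in schema")
    return types_by_name[column] in DATE_LIKE_TYPES


# Hand-written dtypes, so no Spark session is needed here:
print(needs_year_extraction([("player_id", "int"), ("season_year", "date")], "season_year"))  # True
print(needs_year_extraction([("season_year", "int")], "season_year"))  # False
```

In Databricks this would be called as needs_year_extraction(player_match_df.dtypes, "season_year"); alternatively, player_match_df.printSchema() shows the same type information.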

How can I resolve this error without changing the structure of the code much?


Answers


  1. Chosen as BEST ANSWER

    I tried the first one, and it gives me the same error message. The second one does not raise an error, but it does not show any records either. This is strange, because the same code runs fine on a colleague's machine.


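  2. To resolve this error without changing much of the structure, you only need a small modification. year(current_date()) returns an integer (INT), and the error message ("parameter 1 requires "DATE" type") suggests that season_year is a DATE column: with a DATE on the right-hand side of "-", Spark resolves year(current_date()) - season_year as a subtraction of two dates and therefore expects the left operand to be a DATE as well. Depending on the data type of the season_year column (check it with player_match_df.printSchema()), use one of the following solutions: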

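    1. If the season_year column is already an integer (INT). Note that this is essentially the expression from the question, so it only helps if season_year really holds a year number:

       from pyspark.sql.functions import year, current_date, col

       # season_year holds a plain year number such as 2008
       player_match_df = player_match_df.withColumn(
           "years_since_debut",
           year(current_date()) - col("season_year")
       )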

    2. If the season_year column is a date (DATE) or timestamp (TIMESTAMP):

       from pyspark.sql.functions import year, current_date, col

       # year() extracts the year as an INT, so both operands of "-" are integers
       player_match_df = player_match_df.withColumn(
           "years_since_debut",
           year(current_date()) - year(col("season_year"))
       )
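    The two options can also be folded into one expression chosen from the column's type. This is a minimal sketch, assuming the type string comes from DataFrame.dtypes; years_since_sql is a hypothetical helper that returns a Spark SQL expression string, which keeps it testable without a Spark session and lets it be passed to pyspark.sql.functions.expr.

```python
# Hypothetical helper, not part of PySpark: picks option 1 or option 2 from
# the column's type string as reported by DataFrame.dtypes (e.g. "int", "date").
DATE_LIKE_TYPES = {"date", "timestamp", "timestamp_ntz"}


def years_since_sql(column, dtype):
    """Return a Spark SQL expression that subtracts `column` from the current year."""
    if dtype in DATE_LIKE_TYPES:
        # Option 2: extract the year so both operands of "-" are INT.
        return f"year(current_date()) - year(`{column}`)"
    # Option 1: the column already holds a year number such as 2008.
    return f"year(current_date()) - `{column}`"


print(years_since_sql("season_year", "date"))
print(years_since_sql("season_year", "int"))

# Usage in Databricks (needs a Spark session, so shown as a comment):
# from pyspark.sql.functions import expr
# dtype = dict(player_match_df.dtypes)["season_year"]
# player_match_df = player_match_df.withColumn(
#     "years_since_debut", expr(years_since_sql("season_year", dtype))
# )
```

    Building the expression as a string is only a convenience for inspecting it; in regular code the Column-based form (year(current_date()) - year(col("season_year"))) is the more idiomatic choice.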
