I am working on a project in Databricks using Apache Spark.
While doing some data manipulation, I encountered the error "DATATYPE_MISMATCH.UNEXPECTED_INPUT_TYPE".
The code snippet is as follows:
from pyspark.sql.functions import col, current_date, year

player_match_df = player_match_df.withColumn(
    "years_since_debut",
    (year(current_date()) - (col("season_year")))
)
More details:
Cannot resolve "(year(current_date()) - season_year)" due to data type mismatch: parameter 1 requires "DATE" type, however, "year(current_date())" is of "INT" type.;
How can I resolve this error without changing the structure of the code much?
2 Answers
I tried the first one, and it gives me the same error message. The second one does not raise an error, but it does not return any records either, which is odd because the same code runs fine on a colleague's system.
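To resolve this error without changing much of the structure, you need to make a small modification. year(current_date()) returns an integer (INT), and you are subtracting the season_year column from it. The message 'parameter 1 requires "DATE" type' suggests that Spark is treating the expression as a subtraction between dates, which is what you would expect if season_year is a DATE column, so check the column's type first (for example with player_match_df.printSchema()). Depending on the data type of the season_year column, you can try one of the following solutions: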
1) If the season_year column is already an integer (INT):
player_match_df = player_match_df.withColumn(
    "years_since_debut",
    year(current_date()) - col("season_year")
)
2) If the season_year column is a date (DATE) or timestamp (TIMESTAMP):
player_match_df = player_match_df.withColumn(
    "years_since_debut",
    year(current_date()) - year(col("season_year"))
)
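If you are not sure which case applies, one option is to let the column's reported type decide. The following is only a rough sketch: the helper names years_since_debut_sql and add_years_since_debut are made up for illustration, and it assumes the type strings are the ones df.dtypes reports (such as "int", "bigint", "date", "timestamp"). It builds a Spark SQL expression string and applies it with expr().

```python
# Hypothetical helpers, not part of PySpark: pick the subtraction that matches
# the type of the season column, based on the strings reported by df.dtypes.

INTEGER_TYPES = {"tinyint", "smallint", "int", "bigint"}
DATETIME_TYPES = {"date", "timestamp", "timestamp_ntz"}


def years_since_debut_sql(column_name, dtype):
    """Return a Spark SQL expression string for the given column type."""
    if dtype in DATETIME_TYPES:
        # Extract the year first so both sides of the subtraction are INT.
        return f"year(current_date()) - year({column_name})"
    if dtype in INTEGER_TYPES:
        # The column already holds a year number, so subtract it directly.
        return f"year(current_date()) - {column_name}"
    raise ValueError(f"Unhandled type {dtype!r} for column {column_name!r}")


def add_years_since_debut(df, column_name="season_year"):
    """Add a years_since_debut column to a Spark DataFrame."""
    # Imported here so the string helper above can be used without Spark.
    from pyspark.sql import functions as F

    dtype = dict(df.dtypes)[column_name]  # e.g. "date" or "int"
    return df.withColumn(
        "years_since_debut",
        F.expr(years_since_debut_sql(column_name, dtype)),
    )
```

With this, player_match_df = add_years_since_debut(player_match_df) would use the second form for a DATE column and the first form for an INT column. Any other type (for example a string) raises a ValueError instead of guessing at a cast.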