I am facing an issue where the date partition folders are expected to be named in the format date=yyyymmdd, but they are being written with different names instead.

Sometimes it even creates a separate folder for each Parquet file written to the Delta path.

I don’t see any issue with the source data or the PySpark code, since the same code works perfectly for other data sources. The same data is also written correctly to a separate Delta path.

It isn’t causing any problems with the data itself: the date is captured correctly in the Delta table and can be queried. But if I rename the folders manually in the storage account, it throws an error.

I expect the data for each date to be written to its own folder, named with the date value in the format date=yyyymmdd.

The PySpark code creates the date column from a timestamp value such as 2021-10-27T11:56:41.380416Z. I tried converting the field to a timestamp and then extracting the date, but the folder is then created as date=. The existing code worked for this database earlier, but it suddenly started behaving this way.
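As a quick sanity check of the folder name that should result, the sample timestamp above can be converted with the Python standard library. This is only a sketch of the expected value, not the Spark job itself, and the helper name `expected_partition_folder` is made up for illustration.

```python
from datetime import datetime


def expected_partition_folder(ts: str) -> str:
    """Return the date=yyyymmdd folder name for an ISO-8601 UTC timestamp string."""
    # %f accepts up to six fractional digits; the trailing "Z" is matched literally.
    parsed = datetime.strptime(ts, "%Y-%m-%dT%H:%M:%S.%fZ")
    return "date=" + parsed.strftime("%Y%m%d")


print(expected_partition_folder("2021-10-27T11:56:41.380416Z"))  # date=20211027
```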

2 Answers


  1. Chosen as BEST ANSWER

    Thanks for the response, but it turned out to be an issue with the Delta table protocol version. To drop or rename a Delta table column, the table has to be upgraded to reader version 2 and writer version 5, and that upgrade is what caused the issue I was facing. According to Databricks, this change is irreversible.
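To see which protocol versions a table is on, `DESCRIBE DETAIL` reports `minReaderVersion` and `minWriterVersion`. The sketch below assumes an active SparkSession with Delta Lake configured; the function names and the path argument are placeholders, not part of the original code.

```python
def delta_protocol_versions(spark, table_path: str) -> tuple:
    """Return (minReaderVersion, minWriterVersion) for the Delta table at table_path."""
    # DESCRIBE DETAIL returns a single row describing the table.
    row = (
        spark.sql(f"DESCRIBE DETAIL delta.`{table_path}`")
        .select("minReaderVersion", "minWriterVersion")
        .first()
    )
    return row["minReaderVersion"], row["minWriterVersion"]


def matches_upgraded_protocol(reader: int, writer: int) -> bool:
    """True when the table is at or above reader 2 / writer 5, the versions named above."""
    return reader >= 2 and writer >= 5
```

Calling `delta_protocol_versions(spark, path)` on both the misbehaving table and a table that writes normally should make the difference visible, if the protocol version is indeed the cause.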

    For a Delta table with the default reader version 1 and writer version 2, the same code works fine and the date folders are created as expected.


  2. Try the code below, which uses the partitionBy function.

    Starting from my timestamp column, I convert it to a date and remove the hyphens:

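    from pyspark.sql.functions import regexp_replace, to_date

    # Parse the ISO-8601 string to a date, then strip the hyphens to get yyyyMMdd.
    w_df = df.withColumn("date", regexp_replace(to_date("timestamp", "yyyy-MM-dd'T'HH:mm:ss.SSSSSS'Z'"), "-", ""))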
    


    Next, when writing to Delta, pass this date column to partitionBy.
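The write step was shown only as a screenshot, so the following is a minimal sketch of what a partitioned Delta write typically looks like, not the answerer's exact code. The function name, the `append` mode, and the output path are assumptions.

```python
def write_partitioned_by_date(w_df, output_path: str, mode: str = "append") -> None:
    """Write w_df in Delta format, partitioned by its "date" column."""
    (
        w_df.write.format("delta")
        .mode(mode)
        .partitionBy("date")  # partition the data files by the derived date column
        .save(output_path)
    )
```

Here `w_df` is the DataFrame with the derived `date` column from the previous step.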


    You said you changed the folder names manually. That throws an error because the Delta transaction log records the data files under their original folder names, so the log no longer matches what is in storage once the folders are renamed by hand.
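To illustrate why renaming breaks the table: each commit file under `_delta_log` is newline-delimited JSON, and every `add` action stores the data file's path relative to the table root. The sketch below lists the folders those paths point to. It is a simplification that assumes the table sits on a local or mounted filesystem, reads only the JSON commit files (no checkpoints), and ignores `remove` actions; the function name is made up.

```python
import json
from pathlib import Path, PurePosixPath


def folders_recorded_in_log(table_path: str) -> set:
    """Collect the parent folders of every file path found in `add` actions."""
    folders = set()
    for commit in sorted(Path(table_path, "_delta_log").glob("*.json")):
        for line in commit.read_text().splitlines():
            if not line.strip():
                continue
            action = json.loads(line)
            if "add" in action:
                # e.g. "date=20211027/part-0000.parquet" -> "date=20211027"
                folders.add(str(PurePosixPath(action["add"]["path"]).parent))
    return folders
```

A folder that appears in this set but no longer exists in storage under that name would be consistent with the error described.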
