I want to copy a table from one storage account to another with the Copy activity in ADF. I just need a simple copy of my table, preferably 100% identical to the source table.
There is a field in the source table called ExpectedArrivalDate which can change its datatype between rows/entities. On some entities the field is DateTime, on other entities it is String.
The reason for this is that the entities are manipulated from C# code where the datatype of this field is DateTime? (i.e. nullable DateTime).
When I execute the Copy activity with no explicit mapping defined, ADF decides automatically that the datatype of the ExpectedArrivalDate column is DateTimeOffset and this causes all the entities where ExpectedArrivalDate is of type String (example value: ‘DateTime.Null’) to be skipped during the copy operation. Based on the error log these entities are skipped because in the source table, instead of a valid DateTime the string value ‘DateTime.Null’ is found and ADF considers this to be incompatible with the schema.
Example of error log generated:
Level | OperationName | Message |
---|---|---|
Warning | TabularRowSkip | Column ‘ExpectedArrivalDate’ contains an invalid value ‘DateTime.Null’. |
Trying to add explicit datatype mapping
If I try to define explicit mappings in ADF by using the Import schemas button, I see the following mapping generated:
The data type cannot be changed from the UI, so I go to the JSON view of the pipeline and change it there manually to String for both source and sink.
Running the pipeline again, the same rows are still skipped and my changed datatype mapping is ignored.
Workaround that I found
- Go to Table Storage and edit the first entity in the source table, changing the datatype of the ExpectedArrivalDate field from DateTime to String.
- Delete all explicit mappings on the Copy activity in ADF.
- Run the pipeline.
Result: All rows get copied to the sink, the datatype of ExpectedArrivalDate column is converted to String on all entities, even if it was DateTime on some of them in the source table.
Question
Why cannot the ADF Copy activity handle rows where the schema is not the same as the one inferred from the first entity of the table? Is there a way to make this work without manually changing data in the source table?
2
Answers
I contacted Microsoft Support and they told me my use case scenario is not supported by ADF. As a workaround they suggested to break up my copy activity in two pieces.
This way all the data will be copied, no matter if the data type of a column changes between the entities. Unfortunately this results in additional execution time (2 copy activities instead of 1) and as a side effect all the data in all the columns gets converted to String.
The ADF Copy activity uses schema inference to determine the schema of the source data. When the Copy activity encounters a row with a different schema than the inferred schema, it skips the row. This behavior is by design to ensure that the data is correctly copied to the destination.
Since you need the data transformation logic to be implemented to handle this scenario dynamically, I would suggest using Mapping data flow instead of copy activity