I have several tables that the Dataflow reads from JSON files. Some of the string columns are actually epoch timestamps, but since there are quite a few tables and I don't want to make one dataflow per table, I can't manually tell the Source what column type it's supposed to use.
I've tried using a Derived Column transformation, but there you have to choose which columns to modify before you can look at their content.
Setting a default format for Time in the Source failed as well, though admittedly setting the Time format to 'mmmmmmmmmmmmm' was a bit of a long shot.
Now I seem to be stuck between the unpalatable options of hardcoding the exact name of every column that's supposed to be a Date/Time into the Derived Column, or adapting the target database to convert the values in a later step.
I’d very much like to avoid either scenario.
Is there a way to achieve this that I forgot or, more likely, just didn't know about?
2 Answers
Solved the problem by taking a step back and first querying the DB into which the JSON data was supposed to go, to see which columns had time-based datatypes, converting the result into a string-array variable and passing that to the dataflow as a parameter. With that array, the Derived Column could simply match the column names it was supposed to convert.
The only minor hiccup was that one column name existed twice across my schema while only one of the two had a time type. That required manually excluding the table/column combination from the condition of my Derived Column, and now it works well enough.
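The metadata lookup described above can be sketched in Python. This is a minimal stand-in using an in-memory SQLite database; the real pipeline would query the actual target database's metadata (e.g. `INFORMATION_SCHEMA.COLUMNS` on Azure SQL) from a Lookup activity, and the table/column names here are invented for illustration:

```python
import json
import sqlite3

# Toy stand-in for the target DB; in the real setup this would be the
# database the JSON data is loaded into.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE orders (id INTEGER, created_at DATETIME, note TEXT);
CREATE TABLE events (id INTEGER, occurred_at TIMESTAMP, kind TEXT);
""")

def time_columns(con):
    """Collect the names of all columns with a time-based declared type."""
    names = []
    tables = [r[0] for r in con.execute(
        "SELECT name FROM sqlite_master WHERE type='table'")]
    for t in tables:
        # PRAGMA table_info rows: (cid, name, type, notnull, dflt_value, pk)
        for _, col, decl, *_ in con.execute(f"PRAGMA table_info({t})"):
            if decl.upper() in ("DATETIME", "TIMESTAMP", "DATE", "TIME"):
                names.append(col)
    return names

cols = time_columns(con)
# This array is what gets passed to the dataflow as a parameter, so the
# Derived Column can match column names against it.
print(json.dumps(cols))  # → ["created_at", "occurred_at"]
```

The same idea applies to any engine that exposes its schema via a metadata view; only the query changes.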
You will be able to find the schema in the Projection tab of the source.
If you want to convert an epoch-timestamp string column to a timestamp column, you can use the expression

toTimestamp(seconds(toInteger(<epochcolumnName>)))

in the Derived Column transformation.
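What that expression does can be illustrated outside ADF. This is a hedged Python analogue, not the dataflow expression itself: the string is parsed as whole seconds since the Unix epoch and turned into a timestamp (the function name is invented for the example):

```python
from datetime import datetime, timezone

def epoch_string_to_timestamp(s: str) -> datetime:
    # Analogue of toTimestamp(seconds(toInteger(<col>))): interpret the
    # string as integer seconds since 1970-01-01 UTC.
    return datetime.fromtimestamp(int(s), tz=timezone.utc)

print(epoch_string_to_timestamp("1700000000"))  # → 2023-11-14 22:13:20+00:00
```

Note that this assumes the source values are in seconds; if they are milliseconds, they need to be divided by 1000 first (or, in the ADF expression, passed to toTimestamp directly, which expects milliseconds).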