I have a huge dataset where each record have json data similar to below –
{“project”:{“id”:”2625″,”createDate”:1542597000000,”rank”:0,”highlight”:false,”isDisplay”:true,”isNewProject”:true,”propertyId”:2231,”districts”:{“id”:41,”name”:”abc”,”region”:”123″}}}
When I am trying to genrate key value pairs using select kvgen(t.project) from dfs.filePath t
in apache drill, I am getting below error –
DrillRuntimeException: Mappify/kvgen does not support heterogeneous value types. All values in the input map must be of the same type. The field [createDate] has a differing type [minor_type: BIGINT mode: OPTIONAL ]
It looks like drill expects all values to be of same type. But how to do that? Is there any function available in drill?
My drill version is 1.9.0
2
Answers
I figured it out. KVGEN method doesn't work if json is nested. To make it work, there are two approaches which can be followed -
and then apply KVGEN method as
select kvgen(t.project) from dfs.filePath t
Apply kvgen method on inner json first and then use nested query as below
select tbl2.col1.id, tbl2.col2.value from (select tbl1.project as col1, flatten(kvgen(tbl1.project.districts)) col2 from dfs.filePath tbl1) tbl2
And as rightly mentioned by @arina-yelchiyeva, session option
store.json.all_text_mode
needs to be set to true.Try setting session option
store.json.all_text_mode
to true.https://drill.apache.org/docs/json-data-model/