skip to Main Content

I have a huge dataset where each record have json data similar to below –

{“project”:{“id”:”2625″,”createDate”:1542597000000,”rank”:0,”highlight”:false,”isDisplay”:true,”isNewProject”:true,”propertyId”:2231,”districts”:{“id”:41,”name”:”abc”,”region”:”123″}}}

When I am trying to genrate key value pairs using select kvgen(t.project) from dfs.filePath t in apache drill, I am getting below error –

DrillRuntimeException: Mappify/kvgen does not support heterogeneous value types. All values in the input map must be of the same type. The field [createDate] has a differing type [minor_type: BIGINT mode: OPTIONAL ]

It looks like drill expects all values to be of same type. But how to do that? Is there any function available in drill?

My drill version is 1.9.0

2

Answers


  1. Chosen as BEST ANSWER

    I figured it out. KVGEN method doesn't work if json is nested. To make it work, there are two approaches which can be followed -

    • Take out the nested json outside

    {"project":{"id":"2625","createDate":1542597000000,"rank":0,"highlight":false,"isDisplay":true,"isNewProject":true,"propertyId":2231},"districts":{"id":41,"name":"abc","region":"123"}}

    and then apply KVGEN method as select kvgen(t.project) from dfs.filePath t

    • Apply kvgen method on inner json first and then use nested query as below

      select tbl2.col1.id, tbl2.col2.value from (select tbl1.project as col1, flatten(kvgen(tbl1.project.districts)) col2 from dfs.filePath tbl1) tbl2

    And as rightly mentioned by @arina-yelchiyeva, session option store.json.all_text_mode needs to be set to true.


  2. Try setting session option store.json.all_text_mode to true.

    https://drill.apache.org/docs/json-data-model/

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search