skip to Main Content

In my hive table T the field A is a string in json format and A stores a value of {"c_e_i":"{"e_c_f":1}"}.

I want to get c_e_i."e_c_f"

So I used get_json_object(T.A, '$.c_e_i.\"e_c_f\"') but it doesn’t work. what should I do?

I want to get c_e_i."e_c_f"

2

Answers


  1. Simply because "{"e_c_f":1}" value is a string, not JSON map. The whole thing is not of type Map<String:Map<String:String>>, it is Map<String:String>

    Map<String:Map<String:String>> should look like this {"c_e_i":{"e_c_f":1}}

    And you can extract the string value like this:

    with T as (
    select '{"c_e_i":"{\"e_c_f\":1}"}' as A 
    )
    
    select A as original_str, 
    get_json_object(A, '$.c_e_i') as result
    from T
    

    Result:

    original_str                  result    
    
    {"c_e_i":"{"e_c_f":1}"}     {"e_c_f":1}
    

    You see, the result is a correct JSON Map – slashes removed as well because they are interpreted during extract and converted (removed in this case), so you can extract e_c_f from it (apply get_json_object twice):

    with T as (
    select '{"c_e_i":"{\"e_c_f\":1}"}' as A 
    )
    
    select A as original_str, 
    get_json_object(get_json_object(A, '$.c_e_i'),'$.e_c_f') as result
    from T
    

    Result:

    original_str                result  
    {"c_e_i":"{"e_c_f":1}"}   1
        
    

    Or remove everything extra from original JSON to make it correct JSON Map<String:Map<String:String>> then extract:

    with T as (
    select '{"c_e_i":"{\"e_c_f\":1}"}' as A 
    )
    
    select A as original_str, 
    regexp_replace(regexp_replace(regexp_replace(A, '\\"','"'), '"\{','\{'),'"\}','\}') as correct_JSON, --remove slashes and remove doublequotes before { and after }  
    get_json_object(regexp_replace(regexp_replace(regexp_replace(A, '\\"','"'), '"\{','\{'),'"\}','\}'), '$.c_e_i.e_c_f') as result
    

    Result:

    original_str                    correct_json           result   
    {"c_e_i":"{"e_c_f":1}"}      {"c_e_i":{"e_c_f":1}}    1
    
    Login or Signup to reply.
  2. In Hive, when using get_json_object to extract values from a JSON string, you need to escape the backslashes and double quotes in the JSON path. In your case, to retrieve the value of c_e_i."e_c_f", you can modify your query as follows:

    SELECT get_json_object(T.A, '$.c_e_i.\"e_c_f\"') AS result
    FROM T;
    

    By escaping the backslashes and double quotes (\"), you ensure that the JSON path is correctly interpreted by Hive. This modified query should return the value of c_e_i."e_c_f" from the JSON string stored in field A of your Hive table.

    Make sure that the table and field names are correct in your actual query, and adjust them accordingly if needed.

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search