I have two string
formatted columns in my table. One of the columns has json objects in it. The keys in the JSON objects are not fixed – so the problem is I cant use standard json extract functions. Here is how the table looks
timestamp | upstream_tables |
---|---|
2023-02-02T17:34:55.946Z | {"ETL_table_1":true,"ETL_table_2":true} |
2023-02-02T13:30:11.882Z | {"ETL_table_3":true} |
2023-02-02T20:12:22.116Z | {"ETL_table_4":true,"ETL_table_5":true,"ETL_table_6":false} |
I want to flatten this table to something like below
timestamp | upstream_tablename | was_completed |
---|---|---|
2023-02-02T17:34:55.946Z | ETL_table_1 | TRUE |
2023-02-02T17:34:55.946Z | ETL_table_2 | TRUE |
2023-02-02T13:30:11.882Z | ETL_table_3 | TRUE |
2023-02-02T20:12:22.116Z | ETL_table_4 | TRUE |
2023-02-02T20:12:22.116Z | ETL_table_5 | TRUE |
2023-02-02T20:12:22.116Z | ETL_table_6 | FALSE |
Can anyone please help? Have spent a lot of time using map_values and map_keys but couldnt get it right.
The only closest thing I could come up with was this
select
timestamp,
t.*
FROM mytable
CROSS JOIN UNNEST(map_entries(CAST(json_extract(upstream_tables, '$') AS MAP(VARCHAR, VARCHAR)))) AS t
2
Answers
You can do this with a combination of
UNNEST
andjson_query
.First, use
json_query
to convert all the field-value pairs to a normalized formwith the shape
{"name": "xxx", value: yyy}
. Then, convert these to an array ofrows and unnest them into individual rows.
@martin-traverso‘s answer can be used with Athena engine v. 3 which is based on Trino, for both v.2 and v.3 the main trick of casting to
map
you have discovered, I would switch from usingjson_extract
tojson_parse
(to transform from string to json), skipmap_entries
(Presto/Trino can unnest maps to key-value pairs, optionally useMAP(VARCHAR, JSON)
as target type) and specify column names for unnest result, Presto/Trino can unnest maps to key-value pairs: