A response.json()
returns a JSON in the following format:
{
"workbooks": [
{
"name": "WORKBOOK_A",
"embeddedDatasources": [
{
"upstreamTables": [
{"name": "WORKBOOK_A_TABLE_A"}]},
{
"upstreamTables": [
{"name": "WORKBOOK_A_TABLE_B"},
{"name": "WORKBOOK_A_TABLE_C"}]},
{
"upstreamTables": []}]},
{
"name": "WORKBOOK_B",
"embeddedDatasources": [
{
"upstreamTables": [
{"name": "WORKBOOK_B_TABLE_A"},
{"name": "WORKBOOK_B_TABLE_B"}]},
{
"upstreamTables": [
{"name": "WORKBOOK_B_TABLE_C"},
{"name": "WORKBOOK_B_TABLE_D"}]}]}]}
I am trying to convert it to a dataframe like this:
workbooks | upstreamTables |
---|---|
WORKBOOK_A | WORKBOOK_A_TABLE_A |
WORKBOOK_A | WORKBOOK_A_TABLE_B |
WORKBOOK_A | WORKBOOK_A_TABLE_C |
WORKBOOK_B | WORKBOOK_B_TABLE_A |
WORKBOOK_B | WORKBOOK_B_TABLE_B |
WORKBOOK_B | WORKBOOK_B_TABLE_C |
WORKBOOK_B | WORKBOOK_B_TABLE_D |
"upstreamTables": []
should be ignored in this case.
Playing around with json_normalize
df = pd.json_normalize(json_data)
didn’t play out so far and extracting the data as separate dataframes and rejoining them seems too convulsive.
3
Answers
Here’s one approach:
resp
(i.e.,response.json()
) topd.json_normalize
with bothrecord_path
andmeta
. Addmeta_prefix
to avoid aValueError: Conflicting metadata
. Cf. this post. Otherwise we would end up with 2name
columns.df.rename
to rename the columns and re-order them.Output:
to keep it compact I used list comprehensions, I hope it is readable.
To convert the JSON response into the desired DataFrame, you can iterate through the JSON structure
This would give the result in the desired table structure. But make sure to always pass the correct JSON or use any online tool like JSON Reader or any tool