Question posted in Json
Our archive of expertly curated questions and answers provides insights and solutions to common problems related to this popular data interchange format. From parsing and manipulating JSON data to integrating it with various programming languages and web services, our archive has got you covered. Start exploring today and take your JSON skills to the next level

Json – Listception… Extracting data from a list within a list within a list

versb
December 29, 2022
214 views
2 votes
3 Answers

Hey stack overflow legends,

I’m trying to extract a dataframe from a nested list within a nested list, and it feels like I’m stuck in some kind of Inception-like dream within a dream. These lists are just nesting inside each other indefinitely, it’s like a never-ending Russian doll situation.

I’m using the following Trademe API call (https://api.trademe.co.nz/v1/Categories/0153-.xml) and the data is structured somewhat like this:

{
  "Category": {
    "Name": "Fashion",
    "Subcategories": [
      {
        "Name": "Women's Clothing",
        "Number": "0153-0154-",
        "Path": "Fashion/Women's Clothing",
        "Subcategories": [
          {
            "Name": "Dresses",
            "Number": "0153-0154-0155-",
            "Path": "Fashion/Women's Clothing/Dresses",
            "Subcategories": []
          },
          {
            "Name": "Tops",
            "Number": "0153-0154-0156-",
            "Path": "Fashion/Women's Clothing/Tops",
            "Subcategories": []
          },
          ...
        ]
      },
      {
        "Name": "Men's Clothing",
        "Number": "0153-0157-",
        "Path": "Fashion/Men's Clothing",
        "Subcategories": [
          {
            "Name": "Shirts",
            "Number": "0153-0157-0158-",
            "Path": "Fashion/Men's Clothing/Shirts",
            "Subcategories": []
          },
          {
            "Name": "Pants",
            "Number": "0153-0157-0159-",
            "Path": "Fashion/Men's Clothing/Pants",
            "Subcategories": []
          },
          ...
        ]
      },
      ...
    ]
  }
}

I want to extract a dataframe with three columns: "Name", "Number", and "Path". The "Name" column should contain the names of all the subcategories, the "Number" column should contain their corresponding numbers, and the "Path" column should contain their full paths. I really only need the data from the lowest level.

I’ve tried using lapply() and sapply(), but I just keep getting lost in the nested lists. Any help would be greatly appreciated!

Thanks in advance,
[Your Name]

Answers

In the provided example we could do:

library(jsonlite)
library(purrr)
library(dplyr)

jsontext |>
  fromJSON() |>
  pluck("Category", "Subcategories", "Subcategories") |>
  bind_rows() |>
  select(-Subcategories)

Output:

     Name          Number                            Path
1 Dresses 0153-0154-0155- Fashion/Womens Clothing/Dresses
2    Tops 0153-0154-0156-    Fashion/Womens Clothing/Tops
3  Shirts 0153-0157-0158-    Fashion/Mens Clothing/Shirts
4   Pants 0153-0157-0159-     Fashion/Mens Clothing/Pants

Update: If the subcategories is nested indefinitely (theoretically) I guess that the easiest would be to unlist it and make use of the name structure instead of doing recursions.

E.g.

jsonunlisted <- 
  jsontext |>
  fromJSON() |>
  pluck("Category", "Subcategories") |>
  unlist() 

cbind(Name = jsonunlisted[grep("Name", names(jsonunlisted))],
      Number = jsonunlisted[grep("Number", names(jsonunlisted))],
      Path = jsonunlisted[grep("Path", names(jsonunlisted))])

Output:

                                  Name              Number                 Path                               
Name1                             "Womens Clothing" "0153-0154-"           "Fashion/Womens Clothing"          
Name2                             "Mens Clothing"   "0153-0157-"           "Fashion/Mens Clothing"            
Subcategories.Name1               "Dresses"         "0153-0154-0155-"      "Fashion/Womens Clothing/Dresses"  
Subcategories.Name2               "Tops"            "0153-0154-0156-"      "Fashion/Womens Clothing/Tops"     
Subcategories.Name1               "Shirts"          "0153-0157-0158-"      "Fashion/Mens Clothing/Shirts"     
Subcategories.Name2               "Pants"           "0153-0157-0159-"      "Fashion/Mens Clothing/Pants"      
Subcategories.Subcategories.Name1 "More Shirts"     "More 0153-0157-0158-" "More Fashion/Mens Clothing/Shirts"
Subcategories.Subcategories.Name2 "More Pants"      "More 0153-0157-0159-" "More Fashion/Mens Clothing/Pants"

Data:

jsontext <- '{
  "Category": {
    "Name": "Fashion",
    "Subcategories": [
      {
        "Name": "Womens Clothing",
        "Number": "0153-0154-",
        "Path": "Fashion/Womens Clothing",
        "Subcategories": [
          {
            "Name": "Dresses",
            "Number": "0153-0154-0155-",
            "Path": "Fashion/Womens Clothing/Dresses",
            "Subcategories": []
          },
          {
            "Name": "Tops",
            "Number": "0153-0154-0156-",
            "Path": "Fashion/Womens Clothing/Tops",
            "Subcategories": []
          }
        ]
      },
      {
        "Name": "Mens Clothing",
        "Number": "0153-0157-",
        "Path": "Fashion/Mens Clothing",
        "Subcategories": [
          {
            "Name": "Shirts",
            "Number": "0153-0157-0158-",
            "Path": "Fashion/Mens Clothing/Shirts",
            "Subcategories": []
          },
          {
            "Name": "Pants",
            "Number": "0153-0157-0159-",
            "Path": "Fashion/Mens Clothing/Pants",
            "Subcategories": [          {
            "Name": "More Shirts",
            "Number": "More 0153-0157-0158-",
            "Path": "More Fashion/Mens Clothing/Shirts",
            "Subcategories": []
          },
          {
            "Name": "More Pants",
            "Number": "More 0153-0157-0159-",
            "Path": "More Fashion/Mens Clothing/Pants",
            "Subcategories": []
          }]
          }
        ]
      }
    ]
  }
}'

- I_O
- December 29, 2022 at 12:17 pm
- 0 votes
0
json_string being your JSON text:
```
library(jsonlite)
fromJSON(json_string, simplifyDataFrame = TRUE)$Category$Subcategories[, -4]
```
output:
```
             Name     Number                    Path
1 Womens Clothing 0153-0154- Fashion/Womens Clothing
2   Mens Clothing 0153-0157-   Fashion/Mens Clothing
```
Login or Signup to reply.

Here is a solution using rrapply() in package rrapply. We prune the Name, Number and Path elements of all nodes containing a Path variable (nb: the root node does not have a Path variable). This avoids the need for regular expressions applied to the unlisted/collapsed list names.

library(jsonlite)
library(rrapply)

jsontext |>
  fromJSON(simplifyVector = FALSE) |>
  rrapply(condition = (x, .xname) .xname == "Path", f = (x, .xsiblings) .xsiblings[c("Name", "Number", "Path")], how = "flatten") |>
  rrapply(how = "bind")

#>              Name               Number                              Path
#> 1 Womens Clothing           0153-0154-           Fashion/Womens Clothing
#> 2         Dresses      0153-0154-0155-   Fashion/Womens Clothing/Dresses
#> 3            Tops      0153-0154-0156-      Fashion/Womens Clothing/Tops
#> 4   Mens Clothing           0153-0157-             Fashion/Mens Clothing
#> 5          Shirts      0153-0157-0158-      Fashion/Mens Clothing/Shirts
#> 6           Pants      0153-0157-0159-       Fashion/Mens Clothing/Pants
#> 7     More Shirts More 0153-0157-0158- More Fashion/Mens Clothing/Shirts
#> 8      More Pants More 0153-0157-0159-  More Fashion/Mens Clothing/Pants

The second rrapply() could also be replaced by e.g. dplyr::bind_rows(), do.call(rbind, ...) or data.table::rbindlist().

Please signup or login to give your own answer.

Click here to cancel reply.