Question posted in Json
Our archive of expertly curated questions and answers provides insights and solutions to common problems related to this popular data interchange format. From parsing and manipulating JSON data to integrating it with various programming languages and web services, our archive has got you covered. Start exploring today and take your JSON skills to the next level

Convert R data frame to json with different depths

JosepPueyo
September 18, 2023
182 views
1 vote
2 Answers

I converted some nested json files (used by i18n translation system) to csv to allow non-IT colleagues to review the language without handling directly with the json files.

Therefore, then I have to convert those csv back to json. I lost my mind trying to do this, using a lot of conditions to check the depth of nesting for each row and which key changed from previous row. And then modifying a list or a list inside a list, or a list inside a list inside a list… I ended up with a messy spagetti code that does not fully work yet. I will eventually success on this task (or become mad).

However, I was wondering if there is a more straightforward method, maybe using another library or using some R’s magic that I don’t know.

Below, there is a reprex with a toy dataframe (real ones has more rows but the same depth) and the expected output.

library(tibble)
library(jsonlite)

df <- tribble(
  ~ a, ~ b, ~ c, ~ d, ~ value,
  "a", "c", NA, NA,  "q",
  "a", "d", NA, NA, "r",
  "a", "e", NA, NA, "s",
  "a", "f", "i", NA, "t",
  "a", "f", "j", NA, "u",
  "b", "g", NA, NA, "v",
  "b", "h", "k", "m", "x",
  "b", "h", "k", "n", "y",
  "b", "h", "k", "o", "z",
  "b", "h", "k", "p", "aa",
  "b", "h", "l", NA,  "ab"
)

df
#> # A tibble: 11 × 5
#>    a     b     c     d     value
#>    <chr> <chr> <chr> <chr> <chr>
#>  1 a     c     <NA>  <NA>  q    
#>  2 a     d     <NA>  <NA>  r    
#>  3 a     e     <NA>  <NA>  s    
#>  4 a     f     i     <NA>  t    
#>  5 a     f     j     <NA>  u    
#>  6 b     g     <NA>  <NA>  v    
#>  7 b     h     k     m     x    
#>  8 b     h     k     n     y    
#>  9 b     h     k     o     z    
#> 10 b     h     k     p     aa   
#> 11 b     h     l     <NA>  ab

expected <- list(
  a = list(
    c = "q", 
    d = "r",
    e = "s",
    f = list(
      i = "t",
      j = "u"
    )
  ),
  b = list(
    g = "v",
    h = list(
      k = list(
        m = "x",
        n = "y",
        o = "z",
        p = "aa"
      ),
      l = "ab"
    )
    
  )
)

toJSON(expected, auto_unbox = T, pretty = T)
#> {
#>   "a": {
#>     "c": "q",
#>     "d": "r",
#>     "e": "s",
#>     "f": {
#>       "i": "t",
#>       "j": "u"
#>     }
#>   },
#>   "b": {
#>     "g": "v",
#>     "h": {
#>       "k": {
#>         "m": "x",
#>         "n": "y",
#>         "o": "z",
#>         "p": "aa"
#>       },
#>       "l": "ab"
#>     }
#>   }
#> }

^{Created on 2023-09-14 with reprex v2.0.2}

Tags: json jsonlite r tidyverse

Answers

- MrFlick
- September 14, 2023 at 10:03 pm
- 0 votes
0
You could do some recursive branching. For example
```
first_split <- function(df, val_col="value") {
  cval <- df[,1]
  nval <- df[,2]
  terminal <- is.na(nval) | names(df[2])==val_col
  out <- list()
  if (any(terminal)) {
    out <- append(out, setNames(as.list(df[[val_col]][terminal]), cval[terminal]))
  }
  if (any(!terminal)) {
    out <- append(out, split(df[!terminal,-1], df[!terminal,1]) |>
      lapply(FUN=first_split))
  }
  out
}
```
You basically look at the first two columns. The first columns are the names and if the next is NA (or you run out) then you create a node, otherwise you split the remaining values into their own lists.

Using this on the test data we get
```
jsonlite::toJSON(first_split(df), auto_unbox=TRUE, pretty=TRUE)
{
  "a": {
    "c": "q",
    "d": "r",
    "e": "s",
    "f": {
      "i": "t",
      "j": "u"
    }
  },
  "b": {
    "g": "v",
    "h": {
      "l": "ab",
      "k": {
        "m": "x",
        "n": "y",
        "o": "z",
        "p": "aa"
      }
    }
  }
} 
```
which seems to match the desired output.

This function makes the strong assumption that the "value" column will always be the last column and that all columns before it are to be interpreted as names.
Login or Signup to reply.

Using an external package, rrapply() (in package rrapply) has an option how = "unmelt" to do exactly this, i.e. transforming a melted data.frame into a nested list:

json <- rrapply::rrapply(df, how = "unmelt") |>
  jsonlite::toJSON(auto_unbox = TRUE, pretty = TRUE)

json
#> {
#>   "a": {
#>     "c": "q",
#>     "d": "r",
#>     "e": "s",
#>     "f": {
#>       "i": "t",
#>       "j": "u"
#>     }
#>   },
#>   "b": {
#>     "g": "v",
#>     "h": {
#>       "k": {
#>         "m": "x",
#>         "n": "y",
#>         "o": "z",
#>         "p": "aa"
#>       },
#>       "l": "ab"
#>     }
#>   }
#> }

Note that the other way around, we can also easily get the json file into the melted data.frame format:

jsonlite::fromJSON(json, simplifyVector = FALSE) |>
  rrapply::rrapply(how = "melt")
#>    L1 L2   L3   L4 value
#> 1   a  c <NA> <NA>     q
#> 2   a  d <NA> <NA>     r
#> 3   a  e <NA> <NA>     s
#> 4   a  f    i <NA>     t
#> 5   a  f    j <NA>     u
#> 6   b  g <NA> <NA>     v
#> 7   b  h    k    m     x
#> 8   b  h    k    n     y
#> 9   b  h    k    o     z
#> 10  b  h    k    p    aa
#> 11  b  h    l <NA>    ab

Please signup or login to give your own answer.

Click here to cancel reply.