skip to Main Content

I converted some nested json files (used by i18n translation system) to csv to allow non-IT colleagues to review the language without handling directly with the json files.

Therefore, then I have to convert those csv back to json. I lost my mind trying to do this, using a lot of conditions to check the depth of nesting for each row and which key changed from previous row. And then modifying a list or a list inside a list, or a list inside a list inside a list… I ended up with a messy spagetti code that does not fully work yet. I will eventually success on this task (or become mad).

However, I was wondering if there is a more straightforward method, maybe using another library or using some R’s magic that I don’t know.

Below, there is a reprex with a toy dataframe (real ones has more rows but the same depth) and the expected output.

library(tibble)
library(jsonlite)

df <- tribble(
  ~ a, ~ b, ~ c, ~ d, ~ value,
  "a", "c", NA, NA,  "q",
  "a", "d", NA, NA, "r",
  "a", "e", NA, NA, "s",
  "a", "f", "i", NA, "t",
  "a", "f", "j", NA, "u",
  "b", "g", NA, NA, "v",
  "b", "h", "k", "m", "x",
  "b", "h", "k", "n", "y",
  "b", "h", "k", "o", "z",
  "b", "h", "k", "p", "aa",
  "b", "h", "l", NA,  "ab"
)

df
#> # A tibble: 11 × 5
#>    a     b     c     d     value
#>    <chr> <chr> <chr> <chr> <chr>
#>  1 a     c     <NA>  <NA>  q    
#>  2 a     d     <NA>  <NA>  r    
#>  3 a     e     <NA>  <NA>  s    
#>  4 a     f     i     <NA>  t    
#>  5 a     f     j     <NA>  u    
#>  6 b     g     <NA>  <NA>  v    
#>  7 b     h     k     m     x    
#>  8 b     h     k     n     y    
#>  9 b     h     k     o     z    
#> 10 b     h     k     p     aa   
#> 11 b     h     l     <NA>  ab

expected <- list(
  a = list(
    c = "q", 
    d = "r",
    e = "s",
    f = list(
      i = "t",
      j = "u"
    )
  ),
  b = list(
    g = "v",
    h = list(
      k = list(
        m = "x",
        n = "y",
        o = "z",
        p = "aa"
      ),
      l = "ab"
    )
    
  )
)

toJSON(expected, auto_unbox = T, pretty = T)
#> {
#>   "a": {
#>     "c": "q",
#>     "d": "r",
#>     "e": "s",
#>     "f": {
#>       "i": "t",
#>       "j": "u"
#>     }
#>   },
#>   "b": {
#>     "g": "v",
#>     "h": {
#>       "k": {
#>         "m": "x",
#>         "n": "y",
#>         "o": "z",
#>         "p": "aa"
#>       },
#>       "l": "ab"
#>     }
#>   }
#> }

Created on 2023-09-14 with reprex v2.0.2

2

Answers


  1. You could do some recursive branching. For example

    first_split <- function(df, val_col="value") {
      cval <- df[,1]
      nval <- df[,2]
      terminal <- is.na(nval) | names(df[2])==val_col
      out <- list()
      if (any(terminal)) {
        out <- append(out, setNames(as.list(df[[val_col]][terminal]), cval[terminal]))
      }
      if (any(!terminal)) {
        out <- append(out, split(df[!terminal,-1], df[!terminal,1]) |>
          lapply(FUN=first_split))
      }
      out
    }
    

    You basically look at the first two columns. The first columns are the names and if the next is NA (or you run out) then you create a node, otherwise you split the remaining values into their own lists.

    Using this on the test data we get

    jsonlite::toJSON(first_split(df), auto_unbox=TRUE, pretty=TRUE)
    {
      "a": {
        "c": "q",
        "d": "r",
        "e": "s",
        "f": {
          "i": "t",
          "j": "u"
        }
      },
      "b": {
        "g": "v",
        "h": {
          "l": "ab",
          "k": {
            "m": "x",
            "n": "y",
            "o": "z",
            "p": "aa"
          }
        }
      }
    } 
    

    which seems to match the desired output.

    This function makes the strong assumption that the "value" column will always be the last column and that all columns before it are to be interpreted as names.

    Login or Signup to reply.
  2. Using an external package, rrapply() (in package rrapply) has an option how = "unmelt" to do exactly this, i.e. transforming a melted data.frame into a nested list:

    json <- rrapply::rrapply(df, how = "unmelt") |>
      jsonlite::toJSON(auto_unbox = TRUE, pretty = TRUE)
    
    json
    #> {
    #>   "a": {
    #>     "c": "q",
    #>     "d": "r",
    #>     "e": "s",
    #>     "f": {
    #>       "i": "t",
    #>       "j": "u"
    #>     }
    #>   },
    #>   "b": {
    #>     "g": "v",
    #>     "h": {
    #>       "k": {
    #>         "m": "x",
    #>         "n": "y",
    #>         "o": "z",
    #>         "p": "aa"
    #>       },
    #>       "l": "ab"
    #>     }
    #>   }
    #> }
    

    Note that the other way around, we can also easily get the json file into the melted data.frame format:

    jsonlite::fromJSON(json, simplifyVector = FALSE) |>
      rrapply::rrapply(how = "melt")
    #>    L1 L2   L3   L4 value
    #> 1   a  c <NA> <NA>     q
    #> 2   a  d <NA> <NA>     r
    #> 3   a  e <NA> <NA>     s
    #> 4   a  f    i <NA>     t
    #> 5   a  f    j <NA>     u
    #> 6   b  g <NA> <NA>     v
    #> 7   b  h    k    m     x
    #> 8   b  h    k    n     y
    #> 9   b  h    k    o     z
    #> 10  b  h    k    p    aa
    #> 11  b  h    l <NA>    ab
    
    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search