Question posted in Json
Our archive of expertly curated questions and answers provides insights and solutions to common problems related to this popular data interchange format. From parsing and manipulating JSON data to integrating it with various programming languages and web services, our archive has got you covered. Start exploring today and take your JSON skills to the next level

Convert nested list to dataframe when reading json file

JerryN
January 12, 2023
214 views
0 votes
2 Answers

I am trying to import data from two sensors (many more actually but this is just a test case). The final product should be a data frame or table that has 4 columns – two for each sensor. One is the time of the data collection, the other is the return data. I can’t include the url that accesses the data because it includes keys for access but the start looks like this https://api.ecowitt.net/api/v3/device/history
The following code runs fine

response <- httr::GET(url_complete_noDates, query = list(start_date = "2022-07-01 00:00:00", end_date = "2022-07-01 02:00:00"))

jsonRespText <- content(response, as =  "text") 
j <- fromJSON(jsonRespText)

But I can’t figure out how to parse this into a data frame. This gets close

j_data <- j$data
df_json1 <- as_tibble(j_data, validate = F) 
temp_unnest <- df_json1 %>% unnest(cols = names(df_json1))

Here is the dput output for j. The time value is the time of the data download in seconds from 1970-01-01-00:00 MDT. Each element like this – 1656655200 = "38" is the time/data point that should go in a row of the the data frame.

list(code = 0L, msg = "success", time = "1673535777", data = list(
    soil_ch1 = list(soilmoisture = list(unit = "%", list = list(
        `1656655200` = "38", `1656657000` = "38", `1656658800` = "38", 
        `1656660600` = "38", `1656662400` = "38"))), soil_ch2 = list(
        soilmoisture = list(unit = "%", list = list(`1656655200` = "48", 
            `1656657000` = "48", `1656658800` = "48", `1656660600` = "48", 
            `1656662400` = "48")))))

Answers

You can transform the nested list into a data.frame, reshape it to the name-value format with pivot_longer, and separate the name column into multiple columns with the separators "\.".

library(tidyr)

as.data.frame(j$data) %>%
  pivot_longer(everything()) %>%
  separate(name, paste0("col", 1:4), sep = '\.', fill = 'right')

# # A tibble: 12 × 5
#    col1     col2         col3  col4       value
#    <chr>    <chr>        <chr> <chr>      <chr>
#  1 soil_ch1 soilmoisture unit  NA         %    
#  2 soil_ch1 soilmoisture list  1656655200 38   
#  3 soil_ch1 soilmoisture list  1656657000 38   
#  4 soil_ch1 soilmoisture list  1656658800 38   
#  5 soil_ch1 soilmoisture list  1656660600 38   
#  6 soil_ch1 soilmoisture list  1656662400 38   
#  7 soil_ch2 soilmoisture unit  NA         %    
#  8 soil_ch2 soilmoisture list  1656655200 48   
#  9 soil_ch2 soilmoisture list  1656657000 48   
# 10 soil_ch2 soilmoisture list  1656658800 48   
# 11 soil_ch2 soilmoisture list  1656660600 48   
# 12 soil_ch2 soilmoisture list  1656662400 48

Here is another approach, melting the nested list into a long data.frame with reshape2::melt():

reshape2::melt(j_data)
#>         value         L5   L4           L3       L2   L1
#> 1           0       <NA> <NA>         <NA>     <NA> code
#> 2     success       <NA> <NA>         <NA>     <NA>  msg
#> 3  1673535777       <NA> <NA>         <NA>     <NA> time
#> 4           %       <NA> unit soilmoisture soil_ch1 data
#> 5          38 1656655200 list soilmoisture soil_ch1 data
#> 6          38 1656657000 list soilmoisture soil_ch1 data
#> 7          38 1656658800 list soilmoisture soil_ch1 data
#> 8          38 1656660600 list soilmoisture soil_ch1 data
#> 9          38 1656662400 list soilmoisture soil_ch1 data
#> 10          %       <NA> unit soilmoisture soil_ch2 data
#> 11         48 1656655200 list soilmoisture soil_ch2 data
#> 12         48 1656657000 list soilmoisture soil_ch2 data
#> 13         48 1656658800 list soilmoisture soil_ch2 data
#> 14         48 1656660600 list soilmoisture soil_ch2 data
#> 15         48 1656662400 list soilmoisture soil_ch2 data

Or to have more control in melting the nested list, we could use rrapply() in package rrapply:

rrapply::rrapply(
  j_data, 
  condition = (x, .xname) !.xname %in% c("code", "msg", "time", "unit"),
  f = as.numeric,
  how = "melt"
)
#>      L1       L2           L3   L4         L5 value
#> 1  data soil_ch1 soilmoisture list 1656655200    38
#> 2  data soil_ch1 soilmoisture list 1656657000    38
#> 3  data soil_ch1 soilmoisture list 1656658800    38
#> 4  data soil_ch1 soilmoisture list 1656660600    38
#> 5  data soil_ch1 soilmoisture list 1656662400    38
#> 6  data soil_ch2 soilmoisture list 1656655200    48
#> 7  data soil_ch2 soilmoisture list 1656657000    48
#> 8  data soil_ch2 soilmoisture list 1656658800    48
#> 9  data soil_ch2 soilmoisture list 1656660600    48
#> 10 data soil_ch2 soilmoisture list 1656662400    48

Instead, we can also unnest to a wide data.frame if that is more appropriate:

rrapply::rrapply(
  j_data, 
  how = "bind", 
  options = list(coldepth = 5, namecols = TRUE)
)
#>     L1       L2           L3   L4 1656655200 1656657000 1656658800 1656660600 1656662400
#> 1 data soil_ch1 soilmoisture list         38         38         38         38         38
#> 2 data soil_ch2 soilmoisture list         48         48         48         48         48

Please signup or login to give your own answer.

Click here to cancel reply.