R - Nested list to (wide) dataframe - Twitter API

DanielSHain
June 25, 2018
128 views
0 votes
2 Answers

I currently have the following problem: I extracted some data via the crunchbase API, resulting in a big nested list of the following structure (there are many more nested lists on several instances included, I here only display the part of the structure currently relevant for me):

> str(x[[1]])
$ uuid         : chr "5f9957b0841251e6e439d757XXXXXX"
$ relationships: List of 27
..$ websites: List of 3
.. ..$ cardinality: chr "OneToMany"
.. ..$ items      :'data.frame':    4 obs. of  7 variables:
.. .. ..$ properties.website_type: chr [1:4] "homepage" "facebook" "twitter" "linkedin"
.. .. ..$ properties.url         : chr [1:4] "http://www.example.com" "https://www.facebook.com/example" "http://twitter.com/example" "http://www.linkedin.com/company/example"

Consider the following minimal example:

x <- list()
x[[1]] <- list(uuid = "123", 
           relationships = list(websites = list(items =  list(
                                                properties.website_type = c("homepage", "facebook", "twitter", "linkedin"), 
                                                properties.url = c("www.example1.com", "www.fbex1.com", "www.twitterex1.com", "www.linkedinex1.com") ) )  ) )
x[[2]] <- list(uuid = "987", 
           relationships = list(websites = list(items =  list(
             properties.website_type = c("homepage", "facebook", "twitter" ), 
             properties.url = c("www.example2.com", "www.fbex2.com", "www.twitterex2.com") ) )  ) )

Now, I would like to create a dataframe with the following column structure:

> x.df
uuid          web.url  web.facebook        web.twitter        web.linkedin
1  123 www.example1.com www.fbex1.com www.twitterex1.com www.linkedinex1.com
2  987 www.example2.com www.fbex2.com www.twitterex2.com                <NA>

Meaning: I would like to have every uuid (a unique firm identifier) in a single column, followed by the urls of the different platforms (fb, twitter…). I tried a lot of different things with a combination of lapply(), spread(), and row_bind(), yet didn’t manage to make anything work. Any help on that would be appreciated.

Tags: list r

Answers

dplyr approach could be

library(dplyr)
library(tidyr)

#convert list to dataframe in long format
df <- do.call(rbind, lapply(x, data.frame, stringsAsFactors = FALSE))

#final result
df1 <- df %>%
  spread(relationships.websites.items.properties.website_type, relationships.websites.items.properties.url)

which gives

  uuid      facebook         homepage            linkedin            twitter
1  123 www.fbex1.com www.example1.com www.linkedinex1.com www.twitterex1.com
2  987 www.fbex2.com www.example2.com                <NA> www.twitterex2.com

Sample data:

x <- list(structure(list(uuid = "123", relationships = structure(list(
    websites = structure(list(items = structure(list(properties.website_type = c("homepage", 
    "facebook", "twitter", "linkedin"), properties.url = c("www.example1.com", 
    "www.fbex1.com", "www.twitterex1.com", "www.linkedinex1.com"
    )), .Names = c("properties.website_type", "properties.url"
    ))), .Names = "items")), .Names = "websites")), .Names = c("uuid", 
"relationships")), structure(list(uuid = "987", relationships = structure(list(
    websites = structure(list(items = structure(list(properties.website_type = c("homepage", 
    "facebook", "twitter"), properties.url = c("www.example2.com", 
    "www.fbex2.com", "www.twitterex2.com")), .Names = c("properties.website_type", 
    "properties.url"))), .Names = "items")), .Names = "websites")), .Names = c("uuid", 
"relationships")))

Update: In order to fix below error

Error in (function (…, row.names = NULL, check.rows = FALSE,
check.names = TRUE, : arguments imply differing number of rows: 1,
0

you would need to remove corrupted elements from input data where website_type has one value but properties.url has NULL. Run this chunk of code as a pre-processing step before executing the main solution:

idx <- which(sapply(x, function(k) is.null(k$relationships$websites$items$properties.url)))
x <- x[-idx]

Sample data to test this pre-processing step:

x <- list(structure(list(uuid = "123", relationships = structure(list(
    websites = structure(list(items = structure(list(properties.website_type = c("homepage", 
    "facebook", "twitter", "linkedin"), properties.url = c("www.example1.com", 
    "www.fbex1.com", "www.twitterex1.com", "www.linkedinex1.com"
    )), .Names = c("properties.website_type", "properties.url"
    ))), .Names = "items")), .Names = "websites")), .Names = c("uuid", 
"relationships")), structure(list(uuid = "987", relationships = structure(list(
    websites = structure(list(items = structure(list(properties.website_type = "homepage", 
        properties.url = NULL), .Names = c("properties.website_type", 
    "properties.url"))), .Names = "items")), .Names = "websites")), .Names = c("uuid", 
"relationships")), structure(list(uuid = "345", relationships = structure(list(
    websites = structure(list(items = structure(list(properties.website_type = "homepage", 
        properties.url = NULL), .Names = c("properties.website_type", 
    "properties.url"))), .Names = "items")), .Names = "websites")), .Names = c("uuid", 
"relationships")))

I know this is a clunkier solution, but it helped me seeing the process step by step (running str (x_df) to see each result):

library(tidyverse)

# Using your example
x <- list()
x[[1]] <- list(uuid = "123",
                    relationships = list(websites = list(items =  list(
                        properties.website_type = c("homepage", "facebook", "twitter", "linkedin"),
                        properties.url = c("www.example1.com", "www.fbex1.com", "www.twitterex1.com", "www.linkedinex1.com") ) )  ) )
x[[2]] <- list(uuid = "987",
                    relationships = list(websites = list(items =  list(
                        properties.website_type = c("homepage", "facebook", "twitter" ),
                        properties.url = c("www.example2.com", "www.fbex2.com", "www.twitterex2.com") ) )  ) )

 

# --- Iterations of unnest:
x_df <- x %>% tibble::as_tibble_col( .) %>%  
    tidyr::unnest_wider(col = "value")  %>% 
    tidyr::unnest_longer(col = "relationships")   %>%  
    tidyr::unnest_wider(col = "relationships")  %>%  
    tidyr::unnest_wider(col =  "items")  %>%  
    tidyr::unnest_longer(col = c("properties.website_type", "properties.url")) %>% 
# --- Lastly, group by id: 
    group_by(uuid) %>% 
    tidyr::pivot_wider(data = ., 
                             names_from = properties.website_type, 
                             values_from = c("properties.url"))

Please signup or login to give your own answer.

Click here to cancel reply.

R – Nested list to (wide) dataframe – Twitter API

Answers