I extracted some data via the Crunchbase API, which resulted in a large nested list with the following structure (it contains many more nested lists at several levels; I only show the part that is currently relevant to me):
> str(x[[1]])
$ uuid : chr "5f9957b0841251e6e439d757XXXXXX"
$ relationships: List of 27
..$ websites: List of 3
.. ..$ cardinality: chr "OneToMany"
.. ..$ items :'data.frame': 4 obs. of 7 variables:
.. .. ..$ properties.website_type: chr [1:4] "homepage" "facebook" "twitter" "linkedin"
.. .. ..$ properties.url : chr [1:4] "http://www.example.com" "https://www.facebook.com/example" "http://twitter.com/example" "http://www.linkedin.com/company/example"
Consider the following minimal example:
x <- list()
x[[1]] <- list(uuid = "123",
               relationships = list(websites = list(items = list(
                 properties.website_type = c("homepage", "facebook", "twitter", "linkedin"),
                 properties.url = c("www.example1.com", "www.fbex1.com", "www.twitterex1.com", "www.linkedinex1.com")))))
x[[2]] <- list(uuid = "987",
               relationships = list(websites = list(items = list(
                 properties.website_type = c("homepage", "facebook", "twitter"),
                 properties.url = c("www.example2.com", "www.fbex2.com", "www.twitterex2.com")))))
Now, I would like to create a dataframe with the following column structure:
> x.df
  uuid          web.url  web.facebook        web.twitter        web.linkedin
1  123 www.example1.com www.fbex1.com www.twitterex1.com www.linkedinex1.com
2  987 www.example2.com www.fbex2.com www.twitterex2.com                <NA>
Meaning: I would like one row per uuid (a unique firm identifier), with the uuid in the first column followed by the URLs of the different platforms (Facebook, Twitter, …). I tried many combinations of lapply(), spread(), and bind_rows(), yet didn't manage to make anything work. Any help would be appreciated.
2 Answers
A dplyr approach could be used here; applied to the sample data, it gives the data frame requested in the question.
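A minimal sketch of what such a pipeline might look like, not necessarily the exact code of this answer. It assumes the list x from the question and tidyr >= 1.0.0 for pivot_wider(); the helper name extract_sites is made up for illustration.

```r
library(dplyr)
library(tidyr)

# Sample data copied from the question
x <- list(
  list(uuid = "123",
       relationships = list(websites = list(items = list(
         properties.website_type = c("homepage", "facebook", "twitter", "linkedin"),
         properties.url = c("www.example1.com", "www.fbex1.com",
                            "www.twitterex1.com", "www.linkedinex1.com"))))),
  list(uuid = "987",
       relationships = list(websites = list(items = list(
         properties.website_type = c("homepage", "facebook", "twitter"),
         properties.url = c("www.example2.com", "www.fbex2.com",
                            "www.twitterex2.com")))))
)

# Hypothetical helper: one long data frame (uuid, type, url) per firm.
# `$` works whether `items` is a list (as here) or a data.frame (as in the API result).
extract_sites <- function(el) {
  items <- el$relationships$websites$items
  data.frame(uuid = el$uuid,
             type = items$properties.website_type,
             url  = items$properties.url,
             stringsAsFactors = FALSE)
}

x.df <- lapply(x, extract_sites) %>%
  bind_rows() %>%                                  # stack all firms into one long table
  pivot_wider(names_from = type, values_from = url,
              names_prefix = "web.") %>%           # one column per website type
  rename(web.url = web.homepage)                   # match the column name in the question

x.df
```

Website types that a firm does not have (here linkedin for uuid 987) end up as NA. On older tidyr versions, spread(type, url) plays the same role as pivot_wider(), but the "web." prefix would then have to be added separately.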
Update: To fix the error, you need to remove corrupted elements from the input data, i.e. those where properties.website_type has one value but properties.url is NULL. Do this as a pre-processing step before executing the main solution.
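A sketch of what this pre-processing could look like in base R. The helper name has_valid_urls, the result name x_clean, and the second sample element (uuid "456") are made up for illustration; the check assumes an element counts as corrupted whenever properties.url is NULL or differs in length from properties.website_type.

```r
# Illustrative input: the second element has a website type but no URL
x <- list(
  list(uuid = "123",
       relationships = list(websites = list(items = list(
         properties.website_type = c("homepage", "facebook"),
         properties.url = c("www.example1.com", "www.fbex1.com"))))),
  list(uuid = "456",
       relationships = list(websites = list(items = list(
         properties.website_type = "homepage",
         properties.url = NULL))))
)

# Hypothetical helper: TRUE if every website type has a matching URL
has_valid_urls <- function(el) {
  items <- el$relationships$websites$items
  !is.null(items$properties.url) &&
    length(items$properties.url) == length(items$properties.website_type)
}

# Keep only the elements that pass the check
x_clean <- Filter(has_valid_urls, x)

length(x_clean)
```

The main solution can then be run on x_clean instead of x. Note that this drops the whole firm, so its uuid will not appear in the final data frame.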
I know this is a clunkier solution, but it helped me see the process step by step, running str(x_df) after each step to inspect the result.
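A step-by-step version along these lines might look as follows. This is only a base R sketch under the assumption that x is the list from the question, not the original code of this answer; the intermediate names long_list and x_long are invented here.

```r
# Sample data copied from the question
x <- list(
  list(uuid = "123",
       relationships = list(websites = list(items = list(
         properties.website_type = c("homepage", "facebook", "twitter", "linkedin"),
         properties.url = c("www.example1.com", "www.fbex1.com",
                            "www.twitterex1.com", "www.linkedinex1.com"))))),
  list(uuid = "987",
       relationships = list(websites = list(items = list(
         properties.website_type = c("homepage", "facebook", "twitter"),
         properties.url = c("www.example2.com", "www.fbex2.com",
                            "www.twitterex2.com")))))
)

# Step 1: build one long data frame per firm (uuid, type, web)
long_list <- lapply(x, function(el) {
  items <- el$relationships$websites$items
  data.frame(uuid = el$uuid,
             type = items$properties.website_type,
             web  = items$properties.url,
             stringsAsFactors = FALSE)
})
str(long_list)

# Step 2: stack the per-firm data frames into one long table
x_long <- do.call(rbind, long_list)
str(x_long)

# Step 3: reshape to wide; value columns are named web.<type>
x_df <- reshape(x_long, idvar = "uuid", timevar = "type", direction = "wide")
str(x_df)

# Step 4: tidy up names to match the layout asked for in the question
names(x_df)[names(x_df) == "web.homepage"] <- "web.url"
rownames(x_df) <- NULL
str(x_df)
```

Naming the value column web in step 1 is what makes reshape() produce the web.facebook, web.twitter, ... column names directly.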