I extracted the JSON from the following page:
library(jsonlite)
results <- fromJSON("https://www.reddit.com/r/gardening/comments/1196opl/tree_surgeon_butchered_my_tree_will_it_be_ok/.json")
final = results$data
When I inspect the output, I can see that even though it is in a "list" format, there appears to be a tabular "data frame" structure within it:
t3, NA, gardening, , FALSE, NA, 0, FALSE, Tree surgeon butchered my tree - will it be ok?, r/gardening, FALSE, 6, NA, 0, 140, NA, all_ads, FALSE, t3_1196op
My Question: Based on the above – is it possible to somehow convert this output into a data frame?
I tried the following code:
dataframe_list = as.data.frame(final)
The code ran, but the result is still not in a tabular/data frame format.
In the end, I would like to have the result in the following format:
  comment_id                      comment_text
1          1                 I like gardening!
2          2            I dont like to garden!
3          3             its too cold outside?
4          4 try planting something different?
5          5                    garden is fun!
Can someone please show me how to do this?
Thanks!
Note: If you look at the actual website https://www.reddit.com/r/gardening/comments/1196opl/tree_surgeon_butchered_my_tree_will_it_be_ok/.json – the desired text appears to be between the tags "body:" and "edited".
Maybe I am approaching this problem the wrong way and there is a better way of doing this?
2 Answers
Here is one approach using pluck(), bind_rows() and unnest().
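As a rough illustration of the pluck() and bind_rows() idea, here is a minimal sketch rather than the answer's original code. It assumes the response is a two-element array (post listing first, comment listing second) and uses a small made-up JSON string, json_txt, as a stand-in so that it runs without network access. The names json_txt, comment_nodes and comments are hypothetical, and this version reads the unsimplified list directly, so it does not need unnest().

```r
library(jsonlite)
library(purrr)
library(dplyr)

# Hypothetical, trimmed-down stand-in for the Reddit response:
# element 1 is the post listing, element 2 is the comment listing.
json_txt <- '[
  {"kind": "Listing", "data": {"children": [
    {"kind": "t3", "data": {"id": "p1", "title": "Example post"}}
  ]}},
  {"kind": "Listing", "data": {"children": [
    {"kind": "t1", "data": {"id": "c1", "body": "I like gardening!"}},
    {"kind": "t1", "data": {"id": "c2", "body": "try planting something different?"}},
    {"kind": "more", "data": {"id": "m1", "count": 3}}
  ]}}
]'

# Keep the nested list as-is instead of letting jsonlite simplify it.
results <- fromJSON(json_txt, simplifyVector = FALSE)

# Walk down to the children of the second listing (the comments).
children <- pluck(results, 2, "data", "children")

# Keep only real comments ("t1"); "more" placeholders have no body.
comment_nodes <- keep(children, function(x) x$kind == "t1")

# One single-row tibble per comment, stacked into one data frame.
comments <- bind_rows(map(comment_nodes, function(x) {
  tibble(comment_id = x$data$id, comment_text = x$data$body)
}))

print(comments)
```

To try it on the real thread, json_txt could presumably be swapped for the .json URL from the question. Note that this only collects top-level comments; replies sit deeper in each comment's data and would need a recursive walk.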
For parsing JSON from Reddit, you may want to check the RedditExtractoR package. Its get_thread_content() returns a list of 2 data.frames, one for the thread and another for the comments.
Created on 2023-02-23 with reprex v2.0.2