How to extract all "query" value from SerpApi auto-generated html file

kangBaca
September 10, 2024
228 views
0 votes
2 Answers

How to extract all "query" data (keywords of coffee) from this link: https://serpapi.com/search.html?engine=google_trends&q=coffee&data_type=RELATED_QUERIES&cat=0&date=now+7-d&api_key=317da75462cab4790705a5cf8b6a9c74c9ba9f279150afb87d4b191f95d8d5de

to be one column data frame in R. I didn’t get the result with rvest.

Regards

library(rvest)

allcom <- read_html("https://serpapi.com/search.html?engine=google_trends&q=coffee&data_type=RELATED_QUERIES&cat=0&date=now+7-d&api_key=317da75462cab4790705a5cf8b6a9c74c9ba9f279150afb87d4b191f95d8d5de")

allcom %>% html_attr("query")

[1] NA

Tags: extract html r rvest serpapi

Answers

You should use a json request instead of html and use the httr2 and jsonlite packages to easily convert to a dataframe:

replace your url by "https://serpapi.com/search.json?engine=google_trends&q=coffee&data_type=RELATED_QUERIES&cat=0&date=now+7-d&api_key=317da75462cab4790705a5cf8b6a9c74c9ba9f279150afb87d4b191f95d8d5de" to get a JSON response.

library(httr2)
library(jsonlite)

req = request("https://serpapi.com/search.json?engine=google_trends&q=coffee&data_type=RELATED_QUERIES&cat=0&date=now+7-d&api_key=317da75462cab4790705a5cf8b6a9c74c9ba9f279150afb87d4b191f95d8d5de")
resp = req_perform(req)
resp2 = resp |> resp_body_string() |> fromJSON()

# if you need all values, bind "rising" and "top" values
df = rbind(resp2$related_queries$rising, resp2$related_queries$top)

# show only the "coffee" column
df[1]

> head(df[1])
#                           query
#1     the grounds coffee factory
#2 why can't mormons drink coffee
#3           ninja coffee machine
#4            national coffee day
#5                     the coffee
#6                    coffee shop

I used the rjsoncons CRAN package to retrieve and ‘pivot’ the ‘related_queries.top’ JMESPath to a tibble

> url = "https://serpapi.com/search.json?engine=google_trends&q=coffee&data_type=RELATED_QUERIES&cat=0&date=now+7-d&api_key=317da75462cab4790705a5cf8b6a9c74c9ba9f279150afb87d4b191f95d8d5de"
> rjsoncons::j_pivot(url, "related_queries.top", as = "tibble")
# A tibble: 25 × 5
   query          value extracted_value link                        serpapi_link
   <chr>          <chr>           <int> <chr>                       <chr>       
 1 the coffee     100               100 https://trends.google.com/… https://ser…
 2 coffee shop    90                 90 https://trends.google.com/… https://ser…
 3 coffee near me 66                 66 https://trends.google.com/… https://ser…
 4 coffee machine 57                 57 https://trends.google.com/… https://ser…
 5 coffee table   57                 57 https://trends.google.com/… https://ser…
 6 best coffee    56                 56 https://trends.google.com/… https://ser…
 7 coffee cup     44                 44 https://trends.google.com/… https://ser…
 8 black coffee   43                 43 https://trends.google.com/… https://ser…
 9 coffee maker   40                 40 https://trends.google.com/… https://ser…
10 coffee shops   39                 39 https://trends.google.com/… https://ser…
# ℹ 15 more rows
# ℹ Use `print(n = ...)` to see more rows

Actually to develop this answer I first explored the JSON using the listviewer package

listviewer::jsonedit(rjsoncons::j_query(url))

Please signup or login to give your own answer.

Click here to cancel reply.