skip to Main Content

How to extract all "query" data (keywords of coffee) from this link: https://serpapi.com/search.html?engine=google_trends&q=coffee&data_type=RELATED_QUERIES&cat=0&date=now+7-d&api_key=317da75462cab4790705a5cf8b6a9c74c9ba9f279150afb87d4b191f95d8d5de

to be one column data frame in R. I didn’t get the result with rvest.

Regards

library(rvest)

allcom <- read_html("https://serpapi.com/search.html?engine=google_trends&q=coffee&data_type=RELATED_QUERIES&cat=0&date=now+7-d&api_key=317da75462cab4790705a5cf8b6a9c74c9ba9f279150afb87d4b191f95d8d5de")

allcom %>% html_attr("query")

[1] NA

2

Answers


  1. You should use a json request instead of html and use the httr2 and jsonlite packages to easily convert to a dataframe:

    replace your url by "https://serpapi.com/search.json?engine=google_trends&q=coffee&data_type=RELATED_QUERIES&cat=0&date=now+7-d&api_key=317da75462cab4790705a5cf8b6a9c74c9ba9f279150afb87d4b191f95d8d5de&quot; to get a JSON response.

    library(httr2)
    library(jsonlite)
    
    req = request("https://serpapi.com/search.json?engine=google_trends&q=coffee&data_type=RELATED_QUERIES&cat=0&date=now+7-d&api_key=317da75462cab4790705a5cf8b6a9c74c9ba9f279150afb87d4b191f95d8d5de")
    resp = req_perform(req)
    resp2 = resp |> resp_body_string() |> fromJSON()
    
    # if you need all values, bind "rising" and "top" values
    df = rbind(resp2$related_queries$rising, resp2$related_queries$top)
    
    # show only the "coffee" column
    df[1]
    
    > head(df[1])
    #                           query
    #1     the grounds coffee factory
    #2 why can't mormons drink coffee
    #3           ninja coffee machine
    #4            national coffee day
    #5                     the coffee
    #6                    coffee shop
    
    Login or Signup to reply.
  2. I used the rjsoncons CRAN package to retrieve and ‘pivot’ the ‘related_queries.top’ JMESPath to a tibble

    > url = "https://serpapi.com/search.json?engine=google_trends&q=coffee&data_type=RELATED_QUERIES&cat=0&date=now+7-d&api_key=317da75462cab4790705a5cf8b6a9c74c9ba9f279150afb87d4b191f95d8d5de"
    > rjsoncons::j_pivot(url, "related_queries.top", as = "tibble")
    # A tibble: 25 × 5
       query          value extracted_value link                        serpapi_link
       <chr>          <chr>           <int> <chr>                       <chr>       
     1 the coffee     100               100 https://trends.google.com/… https://ser…
     2 coffee shop    90                 90 https://trends.google.com/… https://ser…
     3 coffee near me 66                 66 https://trends.google.com/… https://ser…
     4 coffee machine 57                 57 https://trends.google.com/… https://ser…
     5 coffee table   57                 57 https://trends.google.com/… https://ser…
     6 best coffee    56                 56 https://trends.google.com/… https://ser…
     7 coffee cup     44                 44 https://trends.google.com/… https://ser…
     8 black coffee   43                 43 https://trends.google.com/… https://ser…
     9 coffee maker   40                 40 https://trends.google.com/… https://ser…
    10 coffee shops   39                 39 https://trends.google.com/… https://ser…
    # ℹ 15 more rows
    # ℹ Use `print(n = ...)` to see more rows
    

    Actually to develop this answer I first explored the JSON using the listviewer package

    listviewer::jsonedit(rjsoncons::j_query(url))
    
    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search