I am trying to query the Cameo database.

If I open the URL https://cameo.mfa.org/api.php?action=query&pageids=17051&prop=extracts&format=json in a browser, I get valid JSON output.

However, if I use:

library(httr)
library(jsonlite)

base_url <- "https://cameo.mfa.org/api.php"

query_param <- list(action  = "query",
                    pageids = "17051",
                    format = "json",
                    prop = "extracts"
)

parsed_content <- httr::GET(base_url, query_param)

jsonlite::fromJSON(content(parsed_content, as = "text", encoding = "UTF-8"))

jsonlite then fails because the response body is HTML rather than JSON.

Do you have any advice on this?

2 Answers


  1. A slightly different approach: build the full URL with httr and let jsonlite::fromJSON fetch and parse it directly:

    library(httr)
    library(jsonlite)
    
    url <- httr::parse_url("https://cameo.mfa.org/api.php")
    url$query <- list(
      action = "query",
      pageids = "17051",
      format = "json",
      prop = "extracts"
    )
    
    json <- jsonlite::fromJSON(httr::build_url(url))
    
    
    json$query$pages
    #> $`17051`
    #> $`17051`$pageid
    #> [1] 17051
    #> 
    #> $`17051`$ns
    #> [1] 0
    #> 
    #> $`17051`$title
    #> [1] "Copper"
    #> 
    #> $`17051`$extract
    #> [1] "<h2><span id=\"Description\">Description</span></h2>\n<p>A reddish-brown, ductile, metallic element. Copper is present [...]"
    

    Created on 2023-07-03 with reprex v2.0.2

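  2. The second argument to httr::GET is config=, so passing query_param positionally treats it as request configuration instead of as the query string. The request then goes to the bare api.php URL without any parameters, which is why you get HTML back. Pass it by name instead, as query = query_param.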

    res <- httr::GET(base_url, query = query_param)
    res
    # Response [https://cameo.mfa.org/api.php?action=query&pageids=17051&format=json&prop=extracts]
    #   Date: 2023-07-03 15:06
    #   Status: 200
    #   Content-Type: application/json; charset=utf-8
    #   Size: 5.22 kB
    str(httr::content(res))
    # List of 3
    #  $ batchcomplete: chr ""
    #  $ warnings     :List of 1
    #   ..$ extracts:List of 1
    #   .. ..$ *: chr "HTML may be malformed and/or unbalanced and may omit inline images. Use at your own risk. Known problems are li"| __truncated__
    #  $ query        :List of 1
    #   ..$ pages:List of 1
    #   .. ..$ 17051:List of 4
    #   .. .. ..$ pageid : int 17051
    #   .. .. ..$ ns     : int 0
    #   .. .. ..$ title  : chr "Copper"
    #   .. .. ..$ extract: chr "<h2><span id=\"Description\">Description</span></h2>\n<p>A reddish-brown, ductile, metallic element. Copper is "| __truncated__
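
Building on both answers, here is a minimal sketch of a more defensive version. It builds the URL up front so you can inspect it, and it refuses to parse the body unless the server says it is JSON. The helper names `build_cameo_url` and `fetch_cameo_page` are made up for illustration and are not part of httr or jsonlite; the sketch also assumes the API keeps reporting `application/json` as shown in the response above.

```r
library(httr)
library(jsonlite)

# Hypothetical helper (not part of httr): builds the request URL.
# This step needs no network access, so the result can be inspected first.
build_cameo_url <- function(page_id,
                            base_url = "https://cameo.mfa.org/api.php") {
  httr::modify_url(
    base_url,
    query = list(
      action  = "query",
      pageids = as.character(page_id),
      format  = "json",
      prop    = "extracts"
    )
  )
}

# Hypothetical helper: fetches one page and stops early if the
# response is an HTTP error or is not JSON (e.g. an HTML page).
fetch_cameo_page <- function(page_id) {
  res <- httr::GET(build_cameo_url(page_id))
  httr::stop_for_status(res)
  if (httr::http_type(res) != "application/json") {
    stop("Expected JSON but got ", httr::http_type(res), call. = FALSE)
  }
  jsonlite::fromJSON(httr::content(res, as = "text", encoding = "UTF-8"))
}

# Inspect the URL that would be requested.
print(build_cameo_url(17051))
```

Calling `fetch_cameo_page(17051)$query$pages` should then return the same page list as in the answers, and a missing or misplaced query would surface as a clear error instead of a jsonlite parse failure.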
    