I am trying to scrape the first 200 entries from https://www.ssrn.com/index.cfm/en/arn/?page=1&sort=0 (title, authors, url, …). So far I used rvest, which worked fine when looping over the first 4 pages until this week, and I am now trying to read the JSON directly from https://api.ssrn.com/content/v1/bindings/204/papers. The code below works, but I don't know how to get more than the first 50 entries, or even display more than 50 of the 43602 entries. Is there a solution using jsonlite or rvest?
Any help appreciated! Thanks in advance.
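```r
library(jsonlite)

# API endpoint behind the SSRN listing page; this request returns only the first 50 papers
json_file <- "https://api.ssrn.com/content/v1/bindings/204/papers"
data <- fromJSON(json_file)    # parse the JSON response into an R list
data <- as.data.frame(data)    # flatten the list into a data frame
```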
Answers
If you look at the link, you can alter the output parameters `count` and `index`: `count` sets how many entries are returned per `index`. The maximum output is 200 per index, so you can map over a sequence of `index` values to get all 43602 entries (about 2-3 minutes of scraping time). Papers and authors are kept in two separate tables.
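A minimal sketch of that pagination approach, not the answer's exact code. It assumes `index` is a 0-based offset (0, 200, 400, ...) and that each response holds the papers in a list element called `papers`; both are assumptions, so check with `str(fromJSON(json_file), max.level = 1)` first. It builds one URL per page of 200 entries and stacks the pages with jsonlite's `rbind_pages()`.

```r
library(jsonlite)

# Endpoint from the question.
base_url <- "https://api.ssrn.com/content/v1/bindings/204/papers"

# Build one request URL per page of `count` entries, covering `total` entries.
# ASSUMPTION: `index` is a 0-based offset. If the API treats it as a page
# number instead, replace `offsets` with seq_len(n_pages) - 1 (or seq_len(n_pages)).
build_page_urls <- function(total, count = 200, base = base_url) {
  offsets <- as.integer(seq(0, total - 1, by = count))
  sprintf("%s?index=%d&count=%d", base, offsets, as.integer(count))
}

# Download every page and stack the per-page paper tables into one data frame.
# ASSUMPTION: `field = "papers"` is a hypothetical element name; use whatever
# names(fromJSON(base_url)) shows for the list of papers.
fetch_all_papers <- function(total, count = 200, field = "papers", pause = 0.5) {
  pages <- lapply(build_page_urls(total, count), function(url) {
    Sys.sleep(pause)            # space out requests to go easy on the server
    fromJSON(url)[[field]]
  })
  rbind_pages(pages)            # jsonlite helper; fills columns missing on some pages with NA
}

# Usage (performs network requests, so it is commented out here):
# first_200  <- fetch_all_papers(total = 200)     # one request with count = 200
# all_papers <- fetch_all_papers(total = 43602)   # 219 requests
```

For the original goal of 200 entries a single request with `count=200` is enough; the loop only matters for the full listing. Removing `Sys.sleep()` speeds up the full run but puts more load on the server.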
Created on 2023-01-22 with reprex v2.0.2
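To keep papers and authors in two separate tables, one option is to split the nested authors column that `fromJSON()` typically produces (a list-column holding one small data frame per paper) into its own long table keyed by the paper id. This is a hedged sketch in base R: the column names `authors` and `id` in the usage line are hypothetical and should be checked against the real data.

```r
# Split a list-column of data frames (one small table per parent row) into a
# separate long table, tagging each child row with its parent's id in `key`.
split_list_column <- function(df, list_col, id_col, key = "parent_id") {
  child_rows <- Map(function(id, sub) {
    if (!is.data.frame(sub) || nrow(sub) == 0) return(NULL)  # skip papers without authors
    sub[[key]] <- rep(id, nrow(sub))
    sub[c(key, setdiff(names(sub), key))]                     # put the key column first
  }, df[[id_col]], df[[list_col]])
  child <- do.call(rbind, unname(child_rows))
  if (!is.null(child)) rownames(child) <- NULL
  list(parent = df[setdiff(names(df), list_col)], child = child)
}

# Usage with hypothetical column names:
# tables  <- split_list_column(all_papers, "authors", "id", key = "paper_id")
# papers  <- tables$parent
# authors <- tables$child
```

A separate `key` column is used instead of reusing `id_col` so that an author table which already has its own `id` column is not overwritten.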