I wrote this script to automatically download the picture of a species from the infobox on its Wikipedia page. I have a data frame containing the (Latin) names of all the species, and I want to download each species' Wikipedia picture automatically and put the pictures on a map.
Wikipedia link example:
https://en.wikipedia.org/wiki/Eurasian_eagle-owl
However, my script downloads the low-quality version of the picture. How can I modify it so that it downloads the original file with the best quality?
Dataframe example:
> bird_names
[1] "Prunella modularis" "Myiopsitta monachus"
[3] "Pyrrhura perlata" "Tyto alba"
[5] "Panurus biarmicus" "Merops apiaster"
Script:
library(rvest)  # provides read_html(), html_nodes(), html_attr() and %>%

# Function to download and save an image from Wikipedia
download_wikipedia_image <- function(bird_name) {
  # Construct the Wikipedia URL for the bird species
  wikipedia_url <- paste0("https://en.wikipedia.org/wiki/", gsub(" ", "_", bird_name))
  # Read the HTML content of the Wikipedia page
  page <- read_html(wikipedia_url)
  # Extract the URLs of all images in the infobox
  image_urls <- page %>%
    html_nodes("table.infobox img") %>%
    html_attr("src")
  # Download and save the first image (if available)
  if (length(image_urls) > 0) {
    # mode = "wb" keeps the binary image intact on Windows
    download.file(paste0("https:", image_urls[1]), paste0("BIRDPHOTO/", gsub(" ", "_", bird_name), ".jpg"), mode = "wb")
    cat("Downloaded photo for", bird_name, "\n")
  } else {
    cat("No photo found for", bird_name, "\n")
  }
}

# Create BIRDPHOTO directory if it doesn't exist
dir.create("BIRDPHOTO", showWarnings = FALSE)

# Loop through each bird name and download the corresponding image
for (bird_name in bird_names) {
  download_wikipedia_image(bird_name)
}

# Optional: Print a message when all downloads are complete
cat("All downloads completed.\n")
2 Answers
That's because the infobox only embeds a low-quality thumbnail. You have to follow that thumbnail to its file page (e.g. https://en.wikipedia.org/wiki/File:Baardman_-_Panurus_biarmicus_(15147085070).jpg) and look for the "Original file" link there.
Created on 2023-12-11 with reprex v2.0.2
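A minimal sketch of the "follow the thumbnail to its file page" idea might look like the following. It is an illustration under assumptions, not a definitive implementation: the helper names (`infobox_file_page`, `original_file_href`, `get_original_image_url`) are made up for this example, and the selectors (`table.infobox`, `div.fullMedia a`) assume Wikipedia's page markup as commonly observed, which may change.

```r
library(rvest)

# Given a parsed species page, return the href of the file page that the
# first infobox image links to (NA if there is no such link).
infobox_file_page <- function(page) {
  page %>%
    html_element("table.infobox") %>%
    html_element(xpath = ".//a[.//img]") %>%
    html_attr("href")
}

# Given a parsed file page, return the href of the full-resolution file.
# Assumption: the "Original file" link sits inside <div class="fullMedia">.
original_file_href <- function(file_page) {
  file_page %>%
    html_element("div.fullMedia a") %>%
    html_attr("href")
}

# Combine both steps; returns an absolute URL, or NA if either step fails.
get_original_image_url <- function(bird_name) {
  base <- "https://en.wikipedia.org"
  page <- read_html(paste0(base, "/wiki/", gsub(" ", "_", bird_name)))
  file_href <- infobox_file_page(page)
  if (is.na(file_href)) return(NA_character_)
  file_page <- read_html(xml2::url_absolute(file_href, base))
  href <- original_file_href(file_page)
  if (is.na(href)) return(NA_character_)
  # Resolves protocol-relative links such as "//upload.wikimedia.org/..."
  xml2::url_absolute(href, base)
}
```

The returned URL could then be passed to `download.file(..., mode = "wb")` in place of the thumbnail `src` used in the question. Since this makes two requests per species, it may be worth pausing between species (e.g. with `Sys.sleep()`).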
Use this:
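One possible approach, offered as a hedged sketch and not necessarily what this answer originally contained, is to rewrite the thumbnail URL itself. It assumes Wikimedia's usual thumbnail layout, `.../thumb/<a>/<ab>/<File>/<width>px-<File>`, where dropping the `/thumb` segment and the final path component yields the original file. The function name `thumb_to_original` is hypothetical.

```r
# Convert a Wikimedia thumbnail URL into the URL of the original file.
# Assumption: thumbnails follow ".../thumb/<path>/<File>/<width>px-<File>".
# URLs without a "/thumb/" segment are returned unchanged (apart from the scheme).
thumb_to_original <- function(src) {
  # Drop "/thumb" and the trailing "<width>px-<File>" component
  full <- sub("/thumb/(.+)/[^/]+$", "/\\1", src)
  # Infobox src attributes are protocol-relative ("//upload..."); add a scheme
  ifelse(startsWith(full, "//"), paste0("https:", full), full)
}

# Example with a made-up file name:
thumb_to_original("//upload.wikimedia.org/wikipedia/commons/thumb/3/3f/Owl.jpg/220px-Owl.jpg")
```

In the question's script, this would replace `paste0("https:", image_urls[1])` with `thumb_to_original(image_urls[1])`. It avoids a second page request, but it depends on the URL pattern staying stable, so following the file page is the more robust option.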