I am using selenium within R.
I have the following script which searches Google Maps for all pizza restaurants around a given geographical coordinate – and then keeps scrolling until all restaurants are loaded.
First, I navigate to the starting page:
library(RSelenium)
library(wdman)
library(netstat)
# wdman: the first call downloads/starts the Selenium binaries;
# retcommand = TRUE returns the launch command instead of running it
selenium()
selenium_object <- selenium(retcommand = TRUE, check = FALSE)

# Open a Chrome session on a free port
remote_driver <- rsDriver(browser = "chrome", chromever = "114.0.5735.90", verbose = FALSE, port = free_port())
remDr <- remote_driver$client
lat <- 40.7484
lon <- -73.9857
# Build the search URL around the coordinate with paste0()
URL <- paste0("https://www.google.com/maps/search/pizza/@", lat, ",", lon, ",17z/data=!3m1!4b1!4m6!2m5!3m4!2s", lat, ",", lon, "!4m2!1d", lon, "!2d", lat, "?entry=ttu")
# Navigate to the URL
remDr$navigate(URL)
Then, I use the following code to keep scrolling until all entries have been loaded:
# Grab the initial list of result cards
elements <- remDr$findElements(using = "css selector", "div.qjESne")

while (TRUE) {
  new_elements <- remDr$findElements(using = "css selector", "div.qjESne")

  # Pick the last element in the list - this is the one we want to scroll to
  last_element <- elements[[length(elements)]]

  # Scroll the results panel down to the last element
  remDr$executeScript("arguments[0].scrollIntoView(true);", list(last_element))

  # Wait 10 seconds for the next batch of results to load
  Sys.sleep(10)

  # Update the elements list
  elements <- new_elements

  # Stop once the "You've reached the end of the list." message appears
  if (length(remDr$findElements(using = "css selector", "span.HlvSq")) > 0) {
    print("No more elements")
    break
  }
}
Finally, I use this code to extract the names and addresses of all restaurants:
titles <- c()
addresses <- c()

# Only parse once the "You've reached the end of the list." message is present,
# i.e. all results have loaded
if (length(remDr$findElements(using = "css selector", "span.HlvSq")) > 0) {
  for (data in remDr$findElements(using = "css selector", "div.lI9IFe")) {
    # findChildElement() searches within the current result card
    title <- data$findChildElement(using = "css selector", "div.qBF1Pd.fontHeadlineSmall")$getElementText()[[1]]
    restaurant <- data$findChildElement(using = "css selector", ".W4Efsd > span:nth-of-type(2)")$getElementText()[[1]]
    titles <- c(titles, title)
    addresses <- c(addresses, restaurant)
  }

  # Combine the titles and addresses into a data frame
  df <- data.frame(title = titles, address = addresses)
  print(df)
}
My Question: Instead of using Sys.sleep() in R, I am trying to change my code so that it only scrolls (i.e. delays the action) once the previous action has completed. I notice that my existing code often freezes halfway through, and I suspect this is because I am trying to load new content while the existing page is not fully loaded. I think it would be better to wait for the page to be fully loaded before proceeding.
Can someone please show me how I might be able to delay my script and force it to wait for the existing page to load before loading a new page? (e.g. R – Waiting for page to load in RSelenium with PhantomJS)
Thanks!
Note: I am also open to a Python solution.
2 Answers
You can wait until document.readyState becomes 'complete', which indicates that the page has finished loading; once that condition is met, you can proceed with your scrolling and data-extraction code. RSelenium has no built-in explicit-wait helper, but you can poll for the ready state with executeScript().
Waiting on the page's actual ready state is a more reliable approach than Sys.sleep() because it ensures that your script waits until the page is actually ready for interaction.
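A minimal sketch of that idea in R, assuming the rest of your script stays as above (the helper name wait_for_page is mine; it relies only on executeScript() and document.readyState):
# Sketch: poll document.readyState until the browser reports the page has loaded.
# Gives up after `timeout` seconds; checks every `poll` seconds.
wait_for_page <- function(remDr, timeout = 10, poll = 0.5) {
  start <- Sys.time()
  while (as.numeric(difftime(Sys.time(), start, units = "secs")) < timeout) {
    state <- remDr$executeScript("return document.readyState;")[[1]]
    if (identical(state, "complete")) return(invisible(TRUE))
    Sys.sleep(poll)
  }
  stop("Timed out waiting for the page to finish loading")
}
# Example: call it right after navigating, before interacting with the page
# remDr$navigate(URL)
# wait_for_page(remDr)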
One way to wait for specific elements with Selenium is to use its explicit waits: the WebDriverWait and ExpectedConditions helpers. These are part of Selenium's Python and Java bindings rather than RSelenium, but since you are open to a Python solution they may be the simpler route. ExpectedConditions provides a set of predefined conditions that can be waited on before the script proceeds; for example, elementToBeClickable waits for an element to become clickable before you click on it.
In practice, you create a WebDriverWait object with a timeout of 10 seconds and a polling frequency of 0.5 seconds, which means the driver checks the condition every 0.5 seconds and throws an error if the condition is not met within 10 seconds. You then call the wait object's until method with the elementToBeClickable condition, specifying the CSS selector of the element you need, such as the search box. Once the search box becomes clickable, you can enter a search query and submit the form as usual.
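RSelenium does not ship WebDriverWait or ExpectedConditions, but the same "wait until clickable" pattern can be approximated by polling. Here is a rough sketch under that assumption (the helper name wait_for_clickable and its defaults simply mirror the 10 s timeout and 0.5 s polling described above; isElementDisplayed() and isElementEnabled() are standard RSelenium webElement methods):
# Sketch: poll until an element matching `css` exists, is displayed and is enabled
# (a rough stand-in for elementToBeClickable), then return that element.
wait_for_clickable <- function(remDr, css, timeout = 10, poll = 0.5) {
  start <- Sys.time()
  while (as.numeric(difftime(Sys.time(), start, units = "secs")) < timeout) {
    hits <- remDr$findElements(using = "css selector", css)
    if (length(hits) > 0 &&
        isTRUE(hits[[1]]$isElementDisplayed()[[1]]) &&
        isTRUE(hits[[1]]$isElementEnabled()[[1]])) {
      return(hits[[1]])
    }
    Sys.sleep(poll)
  }
  stop(sprintf("Timed out waiting for '%s' to become clickable", css))
}
# Example with a selector from the question (adjust to whatever element you need):
# card <- wait_for_clickable(remDr, "div.qjESne")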