I am trying to get the latitude and longitude of all the stores listed here:
https://www.wellcome.com.hk/en/our-store
On inspecting the page, I can see that the lat and lon values are contained within a div.
library(dplyr)
library(rvest)
url_company <- rvest::read_html("https://www.wellcome.com.hk/en/our-store")
url_company %>%
  html_elements("div") %>% # extract all div elements
  html_elements("p") # extract all p elements inside those divs
How do I reach the data-lat and data-lng attributes?
2 Answers
You may want to check the page source instead of the Inspector. Or first disable JavaScript for the site in your browser, reload, and then inspect. What you see in the Inspector is a DOM tree that has been modified by JavaScript, but in rvest you can only work with the actual page source. And the same section in the source looks like this:

Though the coordinates are actually there along with all the other map data, embedded in one of the <script> elements:

We can extract the element content with rvest and process the resulting string to get the lat/lon values. Or be a bit smarter about it and apply just minimal processing to get the right-hand side of the var googleMapData = assignment, which could then be parsed as JSON with jsonlite to get a nice data.frame. Or, if we feel super lazy, we can throw all of that <script> element content into V8, a JavaScript engine, cross our fingers, and get the value of the googleMapData JS variable:

Created on 2023-11-24 with reprex v2.0.2
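The V8 route described above might look roughly like the sketch below. It is not the original reprex: the HTML string is a hypothetical stand-in for the page source, and the fields in it (name, lat, lng) and their values are made up for illustration. Against the live site you would pass the URL to read_html() instead, and evaluation could fail if the script refers to browser objects such as window or document, which V8 does not provide.

```r
library(rvest)
library(V8)

# Hypothetical stand-in for the real page source (assumption: the real
# script assigns an array of store objects to `googleMapData`).
page <- minimal_html('
  <script>
    var googleMapData = [
      {"name": "Store A", "lat": 22.28, "lng": 114.15},
      {"name": "Store B", "lat": 22.31, "lng": 114.17}
    ];
  </script>')

# Collect the text of every <script> element and keep the first one
# that declares the variable we are after.
scripts <- page %>% html_elements("script") %>% html_text()
js <- scripts[grepl("var googleMapData", scripts, fixed = TRUE)][1]

# Evaluate the script in a fresh V8 context and pull the variable back
# into R; V8 converts it through JSON, so an array of objects becomes
# a data.frame.
ctx <- v8()
ctx$eval(js)
stores <- ctx$get("googleMapData")

stores[, c("name", "lat", "lng")]
```

Evaluating the whole script is convenient because no string surgery is needed, but it only works when the script runs cleanly outside a browser.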
A different way, not using V8:
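One possible shape for such an approach, offered as a sketch rather than the answerer's exact code, is to cut the right-hand side of the assignment out of the script text and parse it with jsonlite. The HTML string is again a hypothetical stand-in for the page source, and the regular expression assumes the assignment is the last statement in its script and that the value is valid JSON; both assumptions would need checking against the real page.

```r
library(rvest)
library(jsonlite)

# Hypothetical stand-in for the real page source; field names and
# values are illustrative only.
page <- minimal_html('
  <script>
    var googleMapData = [
      {"name": "Store A", "lat": 22.28, "lng": 114.15},
      {"name": "Store B", "lat": 22.31, "lng": 114.17}
    ];
  </script>')

# Find the <script> element that declares the variable.
scripts <- page %>% html_elements("script") %>% html_text()
js <- scripts[grepl("var googleMapData", scripts, fixed = TRUE)][1]

# Keep only what sits between "var googleMapData =" and the final
# semicolon. (?s) lets "." match newlines, since the value spans lines.
json <- sub("(?s).*var googleMapData\\s*=\\s*(.*?);\\s*$", "\\1", js, perl = TRUE)

# An array of JSON objects is simplified to a data.frame by default.
stores <- fromJSON(json)

stores[, c("name", "lat", "lng")]
```

This avoids the V8 dependency, at the cost of relying on the exact layout of the script text.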
Output:
I assume you want the name, the lat, and the lng for each one, but maybe you just want the lat and lng columns, or maybe the name_zh too, or something else, so in the absence of more definitive guidance I’ll leave it there.