
I am trying to get the latitude and longitude of all the stores listed here:
https://www.wellcome.com.hk/en/our-store

When I inspect the page in the browser, I can see that the lat and lon values are contained within a div.

library(dplyr)
library(rvest)

url_company <- rvest::read_html("https://www.wellcome.com.hk/en/our-store")
url_company %>%
 html_elements("div") %>% # extract all div elements
 html_elements("p")       # extract all p elements inside those divs

How do I get to the data-lat and data-lng attributes?
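
For context, rvest reads attribute values with `html_attr()`. The sketch below runs on a small, made-up HTML string. The `div[data-lat]` markup and the store names are illustrative assumptions, not the page's real source. As the first answer explains, on the live page these attributes appear to be added by JavaScript, so this approach alone will not find them in what `read_html()` downloads.

```r
library(rvest)

# Hypothetical static markup: only the attribute names data-lat / data-lng
# come from the question; everything else is invented for illustration.
html <- '
<div class="shop" data-lat="22.4123694" data-lng="113.9714609"><p>Ching Tin</p></div>
<div class="shop" data-lat="22.2851255" data-lng="114.2233381"><p>Lei King Wan</p></div>
'

# select every div that carries a data-lat attribute
shops <- html_elements(read_html(html), "div[data-lat]")

# one row per div: the <p> text plus the two attributes converted to numbers
stores <- data.frame(
  name = html_text(html_element(shops, "p"), trim = TRUE),
  lat  = as.numeric(html_attr(shops, "data-lat")),
  lng  = as.numeric(html_attr(shops, "data-lng")),
  stringsAsFactors = FALSE
)
stores
```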


Answers


  1. You may want to check the page source instead of the Inspector, or first disable JavaScript for the site in your browser, reload the page and then inspect it. What you see in the Inspector is a DOM tree that has been modified by JavaScript, but rvest can only work with the actual page source. The same section in the source looks like this:

    <div class="wellcome_map_shop">
    <div class="wellcome_map_filter js-filter"><select aria-label="Location" name="location"></select> <select aria-label="District" name="district"></select></div>
    
    <div class="wellcome_map_loc"><span class="js-loc_on" style="display:none">Location on</span> <span class="js-loc_off">Location permission is denied</span></div>
    
    <div class="wellcome_map_shop_item js-shop_template js-shop_item" style="display:none">
    <div class="wellcome_map_shop_detail">
    

    The coordinates are still there, though, along with all the other map data, embedded in one of the <script> elements:

    <script type="text/javascript">
    <!--//--><![CDATA[// ><!--
    
    var googleMap        = null;
    ...
    var googleMapData = [
    {"name":"Ching Tin","addr":"Shop No. G6, G/F Ching Tin Shopping Centre, Ching Tin Estate, Tuen Mun, N.T.","name_zh":"菁田","addr_zh":"屯門菁田邨菁田購物中心地下G6室","tel":"2317 6863","time":"08:00-22:00","time_zh":"08:00-22:00","region":"32","district":"24","lat":22.4123694,"lng":113.9714609},
    {"name":"Lei King Wan","addr":"Shop GC19-21. Site C. Lei King Wan, 35 Tai Hong Street, Sai Wan Ho, Hong Kong","name_zh":"鯉景灣","addr_zh":"香港西灣河太康街35號鯉景灣C 期GC19-21號舖","tel":"2815 6029","time":"07:30-22:00","time_zh":"07:30-22:00","region":"30","district":"161","lat":22.2851255,"lng":114.2233381},
    ...
    

    We can extract the element content with rvest and process the resulting string to get the lat/lon values. Or we can be a bit smarter about it and apply just enough processing to isolate the right-hand side of the var googleMapData = assignment, which can then be parsed as JSON with jsonlite into a tidy data.frame. Or, if we feel really lazy, we can throw the entire <script> element content into V8, a JavaScript engine, cross our fingers, and read back the value of the googleMapData JS variable:

    library(rvest)
    library(dplyr)
    library(V8)
    #> Using V8 engine 11.8.172.13
    
    ctx <- v8()
    
    # load the page, use XPath to pick the right script element (the one
    # containing the text "googleMapData"), get its text and evaluate it as JavaScript
    read_html("https://www.wellcome.com.hk/en/our-store") %>% 
      html_elements(xpath =  "//script[contains(text(),'googleMapData')]") %>% 
      html_text() %>% 
      ctx$eval() 
    
    # we only care about the `var googleMapData = [...]` assignment; the rest of
    # the script might as well fail.
    # extract the googleMapData value from V8
    ctx$get("googleMapData") %>% 
      as_tibble() %>% 
      # format the numeric lat/lng columns to 2 digits for display
      mutate(across(where(is.numeric), ~ tibble::num(.x, digits = 2))) %>% 
      select(name, addr, lat, lng)
    #> # A tibble: 279 × 4
    #>    name             addr                                              lat    lng
    #>    <chr>            <chr>                                           <num> <num:>
    #>  1 Ching Tin        Shop No. G6, G/F Ching Tin Shopping Centre, Ch… 22.41 113.97
    #>  2 Lei King Wan     Shop GC19-21. Site C. Lei King Wan, 35 Tai Hon… 22.29 114.22
    #>  3 Garden Estate    Shop No. 15-18, G/F Lotus Tower 3, 297 Kwun To… 22.32 114.22
    #>  4 Tsuen Wan        57-61, Lo Tak Court, G/F, Tsuen Wan, NT         22.37 114.12
    #>  5 Pak Tin Estate   Shop LG201, Lower Ground Level 2, Pak Tin Comm… 22.34 114.17
    #>  6 Tak Bo Garden    Shop 138, G/F, TBG Mall, Tak Bo Garden, No. 3 … 22.33 114.21
    #>  7 Dor Hei Building Shop No.2-3, G/F, Dor Hei Building, Nos.9-17 T… 22.32 114.22
    #>  8 Chevalier House  Shop C and Portion of Shop D on Ground Floor, … 22.30 114.18
    #>  9 Shan King 2      Stall No. T-SK73, G/F, Shan King Shopping Cent… 22.40 113.97
    #> 10 Shek Mun         Shop No. 28, G/F, 1 On Ping Street, Shatin, NT  22.39 114.21
    #> # ℹ 269 more rows
    

    Created on 2023-11-24 with reprex v2.0.2
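
    The middle option mentioned above (isolating the right-hand side of the googleMapData assignment and parsing it with jsonlite) can be sketched without V8. This is a minimal offline sketch under stated assumptions: `script_txt` is a shortened stand-in for the script text (two records from the excerpt above, with most fields dropped), and the regular expression assumes the array ends at the first `];`, which holds for this excerpt but is not guaranteed for the live page.

    ```r
    library(jsonlite)

    # Shortened stand-in for the <script> text (assumption: on the live page this
    # would come from html_text() on the script element, as in the V8 example)
    script_txt <- '
    var googleMap        = null;
    var googleMapData = [
    {"name":"Ching Tin","region":"32","district":"24","lat":22.4123694,"lng":113.9714609},
    {"name":"Lei King Wan","region":"30","district":"161","lat":22.2851255,"lng":114.2233381}
    ];
    var googleMapLocation = null;
    '

    # Capture the array literal between `var googleMapData =` and the first `];`
    # ((?s) lets `.` match newlines; `.*?` stops at the earliest `];`)
    m <- regmatches(
      script_txt,
      regexec("(?s)var googleMapData\\s*=\\s*(\\[.*?\\]);", script_txt, perl = TRUE)
    )[[1]]
    if (length(m) < 2) stop("googleMapData assignment not found")

    # m[2] is the captured JSON array; fromJSON simplifies it to a data.frame
    stores <- fromJSON(m[2])
    stores[, c("name", "lat", "lng")]
    ```

    A regex is more brittle than evaluating the script, but it avoids the V8 dependency and keeps the numeric lat/lng columns as doubles.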

  2. A different way, not using V8:

    
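    library(rvest)
    library(stringr)
    library(jsonlite)

    url <- "https://www.wellcome.com.hk/en/our-store"

    url %>%
      read_html() %>%
      html_nodes(css = ".content > div:nth-child(1) > script:nth-child(3)") %>% 
      html_text() %>%
      # split on the text just before and just after the JSON array
      str_split("googleMapData = |;\n\n\nvar googleMapLocation =") %>%
      {.[[1]][2]} %>%
      fromJSON()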
    

    Output:

       name    addr  name_zh addr_zh tel   time  time_zh region district   lat   lng
       <chr>   <chr> <chr>   <chr>   <chr> <chr> <chr>   <chr>  <chr>    <dbl> <dbl>
     1 Ching … Shop… 菁田    屯門菁… 2317… 08:0… 08:00-… 32     24        22.4  114.
     2 Lei Ki… Shop… 鯉景灣  香港西… 2815… 07:3… 07:30-… 30     161       22.3  114.
     3 Garden… Shop… 花園大… 官塘牛… 2372… 08:0… 08:00 … 31     96        22.3  114.
     4 Tsuen … 57-6… 荃灣    荃灣路… 2411… 07:0… 07:00-… 32     23        22.4  114.
     5 Pak Ti… Shop… 白田邨  九龍白… 2335… 08:0… 08:00-… 31     166       22.3  114.
     6 Tak Bo… Shop… 得寶花… 九龍牛… 2382… 08:0… 08:00-… 31     35        22.3  114.
     7 Dor He… Shop… 多喜大… 九龍牛… 2628… 09:0… 09:00-… 31     216       22.3  114.
     8 Cheval… Shop… 其士大… 九龍尖… 2713… 08:0… 08:00-… 31     15        22.3  114.
     9 Shan K… Stal… 山景2   屯門鳴… 2653… 07:3… 07:30-… 32     24        22.4  114.
    10 Shek M… Shop… 石門    沙田安… 2854… 08:0… 08:00-… 32     21        22.4  114.
    # ℹ 269 more rows
    

    I assume you want the name, lat and lng for each store, but you may want only the lat and lng columns, or name_zh as well, or something else entirely, so in the absence of more specific requirements I'll leave the column selection to you.
