
I’m wondering if it’s possible to return just "Glenvale" from the page’s HTML (shared as an image) using Selenium.

I’ve tried using the XPath below, but it doesn’t work:

    suburb = driver.find_element(By.XPATH, '//*[@id="__next"]/div/div[2]/div/div[4]/div/div/div[8]/div/div/p/text()[6]').text

Here is the page:
https://www.domain.com.au/2-76-shelby-street-glenvale-qld-4350-2014406153

2 Answers


  1. According to the HTML you’ve shared, the text you want is part of the text content of the <p> tag rather than a separate element, so you won’t be able to get it with a locator alone (an XPath that ends in text() selects a text node, and find_element can only return elements). You can, however, use regex to your advantage: get the textContent of the <p> tag and then extract the detail you want. Here is an example of that approach.

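    import re
    
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    
    # Initialize the driver and open the listing page
    driver = webdriver.Chrome()
    driver.get("https://www.domain.com.au/2-76-shelby-street-glenvale-qld-4350-2014406153")
    
    # Read the full text content of the <p> element (the XPath must be a quoted string)
    text = driver.find_element(
        By.XPATH, '//*[@data-testid="listing-details__domain-says-text"]'
    ).get_attribute('textContent')
    
    # Capture whatever sits between "in " and " have"
    pattern = r'.*in (.*) have.*'
    
    # Search for the pattern in the text
    match = re.search(pattern, text)
    
    if match:
        print(match.group(1))  # prints the location
    else:
        print("No match found")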

    This script captures the textContent of the <p> tag, which is "37 other 3 bedroom unit in Glenvale have recently been sold. There are currently 7 properties for sale in Glenvale.", and the regex then captures and returns one location, in this case "Glenvale".

    NOTE: The regex can be modified to suit your needs. If you also need the statistics (units sold, bedroom count, properties for sale), just update the regex to contain the required capture groups.

  2. Algorithm:

    1. Capture the entire text of the <p> node.
    2. Split it on whitespace into a list.
    3. Take the last element of the list, which will be Glenvale (including the trailing period).

    Code:

    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC
    
    driver = webdriver.Chrome()
    driver.get("https://www.domain.com.au/2-76-shelby-street-glenvale-qld-4350-2014406153")
    driver.maximize_window()
    
    # Wait up to 10 seconds for the <p> node to become visible, then read its text
    wait = WebDriverWait(driver, 10)
    text = wait.until(EC.visibility_of_element_located(
        (By.XPATH, "//p[@data-testid='listing-details__domain-says-text']"))).text
    print("Full text is :" + text)
    
    # Split on whitespace and keep the last word
    print("Required text is :" + text.split()[-1])

    Result:

    Full text is :37 other 3 bedroom unit in Glenvale have recently been sold. There are currently 7 properties for sale in Glenvale.
    Required text is :Glenvale.
    
    Process finished with exit code 0
    

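    Be aware: this solution captures the last word of the <p> node’s text, including any trailing punctuation (here "Glenvale."). If the last word is not Glenvale, it will capture whatever that word is, and a multi-word suburb name would be cut down to its final word.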

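The first answer’s NOTE says the regex can be extended with more capture groups. A minimal sketch of that idea, assuming the sentence keeps exactly the wording quoted in that answer; the `parse_domain_says` function and the group names are hypothetical, chosen for illustration. It works on the string that `get_attribute('textContent')` returns, so it can be tried without a browser.

```python
import re

# Hypothetical pattern: assumes the "Domain says" sentence always uses this exact wording.
DOMAIN_SAYS = re.compile(
    r"(?P<sold>\d+) other (?P<bedrooms>\d+) bedroom (?P<kind>\w+) in (?P<suburb>.+?) "
    r"have recently been sold\. There are currently (?P<for_sale>\d+) properties "
    r"for sale in (?P<suburb_again>.+?)\."
)


def parse_domain_says(text):
    """Hypothetical helper: return the captured fields as a dict, or None if the wording differs."""
    match = DOMAIN_SAYS.search(text)
    if match is None:
        return None
    fields = match.groupdict()
    # Convert the numeric groups from strings to integers
    for key in ("sold", "bedrooms", "for_sale"):
        fields[key] = int(fields[key])
    return fields


text = ("37 other 3 bedroom unit in Glenvale have recently been sold. "
        "There are currently 7 properties for sale in Glenvale.")
print(parse_domain_says(text))
```

Returning None on a mismatch means a reworded sentence shows up as a missing value rather than a wrong one.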
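The second answer returns "Glenvale." with the period attached and, as its caveat notes, simply trusts the last word. A small sketch that strips trailing punctuation and cross-checks the result against every "in <Suburb>" phrase in the sentence is below. The function names are hypothetical, and the cross-check pattern assumes suburb names start with a capital letter and are followed by " have" or a period, as in the quoted text.

```python
import re
import string


def last_word(text):
    """Hypothetical helper: last whitespace-separated word, trailing punctuation removed."""
    words = text.split()
    return words[-1].rstrip(string.punctuation) if words else ""


# Assumption: a suburb is a capitalised phrase after "in " and before " have" or "."
SUBURB_AFTER_IN = re.compile(r"\bin ([A-Z][A-Za-z' -]*?)(?= have\b|\.)")


def suburb_from_text(text):
    """Hypothetical helper: the suburb if every "in <Suburb>" mention agrees, else None."""
    mentions = set(SUBURB_AFTER_IN.findall(text))
    return mentions.pop() if len(mentions) == 1 else None


text = ("37 other 3 bedroom unit in Glenvale have recently been sold. "
        "There are currently 7 properties for sale in Glenvale.")
print(last_word(text))
print(suburb_from_text(text))
```

Because the lazy group stops at the first " have" or ".", a multi-word name such as "Mount Lofty" is kept whole, which the last-word approach cannot do.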