Html - Return the one part of the text under a button - Selenium Python

ClayBurnett
July 9, 2024

I’m wondering whether it’s possible to return just "Glenvale" from the HTML below with Selenium:

I’ve tried using the XPath, but that doesn’t seem to work:

```
suburb = driver.find_element(By.XPATH, '//*[@id="__next"]/div/div[2]/div/div[4]/div/div/div[8]/div/div/p/text()[6]').text
```

Below is the website:
https://www.domain.com.au/2-76-shelby-street-glenvale-qld-4350-2014406153

Answers

- Techrookie89
- July 9, 2024 at 11:39 am
According to the HTML you’ve shared, the text you want is not a separate element; it is part of the text content of the <p> tag. Selenium’s find_element can only return elements, so an XPath that ends in text() (which selects a text node) won’t work, and you won’t be able to get the output by relying on locators alone. You can, however, use a regex: get the textContent of the <p> tag and then extract the detail you want to zero in on. Here is an example of that approach.
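```
import re

from selenium import webdriver
from selenium.webdriver.common.by import By

# Initialize the driver and open the listing page
driver = webdriver.Chrome()
driver.get("https://www.domain.com.au/2-76-shelby-street-glenvale-qld-4350-2014406153")

# textContent gives the full text of the <p>, including the text of any child nodes
text = driver.find_element(By.XPATH, '//*[@data-testid="listing-details__domain-says-text"]').get_attribute('textContent')

# Capture whatever sits between "in " and " have"
pattern = r'.*in (.*) have.*'

# Search for the pattern in the text
match = re.search(pattern, text)

if match:
    print(match.group(1))  # returns the location
else:
    print("No match found")

driver.quit()
```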
This script captures the textContent of the <p> tag, which is "37 other 3 bedroom unit in Glenvale have recently been sold. There are currently 7 properties for sale in Glenvale.", and the regex then captures and returns the location, which in this case is "Glenvale".
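Because the regex step does not need a browser, it can be checked on its own. Below is a minimal sketch that wraps the same pattern in a small helper; the function name `extract_suburb` and the hard-coded sample string (copied from the text quoted above) are illustrative assumptions, not part of Selenium.

```python
import re

# Same pattern as in the answer: group 1 is whatever sits between "in " and " have".
PATTERN = re.compile(r'.*in (.*) have.*')


def extract_suburb(text):
    """Hypothetical helper: return the captured location, or None if there is no match."""
    match = PATTERN.search(text)
    return match.group(1) if match else None


sample = ("37 other 3 bedroom unit in Glenvale have recently been sold. "
          "There are currently 7 properties for sale in Glenvale.")
print(extract_suburb(sample))  # Glenvale
```

In a real script, `sample` would be replaced by the textContent string read from the page.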

NOTE: The regex can be modified as needed. If you also need to capture the statistics (the number of bedroom units sold and properties for sale), just update the regex to contain the required capture groups.
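As a sketch of that idea, the pattern below adds named groups for each figure in the sample sentence. The group names (`sold`, `bedrooms`, `suburb`, `for_sale`) are illustrative, and the pattern assumes the listing text always uses the exact wording of this one sample; other listings may phrase it differently.

```python
import re

text = ("37 other 3 bedroom unit in Glenvale have recently been sold. "
        "There are currently 7 properties for sale in Glenvale.")

# Named groups for: units sold, bedroom count, suburb, and properties currently for sale.
# The fixed wording here is assumed from the single sample above.
pattern = re.compile(
    r'(?P<sold>\d+) other (?P<bedrooms>\d+) bedroom \w+ in (?P<suburb>.+?) have recently been sold\. '
    r'There are currently (?P<for_sale>\d+) properties for sale'
)

match = pattern.search(text)
if match:
    # groupdict() maps each group name to the captured string
    print(match.groupdict())
else:
    print("No match found")
```

The captured values are strings; wrap them in `int()` if you need to do arithmetic on the counts.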

Algorithm:

1. Capture the entire text of the <p> node.
2. Split it on whitespace into a list.
3. Take the last element of the list, which here is "Glenvale." (including the trailing period).

Code:

```
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()
driver.get("https://www.domain.com.au/2-76-shelby-street-glenvale-qld-4350-2014406153")
driver.maximize_window()

# Wait up to 10 seconds for the <p> to become visible, then read its text
wait = WebDriverWait(driver, 10)
element = wait.until(EC.visibility_of_element_located((By.XPATH, "//p[@data-testid='listing-details__domain-says-text']"))).text
print("Full text is :" + element)

# The last whitespace-separated word of the text
print("Required text is :" + element.split()[-1])
```

Result:

```
Full text is :37 other 3 bedroom unit in Glenvale have recently been sold. There are currently 7 properties for sale in Glenvale.
Required text is :Glenvale.
```


Be aware: this solution captures the last word of the <p> node's text, including any trailing punctuation (hence "Glenvale." with a period). If the text ends with a word other than the suburb, it will return that word instead.
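If the trailing period is unwanted, one option is to strip punctuation from the last word. Below is a minimal sketch on the sample text, hard-coded here (an assumption for illustration) so it runs without a browser. It only removes punctuation; it still assumes the suburb is the final word of the text.

```python
import string

text = ("37 other 3 bedroom unit in Glenvale have recently been sold. "
        "There are currently 7 properties for sale in Glenvale.")

# split() with no argument splits on any run of whitespace and ignores
# leading/trailing whitespace, so [-1] is the last word.
last_word = text.split()[-1]

# Remove trailing punctuation such as the full stop in "Glenvale."
suburb = last_word.rstrip(string.punctuation)
print(suburb)  # Glenvale
```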
