Javascript - Extract text from Text Node using XPath

Anaras
June 13, 2023
167 views
2 votes
3 Answers

I am new to XPath and trying to capture the values "Time: " and "13:45" from the following HTML snippet. Any help or suggestion will be really useful. Thank you!

<div class="inner-box">
    <p class="inner-info-blk">
        <strong>Time: </strong>
        "13:45"
    </p>
</div>

I can access the label within the <strong>...</strong> element with the pattern below, but cannot figure out how to get the time value within the <p ...> element.

Label xpath:

//div[@class="inner-box"]/p[@class="inner-info-blk"]/strong

Answers

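You can use text() to select an element's text nodes. Here the time is the second text node of the <p>; the first one is the whitespace before <strong>.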

from lxml import etree

html = '''
<div class="inner-box">
<p class="inner-info-blk">
    <strong>Time: </strong>
    "13:45"
</p>
</div>
'''

x = etree.HTML(html)
result = x.xpath('//div[@class="inner-box"]/p[@class="inner-info-blk"]/text()[2]')  # second text node directly inside <p>
print(result[0].strip())  # xpath() returns a list, so take its first item

And that would get the text from the <p> element.
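If lxml is not available, the same idea can be sketched with the standard library. This is only a sketch and it assumes the snippet is well-formed XML (true for the example, not for HTML in general). In ElementTree, the text that follows an element is stored on that element as `.tail`, so the time is the tail of <strong>. The function name `label_and_value` is made up for illustration.

```python
import xml.etree.ElementTree as ET

# The snippet from the question. It happens to be well-formed XML,
# which is an assumption this sketch relies on.
SNIPPET = '''
<div class="inner-box">
    <p class="inner-info-blk">
        <strong>Time: </strong>
        "13:45"
    </p>
</div>
'''


def label_and_value(markup):
    """Return (label, value) from the <strong> element and the text after it."""
    root = ET.fromstring(markup)  # root is the <div>
    strong = root.find("./p[@class='inner-info-blk']/strong")
    label = strong.text.strip()            # text inside <strong>
    value = (strong.tail or "").strip()    # text node right after </strong>
    return label, value


print(label_and_value(SNIPPET))
```

lxml elements expose the same `.text` and `.tail` attributes, so the approach should carry over if you keep using `etree.HTML`.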

UPDATE:
As @shailesh mentioned, a Selenium locator cannot use an XPath expression that returns text rather than an element, and to the best of my knowledge Selenium has no method for evaluating an arbitrary XPath expression. As an alternative, you can use a bit of JS:

from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get(
    "file:///C:/Users/yubo/data/social/stackoverflow/6.10%20Selenium/example.html"
)
time = driver.find_element(
    By.XPATH,
    './/div[@class="inner-box"]/p[@class="inner-info-blk"]',
)
print(driver.execute_script("return arguments[0].lastChild.textContent", time).strip()) # Same as @undetected selenium; a coincidence where we happened to write at the same time.
driver.quit()
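
If you would rather stay with XPath, one option is to let the browser evaluate the expression through `document.evaluate` and hand back the string. The sketch below is untested against a real page; the helper name `xpath_string` and the two constants are my own, not part of Selenium. It only relies on `driver.execute_script`, which passes extra arguments to the script as `arguments[0]`, `arguments[1]`, and so on.

```python
# JavaScript run in the page: evaluate an XPath expression and return
# its string value instead of an element.
XPATH_STRING_JS = """
return document.evaluate(
    arguments[0], document, null, XPathResult.STRING_TYPE, null
).stringValue;
"""

# XPath that yields a string: the last text node directly inside the <p>,
# with leading and trailing whitespace removed by normalize-space().
TIME_XPATH = (
    'normalize-space(//div[@class="inner-box"]'
    '/p[@class="inner-info-blk"]/text()[last()])'
)


def xpath_string(driver, xpath):
    """Hypothetical helper: string value of `xpath` in the current page."""
    return driver.execute_script(XPATH_STRING_JS, xpath)
```

With the `driver` from the example above you would call `print(xpath_string(driver, TIME_XPATH))` before `driver.quit()`.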

- Shailesh
- June 10, 2023 at 9:20 pm
- 0 votes
You can solve this with the split method, because Selenium locators cannot return the result of text() in an XPath expression. "Time:" in your example is a static, unique value, so you can split on it to get the actual time value. I would recommend trying XPath first and, if that does not work, falling back to string logic like this. Maybe this helps.
```
from selenium import webdriver
from selenium.webdriver.common.by import By


driver = webdriver.Firefox()
driver.get('https://www.yourpage.html')
time = driver.find_element(By.XPATH,"//p")

print(time.text.split("Time:")[1].strip())  # strip the space left after "Time:"

driver.quit()
```
O/P:
"13:45"

This variant may also be relevant; it keeps everything after the first colon:
```
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Firefox()
driver.get('file:///Users/shakava/Downloads/stackoverflow.html')
time = driver.find_element(By.XPATH,"//p")
# Split only on the first colon so the colon inside "13:45" is preserved.
timeVal = time.text.split(":", 1)[1].strip()

print(timeVal)
driver.quit()
```
O/P:
"13:45"

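The string handling in the answer above can be tried without a browser. A minimal sketch, assuming the element's visible text is `Time: "13:45"` as in the question; the function name `split_label_value` is made up for illustration. `str.partition` splits on the first separator only, so the colon inside the time is left alone.

```python
def split_label_value(visible_text):
    """Split 'Label: value' on the first colon and trim both parts."""
    label, sep, value = visible_text.partition(":")
    if not sep:
        # No colon at all, so there is nothing to separate.
        raise ValueError("expected a 'label: value' string")
    return label.strip(), value.strip()


# Assumed input: what element.text would return for the <p> in the question.
print(split_label_value('Time: "13:45"'))
```

Unlike indexing into the result of `split`, `partition` always returns three parts, which makes the missing-colon case easy to detect.
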
- undetectedSelenium
- June 11, 2023 at 12:47 am
- 0 votes
Given the HTML:
```
<div class="inner-box">
    <p class="inner-info-blk">
        <strong>Time: </strong>
        "13:45"
    </p>
</div>
```
The time value, i.e. 13:45, is within a text node that is the lastChild of its parent <p>. To extract the desired text you can use either of the following locator strategies:
- Using xpath, execute_script() and textContent:
```
print(driver.execute_script('return arguments[0].lastChild.textContent;', driver.find_element(By.XPATH, '//div[@class="inner-box"]/p[@class="inner-info-blk"]')).strip())
```
- Using css_selector, get_attribute() and splitlines():
```
print(driver.find_element(By.CSS_SELECTOR, "div.inner-box > p.inner-info-blk").get_attribute("innerHTML").splitlines()[2].strip())
```
Alternative

As an alternative you can also use Beautiful Soup as follows:

Code Block:
```
from bs4 import BeautifulSoup

html_text = '''
<div class="inner-box">
    <p class="inner-info-blk">
        <strong>Time: </strong>
        "13:45"
    </p>
</div>
'''

soup = BeautifulSoup(html_text, 'html.parser')

last_text = soup.find("p", {"class": "inner-info-blk"}).contents[2]
print(last_text.strip())
```
Console Output:
```
"13:45"
```
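If installing Beautiful Soup is not an option, a rough standard-library equivalent is sketched below. It collects only the text that sits directly inside `<p class="inner-info-blk">` and skips text inside child elements such as `<strong>`. The class and function names are my own, and the depth counting is a simplification: it assumes every nested tag is explicitly closed, so void tags like `<br>` inside the `<p>` would throw it off.

```python
from html.parser import HTMLParser


class DirectTextCollector(HTMLParser):
    """Collect text directly inside <p class="inner-info-blk">."""

    def __init__(self):
        super().__init__()
        # 0 = outside the target <p>, 1 = directly inside it,
        # >1 = inside one of its child elements.
        self.depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if self.depth:
            self.depth += 1
        elif tag == "p" and "inner-info-blk" in (dict(attrs).get("class") or "").split():
            self.depth = 1

    def handle_endtag(self, tag):
        if self.depth:
            self.depth -= 1

    def handle_data(self, data):
        # Keep non-blank text only when it is a direct child of the <p>.
        if self.depth == 1 and data.strip():
            self.chunks.append(data.strip())


def direct_text(markup):
    """Return the list of direct text chunks of the target <p>."""
    collector = DirectTextCollector()
    collector.feed(markup)
    collector.close()
    return collector.chunks
```

Calling `direct_text(html_text)` with the `html_text` string from the Beautiful Soup example should give a one-item list holding the time value.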
References

You can find a couple of relevant detailed discussions in:
- How to extract just the number from html?
- Trying to get following element (text) without class tag etc