
I am new to XPath and trying to capture the values "Time: " and "13:45" from the following HTML snippet. Any help or suggestion will be really useful. Thank you!

<div class="inner-box">
    <p class="inner-info-blk">
        <strong>Time: </strong>
        "13:45"
    </p>
</div>

I can access the label within the <strong>...</strong> container with the pattern below, but cannot figure out how to get the time value within the <p ...> container.

Label xpath:

//div[@class="inner-box"]/p[@class="inner-info-blk"]/strong

3 Answers


  1. You can use text() to get the text from an element.

    from lxml import etree
    
    html = '''
    <div class="inner-box">
    <p class="inner-info-blk">
        <strong>Time: </strong>
        "13:45"
    </p>
    </div>
    '''
    
    x = etree.HTML(html)
    result = x.xpath('//div[@class="inner-box"]/p[@class="inner-info-blk"]/text()[2]') # second text node of <p>, i.e. the text after <strong>
    print(result[0].strip()) # xpath() returns a list, so take the first item
    

    That returns the text that follows <strong> inside the <p> element.
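To see why the index is 2, it can help to look at how a parser stores the text around <strong>. The sketch below is only an illustration: it uses the standard library's xml.etree.ElementTree instead of lxml, which works here only because the snippet happens to be well-formed XML, and the variable names are made up for the example.

```python
import xml.etree.ElementTree as ET

html = '''
<div class="inner-box">
    <p class="inner-info-blk">
        <strong>Time: </strong>
        "13:45"
    </p>
</div>
'''

root = ET.fromstring(html)  # root is the <div>
p = root.find("p[@class='inner-info-blk']")
strong = p.find("strong")

# Text before the first child of <p>: whitespace only (text()[1] in XPath)
leading_text = p.text
# Text of the <strong> element itself: the label
label = strong.text
# Text that follows </strong> inside <p> (text()[2] in XPath)
value = strong.tail.strip()

print(repr(leading_text), repr(label), value)
```

So the first text node of <p> is just the indentation before <strong>, and the time sits in the second one.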

    UPDATE:
    As @shailesh has mentioned, a Selenium locator cannot evaluate an XPath expression that returns text rather than an element; nor, to the best of my knowledge, does Selenium offer a method that evaluates an arbitrary XPath expression. As an alternative, you can use a bit of JS here:

    from selenium import webdriver
    from selenium.webdriver.common.by import By
    
    driver = webdriver.Chrome()
    driver.get(
        "file:///C:/Users/yubo/data/social/stackoverflow/6.10%20Selenium/example.html"
    )
    time = driver.find_element(
        By.XPATH,
        './/div[@class="inner-box"]/p[@class="inner-info-blk"]',
    )
    print(driver.execute_script("return arguments[0].lastChild.textContent", time).strip()) # Same as @undetected selenium; a coincidence where we happened to write at the same time.
    driver.quit()
    
  2. You can solve this with the split method, because Selenium locators do not accept an XPath that ends in text(). In your example, Time: is a static and unique value, so you can split on it to get the actual time value you expect. I would recommend trying XPath first and, if that does not give a solution, falling back to string logic. Maybe this can help you.

    from selenium import webdriver
    from selenium.webdriver.common.by import By
    
    
    driver = webdriver.Firefox()
    driver.get('https://www.yourpage.html')
    time = driver.find_element(By.XPATH,"//p")
    
    print(time.text.split("Time:")[1].strip())  # strip() drops the space left after the label
    
    driver.quit()
    

    Output:
    "13:45"

    This may also be relevant:

    from selenium import webdriver
    from selenium.webdriver.common.by import By
    
    driver = webdriver.Firefox()
    driver.get('file:///Users/shakava/Downloads/stackoverflow.html')
    time = driver.find_element(By.XPATH,"//p")
    # Split on every ":"; the first piece is the label "Time"
    arr = time.text.split(":")
    
    # Rejoin the remaining pieces so the ":" inside the time value is restored
    timeVal = ":".join(arr[1:]).strip()
    
    print(timeVal)
    driver.quit()
    

    Output:
    "13:45"
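Both snippets above boil down to "cut the string at the label separator". A minimal sketch of the same idea with str.partition, which splits at the first separator only so the ":" inside the time survives; split_label_value is a hypothetical helper name, and element_text is assumed to be what time.text returns for the <p> element in the question.

```python
def split_label_value(text, sep=":"):
    """Split 'Label: value' at the first separator only."""
    label, found, value = text.partition(sep)
    if not found:
        raise ValueError(f"separator {sep!r} not found in {text!r}")
    return label.strip(), value.strip()


# Assumed to be what time.text returns for the <p> element in the question
element_text = 'Time: "13:45"'

label, value = split_label_value(element_text)
print(label)             # Time
print(value)             # "13:45"
print(value.strip('"'))  # 13:45
```

This keeps working if the value contains more colons (for example 13:45:10), and it fails loudly if the label separator is missing instead of raising an IndexError somewhere later.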

  3. Given the HTML:

    <div class="inner-box">
        <p class="inner-info-blk">
            <strong>Time: </strong>
            "13:45"
        </p>
    </div>
    

    The time value, i.e. 13:45, is within a text node that is the lastChild of its parent <p>. So to extract the desired text you can use either of the following locator strategies:

    • Using xpath, execute_script() and textContent:

      print(driver.execute_script('return arguments[0].lastChild.textContent;', driver.find_element(By.XPATH, '//div[@class="inner-box"]/p[@class="inner-info-blk"]')).strip())
      
    • Using css_selector, get_attribute() and splitlines():

      print(driver.find_element(By.CSS_SELECTOR, "div.inner-box > p.inner-info-blk").get_attribute("innerHTML").splitlines()[2].strip())
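The splitlines()[2] index depends on how the markup is laid out, so it may be worth seeing what the lines look like. A rough sketch without a browser: inner_html below is an assumption of what get_attribute("innerHTML") returns for the <p> in the HTML above, and picking the last non-empty line is just one possible way to avoid the hard-coded index.

```python
# Assumed innerHTML of <p class="inner-info-blk"> for the markup above
inner_html = '\n        <strong>Time: </strong>\n        "13:45"\n    '

lines = inner_html.splitlines()
# lines[0] is the empty string after <p>, lines[1] is the <strong> line,
# lines[2] is the time and lines[3] is the indentation before </p>
for index, line in enumerate(lines):
    print(index, repr(line))

# Less layout-dependent: take the last line that is not just whitespace
non_empty = [line.strip() for line in lines if line.strip()]
value = non_empty[-1]
print(value)  # "13:45"
```

If the page is ever minified so that everything sits on one line, both variants stop working, and the lastChild.textContent approach is the safer one.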
      

    Alternative

    As an alternative you can also use Beautiful Soup as follows:

    Code Block:

    from bs4 import BeautifulSoup
    
    html_text = '''
    <div class="inner-box">
        <p class="inner-info-blk">
            <strong>Time: </strong>
            "13:45"
        </p>
    </div>
    '''
    
    soup = BeautifulSoup(html_text, 'html.parser')
    
    last_text = soup.find("p", {"class": "inner-info-blk"}).contents[2]
    print(last_text.strip())
    

    Console Output:

    "13:45"
    

