I have the following HTML:

<span class="ui-cfs-sn-l" xpath="1">
    ABC
    <span class="ui-cfs-txt">°⌃</span>
</span>

I used the following Python code to extract the text, which returns ABC°⌃:

element = driver.find_element(by=By.XPATH, value="//span[@class='ui-cfs-sn-l']")
result = element.text

Is there a way to extract just the text before the inner span? The solution should return ABC

2 Answers


  1. You could use Python's BeautifulSoup4 library to achieve this.

    Beautiful Soup is a library that makes it easy to scrape information from web pages. It sits atop an HTML or XML parser, providing Pythonic idioms for iterating, searching, and modifying the parse tree.

    Here is an example:

    from bs4 import BeautifulSoup
    
    html_content = '<span class="ui-cfs-sn-l" xpath="1">ABC<span class="ui-cfs-txt">°⌃</span></span>'
    soup = BeautifulSoup(html_content, features='html.parser')
    
    desired_content = soup.find_all('span', {'class': 'ui-cfs-sn-l'})[0].contents[0]
    print(desired_content)  # outputs 'ABC'
    

    Edit:

    You can also use BeautifulSoup together with Selenium.

    Assuming you are on the page you want to parse, Selenium stores the source HTML in the driver’s page_source attribute. You would then load the page_source into BeautifulSoup as follows:

    from bs4 import BeautifulSoup
    from selenium import webdriver
    
    driver = webdriver.Firefox()
    driver.get('http://yoursite.com')
    html_content = driver.page_source
    
    soup = BeautifulSoup(html_content, features='html.parser')
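    # On a real page the first text node usually carries line breaks and indentation, so strip it
    desired_content = soup.find_all('span', {'class': 'ui-cfs-sn-l'})[0].contents[0].strip()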
    print(desired_content)
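If adding a third-party dependency is not an option, the same idea (keep only the text that comes before the first child element) can be sketched with the standard library's html.parser module. The names LeadingTextParser and leading_text are hypothetical helpers written for this illustration, not part of any library, and the sketch assumes the target is a span whose class attribute contains the given class name:

```python
from html.parser import HTMLParser


class LeadingTextParser(HTMLParser):
    """Hypothetical helper: collect the text of the first matching <span>
    up to (but not including) its first child element."""

    def __init__(self, target_class):
        super().__init__()
        self.target_class = target_class
        self.inside = False  # True once the target span has been opened
        self.done = False    # True once a child tag or the closing tag is seen
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if self.done:
            return
        if self.inside:
            # First child element reached: stop collecting text.
            self.done = True
            return
        classes = (dict(attrs).get("class") or "").split()
        if tag == "span" and self.target_class in classes:
            self.inside = True

    def handle_endtag(self, tag):
        if self.inside:
            # The target span closed without any child element.
            self.done = True

    def handle_data(self, data):
        if self.inside and not self.done:
            self.parts.append(data)


def leading_text(html, target_class):
    parser = LeadingTextParser(target_class)
    parser.feed(html)
    parser.close()
    return "".join(parser.parts).strip()


# Markup from the question, including its line breaks and indentation.
html_content = (
    '<span class="ui-cfs-sn-l" xpath="1">\n'
    '    ABC\n'
    '    <span class="ui-cfs-txt">°⌃</span>\n'
    '</span>'
)
result = leading_text(html_content, "ui-cfs-sn-l")
print(result)  # ABC
```

With Selenium you would pass driver.page_source instead of the literal string. This is only a sketch: it looks at the first matching span and does not try to handle malformed markup.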
    
  2. You can extract the text like this:

    element = driver.find_element(by=By.XPATH, value="//span[@class='ui-cfs-sn-l']")
    # childNodes[0] is the text node before the inner span; strip its surrounding whitespace
    result = driver.execute_script("return arguments[0].childNodes[0].nodeValue", element).strip()
    

    or:

    element = driver.find_element(by=By.XPATH, value="//span[@class='ui-cfs-sn-l']")
    result = element.get_attribute('innerHTML').split('<')[0].strip()  # make sure no '<' is in your target text
    

    or:

    element = driver.find_element(by=By.XPATH, value="//span[@class='ui-cfs-sn-l']")
    result = element.get_attribute('textContent').strip().split('\n')[0]  # make sure no newline is in your target text
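The string handling in the innerHTML and textContent variants can be checked without a browser. The two literals below are assumptions: they are what the browser would be expected to return for the markup exactly as written in the question, with a line break and indentation before the inner span. The textContent variant here splits on the newline character.

```python
# Assumed values for the question's markup (not captured from a real browser session).
inner_html = '\n    ABC\n    <span class="ui-cfs-txt">°⌃</span>\n'
text_content = '\n    ABC\n    °⌃\n'

# innerHTML variant: keep everything before the first tag.
from_inner_html = inner_html.split('<')[0].strip()

# textContent variant: keep the first line after trimming the outer whitespace.
from_text_content = text_content.strip().split('\n')[0].strip()

print(from_inner_html)    # ABC
print(from_text_content)  # ABC
```

Both variants depend on the page layout. The first breaks if the target text itself contains '<', and the second only works when a line break separates ABC from the inner span; if the markup were on a single line, textContent would be ABC°⌃ with nothing to split on.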
    