
I am trying to scrape the text of a table cell using Python and Selenium. I can read the class attribute of that cell, but not its text: I tried .text and .get_text(), and neither of them works.

What am I missing?

Here is my Python script to get the text from the line:

from selenium import webdriver
from webdriver_manager.chrome import ChromeDriverManager
import time
import datetime
import csv

class toy(object):

    browser = webdriver.Chrome(ChromeDriverManager().install())

    browser.get('https://continuumgames.com/product/16-tracer-racer-set/')
    time.sleep(2)

    try:
        test = browser.find_element_by_xpath('//*[@id="tab-additional_information"]/table/tbody/tr[3]/td').get_attribute('class')

    except:
        test = 'NA'

    try:
        upcode = browser.find_element_by_xpath('//*[@id="tab-additional_information"]/table/tbody/tr[3]/td').text

    except:
        upcode = 'NA'


    print(test)
    print(upcode)


    browser.close()

Here is the page’s HTML:

<div class="woocommerce-Tabs-panel woocommerce-Tabs-panel--additional_information panel entry-content wc-tab" id="tab-additional_information" role="tabpanel" aria-labelledby="tab-title-additional_information" style="display: none;">
 
    <table class="woocommerce-product-attributes shop_attributes">
        <tbody>
            <tr class="woocommerce-product-attributes-item woocommerce-product-attributes-item--weight">
               <th class="woocommerce-product-attributes-item__label">Weight</th>
               <td class="woocommerce-product-attributes-item__value">2.5 oz</td>
            </tr>
            <tr class="woocommerce-product-attributes-item woocommerce-product-attributes-item--dimensions">
               <th class="woocommerce-product-attributes-item__label">Dimensions</th>
               <td class="woocommerce-product-attributes-item__value">24 × 4 × 2 in</td>
            </tr>
            <tr class="woocommerce-product-attributes-item woocommerce-product-attributes-item--attribute_product_upc">
               <th class="woocommerce-product-attributes-item__label">UPC</th>
               <td class="woocommerce-product-attributes-item__value">605444972168</td>
            </tr>
        </tbody>
     </table>
</div>

Here is my run:

C:\Users\Carre\scrape>python test.py

[WDM] - Current google-chrome version is 83.0.4103
[WDM] - Get LATEST driver version for 83.0.4103
[WDM] - Driver [C:\Users\Carre\.wdm\drivers\chromedriver\win32\83.0.4103.39\chromedriver.exe] found in cache

DevTools listening on ws://127.0.0.1:56807/devtools/browser/03318f43-1d26-44c7-8d90-65233969f03b
woocommerce-product-attributes-item__value

2 Answers


  1. Your selector is probably off. Try an XPath copied from the browser: in the developer tools, right-click the tag and choose Copy XPath, then use it in your code like this.

    upcode = browser.find_element_by_xpath('paste XPath here').text
    
  2. Here is a workaround I usually fall back on when Selenium behaves inconsistently: pass the element's HTML to BeautifulSoup (beautifulsoup4) and read the text from there.

    from selenium import webdriver
    import bs4
    from webdriver_manager.chrome import ChromeDriverManager
    import time
    import datetime
    import csv
    
    
    
    class toy(object):
    
        browser = webdriver.Chrome(ChromeDriverManager().install())
    
        browser.get('https://continuumgames.com/product/16-tracer-racer-set/')
        time.sleep(2)
    
        try:
            test = browser.find_element_by_xpath('//*[@id="tab-additional_information"]/table/tbody/tr[3]/td').get_attribute('class')
    
        except:
            test = 'NA'
    
        try:
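            # Selenium's .text is empty for elements that are not displayed;
            # parsing the element's raw markup is not affected by visibility.
            upcode = browser.find_element_by_xpath('//*[@id="tab-additional_information"]/table/tbody/tr[3]/td')
            # Name the parser explicitly so bs4 does not warn or pick a different one per machine.
            upcode = bs4.BeautifulSoup(upcode.get_attribute('outerHTML'), 'html.parser')
            upcode = upcode.text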
    
        except:
            upcode = 'NA'
    
    
        print(test)
        print(upcode)
    
    
        browser.close()
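
A likely reason the BeautifulSoup route works, going by the HTML in the question: the tab panel has style="display: none;", and Selenium's .text only returns text that is actually rendered, so it comes back empty for a hidden cell while get_attribute('class') still works. If that is the cause, a lighter alternative is to read the DOM's textContent property through get_attribute. The sketch below is not from either answer; element_text and FakeHiddenCell are hypothetical names, and the fake class exists only so the example runs without a browser.

```python
def element_text(element):
    """Return an element's text, including when the element is hidden.

    `element` is expected to behave like a Selenium WebElement: it has a
    `.text` property and a `.get_attribute(name)` method.
    """
    visible = (element.text or "").strip()
    if visible:
        return visible
    # Hidden elements (for example inside a tab with display: none) report
    # an empty .text, so fall back to the DOM's textContent property.
    return (element.get_attribute("textContent") or "").strip()


class FakeHiddenCell:
    """Hypothetical stand-in for the hidden <td>, used only for this demo."""

    text = ""  # what Selenium reports for an element that is not displayed

    def get_attribute(self, name):
        # Value taken from the UPC cell in the question's HTML.
        return "605444972168" if name == "textContent" else None


print(element_text(FakeHiddenCell()))  # prints 605444972168
```

In the original script this would be used as upcode = element_text(browser.find_element_by_xpath(...)) with the same XPath as before, which avoids the extra bs4 dependency.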
    