
I am currently building a Python script that takes the price of trainers from the Nike website and pushes it into a CSV file. Originally, the code located the element where the price data lives, but after that failed I switched to using CSS selectors, since Nike's prices can be targeted that way. But when I ran the script, it still couldn't extract the price data. I was stumped, so I modified the script to account for elements that are dynamically loaded by JavaScript after the initial page load. That didn't help either, so I decided to modify the code to extract the element using XPath. After all of that, I still couldn't extract the price data and push it into the CSV file.


from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import time
import numpy as np
import pandas as pd
# Get the website using the Chrome web driver
browser = webdriver.Chrome()

# Create an empty DataFrame to store the results
df = pd.DataFrame(columns=["Product", "Price"])

# Define a list of websites to scrape
websites = [
    'https://www.nike.com/gb/t/dunk-low-retro-shoe-Kd1wZr/DD1391-103',
    'https://www.nike.com/gb/t/dunk-low-retro-shoe-QgD9Gv/DD1391-100',
    'https://www.nike.com/gb/t/dunk-low-retro-shoes-p6gmkm/DV0833-400'
]

# Loop through the websites
for website in websites:
    browser.get(website)
    try:
        # Attempt to find the price element by its XPath
        price = WebDriverWait(browser, 20).until(EC.presence_of_element_located((By.XPATH, '//*[@id="PDP"]/div[2]/div/div[4]/div[1]/div/div[2]/div/div/div/div/div')))
        # If found, extract the text and add to the DataFrame
        df = df.append({"Product": website, "Price": price.text}, ignore_index=True)
        print("Price for", website, ":", price.text)
    except:
        # If the element is not found, print an error message
        print("Price not found for", website)

# Close the browser
browser.quit()

# Save the DataFrame to a CSV file
df.to_csv(r'PriceList.csv', index=False)

The code is supposed to take the element where the price data lives and push it into a CSV file.
[Screenshot: a visual aid of what the code should do]
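As a side note on the CSV step: growing a DataFrame one row at a time is what the `append` call was doing. A common alternative pattern (a minimal sketch, with placeholder product codes and prices standing in for the scraped values) is to collect plain dicts in a list and build the DataFrame once at the end:

```python
import pandas as pd

# Placeholder rows standing in for the values scraped from each product page
scraped = [
    ("DD1391-103", "£109.95"),
    ("DD1391-100", "£109.95"),
]

# Collect each result as a plain dict, then build the DataFrame in one go
rows = [{"Product": product, "Price": price} for product, price in scraped]
df = pd.DataFrame(rows, columns=["Product", "Price"])

# Save the DataFrame to a CSV file
df.to_csv("PriceList.csv", index=False)
```

This avoids repeated DataFrame copies inside the loop and behaves the same across pandas versions.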

2 Answers


  1. df = df.append({"Product": website, "Price": price.text}, ignore_index=True)
    

    append was removed in pandas 2.0.
    Instead you can use _append.

    df = df._append({"Product": website, "Price": price.text}, ignore_index=True)
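    Note that _append is private API, so it could disappear without warning. The documented public replacement for the removed append is pd.concat; a minimal sketch (with a placeholder row standing in for the scraped values):

    ```python
    import pandas as pd

    df = pd.DataFrame(columns=["Product", "Price"])

    # Same effect as the old df.append(..., ignore_index=True), via public API
    new_row = pd.DataFrame([{"Product": "example-url", "Price": "£109.95"}])
    df = pd.concat([df, new_row], ignore_index=True)
    ```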
    

    For some reason, text doesn't seem to work (on my computer); Selenium's .text only returns the element's visible text, so it can come back empty when the element is rendered as hidden.
    Instead, you can use get_attribute("innerHTML").

    Here is the final code.

    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.chrome.service import Service
    from chromedriver_py import binary_path
    import time
    import numpy as np
    import pandas as pd
    
    svc = Service(executable_path=binary_path)
    
    # Get the website using the Chrome web driver
    browser = webdriver.Chrome(service=svc)
    
    # Create an empty DataFrame to store the results
    df = pd.DataFrame(columns=["Product", "Price"])
    
    # Define a list of websites to scrape
    websites = [
        'https://www.nike.com/gb/t/dunk-low-retro-shoe-Kd1wZr/DD1391-103',
        'https://www.nike.com/gb/t/dunk-low-retro-shoe-QgD9Gv/DD1391-100',
        'https://www.nike.com/gb/t/dunk-low-retro-shoes-p6gmkm/DV0833-400'
    ]
    
    # Loop through the websites
    for website in websites:
        browser.get(website)
        try:
            # Attempt to find the price element by its XPath
            price = WebDriverWait(browser, 20).until(EC.presence_of_element_located((By.XPATH, '//*[@id="PDP"]/div[2]/div/div[4]/div[1]/div/div[2]/div/div/div/div/div')))
    
            # If found, extract the text and add to the DataFrame
            df = df._append({"Product": website, "Price": price.get_attribute('innerHTML')}, ignore_index=True)
            print("Price for", website, ":", price.get_attribute('innerHTML'))
        except Exception:
            # If the element is not found, print an error message
            print("Price not found for", website)
    
    # Close the browser
    browser.quit()
    
    # Save the DataFrame to a CSV file
    df.to_csv(r'PriceList.csv', index=False)
    
  2. This answer is only meant to get a better XPath; I don't know if that is the cause of your problem.

    I would use an @id attribute that is closer to your target, which is this one:

    @id='pdp_product_title'

    But it seems there are somehow two HTML blocks with the same h1/@id, which is not valid, but to work around this you could use this explicit XPath:

    (//div[h1[@id='pdp_product_title']])[2]/div[1]/div[1]/div[1]/div[1]
    

    A simpler XPath version would be this one:

    (//div[h1[@id='pdp_product_title']])[2]//div[@data-test='product-price'][1]
    

    The [1] predicates just tell the XPath engine not to look any further than the first match, which could be a small performance advantage.
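    One detail worth stressing: attribute tests in XPath need the @ prefix, so div[@data-test='product-price'] rather than div[data-test='product-price'] (the latter would match a div with a *child element* named data-test). A minimal sketch against a toy fragment (the element names and values are illustrative, not Nike's real markup), using Python's standard-library ElementTree:

    ```python
    import xml.etree.ElementTree as ET

    # Toy fragment mimicking the structure described above (illustrative only)
    html = """
    <root>
      <div>
        <h1 id="pdp_product_title">Dunk Low Retro</h1>
        <div data-test="product-price">£109.95</div>
      </div>
    </root>
    """

    tree = ET.fromstring(html)
    # Attribute predicate: note the @ before data-test
    price = tree.find(".//div[@data-test='product-price']")
    print(price.text)  # £109.95
    ```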
