I am trying to webscrape the name, price, and description of products listed on an online shop. The website link is https://eshop.nomin.mn/n-foods.html
When I look through the HTML code of the page, I get the relevant div class containers but when I reference it in my code as such, I get no values when I run my spider. I think one reason would be if the website is Javascript based and is dynamic which would require me to use Splash. However, I don’t think this is the case for my issue.
def parse(self, response, **kwargs):
cards = response.xpath('//div[@class="item-itemmainroot-1lZ"]')
# parse details
for card in cards:
price = card.xpath(".//a[contains(@class, 'item-nameLenght-K5Z item-name-3TH')]/span()/text()").extract()
Full Code:
import scrapy
import re
class TempSpider(scrapy.Spider):
name = 'temp_spider'
allowed_domains = ['https://eshop.nomin.mn/']
start_urls = ['https://eshop.nomin.mn/n-foods.html']
def parse(self, response, **kwargs):
cards = response.xpath('//div[@class="item-itemmainroot-1lZ"]')
# parse details
for card in cards:
price = card.xpath(".//a[contains(@class, 'item-nameLenght-K5Z item-name-3TH')]/span()/text()").extract()
item = {'price': price
}
yield item
[1]: https://i.stack.imgur.com/iokmo.png
All and any help is greatly appreciated. I can’t seem to figure out what I am doing wrong.
2
Answers
Use json.
Use the websites data api instead of the website url that you visit in your browser. It will return a json object that has all the information you are looking for.
Partial OUTPUT
You can find the url for the api in the network tab in your browsers devtools…