I am creating a web scraper with Scrapy (Python). Here is my code:
import scrapy

class BlogSpider(scrapy.Spider):
    name = 'blogspider'
    start_urls = [
        'https://perfumehut.com.pk/shop/',
    ]

    def parse(self, response):
        yield {
            'product_link': response.css('a.product-image-link::attr("href")').get(),
            'product_title': response.css('h3.product-title>a::text').get(),
            'product_price': response.css('span.price > span > bdi::text').get(),
        }
        next_page = response.css('ul.page-numbers>li>a.next.page-numbers::attr("href")').get()
        if next_page is not None:
            print()
            print(next_page)
            print()
            yield scrapy.Request(next_page)

    def parse(self, response):
        yield {
            'title': response.css('h1::text').get(),
            'batt': response.css('td.woocommerce-product-attributes-item__value p::text')[3].get(),
            'brand': response.css('div.woodmart-product-brand img::attr(alt)').get(),
            'brandimg': response.css('div.woodmart-product-brand img::attr(src)').get(),
            'price': response.css('p.price').xpath('./span/bdi/text()').get(),
            'r-price': response.css('p.price').xpath('./del/span/bdi/text()').get(),
            's-sale': response.css('p.price').xpath('./ins/span/bdi/text()').get(),
            'breadcrumbs': response.css('nav.woocommerce-breadcrumb a::text').getall(),
            'tags': response.css('span.tagged_as a::text').getall(),
            'attributes': response.css('td.woocommerce-product-attributes-item__value p::text').getall(),
            'img': response.css('figure.woocommerce-product-gallery__image a::attr("href")').getall(),
            'description': response.css('div.woocommerce-product-details__short-description p::text').get(),
            'description1': response.css('#tab-description > div > div > p::text').getall(),
            'description2': response.css('#tab-description > div > div > div > div > div > div > div > div > p::text').getall()
        }
It's a WooCommerce website. There are 57 pages in total, with 12 products per page, so an estimated 684 products. But my code returns nothing. What did I do wrong while scraping the URLs?
2 Answers
To extract information from all the pages, you need to extract the next-page URL and then parse it. Here is a simple example that I think will help you sort out the issue.
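A minimal sketch, reusing the listing selectors from the question; the li.product container is an assumption (the usual WooCommerce product markup), so adjust it to your theme if needed. The loop yields one item per product instead of only the first match on the page, and response.follow keeps requesting the next page until the link disappears:

import scrapy

class ShopSpider(scrapy.Spider):
    name = 'shopspider'
    start_urls = ['https://perfumehut.com.pk/shop/']

    def parse(self, response):
        # One item per product card, not just the first match on the page.
        # 'li.product' is the default WooCommerce container (assumption).
        for product in response.css('li.product'):
            yield {
                'product_link': product.css('a.product-image-link::attr(href)').get(),
                'product_title': product.css('h3.product-title > a::text').get(),
                'product_price': product.css('span.price > span > bdi::text').get(),
            }

        # Keep following the "next" link until the last page
        next_page = response.css('a.next.page-numbers::attr(href)').get()
        if next_page is not None:
            yield response.follow(next_page, callback=self.parse)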
Okay, this should do it. The root problem is that you define parse() twice in the same class, so Python keeps only the second definition: the product-detail selectors run against the shop listing page, where they match little or nothing, and your pagination code never executes at all. Rename the detail parser and chain the requests:
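A sketch of the reworked spider, keeping the selectors from the question: parse() handles the listing and pagination, and each product link is followed into a parse_product() callback (the name is just illustrative). The only behavioral change on the detail page is guarding the [3] index so a short attribute table doesn't raise IndexError:

import scrapy

class BlogSpider(scrapy.Spider):
    name = 'blogspider'
    start_urls = ['https://perfumehut.com.pk/shop/']

    def parse(self, response):
        # Shop listing: send every product link to the detail callback
        for href in response.css('a.product-image-link::attr(href)').getall():
            yield response.follow(href, callback=self.parse_product)

        # Then move on to the next listing page, if any
        next_page = response.css('ul.page-numbers > li > a.next.page-numbers::attr(href)').get()
        if next_page is not None:
            yield response.follow(next_page, callback=self.parse)

    def parse_product(self, response):
        # Product detail page: same selectors as in the question
        attributes = response.css('td.woocommerce-product-attributes-item__value p::text').getall()
        yield {
            'title': response.css('h1::text').get(),
            # Guard the index so a short attribute table yields None instead of crashing
            'batt': attributes[3] if len(attributes) > 3 else None,
            'brand': response.css('div.woodmart-product-brand img::attr(alt)').get(),
            'brandimg': response.css('div.woodmart-product-brand img::attr(src)').get(),
            'price': response.css('p.price').xpath('./span/bdi/text()').get(),
            'r-price': response.css('p.price').xpath('./del/span/bdi/text()').get(),
            's-sale': response.css('p.price').xpath('./ins/span/bdi/text()').get(),
            'breadcrumbs': response.css('nav.woocommerce-breadcrumb a::text').getall(),
            'tags': response.css('span.tagged_as a::text').getall(),
            'attributes': attributes,
            'img': response.css('figure.woocommerce-product-gallery__image a::attr(href)').getall(),
            'description': response.css('div.woocommerce-product-details__short-description p::text').get(),
            'description1': response.css('#tab-description > div > div > p::text').getall(),
        }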