I’m trying to scrape a website using Scrapy.
When I scrape a specific page, pagination works, but when I try to scrape all the pages in one go, pagination does not work.
I tried creating an extra function for the pagination, but this did not fix the problem. All help would be appreciated. What am I doing wrong? Here’s my code:
# -*- coding: utf-8 -*-
import scrapy
from scrapy.loader.processors import MapCompose, Join
from scrapy.loader import ItemLoader
from scrapy.http import Request
from avtogumi.items import AvtogumiItem


class BasicSpider(scrapy.Spider):
    name = 'gumi'
    allowed_domains = ['avtogumi.bg']
    start_urls = ['https://bg.avtogumi.bg/oscommerce/index.php']

    def parse(self, response):
        urls = response.xpath('//div[@class="brands"]//a/@href').extract()
        for url in urls:
            url = response.urljoin(url)
            yield scrapy.Request(url=url, callback=self.parse_params)

    def parse_params(self, response):
        l = ItemLoader(item=AvtogumiItem(), response=response)
        l.add_xpath('title', '//h4/a/text()')
        l.add_xpath('subtitle', '//p[@class="ft-darkgray"]/text()')
        l.add_xpath('price', '//span[@class="promo-price"]/text()',
                    MapCompose(str.strip, str.title))
        l.add_xpath('stock', '//div[@class="product-box-stock"]//span/text()')
        l.add_xpath('category', '//div[@class="labels hidden-md hidden-lg"][0]//text()')
        l.add_xpath('brand', '//h4[@class="brand-header"][0]//text()',
                    MapCompose(str.strip, str.title))
        l.add_xpath('img_path', '//div/img[@class="prod-imglist"]/@src')
        yield l.load_item()

        next_page_url = response.xpath('//li/a[@class="next"]/@href').extract_first()
        if next_page_url:
            next_page_url = response.urljoin(next_page_url)
            yield scrapy.Request(url=next_page_url, callback=self.parse_params)
2 Answers
use/rewrite this code
The issue here is the item-loading part of parse_params.
That snippet of code will parse and load exactly one result per page. If you have a page with multiple results, you have to put that code inside a for loop and iterate over all the search results you want to parse.
Hope this helps.