
Scrapy ignores my settings.py

My scraper.py:

import scrapy


class BlogSpider(scrapy.Spider):
    name = 'blogspider'
    start_urls = ['https://www.doctolib.de/directory/a']

    def parse(self, response):
        # Re-request the same URL if an expected element is missing;
        # dont_filter=True bypasses the duplicate-request filter.
        if not response.xpath('//title'):
            yield scrapy.Request(url=response.url, dont_filter=True)

        if not response.xpath('//lead'):
            yield scrapy.Request(url=response.url, dont_filter=True)

        # Yield the link of every doctor listed on this directory page
        for title in response.css('.seo-directory-doctor-link'):
            yield {'title': title.css('a ::attr(href)').extract_first()}

        # Follow the "next" pagination link, if there is one
        next_page = response.css('li.seo-directory-page > a[rel=next] ::attr(href)').extract_first()
        if next_page:
            yield scrapy.Request(response.urljoin(next_page), callback=self.parse)
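As written, the two checks at the top of parse re-request the page with dont_filter=True every time the element is missing, with no upper limit. A page that never contains the element (for example, if //lead never matches on this site) is therefore fetched over and over. A minimal sketch of a retry cap follows; the meta key page_retries, the helper next_page_retry and the limit of 3 are all hypothetical choices, not anything Scrapy defines.

```python
from typing import Optional

# Hypothetical limit; the spider above never stops retrying.
MAX_PAGE_RETRIES = 3


def next_page_retry(meta: dict, max_retries: int = MAX_PAGE_RETRIES) -> Optional[int]:
    """Return the counter to store on the re-issued request, or None if the
    page has already been retried max_retries times.

    'page_retries' is a made-up meta key, not something Scrapy defines.
    """
    attempts = meta.get("page_retries", 0)
    if attempts >= max_retries:
        return None
    return attempts + 1


# Inside BlogSpider.parse it could be used roughly like this:
#
#     if not response.xpath('//title'):
#         count = next_page_retry(response.meta)
#         if count is not None:
#             yield scrapy.Request(response.url, dont_filter=True,
#                                  meta={'page_retries': count})
#         return  # skip parsing a page that looks incomplete

if __name__ == "__main__":
    # Simulate a page that keeps coming back incomplete.
    meta = {}
    count = next_page_retry(meta)
    while count is not None:
        print("retry number", count)
        meta = {"page_retries": count}
        count = next_page_retry(meta)
    print("giving up")
```

Keeping the counter in the request's meta means each retried request carries its own history, so no state has to live on the spider object.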

In the same folder as the script there is a settings.py with the following contents:

# Retry many times since proxies often fail
RETRY_TIMES = 5
# Retry on most error codes since proxies fail for different reasons
RETRY_HTTP_CODES = [500, 503, 504, 400, 403, 404, 408]

DOWNLOADER_MIDDLEWARES = {
    'scrapy.downloadermiddlewares.retry.RetryMiddleware': 90,
    # Fix path to this module
    'botcrawler.randomproxy.RandomProxy': 600,
    'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware': 110,
}

PROXY_LIST = '/home/user/botcrawler/botcrawler/proxy/list.txt'

Why doesn't Scrapy load this file? What am I doing wrong?

Thanks
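The contents of the file are not the problem in themselves: when Scrapy does import a settings module, it copies every attribute whose name is uppercase into its settings. The stdlib-only sketch below applies that same rule to a file path, so you can check what Scrapy would pick up once it actually finds the file. load_uppercase_settings is a hypothetical helper, not a Scrapy API, and it does not model how Scrapy locates the module.

```python
import importlib.util
import tempfile
from pathlib import Path


def load_uppercase_settings(path):
    """Import a settings file by path and return its UPPERCASE names.

    Mirrors the rule Scrapy applies to a settings module once it has found
    one (only uppercase attributes become settings). It does not reproduce
    how Scrapy decides which module to load.
    """
    spec = importlib.util.spec_from_file_location("settings_under_test", str(path))
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # executes the file like a normal import
    return {name: getattr(module, name) for name in dir(module) if name.isupper()}


if __name__ == "__main__":
    # Write a trimmed copy of the settings above to a temporary file and load it.
    sample = (
        "RETRY_TIMES = 5\n"
        "RETRY_HTTP_CODES = [500, 503, 504, 400, 403, 404, 408]\n"
        "PROXY_LIST = '/home/user/botcrawler/botcrawler/proxy/list.txt'\n"
        "helper = 'lowercase names are ignored'\n"
    )
    with tempfile.TemporaryDirectory() as tmp:
        settings_file = Path(tmp) / "settings.py"
        settings_file.write_text(sample)
        for key, value in sorted(load_uppercase_settings(settings_file).items()):
            print(key, "=", value)
```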

2 Answers


  1. The settings.py file should sit at the same level as the spiders folder, and your scraper.py should go inside the spiders folder. You can overwrite the existing settings.py file with yours.

  2. Judging by your other recent posts, it looks like you are struggling to set up a Scrapy project. It would be a good idea to read the Scrapy Tutorial.

    In summary, it describes how to start a Scrapy project with the command: scrapy startproject Blogspider

    This sets up three nested folders: Blogspider >> Blogspider >> spiders

    The second folder contains items.py, settings.py and a few other files. For a simple project you mainly need to edit items.py, plus settings.py for options such as the proxy settings in the question.

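    Put your spider in the spiders folder. When you run it from inside the project with scrapy crawl blogspider, Scrapy finds the scrapy.cfg file in the top-level folder, which points it to settings.py in the second folder, and your spider can import its item classes from items.py there.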

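Both answers come down to the same mechanism. Assuming the spider is started through the scrapy command (scrapy runspider scraper.py or scrapy crawl; the question does not say which), Scrapy does not look for a settings.py next to the spider file. It uses the module named in the SCRAPY_SETTINGS_MODULE environment variable or, failing that, the one named under [settings] in the nearest scrapy.cfg found by walking up from the current directory; scrapy startproject writes such an entry for you. The stdlib sketch below is a simplified model of that lookup. find_settings_module is a hypothetical helper, not part of Scrapy, and real Scrapy also consults other config file locations and the SCRAPY_PROJECT variable.

```python
import configparser
import os
from pathlib import Path
from typing import Optional


def find_settings_module(start: str = ".") -> Optional[str]:
    """Simplified model of how Scrapy picks the project settings module.

    1. An explicit SCRAPY_SETTINGS_MODULE environment variable wins.
    2. Otherwise, walk up from `start` to the nearest scrapy.cfg and read
       the 'default' option of its [settings] section.
    Returns None when neither is found, i.e. only Scrapy's defaults apply.
    """
    from_env = os.environ.get("SCRAPY_SETTINGS_MODULE")
    if from_env:
        return from_env
    here = Path(start).resolve()
    for directory in (here, *here.parents):
        cfg_path = directory / "scrapy.cfg"
        if cfg_path.is_file():
            cfg = configparser.ConfigParser()
            cfg.read(cfg_path)
            # fallback also covers a scrapy.cfg without a [settings] section
            return cfg.get("settings", "default", fallback=None)
    return None


if __name__ == "__main__":
    # Run this from the folder you start the spider in (e.g. where scraper.py lives).
    print("Settings module:", find_settings_module())
```

If this prints None for the folder holding scraper.py, the RETRY_* and DOWNLOADER_MIDDLEWARES values are never applied and Scrapy runs with its built-in defaults. Setting SCRAPY_SETTINGS_MODULE (with that module importable on sys.path) or moving the spider into a startproject layout, as the answers suggest, changes that.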