Scrapy ignore settings.py - SEO

Joni
February 11, 2017
131 views
1 vote
2 Answers

Scrapy ignores my settings.py

My scraper.py:

import scrapy



class BlogSpider(scrapy.Spider):
    name = 'blogspider'
    start_urls = ['https://www.doctolib.de/directory/a']

    def parse(self, response):

        # Re-request the same page when an expected element is missing
        if not response.xpath('//title'):
            yield scrapy.Request(url=response.url, dont_filter=True)

        if not response.xpath('//lead'):
            yield scrapy.Request(url=response.url, dont_filter=True)

        for title in response.css('.seo-directory-doctor-link'):
            yield {'title': title.css('a ::attr(href)').extract_first()}

        next_page = response.css('li.seo-directory-page > a[rel=next] ::attr(href)').extract_first()
        if next_page:
            yield scrapy.Request(response.urljoin(next_page), callback=self.parse)

A settings.py file sits in the same folder as the script, with the following in it:

# Retry many times since proxies often fail
RETRY_TIMES = 5
# Retry on most error codes since proxies fail for different reasons
RETRY_HTTP_CODES = [500, 503, 504, 400, 403, 404, 408]

DOWNLOADER_MIDDLEWARES = {
    'scrapy.downloadermiddlewares.retry.RetryMiddleware': 90,
    # Fix path to this module
    'botcrawler.randomproxy.RandomProxy': 600,
    'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware': 110,
}

PROXY_LIST = '/home/user/botcrawler/botcrawler/proxy/list.txt'

Why doesn't Scrapy load this file? What am I doing wrong?

Thanks

Tags: scrapy

Answers

- SHIVAMJINDAL
- February 11, 2017 at 8:25 pm
- 0 votes
The settings.py file should sit alongside the spiders folder (in the same parent folder), and your scraper.py should be inside the spiders folder. You can replace the existing settings.py file with your own.

- nevster
- February 12, 2017 at 1:08 am
- 0 votes
Judging by your other recent posts, it looks like you are struggling to start a Scrapy project. It would be a good idea to read the Scrapy Tutorial.

In summary, it describes how to start a Scrapy project with the command scrapy startproject Blogspider

This will set up three nested folders: Blogspider >> Blogspider >> spiders

The second folder contains the items.py and settings.py files and a couple of other files. You only really need to edit the items.py file.

The spiders folder is where you put your spider; it will read items.py, settings.py, etc. from the parent folder.
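To check whether a given folder is actually inside a Scrapy project, something like the following may help. It is only a rough sketch using the standard library, not Scrapy's own code: it assumes that a project is recognised by a scrapy.cfg file in the current folder or one of its parents, and that the [settings] section of that file names the settings module. The helper names find_scrapy_cfg and settings_module are made up for this illustration.

```python
import configparser
from pathlib import Path
from typing import Optional


def find_scrapy_cfg(start: Path) -> Optional[Path]:
    """Return the nearest scrapy.cfg at or above `start`, or None if there is none."""
    start = start.resolve()
    for folder in (start, *start.parents):
        candidate = folder / "scrapy.cfg"
        if candidate.is_file():
            return candidate
    return None


def settings_module(cfg_path: Path) -> Optional[str]:
    """Read the `default` entry of the [settings] section, if present."""
    parser = configparser.ConfigParser()
    parser.read(cfg_path)
    return parser.get("settings", "default", fallback=None)


if __name__ == "__main__":
    cfg = find_scrapy_cfg(Path.cwd())
    if cfg is None:
        # A settings.py lying next to a standalone script is not a project
        print("No scrapy.cfg found: a project settings.py would not be picked up here")
    else:
        print(f"Project root: {cfg.parent}")
        print(f"Settings module: {settings_module(cfg)}")
```

If this reports no scrapy.cfg for the folder you run your spider from, that would be consistent with the settings.py being ignored, and creating the project with scrapy startproject as described above is the likely fix.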

