Scrapy ignores my settings.py
My scraper.py:

    import scrapy


    class BlogSpider(scrapy.Spider):
        name = 'blogspider'
        start_urls = ['https://www.doctolib.de/directory/a']

        def parse(self, response):
            # Re-request the same URL if the page lacks a <title> or <lead> element
            if not response.xpath('//title'):
                yield scrapy.Request(url=response.url, dont_filter=True)
            if not response.xpath('//lead'):
                yield scrapy.Request(url=response.url, dont_filter=True)
            # Emit one item per doctor link found on the page
            for title in response.css('.seo-directory-doctor-link'):
                yield {'title': title.css('a ::attr(href)').extract_first()}
            # Follow the pagination link, if there is one
            next_page = response.css('li.seo-directory-page > a[rel=next] ::attr(href)').extract_first()
            if next_page:
                yield scrapy.Request(response.urljoin(next_page), callback=self.parse)
In the same folder as the script there is a settings.py with the following content:
    # Retry many times since proxies often fail
    RETRY_TIMES = 5
    # Retry on most error codes since proxies fail for different reasons
    RETRY_HTTP_CODES = [500, 503, 504, 400, 403, 404, 408]

    DOWNLOADER_MIDDLEWARES = {
        'scrapy.downloadermiddlewares.retry.RetryMiddleware': 90,
        # Fix path to this module
        'botcrawler.randomproxy.RandomProxy': 600,
        'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware': 110,
    }

    PROXY_LIST = '/home/user/botcrawler/botcrawler/proxy/list.txt'
Why doesn't Scrapy load this file? What am I doing wrong?
Thanks
2 Answers
The settings.py file should sit at the same level as the spiders folder, and your scraper.py should be inside the spiders folder. You can overwrite the existing settings.py file.
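To see why the location matters, here is a minimal sketch of how a settings module can be looked up from a project folder. It assumes the usual convention that a project has a scrapy.cfg file whose [settings] section names the settings module under `default`; the function name `find_settings_module` is hypothetical, and this is a simplified illustration rather than Scrapy's actual code.

```python
import configparser
from pathlib import Path


def find_settings_module(start_dir):
    """Walk upward from start_dir looking for a scrapy.cfg file.

    Returns the `default` value of its [settings] section, or None
    when no scrapy.cfg is found (i.e. we are not inside a project).
    """
    current = Path(start_dir).resolve()
    # Check the starting folder first, then each parent in turn
    for folder in [current, *current.parents]:
        cfg_path = folder / "scrapy.cfg"
        if cfg_path.is_file():
            parser = configparser.ConfigParser()
            parser.read(cfg_path)
            return parser.get("settings", "default", fallback=None)
    return None
```

Calling `find_settings_module(".")` from the folder where the spider is run shows which settings module, if any, would be picked up under this assumption. A settings.py that merely sits next to a standalone script is not referenced by any scrapy.cfg, which fits the behaviour described in the question.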
Judging by your other recent posts, it looks like you are struggling to start a Scrapy project. It would be a good idea to read the Scrapy Tutorial.
In summary, it describes how to start a Scrapy project with the command:

    scrapy startproject Blogspider

This will set up three nested folders: Blogspider > Blogspider > spiders
The second folder contains the items.py and settings.py files and a couple of other files. You only really need to edit the items.py file. The spiders folder is where you put your spider, and it will read items.py, settings.py, etc. from the parent folder.
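As a quick sanity check of the layout described above, the sketch below reports which of the expected files are missing from a project folder. It assumes the default layout created by `scrapy startproject`; the helper names `expected_paths` and `missing_project_files` are made up for this illustration, and the list of files is deliberately limited to the ones discussed here.

```python
from pathlib import Path


def expected_paths(project_name):
    """Paths, relative to the outer project folder, that the layout above implies."""
    package = Path(project_name)
    return [
        Path("scrapy.cfg"),
        package / "items.py",
        package / "settings.py",
        package / "spiders" / "__init__.py",
    ]


def missing_project_files(project_root, project_name):
    """Return the expected files that do not exist under project_root."""
    root = Path(project_root)
    return [
        path.as_posix()
        for path in expected_paths(project_name)
        if not (root / path).is_file()
    ]
```

For example, `missing_project_files("Blogspider", "Blogspider")` run from the folder that contains the outer Blogspider folder should return an empty list for a freshly generated project, and a non-empty list if settings.py ended up somewhere else.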