
This is the weirdest thing that has ever happened to me.

My Django project contains both Scrapy and Channels. With channels 3.0.5 installed, my chat room runs normally, but Scrapy does not: it stops after 2023-04-04 10:10:42 [scrapy.core.engine] DEBUG: Crawled (200) <POST http://pfsc.agri.cn/api/priceQuotationController/pageList?key=&order=> (referer: None). Debugging shows the spider never enters parse, even though capturing the traffic with Wireshark shows the response does come back. After I upgraded Channels to 4.0.0, the chat room can no longer connect to the WebSocket server, but Scrapy runs normally.

Thank you for your time.

scrapy log:

['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
 'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
 'scrapy.spidermiddlewares.referer.RefererMiddleware',
 'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
 'scrapy.spidermiddlewares.depth.DepthMiddleware']
2023-04-04 10:10:42 [scrapy.middleware] INFO: Enabled item pipelines:
['spider.pipelines.SpiderPipeline_PFSC']
2023-04-04 10:10:42 [scrapy.core.engine] INFO: Spider opened
2023-04-04 10:10:42 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2023-04-04 10:10:42 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023
2023-04-04 10:10:42 [scrapy.core.engine] DEBUG: Crawled (200) <POST http://pfsc.agri.cn/api/priceQuotationController/pageList?key=&order=> (referer: None)
2023-04-04 10:11:42 [scrapy.extensions.logstats] INFO: Crawled 1 pages (at 1 pages/min), scraped 0 items (at 0 items/min)

scrapy spider:

    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self.total_prices = 1
        self.url = 'http://pfsc.agri.cn/api/priceQuotationController/pageList?key=&order='
        self.date = datetime.today().strftime('%Y-%m-%d')
        print('today is ' + self.date)
        # Rows already stored for today's reportTime, with the primary key stripped.
        date_list = PFSC_Price.objects.filter(reportTime=self.date)
        self.date_list = list(date_list.values())
        for data in self.date_list:
            del data['id']
        self.max_length = 900
        self.i = 0

    def start_requests(self):
        yield Request(
            url=self.url,
            method='POST',
            body='{"pageNum":1,"pageSize":' + f'{self.total_prices}' + ',"marketId":"","provinceCode":"","pid":"","varietyId":""}',
            callback=self.parse,
        )

    def parse(self, response, **kwargs):
        print('in parse')
        json_loads = json.loads(response.text)
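
For reference, the same JSON POST can also be expressed with json.dumps and an explicit Content-Type header. This is only a sketch of an equivalent request, not the code from the question; the spider and class names are made up, and the Content-Type header is an assumption (the original request sends the body without it):

import json

import scrapy


class PriceSpider(scrapy.Spider):
    # Hypothetical spider, shown only to illustrate the request shape.
    name = 'pfsc_prices'
    url = 'http://pfsc.agri.cn/api/priceQuotationController/pageList?key=&order='

    def start_requests(self):
        payload = {
            'pageNum': 1,
            'pageSize': 1,
            'marketId': '',
            'provinceCode': '',
            'pid': '',
            'varietyId': '',
        }
        yield scrapy.Request(
            url=self.url,
            method='POST',
            headers={'Content-Type': 'application/json'},  # assumed header
            body=json.dumps(payload),
            callback=self.parse,
        )

    def parse(self, response, **kwargs):
        data = json.loads(response.text)
        self.logger.info('received %d top-level keys', len(data))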

requirements.txt

amqp==5.1.1
asgiref==3.6.0
async-timeout==4.0.2
attrs==22.2.0
autobahn==23.1.2
Automat==22.10.0
beautifulsoup4==4.11.1
billiard==3.6.4.0
celery==5.2.7
certifi==2022.9.24
cffi==1.15.1
channels==4.0.0
channels-redis==4.0.0
charset-normalizer==2.1.1
click==8.1.3
click-didyoumean==0.3.0
click-plugins==1.1.1
click-repl==0.2.0
constantly==15.1.0
cryptography==39.0.0
cssselect==1.2.0
daphne==3.0.2
Django==4.1.5
django-extensions==3.2.1
et-xmlfile==1.1.0
filelock==3.9.0
hyperlink==21.0.0
idna==3.4
incremental==22.10.0
itemadapter==0.7.0
itemloaders==1.0.6
jmespath==1.0.1
kombu==5.2.4
lxml==4.9.2
msgpack==1.0.5
mysql-connector-python==8.0.31
mysqlclient==2.1.1
numpy==1.24.1
openpyxl==3.0.10
packaging==22.0
pandas==1.5.2
parsel==1.7.0
Pillow==9.4.0
prompt-toolkit==3.0.38
Protego==0.2.1
protobuf==3.20.1
pyasn1==0.4.8
pyasn1-modules==0.2.8
pycparser==2.21
PyDispatcher==2.0.6
PyMySQL==1.0.2
pyOpenSSL==23.0.0
python-dateutil==2.8.2
pytz==2022.6
queuelib==1.6.2
redis==4.5.1
requests==2.28.1
requests-file==1.5.1
Scrapy==2.7.1
scrapy-djangoitem==1.1.1
sentry-sdk==1.12.1
service-identity==21.1.0
six==1.16.0
soupsieve==2.3.2.post1
sqlparse==0.4.3
tldextract==3.4.0
Twisted==22.10.0
txaio==23.1.1
typing_extensions==4.4.0
tzdata==2022.7
urllib3==1.26.13
vine==5.0.0
w3lib==2.1.1
wcwidth==0.2.6
zope.interface==5.5.2

With channels 4.0.0:
Front end:

chatRoom.js:46 WebSocket connection to 'ws://127.0.0.1:8000/ws/chat/1111/' failed:

Backend:

Not Found: /ws/chat/1111/
[04/Apr/2023 10:44:49] "GET /ws/chat/1111/ HTTP/1.1" 404 5746
[04/Apr/2023 10:44:49,600] - Broken pipe from ('127.0.0.1', 44486)

asgi.py

import os

from channels.routing import ProtocolTypeRouter, URLRouter
from channels.sessions import SessionMiddlewareStack
from django.core.asgi import get_asgi_application

os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'everything.settings')
os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'core.settings')

import website.routing

application = ProtocolTypeRouter({
    'http': get_asgi_application(),
    'websocket': SessionMiddlewareStack(
        URLRouter(
            website.routing.websocket_urlpatterns
        )
    ),
})

routing.py

websocket_urlpatterns = [
    re_path(r'ws/chat/(?P<room_name>\w+)/$', consumers.ChatConsumer.as_asgi()),
]

2 Answers


  1. Chosen as BEST ANSWER

    I found the reason channels 4.0.0 could not find the ws route: 'daphne' has to be added as the first entry of INSTALLED_APPS in the Django settings (see the sketch at the end of this answer). With that, the chat room runs, but unfortunately Scrapy still can't run.

    I also found a workaround, although I still don't understand why there is a conflict. Since I use PyCharm, I gave Scrapy its own separate virtual environment. It doesn't really solve the problem, but Scrapy can run again, which counts as a small victory.
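
    A rough sketch of what that settings change might look like. Only 'daphne', 'channels' and 'website' come from this question; the rest of the app list and the ASGI_APPLICATION module path are assumptions and should be adapted to the actual project:

    # settings.py (sketch; app list trimmed to the relevant entries)
    INSTALLED_APPS = [
        'daphne',   # listed first so its ASGI-aware runserver takes over
        'django.contrib.admin',
        'django.contrib.auth',
        'django.contrib.contenttypes',
        'django.contrib.sessions',
        'django.contrib.messages',
        'django.contrib.staticfiles',
        'channels',
        'website',  # the app that holds routing.py and consumers.py here
    ]

    # Points runserver at the ProtocolTypeRouter in asgi.py (module path assumed).
    ASGI_APPLICATION = 'core.asgi.application'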


  2. I use channels 4.0 and Scrapy and had the same problem. I still don't know how to make Scrapy run while daphne is used as the ASGI server with channels 4.0.

    My solution: remove 'daphne' from INSTALLED_APPS and run Django under uvicorn instead of the runserver command in my development environment; then everything works (see the sketch below).
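
    A sketch of that setup, assuming the project's ASGI module is core.asgi (the asgi.py above sets both everything.settings and core.settings, so adjust the module path to the real project name):

    # settings.py: 'daphne' removed, 'channels' kept for the consumers/routing
    INSTALLED_APPS = [
        # ... the usual django.contrib apps ...
        'channels',
        'website',
    ]

    # Development server started with uvicorn instead of runserver
    # (install it first with: pip install uvicorn):
    #
    #   uvicorn core.asgi:application --host 127.0.0.1 --port 8000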
