This is the weirdest thing that has happened to me.
My Django project uses both Scrapy and Channels. With channels 3.0.5 installed, my chat room runs normally, but Scrapy does not: it stops after
2023-04-04 10:10:42 [scrapy.core.engine] DEBUG: Crawled (200) <POST http://pfsc.agri.cn/api/priceQuotationController/pageList?key=&order=> (referer: None)
When I debugged it, the parse callback was never entered, even though a Wireshark capture shows the response coming back. After upgrading channels to 4.0.0, the chat room can no longer connect to the ws server, but Scrapy runs normally.
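For reference, the endpoint itself can be checked outside Scrapy with a plain requests call; this is only a minimal sketch, and the payload simply mirrors the spider's start_requests below:

import json

import requests

url = 'http://pfsc.agri.cn/api/priceQuotationController/pageList?key=&order='
payload = {"pageNum": 1, "pageSize": 1, "marketId": "",
           "provinceCode": "", "pid": "", "varietyId": ""}

# Send the same POST body the spider sends and inspect the response.
resp = requests.post(url, data=json.dumps(payload),
                     headers={'Content-Type': 'application/json'})
print(resp.status_code)
print(resp.text[:500])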
Thank you for your time.
scrapy log:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
'scrapy.spidermiddlewares.referer.RefererMiddleware',
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
'scrapy.spidermiddlewares.depth.DepthMiddleware']
2023-04-04 10:10:42 [scrapy.middleware] INFO: Enabled item pipelines:
['spider.pipelines.SpiderPipeline_PFSC']
2023-04-04 10:10:42 [scrapy.core.engine] INFO: Spider opened
2023-04-04 10:10:42 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2023-04-04 10:10:42 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023
2023-04-04 10:10:42 [scrapy.core.engine] DEBUG: Crawled (200) <POST http://pfsc.agri.cn/api/priceQuotationController/pageList?key=&order=> (referer: None)
2023-04-04 10:11:42 [scrapy.extensions.logstats] INFO: Crawled 1 pages (at 1 pages/min), scraped 0 items (at 0 items/min)
scrapy spider:
import json
from datetime import datetime

from scrapy import Request, Spider

from website.models import PFSC_Price  # Django model; import path assumed


class PFSCSpider(Spider):  # class name assumed; not shown here
    name = 'pfsc_price'  # assumed

    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self.total_prices = 1
        self.url = 'http://pfsc.agri.cn/api/priceQuotationController/pageList?key=&order='
        self.date = datetime.today().strftime('%Y-%m-%d')
        print('today is ' + self.date)
        # Today's already-saved prices (ids dropped), presumably for later de-duplication.
        date_list = PFSC_Price.objects.filter(reportTime=self.date)
        self.date_list = list(date_list.values())
        for data in self.date_list:
            del data['id']
        self.max_length = 900
        self.i = 0

    def start_requests(self):
        yield Request(
            url=self.url,
            method='POST',
            # JSON body built by string concatenation; json.dumps(...) would be cleaner.
            body='{"pageNum":1,"pageSize":' + f'{self.total_prices}' + ',"marketId":"","provinceCode":"","pid":"","varietyId":""}',
            callback=self.parse,
        )

    def parse(self, response, **kwargs):
        print('in parse')
        json_loads = json.loads(response.text)
requirements.txt
amqp==5.1.1
asgiref==3.6.0
async-timeout==4.0.2
attrs==22.2.0
autobahn==23.1.2
Automat==22.10.0
beautifulsoup4==4.11.1
billiard==3.6.4.0
celery==5.2.7
certifi==2022.9.24
cffi==1.15.1
channels==4.0.0
channels-redis==4.0.0
charset-normalizer==2.1.1
click==8.1.3
click-didyoumean==0.3.0
click-plugins==1.1.1
click-repl==0.2.0
constantly==15.1.0
cryptography==39.0.0
cssselect==1.2.0
daphne==3.0.2
Django==4.1.5
django-extensions==3.2.1
et-xmlfile==1.1.0
filelock==3.9.0
hyperlink==21.0.0
idna==3.4
incremental==22.10.0
itemadapter==0.7.0
itemloaders==1.0.6
jmespath==1.0.1
kombu==5.2.4
lxml==4.9.2
msgpack==1.0.5
mysql-connector-python==8.0.31
mysqlclient==2.1.1
numpy==1.24.1
openpyxl==3.0.10
packaging==22.0
pandas==1.5.2
parsel==1.7.0
Pillow==9.4.0
prompt-toolkit==3.0.38
Protego==0.2.1
protobuf==3.20.1
pyasn1==0.4.8
pyasn1-modules==0.2.8
pycparser==2.21
PyDispatcher==2.0.6
PyMySQL==1.0.2
pyOpenSSL==23.0.0
python-dateutil==2.8.2
pytz==2022.6
queuelib==1.6.2
redis==4.5.1
requests==2.28.1
requests-file==1.5.1
Scrapy==2.7.1
scrapy-djangoitem==1.1.1
sentry-sdk==1.12.1
service-identity==21.1.0
six==1.16.0
soupsieve==2.3.2.post1
sqlparse==0.4.3
tldextract==3.4.0
Twisted==22.10.0
txaio==23.1.1
typing_extensions==4.4.0
tzdata==2022.7
urllib3==1.26.13
vine==5.0.0
w3lib==2.1.1
wcwidth==0.2.6
zope.interface==5.5.2
In channels 4.0.0:
Front end:
chatRoom.js:46 WebSocket connection to 'ws://127.0.0.1:8000/ws/chat/1111/' failed:
Backend:
Not Found: /ws/chat/1111/
[04/Apr/2023 10:44:49] "GET /ws/chat/1111/ HTTP/1.1" 404 5746
[04/Apr/2023 10:44:49,600] - Broken pipe from ('127.0.0.1', 44486)
asgi.py
import os

from channels.routing import ProtocolTypeRouter, URLRouter
from channels.sessions import SessionMiddlewareStack
from django.core.asgi import get_asgi_application

os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'everything.settings')
os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'core.settings')  # no-op: the key is already set by the line above

import website.routing  # imported after the settings module is configured

application = ProtocolTypeRouter({
    'http': get_asgi_application(),
    'websocket': SessionMiddlewareStack(
        URLRouter(
            website.routing.websocket_urlpatterns
        )
    ),
})
routing.py
websocket_urlpatterns = [ re_path(r'ws/chat/(?P<room_name>\w+)/$', consumers.ChatConsumer.as_asgi()), ]
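consumers.py is not shown in the question; for context, a minimal consumer that this routing could point at might look like the sketch below (all behaviour here is assumed, it simply echoes messages back):

# website/consumers.py (sketch; not shown in the question)
import json

from channels.generic.websocket import AsyncWebsocketConsumer


class ChatConsumer(AsyncWebsocketConsumer):
    async def connect(self):
        # room_name is captured by the URL pattern in routing.py
        self.room_name = self.scope['url_route']['kwargs']['room_name']
        await self.accept()

    async def receive(self, text_data=None, bytes_data=None):
        # Echo the incoming message back to the client.
        data = json.loads(text_data)
        await self.send(text_data=json.dumps({'message': data.get('message', '')}))

    async def disconnect(self, close_code):
        pass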
2 Answers
I found the answer: channels 4.0.0 could not find the ws route, and I added 'daphne' as the first entry of INSTALLED_APPS in the Django settings. Fortunately the chat room runs now, but unfortunately Scrapy still can't.
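A minimal sketch of what that settings change looks like; the apps listed besides 'daphne' and 'channels' are placeholders, and the ASGI_APPLICATION path assumes the project package is everything, as in the asgi.py above:

# settings.py (sketch; surrounding apps are placeholders)
INSTALLED_APPS = [
    'daphne',  # channels 4.x: must come before django.contrib.staticfiles
    'django.contrib.admin',
    'django.contrib.auth',
    'django.contrib.contenttypes',
    'django.contrib.sessions',
    'django.contrib.messages',
    'django.contrib.staticfiles',
    'channels',
    'website',  # app containing consumers.py and routing.py (name assumed)
]

ASGI_APPLICATION = 'everything.asgi.application'  # project package assumed from asgi.py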
I found a workaround, although I still don't understand why there is a conflict. Since I use PyCharm, I gave Scrapy its own virtual environment. It doesn't really solve the problem, but it runs, which counts as a small victory.
I use channels 4.0 and Scrapy, and I had the same problem. I still don't know how to make Scrapy run while daphne is used as the ASGI server in channels 4.0.
My solution: remove 'daphne' from INSTALLED_APPS and run Django under uvicorn instead of the runserver command in my development environment; after that everything works.
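A sketch of that setup, assuming uvicorn is installed (it is not in the requirements.txt above; pip install "uvicorn[standard]") and that the project package is everything, as in the asgi.py snippet:

# run_asgi.py (sketch): serve the ASGI app with uvicorn instead of runserver
import os

import uvicorn

if __name__ == '__main__':
    os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'everything.settings')
    # Equivalent to: uvicorn everything.asgi:application --host 127.0.0.1 --port 8000
    uvicorn.run('everything.asgi:application', host='127.0.0.1', port=8000)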