scrapy Questions

Visual Studio Code – How to debug Scrapy in VS Code?

August 8, 2024
Donko
2 Answers

I can't debug Scrapy crawlers in VS Code: whenever I start debugging, it breaks on one of my imports. Of course, I played a lot with that import in order to…

Html – Selecting last-child's text with Scrapy

April 10, 2024
astrochicken
2 Answers

How do I extract the text from the last <li> in the following snippet? (Černošice.)
<footer class="SearchResultCard__footer">
  <ul class="SearchResultCard__footerList">
    <li class="SearchResultCard__footerItem">
      <svg xmlns="http://www.w3.org/2000/svg" width="16" height="16" viewBox="0 0 24 24" fill="none" id="7c37b661a1f4030a0673d3e5cb419678" aria-hidden="true">
        <path fill-rule="evenodd" clip-rule="evenodd" d="M6.16146 2H9.83854C10.3657 1.99998 10.8205 1.99997…

Html – Web scraping with Scrapy and Python from one script and a javascript website

February 11, 2024
SpicyyRiice
2 Answers

Hi, I'm trying to scrape this website with Scrapy: https://www.vaniercollege.qc.ca/sports-recreation/weekly-schedule/ using the script below.
script.py
    import scrapy
    from scrapy.crawler import CrawlerProcess
    from threading import Thread
    class CourtSpider(scrapy.Spider):
        name = 'full_page'
        allowed_domains = ['vaniercollege.qc.ca']
        start_urls = ['https://www.vaniercollege.qc.ca/sports-recreation/weekly-schedule/']
        def parse(self, response):…

Unwanted newline characters in JSON while web scraping

January 19, 2024
esyilmaz
2 Answers

I want to extract info from this website using Scrapy. But the info I need is in a JSON file, and this JSON file has unwanted literal newline characters in only the description section. Here is an example page and…
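For readers hitting the same symptom: a raw newline inside a JSON string is a control character, which Python's `json` module rejects by default. A minimal sketch of one possible workaround follows, assuming the payload is otherwise well-formed. The `raw` string and its field names are made up for illustration and are not taken from the question's site.

```python
import json

# Hypothetical payload: the "description" value contains a raw (unescaped)
# newline, which strict JSON parsing does not allow inside a string.
raw = '{"title": "Example", "description": "first line\nsecond line"}'

# Default (strict) parsing rejects control characters inside strings.
try:
    json.loads(raw)
    strict_failed = False
except json.JSONDecodeError:
    strict_failed = True

# strict=False tells the decoder to accept control characters in strings.
data = json.loads(raw, strict=False)

# Collapse newlines and any other whitespace runs into single spaces.
cleaned = " ".join(data["description"].split())
```

In a spider, the same call could be applied to `response.text` before reading the description field; whether that fits depends on how the site actually serves its JSON.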

Ubuntu – ScrapyRT Port Unreachable in Kubernetes Docker Container Pod

January 15, 2024
rom
2 Answers

I'm experiencing difficulties in accessing a ScrapyRT service running on specific ports within a Kubernetes pod. My setup includes a Kubernetes cluster with a pod running a Scrapy application, which uses ScrapyRT to listen for incoming requests on designated ports.…

Html – extract hidden links from web page

December 31, 2023
mohamed sultan
2 Answers

Please check this link: https://maroof.sa/businesses. It is the website from which I want to extract links. For example, if you scroll down you will find a store named "Marwa store". If you click on this card, this…

Html – CSS Notation for a Scrapy Spider Script

December 20, 2023
Teron
2 Answers

I wrote the Python script below to return the item name, price, and link for items listed on https://shop.doverstreetmarket.com/collections/shops-noah
    import scrapy
    class DSMUKSpider(scrapy.Spider):
        name = 'dsmuk'
        start_urls = ['https://shop.doverstreetmarket.com/collections/shops-noah']
        def parse(self, response):
            for dsmuk_product in response.css('article.h-full'):
                try:
                    yield { 'name':…

Html – How to select specific class with Scrapy

October 17, 2023
user2502913
2 Answers

I am trying to scrape a page that contains specific info. The URL: https://www.artisans-du-batiment.com/trouver-un-artisan-qualifie/?job=Charpentier&place=35000%2F35900 I want to select a class for each carpenter, so I tried response.css('div.a-artisanTease to-animate'), but it returns no selection. What might be the problem? Thanks. I've tried…

Ubuntu – Scrapy Spider not populating CSV and terminates early

August 28, 2023
J-K Equipment
2 Answers

We have a vendor that sells many products that we want to include. Unfortunately, they do not have an API set up to retrieve information. They do, however, give us product lists that show just the SKU of an item. I…

Json – Shopee API to get products data doesn't seem to work anymore (it worked before)

August 21, 2023
Ice Bear
2 Answers

Here's a simple Scrapy spider that anyone can use for testing.
    from scrapy.utils.response import open_in_browser
    import scrapy
    import json
    class TestSpider(scrapy.Spider):
        name = "test-spider"
        allowed_domains = ["shopee.ph"]
        shopee_cookies = '[{"name": "csrftoken", "value": "RvxBdTixvBfdTR3xfQwbcYippqz8jEbF", "domain": "shopee.ph", "path": "/", "expires": -1, "httpOnly":…