
I am trying to scrape a page that contains specific information.
The URL: https://www.artisans-du-batiment.com/trouver-un-artisan-qualifie/?job=Charpentier&place=35000%2F35900
I want to select the element for each carpenter, so I tried response.css('div.a-artisanTease to-animate'), but it returns an empty selection. What might be the problem?

Thanks.

I've tried several different paths.
I need Scrapy to select each individual carpenter on the page, so that I can later collect information for all search results.

2 Answers


  1. The reason you can't retrieve the data with Scrapy is that this page's content is rendered with JavaScript. Scrapy on its own does not execute JavaScript, so it cannot help you here. You need a tool that can handle JavaScript, such as Selenium or Splash, to retrieve data from this page.

    I also recommend using XPath selectors rather than CSS selectors, since XPath offers many useful options for searching text in the DOM. The XPath equivalent of your selector would be:

    //div[@class='a-artisanTease to-animate']

  2. The actual reason is that the server requires a specific cookie before it serves the full HTML for the page. Your CSS selector expression is also wrong: a space in a CSS selector means "descendant", so your expression looks for a <to-animate> element inside the div rather than a div with both classes.

    The cookie needed is "tarteaucitron=!googletagmanager=wait" and the correct CSS expression is div.a-artisanTease.to-animate

    For example, using scrapy shell:

    In [1]: fetch(scrapy.Request("https://www.artisans-du-batiment.com/trouver-un-artisan-qualifie/?job=Charpentier&place=35000%2F35900", headers={"cookies": "tarteaucitron=!googletagmanager=wait"}))
    2023-10-17 00:43:16 [scrapy.core.engine] INFO: Spider opened
    2023-10-17 00:43:17 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.artisans-du-batiment.com/trouver-un-artisan-qualifie/?job=Charpentier&place=35000%2F35900> (referer: None)
    
    In [2]: response.css("div.a-artisanTease.to-animate")
    Out[2]:
    [<Selector query="descendant-or-self::div[(@class and contains(concat(' ', normalize-space(@class), ' '), ' a-artisanTease ')) and (@class and contains(concat(' ', normalize-space(@class), ' '), ' to-animate '))]" data='<div class="a-artisanTease to-animate...'>,
     <Selector query="descendant-or-self::div[(@class and contains(concat(' ', normalize-space(@class), ' '), ' a-artisanTease ')) and (@class and contains(concat(' ', normalize-space(@class), ' '), ' to-animate '))]" data='<div class="a-artisanTease to-animate...'>,
     <Selector query="descendant-or-self::div[(@class and contains(concat(' ', normalize-space(@class), ' '), ' a-artisanTease ')) and (@class and contains(concat(' ', normalize-space(@class), ' '), ' to-animate '))]" data='<div class="a-artisanTease to-animate...'>,
     <Selector query="descendant-or-self::div[(@class and contains(concat(' ', normalize-space(@class), ' '), ' a-artisanTease ')) and (@class and contains(concat(' ', normalize-space(@class), ' '), ' to-animate '))]" data='<div class="a-artisanTease to-animate...'>,
     <Selector query="descendant-or-self::div[(@class and contains(concat(' ', normalize-space(@class), ' '), ' a-artisanTease ')) and (@class and contains(concat(' ', normalize-space(@class), ' '), ' to-animate '))]" data='<div class="a-artisanTease to-animate...'>,
     <Selector query="descendant-or-self::div[(@class and contains(concat(' ', normalize-space(@class), ' '), ' a-artisanTease ')) and (@class and contains(concat(' ', normalize-space(@class), ' '), ' to-animate '))]" data='<div class="a-artisanTease to-animate...'>,
     <Selector query="descendant-or-self::div[(@class and contains(concat(' ', normalize-space(@class), ' '), ' a-artisanTease ')) and (@class and contains(concat(' ', normalize-space(@class), ' '), ' to-animate '))]" data='<div class="a-artisanTease to-animate...'>,
     <Selector query="descendant-or-self::div[(@class and contains(concat(' ', normalize-space(@class), ' '), ' a-artisanTease ')) and (@class and contains(concat(' ', normalize-space(@class), ' '), ' to-animate '))]" data='<div class="a-artisanTease to-animate...'>,
     <Selector query="descendant-or-self::div[(@class and contains(concat(' ', normalize-space(@class), ' '), ' a-artisanTease ')) and (@class and contains(concat(' ', normalize-space(@class), ' '), ' to-animate '))]" data='<div class="a-artisanTease to-animate...'>,
     <Selector query="descendant-or-self::div[(@class and contains(concat(' ', normalize-space(@class), ' '), ' a-artisanTease ')) and (@class and contains(concat(' ', normalize-space(@class), ' '), ' to-animate '))]" data='<div class="a-artisanTease to-animate...'>,
     <Selector query="descendant-or-self::div[(@class and contains(concat(' ', normalize-space(@class), ' '), ' a-artisanTease ')) and (@class and contains(concat(' ', normalize-space(@class), ' '), ' to-animate '))]" data='<div class="a-artisanTease to-animate...'>,
     <Selector query="descendant-or-self::div[(@class and contains(concat(' ', normalize-space(@class), ' '), ' a-artisanTease ')) and (@class and contains(concat(' ', normalize-space(@class), ' '), ' to-animate '))]" data='<div class="a-artisanTease to-animate...'>,
     <Selector query="descendant-or-self::div[(@class and contains(concat(' ', normalize-space(@class), ' '), ' a-artisanTease ')) and (@class and contains(concat(' ', normalize-space(@class), ' '), ' to-animate '))]" data='<div class="a-artisanTease to-animate...'>,
     <Selector query="descendant-or-self::div[(@class and contains(concat(' ', normalize-space(@class), ' '), ' a-artisanTease ')) and (@class and contains(concat(' ', normalize-space(@class), ' '), ' to-animate '))]" data='<div class="a-artisanTease to-animate...'>,
     <Selector query="descendant-or-self::div[(@class and contains(concat(' ', normalize-space(@class), ' '), ' a-artisanTease ')) and (@class and contains(concat(' ', normalize-space(@class), ' '), ' to-animate '))]" data='<div class="a-artisanTease to-animate...'>,
     <Selector query="descendant-or-self::div[(@class and contains(concat(' ', normalize-space(@class), ' '), ' a-artisanTease ')) and (@class and contains(concat(' ', normalize-space(@class), ' '), ' to-animate '))]" data='<div class="a-artisanTease to-animate...'>,
     <Selector query="descendant-or-self::div[(@class and contains(concat(' ', normalize-space(@class), ' '), ' a-artisanTease ')) and (@class and contains(concat(' ', normalize-space(@class), ' '), ' to-animate '))]" data='<div class="a-artisanTease to-animate...'>,
     <Selector query="descendant-or-self::div[(@class and contains(concat(' ', normalize-space(@class), ' '), ' a-artisanTease ')) and (@class and contains(concat(' ', normalize-space(@class), ' '), ' to-animate '))]" data='<div class="a-artisanTease to-animate...'>,
     <Selector query="descendant-or-self::div[(@class and contains(concat(' ', normalize-space(@class), ' '), ' a-artisanTease ')) and (@class and contains(concat(' ', normalize-space(@class), ' '), ' to-animate '))]" data='<div class="a-artisanTease to-animate...'>,
     <Selector query="descendant-or-self::div[(@class and contains(concat(' ', normalize-space(@class), ' '), ' a-artisanTease ')) and (@class and contains(concat(' ', normalize-space(@class), ' '), ' to-animate '))]" data='<div class="a-artisanTease to-animate...'>,
     <Selector query="descendant-or-self::div[(@class and contains(concat(' ', normalize-space(@class), ' '), ' a-artisanTease ')) and (@class and contains(concat(' ', normalize-space(@class), ' '), ' to-animate '))]" data='<div class="a-artisanTease to-animate...'>,
     <Selector query="descendant-or-self::div[(@class and contains(concat(' ', normalize-space(@class), ' '), ' a-artisanTease ')) and (@class and contains(concat(' ', normalize-space(@class), ' '), ' to-animate '))]" data='<div class="a-artisanTease to-animate...'>,
     <Selector query="descendant-or-self::div[(@class and contains(concat(' ', normalize-space(@class), ' '), ' a-artisanTease ')) and (@class and contains(concat(' ', normalize-space(@class), ' '), ' to-animate '))]" data='<div class="a-artisanTease to-animate...'>,
     <Selector query="descendant-or-self::div[(@class and contains(concat(' ', normalize-space(@class), ' '), ' a-artisanTease ')) and (@class and contains(concat(' ', normalize-space(@class), ' '), ' to-animate '))]" data='<div class="a-artisanTease to-animate...'>]
    
    