I wrote the Python script below to return the item name, price, and link for items listed on https://shop.doverstreetmarket.com/collections/shops-noah:

    import scrapy


    class DSMUKSpider(scrapy.Spider):
        name = 'dsmuk'
        start_urls = ['https://shop.doverstreetmarket.com/collections/shops-noah']

        def parse(self, response):
            for dsmuk_product in response.css('article.h-full'):
                try:
                    yield {
                        'name': dsmuk_product.css('h2.font-display.text-xs.leading-2xs.md:text-sm.md:leading-xs.mb-2.5.a.span::text').get(),
                        'price': dsmuk_product.css('div.flex.flex-wrap.gap-x-2.uppercase.span::text').get().replace('£',''),
                        'link': dsmuk_product.css('h2.font-display.text-xs.leading-2xs.md:text-sm.md:leading-xs.mb-2.5.a').attrib['href'],
                    }
                except:
                    yield {
                    }

The desired output table is listed below:
name | price | link |
---|---|---|
Keith Haring Polaroid Long Sleeve Tee | 58 | /collections/shops-noah/products/noah-mens-noah-x-keith-haring-polaroid-l-white |
Keith Haring Leather Ornament | 48 | /collections/shops-noah/products/noah-noah-x-keith-haring-leather-or-brown |
However, running this spider with the scrapy crawl command yields a blank CSV: no headers and no cell values.
I suspect that the span element nested inside the a element that contains the item name is preventing the parse from returning the desired text; however, I'm not sure how to adjust the CSS selector to account for this, and I'd appreciate any help here. Below is a snippet of the underlying HTML I'm attempting to scrape:
<li class="col-span-6 sm:col-span-3">
<article class="
h-full flex flex-col relative
">
<div class="mb-2" style="background-color: #F5F5F5;">
<img src="//shop.doverstreetmarket.com/cdn/shop/files/T160FW23_KEITH_HARRING_POLAROID_LONG_SLEEVE_0393.jpg?v=1702847626&width=600" alt="Noah - Keith Haring Polaroid Long Sleeve Tee - (White)" srcset="//shop.doverstreetmarket.com/cdn/shop/files/T160FW23_KEITH_HARRING_POLAROID_LONG_SLEEVE_0393.jpg?v=1702847626&width=160 160w, //shop.doverstreetmarket.com/cdn/shop/files/T160FW23_KEITH_HARRING_POLAROID_LONG_SLEEVE_0393.jpg?v=1702847626&width=175 175w, //shop.doverstreetmarket.com/cdn/shop/files/T160FW23_KEITH_HARRING_POLAROID_LONG_SLEEVE_0393.jpg?v=1702847626&width=238 238w, //shop.doverstreetmarket.com/cdn/shop/files/T160FW23_KEITH_HARRING_POLAROID_LONG_SLEEVE_0393.jpg?v=1702847626&width=273 273w, //shop.doverstreetmarket.com/cdn/shop/files/T160FW23_KEITH_HARRING_POLAROID_LONG_SLEEVE_0393.jpg?v=1702847626&width=320 320w, //shop.doverstreetmarket.com/cdn/shop/files/T160FW23_KEITH_HARRING_POLAROID_LONG_SLEEVE_0393.jpg?v=1702847626&width=350 350w, //shop.doverstreetmarket.com/cdn/shop/files/T160FW23_KEITH_HARRING_POLAROID_LONG_SLEEVE_0393.jpg?v=1702847626&width=374 374w, //shop.doverstreetmarket.com/cdn/shop/files/T160FW23_KEITH_HARRING_POLAROID_LONG_SLEEVE_0393.jpg?v=1702847626&width=400 400w, //shop.doverstreetmarket.com/cdn/shop/files/T160FW23_KEITH_HARRING_POLAROID_LONG_SLEEVE_0393.jpg?v=1702847626&width=476 476w, //shop.doverstreetmarket.com/cdn/shop/files/T160FW23_KEITH_HARRING_POLAROID_LONG_SLEEVE_0393.jpg?v=1702847626&width=546 546w" width="600" height="600" class=" block w-full h-full aspect-[4/5] object-contain mix-blend-multiply" sizes="(min-width: 1360px) calc(calc(((100 - 16.2) / 100 * 1360px) - 1.5rem) / 4 - (0.5rem * 3 / 4)), (min-width: 1280px) calc(calc(((100vw - (16.2 / 100 * 100vw)) - ((1.5rem * 2) - ((1.5rem * 2) * (16.2 / 100)))) - 1.5rem) / 4 - (0.5rem * 3 / 4)), (min-width: 744px) calc(calc(100vw - (1.5rem * 2)) / 4 - (0.5rem * 3 / 4)), (min-width: 640px) calc(calc(100vw - (1rem * 2)) / 4 - (0.5rem * 3 / 4)), calc(calc(100vw - (1rem * 2)) / 2 - 
(0.5rem / 2))">
</div>
<h2 class="font-display text-xs leading-2xs md:text-sm md:leading-xs mb-2.5" data-id="7153074733318">
<a class="
before:absolute before:inset-0 before:z-10
focus:before:focus-ring focus-visible:before:focus-ring
[&:focus:not(:focus-visible)]:before:not-focus-ring focus:not-focus-ring
" href="/collections/shops-noah/products/noah-mens-noah-x-keith-haring-polaroid-l-white" @click.prevent="setHistory(7153074733318, '/collections/shops-noah/products/noah-mens-noah-x-keith-haring-polaroid-l-white')">
<span class="block uppercase">
Noah
</span>
Keith Haring Polaroid Long Sleeve Tee
</a>
</h2>
<div class="relative mt-auto">
<div class="group-hover:invisible text-xs leading-2xs md:text-sm md:leading-xs uppercase">
<div class="
flex flex-wrap gap-x-2 uppercase
">
<span class="sr-only">
Regular price
</span>
<span class="
">
£58
</span>
</div>
</div>
</div>
</article>
</li>
<li class="col-span-6 sm:col-span-3">
<article class="
h-full flex flex-col relative
">
<div class="mb-2" style="background-color: #F5F5F5;">
<img src="//shop.doverstreetmarket.com/cdn/shop/files/A217FW23_KEITH_HARRING_ORNAMENT_0969.jpg?v=1702827766&width=600" alt="Noah - Keith Haring Leather Ornament - (Brown)" srcset="//shop.doverstreetmarket.com/cdn/shop/files/A217FW23_KEITH_HARRING_ORNAMENT_0969.jpg?v=1702827766&width=160 160w, //shop.doverstreetmarket.com/cdn/shop/files/A217FW23_KEITH_HARRING_ORNAMENT_0969.jpg?v=1702827766&width=175 175w, //shop.doverstreetmarket.com/cdn/shop/files/A217FW23_KEITH_HARRING_ORNAMENT_0969.jpg?v=1702827766&width=238 238w, //shop.doverstreetmarket.com/cdn/shop/files/A217FW23_KEITH_HARRING_ORNAMENT_0969.jpg?v=1702827766&width=273 273w, //shop.doverstreetmarket.com/cdn/shop/files/A217FW23_KEITH_HARRING_ORNAMENT_0969.jpg?v=1702827766&width=320 320w, //shop.doverstreetmarket.com/cdn/shop/files/A217FW23_KEITH_HARRING_ORNAMENT_0969.jpg?v=1702827766&width=350 350w, //shop.doverstreetmarket.com/cdn/shop/files/A217FW23_KEITH_HARRING_ORNAMENT_0969.jpg?v=1702827766&width=374 374w, //shop.doverstreetmarket.com/cdn/shop/files/A217FW23_KEITH_HARRING_ORNAMENT_0969.jpg?v=1702827766&width=400 400w, //shop.doverstreetmarket.com/cdn/shop/files/A217FW23_KEITH_HARRING_ORNAMENT_0969.jpg?v=1702827766&width=476 476w, //shop.doverstreetmarket.com/cdn/shop/files/A217FW23_KEITH_HARRING_ORNAMENT_0969.jpg?v=1702827766&width=546 546w" width="600" height="600" class=" block w-full h-full aspect-[4/5] object-contain mix-blend-darken" sizes="(min-width: 1360px) calc(calc(((100 - 16.2) / 100 * 1360px) - 1.5rem) / 4 - (0.5rem * 3 / 4)), (min-width: 1280px) calc(calc(((100vw - (16.2 / 100 * 100vw)) - ((1.5rem * 2) - ((1.5rem * 2) * (16.2 / 100)))) - 1.5rem) / 4 - (0.5rem * 3 / 4)), (min-width: 744px) calc(calc(100vw - (1.5rem * 2)) / 4 - (0.5rem * 3 / 4)), (min-width: 640px) calc(calc(100vw - (1rem * 2)) / 4 - (0.5rem * 3 / 4)), calc(calc(100vw - (1rem * 2)) / 2 - (0.5rem / 2))">
</div>
<h2 class="font-display text-xs leading-2xs md:text-sm md:leading-xs mb-2.5" data-id="7153075290374">
<a class="
before:absolute before:inset-0 before:z-10
focus:before:focus-ring focus-visible:before:focus-ring
[&:focus:not(:focus-visible)]:before:not-focus-ring focus:not-focus-ring
" href="/collections/shops-noah/products/noah-noah-x-keith-haring-leather-or-brown" @click.prevent="setHistory(7153075290374, '/collections/shops-noah/products/noah-noah-x-keith-haring-leather-or-brown')">
<span class="block uppercase">
Noah
</span>
Keith Haring Leather Ornament
</a>
</h2>
<div class="relative mt-auto">
<div class="group-hover:invisible text-xs leading-2xs md:text-sm md:leading-xs uppercase">
<div class="
flex flex-wrap gap-x-2 uppercase
">
<span class="sr-only">
Regular price
</span>
<span class="
">
£48
</span>
</div>
</div>
</div>
</article>
</li>
<li class="col-span-6 sm:col-span-3">
<article class="
h-full flex flex-col relative
">
<div class="mb-2" style="background-color: #F5F5F5;">
<img src="//shop.doverstreetmarket.com/cdn/shop/files/SS158FW23_THE_CURE_HOODIE_0045.jpg?v=1702827758&width=600" alt="Noah - The Cure Men's Raglan Hoodie - (Black)" srcset="//shop.doverstreetmarket.com/cdn/shop/files/SS158FW23_THE_CURE_HOODIE_0045.jpg?v=1702827758&width=160 160w, //shop.doverstreetmarket.com/cdn/shop/files/SS158FW23_THE_CURE_HOODIE_0045.jpg?v=1702827758&width=175 175w, //shop.doverstreetmarket.com/cdn/shop/files/SS158FW23_THE_CURE_HOODIE_0045.jpg?v=1702827758&width=238 238w, //shop.doverstreetmarket.com/cdn/shop/files/SS158FW23_THE_CURE_HOODIE_0045.jpg?v=1702827758&width=273 273w, //shop.doverstreetmarket.com/cdn/shop/files/SS158FW23_THE_CURE_HOODIE_0045.jpg?v=1702827758&width=320 320w, //shop.doverstreetmarket.com/cdn/shop/files/SS158FW23_THE_CURE_HOODIE_0045.jpg?v=1702827758&width=350 350w, //shop.doverstreetmarket.com/cdn/shop/files/SS158FW23_THE_CURE_HOODIE_0045.jpg?v=1702827758&width=374 374w, //shop.doverstreetmarket.com/cdn/shop/files/SS158FW23_THE_CURE_HOODIE_0045.jpg?v=1702827758&width=400 400w, //shop.doverstreetmarket.com/cdn/shop/files/SS158FW23_THE_CURE_HOODIE_0045.jpg?v=1702827758&width=476 476w, //shop.doverstreetmarket.com/cdn/shop/files/SS158FW23_THE_CURE_HOODIE_0045.jpg?v=1702827758&width=546 546w" width="600" height="600" loading="lazy" class=" block w-full h-full aspect-[4/5] object-contain mix-blend-darken" sizes="(min-width: 1360px) calc(calc(((100 - 16.2) / 100 * 1360px) - 1.5rem) / 4 - (0.5rem * 3 / 4)), (min-width: 1280px) calc(calc(((100vw - (16.2 / 100 * 100vw)) - ((1.5rem * 2) - ((1.5rem * 2) * (16.2 / 100)))) - 1.5rem) / 4 - (0.5rem * 3 / 4)), (min-width: 744px) calc(calc(100vw - (1.5rem * 2)) / 4 - (0.5rem * 3 / 4)), (min-width: 640px) calc(calc(100vw - (1rem * 2)) / 4 - (0.5rem * 3 / 4)), calc(calc(100vw - (1rem * 2)) / 2 - (0.5rem / 2))">
</div>
<h2 class="font-display text-xs leading-2xs md:text-sm md:leading-xs mb-2.5" data-id="7153073717510">
<a class="
before:absolute before:inset-0 before:z-10
focus:before:focus-ring focus-visible:before:focus-ring
[&:focus:not(:focus-visible)]:before:not-focus-ring focus:not-focus-ring
" href="/collections/shops-noah/products/noah-mens-noah-x-the-cure-raglan-hoodie-black" @click.prevent="setHistory(7153073717510, '/collections/shops-noah/products/noah-mens-noah-x-the-cure-raglan-hoodie-black')">
<span class="block uppercase">
Noah
</span>
The Cure Men's Raglan Hoodie
</a>
</h2>
<div class="relative mt-auto">
<div class="group-hover:invisible text-xs leading-2xs md:text-sm md:leading-xs uppercase">
<div class="
flex flex-wrap gap-x-2 uppercase
">
<span class="sr-only">
Regular price
</span>
<span class="
">
£198
</span>
</div>
</div>
</div>
</article>
</li>
2 Answers
If tempted to use a pure Python approach, you can extract the JSON returned by the script:
To make the `.csv`, you can use:
A variant with pandas' `to_csv`:
Output (144 rows x 3 columns):
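As a rough illustration of the CSV step only, here is a minimal standard-library sketch. It assumes the rows have already been extracted as dicts with `name`, `price`, and `link` keys; the two sample rows are copied from the desired output in the question, and `rows_to_csv` is a hypothetical helper name, not something taken from this answer.

```python
import csv
import io

# Sample rows copied from the desired output table in the question.
rows = [
    {
        "name": "Keith Haring Polaroid Long Sleeve Tee",
        "price": "58",
        "link": "/collections/shops-noah/products/noah-mens-noah-x-keith-haring-polaroid-l-white",
    },
    {
        "name": "Keith Haring Leather Ornament",
        "price": "48",
        "link": "/collections/shops-noah/products/noah-noah-x-keith-haring-leather-or-brown",
    },
]


def rows_to_csv(rows, fieldnames=("name", "price", "link")):
    """Serialize a list of dicts to CSV text with a header row (hypothetical helper)."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


csv_text = rows_to_csv(rows)
print(csv_text)
```

To write a file instead of a string, the same `csv.DictWriter` calls can be pointed at a file opened with `open(path, "w", newline="")`.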
This should achieve what you are looking for:
Explanation:
For the name property, all we need is the inner text of every element nested inside the h2 element. For this we use the `getall()` method, which returns a list of strings, so it is necessary to concatenate the strings and then strip the whitespace.
For the price property, we need the text content of the second child span element of the div.flex, so again we use the `getall()` method, but this time we select only the item at index 1 of the resulting list and then strip the whitespace again.
For the link property, we simply need the href attribute of the a tag that is a child of the h2 element. Keep it simple.
partial log output