I am trying to extract the img src from the following XML tag inside an item.
I am calling cheerio.load on my response data like so:
const $ = cheerio.load(response.data, { xmlMode: true });
$("item").each((i, item) => {
Inside each item I come across this specific tag, and I want to extract the img src from it:
<figure class="wp-block-image size-large">
<img decoding="async" loading="lazy" width="800" height="572" src="http://wmcmuaythai.org/wp-content/uploads/2023/04/WhatsApp-Image-2023-04-07-at-3.18.13-PM-2-800x572.jpeg" alt="" class="wp-image-43535" srcset="http://wmcmuaythai.org/wp-content/uploads/2023/04/WhatsApp-Image-2023-04-07-at-3.18.13-PM-2-800x572.jpeg 800w, http://wmcmuaythai.org/wp-content/uploads/2023/04/WhatsApp-Image-2023-04-07-at-3.18.13-PM-2-350x250.jpeg 350w, http://wmcmuaythai.org/wp-content/uploads/2023/04/WhatsApp-Image-2023-04-07-at-3.18.13-PM-2-768x549.jpeg 768w, http://wmcmuaythai.org/wp-content/uploads/2023/04/WhatsApp-Image-2023-04-07-at-3.18.13-PM-2.jpeg 1024w" sizes="(max-width: 800px) 100vw, 800px" />
</figure>
I have tried the following cheerio queries, but each one returns either undefined or something other than what I want:
$(item).find("figure").find("img").attr("src")
$(item).find("img").attr("src")
$(item).find("figure").children().find("img").attr("src")
$(item).find("figure").first().find("img").attr("src")
This is the RSS feed from which I am trying to extract the figure.
2 Answers
You can use the $("img", item) selector to find the img tag within the item element, and then call .attr("src") on the result to read its src attribute.
I’m not too familiar with XML but the tags you want look like they’re inside CDATA. I’ve had success in the past by loading the CDATA text into Cheerio, then traversing that inner structure.
I also don’t know how to select content:encoded (the elements containing the CDATA), since Cheerio thinks : is a pseudoselector rather than part of the tag name, so the following approach is a bit crude.
As you can see, this picks up some duplicate images, so you may wish to refine the selectors a bit further or unflatten the map to maintain the groupings, depending on whatever your expected result is.