
Using Cheerio, how can I grab two separate pieces of text that follow an HTML element and are not wrapped in an element of their own?
What I want to grab is from:

<div>
   <time>
   <svg>...<svg/>
   "first string I want to grab"
    <svg>...<svg/>
   "second string I want to grab"
   </time>
</div>
$(item).find('div').find('time').find('svg:nth-of-type(2)').text();
const result = [...$(item).find('header').find('div').find('span:nth-of-type(1)').find('time').childNodes]
    .filter(e =>
        e.nodeType === Node.TEXT_NODE && e.textContent.trim()
    )
    .map(e => e.textContent.trim());

2 Answers


  1. Your example isn’t reproducible, but if you fix your selectors and/or use correct closing tags, </svg> rather than <svg/>, this answer should work out of the box:

    const cheerio = require("cheerio"); // 1.0.0-rc.12
    
    const html = `<div>
      <time>
        <svg>...</svg>
        "first string I want to grab"
        <svg>...</svg>
        "second string I want to grab"
      </time>
    </div>`;
    
    const $ = cheerio.load(html);
    const result = [...$("div time").contents()]
      .filter(e => e.type === "text" && $(e).text().trim())
      .map(e => $(e).text().trim());
    console.log(result);
    

    Output:

    [ '"first string I want to grab"', '"second string I want to grab"' ]
    

    As mentioned in the comments, CSS already handles descendants, so you can use

    .find("header div span:nth-of-type(1) time")
    

    rather than

    .find('header').find('div').find('span:nth-of-type(1)').find('time')
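
    As a rough check that the two forms select the same node, here is a minimal sketch. The `<header>` and `<span>` wrappers are assumptions made up for illustration, since the question's snippet doesn't include them:

    ```javascript
    const cheerio = require("cheerio");

    // Hypothetical markup: the header/span wrappers are assumed, not taken from the question.
    const html = `<header><div>
      <span><time><svg>...</svg>first<svg>...</svg>second</time></span>
      <span>other</span>
    </div></header>`;

    const $ = cheerio.load(html);

    // Chained .find() calls, as in the question.
    const chained = $("body").find("header").find("div").find("span:nth-of-type(1)").find("time");
    // One selector using descendant combinators.
    const combined = $("body").find("header div span:nth-of-type(1) time");

    // Both selections should hold the same single <time> node.
    const sameNode = chained.length === 1 && combined.length === 1 && chained[0] === combined[0];
    console.log(sameNode);
    ```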
    

    If this doesn’t work, please share the actual site or full HTML structure you’re working with. In addition to the <svg/> typo, there is no <span> in your snippet.

    It’s surprising there are no class names here. Usually, classes, attributes and ids are more reliable than nth tag selectors. Instead of retyping an incorrect excerpt, it’s better to provide the actual HTML, copy-pasted to preserve syntax and attributes.

    Note that Cheerio only works on static HTML. If the site uses JavaScript to create these elements, that might explain why you can’t find them if you’re pulling down the page with fetch or axios. Ensure the elements are visible in the view-source: version of the site, since the dev tools element inspector might be misleading. If they’re not in the static HTML, consider using Playwright rather than fetch/cheerio to scrape them.
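
    One way to run that check from code, as a sketch only: `existsInStaticHtml` is a hypothetical helper, and the two HTML strings stand in for responses you would normally get from fetch or axios.

    ```javascript
    const cheerio = require("cheerio");

    // Hypothetical helper: reports whether a selector matches anything in raw HTML.
    function existsInStaticHtml(html, selector) {
      return cheerio.load(html)(selector).length > 0;
    }

    // Simulated responses; in practice the HTML would come from fetch or axios.
    const serverRendered = `<div><time>text</time></div>`;
    const clientRendered = `<div id="app"></div><script>/* builds <time> at runtime */</script>`;

    const foundInServerRendered = existsInStaticHtml(serverRendered, "div time");
    const foundInClientRendered = existsInStaticHtml(clientRendered, "div time");

    console.log(foundInServerRendered); // true
    console.log(foundInClientRendered); // false
    ```

    If the second case matches your page, the elements are being built in the browser and Cheerio will never see them.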

  2. You have to work with the underlying DOM nodes to reach text nodes, for example through each node's nextSibling property:

    $('svg').get().map(svg => $(svg.nextSibling).text().trim())
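
    A slightly more defensive sketch of the same idea, run against the corrected HTML from the question. It assumes the sibling nodes expose `type` and `data`, as Cheerio's default DOM nodes do, and it skips any `<svg>` that is not directly followed by text:

    ```javascript
    const cheerio = require("cheerio");

    const html = `<div>
      <time>
        <svg>...</svg>
        "first string I want to grab"
        <svg>...</svg>
        "second string I want to grab"
      </time>
    </div>`;

    const $ = cheerio.load(html);

    const result = $("time svg")
      .get()
      .map(svg => svg.nextSibling)
      // Keep only text nodes; an <svg> followed by another element or by nothing is skipped.
      .filter(node => node && node.type === "text")
      .map(node => node.data.trim())
      // Drop whitespace-only text nodes.
      .filter(Boolean);

    console.log(result);
    ```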
    