
I’m trying to scrape specific text strings from the HTML below and get them as separate outputs, e.g.:

let text = "That's the first text I need";
let text2 = "The second text I need";
let text3 = "The third text I need";

I really don’t know how to get text that’s separated by different HTML tags.

<p>
   <span class="hidden-text"><span class="ft-semi">Count:</span>31<br></span>
   <span class="ft-semi">Something:</span> That's the first text I need
   <span class="hidden-text"><span class="ft-semi">Something2:</span> </span>The second text I need
   <br><span class="ft-semi">Something3:</span> The third text I need
</p>

2 Answers


  1. Try something like this and see if it works:

    const html = `your sample html above`;
    
    // Parse the markup, then select every text node that is not inside a <span>
    const domdoc = new DOMParser().parseFromString(html, "text/html");
    const result = domdoc.evaluate('//text()[not(ancestor::span)]', domdoc, null, XPathResult.ORDERED_NODE_SNAPSHOT_TYPE, null);
    
    for (let i = 0; i < result.snapshotLength; i++) {
      const target = result.snapshotItem(i).textContent.trim();
      // Skip whitespace-only text nodes
      if (target.length > 0) {
        console.log(target);
      }
    }
    

    Using your sample html, the output should be:

    "That's the first text I need"
    "The second text I need"
    "The third text I need"
    
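The XPath loop above only prints the strings, while the question asks for them as separate values. A minimal sketch of one way to do that, assuming the same XPath expression: `snapshotTexts` is a hypothetical helper name (not part of the answer) that collects the non-empty strings into an array, which can then be destructured. The browser part is guarded because `DOMParser` and `XPathResult` are not available in plain Node.js.

```javascript
// Hypothetical helper (not part of the answer above): collect the trimmed,
// non-empty strings from an XPath snapshot result into an array.
function snapshotTexts(snapshot) {
  const texts = [];
  for (let i = 0; i < snapshot.snapshotLength; i++) {
    const value = snapshot.snapshotItem(i).textContent.trim();
    if (value.length > 0) {
      texts.push(value);
    }
  }
  return texts;
}

// Browser-only usage; skipped where DOMParser is unavailable (e.g. plain Node.js).
if (typeof DOMParser !== "undefined") {
  const html = `<p>
    <span class="hidden-text"><span class="ft-semi">Count:</span>31<br></span>
    <span class="ft-semi">Something:</span> That's the first text I need
    <span class="hidden-text"><span class="ft-semi">Something2:</span> </span>The second text I need
    <br><span class="ft-semi">Something3:</span> The third text I need
  </p>`;
  const doc = new DOMParser().parseFromString(html, "text/html");
  const snapshot = doc.evaluate(
    "//text()[not(ancestor::span)]",
    doc,
    null,
    XPathResult.ORDERED_NODE_SNAPSHOT_TYPE,
    null
  );
  // Array destructuring assigns the strings to separate variables in document order.
  const [text, text2, text3] = snapshotTexts(snapshot);
  console.log(text, text2, text3);
}
```

Destructuring relies on the strings always appearing in the same order; if the markup can vary, keeping the array and checking its length first is probably safer.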
  2. You can iterate over the child nodes of the <p> and keep those whose nodeType === Node.TEXT_NODE and whose content is non-empty:

    for (const e of document.querySelector("p").childNodes) {
      if (e.nodeType === Node.TEXT_NODE && e.textContent.trim()) {
        console.log(e.textContent.trim());
      }
    }
    
    // or to make an array:
    const result = [...document.querySelector("p").childNodes]
      .filter(e =>
        e.nodeType === Node.TEXT_NODE && e.textContent.trim()
      )
      .map(e => e.textContent.trim());
    console.log(result);

    The snippet above runs against this HTML:

    <p>
      <span class="hidden-text">
        <span class="ft-semi">Count:</span>
        31
        <br>
      </span>
      <span class="ft-semi">Something:</span>
      That's the first text I need
      <span class="hidden-text">
        <span class="ft-semi">Something2:</span>
      </span>
      The second text I need
      <br>
      <span class="ft-semi">Something3:</span>
      The third text I need
    </p>
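To illustrate why looking only at direct children skips the nested "31", here is a small sketch of the same filter as a reusable function. `directTextChunks` is a hypothetical helper name, and the plain-object tree built with `el` and `txt` is only a stand-in for the sample `<p>` so the example also runs outside a browser; it is an assumption, not real DOM.

```javascript
// Numeric value of Node.TEXT_NODE, written out so the helper also runs where
// the global Node constructor is not defined.
const TEXT_NODE = 3;

// Hypothetical helper: return the trimmed, non-empty text of the text nodes
// that are direct children of `parent`. Text nested in child elements is skipped.
function directTextChunks(parent) {
  return Array.from(parent.childNodes)
    .filter((node) => node.nodeType === TEXT_NODE)
    .map((node) => node.textContent.trim())
    .filter((s) => s.length > 0);
}

// Plain-object stand-ins for element nodes (nodeType 1) and text nodes (nodeType 3).
const el = (...childNodes) => ({ nodeType: 1, childNodes });
const txt = (textContent) => ({ nodeType: 3, textContent });

// Mirrors the structure of the sample <p>.
const sampleP = el(
  txt("\n  "),
  el(el(txt("Count:")), txt("31"), el()), // span.hidden-text containing "31" and <br>
  txt("\n  "),
  el(txt("Something:")),
  txt(" That's the first text I need\n  "),
  el(el(txt("Something2:")), txt(" ")), // second span.hidden-text
  txt("The second text I need\n  "),
  el(), // <br>
  el(txt("Something3:")),
  txt(" The third text I need\n")
);

const [text, text2, text3] = directTextChunks(sampleP);
console.log(text);  // That's the first text I need
console.log(text2); // The second text I need
console.log(text3); // The third text I need
```

In a browser, the same function should accept a real element, e.g. `directTextChunks(document.querySelector("p"))`.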

    In Cheerio:

    const cheerio = require("cheerio"); // 1.0.0-rc.12
    
    const html = `
    <p>
      <span class="hidden-text">
        <span class="ft-semi">Count:</span>
        31
        <br>
      </span>
      <span class="ft-semi">Something:</span>
      That's the first text I need
      <span class="hidden-text">
        <span class="ft-semi">Something2:</span>
      </span>
      The second text I need
      <br>
      <span class="ft-semi">Something3:</span>
      The third text I need
    </p>
    `;
    
    const $ = cheerio.load(html);
    const result = [...$("p").contents()]
      .filter(e => e.type === "text" && $(e).text().trim())
      .map(e => $(e).text().trim());
    
    console.log(result);
    