
I’m struggling to load an HTML fragment with Cheerio.

var htmlString = '<div class="artist"><i class="user blue circle icon"></i> Skyy</div>';
var $ = cheerio.load(htmlString);
console.info($.html());

This outputs:

<html><head></head><body><div class="artist"><i class="user blue circle icon"></i> Skyy</div></body></html>

My problem is, I think, that Cheerio wraps my content within an HTML document, which makes it difficult to access the node directly.

I could use this selector, which works fine in this case:

var el = $('body').children().first();

But it doesn’t always work. For instance,

var htmlString = '<meta name="description" content="My description">';
var $ = cheerio.load(htmlString);
console.info($.html());

This outputs a different kind of document, where var el = $('body').children().first(); does not work because the <meta> element ends up in the <head>:

<html><head><meta name="description" content="My description"></head><body></body></html>

So, is there a way to load an HTML fragment and to access it as a Cheerio element without using a selector?

I want to be able to use the Cheerio functions like .text(), .html() or .attr(), on the populated node.

3 Answers


  1. Chosen as BEST ANSWER

    I found a solution.

    By loading a blank document, I can append my HTML string to it manually, so I'm sure it will end up in the <body/>, even if it's a meta element that Cheerio would normally place in the <head/>.

    var $ = cheerio.load('');
    $('body').append(htmlString);
    var el = $('body').children().first();
    

  2. Cheerio has an option to disable wrapping your HTML in other tags: the third argument of cheerio.load. (The second argument takes an object containing additional options; we can set it to null.)

    const $ = cheerio.load(htmlString, null, false)
    console.log($.html()) // <-- just the html, not wrapped
    

    You can view the source for more info.

  3. This answer offers a great start, but it doesn’t show a consistent way to access the tag.

    $(":root").first() seems like a good way to extract the first tag. I haven't tested it extensively, but it looks promising on a spot check.

    const cheerio = require("cheerio"); // 1.0.0-rc.12
    const {strict: assert} = require("node:assert");
    
    const loadOneTag = html => {
      const $ = cheerio.load(html, null, /*isDocument=*/false);
      assert.equal($.html(), html);
      return $(":root").first();
    };
    
    {
      // one top-level child
      const html = `
        <div class="artist"><i class="user blue circle icon"></i> Skyy</div>
      `;
      assert.equal(loadOneTag(html).attr("class"), "artist");
    }
    
    {
      // multiple top-level children
      const html = `
        <div class="artist"><i class="user blue circle icon"></i> Skyy</div>
        <p>asdf</p>
      `;
      assert.equal(loadOneTag(html).attr("class"), "artist");
    }
    
    {
      // meta tag
      const html = `
        <meta name="description" content="My description">
      `;
      assert.equal(loadOneTag(html).attr("name"), "description");
    }
    