
I have an HTML string containing a <script> tag whose JavaScript creates a custom element with a shadow DOM via window.customElements.define(…). The class in turn sets innerHTML, which defines the custom element’s HTML as a string.

This is valid HTML, which I’m attempting to process using PHP’s DOMDocument. However, DOMDocument appears to be confused by the content of the innerHTML string and starts treating its content as nodes it needs to process.

Is there any way to work around this so it no longer confuses DOMDocument?

The pertinent part of the HTML looks somewhat like this:

<script>
class ExampleElement extends HTMLElement {
  constructor() {
    super();
    this.attachShadow({ mode: 'open' })
      .innerHTML = '<label>this is what confuses DOMDocument</label>';
  }
}
window.customElements.define('example-element', ExampleElement);
</script>

This is then processed in PHP like this:

$doc = new DOMDocument();
$doc->loadHTML($html, LIBXML_HTML_NODEFDTD | LIBXML_HTML_NOIMPLIED);

libxml then generates an error about the </label> not matching: "Unexpected end tag : label in Entity"

Obviously I can either
– break up the innerHTML string using string concatenation, so that DOMDocument no longer identifies the <label> and </label> as tags
or
– build the element’s content via document.createElement(…) etc.

However, since this is valid HTML, it would be useful to know whether it can be parsed as it stands.
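
For reference, the string-concatenation workaround might look like the sketch below. `shadowMarkup` is a hypothetical helper name used only for illustration. The point is that the script source never contains the literal two-character sequence that opens an end tag, while the string produced at runtime is unchanged.

```javascript
// Hypothetical helper (not from the original page): builds the markup
// without the end-tag opener ever appearing literally in the source.
function shadowMarkup(text) {
  // '<' and '/label>' are separate literals, joined only at runtime.
  return '<label>' + text + '<' + '/label>';
}

const markup = shadowMarkup('this is what confuses DOMDocument');
console.log(markup);
```

In the custom element, the result would be assigned to the shadow root's innerHTML exactly as before.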

2 Answers


  1. You can use the following code to parse HTML containing the JavaScript with PHP’s DOMDocument. It silences the libxml warnings; it does not change how the script content is parsed.

    // Nowdoc, so the single quotes inside the JavaScript need no escaping
    $html = <<<'HTML'
    <script>
    class ExampleElement extends HTMLElement {
      constructor() {
        super();
        this.attachShadow({ mode: 'open' })
          .innerHTML = '<label>this is what confuses DOMDocument</label>';
      }
    }
    window.customElements.define('example-element', ExampleElement);
    </script>
    HTML;

    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // Collect libxml errors instead of emitting warnings
    $doc->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
    libxml_clear_errors(); // Discard the collected parse errors
    libxml_use_internal_errors(false); // Restore default libxml error reporting

    // Continue processing the DOMDocument object as required

  2. Per: https://bugs.php.net/bug.php?id=80095

    libxml uses HTML 4 rules, which say that </ starts an ending tag, even if the tag doesn’t match the last opening tag. To avoid this problem, write the ending tags in your script as "<\/".

    So change </label> to <\/label>.

    It will parse cleanly, and JS should interpret \/ as a literal / in the string.
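
A quick way to check that the escaped form is harmless on the JavaScript side is the sketch below. The variable names are illustrative only. In a string literal, a backslash before a character with no special escape meaning is simply dropped, so the escaped and unescaped literals should produce the same string.

```javascript
// The backslash before "/" is an identity escape: it yields a plain "/".
const escaped = '<label>this is what confuses DOMDocument<\/label>';
const plain = '<label>this is what confuses DOMDocument</label>';

// Both literals describe the same runtime string.
const identical = escaped === plain;
console.log(identical);
```

Only the escaped literal is safe to embed in a <script> block that libxml will parse; the unescaped one appears here purely for comparison.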
