
I want to read an external HTML page after that page's JS scripts have executed.

When I read the page as usual, the result doesn't include the content that the page's JS scripts add later.

How is this possible?

const xmlHttp = new XMLHttpRequest.XMLHttpRequest()
xmlHttp.open("GET", path)
xmlHttp.onreadystatechange = function () {
    if (xmlHttp.readyState === 4)
        if (xmlHttp.status === 200)
            callback(JSON.parse(xmlHttp.responseText))
}
xmlHttp.send()

2 Answers


  1. If loading the page this way works at all, you can try the following approach. Be aware that it is quite error-prone and insecure, because it executes whatever scripts the fetched page contains:

    fetch('page.html')
      .then(response => response.text())
      .then(html => {
        const container = document.getElementById('someContainer');
        container.innerHTML = html;
    
        // Extract and execute scripts
        const scripts = container.querySelectorAll('script');
        scripts.forEach(script => {
          const newScript = document.createElement('script');
          const src = script.getAttribute('src');
          if (src) {
            // Handling external script
            newScript.src = src;
            newScript.addEventListener('load', () => console.log(`Script with src ${src} loaded`));
            newScript.addEventListener('error', () => console.error(`Error loading script with src ${src}`));
          } else {
            // Handling inline script
            newScript.text = script.textContent;
          }
          document.body.appendChild(newScript);
          document.body.removeChild(newScript); // Clean up after appending to execute
        });
      })
      .catch(error => console.error('Failed to fetch page:', error));
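
    One pitfall with this approach: a relative `src` taken from the fetched page is resolved against the current document's URL, not against the location of `page.html`. A minimal sketch of resolving it first, assuming a hypothetical helper `resolveAgainstPage` and a hypothetical `pageUrl` (with `fetch`, `response.url` could supply it):

    ```javascript
    // Hypothetical helper (not part of the snippet above): resolve a script's
    // src attribute against the URL of the page it came from.
    function resolveAgainstPage(src, pageUrl) {
      // The URL constructor resolves relative references against the base URL
      // and leaves absolute URLs unchanged.
      return new URL(src, pageUrl).href;
    }

    // Example URLs are placeholders.
    console.log(resolveAgainstPage("js/app.js", "https://example.com/sub/page.html"));
    // → https://example.com/sub/js/app.js
    console.log(resolveAgainstPage("https://cdn.example.com/lib.js", "https://example.com/sub/page.html"));
    // → https://cdn.example.com/lib.js
    ```

    In the loop you would then assign `newScript.src = resolveAgainstPage(src, pageUrl)` instead of the raw attribute value.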
  2. You could create an iframe, load the URL in it, read the iframe's contents, and then remove it. Your JS code could look like this:

    var url = "https://example.com/page.html";
    var iframe = document.createElement("iframe");
    
    iframe.style.display = "none";
    iframe.src = url;
    
    document.body.appendChild(iframe);
    
    iframe.onload = function() {
        var iframeDocument = iframe.contentDocument || iframe.contentWindow.document;
        var iframeContent = iframeDocument.documentElement.outerHTML;
    
        document.body.removeChild(iframe);
        
        console.log(iframeContent);
    };
    

    For example, if this were the content of page.html:

    <html>
    <head>
    <script>
    document.addEventListener("DOMContentLoaded", function() {
        document.querySelector("body").textContent = 'test';
    });
    </script>
    </head>
    <body>
    </body>
    </html>
    

    then console.log would print this result:

    <html><head>
    <script>
    document.addEventListener("DOMContentLoaded", function() {
        document.querySelector("body").textContent = 'test';
    });
    </script>
    </head>
    <body>test</body></html>
    
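
    The iframe approach can also be wrapped in a Promise so the caller can await the result. This is only a sketch: `loadRenderedHtml`, `settleMs`, and `doc` are hypothetical names, and it assumes the page is same-origin, since the browser does not expose the document of a cross-origin iframe. The optional delay is a guess at how long scripts keep modifying the page after the load event.

    ```javascript
    // Sketch only: loadRenderedHtml, settleMs, and doc are hypothetical names.
    // Loads a same-origin page in a hidden iframe and resolves with its HTML
    // after the page's own scripts have had a chance to run.
    function loadRenderedHtml(url, { settleMs = 0, doc = globalThis.document } = {}) {
      return new Promise((resolve, reject) => {
        const iframe = doc.createElement("iframe");
        iframe.style.display = "none";

        // Register the handler before inserting the iframe.
        iframe.onload = () => {
          // Optionally wait for scripts that keep working after the load event.
          setTimeout(() => {
            try {
              const frameDoc = iframe.contentDocument; // null for cross-origin pages
              if (!frameDoc) {
                throw new Error("iframe document is not accessible (cross-origin?)");
              }
              resolve(frameDoc.documentElement.outerHTML);
            } catch (err) {
              reject(err);
            } finally {
              // Remove the iframe whether or not reading succeeded.
              doc.body.removeChild(iframe);
            }
          }, settleMs);
        };

        iframe.src = url;
        doc.body.appendChild(iframe);
      });
    }
    ```

    A call might look like `loadRenderedHtml("page.html", { settleMs: 500 }).then(html => console.log(html))`. The `doc` parameter exists only so that a stub document can be passed in for testing.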