Getting page content via XMLHTTPRequest - Javascript

Zeus
April 13, 2023
162 views
0 votes
2 Answers

I am using Protractor for my testing. I have 100 links on a page. Rather than clicking each link and checking the page content, I am trying to get the page content via XMLHttpRequest. I am passing all the request headers that I see in the browser for that page. However, when I run the code below I get the login page, apparently because I do not have an active session. My UI tests in the browser continue to work fine, so I am not logged out there.
My questions are:

Is there a way to pass the current browser session to the XMLHttpRequest so that I am not redirected to the login page?
Is there any other way I can get the page content without having to manually open each of the pages?

Here is my code:

            var request = new XMLHttpRequest();
            let cookies = await browser.manage().getCookies();
            // Build a "name=value; name=value" string from the browser's cookies
            let cookie = cookies.map((obj) => obj.name + "=" + obj.value).join("; ");
            request.open('GET', 'https://mybaseURL.com/member/XX001', false);
            request.withCredentials = true;
            request.setRequestHeader("sec-ch-ua", '"Chromium";v="112", "Google Chrome";v="112", "Not:A-Brand";v="99"');
            request.setRequestHeader("sec-ch-ua-mobile", "?0");
            request.setRequestHeader("sec-ch-ua-platform", "Windows");
            request.setRequestHeader("Sec-Fetch-Dest", "empty");
            request.setRequestHeader("Sec-Fetch-Mode", "cors");
            request.setRequestHeader("Sec-Fetch-Site", "same-origin");
            request.setRequestHeader("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/112.0.0.0 Safari/537.36");
            request.setRequestHeader("Accept", "application/json");
            // request.setRequestHeader("Accept-Encoding", "gzip, deflate, br");
            request.setRequestHeader("Accept-Language", "en-US,en;q=0.9");
            // request.setRequestHeader("Connection", "keep-alive");
            request.setRequestHeader("Content-Type", "application/json");
            request.setRequestHeader("Cookie", cookie);
            request.send(null);
            console.log(request.responseText)

Response:

<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>302 Found</title>
</head><body>
<h1>Found</h1>
<p>The document has moved <a href="https://mybaseurl.com/signon.do?TYPE=123456&amp;REALMOID=06-123456-1234-1234-a702-3a6f0a060000&amp;GUID=&amp;SMAUTHREASON=0&amp;METHOD=GET&amp;SMAGENTNAME=$SM$vO344545Q%245454545543C%2bOm645545en5s3&amp;TARGET=$SM$https%3a%2f%2fmybaseurl%2ecom%2fmember%2fXX001">here</a>.</p>
</body></html>

Answers

Chosen as BEST ANSWER

I found a way to get the page's HTML without physically opening the page, using Fetch. It worked like a charm for me.

            await browser.executeScript(function() {
                var url = 'https://mybaseURL.com/member/XX001';
                // This function runs inside the logged-in page, so
                // credentials: 'include' makes the browser attach its own
                // session cookies. A manually set Cookie header would be
                // dropped, because Cookie is a forbidden request header name.
                var request = new Request(url, {
                  method: 'GET',
                  credentials: 'include'
                });

                return fetch(request).then(function(response) {
                    return response.text();
                });
              }).then(function(html) {
                console.log(html);
            });
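
Since the question involves around 100 links, the same idea can probably be extended to fetch several pages from one injected script. The following is only a minimal sketch, not a tested Protractor recipe: `fetchPagesInBrowser` and the `urls` array are hypothetical names, and it assumes that every URL is on the same origin as the page currently open in the browser and that the driver waits for the returned promise, as the snippet above relies on.

```javascript
// Hypothetical helper, meant to be injected with browser.executeScript.
// It is serialized and run inside the page, so it must not reference
// variables from the Node-side test scope.
function fetchPagesInBrowser(urls) {
  return Promise.all(urls.map(function (url) {
    // credentials: 'include' lets the browser send its session cookies.
    return fetch(url, { method: 'GET', credentials: 'include' })
      .then(function (response) {
        return response.text().then(function (html) {
          // Keep the status so that an unexpected redirect or error is visible.
          return { url: url, status: response.status, html: html };
        });
      });
  }));
}

// Possible usage in a spec (urls collected from the links on the page):
// const pages = await browser.executeScript(fetchPagesInBrowser, urls);
```

Extra arguments given to `executeScript` are handed to the injected function, which is how `urls` would reach the page. This version sends all requests at once; for a long list it may be kinder to the server to fetch in smaller batches.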

- HeikoThei223en
- April 21, 2023 at 4:59 pm
- 0 votes
Clicking on a link causes the browser to make a top-level navigation request, whereas sending an XMLHttpRequest makes a cross-site request (assuming the target is on a different website). Browsers apply very different rules to sending (session) cookies in these two cases.

Even if the session cookies had SameSite=None, they might still be blocked as third-party cookies based on user preferences or browser policies.

What’s more, even if authentication succeeded or the target site did not require it, you cannot expect to read its contents via an XMLHttpRequest from a different origin, unless the page sets the Access-Control-Allow-Origin header, which would be uncommon for HTML pages.
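
Whether a request counts as cross-origin comes down to comparing scheme, host and port. A minimal sketch using the standard `URL` class illustrates the check; `isSameOrigin` is a hypothetical helper name, and the `/home` path and the `other.example` host are made up for the example.

```javascript
// Hypothetical helper: two URLs share an origin when scheme, host and port match.
function isSameOrigin(pageUrl, targetUrl) {
  return new URL(pageUrl).origin === new URL(targetUrl).origin;
}

// A page on the question's site and the target page share an origin
// (host names are compared case-insensitively).
console.log(isSameOrigin('https://mybaseurl.com/home', 'https://mybaseURL.com/member/XX001')); // true

// A page on another site (made-up host) does not.
console.log(isSameOrigin('https://other.example/page', 'https://mybaseURL.com/member/XX001')); // false
```

When the request is issued from a page that is already on the target's origin, the cross-origin read restriction described here does not come into play.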

To summarize: What you want is not generally possible. And that’s probably a good thing, because otherwise a malicious web page could employ the technique to get the contents of your online-banking page (to which you might be logged on), including your account balance whenever you visit the malicious page.
