I am using Protractor for my testing. I have 100 links on a page. Rather than clicking each link and checking the page content, I am trying to get the page content via XMLHttpRequest. I am passing all the request headers that I see in the browser for that page. However, when I run the code below I get the login page, apparently because I do not have an active session. My UI tests in the browser continue to work fine, so I am not logged out there.
My questions are:
- Is there a way to pass the current browser session to the XMLHttpRequest so that I am not redirected to the login page?
- Is there any other way I can get the page content without having to manually open each of the pages?
Here is my code:
var request = new XMLHttpRequest();
let cookies = await browser.manage().getCookies();
// Build a single "name=value; name2=value2" string for the Cookie header.
let cookie = cookies.map((obj) => obj.name + "=" + obj.value).join("; ");
request.open('GET', 'https://mybaseURL.com/member/XX001', false);
request.withCredentials = true;
request.setRequestHeader("sec-ch-ua", '"Chromium";v="112", "Google Chrome";v="112", "Not:A-Brand";v="99"');
request.setRequestHeader("sec-ch-ua-mobile", "?0");
request.setRequestHeader("sec-ch-ua-platform", "Windows");
request.setRequestHeader("Sec-Fetch-Dest", "empty");
request.setRequestHeader("Sec-Fetch-Mode", "cors");
request.setRequestHeader("Sec-Fetch-Site", "same-origin");
request.setRequestHeader("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/112.0.0.0 Safari/537.36");
request.setRequestHeader("Accept", "application/json");
// request.setRequestHeader("Accept-Encoding", "gzip, deflate, br");
request.setRequestHeader("Accept-Language", "en-US,en;q=0.9");
// request.setRequestHeader("Connection", "keep-alive");
request.setRequestHeader("Content-Type", "application/json");
request.setRequestHeader("Cookie", cookie);
request.send(null);
console.log(request.responseText)
Response:
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>302 Found</title>
</head><body>
<h1>Found</h1>
<p>The document has moved <a href="https://mybaseurl.com/signon.do?TYPE=123456&REALMOID=06-123456-1234-1234-a702-3a6f0a060000&GUID=&SMAUTHREASON=0&METHOD=GET&SMAGENTNAME=$SM$vO344545Q%245454545543C%2bOm645545en5s3&TARGET=$SM$https%3a%2f%2fmybaseurl%2ecom%2fmember%2fXX001">here</a>.</p>
</body></html>
2 Answers
I found a way to get the DOM of the page without physically opening it, using Fetch. It worked like a charm for me.
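The answer does not show how Fetch was used, so the following is only a minimal sketch under one assumption: the fetch runs inside the already logged-in page through Protractor's browser.executeAsyncScript, so the browser attaches its own session cookies. The helper name fetchPageHtml is hypothetical, and this only works for pages on the same origin as the page the test currently has open.

```javascript
// Hypothetical helper (not from the answer): fetch a same-origin page from
// inside the logged-in browser so the browser itself sends the session cookies.
// `browser` is Protractor's browser object, passed in explicitly for clarity.
async function fetchPageHtml(browser, url) {
  return browser.executeAsyncScript(function (targetUrl) {
    // WebDriver appends a "done" callback as the last argument.
    var done = arguments[arguments.length - 1];
    fetch(targetUrl, { credentials: 'same-origin' })
      .then(function (response) { return response.text(); })
      .then(done)
      .catch(function (err) { done('FETCH_ERROR: ' + err); });
  }, url);
}

// Usage inside a test (URL taken from the question):
// const html = await fetchPageHtml(browser, 'https://mybaseURL.com/member/XX001');
```

The returned string can then be checked with ordinary string or regex assertions, without navigating the test browser away from the current page.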
Clicking on a link causes the browser to make a top-level navigation request, whereas sending an XMLHttpRequest makes a cross-site request (assuming the target is on a different website). The sending of (session) cookies is very different between these two cases, as explained here.
Even if the session cookies had SameSite=None, they might still be blocked as third-party cookies based on user preferences or browser policies.
What's more, even if authentication succeeded or the target site did not require it, you cannot expect to read its contents via an XMLHttpRequest from a different origin, unless the page sets the Access-Control-Allow-Origin header, which would be uncommon for HTML pages.
To summarize: What you want is not generally possible. And that's probably a good thing, because otherwise a malicious web page could employ the technique to get the contents of your online-banking page (to which you might be logged on), including your account balance, whenever you visit the malicious page.
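Whether these restrictions apply depends on the request actually being cross-origin. As a rough sketch, the check below compares the origin (scheme, host and port) of the page making the request with that of the target. The function name isSameOrigin and the other.example URL are made up for illustration. A same-origin request is also same-site, so in that case neither the SameSite rules nor the Access-Control-Allow-Origin requirement gets in the way.

```javascript
// Hypothetical helper: true when both URLs share scheme, host and port,
// i.e. when a request from `pageUrl` to `targetUrl` is same-origin.
function isSameOrigin(pageUrl, targetUrl) {
  return new URL(pageUrl).origin === new URL(targetUrl).origin;
}

// Same host and scheme: same-origin.
console.log(isSameOrigin('https://mybaseurl.com/home', 'https://mybaseurl.com/member/XX001')); // true
// Different host: cross-origin, so the restrictions described above apply.
console.log(isSameOrigin('https://mybaseurl.com/home', 'https://other.example/member/XX001')); // false
```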