How to check if a URL is valid (actually loads a page with content) efficiently? - jQuery ajax

TheProgrammer
February 18, 2020
126 views
3 votes
4 Answers

QUESTION:

How can I check if a URL is valid and actually loads a page?

With my current code, only the status code is checked, which means that a URL like http://fsd.com/ will be considered valid although it does not load anything.

How can I check that the URL actually points to a website that can be loaded?

CODE:

$.ajax({
    url: link,
    dataType: 'jsonp',
    statusCode: {
        200: function() {
            console.log("status code 200 returned");
            validURL = true;
        },
        404: function() {
            console.log("status code 404 returned");
            validURL = false;
        }
    },
    error: function() {
        console.log("Error");
    }
});

EDIT: By valid, I mean that the page is at least partially loaded (as in at least the HTML & CSS are loaded) instead of loading forever or somehow failing without the status code being 404.

EDIT2: http://fsd.com actually returns a 404 now as it should…

EDIT3: Another example: https://dsd.com loads an empty page (status code 200) and http://dsd.com actually loads a page with content (status code 200). On my Node.js backend, the npm package “url-exists” indicates that https://dsd.com is invalid, while my frontend with the code shown in my question indicates it is a valid url. This is what the package code looks like: https://github.com/boblauer/url-exists/blob/master/index.js but I wanted to know what would be the best way according to SO users.

EDIT4:

Sadly, the request suggested by Addis is apparently blocked by CORS, which stops the rest of my code from executing, whereas my original request did not.

$.ajax({
    type: "HEAD",
    url: link,
    dataType: 'jsonp',
}).done(function(message, text, response) {
    const size = response.getResponseHeader('Content-Length');
    const status = response.status;
    console.log("SIZE: " + size);
    console.log("STATUS: " + status);
    if (size > 0 && status == "200") {
        $("#submitErrorMessage").css("display", "none");
        $('#directoryForm').submit();
    }
    else {
        $("#submitErrorMessage").css("display", "block");
        $("#submitLoading").css("display", "none");
    }
});

EDIT 5:

To be more precise, both requests trigger a warning message in the browser console indicating that the response has been blocked because of CORS, but my original code is actually executed in its entirety while the other request doesn’t get to the console.log().

EDIT 6:

$.ajax({
    async: true,
    url: link,
    dataType: 'jsonp',
    success: function(data, status, jqxhr) {
        console.log("Response data received: ", data);
        console.log("Response data length: ", data.length);
        console.log("Response status code: ", status);
        if (status == "200" && data.length > 0) {
            $("#submitErrorMessage").css("display", "none");
            $('#directoryForm').submit();
        }
        else {
            $("#submitErrorMessage").css("display", "block");
            $("#submitLoading").css("display", "none");
        }
    },
    error: function(jqXHR, textStatus, errorThrown) {
        console.log("Error: ", errorThrown);
    }
});

Error:

Error:  Error: jQuery34108117853955031047_1582059896271 was not called
    at Function.error (jquery.js:2)
    at e.converters.script json (jquery.js:2)
    at jquery.js:2
    at l (jquery.js:2)
    at HTMLScriptElement.i (jquery.js:2)
    at HTMLScriptElement.dispatch (jquery.js:2)
    at HTMLScriptElement.v.handle (jquery.js:2)

Answers

- CarlArmbruster
- February 18, 2020 at 6:49 pm
- 0 votes
A successful response without content “should” return 204 No Content, but that doesn’t mean every developer implements the spec correctly. I guess it really depends on what you consider “valid” to mean for your business case.

Valid = 200 && body has some content?

If so, you can test this in the success callback.
```
$.ajax({
    url: link,
    dataType: 'jsonp',
    success: function (response) {  
        // todo: test the response for "valid"
        // proper length? contains expected content?
    },  
    statusCode: {
        200: function() {
            console.log( "status code 200 returned");
            validURL = true;
        },
        404: function() {
            console.log( "status code 404 returned");
            validURL = false;
        }
    },
    error:function(){
        console.log("Error");
    }
});
```
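The `todo` in the `success` callback above could be filled in with a small content test. The following is only a sketch under stated assumptions, not part of the original answer: `hasVisibleContent` and its `minLength` parameter are hypothetical names, the regular expressions are a rough heuristic rather than a real HTML parser, and it assumes the body is available as a string (which is not the case for a `jsonp` request, where the response is executed as a script).

```javascript
// Hypothetical helper: returns true when an HTML string still contains
// some text once scripts, styles, comments, and tags are removed.
function hasVisibleContent(html, minLength = 1) {
    if (typeof html !== 'string') {
        return false; // nothing usable was received
    }
    const text = html
        .replace(/<script[\s\S]*?<\/script>/gi, ' ') // drop inline scripts
        .replace(/<style[\s\S]*?<\/style>/gi, ' ')   // drop inline styles
        .replace(/<!--[\s\S]*?-->/g, ' ')            // drop comments
        .replace(/<[^>]*>/g, ' ')                    // drop remaining tags
        .replace(/\s+/g, ' ')                        // collapse whitespace
        .trim();
    return text.length >= minLength;
}
```

Inside `success`, something like `validURL = hasVisibleContent(response);` could then replace the `todo` comments, assuming `response` holds the page's HTML text.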

- maverick
- February 18, 2020 at 6:53 pm
- 0 votes
I think the word “valid” is used a bit loosely here. Looking at the code snippet, I can see that you are using HTTP status codes to decide whether the URL is valid. However, based on the description, it is clear that you consider the resource (pointed to by the URL) to be valid only if it is a web page. I would like to stress that HTTP can be used to access resources that need not have a web-page representation.

I think you need to go a bit deeper and retrieve that information (whether it is a web-page representation) from the HTTP response you receive; relying on the status code alone would be misleading. One clear indicator is a response header of content-type: text/html.

Sample response from accessing http://www.google.com:
```
date: Tue, 18 Feb 2020 17:51:12 GMT
expires: -1
cache-control: private, max-age=0
content-type: text/html; charset=UTF-8
strict-transport-security: max-age=31536000
content-encoding: br
server: gws
content-length: 58083
x-xss-protection: 0
```
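A check for that header might look like the sketch below. `isHtmlContentType` is a hypothetical helper, and also accepting `application/xhtml+xml` is an assumption about what should count as a web page. The header value would typically come from something like `jqXHR.getResponseHeader('Content-Type')`, which returns `null` when the header is not available.

```javascript
// Hypothetical helper: true when a Content-Type header value names an
// HTML media type, ignoring case and parameters such as charset.
function isHtmlContentType(headerValue) {
    if (typeof headerValue !== 'string') {
        return false; // header missing, e.g. getResponseHeader returned null
    }
    // Keep only the media type: "text/html; charset=UTF-8" -> "text/html"
    const mediaType = headerValue.split(';')[0].trim().toLowerCase();
    return mediaType === 'text/html' || mediaType === 'application/xhtml+xml';
}

console.log(isHtmlContentType('text/html; charset=UTF-8')); // true
console.log(isHtmlContentType('application/json'));         // false
```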

- RobertoMurguia
- February 18, 2020 at 7:00 pm
- 0 votes
What you are trying to accomplish is not very specific. I’m not going to give you a code example of how to do this, but here are some pointers.

The status code is not tied to the response body you get: you could have a 200 response with no data, or a 500 error with some data, which could be an HTML page showing the error, a JSON object, or even a string specifying what went wrong.

When you say “actually loads a page”, I guess you are referring to an HTML response. You can check the Content-Type response header for text/html and the Content-Length header to see whether the response has any content. Even with those checks, it’s hard to tell whether the HTML actually displays anything.

It really depends on what specifically you are looking for. My suggestion is to check the Content-Type and Content-Length headers, keeping in mind that each website may implement HTTP differently.
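The pointers above can be sketched as a small classifier. This is an illustration under assumptions rather than a definitive implementation: `classifyResponse`, its return labels, and the plain `headers` object with lower-cased names are all hypothetical. A missing Content-Length is reported separately because servers that use chunked transfer encoding do not send it, so its absence does not prove the body is empty.

```javascript
// Hypothetical classifier for the header checks described above.
// `status` is the numeric HTTP status; `headers` is a plain object whose
// keys are lower-cased header names (an assumption made for this sketch).
function classifyResponse(status, headers) {
    if (status < 200 || status >= 300) {
        return 'error-status';
    }
    const type = (headers['content-type'] || '').split(';')[0].trim().toLowerCase();
    if (type !== 'text/html') {
        return 'not-html';
    }
    const rawLength = headers['content-length'];
    const length = rawLength == null ? NaN : Number.parseInt(rawLength, 10);
    if (Number.isNaN(length)) {
        return 'html-unknown-length'; // header absent or unparsable
    }
    return length > 0 ? 'html-with-content' : 'html-empty';
}
```

Only `'html-with-content'` is a positive signal here; how to treat `'html-unknown-length'` depends on how strict the check needs to be.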


- Addis
- February 18, 2020 at 7:04 pm
- 0 votes
The HEAD request is used to get the meta-information contained in the HTTP headers. The good thing is that the response doesn’t contain a body, so it is fast and shouldn’t require heavy processing on the server. This makes it handy for quick status checking.

The HEAD method is identical to GET except that the server MUST NOT
return a message-body in the response. The metainformation
contained in the HTTP headers in response to a HEAD request SHOULD be
identical to the information sent in response to a GET request. This
method can be used for obtaining metainformation about the entity
implied by the request without transferring the entity-body itself.
This method is often used for testing hypertext links for validity,
accessibility, and recent modification.
http://www.w3.org
```
$.ajax({
    type: "HEAD",
    async: true,
    url: link,
    dataType: 'json', 
}).done(function(message,text,response){
    const size = response.getResponseHeader('Content-Length');

    //optionally you may check for the status code to know if the request has been successfully completed
    const status = response.status;
});
```
Content-Length is one of the metadata fields returned by a HEAD request; it gives the size of the body in bytes. By checking only this size, without downloading the whole page, you can tell whether the response body has any content.
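The same check can be sketched with the standard `fetch` API instead of jQuery. `headCheck` and its `fetchImpl` parameter are hypothetical names introduced for illustration. In a browser, a cross-origin request made this way is still subject to CORS, so the sketch is most useful for same-origin URLs or when run from a backend such as Node.js 18+, where `fetch` is available globally.

```javascript
// Sketch: send a HEAD request and report the status and Content-Length.
// `fetchImpl` is injectable so the function can be exercised without a
// network; by default it uses the global fetch.
async function headCheck(url, fetchImpl = fetch) {
    const response = await fetchImpl(url, { method: 'HEAD' });
    const rawLength = response.headers.get('content-length');
    const parsed = rawLength === null ? NaN : Number.parseInt(rawLength, 10);
    const size = Number.isNaN(parsed) ? null : parsed; // null = not reported
    return {
        status: response.status,
        size: size,
        // A missing Content-Length is treated as "unknown", not as empty.
        hasContent: response.ok && (size === null || size > 0),
    };
}
```

A call such as `headCheck(link).then(result => console.log(result.status, result.size, result.hasContent));` would log the outcome; a `size` of `null` means the server did not report a length.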

EDIT:
The above code is for a dataType of json. For a dataType of jsonp, the success and error callbacks take care of the response, like the following:
```
$.ajax({
    url: link,
    dataType: 'jsonp', 
    crossDomain: true,
    data: data,
    success: function( data, status, jqxhr ){
        console.log( "Response data received: ", data );
        console.log("Response data length: ", data.length);
        console.log("Response status code: ", status);
    },
    error:function(jqXHR, textStatus, errorThrown){
        console.log("Error: ", errorThrown);
    }
});
```