
Let’s say you have a website hosted on example.org. That website has a single page whose content is static if the requesting client is not logged in, but dynamic (tailored to the logged-in client) if it is.

To handle this properly in terms of search engine indexing, our current idea is to create two separate files, say logged_out.html and logged_in.php. When the URL (e.g. example.org) is requested, the PHP code checks whether the current user is logged in; if so, it requires logged_in.php, otherwise logged_out.html.
logged_in.php has this in its head:

<meta name="robots" content="noindex,nofollow,noarchive">

This is because it does not make sense to index dynamic, per-user content in this system.

My question is essentially how to program the serving and routing of two pages accessible under the same URL so that only one of them gets indexed, i.e. so that search engines see only one page and completely ignore the other. Our current solution can be summed up like this:

// HTTP request incoming to https://example.org/sample, which is routed to
// the present file via the server configuration

if ($logged_in) {
  require 'logged_in.php';   // should never be indexed / crawled
} else {
  require 'logged_out.html'; // should be the indexed / crawled page
}

exit();

This should result in only the contents of logged_out.html being indexed, under the URL https://example.org/sample, while the contents of logged_in.php should neither be indexed under any URL of example.org nor be crawled.
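
To make the mechanism concrete, the full entry script we have in mind would look roughly like this (the session key used for the login check is only a placeholder for our real login handling):

<?php
// https://example.org/sample is rewritten to this file by the server configuration
session_start();

$logged_in = isset($_SESSION['user_id']); // placeholder for the real login check

if ($logged_in) {
  require __DIR__ . '/logged_in.php';   // per-user page, carries the noindex meta tag
} else {
  require __DIR__ . '/logged_out.html'; // static page that should be indexed
}

exit();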

Does our approach yield that intended result?

2 Answers


  1. Search engines are never logged in, so they will never receive the logged-in version of the page; the robots meta tag on the logged_in.php page is therefore not needed.

    Also, search engines are unaware of files that are merely require()d by the main script they visit. So what will most likely happen is that the requested URL (your index.php) will get indexed with the content of logged_out.html. Just make sure the included file itself is not directly reachable, to prevent duplicate content: keep it out of reach of the robots.

    This is your code as I understand it.

    if ($logged_in) {
      require 'logged_in.php';
    } else {
      require 'logged_out.html';
    }
    

    More thoughts:

    In order to truly keep a page from getting indexed, you should not expose its URL in any links. Place the file in a directory that denies access, or outside the web root entirely; it can still be required from there, as in the sketch below.
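
    A minimal sketch of that layout, with made-up directory names and a placeholder login check:

    <?php
    // /var/www/private/logged_in.php  -> outside the web root, so it has no crawlable URL
    // /var/www/public/index.php       -> this file, reachable at https://example.org/sample
    session_start();
    $logged_in = isset($_SESSION['user_id']); // placeholder for the real login check

    if ($logged_in) {
      require dirname(__DIR__) . '/private/logged_in.php';
    } else {
      require __DIR__ . '/logged_out.html';
    }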

    Bonus: if a duplicate page does end up reachable, you can declare another page to be its canonical version with a rel="canonical" link.
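
    For example, placed in the head of the duplicate page (the URL here is simply the one from the question):

    <?php /* head of the duplicate page */ ?>
    <link rel="canonical" href="https://example.org/sample">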

  2. A page gets indexed only if at least one of the following is true:

    • There is an internal link to it from another page
    • There is an external link to the page
    • It’s in the sitemap.

    One way is to have a parameter such as

    example.com/?logged=true
    

    A page with different parameters can be regarded as a different page by the search engine. This would work as long as there are no internal links to the page with the parameter.


    The other, better way to do it is to have a session variable that you write to when someone logs in. When the search engine requests the page, this session data is not there. Your code can set whatever data it needs in the session and retrieve it before presenting the modified page, as in the sketch below.
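
    A rough sketch of that idea, with illustrative session keys and a hypothetical login handler:

    <?php
    session_start();

    // In the login handler, after the credentials have been verified (hypothetical):
    //   $_SESSION['user'] = ['name' => 'Alice', 'theme' => 'dark'];

    if (isset($_SESSION['user'])) {
      // Only a browser with an active session gets here; logged_in.php
      // can read $_SESSION['user'] to personalise its output.
      require 'logged_in.php';
    } else {
      // A crawler never sends a session cookie, so it only ever sees the static page.
      require 'logged_out.html';
    }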
