I am working on a CMS, and in our latest test for a new multilingual site I can see that some pages are marked in Google Search Console as "Duplicate without user-selected canonical".
For example, the following is marked as "Duplicate without user-selected canonical":
https://www.gotomdz.com/en/place/detail/c5ac334a-6d5e-473b-a46f-70e81c559cf6/cerro-de-la-gloria
Now in my sitemap.xml for this page I have:
<url>
<loc>https://www.gotomdz.com/en/place/detail/c5ac334a-6d5e-473b-a46f-70e81c559cf6/cerro-de-la-gloria</loc>
<lastmod>2023-10-03</lastmod>
<xhtml:link rel="alternate" hreflang="en" href="https://www.gotomdz.com/en/place/detail/c5ac334a-6d5e-473b-a46f-70e81c559cf6/cerro-de-la-gloria"/>
<xhtml:link rel="alternate" hreflang="es" href="https://www.gotomdz.com/es/place/detail/c5ac334a-6d5e-473b-a46f-70e81c559cf6/cerro-de-la-gloria"/>
<xhtml:link rel="alternate" hreflang="pt" href="https://www.gotomdz.com/pt/place/detail/c5ac334a-6d5e-473b-a46f-70e81c559cf6/cerro-de-la-gloria"/>
</url>
As you can see, I am letting Google know that there are multiple language versions of the same page. The content is almost the same but is fully translated, so my goal is for all of these pages to be indexed.
Looking at Google's documentation, I can read the following:
There are three ways to indicate multiple language/locale versions of
a page to Google:
- HTML
- HTTP Headers
- Sitemap
The three methods are equivalent from Google’s perspective and you can
choose the method that’s the most convenient for your site. While you
can use all three methods at the same time, there’s no benefit in
Search
So, I think my sitemap.xml is enough, right?
Now, about "rel=canonical": I don't think it is right in my case, since these are different pages. I do not use "rel=canonical" anywhere on the site.
I am afraid that not all the content is indexed.
Now, looking at the indexed content:
- https://gotomdz.com/en/place/detail/c5ac334a-6d5e-473b-a46f-70e81c559cf6/cerro-de-la-gloria
- https://www.gotomdz.com/pt/place/detail/c5ac334a-6d5e-473b-a46f-70e81c559cf6/cerro-de-la-gloria
- https://www.gotomdz.com/es/place/detail/c5ac334a-6d5e-473b-a46f-70e81c559cf6/cerro-de-la-gloria
- https://gotomdz.com/en/places/detail/c5ac334a-6d5e-473b-a46f-70e81c559cf6/cerro-de-la-gloria
The last URL is wrong. It is an old link, and I have to find a way to remove it in Search Console.
Besides this, I can see that ALL pages are indexed.
But, as you can see,
https://gotomdz.com/en/place/detail/c5ac334a-6d5e-473b-a46f-70e81c559cf6/cerro-de-la-gloria
is indexed, but the following is not indexed:
https://www.gotomdz.com/en/place/detail/c5ac334a-6d5e-473b-a46f-70e81c559cf6/cerro-de-la-gloria
I am not sure how to handle this issue.
In the sitemap.xml I am using the full site URL (https://www.gotomdz.com) instead of the domain without www (https://gotomdz.com).
Do I also have to add the "https://gotomdz.com" URLs to the sitemap.xml and mark the "https://www.gotomdz.com" ones with "rel=canonical"? What do you think?
Thank you
2 Answers
Your wrong page needs to return a 404 HTTP status code to tell Google it does not exist. Google will then report it as excluded from the index because it is a 404.
You should 301-redirect all your www. pages to their equivalents without the www. That will stop you from having duplicate pages that are not indexed. The www. URLs will then be reported as redirects, which is fine.
If you use canonical tags, have the pages canonicalise to themselves.
1. www / non-www duplication
It seems that your entire website is duplicated on both www and "non-www" hosts. You do not want both versions to be indexed, as this would be a waste of time for Google. Instead, you need to pick one, and 301-redirect the other to it using a global redirect rule (see Redirect non-www to www in .htaccess for guidance on how to achieve this on Apache).
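To make the global rule concrete, here is a minimal Python sketch of the decision a redirect layer would take for each request. It assumes you keep the www host (the one already used in the sitemap); `PREFERRED_HOST`, `DUPLICATE_HOSTS` and `redirect_target` are illustrative names, not part of any framework, and the same logic would normally live in the web server configuration instead.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumption: the www host is the version you decided to keep.
PREFERRED_HOST = "www.gotomdz.com"
# Hosts that serve the same content and must be redirected.
DUPLICATE_HOSTS = {"gotomdz.com"}


def redirect_target(url):
    """Return the URL to 301-redirect to, or None if no redirect is needed."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if host not in DUPLICATE_HOSTS:
        return None
    # Keep path, query and fragment so every page maps to its exact equivalent.
    return urlunsplit(("https", PREFERRED_HOST, parts.path, parts.query, parts.fragment))
```

Calling `redirect_target("https://gotomdz.com/en/place/detail/c5ac334a-6d5e-473b-a46f-70e81c559cf6/cerro-de-la-gloria")` yields the same path on `https://www.gotomdz.com`, which is the value to send in the `Location` header of the 301 response.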
Canonical tags are not the solution in this case, because they should only be used in cases where duplicate pages cannot be redirected.
2. XML sitemap
Regarding your sitemap.xml file, you should only specify the URLs belonging to the version you chose (www or non-www). The other version should never appear anywhere on your website to avoid feeding duplicate URLs to Google.
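A quick way to verify this is to scan the sitemap for any URL that is not on the chosen host. The sketch below is only an illustration: `off_host_urls` is a hypothetical helper, and it assumes the sitemap uses the standard sitemaps.org namespace plus the xhtml namespace for the hreflang links.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit

# Namespaces used by a sitemap with hreflang alternates.
NS = {
    "sm": "http://www.sitemaps.org/schemas/sitemap/0.9",
    "xhtml": "http://www.w3.org/1999/xhtml",
}


def off_host_urls(sitemap_xml, preferred_host):
    """Return every <loc> or alternate href that is not on preferred_host."""
    root = ET.fromstring(sitemap_xml)
    urls = [loc.text.strip() for loc in root.findall(".//sm:loc", NS) if loc.text]
    urls += [link.get("href", "") for link in root.findall(".//xhtml:link", NS)]
    return [u for u in urls if urlsplit(u).hostname != preferred_host]
```

An empty result for `off_host_urls(xml_text, "www.gotomdz.com")` means the sitemap only references the version you picked.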
It is not possible to tag URLs in an XML sitemap with rel="canonical", because by definition an XML sitemap should only list canonical URLs that you want indexed. You should never list duplicate URLs in an XML sitemap.
3. Handling old URL patterns
It seems you have updated the URL pattern of the "place" route from /{language}/places/detail/{uuid}/{place-slug} to /{language}/place/detail/{uuid}/{place-slug}, e.g. https://gotomdz.com/en/places/detail/c5ac334a-6d5e-473b-a46f-70e81c559cf6/cerro-de-la-gloria is now https://gotomdz.com/en/place/detail/c5ac334a-6d5e-473b-a46f-70e81c559cf6/cerro-de-la-gloria
This wouldn’t be a problem if Google hadn’t already crawled and indexed the old URLs, in which case you could just let them end up as 404 errors. But if Google has already indexed your old URLs, you should make sure to set up a 301-redirect rule from the old URL pattern to the new one so Google can update its index and forget about the old URLs (and remove them from Google Search Console error reports).
Make sure you also update all your internal links to point to the new URLs, and never again link to the old ones.
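As a rough illustration of such a rule, the Python sketch below maps an old "places" path to its new "place" equivalent. The regular expression is an assumption on my part (a two-letter language code and a 36-character UUID, as in the example URL), and `new_place_path` is a hypothetical helper; adapt it to whatever routing layer your CMS uses.

```python
import re

# Assumed shape of the old route: /{language}/places/detail/{uuid}/{place-slug}
OLD_PLACE_PATH = re.compile(
    r"^/(?P<language>[a-z]{2})/places/detail/"
    r"(?P<uuid>[0-9a-f-]{36})/(?P<slug>[^/]+)/?$"
)


def new_place_path(path):
    """Return the new path to 301-redirect to, or None if path is not an old URL."""
    match = OLD_PLACE_PATH.match(path)
    if match is None:
        return None
    return "/{language}/place/detail/{uuid}/{slug}".format(**match.groupdict())
```

Any request for which `new_place_path` returns a value should be answered with a 301 to that path; everything else goes through normal routing.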
4. Handling 404 errors
Finally, the old URL discussed above is currently displaying a "404 error" message, but the HTTP status code for this page is 200 "OK".
You should fix this by making sure that your web server returns an HTTP 404 status code whenever it displays the "404 error" message. This will ensure that invalid URLs are properly identified and removed from Google’s index.
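I don't know what your CMS is built with, so the following is only a generic sketch of the idea as a tiny Python WSGI application: the error page body and the status line are set together, so the "404 error" message can never be served with a 200. `KNOWN_PATHS` and `app` are made-up names for the example.

```python
# Hypothetical set of paths the site can actually serve.
KNOWN_PATHS = {
    "/en/place/detail/c5ac334a-6d5e-473b-a46f-70e81c559cf6/cerro-de-la-gloria",
}


def app(environ, start_response):
    """Minimal WSGI app that pairs the 404 page with a real 404 status."""
    path = environ.get("PATH_INFO", "/")
    if path in KNOWN_PATHS:
        status = "200 OK"
        body = b"<h1>Cerro de la Gloria</h1>"
    else:
        # The status line must say 404, not just the visible message.
        status = "404 Not Found"
        body = b"<h1>404 error</h1>"
    headers = [
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ]
    start_response(status, headers)
    return [body]
```

You can confirm the fix from the outside by requesting the old URL and checking the status code in your browser's network tab or in the URL Inspection tool in Search Console.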
Hope this helps!