
I’m struggling to work out how we should structure our canonical URLs in a marketplace built with Next.js.

We have a sitemap structure for a marketplace where users can filter collections like so:

/collections/{category}/{subCategory}/{type} + query params for additional filtering
/brand/{brand} + query params for additional filtering
/room/{room} + query params for additional filtering
/style/{style} + query params for additional filtering
/location/{location} + query params for additional filtering

The query params can be a combination of any of the other filters so that:

/collections/seating/chairs?brand=pottery-barn 
**has the same content as** 
/brand/pottery-barn?category=seating&subcategory=chairs 

or

/style/mcm?brand=knoll&category=seating 
**has the same content as** 
/brand/knoll?category=seating&style=mcm
**has the same content as** 
/collections/seating?brand=knoll&style=mcm

I’d love to know what the best practices here are. Should the pages still have separate canonicals, for example, even though they have similar content, or should I consolidate the pages with canonical URLs to potentially improve SEO?

2 Answers


  1. It depends.

    But it is typically best to limit the crawling and indexing of facets.

    The main thing you want to do is provide routes to your products where each segment adds information about what the product is, e.g. category, sub-category, brand.

    Avoid creating too many combinations, as that adds work for Google.

    In particular, make sure you don’t suggest that Google crawls different sort orders, page sizes and other things that just repeat the same information in a different way.

    As it depends, check your Search Console to see if you have indexing issues with those pages.
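
    A minimal sketch of one way to act on this in a Next.js App Router project: mark any filtered (faceted) view of a listing as noindex while keeping the base listing indexable. The catch-all collections route and the "any query parameter means a facet" rule are assumed for illustration, not something prescribed by this answer.

      // app/collections/[...slug]/page.tsx — illustrative sketch only
      // (in Next.js 15+, params and searchParams are Promises and must be awaited)
      import type { Metadata } from 'next';

      type Props = {
        params: { slug: string[] };
        searchParams: { [key: string]: string | string[] | undefined };
      };

      export async function generateMetadata({ searchParams }: Props): Promise<Metadata> {
        // Any query parameter means this is a filtered facet of the base listing.
        const isFaceted = Object.keys(searchParams).length > 0;

        return {
          // Keep faceted combinations out of the index, but let Googlebot follow
          // links from them back to indexable pages.
          robots: isFaceted ? { index: false, follow: true } : { index: true, follow: true },
        };
      }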

  2. Pages that have the same content are considered duplicates by Google, which can be harmful as it will waste Googlebot’s time by forcing it to crawl multiple versions of the same content.

    Canonical tags will help Google understand which of these duplicate URLs you prefer to have indexed, but it will not prevent it from wasting time crawling the duplicate pages.

    That is why it is much preferable to avoid exposing the same content across multiple URLs in the first place, and to use Canonical tags only to mitigate duplication that cannot be avoided.

    In your case, I understand that these URLs are needed so that users can filter Product List Pages regardless of which part of the catalog they have navigated to, so the duplicate URLs are a functional requirement. However, it is possible to prevent Googlebot from discovering and crawling your duplicate URLs by following these recommendations:

    1. Avoid linking to URLs containing query string parameters using HTML <a> tags, so that Googlebot never discovers these URLs. Only use client-side JavaScript interactions to send the user to these URLs (see the sketch for this point after the list).
    2. Disallow all query string parameters in your /robots.txt file to prevent Googlebot from crawling these URLs, using the following rules (a Next.js sketch for this also follows the list):
    Disallow: *?brand=*
    Disallow: *&brand=*
    Disallow: *?category=*
    Disallow: *&category=*
    Disallow: *?location=*
    Disallow: *&location=*
    Disallow: *?room=*
    Disallow: *&room=*
    Disallow: *?style=*
    Disallow: *&style=*
    Disallow: *?subcategory=*
    Disallow: *&subcategory=*
    
    3. Do not include URLs containing query string parameters in your XML sitemap. Only list the following URL patterns for PLPs (see the sitemap sketch after the list):
    /collections/{category}
    /collections/{category}/{subCategory}
    /collections/{category}/{subCategory}/{type}
    /brand/{brand}
    /room/{room}
    /style/{style}
    /location/{location}
    
    4. Use Canonical tags as usual: every page should have a Canonical tag pointing to its own path, without query string parameters (see the sketch after the list), i.e.
      • /collections/seating/chairs should canonicalize to itself, i.e. /collections/seating/chairs
      • /collections/seating/chairs?foo=bar should canonicalize to /collections/seating/chairs
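
    For point 1, a minimal sketch of a filter control that navigates with client-side JavaScript instead of rendering a crawlable <a href> link. The component and parameter names are assumed for illustration:

      // components/BrandFilter.tsx — illustrative sketch
      'use client';

      import { useRouter, usePathname, useSearchParams } from 'next/navigation';

      export function BrandFilter({ brands }: { brands: string[] }) {
        const router = useRouter();
        const pathname = usePathname();
        const searchParams = useSearchParams();

        // Navigating via router.push means no <a href="...?brand=..."> is rendered,
        // so Googlebot never discovers the parameterized URL in the HTML.
        function applyBrand(brand: string) {
          const params = new URLSearchParams(searchParams.toString());
          params.set('brand', brand);
          router.push(`${pathname}?${params.toString()}`);
        }

        return (
          <select onChange={(e) => applyBrand(e.target.value)} defaultValue="">
            <option value="" disabled>Filter by brand</option>
            {brands.map((b) => (
              <option key={b} value={b}>{b}</option>
            ))}
          </select>
        );
      }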
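
    For point 2, if the project uses the Next.js App Router, the same rules can be generated from a robots.ts file instead of being maintained by hand. This is just one way to emit them; the sitemap hostname is assumed:

      // app/robots.ts — emits the Disallow rules listed in point 2
      import type { MetadataRoute } from 'next';

      const FILTER_PARAMS = ['brand', 'category', 'location', 'room', 'style', 'subcategory'];

      export default function robots(): MetadataRoute.Robots {
        return {
          rules: {
            userAgent: '*',
            // Block each filter parameter whether it appears first (?) or later (&) in the URL.
            disallow: FILTER_PARAMS.flatMap((p) => [`*?${p}=*`, `*&${p}=*`]),
          },
          sitemap: 'https://www.example.com/sitemap.xml', // assumed hostname
        };
      }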
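
    For point 3, a sketch of an App Router sitemap.ts that lists only the clean PLP paths. The getAllPlpPaths() helper is a hypothetical stand-in for however the catalog is queried, and the hostname is assumed:

      // app/sitemap.ts — illustrative sketch
      import type { MetadataRoute } from 'next';

      // Hypothetical stand-in: in practice these paths would be generated from the
      // category/brand/room/style/location data, with no query strings.
      async function getAllPlpPaths(): Promise<string[]> {
        return [
          '/collections/seating',
          '/collections/seating/chairs',
          '/brand/knoll',
          '/style/mcm',
        ];
      }

      export default async function sitemap(): Promise<MetadataRoute.Sitemap> {
        const paths = await getAllPlpPaths();
        return paths.map((path) => ({
          url: `https://www.example.com${path}`, // assumed hostname
          lastModified: new Date(),
        }));
      }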
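
    For point 4, one way to emit a self-referencing canonical without query parameters in the App Router is through generateMetadata. The route file and base URL are assumed for illustration:

      // app/collections/[...slug]/page.tsx — illustrative sketch
      import type { Metadata } from 'next';

      export async function generateMetadata({
        params,
      }: {
        params: { slug: string[] };
      }): Promise<Metadata> {
        // Build the canonical from the path segments only; query parameters are
        // deliberately ignored, so /collections/seating/chairs?brand=knoll
        // canonicalizes to /collections/seating/chairs.
        return {
          metadataBase: new URL('https://www.example.com'), // assumed base URL
          alternates: {
            canonical: `/collections/${params.slug.join('/')}`,
          },
        };
      }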

    ⚠️ It is worth noting that the above recommendations are tailored to the specific case described in the question. In no way is this a general recommendation to block discovery and crawling of URLs containing query string parameters.
