
My website has about 500,000 pages. I made a sitemap.xml and listed all the pages in it (I know about the limit of 50,000 URLs per file, so I have 10 sitemaps). I submitted the sitemaps in Webmaster Tools and everything seems OK (no errors, and I can see the submitted and indexed links). However, I have a problem with crawling frequency: Googlebot crawls the same page 4 times per day, even though in sitemap.xml I declare that the page changes yearly.

This is an example:

<url>
    <loc>http://www.domain.com/destitution</loc>
    <lastmod>2015-01-01T16:59:23+02:00</lastmod>
    <changefreq>yearly</changefreq>
    <priority>0.1</priority>
</url>
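
For completeness, the 10 sitemaps would normally be tied together with a single sitemap index file, which can itself be submitted in Webmaster Tools. Roughly like this (the filenames here are illustrative):

<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
    <sitemap>
        <loc>http://www.domain.com/sitemap1.xml</loc>
        <lastmod>2015-01-01T16:59:23+02:00</lastmod>
    </sitemap>
    <!-- ... one <sitemap> entry per file ... -->
    <sitemap>
        <loc>http://www.domain.com/sitemap10.xml</loc>
    </sitemap>
</sitemapindex>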

1) How can I tell Googlebot not to crawl so frequently? It overloads my server.

2) The website has several pages like http://www.domain.com/destitution1, http://www.domain.com/destitution2, … and I set the canonical URL of each to http://www.domain.com/destitution. Could that be the reason for the repeated crawling?
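
For reference, the canonical annotation on each of these pages would look something like this, assuming it is implemented as a link element in the page <head>:

<link rel="canonical" href="http://www.domain.com/destitution" />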

2 Answers


  1. You can report this to Google's crawling team; see here:

    In general, specific Googlebot crawling-problems like this are best
    handled through Webmaster Tools directly. I’d go through the Site
    Settings for your main domain, Crawl Rate, and then use the “Report a
    problem with Googlebot” form there. The submissions through this form
    go to our Googlebot team, who can work out what (or if anything) needs
    to be changed on our side. They generally won’t be able to reply, and
    won’t be able to process anything other than crawling issues, but they
    sure know Googlebot and can help tweak what it does.

    https://www.seroundtable.com/google-crawl-report-problem-19894.html

  2. The crawling will slow down progressively. Bots are likely revisiting your pages because there are internal links between your pages.

    In general, canonicals tend to reduce crawl rates. But at the beginning, Google's bots need to crawl both the source and the target page; you will see the benefit later.

    Google's bots don't necessarily take the lastmod and changefreq information into account. But if they establish that the content is not modified, they will come back less often. It is a matter of time: every URL has a scheduler for revisits.

    Bots adapt to the capacity of the server (see the crawling summary I maintain for more details). If the load is an issue, you can temporarily slow bots down by returning HTTP error code 500; they will stop and come back later.
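
    As a rough illustration of that idea, here is a minimal sketch, assuming a Flask application on a Unix host (the user-agent check and the load threshold are illustrative choices, not anything Google prescribes):

    # Minimal sketch: back off crawlers while the server is overloaded.
    import os

    from flask import Flask, abort, request

    app = Flask(__name__)
    LOAD_THRESHOLD = 8.0  # hypothetical 1-minute load average limit; tune for your hardware

    @app.before_request
    def throttle_bots_under_load():
        # Only throttle known crawlers; regular visitors are served normally.
        user_agent = request.headers.get("User-Agent", "")
        if "Googlebot" in user_agent and os.getloadavg()[0] > LOAD_THRESHOLD:
            abort(500)  # bots treat 5xx as "try again later" and slow down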

    I don't believe there is a crawling issue with your site. What you see is normal behavior: when several sitemaps are submitted at once, crawl rates can be temporarily raised.
