
I have set up Rollbar in my Rails application. It keeps reporting ActiveRecord::RecordNotFound errors caused by SEO crawlers (i.e. Googlebot, Baiduspider, Findxbot etc.) requesting deleted posts.

How can I prevent Rollbar from reporting SEO crawler activity?

3 Answers


  1. Looks like you are using rollbar-gem, so you’d want to use Rollbar::Ignore to tell Rollbar to ignore errors that were caused by a spider:

    handler = proc do |options|
      raise Rollbar::Ignore if is_crawler_error(options)
    end
    
    Rollbar.configure do |config|
        config.before_process << handler
    end
    

    where is_crawler_error detects if the request that led to the error was from a crawler.
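One possible shape for `is_crawler_error` (a hedged sketch: the method name comes from the answer above, but the payload layout under `options[:scope][:request]` and the header key are assumptions — inspect a real Rollbar payload in your app before relying on them):

```ruby
# A minimal user-agent check. The crawler list here is illustrative,
# not exhaustive; extend it for the bots you actually see in reports.
CRAWLER_REGEXP = /googlebot|baiduspider|bingbot|findxbot/i

def is_crawler_error(options)
  # Defensive digging: fall back to an empty hash if the request
  # scope is missing, so non-web errors don't blow up the handler.
  request = options.dig(:scope, :request) || {}
  agent   = request['User-Agent']
  !!(agent && agent.match(CRAWLER_REGEXP))
end
```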

    If you are using rollbar.js to detect errors in client-side JavaScript, then you can use the checkIgnore option to filter out client-side errors caused by bots:

    var _rollbarConfig = {
      // current config...
      checkIgnore: function(isUncaught, args, payload) {
         if (window.navigator.userAgent && window.navigator.userAgent.indexOf('Baiduspider') !== -1) {
           // ignore baidu spider
           return true;
         }
         // no other ignores
         return false;
       }
    }
    
  2. Here’s what I did:

    is_crawler_error = lambda do |options|
      request = options[:scope][:request]
      return true if request['From'] == 'bingbot(at)microsoft.com'
      return true if request['From'] == 'googlebot(at)googlebot.com'
      return true if request['User-Agent'] =~ /Facebot|Twitterbot/
      false
    end
    
    handler = proc do |options|
      raise Rollbar::Ignore if is_crawler_error.call(options)
    end
    
    Rollbar.configure do |config|
      config.before_process << handler
    end
    

    Based on these docs.

  3. TL;DR:

    # frozen_string_literal: true
    #
    # config/initializers/rollbar.rb
    #
    # https://stackoverflow.com/questions/36588449/how-to-prevent-rollbar-from-reporting-seo-crawlers-activities
    
    crawlers = %w[Facebot Twitterbot YandexBot bingbot AhrefsBot crawler MJ12bot Yahoo GoogleBot Mail.RU_Bot SemrushBot YandexMobileBot DotBot AppleMail SeznamBot Baiduspider]
    regexp = Regexp.new(Regexp.union(*crawlers).source, Regexp::IGNORECASE)
    
    Rollbar.configure do |config|
      ignore_bots = lambda do |options|
        agent = options.fetch(:scope).fetch(:request).call.fetch(:headers)['User-Agent']
        raise Rollbar::Ignore if agent.match?(regexp)
      end
    
      config.before_process << ignore_bots
    
      ...
    end
    

    ======================

    Be careful with the frozen_string_literal magic comment (it must be the first line of the file, and requires Ruby 2.3+), and use =~ instead of match? if your Ruby version is below 2.4.

    Here I use an array that is transformed into a regexp. I did this to prevent future syntax- and escaping-related mistakes by developers, and I added case-insensitive matching for the same reason.

    So in the regexp you will get a properly escaped Mail\.RU_Bot instead of something broken.

    Also, in your case you could simply use the single word bot instead of listing many crawlers, but be careful with unusual user agents. In my case, I want to know about all crawlers on my site, so I came up with this solution. Another example of how it works in practice: both crawler and crawler4j visit my production site; having just crawler in the array prevents notifications for both of them.
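To illustrate the matching behaviour described above (a standalone sketch with an abbreviated crawler list):

```ruby
# Regexp.union escapes special characters in each entry, and the
# resulting pattern matches anywhere in the user-agent string.
crawlers = %w[crawler Mail.RU_Bot Baiduspider]
regexp = Regexp.new(Regexp.union(*crawlers).source, Regexp::IGNORECASE)

regexp.match?('some-crawler4j/1.0') # substring "crawler" also covers crawler4j
regexp.match?('baiduspider')        # IGNORECASE handles casing differences
regexp.match?('MailxRU_Bot')        # the dot is escaped, so "x" does not match
```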

    One last thing: my solution is not very optimal, but it just works; I hope someone will share an optimized version of my code. That’s also the main reason I recommend sending data asynchronously, i.e. using sidekiq, delayed_job or whatever you prefer — don’t forget to check the related wikis.
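The async suggestion can be sketched like this (a hedged fragment assuming rollbar-gem's built-in queue helpers; check the gem's documentation for the options supported by your version):

```ruby
# config/initializers/rollbar.rb (fragment)
Rollbar.configure do |config|
  config.access_token = ENV['ROLLBAR_ACCESS_TOKEN']

  # Push reports through Sidekiq instead of reporting inside the
  # request cycle (requires the sidekiq gem):
  config.use_sidekiq 'queue' => 'rollbar'
  # ...or through Delayed::Job:
  # config.use_delayed_job
end
```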

    My answer is based on @AndrewSouthpaw’s solution (?), which wasn’t working for me. I hope the approved wiki-copy-pasted @Jesse Gibbs version will be moderated in some way.

    =======

    EDIT1: it’s a nice idea to check the https://github.com/ZLevine/rollbar-ignore-crawler-errors repo if you need to prevent Rollbar from reporting crawler errors in JS.
