
One of our production Magento 2.4.5 websites was going down frequently. After investigating the case in detail, we found that it is receiving excessive crawler requests from meta-externalagent. See a sample log entry below:

57.141.0.9 - - [xx/Dec/2024:12:xx:09 +0530] "GET /our-xxxxxxxxxs/ms-xxx?karat=23&size=6%2C7%2C13%2C16%2C17%2C18%2C24 HTTP/1.1" 200 780515 "-" "meta-externalagent/1.1 (+https://developers.facebook.com/docs/sharing/webmasters/crawler

Checking further, I found in the logs that we received 64,941 requests from "meta-externalagent/1.1" in 12 hours, which is roughly 1.5 requests per second on average.

Many people seem to be facing similar issues, but I have not found a clear solution for Magento 2.4.5. Related threads:

excessive traffic from facebookexternalhit bot

https://developers.facebook.com/community/threads/992798532416685/

Is there any way to rate-limit the Meta crawler? Since we run Facebook ads, we cannot block requests from meta-externalagent completely.

For now, I have blocked meta-externalagent using the 7G Firewall in nginx.


Answers


  1. Chosen as BEST ANSWER

    I went through many discussion threads, including the one linked below. I tried a combination of the solution posted by Ivan Shatsky and the configuration below, and it is working. I will monitor performance for a few days.

    https://github.com/kbourdakos/facebook-UA-facebookexternalhit-1.1---RateLimit-Using-nginx/blob/main/configuration


  2. You can rate-limit this bot using nginx's built-in rate limiting functionality. According to the "Meta Web Crawlers" Facebook article, the User-Agent HTTP header for this bot can be either

    meta-externalagent/1.1 (+https://developers.facebook.com/docs/sharing/webmasters/crawler)
    

    or

    meta-externalagent/1.1
    

    However, since the article may not list all possible User-Agent values, you should check your nginx access logs to be sure.
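    One way to do that check is to log the User-Agent of matching requests to a separate file for a while. The sketch below is an assumption-based example: the log path /var/log/nginx/meta-ua.log, the format name meta_ua and the variable $log_meta_ua are hypothetical. It uses a case-insensitive regex only because this is a temporary diagnostic, not the rate-limiting map itself:

    ```nginx
    # http context
    map $http_user_agent $log_meta_ua {
        "~*meta-externalagent"  1;   # any User-Agent containing the bot name
        default                 0;   # access_log skips requests where this is 0
    }
    log_format meta_ua '$remote_addr [$time_local] "$request" $status "$http_user_agent"';

    server {
        # your magento server section
        # access_log is inherited from the http level only if this block defines none,
        # so keep (or copy) your regular access_log line here as well.
        access_log /var/log/nginx/meta-ua.log meta_ua if=$log_meta_ua;
    }
    ```

    After collecting some traffic, the distinct User-Agent strings in that file are the exact values to list in the rate-limiting map.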

    To limit requests from this bot, you can add the following snippet to your nginx configuration:

    # http context
    map $http_user_agent $meta_crawler {
        # Any other User-Agent gets an empty value, and nginx does not
        # account requests with an empty key against the limit
        "meta-externalagent/1.1 (+https://developers.facebook.com/docs/sharing/webmasters/crawler)"  1;
        "meta-externalagent/1.1"                                                                     1;
    }
    # Limit meta crawler to 30 requests per minute; adjust according to your needs
    limit_req_zone $meta_crawler zone=meta:10m rate=30r/m;
    
    server {
        # your magento server section
        limit_req zone=meta burst=5 nodelay;
        limit_req_status 429;
        # ... the rest of your nginx configuration
    }
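    With the constant value 1, all Meta crawler requests share a single bucket, so 30r/m is the combined limit across every crawler IP address. If you would rather limit each crawler IP separately, map values can contain variables. A sketch to use in place of the map and zone above, assuming the same User-Agent strings; the variable $meta_crawler_ip and the zone name meta_per_ip are hypothetical:

    ```nginx
    # http context
    map $http_user_agent $meta_crawler_ip {
        # No default: non-matching requests get an empty key and are not limited
        "meta-externalagent/1.1 (+https://developers.facebook.com/docs/sharing/webmasters/crawler)"  $binary_remote_addr;
        "meta-externalagent/1.1"                                                                     $binary_remote_addr;
    }
    # Each crawler IP gets its own 30 requests per minute
    limit_req_zone $meta_crawler_ip zone=meta_per_ip:10m rate=30r/m;

    server {
        # your magento server section
        limit_req zone=meta_per_ip burst=5 nodelay;
        limit_req_status 429;
    }
    ```

    The trade-off is that the crawler's total rate then grows with the number of IP addresses it uses, so the shared bucket gives a tighter overall cap on load.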
    

    It is recommended to include the Retry-After HTTP header in the HTTP 429 response; you can do it as follows:

    server {
        limit_req zone=meta burst=5 nodelay;
        limit_req_status 429;
        error_page 429 @err429;
        location @err429 {
            # "always" makes nginx add the header to error responses such as 429 too
            add_header Retry-After 30 always;
            return 429;
        }
        # ... the rest of your nginx configuration
    }
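    Before enforcing the limit on a production store, you may want to see how many requests it would reject. nginx 1.17.1 and later provide limit_req_dry_run, and 1.17.6 and later expose the $limit_req_status variable. A sketch that logs the decision for each crawler request, assuming the meta zone and $meta_crawler map above; the format name meta_limit and the log path are hypothetical:

    ```nginx
    # http context
    log_format meta_limit '$remote_addr [$time_local] "$request" $status $limit_req_status';

    server {
        # your magento server section
        limit_req zone=meta burst=5 nodelay;
        # Evaluate the limit but never reject; would-be rejections show up
        # as REJECTED_DRY_RUN in $limit_req_status
        limit_req_dry_run on;
        # Log only crawler requests ($meta_crawler is empty for everything else);
        # keep your regular access_log line in this block as well
        access_log /var/log/nginx/meta-limit.log meta_limit if=$meta_crawler;
    }
    ```

    Once the share of REJECTED_DRY_RUN entries looks acceptable, removing limit_req_dry_run starts returning 429 responses.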
    

    Although you can use regular expressions in the map block, e.g.:

    map $http_user_agent $meta_crawler {
        "~^meta-externalagent/"  1;
    }
    

    I do not recommend it for performance reasons (read this nginx support forum post for the details).
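    If your access logs also show facebookexternalhit (the crawler discussed in the related thread linked in the question), the same exact-match approach can cover it by replacing the map from the first snippet. The facebookexternalhit strings below are assumptions based on Meta's crawler documentation; substitute the exact values from your own logs:

    ```nginx
    # http context: exact string matches only, no regex
    map $http_user_agent $meta_crawler {
        "meta-externalagent/1.1 (+https://developers.facebook.com/docs/sharing/webmasters/crawler)"  1;
        "meta-externalagent/1.1"                                                                     1;
        # Assumed facebookexternalhit values; verify against your access logs
        "facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)"                  1;
        "facebookexternalhit/1.1"                                                                    1;
    }
    ```

    Because every value is 1, both crawlers share one bucket in the meta zone; mapping facebookexternalhit to a different value such as 2 would give it its own bucket with the same rate.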
