
I’m trying to detect either of the following two kinds of bots:

  • A specific list of bots (FacebookExternalHit|LinkedInBot|TwitterBot|Baiduspider)
  • Any bots that don’t support the Crawlable Ajax Specification

I’ve seen similar questions (How to recognize Facebook User-Agent) but nothing that explains how to do this in Node and Express.

I need to do this in a format like this:

app.get("*", function (req, res) {
  // if the request comes from one of the bots: serve a snapshot
  // if it does not come from one of the bots:
  res.sendFile(__dirname + "/public/index.html");
});

3 Answers


  1. You can use the request.headers object to check whether the incoming request carries the User-Agent string of a particular bot. A simple example:

    Node

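    var http = require('http');
    
    var server = http.createServer(function(req, res){
    
        // The header can be missing, so default to an empty string.
        var userAgent = req.headers['user-agent'] || '';
    
        // The crawler's full User-Agent can carry extra text after the
        // product token, so match the prefix instead of the whole string.
        if (userAgent.startsWith('facebookexternalhit/1.1')) {
            /* do something for the Facebook bot */
        }
    
    });
    
    server.listen(8080);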
    

    Express

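    var express = require('express');
    var app = express();
    
    app.get('/', function(req, res){
    
        // The header can be missing, so default to an empty string.
        var userAgent = req.headers['user-agent'] || '';
    
        // The crawler's full User-Agent can carry extra text after the
        // product token, so match the prefix instead of the whole string.
        if (userAgent.startsWith('facebookexternalhit/1.1')) {
            /* do something for the Facebook bot */
        }
    
    });
    
    app.listen(8080);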
    
  2. You can read the User-Agent header from the request object and test its value against different bots.

    As of now, Facebook says it uses three types of User-Agent header values (check The Facebook Crawler). Twitter's User-Agent also carries a version (check Twitter URL Crawling & Caching). The example below should cover both bots.

    Node

    var http = require('http');
    var server = http.createServer(function(req, res){
    
        // The header can be missing, so default to an empty string.
        var userAgent = req.headers['user-agent'] || '';
        if (userAgent.startsWith('facebookexternalhit/1.1') ||
            userAgent === 'Facebot' ||
            userAgent.startsWith('Twitterbot')) {
    
            /* Do something for the bot */
        }
    });
    
    server.listen(8080);
    

    Express

    var express = require('express');
    var app = express();
    
    app.get('/', function(req, res){
    
        // The header can be missing, so default to an empty string.
        var userAgent = req.headers['user-agent'] || '';
        if (userAgent.startsWith('facebookexternalhit/1.1') ||
            userAgent === 'Facebot' ||
            userAgent.startsWith('Twitterbot')) {
    
            /* Do something for the bot */
        }
    });
    
    app.listen(8080);
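
The same idea can be stretched to the full list from the question with a single case-insensitive regular expression. What follows is only a rough sketch: `isListedBot`, `makeHandler` and `serveSnapshot` are made-up names for illustration, and the pattern is just the question's list, not a complete catalogue of crawlers.

```javascript
// Pattern taken from the list in the question. It is matched anywhere in the
// User-Agent and ignores case, since crawlers vary in capitalisation
// (e.g. "facebookexternalhit", "Twitterbot").
var BOT_PATTERN = /FacebookExternalHit|LinkedInBot|TwitterBot|Baiduspider/i;

// Hypothetical helper: true when the User-Agent matches one of the listed bots.
// A missing header is treated as "not a bot".
function isListedBot(userAgent) {
    return typeof userAgent === 'string' && BOT_PATTERN.test(userAgent);
}

// Builds an Express-style handler in the shape the question asks for.
// `serveSnapshot` is a placeholder callback supplied by the caller;
// `indexPath` is the absolute path of the regular index.html.
function makeHandler(serveSnapshot, indexPath) {
    return function (req, res) {
        if (isListedBot(req.headers['user-agent'])) {
            serveSnapshot(req, res);
        } else {
            res.sendFile(indexPath);
        }
    };
}
```

With Express this could presumably be mounted as `app.get('*', makeHandler(mySnapshotFn, __dirname + '/public/index.html'))`, where `mySnapshotFn` is whatever renders the snapshot in your setup.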
    
  3. This Express middleware analyzes a range of user agent strings and gives you a simple “bot==true” or “desktop==true” style of check. I haven’t used it, and the readme sounds like it was just a trial project, so I don’t know how well maintained it will be going forward, but it will detect all sorts of bots.

    https://github.com/rguerreiro/express-device
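
For the second option in the question, a library may not be needed at all. Crawlers that do support the Crawlable Ajax scheme announce themselves by requesting the page with an `_escaped_fragment_` query parameter, so only the bots that do not support it have to be recognised by User-Agent. A rough sketch of that combination, using only built-ins; `hasEscapedFragment` and `needsSnapshot` are made-up names, and the User-Agent list is just the one from the question:

```javascript
// Bots from the question that are assumed NOT to send `_escaped_fragment_`.
var NON_AJAX_BOTS = /FacebookExternalHit|LinkedInBot|TwitterBot|Baiduspider/i;

// True when the request URL carries the `_escaped_fragment_` query parameter,
// which is how a crawler supporting the scheme asks for a snapshot.
function hasEscapedFragment(requestUrl) {
    // req.url is relative, so a dummy base is needed for the URL parser.
    var parsed = new URL(requestUrl, 'http://localhost');
    return parsed.searchParams.has('_escaped_fragment_');
}

// Hypothetical helper: decides whether a request should get the snapshot.
// `req` only needs `url` and `headers`, as on a Node or Express request.
function needsSnapshot(req) {
    var userAgent = req.headers['user-agent'] || '';
    return hasEscapedFragment(req.url) || NON_AJAX_BOTS.test(userAgent);
}
```

Inside the `app.get("*", ...)` handler from the question, `needsSnapshot(req)` would then pick between serving the snapshot and calling `res.sendFile(...)`.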
