I am trying to write a script that will let me identify bots that are flooding my osCommerce site, but some bots change their IPs frequently, so I can't reliably use the IP address to ban them.
I was thinking that if I enabled sessions for bots, I could use the session IDs to block them.
I did some research on this and found that enabling sessions for bots is not recommended, but I am still not sure why.
3 Answers
Here is something I found:
Search bots may also get session IDs and may index the same page hundreds of times or more, since most bots will not retain their cookie state. This means duplicate content gets indexed, which can seriously hurt your search engine ranking.
Because a bot, by design, ignores cookie headers, it does not send the cookie back with each subsequent request. In effect, every request from the bot creates a new session. An aggressive bot on a large site can create hundreds or even thousands of phantom sessions that take up memory until they expire.
The problem with allowing bots to have a session is that a malicious bot will, in many cases, not maintain cookie state across the pages it crawls on your site, so each hit on your site by such a bot generates a new session.
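To make the phantom-session problem concrete, here is a minimal sketch (with hypothetical names, not osCommerce's actual session code) of a server-side session store. A client that returns its cookie reuses one session; a bot that ignores Set-Cookie forces a new session on every request:

```python
import uuid

session_store = {}  # session_id -> session data, held in memory until expiry


def handle_request(cookie_session_id=None):
    """Return the session ID used for this request, creating one if needed."""
    if cookie_session_id in session_store:
        return cookie_session_id  # client sent its cookie back: reuse session
    new_id = uuid.uuid4().hex     # no (valid) cookie: allocate a new session
    session_store[new_id] = {}
    return new_id


# A normal browser keeps its cookie, so 100 requests share one session.
sid = handle_request()
for _ in range(99):
    sid = handle_request(sid)
print(len(session_store))  # 1

# A bot that never sends the cookie back creates a session per request.
for _ in range(100):
    handle_request()
print(len(session_store))  # 101
```

The bot's 100 hits left 100 phantom sessions behind; on a busy site these accumulate until garbage collection expires them.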
Most bots will simply ignore the session ID because they know it is not really part of the URL. Otherwise they would have to index pages like index.php?sid=ABC, index.php?sid=BBC, index.php?sid=CBC, and so on. Since they know these are all the same page, they ignore the session IDs.
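The canonicalisation a well-behaved crawler applies can be sketched like this: strip the session-ID query parameter so all those variants collapse to one URL. The parameter names below are assumptions (osCommerce typically uses osCsid; sid mirrors the examples above):

```python
from urllib.parse import parse_qsl, urlencode, urlparse, urlunparse

# Query parameters treated as session IDs (illustrative set, not exhaustive).
SESSION_PARAMS = {"sid", "osCsid", "PHPSESSID"}


def canonical_url(url):
    """Return the URL with any session-ID query parameters removed."""
    parts = urlparse(url)
    query = [(k, v) for k, v in parse_qsl(parts.query)
             if k not in SESSION_PARAMS]
    return urlunparse(parts._replace(query=urlencode(query)))


print(canonical_url("http://example.com/index.php?sid=ABC&page=2"))
# http://example.com/index.php?page=2
print(canonical_url("http://example.com/index.php?sid=BBC&page=2"))
# http://example.com/index.php?page=2
```

Both session-stamped URLs reduce to the same canonical page, which is why a crawler that does this sees one page instead of hundreds of duplicates.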
Why don't you just block the bots based on the User-Agent header? Bots that don't identify themselves via the User-Agent can't really be blocked by anything other than IP address.
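A User-Agent check along those lines might look like the sketch below. The patterns are illustrative assumptions, not a maintained blocklist, and as the answer notes, a bot can spoof any User-Agent, so this only catches the ones that identify themselves:

```python
import re

# Illustrative patterns of agents to block (case-insensitive substring match).
BLOCKED_AGENTS = re.compile(r"badbot|scrapy|python-requests|curl", re.IGNORECASE)


def is_blocked(user_agent):
    """Return True if the User-Agent string matches a blocked pattern."""
    return bool(user_agent and BLOCKED_AGENTS.search(user_agent))


print(is_blocked("Mozilla/5.0 (compatible; BadBot/1.0)"))      # True
print(is_blocked("Mozilla/5.0 (Windows NT 10.0; rv:109.0)"))   # False
print(is_blocked(None))                                         # False
```

In practice this check would run before session creation, so identified bots never get a session at all.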