


I was asked recently about bots, agents and non-human traffic breakdown in web statistics. My reply reminded me how critical it is to have an in-depth understanding of web analytics whilst undertaking search engine optimisation work, so I thought I’d share some of it with you.

Most web site owners want to know the difference between real and automated visitors to their web sites. Using log file analysis tools alone can make this a tricky proposition. Some of the industry organisations that care deeply about the accuracy of web reports, usually because advertising sales depend on them, such as the ABCe, rely on a ‘recognised exclusions’ approach. This is a notoriously difficult area, and ‘recognised exclusions’ are never likely to prove accurate or responsive enough to what is going on in the real world.

You also have to bear in mind that not everyone wants to exclude all
scrapers from their sites, however perverse that may seem. And not all bad
bots are equally bad. Once bad bots are identified by User Agent or IP
address in some official list, they will simply mutate and carry on, just
like viruses.

Good bots should do as they are told – see WebmasterWorld’s exclusion list for how scary managing this issue is.
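
To make "do as they are told" concrete, here is a minimal sketch of the check a well-behaved bot might run before fetching a page, using Python's standard urllib.robotparser. The robots.txt content, the bot name ExampleScraper and the example.com URLs are all made up for illustration; they are not taken from any real exclusion list.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one named scraper is shut out entirely,
# everyone else is only kept out of /private/.
ROBOTS_TXT = """\
User-agent: ExampleScraper
Disallow: /

User-agent: *
Disallow: /private/
"""


def may_fetch(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt permits user_agent to request url."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

A good bot calls something like may_fetch() and backs off when it returns False. Nothing forces a bad bot to run the check at all, which is why robots.txt is only ever a polite request.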

Bad bots ignore a robots.txt file and frequently spoof their real identity,
so actually identifying them and their usually-hopefully-not-human behaviour
is a matter of diligent log file analysis, followed by patient .htaccess
modification and monitoring to exclude what YOU want excluded. Don’t forget
that they don’t appear in your JavaScript tagging web stats, because they
don’t process the JavaScript.
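
As a rough illustration of the kind of log file analysis involved, the sketch below flags clients that fetch several pages but never ask for a single image, stylesheet or script, which is one common sign of a spoofed "browser". It assumes the server writes the Apache combined log format; the asset suffix list and the min_pages threshold are arbitrary choices for the example, not recommended values, and a real investigation would look at far more than this one signal.

```python
import re
from collections import defaultdict

# Assumes the Apache "combined" log format; adjust for your own server.
LOG_LINE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

# Illustrative list of file types a real browser normally fetches with a page.
ASSET_SUFFIXES = (".css", ".js", ".gif", ".jpg", ".png", ".ico")


def suspected_bots(lines, min_pages=3):
    """Return sorted (ip, user_agent) pairs with >= min_pages page requests
    and no asset requests at all. min_pages is a hypothetical threshold."""
    pages = defaultdict(int)
    assets = defaultdict(int)
    for line in lines:
        match = LOG_LINE.match(line)
        if match is None:
            continue  # skip lines that are not in the expected format
        client = (match["ip"], match["agent"])
        path = match["path"].split("?", 1)[0].lower()
        if path.endswith(ASSET_SUFFIXES):
            assets[client] += 1
        else:
            pages[client] += 1
    return sorted(
        client
        for client, count in pages.items()
        if count >= min_pages and assets[client] == 0
    )
```

The output is a list of candidates to inspect by hand before anything goes into .htaccess, not a verdict. Text-only browsers and some proxies will trip the same heuristic.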

If your site has unique content that forms part of your commercial
differentiation, you can guarantee it is already splattered over the web,
helping some spammy directory compete with you in the search results. For
most sites the horse has already bolted, sadly.

Unless you have people actually doing the above, in anger, day to day, the information will be garbage in, garbage out. A cursory glance at your WebTrends, WebAbacus or IndexTools stats will not give the percentages needed for the analysis, because the bad bots are pretty well hidden.

This is also a big issue for popular sites, because voracious bots can generate a whole lot of expensive bandwidth usage.
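
To put a number on that bandwidth, a small sketch along these lines totals the bytes served per User Agent and turns the totals into percentages. It again assumes the Apache combined log format, and it takes the User Agent string at face value, which, as noted above, a bad bot will happily fake.

```python
import re
from collections import Counter

# Matches the tail of an Apache "combined" log line (an assumed format):
# ... "request" status size "referer" "user agent"
TAIL = re.compile(
    r'" (?P<status>\d{3}) (?P<size>\d+|-) "[^"]*" "(?P<agent>[^"]*)"\s*$'
)


def bandwidth_by_agent(lines):
    """Sum response sizes in bytes per User Agent string."""
    totals = Counter()
    for line in lines:
        match = TAIL.search(line)
        if match is None or match["size"] == "-":
            continue  # unparseable line, or no body was sent
        totals[match["agent"]] += int(match["size"])
    return totals


def bandwidth_share(totals):
    """Convert byte totals into a percentage of all bytes served."""
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {agent: 100.0 * size / grand_total for agent, size in totals.items()}
```

Running bandwidth_share(bandwidth_by_agent(open_log_file)) over a day of logs gives a first look at which agents are eating the transfer allowance; open_log_file here stands for whatever iterable of log lines you have to hand.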

Filed in Blog, September 14th, 2005.