Waving goodbye to spam bots has made both me and my blog seriously happy! The other week I saw that one of my referral channels was pointing towards Vice. To say I was excited would be an understatement…Vice had shared my post! That’s kind of cool, right? Wrong. In fact, Vice hadn’t shared my post at all. It was just a spam bot, one which had cleverly disguised itself as a credible referral channel. It was at this point that I decided enough was enough and I trawled through the internet to figure out how to block spam bots from visiting my blog.
Spam bots might make your blog views look mighty fine but they aren’t honest views and can actually be damaging to your blog’s credibility. I’d rather take a handful of good quality, honest views over thousands of spam bots. To name a few, spam referrals can skew your page views, click through rate and bounce rate. So, it really is best to nip them in the bud and increase your chances of viewing reliable statistics.
Now, I’m not saying I’m a blogging genius. I’m far from it. But these are the steps I followed to reduce the number of spam bots crawling my blog. Fingers crossed it can help a few of you guys out too!
HOW TO SPOT SPAM BOTS
It’s pretty easy to spot when traffic is coming from spam bots. Under the referrals section on Google Analytics, you may notice some of the following referral channels:
I’ve only listed a few of many, many spam bots. Basically if anything stands out as looking a little bit odd, chances are it probably is.
CREATE A NEW VIEW IN GOOGLE ANALYTICS
If you’re reading this, I’m going to go ahead and assume that you’re already tracking your blog stats through a Google Analytics account!
Before making any changes to your analytics account, I highly recommend that you create a separate view for your filters. That way your original view stays as it is and you always have the ability to see the raw, untouched and unfiltered data. To figure out how to create a new view, you can follow these simple steps on Google Analytics. I’ve named my view: Grandiose Days FILTERED VIEW.
APPLY A NEW FILTER
This stage will work through creating new filters to exclude any traffic from spam bots.
Before adding new filters, remember that you’ll want to apply the filter to the new view that you just created. In my case, that would be ‘Grandiose Days FILTERED VIEW’.
In the View column under Admin, select Filters.
On the next page, click ‘+ ADD FILTER’
The next page will allow you to create a new filter. For this task, we will create 2 filters to block spam referrals.
At this stage, most people will just create a filter that excludes spam bots. However, I did things a little differently and created two filters. The first was a hostname filter to eliminate any ghost spam traffic. The second was the more commonly used exclude filter for crawler spam bots. Ghost spam never actually visits your blog: it sends fake hits straight to your Google Analytics account. Crawler spam, on the other hand, really does visit your blog and leaves a spam referral behind. Or at least, that’s my very simplistic view of the two.
First of all, I’ll focus on how to create the hostname filter. However, if you’re only interested in the exclude filter then jump to the end of this post and just follow the final stages to block crawler spam bots.
APPLY AN INCLUDE FILTER TO BLOCK GHOST SPAM BOTS
This step will be a little bit longer as I talk through how to create a hostname filter. Bear with me, it’ll be worth it!
To create a hostname filter you’ll need:
- A list of your valid hostnames
- To create a hostname regular expression (REGEX)
You can find your hostnames by going to Reporting and then, in the sidebar, clicking Audience > Technology > Network.
Click the blue ‘hostname’ text above the table and this will display a list full of hostnames currently being used for your website. Make note of all of the real hostnames. These will be recognisable as they are likely to be your primary domain or a subdomain – for instance, some of mine were www.grandiosedays.com (primary domain) and then subdomains from my blogspot redirect (e.g. grandiosedays.blogspot.com). Remember to make a list of all of these hostnames!
The invalid (or fake) hostnames would be any other listed host that you do not own or control. As an example of some fake hostnames, sites such as foxnews.com, (not set), bbc.com and usatoday.com were appearing in my hostname list. We can ignore these ones.
Now that you’ve got a list of your valid hostnames, you can begin building your hostname REGEX.
Your hostname REGEX will match your list of valid hostnames. To separate each hostname, use this symbol | (Alt + 124), otherwise known as the bar character. In REGEX, a dot . on its own matches any character, so add a backslash in front of it (\.) if you want it to match a real dot. A hyphen - is already treated as a plain character here, so putting a backslash in front of it is optional.
For your hostname REGEX, you’ll be pleased to know that you do not have to type every single hostname in full. If several of your hostnames include the same key word, simply entering that key word should match all of the valid hostnames that include it. For example, if I enter ‘grandiosedays’ as my hostname REGEX, it will match every hostname that includes this phrase. Make sure that your hostname REGEX covers all of the valid hostnames you wrote down, or else you’ll lose out on significant data.
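If you’d like to sanity-check a pattern before pasting it into Google Analytics, here’s a minimal Python sketch. It assumes that Python’s `re` module behaves closely enough to Google Analytics for simple patterns like these (that’s an assumption on my part), and the function name `build_hostname_regex` is made up purely for illustration. The hostnames are the ones mentioned in this post.

```python
import re


def build_hostname_regex(hostnames):
    """Join hostnames into one pattern, separated by the bar character."""
    # re.escape adds a backslash in front of special characters such as dots,
    # so that they only match themselves.
    return "|".join(re.escape(name) for name in hostnames)


valid_hostnames = ["www.grandiosedays.com", "grandiosedays.blogspot.com"]
full_pattern = build_hostname_regex(valid_hostnames)
print(full_pattern)

# The shortcut: one shared key word matches every hostname that contains it.
short_pattern = "grandiosedays"
for host in valid_hostnames + ["foxnews.com", "(not set)"]:
    print(host, "->", bool(re.search(short_pattern, host)))
```

Both patterns should keep the two real hostnames and reject the fake ones. The short version is just less typing.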
My hostname REGEX looks like this:
Now that you have your hostname REGEX (hooray!), you can start adding this information to the new filter you added earlier. So, let’s jump back to that filter.
It should currently look nice and blank, like this:
You will want to name this filter ‘hostname’, change the filter type to Custom and ensure that the ‘Include’ option is selected.
Set the filter field as Hostname and then in the Filter Pattern box type (or copy and paste) the hostname REGEX you created a few moments ago. This page in Google Analytics should now look something like this:
If you click ‘Verify this filter’ it should show you a sneak peek of your filter in action. Once you are happy with your filter, hit the Save button!
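For a rough idea of what an Include filter does to your data, here’s a small Python sketch. It is only a simulation under my own assumptions: `apply_include_filter` is a hypothetical helper rather than anything from Google Analytics, and the sample list simply reuses the real and fake hostnames mentioned earlier in this post.

```python
import re


def apply_include_filter(hostnames, pattern):
    """Keep only the hostnames that match the pattern (partial match)."""
    regex = re.compile(pattern)
    return [host for host in hostnames if regex.search(host)]


# Hostnames as they might appear in the Network report.
seen_hostnames = [
    "www.grandiosedays.com",
    "foxnews.com",
    "grandiosedays.blogspot.com",
    "(not set)",
    "bbc.com",
    "usatoday.com",
]

kept = apply_include_filter(seen_hostnames, "grandiosedays")
for host in kept:
    print(host)
```

Only the two grandiosedays hostnames survive. Everything else, including (not set), is dropped without having to be listed by name.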
And you’re done! Yay! The hostname filter should block most of the spam traffic that was showing up in your stats. This method should be more effective than creating an exclude filter as it means you don’t need to update it every time a new spam bot appears. However, the odd one or two might still slip through the net. Although this method should keep the vast majority of pesky spam bots out of your statistics, pair it with an exclude filter (the one I talked about earlier) to increase your protection. If you’re not bored by my wittering on…keep reading to learn how to apply an exclude filter.
APPLY AN EXCLUDE FILTER TO BLOCK CRAWLER SPAM BOTS
If you want to create a filter to exclude spam referrals, then you can also create a crawler spam REGEX. This is done in a similar way to the hostname REGEX, except this time you will focus on the source (referral) name.
Make a list of all of the spam referrals that appear in your Referrals Report and create a REGEX based on this list. There are websites out there that keep an updated list of spam sites, so you can always use these to include keywords or additional site names that you are sure are the result of spam bots.
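Here’s a minimal Python sketch of turning a list like that into a REGEX and checking what an Exclude filter would remove. The two spam sources are hypothetical placeholders (they are not a real blocklist, so swap in whatever you spot in your own Referrals Report), the helper names are made up, and I’m assuming Python’s `re` module is a close enough stand-in for Google Analytics on simple patterns.

```python
import re

# Hypothetical spam sources for illustration only.
spam_sources = ["spam-referrer.example", "free-traffic.example"]


def build_exclude_regex(sources):
    """Join spam sources with the bar character, escaping special characters."""
    return "|".join(re.escape(source) for source in sources)


def apply_exclude_filter(sources_seen, pattern):
    """Drop every source that matches the pattern and keep the rest."""
    regex = re.compile(pattern)
    return [source for source in sources_seen if not regex.search(source)]


crawler_pattern = build_exclude_regex(spam_sources)

# Made-up sources as they might appear in a Referrals Report.
sources_seen = ["google", "spam-referrer.example", "twitter.com", "free-traffic.example"]
clean_sources = apply_exclude_filter(sources_seen, crawler_pattern)
print(clean_sources)
```

Unlike the hostname filter, this list only catches the spam sources you already know about, so it needs topping up whenever a new one appears.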
My crawler spam REGEX looks like this (feel free to copy it for your own use and add any others you’ve also spotted):
Then, double-checking that you’re working in your filtered view on Google Analytics, add another new filter.
Name the filter ‘crawler spam’ and set the filter type to Custom.
This time, select the Exclude option and, in the Filter Field drop-down menu, select ‘Campaign Source’.
In the filter pattern, paste the crawler spam REGEX you just created. This filter should now look something like this:
Again, once you are happy that you’ve followed all of the steps accurately, hit the save button!
WAVE GOODBYE TO THE SPAM BOTS
And voila! You can wave goodbye to spam bots. Hopefully with these two filters now in place you should no longer see spam referrals in your analytics from now on. Bear in mind that filters only apply to data collected after you set them up, so any spam already sitting in your older reports will stay there. Keep a close watch on your analytics for the next week or so to make sure the filters have worked their magic. If they haven’t worked, then it’s possible that the filter pattern needs amending.
I am by no means an expert when it comes to Google Analytics so fingers crossed this technique works for you as well as it has worked for me.
Please let me know if it has been of any help at all :)!