From time to time I like to post a bit more about running sites like STH. As many of our longtime readers (and many first-time visitors) may have figured out, we use WordPress as our CMS. Why? Well, it is simple, and when the site was started five years ago it was easy to set up. A crucial part of running any WordPress site is comment spam management.
Just about everyone knows that spam, or junk messages, are a part of Internet life. Fun fact: for those wondering how unsolicited junk messages ended up sharing a name with a canned meat product produced by Hormel, the answer seems to be a 1970s-era Monty Python skit. This is something I personally did not know, even having used the term myself for around two decades.
Across WordPress and the forum software, STH has now passed the 1 million spam messages mark. We use a combination of firewall rules and Akismet to filter out most of the WordPress comment spam. It turns out that the vast majority of comments posted are spam. There are a number of techniques that can be used to limit the number of submitted comments, but that is an arms race that is hard to win and fraught with trade-offs. One can use external logins or captchas to try to keep WordPress spam messages away, but these often end up inhibiting potential legitimate commenters as well. Comments can also be handled by an external service that one must register with, but that again inhibits commenters and significantly increased STH load times.
For those wondering, I did some analysis on 2013 data (mind you, this is dated now), and the trend was a comment-to-reader participation rate of about 0.07% in 2013. That is a big reason I am reluctant to entertain systems that inhibit participation and instead use filtering software. For larger sites, supporting Akismet is certainly a noble cause.
Of course, WordPress is not uniquely targeted by spam. We also have to deal with spam on the forums, both on the old software, vBulletin, and on the new platform, XenForo.
When I started STH, I was excited about the few WordPress spam comments that came in. At least some robot, somewhere, was able to find the site, and surely humans would soon find it too. It is nice to not be completely invisible. For the first few years the problem was one I could manage myself. The site was small enough that it was practical to read through every captured message. Now that is impossible to do. Just for a quick reference, here is what the spam trend looks like:
One can see that, as the site gained in popularity, it also attracted unnecessary attention.
On May 12, 2014 we finally received our 1 millionth spam message. For someone who is just starting out with a blog on WordPress, this rate is almost unfathomable. As an example, in March 2014 we averaged one unsolicited WordPress spam comment every 17.4 seconds.
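To put that interval in perspective, here is a rough back-of-the-envelope sketch in Python that converts "one comment every 17.4 seconds" into monthly and daily counts. It assumes the rate was constant across all 31 days of March, which is only an approximation; the function name is just an illustration and not something from our actual tooling.

```python
from datetime import timedelta


def comments_in_period(interval_seconds: float, days: int) -> float:
    """Estimate comments received over `days` days at one per `interval_seconds`.

    Assumes a perfectly steady arrival rate, which real spam traffic is not.
    """
    period_seconds = timedelta(days=days).total_seconds()
    return period_seconds / interval_seconds


# 17.4 seconds is the March 2014 average quoted above; March has 31 days.
march_2014 = comments_in_period(17.4, 31)
per_day = comments_in_period(17.4, 1)

print(f"{march_2014:,.0f} spam comments in the month")  # roughly 154 thousand
print(f"{per_day:,.0f} spam comments per day")          # roughly 5 thousand
```

In other words, that single month works out to roughly 15% of the 1 million total, which fits with how sharply the volume grew as the site became more popular.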
Akismet is awesome, so in the end this is just a lot of wasted electricity spent sending, receiving and then filtering out this spam. It does make one ponder old e-mail ideas like the 2009 Yahoo CentMail concept, where one would have to pay to send a message. That type of solution would have raised over $100,000 for charities, assuming comment spammers had to pay (and the donations could be collected).
Alas, this is an area on the site's to-do list. I have some ideas sketched up for potential solutions. Still, many of the “how to make a blog” books will say to install Akismet without any mention of the magnitude of the WordPress spam comment problem. It is very interesting that as STH became more popular, spam scaled accordingly. That suggests sites are being targeted for their SEO value, not just blindly based on vulnerabilities.
Hopefully this is a good data point for some of our (human) readers. There are precious few sites that provide this kind of data for aspiring WordPress users. Over the next month or so I plan to write up a bit more about some of the operational lessons of running STH over the last five years.