A Plan for Comment Spam
Everybody fights comment spam sooner or later. I was quite (and almost pleasantly) surprised to notice that my new blog started receiving comment SPAM within days of its launch, even though so far I've put only a single link to it, and not a very visible one.
I've been bitten by comment spammers so many times that for this new blog I put together a solution from the start. It's based on a few principles, and so far it seems quite effective, though it's too early to draw conclusions.
1. Turing tests suck
I think we all agree that if you want someone to comment on your website, you should make it as easy as possible. Don't require an email address. Don't require them to enter the sum of 2 and 7. Or the name of the president of Romania.
But the worst thing: CAPTCHAs! I hate it when an ordinary blog asks me to type the letters I can make out in a mutilated image; sometimes I get them wrong and get back a fresh comment form, losing everything I wrote! That's real crap.
So my motto is: “no Turing tests”, if I want anyone to comment on my site.
The second principle is a cool idea I got from AltBlue. Each time I display the comment form, I generate a unique ID and put it in a hidden field. When the message is submitted, I store that ID in the Comments table in the DB, in a column that has a unique index on it. Therefore, the second time anyone submits the same ID, the insertion automatically fails, and I don't even care: it's most probably SPAM.
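Here is a minimal sketch of the unique-index half of the scheme, in Python with the standard-library sqlite3 module and an in-memory database. The table layout and the function names (new_form_id, open_db, save_comment) are my own assumptions for illustration, not the code this blog actually runs.

```python
import secrets
import sqlite3


def new_form_id() -> str:
    # 128 random bits, hex-encoded: unguessable and practically collision-free.
    return secrets.token_hex(16)


def open_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    # The UNIQUE constraint makes SQLite keep a unique index on form_id,
    # so a second insert with the same ID is rejected by the database itself.
    conn.execute(
        "CREATE TABLE comments ("
        " id INTEGER PRIMARY KEY,"
        " form_id TEXT NOT NULL UNIQUE,"
        " body TEXT NOT NULL)"
    )
    return conn


def save_comment(conn: sqlite3.Connection, form_id: str, body: str) -> bool:
    """Insert a comment; return False (treat it as spam) if the ID was already used."""
    try:
        # The connection context manager commits on success and rolls back on error.
        with conn:
            conn.execute(
                "INSERT INTO comments (form_id, body) VALUES (?, ?)",
                (form_id, body),
            )
        return True
    except sqlite3.IntegrityError:
        # Duplicate form_id: most probably a replayed form, so silently drop it.
        return False
```

Calling save_comment twice with the same ID stores the first comment and returns False for the second, leaving the table untouched; the application doesn't need any bookkeeping of its own to detect the replay.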
To reject IDs generated outside my application, I also save each one in the server-side session. So when I handle a “post comment” request, I fetch the ID that was passed through the form and compare it to the one in the session. If they don't match, it's SPAM. If they do match, I add the message to the DB (which will fail if the ID was used before: SPAM, don't care).
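The session check can be sketched like this, assuming a dict-like server-side session object (most web frameworks provide one, tied to the visitor through a cookie). The function names and the "comment_form_id" session key are hypothetical; the DB insert would follow only when the check passes.

```python
import hmac
import secrets


def render_comment_form(session: dict) -> str:
    """Issue a fresh one-time ID, remember it server-side, and embed it in the form."""
    form_id = secrets.token_hex(16)
    # Hypothetical session key; only the latest issued ID is kept.
    session["comment_form_id"] = form_id
    return (
        '<form method="post" action="/comment">'
        f'<input type="hidden" name="form_id" value="{form_id}">'
        '<textarea name="body"></textarea>'
        "</form>"
    )


def is_form_id_valid(session: dict, submitted_id: str) -> bool:
    """True only if the submitted ID is the one this session was issued."""
    expected = session.get("comment_form_id")
    if expected is None or not submitted_id:
        # No form was ever rendered for this session, or the field is missing: spam.
        return False
    # Compare as bytes so non-ASCII input can't raise; compare_digest avoids timing leaks.
    return hmac.compare_digest(expected.encode(), submitted_id.encode())
```

Keeping only the latest ID means a reader with two comment forms open can post from the most recent one only; storing a small set of issued IDs in the session would relax that at the cost of a little more state.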
To defeat this simple system, spammers would need to request and parse a fresh comment page for each crap message they want to post to my website, just to fetch the ID. Their scripts would also need to support cookies (so that the session holding the ID stays tied to their requests). I think most spammers wouldn't take the trouble to do all this.
If they do, though, I'll have to seriously consider content filters. I just learned today about CRM114, a great tool that makes statistical text analysis a piece of cake. They even wrote an email spam filter that is (they say) highly effective, in about 20 lines of code. Now that's amazing.
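To give a feel for what a statistical content filter does, here is a deliberately simple stand-in: a two-class naive Bayes word classifier in plain Python. This is not CRM114 or its algorithm (its classifiers are considerably more sophisticated), and the class name and training phrases are made up for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase word tokens; punctuation is ignored.
    return re.findall(r"[a-z0-9']+", text.lower())


class NaiveBayesFilter:
    """Multinomial naive Bayes over words, with add-one (Laplace) smoothing."""

    def __init__(self) -> None:
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.doc_counts = {"spam": 0, "ham": 0}

    def train(self, text: str, label: str) -> None:
        self.word_counts[label].update(tokenize(text))
        self.doc_counts[label] += 1

    def _log_score(self, words: list[str], label: str) -> float:
        total_docs = sum(self.doc_counts.values())
        # Smoothed log prior, so an untrained class never gives log(0).
        score = math.log((self.doc_counts[label] + 1) / (total_docs + 2))
        counts = self.word_counts[label]
        # Vocabulary size plus one slot reserved for words never seen in training.
        vocab = len(set(self.word_counts["spam"]) | set(self.word_counts["ham"])) + 1
        denom = sum(counts.values()) + vocab
        for w in words:
            score += math.log((counts[w] + 1) / denom)
        return score

    def is_spam(self, text: str) -> bool:
        words = tokenize(text)
        return self._log_score(words, "spam") > self._log_score(words, "ham")


if __name__ == "__main__":
    f = NaiveBayesFilter()
    for t in ["buy cheap pills now", "cheap viagra buy now", "win money cheap pills"]:
        f.train(t, "spam")
    for t in ["great post about perl sessions", "i enjoyed this article about blogs"]:
        f.train(t, "ham")
    print(f.is_spam("cheap pills buy"), f.is_spam("nice post about sessions"))
```

Working in log space keeps long comments from underflowing to zero, and the add-one smoothing keeps a single unseen word from vetoing a whole class.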