
A Plan for Comment Spam

Everybody fights comment spam sooner or later. I was quite (and almost pleasantly) surprised to see that my new blog started receiving comment SPAM within days of launch, even though I've only put up a single link to it so far, and not a very visible one.

I've been bitten so many times by comment spammers that for this new blog I put together a solution from the start. It's based on a few principles and so far seems quite effective, though it's too early to draw conclusions.

1. Turing tests suck

I think we all agree that if you want someone to comment on your website, you should make it as easy as possible.  Don't require an email address.  Don't require them to enter the sum of 2 and 7.  Or the name of the president of Romania.

But the worst thing: captchas! I hate it when an ordinary blog asks me to type the letters I can make out in a mutilated image; sometimes I get them wrong and get a fresh comment form, losing everything I wrote! That's real crap.

So my motto was: “no Turing tests”, if I want anyone to comment on my site.

2. Unique ID for each comment

That's a cool idea I got from AltBlue. Each time I display the comment form, I generate a unique ID and put it in a hidden field. When the message is submitted, I store that ID in the Comments table in the DB, in a column that has a unique index on it. So the second time anyone uses the same ID, the insertion automatically fails, and I don't even care. It's most probably SPAM.
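Here is a minimal sketch of the unique-index idea, using Python's built-in `sqlite3` as a stand-in for whatever database the blog really runs on. The table layout, the column names, and the `open_db`, `new_form_id`, and `save_comment` helpers are all made up for illustration; they are not the site's actual code.

```python
import secrets
import sqlite3


def open_db():
    """Create an in-memory comments table with a unique index on form_id."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE comments ("
        "id INTEGER PRIMARY KEY, form_id TEXT NOT NULL, body TEXT NOT NULL)"
    )
    # The unique index is what makes a reused ID fail on insert.
    db.execute("CREATE UNIQUE INDEX idx_comments_form_id ON comments (form_id)")
    return db


def new_form_id():
    """Generate a hard-to-guess ID to embed in the form's hidden field."""
    return secrets.token_hex(16)


def save_comment(db, form_id, body):
    """Insert the comment; return False if this form_id was already used."""
    try:
        with db:  # commits on success, rolls back on error
            db.execute(
                "INSERT INTO comments (form_id, body) VALUES (?, ?)",
                (form_id, body),
            )
        return True
    except sqlite3.IntegrityError:
        # Duplicate ID: most probably SPAM, so just drop it.
        return False
```

The nice part is that the application never has to look up whether an ID was used before; the database rejects the duplicate on its own.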

To reject IDs generated outside my application, I also save them in the server-side session. So, when I handle a “post comment” request, I fetch the ID that was passed through the form and compare it to the one in the session. If they don't match, it's SPAM. If they do match, I add the message to the DB (which will fail if the ID was previously used: SPAM, don't care).
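The session check could look roughly like the sketch below. It assumes the session behaves like a plain dictionary, as it does in many web frameworks; the `comment_form_id` key and both function names are hypothetical.

```python
import hmac
import secrets

SESSION_KEY = "comment_form_id"  # hypothetical session key


def issue_form_id(session):
    """Generate an ID, remember it in the session, and return it for the form."""
    form_id = secrets.token_hex(16)
    session[SESSION_KEY] = form_id
    return form_id


def form_id_matches(session, submitted_id):
    """True only if the submitted ID equals the one stored in the session."""
    expected = session.get(SESSION_KEY)
    if not expected or not submitted_id:
        return False
    # Constant-time comparison; encode so arbitrary form input cannot raise.
    return hmac.compare_digest(expected.encode(), submitted_id.encode())
```

A request with no session cookie arrives with an empty session, so it fails the check no matter what ID it carries.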

To defeat this simple system, spammers would need to request and parse a new comment page for each crap message they want to post to my website, just to fetch the ID. Their script would also need to support cookies (that's how I tie the ID to the session). I think most spammers wouldn't take the trouble to do all this, but if they do...

3. JavaScript magic

So if they do, I have another surprise. Somewhere, deep down in my JS code, I modify the comment form. The modification could be anything; I just add a new hidden field. If that field is missing from the submitted form, the comment gets tagged as SPAM. It still goes through everything else, like the check for a valid ID and the insertion into the DB, but it has the SPAM flag set by default, so it won't show up. From time to time, I'll manually check the SPAM-marked messages, and if I find legitimate comments I'll remove the SPAM flag. This could happen if the comment was submitted from a browser that doesn't support JavaScript (or has JS disabled), or if my script failed to work correctly.
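On the server, the flagging logic might be sketched like this. Only the server side is shown; the client side would be a line or two of JavaScript that appends the hidden input. The field name `js_check`, its value, and the record layout are assumptions for illustration.

```python
JS_FIELD = "js_check"  # hypothetical name of the field the script adds
JS_VALUE = "1"         # hypothetical value the script sets


def classify_submission(form):
    """Build the record to store; flag it as spam unless the JS field is there."""
    has_js_field = form.get(JS_FIELD) == JS_VALUE
    return {"body": form.get("body", ""), "is_spam": not has_js_field}


def visible_comments(records):
    """Only comments without the spam flag are shown on the page."""
    return [r for r in records if not r["is_spam"]]


def approve(record):
    """Manual review: clear the flag on a legitimate comment."""
    record["is_spam"] = False
```

Storing flagged comments instead of discarding them is what makes the manual review possible: a comment from a browser without JavaScript is hidden, not lost.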

Assuming a spammer gets through the ID requirements, would they also add a fully-fledged JavaScript interpreter with DOM support in their spamming scripts?  I guess not...

However, if they do, I'll just have to seriously consider content filters. I just learned today about CRM114, a great tool that makes statistical text analysis a piece of cake. Its authors even wrote an email spam filter that is (they say) highly effective, and it has about 20 lines of code. Now that's amazing.

Page info
Created: 2007/03/02 20:20
Modified: 2007/03/03 12:14
Author: Mihai Bazon
Comments: 14
Tags: spam, this site