The Geekess   Linux, bicycling, open source, gardening, amateur rockets, and other seemingly unrelated hobbies.

reCaptcha Comments work!

I now have reCaptcha comment verification working on this blog, which means comments are re-enabled. This is my first time dabbling with Python, so the results are not polished. :) Details and patches below.

reCaptcha is a service that provides captchas (the distorted word images you see in the box below) and a way to verify that a user has typed in the correct words. reCaptcha is cool because it helps digitize books.

reCaptcha shows two words. The first is a control word that has already been distorted using standard distortion tools, so its correct answer is known. If the user gets this word wrong, they are not allowed to submit a comment. The second word comes from a book that is being digitized. A computer spell checker (or perhaps a human reader) has flagged the word as digitized incorrectly, so it is passed to the reCaptcha database to be decoded by the people who comment on my blog. When enough people provide the same answer, that answer replaces the flagged word in the digitized book.
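
The two-word scheme can be sketched in a few lines. This is only an illustration of the idea, not reCaptcha's actual implementation: the function names, the data layout, and the threshold of three matching answers are all assumptions.

```python
from collections import Counter

# Hypothetical threshold: how many matching answers count as "enough people".
AGREEMENT_THRESHOLD = 3


def check_submission(control_answer, control_truth, unknown_answer, votes):
    """Accept the user only if the control word is right.

    `votes` is a Counter of answers collected so far for the unknown word.
    The unknown-word answer is recorded only when the control word passes.
    """
    if control_answer.strip().lower() != control_truth.lower():
        return False
    votes[unknown_answer.strip().lower()] += 1
    return True


def agreed_word(votes):
    """Return the consensus answer once enough people agree, else None."""
    if not votes:
        return None
    word, count = votes.most_common(1)[0]
    return word if count >= AGREEMENT_THRESHOLD else None
```

Only answers from users who pass the control word are counted, which is what lets the known word vouch for the unknown one.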

reCaptcha is an awesome example of using our collective intelligence for good. It's great for me because I'll stop receiving 400+ spam comments a day.

To enable reCaptcha comments:

  1. I got a public and private reCaptcha API key for my domain.
  2. I installed the python-recaptcha Debian package on my server.
  3. I patched the PyBlosxom comment plugin to call the python-recaptcha library at the correct times.
  4. I edited the HTML skin of my blog (the "flavor") to hold a variable that would be replaced with the HTML for the reCaptcha image.

Number 4 was fairly simple:


diff --git a/flavors/cherry.flav/comment-form.cherry b/flavors/cherry.flav/comment-form.cherry
index eb112b3..a6b50ba 100644
--- a/flavors/cherry.flav/comment-form.cherry
+++ b/flavors/cherry.flav/comment-form.cherry
@@ -19,6 +19,7 @@ URL:<br />
 Comment:<br />
 <textarea cols="50" name="body" rows="12"></textarea><br />
 <br />
+$recaptcha_html
 <input name="Submit" type="submit" value="Submit" />
 </form>
 </p>
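
For readers who haven't used PyBlosxom flavors: the template contains $-prefixed variables that get filled in from the entry's data when the page is rendered. The sketch below imitates that with Python's standard string.Template. It is a stand-in for illustration, not PyBlosxom's renderer, and the widget markup is a placeholder rather than real reCaptcha output.

```python
from string import Template

# A cut-down version of the comment-form flavor with the new variable.
COMMENT_FORM = Template(
    '<textarea cols="50" name="body" rows="12"></textarea><br />\n'
    "$recaptcha_html\n"
    '<input name="Submit" type="submit" value="Submit" />\n'
)


def render_comment_form(entry):
    """Fill in the flavor; unknown variables are left as-is rather than failing."""
    return COMMENT_FORM.safe_substitute(entry)


# Placeholder markup standing in for whatever captcha.displayhtml() returns.
entry = {"recaptcha_html": "<div>captcha widget goes here</div>"}
html = render_comment_form(entry)
```

If the variable is never set, the literal text $recaptcha_html stays in the page, which is a handy sign that the plugin side of the change is missing.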

Number 3 was the tricky part. Since I hadn't used Python before, I was trying to figure out syntax and what the code was doing at the same time. PyBlosxom has this concept of "callbacks" that are basically hooks into the HTML generation and comment processing. I could have defined a comment hook instead of patching the comment plugin, but it took me about three hours to write this simple patch, and I didn't feel like perfecting it.


diff --git a/plugins/comments.py b/plugins/comments.py
index c7f9dbe..088fb46 100644
--- a/plugins/comments.py
+++ b/plugins/comments.py
@@ -203,6 +203,7 @@ from xml.sax.saxutils import escape
 from Pyblosxom import tools
 from Pyblosxom.entries.base import EntryBase
 from Pyblosxom.renderers import blosxom
+from recaptcha.client import captcha

 LATEST_PICKLE_FILE = 'LATEST.cmt'

@@ -767,12 +768,16 @@ def cb_prepare(args):
             cdict['email'] = form['email'].value

         cdict['ipaddress'] = pyhttp.get('REMOTE_ADDR', '')
+        recaptcha_ipaddr = cdict['ipaddress']
+        if recaptcha_ipaddr.startswith("::ffff:"):
+            recaptcha_ipaddr = recaptcha_ipaddr[7:]

         # record the comment's timestamp, so we can extract it and send it
         # back alone, without the rest of the page, if the request was ajax.
         data['cmt_time'] = float(cdict['pubDate'])

This code imports the reCaptcha Python library and strips the "::ffff:" prefix from the client address, which turns an IPv4-mapped IPv6 address into a plain IPv4 address. reCaptcha can't deal with IPv6 addresses right now, and it took me quite a bit of debugging to figure that out. To debug, I made the code print all the variables passed to the reCaptcha API by appending them as HTML to the comment_message variable in the code below.
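
A more defensive way to do the same conversion, as a sketch: this assumes Python 3's standard ipaddress module, which postdates the patch above, and the function name is made up for illustration.

```python
import ipaddress


def to_ipv4_if_mapped(addr):
    """Return the IPv4 form of an IPv4-mapped IPv6 address, else addr unchanged."""
    try:
        parsed = ipaddress.ip_address(addr)
    except ValueError:
        return addr  # not a valid IP literal; pass it through untouched
    # Only IPv6 addresses carry ipv4_mapped; it is None when not a mapped address.
    mapped = getattr(parsed, "ipv4_mapped", None)
    return str(mapped) if mapped is not None else addr
```

Parsing the address instead of slicing the string means a genuine IPv6 address or an empty REMOTE_ADDR passes through unchanged rather than being cut up.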


         argdict = { "request": request, "comment": cdict }
+       # sas - XXX - probably want to check with recaptcha here.
         reject = tools.run_callback("comment_reject",
             argdict,
             donefunc=lambda x:x != 0)

The comment marks the spot where I noticed a comment_reject callback that might be useful. I would probably use that if I were developing a reCaptcha PyBlosxom plugin.
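
Here is a rough sketch of what such a hook could look like. Everything in it is hypothetical: it assumes the captcha fields have been copied into the comment dict (the patch in this post reads them from the form instead), the verifier is a stub so the example runs without network access, and run_chain is a simplified stand-in for the callback machinery, not PyBlosxom's tools.run_callback. The error string is just an example value.

```python
def make_comment_reject(verify):
    """Build a comment_reject-style hook around a captcha verifier.

    `verify` is a hypothetical callable taking (challenge, response, ip)
    and returning (is_valid, error_code).
    """
    def cb_comment_reject(args):
        comment = args["comment"]
        is_valid, error_code = verify(
            comment.get("recaptcha_challenge_field", ""),
            comment.get("recaptcha_response_field", ""),
            comment.get("ipaddress", ""),
        )
        if is_valid:
            return 0  # 0 means "do not reject"
        return (1, "reCaptcha check failed: %s" % error_code)

    return cb_comment_reject


def fake_verify(challenge, response, ip):
    """Stand-in verifier so the sketch runs without contacting any service."""
    if response == "correct words":
        return (True, None)
    return (False, "incorrect-captcha-sol")


def run_chain(funcs, args, donefunc):
    """Simplified callback chain: stop at the first result donefunc accepts."""
    for func in funcs:
        result = func(args)
        if donefunc(result):
            return result
    return 0  # nothing in the chain rejected the comment
```

With the same donefunc as in the snippet above (lambda x: x != 0), a failed captcha ends the chain with a (code, message) pair, and a passing one lets the comment through.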


@@ -781,9 +786,19 @@ def cb_prepare(args):
             reject_code, reject_message = reject
         else:
             reject_code, reject_message = reject, "Comment rejected."
+
+        recaptcha_reject = captcha.submit(
+                form['recaptcha_challenge_field'].value,
+                form['recaptcha_response_field'].value,
+                config['comment_recaptcha_private'],
+                recaptcha_ipaddr)
+
         if reject_code == 1:
             data["comment_message"] = reject_message
             data["rejected"] = True
+        elif recaptcha_reject.is_valid == False:
+            data["comment_message"] = recaptcha_reject.error_code
+            data["rejected"] = True
         else:
             data["comment_message"] = writeComment(request, config, data, \
                     cdict, encoding)

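This code in cb_prepare() submits the words the user typed to the reCaptcha API. If the answer is not valid, the comment is rejected and the reCaptcha error code is shown as the comment message.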
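
The accept/reject decision in that hunk can be pulled out into a small function to make the ordering easier to see. This is a sketch, not part of the patch: CaptchaResult is a stand-in that models only the two attributes the patch uses (is_valid and error_code), and decide is a made-up name.

```python
from collections import namedtuple

# Stand-in for the response object returned by captcha.submit(); only the
# two attributes used in the patch (is_valid, error_code) are modelled.
CaptchaResult = namedtuple("CaptchaResult", ["is_valid", "error_code"])


def decide(reject_code, reject_message, captcha_result):
    """Return (rejected, message) following the if/elif/else in the patch.

    message is None when the comment is accepted and should be written.
    """
    if reject_code == 1:
        return True, reject_message
    if not captcha_result.is_valid:
        return True, captcha_result.error_code
    return False, None
```

Note that in the patch captcha.submit() runs before the if, so the reCaptcha service is contacted even for comments that another check has already rejected.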


@@ -1009,6 +1024,7 @@ def cb_story_end(args):
             rejected['cmt_description'] = msg
             rejected['cmt_description_escaped'] = escape(msg)
             renderer.outputTemplate(output, rejected, 'comment')
+        entry['recaptcha_html'] = captcha.displayhtml(config['comment_recaptcha_public'], True)
         renderer.outputTemplate(output, entry, 'comment-form')
         args['template'] = template +u"".join(output)

Finally, the added line in cb_story_end() replaces the recaptcha_html variable in the flavor HTML with the HTML for the reCaptcha images and text box.

The full comment plugin patch is here. It builds on my patch to remove the sendmail arguments from the comment plugin, which is here. The HTML flavor patch is here. If you're having problems leaving comments, feel free to send me an email.

Maybe later I'll clean up this patch and submit it for the PyBlosxom plugin registry. Someone else is free to take the code, clean it up, and submit too. :) Open source is wonderful that way.



Posted by S. Pin at Thu Dec 18 22:43:27 2008

Hi,
How is Python running on your server side? You are running a Twisted server or Django or something else?

Posted by Sarah Sharp at Fri Dec 19 00:53:05 2008

S. Pin: The server-side blog software is called Pyblosxom.  It's older, and a little hard to configure, but I have it mostly set up the way I want it now.

Posted by jobu at Sun Feb 22 08:31:29 2009

Just testing your recaptcha thing.

Posted by Philip Paeps at Fri Mar 20 04:51:19 2009

Thanks for this post, Sarah.  I've just added this to my blog too.  I had to change the patch a little to fit in with my mangled comments.py, but it seems to be working.

Why is Python such a nightmare? :-)

Posted by torrents search at Mon Nov 9 07:27:12 2009

thanks for sharing this! useful and much appreciated

