<?xml version="1.0" encoding="utf-8"?>
<!-- name="generator" content="pyblosxom/1.4.3 01/10/2008" -->
<!DOCTYPE rss PUBLIC "-//Netscape Communications//DTD RSS 0.91//EN" "http://my.netscape.com/publish/formats/rss-0.91.dtd">

<rss version="0.91">
<channel>
<title>The Geekess</title>
<link>http://sarah.thesharps.us</link>
<description>Linux, bicycling, open source, gardening, amateur rockets, and other seemingly unrelated hobbies.</description>
<language>en</language>
<item>
  <title>reCaptcha Comments work!</title>
  <link>http://sarah.thesharps.us/2008-12-07-13-14.html</link>
  <description><![CDATA[
<p><img
src="http://lh5.ggpht.com/_KnE2M8e3X8Q/STw6walH8nI/AAAAAAAABMA/vSyq6lOidNo/s400/smallCaptchaSpaceWithRoughAlpha.png"
/></p>

<p>I now have <a href="http://recaptcha.net/">reCaptcha</a> comment verification
working on this blog, which means comments are re-enabled.  This is my first
time dabbling with Python, so the results are not polished. :)  Details and
patches below.</p>


<p>reCaptcha is a service that provides captchas (the distorted word images you see
in the box below) and a way to verify that a user has typed in the correct
words.  reCaptcha is cool because it helps digitize books.</p>

<p>Each reCaptcha challenge shows two words.  The first is a control word whose
answer is already known, distorted using standard tools.  If the user gets this
word wrong, they are not allowed to submit a comment.  The second word comes
from a book that is being digitized.  A computer spell checker (or perhaps a
human reader) has flagged it as digitized incorrectly, so it is passed to the
reCaptcha database to be decoded by people who comment on my blog.  When enough
people provide the same answer, that answer replaces the word in the digitized
book.</p>
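
<p>The two-word scheme can be sketched roughly as below.  This is only a toy
model, not reCaptcha's actual implementation: the function names, the
case-insensitive matching, and the threshold of three matching answers are all
assumptions made for illustration.</p>

```python
from collections import Counter


def check_control_word(expected, typed):
    """Return True when the typed control word matches the known answer.

    Matching ignores case and surrounding spaces (an assumption).
    """
    return typed.strip().lower() == expected.strip().lower()


def consensus(answers, min_votes=3):
    """Return the most common answer once it has min_votes votes, else None.

    min_votes is a hypothetical threshold; the real service's rule is not
    described in this post.
    """
    if not answers:
        return None
    counts = Counter(a.strip().lower() for a in answers)
    word, votes = counts.most_common(1)[0]
    return word if votes >= min_votes else None
```

<p>Only answers from users who got the control word right would be fed into
something like consensus().</p>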

<p>reCaptcha is an awesome example of using our collective intelligence for good.
It's great for me because I'll stop receiving 400+ spam comments a day.</p>

<p>To enable reCaptcha comments:</p>

<ol>
<li>I <a href="http://recaptcha.net/whyrecaptcha.html">got a public and
private reCaptcha API key</a> for my domain.</li>
<li>I installed the python-recaptcha Debian package on my server.</li>
<li>I patched the PyBlosxom comment plugin to call the python-recaptcha library
at the correct times.</li>
<li>I edited the HTML skin of my blog (the "flavor") to hold a variable that
would be replaced with the HTML for the reCaptcha image.</li>
</ol>
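
<p>The patched plugin looks the keys up in the PyBlosxom config under
comment_recaptcha_public and comment_recaptcha_private.  A minimal sketch of the
matching config.py entries, assuming the usual layout where settings live in a
dict named py; the key values here are placeholders:</p>

```python
# In a real config.py the py dict already exists; it is created here only
# so the fragment runs on its own.
py = {}

# Placeholder values: substitute the keys issued for your own domain.
py["comment_recaptcha_public"] = "YOUR-PUBLIC-KEY"
py["comment_recaptcha_private"] = "YOUR-PRIVATE-KEY"
```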

<p>Number 4 was fairly simple:</p>

<hr />

<pre><code>diff --git a/flavors/cherry.flav/comment-form.cherry b/flavors/cherry.flav/comment-form.cherry
index eb112b3..a6b50ba 100644
--- a/flavors/cherry.flav/comment-form.cherry
+++ b/flavors/cherry.flav/comment-form.cherry
@@ -19,6 +19,7 @@ URL:&lt;br /&gt;
 Comment:&lt;br /&gt;
 &lt;textarea cols="50" name="body" rows="12"&gt;&lt;/textarea&gt;&lt;br /&gt;
 &lt;br /&gt;
+$recaptcha_html
 &lt;input name="Submit" type="submit" value="Submit" /&gt;
 &lt;/form&gt;
 &lt;/p&gt;
</code></pre>

<hr />

<p>Number 3 was the tricky part.  Since I hadn't used Python before, I was trying
to figure out syntax and what the code was doing at the same time.  PyBlosxom
has this concept of "callbacks" that are basically hooks into the HTML
generation and comment processing.  I could have defined a comment hook instead
of patching the comment plugin, but it took me about three hours to write this
simple patch, and I didn't feel like perfecting it.</p>

<hr />

<pre><code>diff --git a/plugins/comments.py b/plugins/comments.py
index c7f9dbe..088fb46 100644
--- a/plugins/comments.py
+++ b/plugins/comments.py
@@ -203,6 +203,7 @@ from xml.sax.saxutils import escape
 from Pyblosxom import tools
 from Pyblosxom.entries.base import EntryBase
 from Pyblosxom.renderers import blosxom
+from recaptcha.client import captcha

 LATEST_PICKLE_FILE = 'LATEST.cmt'

@@ -767,12 +768,16 @@ def cb_prepare(args):
     cdict['email'] = form['email'].value

 cdict['ipaddress'] = pyhttp.get('REMOTE_ADDR', '')
+        recaptcha_ipaddr = cdict['ipaddress']
+        if recaptcha_ipaddr.startswith("::ffff:"):
+            recaptcha_ipaddr = recaptcha_ipaddr[len("::ffff:"):]

 # record the comment's timestamp, so we can extract it and send it
 # back alone, without the rest of the page, if the request was ajax.
 data['cmt_time'] = float(cdict['pubDate'])
</code></pre>

<hr />

<p>This code imports the reCaptcha Python library and converts an IPv4-mapped
IPv6 address (one that starts with "::ffff:") into a plain IPv4 address.
reCaptcha can't deal with IPv6 addresses right now.  It took me quite a bit of
debugging to figure that out.  I made the code print out all the variables
passed to the reCaptcha API by appending them as HTML to the comment_message
variable in the code below.</p>
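
<p>The address conversion can be sketched as a small standalone helper.  The
name to_ipv4 is hypothetical and not part of the patch.  Slicing from the end of
the prefix keeps the whole IPv4 part, and addresses without the prefix pass
through unchanged:</p>

```python
IPV4_MAPPED_PREFIX = "::ffff:"


def to_ipv4(address):
    """Strip the IPv4-mapped IPv6 prefix if present.

    Any other address (plain IPv4, real IPv6, empty string) is returned
    unchanged.
    """
    if address.lower().startswith(IPV4_MAPPED_PREFIX):
        return address[len(IPV4_MAPPED_PREFIX):]
    return address
```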

<hr />

<pre><code> argdict = { "request": request, "comment": cdict }
+       # sas - XXX - probably want to check with recaptcha here.
 reject = tools.run_callback("comment_reject",
     argdict,
     donefunc=lambda x:x != 0)
</code></pre>

<hr />

<p>That XXX comment marks where I noticed a comment_reject callback that might
be useful.  I would probably use it if I were developing a reCaptcha PyBlosxom
plugin.</p>
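
<p>A standalone plugin built on that callback might look something like the
sketch below.  It is untested against PyBlosxom itself.  The verify callable and
the recaptcha_challenge and recaptcha_response keys in the comment dict are
assumptions for illustration; only the args layout, the ipaddress key, and the
(code, message) return convention come from the code shown above.</p>

```python
def make_comment_reject(verify):
    """Build a comment_reject-style callback around a captcha verifier.

    verify is a hypothetical callable taking (challenge, response, ip) and
    returning an (is_valid, error_code) pair.
    """
    def cb_comment_reject(args):
        comment = args["comment"]
        is_valid, error_code = verify(
            comment.get("recaptcha_challenge", ""),  # hypothetical key
            comment.get("recaptcha_response", ""),   # hypothetical key
            comment.get("ipaddress", ""),
        )
        if is_valid:
            return 0  # 0 means "no objection" to the calling code
        return (1, "reCaptcha failed: %s" % error_code)

    return cb_comment_reject
```

<p>Passing the verifier in as an argument keeps the callback easy to test
without talking to the reCaptcha servers.</p>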

<hr />

<pre><code>@@ -781,9 +786,19 @@ def cb_prepare(args):
     reject_code, reject_message = reject
 else:
     reject_code, reject_message = reject, "Comment rejected."
+
+        recaptcha_reject = captcha.submit(
+                form['recaptcha_challenge_field'].value,
+                form['recaptcha_response_field'].value,
+                config['comment_recaptcha_private'],
+                recaptcha_ipaddr)
+
 if reject_code == 1:
     data["comment_message"] = reject_message
     data["rejected"] = True
+        elif not recaptcha_reject.is_valid:
+            data["comment_message"] = recaptcha_reject.error_code
+            data["rejected"] = True
 else:
     data["comment_message"] = writeComment(request, config, data, \
    cdict, encoding)
</code></pre>

<hr />

<p>This code in cb_prepare() submits the words the user typed to the reCaptcha
API and rejects the comment if verification fails.</p>
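
<p>The order of the checks can be restated as a small pure function.  This is a
sketch rather than the plugin code: decide and CaptchaResult are hypothetical
names, and CaptchaResult only stands in for the response object, which has
is_valid and error_code attributes in the patch.</p>

```python
from collections import namedtuple

# Stand-in for the object returned by captcha.submit() in the patch.
CaptchaResult = namedtuple("CaptchaResult", ["is_valid", "error_code"])


def decide(reject_code, reject_message, captcha_result):
    """Return (rejected, message) using the same ordering as the patch.

    An existing rejection wins first, then a failed captcha. When the
    comment may be saved, the message is None.
    """
    if reject_code == 1:
        return True, reject_message
    if not captcha_result.is_valid:
        return True, captcha_result.error_code
    return False, None
```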

<hr />

<pre><code>@@ -1009,6 +1024,7 @@ def cb_story_end(args):
     rejected['cmt_description'] = msg
     rejected['cmt_description_escaped'] = escape(msg)
     renderer.outputTemplate(output, rejected, 'comment')
+        entry['recaptcha_html'] = captcha.displayhtml(config['comment_recaptcha_public'], True)
 renderer.outputTemplate(output, entry, 'comment-form')
 args['template'] = template +u"".join(output)
</code></pre>

<hr />

<p>Finally, the added line in cb_story_end() replaces the recaptcha_html
variable in the flavor HTML with the HTML for the reCaptcha images and text box.</p>
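
<p>The variable substitution itself works roughly like the sketch below.
PyBlosxom has its own template handling, so Python's string.Template is used
here only as a stand-in, and the template text and widget markup are
placeholders:</p>

```python
from string import Template

# Placeholder text standing in for the comment-form flavor template.
form_template = Template(
    "Comment text area goes here\n"
    "$recaptcha_html\n"
    "Submit button goes here\n"
)

# The plugin stores the widget markup in the entry under this key.
entry = {"recaptcha_html": "[reCaptcha widget markup]"}

# safe_substitute leaves any unknown $variables untouched instead of raising.
rendered = form_template.safe_substitute(entry)
```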

<p>The full comment plugin patch is <a
href="http://minilop.net/~sarah/blog-stuff/0001-Add-reCaptcha-spam-filtering-to-comments.py.patch">here</a>.  It builds on my patch to remove the sendmail arguments
from the comment plugin, which is <a
href="http://minilop.net/~sarah/blog-stuff/pyblosxom-Use-sendmail-arguments-with-comments-plugin.patch">here</a>.
The HTML flavor patch is <a
href="http://minilop.net/~sarah/blog-stuff/0002-Recaptcha-change-to-comment-form-template.patch">here</a>.
If you're having problems leaving comments, feel free to send me an email.</p>

<p>Maybe later I'll clean up this patch and submit it to the <a
href="http://pyblosxom.sourceforge.net/registry/">PyBlosxom plugin registry</a>.
Someone else is free to take the code, clean it up, and submit it too.  :)  Open
source is wonderful that way.</p>

]]></description>
</item>

</channel>
</rss>

