Monday, October 16, 2006

quest for a better CAPTCHA

Recently, Geoff got concerned with spam and the current alternatives for avoiding it. Ever since, he has been working on a simpler approach (for the user) where he provides pictures instead of text, and you have to click on a few of the pictures, depending on the given instructions. While the idea is original, and it does make it easier on the user, I can see a few issues with the approach:

  • images have to be downloaded to all your readers (bandwidth)
  • you have to have a considerable amount of pictures to make this usable (and more bandwidth problems associated with that)
  • each implementation of this would have to have its own set of pictures and rules associated with them (kinda hardcoded)
  • it's kinda big

This, however, has given me another idea: why don't we combine regular text with Geoff's idea?

we could have something like:

click on even numbers: 123641

click on mathematical symbols: 123+45/6*21

click on uppercase characters: 1A654abPK

etc. The font, of course, can be different (and bigger). The advantages are:

  • very small
  • a practically unlimited number of CAPTCHAs can be generated programmatically
  • it's readable
  • with some precautions, it can be very clear and simple what the user needs to do (sometimes I can't tell if Geoff is smiling or he's angry =o( )
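
A rough sketch of how challenges like the ones above might be generated, written in Python purely for illustration. The names RULES, answer_for, make_challenge and check are made up for this sketch, and it assumes each character is shown as a separately clickable item whose position the server can compare against the stored answer:

```python
import random
import string

# Hypothetical rule table: instruction text -> test for "should this character be clicked?"
RULES = {
    "click on even numbers": lambda c: c.isdigit() and int(c) % 2 == 0,
    "click on mathematical symbols": lambda c: c in "+-*/",
    "click on uppercase characters": lambda c: c.isupper(),
}

# Characters a challenge may contain (an assumption; any readable set would do).
ALPHABET = string.digits + string.ascii_letters + "+-*/"


def answer_for(instruction, text):
    """Return the set of zero-based positions in text that the instruction selects."""
    predicate = RULES[instruction]
    return {i for i, c in enumerate(text) if predicate(c)}


def make_challenge(length=9, rng=random):
    """Pick a random rule and a random string with at least one correct character."""
    instruction = rng.choice(list(RULES))
    while True:
        text = "".join(rng.choice(ALPHABET) for _ in range(length))
        answer = answer_for(instruction, text)
        if answer:  # never show a challenge with nothing to click
            return instruction, text, answer


def check(answer, clicked):
    """The user passes only by clicking exactly the right positions."""
    return set(clicked) == answer
```

For the first example above, answer_for("click on even numbers", "123641") gives positions 1, 3 and 4 (counting from zero), i.e. the characters 2, 6 and 4. The answer would have to stay on the server (in the session, say) and never be sent to the page.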

am I missing something, or did I just come up with the greatest idea against SPAM? =o)

if you still like Geoff's idea, I think at least the images (and the questions associated with them) could be generated programmatically, e.g. lines at different angles, circles, triangles, etc. That by itself would solve most of the problems I see with Geoff's approach.
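
As a sketch of what generating the shapes and the question together could look like, here is a small Python example that writes the picture as SVG markup. Both Python and SVG are assumptions made for illustration, and the names SHAPES, shape_svg and make_image_challenge are hypothetical. Since SVG is itself text that a bot could read, a real version would presumably render the shapes to a bitmap before sending them:

```python
import math
import random

SHAPES = ("circle", "triangle", "line")


def shape_svg(kind, cx, cy, size, angle):
    """Return one SVG element for the shape, centred on (cx, cy) and rotated by angle degrees."""
    if kind == "circle":
        return f'<circle cx="{cx:.1f}" cy="{cy:.1f}" r="{size:.1f}" fill="none" stroke="black"/>'
    if kind == "triangle":
        corners = []
        for k in range(3):  # three corners, 120 degrees apart
            a = math.radians(angle + 120 * k)
            corners.append(f"{cx + size * math.cos(a):.1f},{cy + size * math.sin(a):.1f}")
        points = " ".join(corners)
        return f'<polygon points="{points}" fill="none" stroke="black"/>'
    # Anything else is drawn as a line through the centre at the given angle.
    a = math.radians(angle)
    dx, dy = size * math.cos(a), size * math.sin(a)
    return (f'<line x1="{cx - dx:.1f}" y1="{cy - dy:.1f}" '
            f'x2="{cx + dx:.1f}" y2="{cy + dy:.1f}" stroke="black"/>')


def make_image_challenge(count=6, cell=60, rng=random):
    """Return (question, svg, answer); answer is the set of cell indexes to click."""
    target = rng.choice(SHAPES)
    kinds = [rng.choice(SHAPES) for _ in range(count)]
    kinds[rng.randrange(count)] = target  # guarantee at least one correct cell
    parts = []
    for i, kind in enumerate(kinds):
        cx = i * cell + cell / 2  # one shape per cell, laid out left to right
        parts.append(shape_svg(kind, cx, cell / 2, cell * 0.35, rng.uniform(0, 360)))
    svg = (f'<svg xmlns="http://www.w3.org/2000/svg" width="{count * cell}" height="{cell}">'
           + "".join(parts) + "</svg>")
    answer = {i for i, kind in enumerate(kinds) if kind == target}
    return f"click on every {target}", svg, answer
```

Because the shapes sit in fixed-width cells, a click at horizontal position x falls in cell int(x // cell), which can then be compared against the stored answer.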

anyway, time to go to sleep


Geoff Appleby said...

Hey Man,
You've got valid points there, but I can't resist at least responding to some of your concerns :)

-image download. I made sure they were small - most are around 5k, max of 12. I can probably drop that more by making them gifs instead of jpgs. Fair call, that one - although, given the amount of data that comes down in a community server page (or most pages, for that matter) it's an extremely small percentage. The img srcs are direct to the images, not auto generated, which means the browser can cache them, too.

- image amounts. I've only made 16. That's not much :)

-hard-coded: not really. It's all stored in the database, and customisable via the admin screens. That's about as far away from hard-coded as you can get, really :)

-big. Yeah, it is. I've still gotta play with that and see what the majority think.

-am i smiling? Cool - i wasn't sure about some of those photos either :) I was just taking shots of myself with my phone last night, i really need to have someone else take them with a decent camera. Let me know which ones suck, and i'll get them fixed/replaced/removed.

Thanks for the feedback though - most appreciated!

Oh, and your idea - pretty cool. If you write it, let me know - i'd love to see it :)

BTW - I'm having trouble right now filling in your captcha! *laughs* is it a lower case 'a' or a 9? :)

BlackTigerX said...

what do you think about programmatically generating the images and questions though?

Geoff Appleby said...

I think it would be an interesting exercise - one of those ideas that you don't know if it will work until you write it and see :) (which is why i wrote gaptcha)

Sounds pretty hard to write though - I wouldn't know where to begin :) I can't think of any reason why you couldn't write it, just that I'm not familiar with System.Drawing yet!

Mark Thomas said...

Here's the problem with text-based CAPTCHA. Text can be parsed. In fact, if someone wanted to write code to defeat your CAPTCHA, they can start with your code, find the list of possible phrases, and get it working without too much trouble. However, image recognition is orders of magnitude harder. If someone built a CAPTCHA that said "Click on three ponies" it is not likely to be defeated.

BlackTigerX said...

what I meant was that the text to be clicked on would be programmatically generated images