A CAPTCHA is a challenge-response test used to verify that the end-user is a human and not a computer. CAPTCHA is an acronym for Completely Automated Public Turing test to tell Computers and Humans Apart.
Captchas seem to have become increasingly popular as a method to prevent the submission of spam and automated responses. You can see a captcha generated by the popular reCAPTCHA service when you post a comment on this blog.
The main problem with captchas is that the people who implement them are sometimes lazy or do not know how to create an image that a computer would find hard to decode. Captchas must be generated server side, yet over the last few months I have seen an increase in the number of client-side captchas generated by software such as Adobe Flex. If you generate a captcha client side it is not secure.
When designing a captcha it’s important to understand what computers find hard to do.
It’s hard for a computer to segment an image. Computers need to segment an image in order to classify each character of text. Anything you can do to make the characters harder to separate, such as running letters together, will make it much harder for a computer to read your captcha.
Let’s look at the following example to see how easy it is to segment the text from a badly designed captcha.

Image segmentation in 3 steps: (1) acquire the image, (2) apply thresholding, (3) apply a simple convolution matrix
Two very simple algorithms have been applied to the captcha above: first thresholding, then a convolution matrix to remove the vertical and horizontal lines. If we look at the captcha below we can see that some captchas can be segmented using thresholding alone.
Even worse: the background can easily be removed by simple thresholding alone
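To give a feel for how little work this is, here is a minimal Python sketch of the two steps. It is not the exact filter used for the images above: the cutoff value, the neighbour-counting kernel and the function names are all assumptions chosen for illustration, and the image is just a list of rows of greyscale values (0 = black, 255 = white).

```python
def threshold(image, cutoff=128):
    """Binarise: 1 where a pixel is darker than cutoff (ink), 0 otherwise."""
    return [[1 if px < cutoff else 0 for px in row] for row in image]


def convolve3x3(binary, kernel):
    """Apply a 3x3 kernel to every pixel, treating out-of-bounds pixels as 0."""
    h, w = len(binary), len(binary[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total = 0
            for ky in (-1, 0, 1):
                for kx in (-1, 0, 1):
                    yy, xx = y + ky, x + kx
                    if 0 <= yy < h and 0 <= xx < w:
                        total += kernel[ky + 1][kx + 1] * binary[yy][xx]
            out[y][x] = total
    return out


def remove_thin_lines(binary, min_neighbours=3):
    """Drop ink pixels with too few ink neighbours.

    A pixel on a one-pixel-wide horizontal or vertical line has at most two
    ink neighbours, while a pixel inside a thicker character stroke has more.
    """
    neighbour_kernel = [[1, 1, 1],
                        [1, 0, 1],
                        [1, 1, 1]]  # assumed kernel: counts the 8 neighbours
    counts = convolve3x3(binary, neighbour_kernel)
    return [
        [1 if binary[y][x] and counts[y][x] >= min_neighbours else 0
         for x in range(len(binary[0]))]
        for y in range(len(binary))
    ]
```

Running threshold() and then remove_thin_lines() on a captcha with a pale background and one-pixel grid lines leaves only the thicker character strokes, which can then be cut apart wherever a column contains no ink.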
Once the image has been segmented the computer then has to classify each character. The more options there are for each character, the higher the chance of the computer classifying it incorrectly, so when you’re designing your captcha it pays to use the entire character set rather than restricting yourself to just numbers or just letters. It follows that the longer your captcha, the more chance there is of the computer making a mistake in classification. Google makes its captchas between 8 and 11 characters in length.
Both of the captchas we have seen so far are easy to segment. They are also easy to classify, because identical characters look identical within the image (both 4’s look the same in the second captcha). If we want to make it tricky for the computer to classify each character we need to use a different font for each character, or warp each character by applying rotation or other image morphing operators. The following captcha from Yahoo! is much harder to segment because there is little space between the characters. It also uses both upper and lower case characters and has been morphed so that the string is harder to classify.
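To put rough numbers on this, here is a small Python sketch. It assumes, as a simplification of mine, that the solver classifies each character independently and with the same accuracy; the function names and the 90% figure are illustrative only.

```python
def keyspace(alphabet_size, length):
    """Number of distinct strings a captcha of this shape can show."""
    return alphabet_size ** length


def solve_probability(per_char_accuracy, length):
    """Chance of getting every character right, assuming independent errors."""
    return per_char_accuracy ** length


# Digits only, 4 characters, versus upper case + lower case + digits, 8 characters.
print(keyspace(10, 4))
print(keyspace(62, 8))

# A classifier that is right 90% of the time per character.
print(solve_probability(0.9, 4))
print(solve_probability(0.9, 8))
```

Under that assumption, doubling the length from 4 to 8 characters takes a 90%-per-character classifier from solving roughly two captchas in three to fewer than one in two, before any extra difficulty from a larger character set is taken into account.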

A set of captchas from Yahoo! that are hard to segment and classify
Sadly as captchas become harder for computers to read they also become harder for humans to read – there is a fine line between providing the necessary security and frustrating a user with a captcha that is impossible to read.
Google has an interesting solution to this using Markov chains. Random strings that look like words are generated using a statistical method known as a Markov chain. These strings are much easier for a human to read because they look like normal words; however, they are not real words, and this is important. If dictionary words were used then an attacker could use a dictionary to improve captcha classification rates.

Google captchas use Markov chains to make them easier to read
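I don’t know how Google’s generator is actually built, but the general idea can be sketched in a few lines of Python. This is a character-level Markov chain that picks each letter based on the two letters before it; the training words, the ORDER constant and the function names are all assumptions made for the example.

```python
import random
from collections import defaultdict

ORDER = 2  # how many previous letters decide the next one (assumed value)


def build_model(words):
    """Map each ORDER-letter context to the letters that followed it."""
    model = defaultdict(list)
    for word in words:
        padded = "^" * ORDER + word.lower() + "$"  # ^ = start, $ = end
        for i in range(len(padded) - ORDER):
            model[padded[i:i + ORDER]].append(padded[i + ORDER])
    return model


def generate(model, exclude=(), min_len=5, max_len=8, rng=random):
    """Walk the chain until it emits a word-like string of acceptable length.

    Anything in `exclude` (for example the training words themselves) is
    rejected, so the result is never a known dictionary word.
    """
    while True:
        state, letters = "^" * ORDER, []
        while len(letters) <= max_len:
            nxt = rng.choice(model[state])
            if nxt == "$":
                break
            letters.append(nxt)
            state = state[1:] + nxt
        word = "".join(letters)
        if min_len <= len(word) <= max_len and word not in exclude:
            return word


TRAINING = ["captcha", "computer", "security", "segment", "character",
            "thresholding", "classify", "convolution", "dictionary",
            "statistical", "random", "string", "server", "client",
            "session", "image"]

model = build_model(TRAINING)
print(generate(model, exclude=set(TRAINING)))
```

A real generator would train on a much larger word list, which gives more varied and more natural-looking output, but the rejection of real words at the end is the part that matters for security.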
It’s pretty easy to design and write a good captcha using PHP GD or something similar. If you can’t be bothered to write a captcha then services such as reCAPTCHA exist which can provide you with an effective captcha solution (although this is vulnerable to the “penis flood attack”).
No captcha will ever be 100% secure. Rumor has it that even Google’s captcha has been broken with a classification rate of 20%, and there are even stories of captcha sweatshops emerging around the world where people are paid to solve captchas, a kind of Mechanical Turk.
As algorithms become more sophisticated, alternatives to captchas will need to be found, but until they are you may as well make sure that your captchas are secure.
Tags: Captcha, PHP GD, Security, Web development
Nice article and well documented. But I can see that you haven’t worked with Flex or ActionScript. It is indeed a client-side language, as HTML is, but it is more powerful.
First of all, the purpose of the article you refer to is to show how you can graphically create the image for a CAPTCHA. Securing it afterward is the job of the server-side programmer (the server-side language can be any language, and I decided not to give a solution in PHP although I know PHP very well) together with the Flex/ActionScript programmer.
Hope this is clear now. I’ll move forward.
The CAPTCHA’s purpose is to keep robots away (to distinguish a human from a robot). If you had worked with Flex you would know that there are no cheap applications that can interact with a Flash application. As far as I know there are no robots for Flash movies/applications; the only ones I know about are complex, expensive testing applications that need to be installed. You cannot make a script to read an image from flash because a SWF file is a compiled files that contain executable code. That is the difference between HTML and Flash regarding security. The code is not visible to any user as HTML is. In my opinion a sign-up screen built in Flex will not need any CAPTCHA at all, because the robots won’t know what is in that sign-up screen, or even whether it is a sign-up screen.
A solution for the Flex/Actionscript CAPTCHA would be this…
Create a web service that will generate random codes for the CAPTCHA. When initialized, the Flex/ActionScript CAPTCHA sends a request over HTTPS for a new code, takes it, applies an algorithm to it, then displays it. The user enters the visible code, which is sent back to the server (over HTTPS); the server applies the same algorithm to the source code and compares the result with what the user sent. The two should match. To know which user the code came from, we use a server session variable created when the HTML containing the CAPTCHA application is generated. It contains a code used just for identification (linking the user with the server) and is made available to the Flex/ActionScript CAPTCHA (flashvars, URL, etc.). When the request for a new CAPTCHA code is sent, we also send this session code. The service stores both the session identification variable and the generated code in a database table, then returns the CAPTCHA code. At submission we send the code the user entered together with the session variable. The session variable should match the one in the database, and the code sent by the user should match the source code (also saved in the database) after the same algorithm used on the client side has been applied.
This is one way to use it. And I think there are other ways too.
If you wonder why I used that CAPTCHA: because a client really wanted it. Even after he understood that there is no need for a CAPTCHA, he stressed that he wanted it and that the users would feel more secure just seeing it there.
Hope everything is clear now.
For more info read my article on flexer.info.
To finish, I want to restate this: don’t use CAPTCHA on Flash-based applications… there is no practical need for it.
You seem to be relying on security through obscurity. You say that
“You cannot make a script to read an image from flash because a SWF file is a compiled files that contain executable code”
and
“The code is not visible to any user as HTML is”
but I feel that you are missing the point. When you generate a CAPTCHA string server side and send the string to the client to be processed into an image, you are effectively providing the end-user with the solution to the CAPTCHA. This makes the CAPTCHA pointless, as a rogue application could simply connect to your service and return the string that it was sent. I’m not quite sure why you have suggested using an SSL connection, as this would only prevent eavesdropping between the client and server. A CAPTCHA implemented in this way is nothing more than a token that would merely protect against Cross Site Request Forgery (XSRF) attacks.
To create a secure CAPTCHA in flex you would have to generate the image server side, not the string. This would have to be passed as a PNG or JPEG with a session variable to the Flex Client. The end user would then have to solve the CAPTCHA and return the solution with the session – this would be then validated server-side to determine if the end user was human.
CAPTCHAs do have a place within Flex applications, for example to challenge a user connecting to your AMF gateway when they get their password wrong multiple times. The key fact is that you MUST generate the CAPTCHA server side and pass it to the end-user as an image – if you don’t pass it this way you may as well just not bother.
If a client came to me and requested that I implement a client-side captcha to make the users “feel more secure” I would refuse point-blank. Giving users a false sense of security is plainly irresponsible. It would be far better to give them a true sense of insecurity and not implement the CAPTCHA at all.