Posts Tagged ‘Web development’

How not to advertise for a PHP programming job

Wednesday, October 28th, 2009

So I got an email today for a job in Tower Hill (thats central london). The job came with a simple programming test to write a script that parsed a tab separated file and produced a batch script as the output. They kindly provided a working copy of their solution on their website so you could validate the output of your code.

If I was going to advertise a job in a company and provide an online example of my own code I’d make darn sure that, unless the sole purpose of my online code was to find someone who knew what an XSS flaw was, that the link to the script I sent a prospective employee to wasnt vunerable to Cross Site Scripting attacks. Eeek. Worse still as their script seemed to accept either GET or POST variables as inputs (they were probably checking $_REQUEST rather than $_POST or $_GET in their code) it was possible to format a link that injected HTML code straight into their website.

Screenshot of the flaw with 'Cheese!!' being injected.

Screenshot of the flaw

You can mitigate the threat from these types of attacks by properly sanitizing your variables before they are displayed. If this is on a HTML page and you are expecting an integer value then intval might be a good function to use, if its a text field you might try htmlentities. If any data is going into a database then you need to be using mysql_escape_string on all of your variables.

As I’ve not alerted the company to the flaw I wont post the URL to the exploit. Luckily the page in question can’t be found within googles’ indexes. I wonder if anyone else will notice…

Reduce load times, speed up your website, increase revenue

Sunday, June 14th, 2009

Page load speed is everything. Tests done by Amazon have shown that an increase in page loading times by 100ms can reduce sales by 1%; when Google added 500ms to its response times traffic dropped 20%. The premise is simple: a faster website means faster feedback to the user which enables a faster user learning curve.

If like me you have a website that is powered by the LLMP (Linux Lighttpd MySQL PHP) stack then there are some simple steps you can take to decrease your page load times. If your running Apache and not Lighttpd then maybe its time to move :) (more…)

Breaking a CAPTCHA – rules for good design

Sunday, June 7th, 2009

A CAPTCHA is a challenge and response test used to verify that the end-user is a human and not a computer – CAPTCHA is an acronym standing for a Completely Automated Public Turing test to tell Computers and Humans Apart.

Captchas seem to have become increasingly popular as a method to prevent the submission of spam and automated responses. You can see a captcha generated by the popular reCAPTCHA service when you post a comment on this blog.

The main problem with the Captcha is that sometimes the people who implement them are lazy or have no knowledge about how create an image that a computer would find hard to decode. Captchas must be generated server side and over the last few months I have seen an increase in the number of client-side captchas generated by software such as Adobe Flex. If you generate a Captcha client side it is not secure.

When designing a Captcha its important to understand what computers find it hard to do.

Its hard for a computer to segment an image. Computers need to segment an image in order to classify each character of text. Anything you can do – such as running letters together – that makes the image harder to segment will make it much harder for a computer to segment your image.

Lets look at the following example to see how easy it is to segment the text from a badly designed captcha.

Image segmentation in 3 steps, (1) Capture the image, (2) Apply thresholding (3) Convolution Matrix

Image segmentation in 3 steps, (1) Acquire the image, (2) Apply thresholding (3) Apply a simple Convolution Matrix

Two very simple algorithms have been applied to the Captcha above. Firstly thresholding and secondly a convolution matrix to remove the vertical and horizontal lines. If we look at the captcha below we can see that some captchas can be segmented just by using thresholding alone.

Even worse - The background can easily be removed by just simple thresholding

Even worse - The background can easily be removed by just simple thresholding

Once the image has been segmented the computer then has to then classify each character. The more options there are for each character the higher the chance of the computer classifying the letter incorrectly – so when your designing your captcha it pays to use the entire alphabet and not restrict yourself to just numbers or letters – It follows that the longer your captcha the more chance there is of the computer making a mistake in classification. Google makes its captchas between 8 and 11 characters in length.

Both of the captchas we have seen so far are easy to segment – they are also easy to classify. This is due to the similarity of the characters within the image (both 4’s look the same in the second captcha) – if we want to make it tricky for the computer to classify each character we need to use different fonts for each character or warp each character by applying rotation or other image morphing operators. The following Captcha from Yahoo! is much harder to segment due to there being little space between each character, the captcha also uses both upper and lower case characters and has been morphed so that the string is harder to classify.

A set of captchas from Yahoo! that are very hard to segment

A set of captchas from Yahoo! that are hard to segment and classify

Sadly as captchas become harder for computers to read they also become harder for humans to read – there is a fine line between providing the necessary security and frustrating a user with a captcha that is impossible to read.

Google has an interesting solution to this using Markov Chains – here random strings that appear to be words are generated using a statistical method known as a Markov Chain. These words are much easier for a human to read because they seem to be a normal word, however they are not words and this is important. If dictionary words were use then a dictionary could be introduced to improve captcha classification rates.

Google captchas use markov chains to make them easier to read

Google captchas use Markov chains to make them easier to read

Its pretty easy to design and write a good captcha using PHP GD or something similar. If you cant be bothered to write a captcha then services such as reCAPTCHA exist which can provide you with an effective captcha solution (although this is vulnerable to the “penis flood attack“)

No captcha will ever be 100% secure, rumor has it that even google’s captcha has been broken with a classification rate of 20%; there are even stories of captcha sweatshops emerging around the world where people are paid  to solve Captchas – a kind of mechanical Turk.

As algorithms become more sophisticated an alternative to captchas need to be found but until these have been found you may as well make sure that your captchas are secure.

POST causing a 404 error in Wordpress

Saturday, May 23rd, 2009

I’ve just struck a problem while implementing my sudoku solver. It appeared that every time I posted a variable called ’s’ to my page Wordpress would return a 404 – page not found error.

You can post to a page and read the variables in PHP using $_POST – which is good – but you can’t name the variable “s” as this seems to conflict with existing Wordpress functionality.

If your having trouble posting to a wordpress page try renaming your variables.