October 9th, 2009
I was at the London Perl Mongers meeting last night and got talking to someone who assured me there was a story behind the terms big endian and little endian. I was somewhat skeptical, but they went on to explain that the terms come from the 1726 novel Gulliver’s Travels by Jonathan Swift and have something to do with an egg.
Anyhow, since then I’ve done a little bit of research and it turns out that the terms do indeed come from Gulliver’s Travels. Basically, Lilliput and Blefuscu were two rival nations at war over the way they ate their soft-boiled eggs. The Lilliputians said that the best way was to open them at the little end (little endian), while the Blefuscudians considered it better to open them at the big end (big endian). This is apparently where the terms originate.
I’d still love to know who first coined the terms and whether there are any more weird computer terms that have their origins in bizarre places… As a side note, did you know that Charles Babbage invented the cowcatcher?
Tags: big endian, bizarre, little endian
Posted in Random
October 5th, 2009
Being ultra paranoid about using other people’s WiFi connections, I’ve come up with a solution to make things a little safer. It’s by no means new, having been around for quite a while, but it works well. I’ve set up Apache on my web server to act as a proxy server for connections originating from 127.0.0.1. I then create a secure tunnel from my local machine using SSH and point my web browser at my new secure proxy. This is great for extra security when browsing the internet and checking email on insecure WiFi networks.
If you want to set up your own proxy you’ll need Apache installed with mod_proxy, mod_proxy_http and mod_proxy_ftp; you’ll also need SSH access to a server that’s secure. Once Apache and mod_proxy are installed, add the following lines to your Apache config file:
ProxyRequests Off
Listen 127.0.0.1:80
<VirtualHost 127.0.0.1>
    ProxyRequests On
    ProxyPreserveHost On
    LogFormat "%h %l %u %t \"%r\" %>s %b" common
    CustomLog /tmp/proxy_log common
</VirtualHost>
The ProxyRequests Off line is very important: it stops anyone who can’t connect to 127.0.0.1 from using your proxy server.
Once you’ve done that you just need to set up your SSH tunnel:
ssh -p 22 user@yourserver.com -N -f -L 127.0.0.1:4444:127.0.0.1:80
This connects from your computer to the sshd server on port 22, listens on local port 4444 and forwards connections to your proxy running on port 80 on 127.0.0.1 on your server. Once that’s done, just change your browser’s proxy settings to connect to 127.0.0.1:4444.
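If you’d rather script this than type it each time, here’s a minimal Python sketch of the same idea. The function names (tunnel_command, proxied_opener) are made up for illustration and the defaults simply mirror the values above. It only routes plain HTTP through the proxy: HTTPS through a forward proxy needs the CONNECT method, which Apache handles in mod_proxy_connect, and that isn’t one of the modules listed above.

```python
import urllib.request


def tunnel_command(user, host, local_port=4444, remote_port=80, ssh_port=22):
    """Build the ssh argument list for the tunnel (hypothetical helper)."""
    # -N: run no remote command, -f: go to the background, -L: local port forward
    forward = f"127.0.0.1:{local_port}:127.0.0.1:{remote_port}"
    return ["ssh", "-p", str(ssh_port), f"{user}@{host}", "-N", "-f", "-L", forward]


def proxied_opener(local_port=4444):
    """Return a urllib opener that sends HTTP requests via the local tunnel end."""
    handler = urllib.request.ProxyHandler({"http": f"http://127.0.0.1:{local_port}"})
    return urllib.request.build_opener(handler)


# Show the command; pass the list to subprocess.run() to actually start the tunnel.
print(" ".join(tunnel_command("user", "yourserver.com")))

# With the tunnel running, requests go out through the proxy on your server:
# html = proxied_opener().open("http://example.com/").read()
```

Running the script on its own only prints the ssh command; nothing touches the network until you start the tunnel and uncomment the last line.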
Your setup will go from looking like this, where your data is sent over an insecure WiFi connection:

Normal browsing using a WiFi-enabled laptop
To this setup, where your data is encrypted via a tunnel and passed to a server that is connected to the internet:

Browsing using an SSH tunnel and Proxy server via WiFi
Now your crummy WiFi connection is a little bit more secure (for all requests that go through the proxy, at least)…
Tags: apache, mod_proxy, Proxy, ssh, tunnels
Posted in Linux, Security
September 25th, 2009
Got a problem with dodgy users from obscure countries causing havoc on your website? I recently noticed a huge number of people using Google Translate to access a website. If you want to prevent people using Google Translate on your website, you can add the following tag inside the head section of your HTML page:
<meta name="google" value="notranslate" />
Users don’t seem to get an error message from Google; it just gives them a blank screen instead of their translated page.
Tags: HTML
Posted in HTML, Security
September 11th, 2009
I recently had an issue with the 404 page not displaying in WordPress. I Googled heavily and couldn’t find a solution. The issue was causing a blank page to appear (the theme itself appeared, but not the text in 404.php) and there were no visible errors. I was pretty certain there was nothing wrong with my theme, mainly because there were no PHP errors showing and get_404_template() returned the correct location of my 404.php file.
After reading this post on the WordPress support forums I wondered if it might be a server setting. Looking further into the is_404 function, I discovered that its value was being set by the set_404 function, which was being called from handle_404 in wp-includes/classes.php. This was where I found my issue. Inside handle_404 you find the following line:
<?
if ( (0 == count($wp_query->posts)) && !is_404() && !is_search() && (
$this->did_permalink || (!empty($_SERVER['QUERY_STRING']) && (false ===
strpos($_SERVER['REQUEST_URI'], '?'))) ) ) {
?>
This checks $_SERVER['QUERY_STRING'], which wasn’t being set on my server for reasons that I’m still not 100% clear about. Anyway, removing the $_SERVER['QUERY_STRING'] check from the if statement solved the problem for the time being, and my 404 pages now work like a charm. I’m still trying to work out what’s causing QUERY_STRING to become unset; I’ll let you know when I’ve worked it out.
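To see why an empty QUERY_STRING matters, here is the condition modelled in Python. It reproduces only the boolean logic and is not WordPress code. The function names and sample values are assumptions for illustration, as is the guess that the fix amounts to dropping the !empty(...) test while keeping the strpos check.

```python
def php_empty(value):
    """Rough stand-in for PHP's empty() on a string or unset value."""
    return value in (None, "", "0")


def sets_404(post_count, is_404, is_search, did_permalink, query_string, request_uri):
    """Model of the original condition inside handle_404."""
    return (
        post_count == 0
        and not is_404
        and not is_search
        and (did_permalink or (not php_empty(query_string) and "?" not in request_uri))
    )


def sets_404_patched(post_count, is_404, is_search, did_permalink, request_uri):
    """Same condition with the QUERY_STRING test removed (assumed form of the fix)."""
    return (
        post_count == 0
        and not is_404
        and not is_search
        and (did_permalink or "?" not in request_uri)
    )


# Hypothetical request: no posts found, did_permalink false, QUERY_STRING unset.
print(sets_404(0, False, False, False, None, "/no-such-page"))       # original check
print(sets_404_patched(0, False, False, False, "/no-such-page"))     # patched check
```

With these sample values the original check is false, so the 404 state is never set, while the patched check is true.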
Tags: Wordpress
Posted in PHP, Wordpress
August 22nd, 2009
A Markov chain is a sequence of states in which each state depends only on the previous state. Markov chains can be used to generate “real-looking” words from a given set of text. By the same method we can decide whether a string is a valid word or a load of garbage by assessing each letter and the letter that follows it in the word. If the probability of letter N+1 coming after letter N is very small, then we can probably say that the chance of the string being a word is very small.
When users sign up with a fake email address they tend not to put much thought into the name of the email. Something like sdfjsldkf87we@example.com is a good example. To filter these email addresses out we can take a dictionary and calculate the probability of the next letter (N+1) given the previous letter (N) and compare this to what we observe in the fake email address. If the probability of the next letter is repeatedly low then we can say that the email address is probably fake.
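As a rough illustration of the dictionary step, here’s a minimal Python sketch that estimates the probability of each next letter given the previous one and then lists the letter pairs in a string that look unlikely. The function names, the 0.01 cut-off and the five-word sample list are all assumptions for the example; a real run would use a full dictionary.

```python
from collections import Counter, defaultdict


def transition_probabilities(words):
    """Estimate P(next letter | previous letter) from a list of words."""
    counts = defaultdict(Counter)
    for word in words:
        word = word.lower()
        for prev, nxt in zip(word, word[1:]):
            counts[prev][nxt] += 1
    probs = {}
    for prev, followers in counts.items():
        total = sum(followers.values())
        probs[prev] = {nxt: n / total for nxt, n in followers.items()}
    return probs


def unlikely_pairs(text, probs, threshold=0.01):
    """Return the adjacent letter pairs whose estimated probability is below threshold."""
    return [
        prev + nxt
        for prev, nxt in zip(text, text[1:])
        if probs.get(prev, {}).get(nxt, 0.0) < threshold
    ]


# Tiny stand-in for a real dictionary file (hypothetical sample data).
SAMPLE_WORDS = ["hello", "help", "held", "shell", "phil"]
probs = transition_probabilities(SAMPLE_WORDS)
print(probs["h"])                     # followers of "h" in the sample words
print(unlikely_pairs("hxlp", probs))  # pairs never (or rarely) seen in the sample
```

A pair that never appears in the word list gets probability 0 here, which is the case the scoring cares about most.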
My algorithm scores each email, giving it a point each time a letter N+1 should never come after letter N and reducing the score by 1 for every 12 characters in the email address. This additional check helps to reduce the number of false positives. I only check the first part of the address, that is, everything before the @example.com.
You’ll probably wonder how the code deals with non-alphanumeric characters. I just strip them out and convert the whole email to lower case. There is probably a better method for doing this, but my existing system seems to work quite well. The table below shows my algorithm running on a few sample email addresses. I consider an email with a score of 3 or more to be dodgy.
| E-mail | Score |
| phil.hilton@markov-email.com | 0 |
| bill.gates@microsoft.com | 0 |
| sdfioghsjfkg@gmail.com | 3 |
| tracy93@wow-markov.net | 0 |
| pzrjmt@yahoo.com | 4 |
| gquixdmd@yahoo.com | 3 |
| svcmgr1461@yahoo.com | 3 |
| hjjjh_hjjh@yahoo.com | 7 |
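The scoring described above can be sketched in Python roughly as follows. This is not the code from markov.php: the function names and the tiny word list are made up for illustration, and several details are assumptions (pairs involving a digit are skipped, the 12-character discount uses the cleaned part before the @, and the score is floored at 0). With such a small word list it won’t reproduce the scores in the table.

```python
import re


def seen_pairs(words):
    """Collect every adjacent letter pair that occurs in the word list."""
    pairs = set()
    for word in words:
        word = word.lower()
        pairs.update(a + b for a, b in zip(word, word[1:]))
    return pairs


def score_email(email, pairs, chars_per_discount=12):
    """Score the part before the @; a higher score means more suspicious."""
    local = email.split("@", 1)[0].lower()
    cleaned = re.sub(r"[^a-z0-9]", "", local)  # strip non-alphanumeric characters
    points = 0
    for a, b in zip(cleaned, cleaned[1:]):
        # Assumption: a pair containing a digit is neither rewarded nor penalised.
        if a.isdigit() or b.isdigit():
            continue
        if a + b not in pairs:
            points += 1  # this letter pair never occurs in the word list
    # Assumption: one point off per 12 cleaned characters, never below zero.
    points -= len(cleaned) // chars_per_discount
    return max(points, 0)


def is_dodgy(email, pairs, threshold=3):
    """Apply the 'score of 3 or more' rule."""
    return score_email(email, pairs) >= threshold


# Tiny stand-in for a real dictionary (hypothetical sample data).
PAIRS = seen_pairs(["hilton", "bill", "gates", "tracy"])
for address in ["hilton@example.com", "tracy93@example.com", "zzqx@example.com"]:
    print(address, score_email(address, PAIRS), is_dodgy(address, PAIRS))
```

Swapping the sample list for a proper dictionary, or for pairs whose probability falls under some small cut-off, would bring the behaviour closer to what the table shows.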
This method isn’t foolproof, but it is pretty good at detecting bad email addresses, and you could use it along with additional checks on the user’s account to detect fraudulent activity. There will be some false positives, mainly with people whose email addresses rely heavily on their initials, and I’m sure it’s only a matter of time before the people committing the fraud start using Markov-compliant email addresses.
Download my code. There are two main files: markov.php, which contains example code, and markovChain.dat, which contains a pre-calculated Markov chain.
Tags: Email Addresses, fraud, Markov, spam
Posted in MySQL, PHP, Security