Detecting low entropy tokens with massive bloom filters in Burp


Hashing a known value to create a "secure random token" is a bad idea, and hard to detect without access to the source code or running hashcat over every hex string you encounter. By storing bloom filters on a remote HTTP server you can create a Burp extension with no local storage requirements that instantly determines whether someone is hashing a known or predictable string to create a token.

If you search GitHub you'll see that one of the most common ways to generate an insecure random token is:

  • md5(strtolower(trim($email)))

Let's not forget Gravatar IDs either, which are MD5 hashes of lowercase email addresses and help destroy the anonymity of the internet.

The easiest way to detect these would be to compute all the possible hashes of all the integers, dates, times and emails and stick them in a SQL database. This would be large to store and slow to query. However, if you don't care about the value of the hash and simply want to detect instances of the above, you can reduce the storage requirements using a probabilistic data structure. The advantage is that this can be queried with a single disk operation.

Storage design

Storing a database of these hashes in RAM is not an option, and if it's stored on disk any random access will be slow. Exploiting locality of reference, you can assign millions of 1024-byte bloom filters, each storing ~283 elements, and place them all in a single file. In total there will be about 70 million 1024-byte filters.

The specific bloom filter used to store each hash is selected by casting the first 8 bytes of the hash to a 64-bit unsigned int and using this to index the associated bloom filter. The advantage of this is that only a single 1kb read is now needed to determine whether a hash is contained within the filter. A sub-1kb size means the file can be stored remotely and each request to the filter will fit inside a single MTU.
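The addressing scheme can be sketched in Python. The big-endian cast and the round 70-million block count below are assumptions taken from the figures in this post, not the author's exact code:

```python
import hashlib
import struct

BLOCK_SIZE = 1024          # bytes per bloom filter block
NUM_BLOCKS = 70_000_000    # approximate block count quoted above (assumption)

def block_offset(md5_hex: str) -> int:
    """Map a hash to the byte offset of its bloom filter block.

    The first 8 bytes of the digest are cast to a 64-bit unsigned int
    (big-endian here, by assumption) and reduced modulo the block count.
    """
    digest = bytes.fromhex(md5_hex)
    prefix = struct.unpack(">Q", digest[:8])[0]
    return (prefix % NUM_BLOCKS) * BLOCK_SIZE

# A single 1kb read at this offset fetches the whole relevant filter:
offset = block_offset(hashlib.md5(b"12345").hexdigest())
```

Because every lookup lands on exactly one 1kb block, a hash check is one seek and one read regardless of how many billions of values are stored.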

We store the following ~26 billion hashed values in the filter:

  • 180,000,000 email addresses from various public data breaches
  • 400,000,000 passwords and strings from public data breaches and pastebin
  • Integers from -2^31 to 2^32
  • Dates and times in the common formats with 1 second resolution from January 1970 to December 2038 (Y-m-d, YmdHiS, ymdHis, ymdhis, Ymd His, ymd His, Y-m-d H:i:s)
  • All IPv4 addresses

The resulting file is 78gb and lookups performed against it have a false positive rate of approximately 0.0001 percent.
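That false positive rate is consistent with the standard bloom filter estimate, using the per-block parameters the generator reports (m = 8192 bits, k = 19 hash rounds, n = 283 elements):

```python
import math

m = 8192   # bits per 1024-byte block
k = 19     # hash rounds
n = 283    # max elements per block

# Standard bloom filter false positive estimate: (1 - e^(-kn/m))^k
fpr = (1 - math.exp(-k * n / m)) ** k
print(f"{fpr:.1e}")   # on the order of 1e-06, i.e. ~0.0001 percent
```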

Populating the filter

The filter is created using a multi-threaded C program: it reads lines from a file, hashes them and adds them to a bloom filter. You can generate your own custom bloom filter - the only caveat is that the whole filter must fit in memory during the creation process. There is nothing to stop the creation of filters for other hash formats, SHA-1 etc. Generation is fast and on a good cloud instance is limited only by disk read speed.
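The population step can be sketched in Python. The real generator derives its bit positions with murmur3; the double-hashing construction below is a stand-in (so the bit positions will not match the author's file), but the add/check mechanics are the same:

```python
import hashlib
import struct

BLOCK_BITS = 8192   # m: bits per 1024-byte block
K_ROUNDS = 19       # k: hash rounds

def bit_positions(md5_hex: str):
    """Derive k bit positions inside a block for one hash.

    Stand-in scheme: the classic h1 + i*h2 double-hashing trick over the
    two halves of the digest (the real program uses murmur3).
    """
    d = bytes.fromhex(md5_hex)
    h1 = struct.unpack(">Q", d[:8])[0]
    h2 = struct.unpack(">Q", d[8:])[0]
    return [(h1 + i * h2) % BLOCK_BITS for i in range(K_ROUNDS)]

def add(block: bytearray, value: bytes):
    """Hash a value and set its k bits in the 1024-byte block."""
    for pos in bit_positions(hashlib.md5(value).hexdigest()):
        block[pos // 8] |= 1 << (pos % 8)

def contains(block: bytearray, value: bytes) -> bool:
    """True if all k bits are set; false positives possible, no false negatives."""
    return all(block[pos // 8] & (1 << (pos % 8))
               for pos in bit_positions(hashlib.md5(value).hexdigest()))

block = bytearray(BLOCK_BITS // 8)
for line in (b"1", b"2", b"3"):
    add(block, line)
```

Population is embarrassingly parallel because each input line touches exactly one block, which is why the C generator can saturate disk read speeds with a handful of threads.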

The program to generate bloom filters takes only one argument, the input file - entries are expected to be in plain text and separated by 0x0a. Each line is hashed before being entered into the filter. Below we generate a test dictionary containing the integers from 1 to 10,000 and then populate a filter. Results are written to bloom.dat in the current directory.

$ seq 1 10000 > testme.txt
$ ./bloom -i testme.txt 
Counting items in file 'testme.txt'..
total number of lines = 10000
block size (bytes) = 1024
block size (bits) m = 8192
total number of blocks 35
hash rounds k = 19
max elements per block n = 283
Total memory required = 35840
memory allocated. creating jobs
using 8 threads
Reading from position 0

The important setting to take note of is the total number of blocks; this is needed to run queries against the constructed filter. The filter can be tested locally using the Python script - this looks up a predefined list of strings in the filter and returns True or False depending on whether the hashed value of the string was found.

$ ./ 
Total blocks = 35
1 : True
2 : True
3 : True
passw0rd : False
test12345 : False

Hosting the bloom filter on Nginx

By serving the resulting file from an Nginx server we can check for the presence of a hash by issuing an HTTP range request for the specific filter. Multiple hash values can be queried by issuing multi-range HTTP requests. Once the specific bloom filter blocks are downloaded, we compute the relevant murmur3 hashes client-side to determine whether the values are present in the filter. Only a few changes to the Nginx settings are needed: client_header_buffer_size is increased to allow for more range requests, gzip compression is turned off and all non-essential HTTP response headers are removed (sample Nginx conf). By hosting the filter on Nginx we avoid a 78gb download.
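A client-side lookup boils down to a plain HTTP range request. The URL below is a placeholder, and the multi-range batching mirrors the description above rather than the extension's exact code:

```python
import urllib.request

BLOCK_SIZE = 1024
FILTER_URL = "http://example.com/bloom.dat"   # placeholder host

def range_header(block_indices):
    """Build a (multi-)range header value covering whole 1kb blocks."""
    parts = [f"{i * BLOCK_SIZE}-{i * BLOCK_SIZE + BLOCK_SIZE - 1}"
             for i in block_indices]
    return "bytes=" + ", ".join(parts)

def fetch_blocks(block_indices):
    """Fetch bloom filter blocks with an HTTP range request.

    A single index returns the raw 1kb block; multiple indices come back
    as a multipart/byteranges response that must be split by boundary.
    """
    req = urllib.request.Request(
        FILTER_URL, headers={"Range": range_header(block_indices)})
    with urllib.request.urlopen(req) as resp:
        return resp.read()
```

Because each requested block is under the MTU, a single-hash check costs one round trip carrying roughly one packet each way.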

The whole process of querying the filter is incredibly quick; performance varies with the type of disk (HDD vs SSD), with an average SSD able to achieve 6,000qps.

Privacy: as all processing is client side, the Nginx server has no way to tell with any certainty which hash you queried. With ~70 million 1kb bloom filters, only 26 bits of each hash can be reconstructed by the remote server. If privacy is a concern, host the filter locally.
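The 26-bit figure follows directly from the block count: the server sees only which 1kb block was fetched, so it learns at most log2(number of blocks) bits of the 128-bit hash being checked:

```python
import math

NUM_BLOCKS = 70_000_000   # approximate block count from above

# The server observes only the requested block index, so the
# information it gains about the queried hash is bounded by:
leaked_bits = math.log2(NUM_BLOCKS)
print(round(leaked_bits))   # ~26 of the hash's 128 bits
```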

Burp extension

The whole process integrates easily into Burp's passive scanner: all 32-digit hex strings matching ([A-F0-9]{32}|[a-f0-9]{32}) are checked and the results stored locally in an in-memory cache. If a value is detected as a hash an issue is raised; only newly encountered hashes are sent for checking, and the extension removes duplicates before transmission.

The extension uses a public server hosting the generated file. The only restriction on the public server is a limit of 100 unique hex strings per HTTP request. The extension does not currently support upstream proxy settings - if your connection requires a proxy to reach the internet you must host the file locally.

Detecting one of these hashes doesn't necessarily mean your application is vulnerable - that depends on the context in which the token is used. As a result the issue is raised only as "informational", similar to how Burp reports email addresses and private IPs.

Burp extender flow chart

Download the low entropy token detector extension; it uses a publicly accessible pre-computed bloom filter and is bundled with its required dependencies (Apache Commons HttpClient to communicate with the remote server and Google Guava for the MurmurHash3 library). The source code can be compiled with ant. Java 7 or above is required.