Data exfiltration through the VMware hypervisor

Its possible for two Virtual Machines with no network access or shared file system to communicate as long as they run under the same Hypervisor. This post will show you how this can be achieved by sending a square wave across the VMware CPU scheduler. This technique could be used to exfiltrate information from compromised systems that are firewalled /no longer connected to the network, or to evade a network based IDS.

There are some pre-requisites, you need access to two or more virtual machines on the same Hypervisor which between them have access to at least the same number of virtual CPUs as physical CPU cores exist within the Hypervisor - and for these machines to have unlimited CPU resources (MHz). At this point its worth mentioning that cores provided via Hyper-Threading are not considered separate.

When you oversubscribe a Hypervisor the machines within it end up sharing resources. The result of this is that when a VM runs a CPU intensive task it runs until another VM also requests the same resources, when this happens clock cycles are stolen from one VM and given to the other. The consequence of this is that one virtual machine can monitor how busy the Hypervisor is by observing the shift in number of calculations it can perform in a given time frame. It is by using this technique (a Timing Channel Attack) that two VM's can communicate.

Monitoring the hypervisor

If you plot a graph of how many calculations per second you can perform on all the virtual cores available to you v.s. time you can see your stolen CPU cycles appear as a dip in the graph.

Each bar represents 250ms of CPU cycles, the gap in the middle is CPU time stolen by another VM.

This got me wondering if you can encode data on one VM as a square wave and send it across the Hypervisors' CPU scheduler to another. As it turns out you can, but its a slow process (comparable to time based blind SQL injection) with the bit rate depending on your VMware configuration and the noise other VMs are generating.

Sending the Square Wave

To send a square wave I've got a Python script that increases the CPU load on a given CPU with a simple loop. When theloop runs it represents the dip of the square wave as observed by the receiver, this represents a binary 1. When the loop doesn't run the script sleeps, this represents a binary 0.

def sendBit(bit, cpuID, timeDelay = 1):

   pid     = os.getpid();
   affinity.set_process_affinity_mask(pid, 1);

   if (bit == '1'):
      print "Sending 1";
      start = time.time();
      while((time.time() - start) < timeDelay):
          x = 0 + 1;

   if (bit == '0'):
      print "Sending 0";
      time.sleep(timeDelay);

The maximum "Frequency" of the square wave transmission depends on the amount of "noise" (other VMs on the hypervisor)- noise is generated when these VMs try and steal CPU cycles. The more noise the lower the frequency that can be achieved and therefore the lower the bit rate between the two virtual machines. The ideal situation for an attacker would be if they controlled all the other VMs on the hypervisor - this would allow them to transmit at a higher frequency (more bits per second) as they could control the level of background noise.

From initial tests I've been able to achieve a frequency of between 0.5Hz - 4Hz or about about 0.5bits/sec - 4 bits/sec depending on the Hypervisor load. Not hugely impressive, but don't forget, we're not using networking.

Receiving the square wave

To listen for the square wave I use a python script that runs on all available cored, each thread counts how many calculations it can do in a given time slice, I then use the sum of all these threads to gauge the load of other machines on the hypervisor. When the other VMs are under load the number of calculations per given time slice will drop, so when the sender increases their CPU load this should show up as a dip in the receivers graph. You can then translate the peaks and troughs in the graph into binary 0's and 1's.

The reason we cant just listen on just one virtual CPU is because the Hypervisor may change which physical CPU the virtual CPU is executing on, this can happen randomly. This is also the reason its necessary for the attacker to obtain access to at least the same number of virtual CPUs on the hypervisor as their are hardware CPUs in the system. If one machine doesn't have enough cores available to it you can use multiple machines and have them communicate their performance over the network.

Structure of the transmission

Having a method to send and receive square waves is great, however to reliably send information we need to package it into a form so that the receiver can distinguish it from background noise. I've chosen a simple 32bit packet prefixed with a preamble (although there's nothing stopping you making the packet bigger) - Its pretty simple and looks like this:

Data block

The preamble

When data is about to be sent the sender constructs a "preamble" which is at a slightly higher frequency than that of the data chunk and is unlikely to occur as background noise. The idea of the preamble is that it allows the receiver to detect an incoming message. The receiver then uses the preamble signal to calibrate a threshold for what constitutes a 0 and 1 - this threshold is then used to decode the data chunk which is sent next.

Each bar is 250ms of time - The preamble as observed by the receiver is a 2.5 second stream of "10101" sent using 1 virtual core

The threshold for what is considered a 0 or a 1 in the data chunk is calculated as:

threshold = min(0) - (0.5 * (min(0) - max(1)))

The Data chunk

When the preamble is received the receiver starts detecting the next 32 bits of data, no data is ever sent back to the sender so this could be compared to a satellite TV transmission. Data is then sent at half the rate of the preamble. This is what the preamble with 32 bits of data looks like to the receiver.

The complete datagram - the preamble followed by 32 bits of data. Each bar represents the number of calculations performed by the receiver in a 250ms period.

Once 32 bits of data has been received the receiver waits for the next preamble to arrive which signifies the arrival of more data.

Dealing with noise

As you can probably tell this isn't going to be the most reliable way to send information, noise on other virtual processors could corrupt overall message. The easiest solution is to increase the "transmission power" of the signal - if your host that is sending the signal has access to more than one virtual CPU you can send the signal on multiple cores at the same time. Another way to combat noise is to decrease the frequency - this will make transmission slower but more reliable.

If you want to send lots of data you could build in checksums and even data redundancy using forward error correction. This is not something I have done in the proof of concept below but imagine it shouldn't be too difficult to implement.

Demo / PoC

In case you don't have access to an ESX/ESXi environment I made a quick video:

The PoC Python code used in the video.

Conclusions

Whether it be disk, memory or CPU - If two systems have unrestricted access to the same physical resources it should come as no surprise that they can probably communicate using this medium. The only way to prevent this appears to be to physically limit the amount of resources available to each machine, in the case of the CPU by assigning a separate core to each machine or by governing each machines clock speed.