Memory access optimization

Accessing large data stored in memory may be time consuming. As data is general stored in RAM, it requires a transfer from to the processor. This transfer is time consuming on a PC because RAM and processor are physically separated. When you want to program multiple access to data stored in RAM, the question of optimization may be an issue. I’ve read and heard many things about this topics, here is (almost) the truth. This post will focus on reading and writing data in large array. We will not consider memory allocation that is usually done only once. The following benchmarks have been performed on an Intel architecture (octo-core I7) under Ubuntu 14.04, data were stored in memory (no swap on disk).

Continuous versus Random access

The first thing I read is that continuous access to an array is faster than random access. Let’s consider the following codes:

// Continuous access
uint16_t* array=new uint16_t [SIZE];
uint16_t Val;
for (uint64_t i=0;i


// Random reading of data
uint16_t* array=new uint16_t [SIZE];
uint16_t Val;
for (uint64_t i=0;i

The TIC and TOC macros measure the computation time. Here are the results :


The first conclusion is that is is true. We may imagine that the requested byte is copied from memory to the processor. But in fact, this is not what happens. The processor always copy a segment of data, it means that during continuous access, the number of copy is limited because neighbored bytes just need one copy. In random access, a copy must be done at each access. It's not clear about why Writing is faster than Reading. If you have the answer, leave me a message.

New versus Malloc

Memory allocation can be done via new(c++) or malloc(c). The question here is not what the faster for allocating memory, the question are data access equivalent. Are data allocated with new faster compared to data allocated with malloc ? Here is the answer:


Obviously, this is equivalent. This is not really a surprise, in both cases data, are stored in memory. The tiny difference is probably due to other processes that took some resources during benchmark.

Multidimensional array access

Let's take the example of a two-dimensions array dynamically allocated (for example and image). We want to access to pixel at coordinates (x,y). The first solution consists in creating an array of size width*height and accessing to the elements by computing the index y*width+x.

// Test 1 
uint16_t* array=new uint16_t [SIZE];
// ...

Of course, this first solution requires the computation of a product and an addition that may be time consuming. A solution frequently proposed is to create a second array of pointers containing the address of each row. This solution requires another array, but don't need any calculation:

// Test 2
uint16_t* array=new uint16_t [SIZE];
uint16_t** pArray=new uint16_t* [HEIGHT];
for (int i=0;i

If the data don't have to be stored in a continuous way, it is even possible to allocate each row of the main array separately:

// Test 3
uint16_t** pArray=new uint16_t* [HEIGHT];
for (int i=0;i

Here are the results:

As you can see, the first solution is the fastest, event if there is a calculation. On modern computers, calculation requires in fact few cycles while large memory access is still an issue. The second solution requires two memory accesses, one for getting the address of row, and the second for the data. The third solution is still worth because it requires more transfers from memory to the processor since data are scattered in memory.

Data types

Of course, choosing the right type of data may optimize memory, but the last question is : does the type of data is important for fast computing ? I have heard that if data are aligned on 32 or 64 bits it may increase the speed because computers are optimized for this format of data. For example, an a 32 bits processor, accessing an array of char is slower than accessing an array of int. I performed read/write operations on several types of data. Here are the results:


Clearly, the type of data changes the time of access. The larger are the data is, the slower is the access. I guess this is because big data generally requires more copies from memory to processor due to the fact that an array of uint64_t is 8 times larger than the same array of uint8_t. This result must be consider with precaution because it is not necessary true when accessing few bytes.


Leave a Reply

Your email address will not be published. Required fields are marked *