Less (memory access) is more (speed):
by Johnny's Software Lab
From the article:
When we are trying to speed up a memory-bound loop, there are several paths we can take. We could try decreasing the dataset size. We could try increasing available instruction-level parallelism. We could try modifying the way we access data. Some of these techniques are very advanced. But sometimes we should start with the basics.
One way to make a piece of code less memory-bound is the old-fashioned one: decrease the total number of memory accesses (loads or stores). Once a piece of data is in a register, using it is very cheap, to the point of being free (because CPUs can execute up to 4 instructions in a single cycle and do so out of order). So all techniques that try to lower the total number of loads and stores should result in speedups. ...
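As a rough illustration of the idea (not code from the article), here is a minimal sketch in C. The function names `sum_through_pointer` and `sum_in_register` are made up for this example. The first version keeps its running total in memory. Since `out` could point into `data`, the compiler generally has to reload and store `*out` on every iteration. The second version accumulates in a local variable, which the compiler can keep in a register, and stores the result once.

```c
#include <assert.h>
#include <stddef.h>

/* Baseline: the running total lives in memory. `out` may alias `data`,
   so the compiler generally cannot keep *out in a register and has to
   load and store it on each iteration. */
void sum_through_pointer(const int *data, size_t n, int *out) {
    assert(out != NULL);
    *out = 0;
    for (size_t i = 0; i < n; i++) {
        *out += data[i];   /* load data[i], load *out, store *out */
    }
}

/* Fewer memory accesses: accumulate in a local variable, which the
   compiler can keep in a register, and store the result once. */
void sum_in_register(const int *data, size_t n, int *out) {
    assert(out != NULL);
    int total = 0;
    for (size_t i = 0; i < n; i++) {
        total += data[i];  /* only data[i] is loaded in the loop */
    }
    *out = total;          /* single store after the loop */
}
```

Both functions produce the same result as long as `out` does not point into `data`. Whether the second one is measurably faster depends on the compiler, the optimization level, and the hardware, so it is worth measuring rather than assuming.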