C++11 Language Extensions — Concurrency

Concurrency memory model

A memory model is an agreement between the machine architects and the compiler writers to ensure that most programmers do not have to think about the details of modern computer hardware. Without a memory model, few things related to threading, locking, and lock-free programming would make sense. The key guarantee is: Two threads of execution can update and access separate memory locations without interfering with each other. But what is a “memory location?” A memory location is either an object of scalar type or a maximal sequence of adjacent bit-fields all having non-zero width. For example, here S has exactly four separate memory locations:

    struct S {
        char a;         // location #1
        int b:5,        // location #2: b and c are adjacent bit-fields
            c:11,
            :0,         // note: :0 is "special"; it separates c from d
            d:8;        // location #3
        struct {int ee:8;} e;   // location #4
    };

Why is this important? Why isn’t it obvious? Wasn’t this always true? The problem is that when several computations can genuinely run in parallel, that is, when several (apparently) unrelated instructions can execute at the same time, the quirks of the memory hardware can get exposed. In fact, in the absence of compiler support, issues of instruction and data pipelining and details of cache use will be exposed in ways that are completely unmanageable to the applications programmer. This is true even if no two threads have been defined to share data! Consider two separately compiled “threads”:

    // thread 1:
    char c;
    c = 1;
    int x = c;

    // thread 2:
    char b;
    b = 1;
    int y = b;

For greater realism, we could have used separate compilation (within each thread) to ensure that the compiler/optimizer wouldn’t be able to eliminate the memory accesses, ignore c and b, and directly initialize x and y with 1. What are the possible values of x and y? According to C++11 the only correct answer is the obvious one: 1 and 1. The reason that’s interesting is that if you take a conventional good pre-concurrency C or C++ compiler, the possible answers are 0 and 0 (unlikely), 1 and 0, 0 and 1, and 1 and 1. This has been observed “in the wild.” How? A linker might allocate c and b right next to each other (in the same word); nothing in the C or C++ standards of the 1990s says otherwise. In that, C++98 resembled all languages not designed with real concurrent hardware in mind. However, some processors cannot read or write a single character; they must read or write a whole word, so the assignment to c really is “read the word containing c, replace the c part, and write the word back again.” Since the assignment to b is similar, there are plenty of opportunities for the two threads to clobber each other even though the threads do not (according to their source text) share data!
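
As a rough sketch of that guarantee, the following places two chars next to each other in one object and lets two threads each write and read back only their own member. The names `Pair` and `update_neighbours` are made up for this illustration, and `std::thread` with lambdas stands in for the two separately compiled “threads.”

```cpp
#include <cassert>
#include <thread>
#include <utility>

// Hypothetical type for illustration: c and b are adjacent and will
// almost certainly share a machine word, yet each is its own
// memory location.
struct Pair {
    char c = 0;
    char b = 0;
};

// Each thread writes and reads only its own member; returns {x, y}.
std::pair<int, int> update_neighbours()
{
    Pair p;
    int x = 0;
    int y = 0;

    std::thread t1([&] { p.c = 1; x = p.c; });   // thread 1
    std::thread t2([&] { p.b = 1; y = p.b; });   // thread 2
    t1.join();
    t2.join();   // after join(), the threads' writes are visible here

    assert(x == 1 && y == 1);   // C++11 permits no other outcome
    return {x, y};
}
```

No lock or atomic is needed here because no memory location is touched by more than one thread while the threads run.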

So, C++11 guarantees that no such problems occur for “separate memory locations.” More precisely: A memory location cannot be safely accessed by two threads without some form of locking unless they are both read accesses. Note that adjacent bit-fields within a single word are not separate memory locations, so don’t share structs with bit-fields among threads without some form of locking. Apart from that caveat, the C++ memory model is simply “as everyone would expect.”
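
The bit-field caveat can be sketched as follows, assuming a `std::mutex` as the “some form of locking.” The names `Flags` and `locked_bitfield_updates` are hypothetical and exist only for this illustration.

```cpp
#include <cassert>
#include <mutex>
#include <thread>

// Hypothetical type for illustration: lo and hi are adjacent bit-fields,
// so together they form ONE memory location.
struct Flags {
    unsigned lo : 4;
    unsigned hi : 4;
};

// Two threads update different bit-fields of the same object. Without
// the mutex this would be a data race, even though the source text of
// each thread names a different member.
int locked_bitfield_updates()
{
    Flags f{0, 0};
    std::mutex m;

    std::thread t1([&] {
        std::lock_guard<std::mutex> lock(m);
        f.lo = 5;
    });
    std::thread t2([&] {
        std::lock_guard<std::mutex> lock(m);
        f.hi = 9;
    });
    t1.join();
    t2.join();

    assert(f.lo == 5 && f.hi == 9);   // neither update was lost
    return (f.hi << 4) | f.lo;        // pack both fields into one value
}
```

Separating the two fields with a zero-width bit-field, or making them ordinary members, would give each its own memory location and remove the need for the lock.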

However, it is not always easy to think straight about low-level concurrency issues. Consider:

    // start with x==0 and y==0

    if (x) y = 1;   // Thread 1 

    if (y) x = 1;   // Thread 2 

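Is there a problem here? More precisely, is there a data race? No, there isn’t. Since x and y both start at 0, neither condition can be true in any execution allowed by the memory model, so neither assignment is ever executed and the two threads only read. An implementation is not allowed to introduce a speculative write that would make the other thread’s condition true.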
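
For readers who want to try this, here is one way to wrap the two statements in real threads. This is only a sketch: the function name `run_conditional_writes` is invented for the illustration, and x and y are deliberately plain ints, not atomics.

```cpp
#include <cassert>
#include <thread>
#include <utility>

// Runs the two statements from the text in two threads; returns {x, y}.
std::pair<int, int> run_conditional_writes()
{
    int x = 0;   // start with x==0 and y==0
    int y = 0;

    std::thread t1([&] { if (x) y = 1; });   // Thread 1
    std::thread t2([&] { if (y) x = 1; });   // Thread 2
    t1.join();
    t2.join();

    // Both threads only ever read, so there is no data race and
    // neither variable can have changed.
    assert(x == 0 && y == 0);
    return {x, y};
}
```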

Fortunately, we have already adapted to modern times: every current C++ compiler (that we know of) gives the one right answer and has done so for years. Compilers do so for most (but unfortunately not yet for all) tricky questions. After all, C++ has been used for serious systems programming of concurrent systems “forever.” The standard memory model further improves things.

See also:

Dynamic initialization and destruction with concurrency


Thread-local storage

In C++11 you can use the storage class thread_local to define a variable that should be instantiated once per thread.
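
A minimal sketch of what “once per thread” means in practice follows. The names `counter`, `bump`, `Counts`, and `thread_local_demo` are illustrative and not part of any library.

```cpp
#include <cassert>
#include <thread>

// Hypothetical per-thread counter: every thread gets its own instance,
// initialized to 0 for that thread.
thread_local int counter = 0;

// Increments the calling thread's counter n times and returns it.
int bump(int n)
{
    for (int i = 0; i < n; ++i) ++counter;
    return counter;
}

struct Counts {
    int caller;   // value seen by the thread that called the demo
    int worker;   // value seen by the freshly started thread
};

Counts thread_local_demo()
{
    counter = 0;   // reset the calling thread's instance
    bump(3);

    int worker_result = 0;
    std::thread t([&] { worker_result = bump(5); });   // starts from 0
    t.join();

    // The worker's increments did not affect the caller's instance.
    assert(counter == 3 && worker_result == 5);
    return {counter, worker_result};
}
```

No synchronization is needed for `counter` itself, because no two threads ever access the same instance.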

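Note that using thread_local storage requires care, and in particular does not work well with most parallel algorithms: such algorithms typically spread their work across several threads, so a piece of work cannot rely on seeing the same thread_local instance as the code that started it.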