Articles & Books

C++ Benchmark: std::vector vs. std::list -- Baptiste Wicht

[Ed.: Another reason why vector is the default (not only, but preferred) standard container.]

C++ Benchmark: std::vector vs. std::list

by Baptiste Wicht

In C++, two of the most used data structures are std::vector and std::list. In this article, their performance is compared in practice on several different workloads.

It is generally said that a list should be used when random insert and remove will be performed (performed in O(1) versus O(n) for a vector). If we look only at the complexity, search in both data structures should be roughly equivalent, complexity being in O(n). When random insert/replace operations are performed on a vector, all the subsequent data needs to be moved and so each element will be copied. That is why the size of the data type is an important factor when comparing those two data structures.

However, in practice, there is a huge difference, the usage of the memory caches. All the data in a vector is contiguous where the std::list allocates separately memory for each element. How does that change the results in practice ?

This article shows the difference in performance between these two data structures. 

Continue reading...

On vector<bool> -- Howard Hinnant

On vector<bool>

by Howard Hinnant

 

vector<bool> has taken a lot of heat over the past decade, and not without reason. However I believe it is way past time to draw back some of the criticism and explore this area with a dispassionate scrutiny of detail.

There are really two issues here:

  1. Is the data structure of an array of bits a good data structure?
  2. Should the aforementioned data structure be named vector<bool>?

I have strong opinions on both of these questions. And to get this out of the way up front:

  1. Yes.
  2. No.

The array of bits data structure is a wonderful data structure. It is often both a space and speed optimization over the array of bools data structure if properly implemented. However it does not behave exactly as an array of bools, and so should not pretend to be one.

First, what's wrong with vector<bool>?

Because vector<bool> holds bits instead of bools, it can't return a bool& from its indexing operator or iterator dereference. This can play havoc on quite innocent looking generic code. For example:

template <class T>
void
process(T& t)
{
    // do something with t
}

template <class T, class A>
void
test(std::vector<T, A>& v)
{
    for (auto& t : v)
        process(t);
}

The above code works for all T except bool. When instantiated with bool, you will receive a compile time error along the lines of:

error: non-const lvalue reference to type 'std::__bit_reference<std::vector<bool, std::allocator<bool>>, true>' cannot bind to
      a temporary of type 'reference' (aka 'std::__bit_reference<std::vector<bool, std::allocator<bool>>, true>')
    for (auto& t : v)
               ^ ~
note: in instantiation of function template specialization 'test<bool, std::allocator<bool>>' requested here
    test(v);
    ^
vector:2124:14: note: selected 'begin' function
      with iterator type 'iterator' (aka '__bit_iterator<std::vector<bool, std::allocator<bool>>, false>')
    iterator begin()
             ^
1 error generated.

This is not a great error message. But it is about the best the compiler can do. The user is confronted with implementation details of vector and in a nutshell says that the vector is not working with a perfectly valid ranged-based for statement. The conclusion the client comes to here is that the implementation of vector is broken. And he would be at least partially correct.

But consider if instead of vector<bool> being a specialization instead there existed a separate class template std::bit_vector<A = std::allocator<bool>> and the coder had written:

template <class A>
void
test(bit_vector<A>& v)
{
    for (auto& t : v)
        process(t);
}

Now one gets a similar error message:

error: non-const lvalue reference to type 'std::__bit_reference<std::bit_vector<std::allocator<bool>>, true>' cannot bind to
      a temporary of type 'reference' (aka 'std::__bit_reference<std::bit_vector<std::allocator<bool>>, true>')
    for (auto& t : v)
               ^ ~
note: in instantiation of function template specialization 'test<std::allocator<bool>>' requested here
    test(v);
    ^
bit_vector:2124:14: note: selected 'begin' function
      with iterator type 'iterator' (aka '__bit_iterator<std::bit_vector<std::allocator<bool>>, false>')
    iterator begin()
             ^
1 error generated.

And although the error message is similar, the coder is far more likely to see that he is using a dynamic array of bits data structure and it is understandable that you can't form a reference to a bit.

I.e. names are important. And creating a specialization that has different behavior than the primary, when the primary template would have worked, is poor practice.

But what's right with vector<bool>?

For the rest of this article assume that we did indeed have a std::bit_vector<A = std::allocator<bool>> and that vector was not specialized on bool. bit_vector<> can be much more than simply a space optimization over vector<bool>, it can also be a very significant performance optimization. But to achieve this higher performance, your vendor has to adapt many of the std::algorithms to have specialized code (optimizations) when processing sequences defined by bit_vector<>::iterators.

find

For example consider this code:

template <class C>
typename C::iterator
test()
{
    C c(100000);
    c[95000] = true;
    return std::find(c.begin(), c.end(), true);
}

How long does std::find take in the above example for:

  1. A hypothetical non-specialized vector<bool>?
  2. A hypothetical bit_vector<> using an optimized find?
  3. A hypothetical bit_vector<> using the unoptimized generic find?

I'm testing on an Intel Core i5 in 64 bit mode. I am normalizing all answers such that the speed of A is 1 (smaller is faster):

  1. 1.0
  2. 0.013
  3. 1.6

An array of bits can be a very fast data structure for a sequential search! The optimized find is inspecting 64 bits at a time. And due to the space optimization, it is much less likely to cause a cache miss. However if the implementation fails to do this, and naively checks one bit at a time, then this giant 75X optimization turns into a significant pessimization.

count

std::count can be optimized much like std::find to process a word of bits at a time:

template <class C>
typename C::difference_type
test()
{
    C c(100000);
    c[95000] = true;
    return std::count(c.begin(), c.end(), true);
}

My results are:

  1. 1.0
  2. 0.044
  3. 1.02

Here the results are not quite as dramatic as for the std::find case. However any time you can speed up your code by a factor of 20, one should do so!

fill

std::fill is yet another example:

template <class C>
void
test()
{
    C c(100000);
    std::fill(c.begin(), c.end(), true);
}

My results are:

  1. 1.0
  2. 0.40
  3. 38.

The optimized fill is over twice as fast as the non-specialized vector<bool>. But if the vendor neglects to specialize fill for bit-iterators the results are disastrous! Naturally the results are identical for the closely related fill_n.

copy

std::copy is yet another example:

template <class C>
void
test()
{
    C c1(100000);
    C c2(100000);
    std::copy(c1.begin(), c1.end(), c2.begin());
}

My results are:

  1. 1.0
  2. 0.36
  3. 34.

The optimized copy is approaches three times as fast as the non-specialized vector<bool>. But if the vendor neglects to specialize fill for bit-iterators the results are not good. If the copy is not aligned on word boundaries (as in the above example), then the optimized copy slows down to the same speed as the copy for A. Results for copy_backward, move and move_backward are similar.

swap_ranges

std::swap_ranges is yet another example:

template <class C>
void
test()
{
    C c1(100000);
    C c2(100000);
    std::swap_ranges(c1.begin(), c1.end(), c2.begin());
}

My results are:

  1. 1.0
  2. 0.065
  3. 4.0

Here bit_vector<> is 15 times faster than an array of bools, and over 60 times as fast as working a bit at a time.

rotate

std::rotate is yet another example:

template <class C>
void
test()
{
    C c(100000);
    std::rotate(c.begin(), c.begin()+c.size()/4, c.end());
}

My results are:

  1. 1.0
  2. 0.59
  3. 17.9

Yet another example of good results with an optimized algorithm and very poor results without this extra attention.

equal

std::equal is yet another example:

template <class C>
bool
test()
{
    C c1(100000);
    C c2(100000);
    return std::equal(c1.begin(), c1.end(), c2.begin());
}

My results are:

  1. 1.0
  2. 0.016
  3. 3.33

If you're going to compare a bunch of bools, it is much, much faster to pack them into bits and compare a word of bits at a time, rather than compare individual bools, or individual bits!

Summary

The dynamic array of bits is a very good data structure if attention is paid to optimizing algorithms that can process up to a word of bits at a time. In this case it becomes not only a space optimization but a very significant speed optimization. If such attention to detail is not given, then the space optimization leads to a very significant speed pessimization.

But it is a shame that the C++ committee gave this excellent data structure the name vector<bool> and that it gives no guidance nor encouragement on the critical generic algorithms that need to be optimized for this data structure. Consequently, few std::lib implementations go to this trouble.

Rule of Zero -- R. Martinho Fernandes

Long-time C++98 developers know about the Rule of Three. But with C++11's addition of move semantics, we need to be talking instead about the Rule of Five.

This article argues for limiting the use of the Rule of Five, and instead mostly aiming for a "Rule of Zero" -- why you don't have and don't want to write logic for copying, for moving, and for destruction all the time in C++, and why you should encapsulate that logic as much as possible in the remaining situations when you actually need to write it. (And, in passing, he shows once again why C++11 developers seem to be drawn to reusing unique_ptrs with custom deleters like moths to flames bees to flowers. See also recent examples like this one on std-proposals.)

Rule of Zero

In C++ the destructors of objects with automatic storage duration are invoked whenever their scope ends. This property is often used to handle cleanup of resources automatically in a pattern known by the meaningless name RAII.

An essential part of RAII is the concept of resource ownership: the object responsible for cleaning up a resource in its destructor owns that resource. Proper ownership policies are the secret to avoid resource leaks...

Continue reading...

scope(exit) in C++11

A C++ note in passing from an indie game developer. He notes that Boost has a similar feature already (see the BOOST_SCOPE_EXIT macros and discussion of alternatives) but likes that you can write the basic feature yourself in under 10 lines, have it work like a statement instead of a BEGIN/END macro pair, and use it with other C++11 features like lambdas.

scope(exit) in C++11

I really like the scope(exit) statements in D. They allow you to keep the cleanup code next to the initialization code making it much easier to maintain and understand. I wish we had them in C++.

Turns out this can be implemented easily using some of the new C++ features...

Continue reading...

Storage Layout of Polymorphic Objects -- Dan Saks

It's great to see Dan writing about C++ again. Here's his latest today, in Dr. Dobb's:

Storage Layout of Polymorphic Objects

By Dan Saks

Adding at least one virtual function to a class alters the storage layout for all objects of that class type.

Adding at least one virtual function to a class alters the storage layout for all objects of that class type. In this article, I begin to explain how C++ compilers typically implement virtual functions by explaining how the use of virtual functions affects the storage layout for objects.

Continue reading...

(Not) Using std::thread -- Andrzej KrzemieĊ„ski

Andrzej Krzemieński writes:

(Not) using std::thread

This post is about std::thread but not about threads or multi-threading. This is not an introduction to C++ threads. I assume that you are already familiar with Standard Library components thread and async.

I encountered a couple of introductions to C++ threads that gave a code example similar to the one below:

void run_in_parallel(function<void()> f1, function<void()> f2)

{

    thread thr{f1}; // run f1 in a new thread

    f2();           // run f2 in this thread

    thr.join();     // wait until f1 is done

}

While it does give you a feel of what thread’s constructor does and how forking and joining works, I feel it does not stress enough how exception-unsafe this code is, and how unsafe using naked threads in general is. In this post I try to analyze this “safety” issues...

Continue reading...

On the Superfluousness of std::move -- Scott Meyers

Scott answers reader questions about whether std::move is superfluous... couldn't one size of std::forward fit well enough on everyone?

On the Superfluousness of std::move

Scott Meyers

During my presentation of "Universal References in C++11" at C++ and Beyond 2012, I gave the advice to apply std::move to rvalue reference parameters and std::forward to universal reference parameters.  In this post, I'll follow the convention I introduced in that talk of using RRef for "rvalue reference" and URef for "universal reference."

Shortly after I gave the advice mentioned above, an attendee asked what would happen if std::forward were applied to an RRef instead of std::move.  The question took me by surprise.  I was so accustomed to the RRef-implies-std::move and URef-implies-std::forward convention, I had not thought through the implications of other possibilities.  The answer I offered was that I wasn't sure what would happen, but I didn't really care, because even if using std::forward with an RRef would work, it would be unidiomatic and hence potentially confusing to readers of the code.

The question has since been repeated on stackoverflow, and I've also received it from attendees of other recent presentations I've given.  It's apparently one of those obvious questions I simply hadn't considered.  It's time I did.

Continue reading...

Post-Portland mailing available

The post-Portland mailing is now available. It includes meeting minutes, updated issues lists, and the papers adopted at this meeting (among other papers).

Of particular note is the Evolution Working Group paper status, now present again and maintained by Ville Voutilainen. Thanks, Ville!

 

WG21 Number PL22.16 Number Title Author Document Date Mailing Date Previous Version Subgroup Disposition
SD-1 12-0000R3 2012 PL22.16/WG21 document list Clark Nelson 2012-11-03 2012-11      
SD-2 12-0001R2 ISO WG21 and INCITS PL22.16 membership list Clark Nelson 2012-09-12 2012-09      
SD-3   SC22/WG21 (C++) Study Group Organizational Information Herb Sutter 2012-10-04 2012-09      
SD-5   WG21 and PL22.16 (C++) Joint Mailing and Meeting Information Herb Sutter 2010-09-20 2012-09      
N3453 12-0143 Minutes, WG21 Teleconference 2012-10-5 Kyle Kloepper 2012-10-5 2012-11      
N3454 12-0144 Minutes, WG21 Meeting No. 54, 15-19 October 2012 Portland, Oregon, USA Kyle Kloepper 2012-11-3 2012-11      
N3455 12-0145 Minutes, PL22.16 Meeting No. 59, 15-19 October 2012 Portland, Oregon, USA Kyle Kloepper 2012-11-3 2012-11      
N3456 12-0146 Range arguments for container constructors and methods, with wording Jeffrey Yasskin 2012-11-03 2012-11   Library  
N3457 12-0147 Algorithm std::iota and its modifications. Vladimir Grigoriev 2012-10-30 2012-11   Library  
N3458 12-0148 Simple Database Integration in C++11 Thomas Neumann 2012-10-22 2012-11   Library  
N3459 12-0149 Comparison of Two Database Access Methodologies Bill Seymour 2012-10-13 2012-11   Library  
N3462 12-0152 std::result_of and SFINAE E. Niebler, D. Walker, J. de Guzman 2012-10-18 2012-11 N3436= 12-0126 Library Adopted 2012-10
N3463 12-0153 Portable Program Source Files Beman Dawes 2012-11-02 2012-11   Evolution  
N3465 12-0155 Adding heterogeneous comparison lookup to associative containers for TR2 (Rev 2) Joaquín Mª López Muñoz 2012-10-29 2012-11 N2882= 09-0072 Library  
N3466 12-0156 More Perfect Forwarding Mike Spertus 2012-11-03 2012-11   Evolution  
N3467 12-0157 Runtime-sized arrays with automatic storage duration (revision 3) Jens Maurer 2012-10-29 2012-11 N3412= 12-0102 Core  
N3468 12-0158 User-defined Literals for Standard Library Types (version 2) Peter Sommerlad 2012-10-24 2012-11 N3402= 12-0092 Evolution  
N3469 12-0159 Constexpr Library Additions: chrono, v3 B. Kosnik, D. Krügler 2012-10-18 2012-11 N3303= 11-0073 Library Adopted 2012-10
N3470 12-0160 Constexpr Library Additions: containers, v2 B. Kosnik, D. Krügler 2012-10-18 2012-11 N3304= 11-0074 Library Adopted 2012-10
N3471 12-0161 Constexpr Library Additions: utilities, v3 B. Kosnik, D. Krügler 2012-10-18 2012-11 N3305= 11-0075 Library Adopted 2012-10
N3472 12-0162 Binary Literals in the C++ Core Language James Dennett 2012-10-19 2012-11   Core  
N3473 12-0163 C++ Standard Library Active Issues List (Revision R80) Alisdair Meredith 2012-11-03 2012-11 N3438= 12-0128 Library  
N3474 12-0164 C++ Standard Library Defect Report List (Revision R80) Alisdair Meredith 2012-11-03 2012-11 N3439= 12-0129 Library  
N3475 12-0165 C++ Standard Library Closed Issues List (Revision R80) Alisdair Meredith 2012-11-03 2012-11 N3440= 12-0130 Library  
N3477 12-0167 C++ Internet Protocol Classes A. Fabijanic, G. Obiltschnig 2012-10-28 2012-11   Networking  
N3478 12-0168 Core Issue 1512: Pointer comparison vs qualification conversions Jens Maurer 2012-10-29 2012-11   Core  
N3479 12-0169 Priority Queue, Queue and Stack: Changes and Additions G. Powell, T. Blechmann 2012-11-02 2012-11 N3443= 12-0133 Library  
N3480 12-0170 C++ Standard Core Language Active Issues, Revision 81 William M. Miller 2012-11-03 2012-11 N3382= 12-0072 Core  
N3481 12-0171 C++ Standard Core Language Defect Reports and Accepted Issues, Revision 81 William M. Miller 2012-11-03 2012-11 N3383= 12-0073 Core  
N3482 12-0172 C++ Standard Core Language Closed Issues, Revision 81 William M. Miller 2012-11-03 2012-11 N3384= 12-0074 Core  
N3484 12-0174 A URI Library for C++ G. Matthews, D. Berris 2012-11-01 2012-11 N3420= 12-0110 Networking  
N3485 12-0175 Working Draft, Standard for Programming Language C++ Stefanus Du Toit 2012-11-02 2012-11 N3376= 12-0066    
N3486 12-0176 C++ Editor's Report, October 2012 Stefanus Du Toit 2012-11-02 2012-11      
N3487 12-0177 TLS and Parallelism Pablo Halpern 2012-05-08 2012-11   Concurrency  
N3488 12-0178 Evolution Working Group paper status Ville Voutilainen 2012-11-02 2012-11   Evolution  
N3489 12-0179 A Rational Number Library for C++ Bill Seymour 2012-10-13 2012-11 N3414= 12-0104 Numerics  
N3490 12-0180 ADL Control for C++ Dave Abrahams 2012-10-31 2012-11   Evolution  
N3491 12-0181 Minutes: SG4 Networking, October 2012 Alex Fabijanic 2012-11-01 2012-11      
N3492 12-0182 Use Cases for Compile-Time Reflection (Rev. 2) Mike Spertus 2012-11-03 2012-11 N3403= 12-0093 Evolution  

Universal References in C++11 -- Scott Meyers

Universal References in C++11

T&& Doesn’t Always Mean “Rvalue Reference”

by Scott Meyers

 

Related materials:

 

Perhaps the most significant new feature in C++11 is rvalue references; they’re the foundation on which move semantics and perfect forwarding are built. (If you’re unfamiliar with the basics of rvalue references, move semantics, or perfect forwarding, you may wish to read Thomas Becker’s overview before continuing.)

Syntactically, rvalue references are declared like “normal” references (now known as lvalue references), except you use two ampersands instead of one.  This function takes a parameter of type rvalue-reference-to-Widget:

void f(Widget&& param);

Given that rvalue references are declared using “&&”, it seems reasonable to assume that the presence of “&&” in a type declaration indicates an rvalue reference. That is not the case:

Widget&& var1 = someWidget;      // here, “&&” means rvalue reference

auto&& var2 = var1;              // here, “&&” does not mean rvalue reference

template<typename T>
void f(std::vector<T>&& param);  // here, “&&” means rvalue reference

template<typename T>
void f(T&& param);               // here, “&&”does not mean rvalue reference

In this article, I describe the two meanings of “&&” in type declarations, explain how to tell them apart, and introduce new terminology that makes it possible to unambiguously communicate which meaning of “&&” is intended. Distinguishing the different meanings is important, because if you think “rvalue reference” whenever you see “&&” in a type declaration, you’ll misread a lot of C++11 code.

The essence of the issue is that “&&” in a type declaration sometimes means rvalue reference, but sometimes it means either rvalue reference or lvalue reference. As such, some occurrences of “&&” in source code may actually have the meaning of “&”, i.e., have the syntactic appearance of an rvalue reference (“&&”), but the meaning of an lvalue reference (“&”).  References where this is possible are more flexible than either lvalue references or rvalue references. Rvalue references may bind only to rvalues, for example, and lvalue references, in addition to being able to bind to lvalues, may bind to rvalues only under restricted circumstances.[1]  In contrast, references declared with “&&” that may be either lvalue references or rvalue references may bind to anything.  Such unusually flexible references deserve their own name. I call them universal references.

The details of when “&&” indicates a universal reference (i.e., when “&&” in source code might actually mean “&”) are tricky, so I’m going to postpone coverage of the minutiae until later. For now, let’s focus on the following rule of thumb, because that is what you need to remember during day-to-day programming:

If a variable or parameter is declared to have type T&& for some deduced type T, that variable or parameter is a universal reference.

The requirement that type deduction be involved limits the situations where universal references can be found. In practice, almost all universal references are parameters to function templates. Because the type deduction rules for auto-declared variables are essentially the same as for templates, it’s also possible to have auto-declared universal references. These are uncommon in production code, but I show some in this article, because they are less verbose in examples than templates.  In the Nitty Gritty Details section of this article, I explain that it’s also possible for universal references to arise in conjunction with uses of typedef and decltype, but until we get down to the nitty gritty details, I’m going to proceed as if universal references pertained only to function template parameters and auto-declared variables.

The constraint that the form of a universal reference be T&& is more significant than it may appear, but I’ll defer examination of that until a bit later. For now, please simply make a mental note of the requirement.

Like all references, universal references must be initialized, and it is a universal reference’s initializer that determines whether it represents an lvalue reference or an rvalue reference:

  • If the expression initializing a universal reference is an lvalue, the universal reference becomes an lvalue reference.
  • If the expression initializing the universal reference is an rvalue, the universal reference becomes an rvalue reference.

This information is useful only if you are able to distinguish lvalues from rvalues.  A precise definition for these terms is difficult to develop (the C++11 standard generally specifies whether an expression is an lvalue or an rvalue on a case-by-case basis), but in practice, the following suffices:

  • If you can take the address of an expression, the expression is an lvalue.
  • If the type of an expression is an lvalue reference (e.g., T& or const T&, etc.), that expression is an lvalue. 
  • Otherwise, the expression is an rvalue.  Conceptually (and typically also in fact), rvalues correspond to temporary objects, such as those returned from functions or created through implicit type conversions. Most literal values (e.g., 10 and 5.3) are also rvalues.

Consider again the following code from the beginning of this article:

Widget&& var1 = someWidget;
auto&& var2 = var1;

You can take the address of var1, so var1 is an lvalue.  var2’s type declaration of auto&& makes it a universal reference, and because it’s being initialized with var1 (an lvalue), var2 becomes an lvalue reference.  A casual reading of the source code could lead you to believe that var2 was an rvalue reference; the “&&” in its declaration certainly suggests that conclusion.  But because it is a universal reference being initialized with an lvalue, var2 becomes an lvalue reference.  It’s as if var2 were declared like this:

Widget& var2 = var1;

As noted above, if an expression has type lvalue reference, it’s an lvalue.  Consider this example:

std::vector<int> v;
...
auto&& val = v[0];               // val becomes an lvalue reference (see below)

val is a universal reference, and it’s being initialized with v[0], i.e., with the result of a call to std::vector<int>::operator[]. That function returns an lvalue reference to an element of the vector.[2]   Because all lvalue references are lvalues, and because this lvalue is used to initialize val, val becomes an lvalue reference, even though it’s declared with what looks like an rvalue reference.

I remarked that universal references are most common as parameters in template functions.  Consider again this template from the beginning of this article:

template<typename T>
void f(T&& param);               // “&&” might mean rvalue reference

Given this call to f,

f(10);                           // 10 is an rvalue

param is initialized with the literal 10, which, because you can’t take its address, is an rvalue.  That means that in the call to f, the universal reference param is initialized with an rvalue, so param becomes an rvalue reference -- in particular, int&&.

On the other hand, if f is called like this,

int x = 10;
f(x);                            // x is an lvalue

param is initialized with the variable x, which, because you can take its address, is an lvalue.  That means that in this call to f, the universal reference param is initialized with an lvalue, and param therefore becomes an lvalue reference -- int&, to be precise.

The comment next to the declaration of f should now be clear:  whether param’s type is an lvalue reference or an rvalue reference depends on what is passed when f is called.  Sometimes param becomes an lvalue reference, and sometimes it becomes an rvalue reference.  param really is a universal reference.

Remember that “&&” indicates a universal reference only where type deduction takes place.  Where there’s no type deduction, there’s no universal reference.  In such cases, “&&” in type declarations always means rvalue reference.  Hence:

template<typename T>
void f(T&& param);               // deduced parameter type ⇒ type deduction;
                                 // && ≡ universal reference

template<typename T>
class Widget {
    ...
    Widget(Widget&& rhs);        // fully specified parameter type ⇒ no type deduction;
    ...                          // && ≡ rvalue reference
};

template<typename T1>
class Gadget {
    ...
    template<typename T2>
    Gadget(T2&& rhs);            // deduced parameter type ⇒ type deduction;
    ...                          // && ≡ universal reference
};

void f(Widget&& param);          // fully specified parameter type ⇒ no type deduction;
                                 // && ≡ rvalue reference

There’s nothing surprising about these examples.  In each case, if you see T&& (where T is a template parameter), there’s type deduction, so you’re looking at a universal reference.  And if you see “&&” after a particular type name (e.g., Widget&&), you’re looking at an rvalue reference.

I stated that the form of the reference declaration must be “T&&” in order for the reference to be universal. That’s an important caveat.  Look again at this declaration from the beginning of this article:

template<typename T>
void f(std::vector<T>&& param);     // “&&” means rvalue reference

Here, we have both type deduction and a “&&”-declared function parameter, but the form of the parameter declaration is not “T&&”, it’s “std::vector<t>&&”.  As a result, the parameter is a normal rvalue reference, not a universal reference.  Universal references can only occur in the form “T&&”!  Even the simple addition of a const qualifier is enough to disable the interpretation of “&&” as a universal reference:

template<typename T>
void f(const T&& param);               // “&&” means rvalue reference

Now, “T&&” is simply the required form for a universal reference.  It doesn’t mean you have to use the name T for your template parameter:

template<typename MyTemplateParamType>
void f(MyTemplateParamType&& param);  // “&&” means universal reference

Sometimes you can see T&& in a function template declaration where T is a template parameter, yet there’s still no type deduction.  Consider this push_back function in std::vector:[3]

template <class T, class Allocator = allocator<T> >
class vector {
public:
    ...
    void push_back(T&& x);       // fully specified parameter type ⇒ no type deduction;
    ...                          // && ≡ rvalue reference
};

Here, T is a template parameter, and push_back takes a T&&, yet the parameter is not a universal reference!  How can that be?

The answer becomes apparent if we look at how push_back would be declared outside the class. I’m going to pretend that std::vector’s Allocator parameter doesn’t exist, because it’s irrelevant to the discussion, and it just clutters up the code.  With that in mind, here’s the declaration for this version of std::vector::push_back:

template <class T>
void vector<T>::push_back(T&& x);

push_back can’t exist without the class std::vector<T> that contains it.  But if we have a class std::vector<T>, we already know what T is, so there’s no need to deduce it.

An example will help.  If I write

Widget makeWidget();             // factory function for Widget
std::vector<Widget> vw;
...
Widget w;
vw.push_back(makeWidget());      // create Widget from factory, add it to vw

my use of push_back will cause the compiler to instantiate that function for the class std::vector<Widget>. The declaration for that push_back looks like this:

void std::vector<Widget>::push_back(Widget&& x);

See?  Once we know that the class is std::vector<Widget>, the type of push_back’s parameter is fully determined:  it’s Widget&&.  There’s no role here for type deduction.

Contrast that with std::vector’s emplace_back, which is declared like this:

template <class T, class Allocator = allocator<T> >
class vector {
public:
    ...
    template <class... Args>
    void emplace_back(Args&&... args); // deduced parameter types ⇒ type deduction;
    ...                                // && ≡ universal references
};

Don’t let the fact that emplace_back takes a variable number of arguments (as indicated by the ellipses in the declarations for Args and args) distract you from the fact that a type for each of those arguments must be deduced.  The function template parameter Args is independent of the class template parameter T, so even if we know that the class is, say, std::vector<Widget>, that doesn’t tell us the type(s) taken by emplace_back.  The out-of-class declaration for emplace_back for std::vector<Widget> makes that clear (I’m continuing to ignore the existence of the Allocator parameter):

template<class... Args>
void std::vector<Widget>::emplace_back(Args&&... args);

Clearly, knowing that the class is std::vector<Widget> doesn’t eliminate the need for the compiler to deduce the type(s) passed to emplace_back.  As a result, std::vector::emplace_back’s parameters are universal references, unlike the parameter to the version of std::vector::push_back we examined, which is an rvalue reference.

A final point is worth bearing in mind: the lvalueness or rvalueness of an expression is independent of its type. Consider the type int.  There are lvalues of type int (e.g., variables declared to be ints), and there are rvalues of type int (e.g., literals like 10).  It’s the same for user-defined types like Widget. A Widget object can be an lvalue (e.g., a Widget variable) or an rvalue (e.g., an object returned from a Widget-creating factory function). The type of an expression does not tell you whether it is an lvalue or an rvalue.
Because the lvalueness or rvalueness of an expression is independent of its type, it’s possible to have lvalues whose type is rvalue reference, and it’s also possible to have rvalues of the type rvalue reference:

Widget makeWidget();                       // factory function for Widget

Widget&& var1 = makeWidget()               // var1 is an lvalue, but
                                           // its type is rvalue reference (to Widget)

Widget var2 = static_cast<Widget&&>(var1); // the cast expression yields an rvalue, but
                                           // its type is rvalue reference  (to Widget)

The conventional way to turn lvalues (such as var1) into rvalues is to use std::move on them, so var2 could be defined like this:

Widget var2 = std::move(var1);             // equivalent to above

I initially showed the code with static_cast only to make explicit that the type of the expression was an rvalue reference (Widget&&).

Named variables and parameters of rvalue reference type are lvalues. (You can take their addresses.) Consider again the Widget and Gadget templates from earlier:

template<typename T>
class Widget {
    ...
    Widget(Widget&& rhs);        // rhs’s type is rvalue reference,
    ...                          // but rhs itself is an lvalue
};

template<typename T1>
class Gadget {
    ...
    template <typename T2>
    Gadget(T2&& rhs);            // rhs is a universal reference whose type will
    ...                          // eventually become an rvalue reference or
};                               // an lvalue reference, but rhs itself is an lvalue

In Widget’s constructor, rhs is an rvalue reference, so we know it’s bound to an rvalue (i.e., an rvalue was passed to it), but rhs itself is an lvalue, so we have to convert it back to an rvalue if we want to take advantage of the rvalueness of what it’s bound to.  Our motivation for this is generally to use it as the source of a move operation, and that’s why the way to convert an lvalue to an rvalue is to use std::move.  Similarly, rhs in Gadget’s constructor is a universal reference, so it might be bound to an lvalue or to an rvalue, but regardless of what it’s bound to, rhs itself is an lvalue.  If it’s bound to an rvalue and we want to take advantage of the rvalueness of what it’s bound to, we have to convert rhs back into an rvalue. If it’s bound to an lvalue, of course, we don’t want to treat it like an rvalue.  This ambiguity regarding the lvalueness and rvalueness of what a universal reference is bound to is the motivation for std::forward:  to take a universal reference lvalue and convert it into an rvalue only if the expression it’s bound to is an rvalue.  The name of the function (“forward”) is an acknowledgment that our desire to perform such a conversion is virtually always to preserve the calling argument’s lvalueness or rvalueness when passing -- forwarding -- it to another function.

But std::move and std::forward are not the focus of this article.  The fact that “&&” in type declarations may or may not declare an rvalue reference is.  To avoid diluting that focus, I’ll refer you to the references in the Further Information section for information on std::move and std::forward.

Nitty Gritty Details

The true core of the issue is that some constructs in C++11 give rise to references to references, and references to references are not permitted in C++. If source code explicitly contains a reference to a reference, the code is invalid:

Widget w1;
...
Widget& & w2 = w1;               // error! No such thing as “reference to reference”

There are cases, however, where references to references arise as a result of type manipulations that take place during compilation, and in such cases, rejecting the code would be problematic. We know this from experience with the initial standard for C++, i.e., C++98/C++03.

During type deduction for a template parameter that is a universal reference, lvalues and rvalues of the same type are deduced to have slightly different types.  In particular, lvalues of type T are deduced to be of type T& (i.e., lvalue reference to T), while rvalues of type T are deduced to be simply of type T. (Note that while lvalues are deduced to be lvalue references, rvalues are not deduced to be rvalue references!) Consider what happens when a template function taking a universal reference is invoked with an rvalue and with an lvalue:

template<typename T>
void f(T&& param);

...

int x;

...

f(10);                           // invoke f on rvalue
f(x);                            // invoke f on lvalue

In the call to f with the rvalue 10, T is deduced to be int, and the instantiated f looks like this:

void f(int&& param);             // f instantiated from rvalue

That’s fine. In the call to f with the lvalue x, however, T is deduced to be int&, and f’s instantiation contains a reference to a reference:

void f(int& && param);           // initial instantiation of f with lvalue

Because of the reference-to-reference, this instantiated code is prima facie invalid, but the source code-- “f(x)” -- is completely reasonable.  To avoid rejecting it, C++11 performs “reference collapsing” when references to references arise in contexts such as template instantiation.

Because there are two kinds of references (lvalue references and rvalue references), there are four possible reference-reference combinations: lvalue reference to lvalue reference, lvalue reference to rvalue reference, rvalue reference to lvalue reference, and rvalue reference to rvalue reference.  There are only two reference-collapsing rules:

  • An rvalue reference to an rvalue reference becomes (“collapses into”) an rvalue reference.
  • All other references to references (i.e., all combinations involving an lvalue reference) collapse into an lvalue reference.

Applying these rules to the instantiation of f on an lvalue yields the following valid code, which is how the compiler treats the call:

void f(int& param);              // instantiation of f with lvalue after reference collapsing

This demonstrates the precise mechanism by which a universal reference can, after type deduction and reference collapsing, become an lvalue reference. The truth is that a universal reference is really just an rvalue reference in a reference-collapsing context.

Things get subtler when deducing the type for a variable that is itself a reference. In that case, the reference part of the type is ignored.  For example, given

int x;

...

int&& r1 = 10;                   // r1’s type is int&&

int& r2 = x;                     // r2’s type is int&

the type for both r1 and r2 is considered to be int in a call to the template f.  This reference-stripping behavior is independent of the rule that, during type deduction for universal references, lvalues are deduced to be of type T& and rvalues of type T, so given these calls,

f(r1);

f(r2);

the deduced type for both r1 and r2 is int&. Why? First the reference parts of r1’s and r2’s types are stripped off (yielding int in both cases), then, because each is an lvalue, each is treated as int& during type deduction for the universal reference parameter in the call to f.

Reference collapsing occurs, as I’ve noted, in “contexts such as template instantiation.” A second such context is the definition of auto variables.  Type deduction for auto variables that are universal references is essentially identical to type deduction for function template parameters that are universal references, so lvalues of type T are deduced to have type T&, and rvalues of type T are deduced to have type T.  Consider again this example from the beginning of this article:

Widget&& var1 = someWidget;      // var1 is of type Widget&& (no use of auto here)

auto&& var2 = var1;              // var2 is of type Widget& (see below)

var1 is of type Widget&&, but its reference-ness is ignored during type deduction in the initialization of var2; it’s considered to be of type Widget.  Because it’s an lvalue being used to initialize a universal reference (var2), its deduced type is Widget&.  Substituting Widget& for auto in the definition for var2 yields the following invalid code,

Widget& && var2 = var1;          // note reference-to-reference

which, after reference collapsing, becomes

Widget& var2 = var1;             // var2 is of type Widget&

A third reference-collapsing context is typedef formation and use. Given this class template,

template<typename T>
class Widget {
    typedef T& LvalueRefType;
    ...
};

and this use of the template,

Widget<int&> w;

the instantiated class would contain this (invalid) typedef:

typedef int& & LvalueRefType;

Reference-collapsing reduces it to this legitimate code:

typedef int& LvalueRefType;

If we then use this typedef in a context where references are applied to it, e.g.,

void f(Widget<int&>::LvalueRefType&& param);

the following invalid code is produced after expansion of the typedef,

void f(int& && param);

but reference-collapsing kicks in, so f’s ultimate declaration is this:

void f(int& param);

The final context in which reference-collapsing takes place is the use of decltype. As is the case with templates and auto, decltype performs type deduction on expressions that yield types that are either T or T&, and decltype then applies C++11’s reference-collapsing rules.  Alas, the type-deduction rules employed by decltype are not the same as those used during template or auto type deduction.  The details are too arcane for coverage here (the Further Information section provides pointers to, er, further information), but a noteworthy difference is that decltype, given a named variable of non-reference type, deduces the type T (i.e., a non-reference type), while under the same conditions, templates and auto deduce the type T&.  Another important difference is that decltype’s type deduction depends only on the decltype expression; the type of the initializing expression (if any) is ignored. Ergo:

Widget w1, w2;

auto&& v1 = w1;                  // v1 is an auto-based universal reference being
                                 // initialized with an lvalue, so v1 becomes an
                                 // lvalue reference referring to w1.

decltype(w1)&& v2 = w2;          // v2 is a decltype-based universal reference, and
                                 // decltype(w1) is Widget, so v2 becomes an rvalue reference.
                                 // w2 is an lvalue, and it’s not legal to initialize an
                                 // rvalue reference with an lvalue, so
                                 // this code does not compile.

Summary

In a type declaration, “&&” indicates either an rvalue reference or a universal reference -- a reference that may resolve to either an lvalue reference or an rvalue reference. Universal references always have the form T&& for some deduced type T.

Reference collapsing is the mechanism that leads to universal references (which are really just rvalue references in situations where reference-collapsing takes place) sometimes resolving to lvalue references and sometimes to rvalue references. It occurs in specified contexts where references to references may arise during compilation. Those contexts are template type deduction, auto type deduction, typedef formation and use, and decltype expressions.

Acknowledgments

Draft versions of this article were reviewed by Cassio Neri, Michal Mocny, Howard Hinnant, Andrei Alexandrescu, Stephan T. Lavavej, Roger Orr, Chris Oldwood, Jonathan Wakely, and Anthony Williams.  Their comments contributed to substantial improvements in the content of the article as well as in its presentation.

Notes

[1] I discuss rvalues and their counterpart, lvalues, later in this article. The restriction on lvalue references binding to rvalues is that such binding is permitted only when the lvalue reference is declared as a reference-to-const, i.e., a const T&.

[2] I’m ignoring the possibility of bounds violations. They yield undefined behavior.

[3] std::vector::push_back is overloaded. The version shown is the only one that interests us in this article.

Further Information

C++11, Wikipedia.

Overview of the New C++ (C++11), Scott Meyers, Artima Press, last updated January 2012.

C++ Rvalue References Explained, Thomas Becker, last updated September 2011.

decltype, Wikipedia.

“A Note About decltype,” Andrew Koenig, Dr. Dobb’s, 27 July 2011.

Value Semantics and Polymorphism -- Dean Michael Berris

Dean Michael Berris:

Value Semantics and Polymorphism

I've already raved about Elements of Programming by Alex Stepanov before but if there's any one chapter you should read in that book is the first one. The succeeding chapters build on the first but what's really important in the first chapter is the fact that once you understand how to think about values, you can start understanding how to really use the STL and how you can start getting more and more into the practice of generic programming. Even though the book is not really about generic programming, it is a glimpse into a fundamentally different way of thinking.

Let's take something dear to the Object Oriented Programming model and look at it in a different perspective...