advanced

A C++11 Name-Lookup Problem: a Long-Term Solution and a Temporary Work-Around

A C++11 Name-Lookup Problem: a Long-Term Solution and a Temporary Work-Around

by Eric Niebler

[This article has been changed since it was first published. New information from the standardization committee, which is working to fix the issue, is included.]

[Disclaimer: This is an advanced article.]

Both beginner programmers and experienced library developers got new tools when C++11 came out. As a library developer, I was particularly excited by variadic templates, an advanced feature that makes it possible to write a function that takes an arbitrary number of arguments. As it turns out, the nature of the language feature, together with the existing name lookup rules of C++, can steer you down a dark alley of frustration. This article sheds some light on the issue and shows a light at the end of the tunnel in the form of a language simplification coming in C++14. Until then, there is a workaround. But first, a word about variadic templates.

Imagine you're trying to write a sum() function that takes an arbitrary number of arguments and adds them all together. In pseudo-code, you want something like this:

// pseudo-code
auto sum( t ) { return t; }
auto sum( t, ...u ) { return t + sum( u... ); }

You have two overloads: one that takes a single argument and returns it; and the other that takes multiple arguments and adds the first to the resulting of summing the rest. Simple enough, right? With variadic templates, this isn't too hard, although the syntax takes some getting used to. Let's assume we're summing ints for now:

// real code
int sum( int i )
{
  return i;
}

template< typename ...Ints >
int sum( int i, Ints ...is )
{
  return i + sum( is... );
}

OK, that wasn't so bad. It's a little strange that we had to parameterize all the arguments after the first. We would have preferred a signature like "int sum(int i, int... is)". Variadic templates don't let us do that, but it's not a huge deal.

The other thing we notice is that variadic templates can only ever expand into a comma-separated list of things. Above, we expand is... into the (comma-separated) argument list of a recursive invocation of sum(). If we could expand the argument pack into a list of things separated with '+', then we could have done all this in one shot. Instead, we're forced by the nature of variadic templates to implement sum() recursively. This may not seem like a big deal right now, but is.

Let's check to see that our sum() function works like we expect:

int i = sum( 1 );       // OK, i == 1
int j = sum( 1, 2 );    // OK, j == 3
int k = sum( 1, 2, 3 ); // OK, k == 6

Now we're feeling bold, and we'd like to generalize. Maybe we want to sum longs, or even concatenate std::strings! We need to make two changes: replace int with a template parameter, and automatically deduce the result type of the addition. We want to use type deduction so that if we, say, add an int and a long, the return type is long as it should be. C++11 makes this possible as well, although again the syntax takes some getting used to:

template< typename T >
T sum( T t )
{
  return t;
}

template< typename T, typename ...Us >
auto sum( T t, Us ...us ) -> decltype( t + sum( us... ) )
{
  return t + sum( us... );
}

Above, we use decltype to deduce the return type, and we use the new trailing return type declaration syntax. The code duplication in the trailing return type declaration is a code smell, but we can live with it for now. (FWIW, the standardization committee smells it too, and wants to clean this up. Read to the end for a glimpse of the smell-free future of C++.) Let's give it a test drive:

auto i = sum( 1 );                                     // OK, i == 1
auto j = sum( 1, 2 );                                  // OK, i == 3
auto greeting = sum( std::string("hello "), "world" ); // Cool! greeting == "hello world"
auto k = sum( 1, 2, 3 );                               // ERROR! Doesn't compile.

The last line gives us the following error on Clang:

>  main.cpp:25:14: error: no matching function for call to 'sum'
>      auto k = sum( 1, 2, 3 ); // ERROR! Doesn't compile.
>               ^~~
>  main.cpp:16:6: note: candidate template ignored: substitution
>  failure [with T = int, Us = <int, int>]: call to function
>  'sum' that is neither visible in the template definition nor
>  found by argument-dependent lookup
>  auto sum( T t, Us ...us ) -> decltype( t + sum( us... ) )
>       ^                                     ~~~
>  main.cpp:11:3: note: candidate function template not viable:
>  requires single argument 't', but 3 arguments were provided
>  T sum( T t )
>    ^
>  1 error generated.

What's going on here? Calls to sum() work for one argument and two arguments, but fail for three. In the error message, we see that Clang really tried to call the variadic overload of sum() but failed because "'sum' ... is neither visible in the template definition nor found by argument-dependent lookup." Huh?

I'll skip to the chase and tell you what's going on (though you might take a moment to figure it out for yourself before reading further). Look again at the definition of the variadic sum() overload:

template< typename T, typename ...Us >
auto sum( T t, Us ...us ) -> decltype( t + sum( us... ) )
{
  return t + sum( us... );
}

sum() appears twice, once in the function declaration and once in the function body. It's the one in the function declaration that's causing the problem. Consider that the above is (roughly) equivalent to the following:

template< typename T, typename ...Us >
decltype( T() + sum( Us()... ) ) sum( T t, Us ...us )
{
  return t + sum( us... );
}

The main difference is that we moved the return type from the back to the front, but this makes it a little more clear that in the return type the second sum() overload hasn't been seen yet! Even in the trailing return type formulation, the compiler pretends like it hasn't seen the second overload of sum() yet. And if it hasn't seen it, it can't call it. Picky, picky.

Name Lookup for Dummies

Name lookup happens in two phases. Let's just consider what happens in the case of the non-qualified function call in the return type of sum(). The compiler builds a set of overloads by adding a dash of functions from lookup phase 1 and then another dash from lookup phase 2. In phase 1, which happens when sum() is first parsed, functions with the right name that are in scope are tossed in the overload bucket. In our case, that's the single-argument overload of sum(), since that's the only one that is currently in scope. This phase happens before the argument types are known. Phase 2 is the argument-dependent lookup (ADL) phase and happens when the argument types are known; we check the associated namespaces of the arguments and look for any sum() overloads in those namespaces. None are found, because the arguments are of type int which has no associated namespaces. Bummer, our bucket only has only one overload of sum() in it, and it's not the one we need.

If you want to see something interesting about the way ADL works, try this. Define "struct String : std::string {};" at global scope. Now do this:

auto k = sum( "", String(), "" ); // OK!

Why does it compile when one of the arguments is a String? Because now ADL is forced to look in String's associated namespaces. Since String is defined in the global scope, ADL must check the global scope during the phase 2, and presto! there's another overload of sum() hiding in plain sight.

What's curious about our case is that we're trying to use a function in the declaration of that same function. There's just no way to get a function in scope until after its declaration, so trying to get phase 1 lookup to find it is a lost cause.

Recall what we said earlier about variadic templates: recursive algorithms are pretty much the only way to get non-trivial things done with them. There's no way around it. If the return type depends on the type returned by the recursive invocation -- and it often does -- we will run into this name lookup problem. Every time. The committee is already on its way with a language simplification that will make this problem go away (more about that below). For now, we have to find a workaround. Help us, phase 2 lookup. You're our only hope.

Relax, It's Just a Phase

How do we get name lookup to cool its jets and wait for phase 2? Let's back up a second and ask ourselves why lookup is done in two phases in the first place. When the compiler first sees the definition of a function template, some things are known and some aren't. For instance, the name of the function is known, but not the types of the arguments. Anything that depends on the template parameters is called, er, dependent. Name lookup in dependent types and dependent expressions happens during phase 2. So, in order to make the sum() function call resolve correctly, we need to make it depend on a template parameter.

Once again skipping ahead a bit, here is the hackish workaround that I've found. Explanation to follow.

namespace detail
{
  // The implementation of sum() is in a function object,
  // which is really just a holder for overloads.
  struct sum_impl
  {
    template< typename T >
    T operator()( T t ) const
    {
      return t;
    }

    // Sneaky trick below to make the function call lookup
    // happen in phase 2 instead of phase 1:
    template< typename T, typename ...Us, typename Impl = sum_impl >
    auto operator()( T t, Us ...us ) const -> decltype( t + Impl()( us... ) )
    {
      return t + Impl()( us... );
    }
  };
}

constexpr detail::sum_impl sum{};

In the above code, it looks like we simply moved the implementation of the sum() function into a sum_impl function object. But look closely, because there's more going on here than at first glance. The magic happens here:

template< typename T, typename ...Us, typename Impl = sum_impl >
// ----------------------- MAGIC HERE ^^^^^^^^^^^^^^^^^^^^^^^^
auto operator()( T t, Us ...us ) const -> decltype( t + Impl()( us... ) )

The "typename Impl = sum_impl" is a default function template parameter, new in C++11. If you don't specify it explicitly, Impl defaults to sum_impl. We won't specify it explicitly, but when the compiler first sees this code (in phase 1), it doesn't know that. Instead, when the compiler sees this...

decltype( t + Impl()( us... ) )

... the compiler must be patient and wait until phase 2, when it knows what Impl is, before it can resolve the function call. And that's precisely the point.

Sum-mary

With the above trick, our sum() function now compiles for three or more arguments. In short, when the compiler isn't finding the thing it should be finding, and you're yelling at the computer, "BUT IT'S RIGHT THERE!!!", pause and consider that maybe you too have been bitten by the phases of name lookup. Try the default function template parameter trick to make your expressions dependent.

A Look to C++14

The good news is that the problem described above is about to disappear. The C++ committee is in the process of approving return type deduction for regular functions the same way as it already works for lambdas, with the happy result that you won't even have to bother to write sum's return type at all (including not having to write decltype) because it will be deduced from the return statement in the body, at which point everything is happily in scope and Just Works.

In particular, the variadic sum() will be simply:

template< typename T, typename ...Us >
auto sum( T t, Us ...us )         // look ma, no "->"
{
  return t + sum( us... );
}

See Jason Merrill's paper N3386 for details. This proposal is on track to be adopted for C++14, and has already been implemented in the current development branch of GCC; to try it out, get the current development snapshot of GCC (or wait for GCC 4.8 in a few months) and compile with the -std=c++1y switch.

Automatic return type deduction, already becoming available in GCC and soon other compilers, is another example of how C++ continues to become simpler and remove the need for knowing about old-C++ corner cases. As simpler alternatives are added, we can just stop using the older complex ways in our new code, even while enjoying C++'s great backward compatibility for our older existing code.

C++ Optimization Manuals -- Agner Fog

[Ed.: Many thanks to reader Bartosz Bielecki for directing our attention to a good resource for all the performance tweak-heads out there. You know who you are.]

Agner Fog's C++ Optimization Manuals

submitted by Bartosz Bielecki

Agner Fog's site is one of the most important pages related to software optimization (also for the C++). You will find following guides there:

  • C++ optimization,
  • assembly optimization,
  • CPU instructions choice,
  • C++ calling conventions,
  • and various other links/tools.

If you have not been there, it is high time you did it!

Continue reading ...

Value Semantics and Concepts-Based Polymorphism -- Sean Parent

This past year at C++Now, Sean Parent gave a talk about Value Types that blew the room away. It will deepen your understanding of the design of the STL and change the way you think about and write code. He'll also show you some lean-and-mean image-processing demos that will drop your jaw. This is why we do C++.

Value Semantics and Concepts-Based Polymorphism

by Sean Parent

Sean will further develop the Value Semantics and Concepts-based Polymorphism concepts covered in his keynote, "Now What? A vignette in 3 parts."

Note: The audio is soft. Turn your volume up. The slides, the Keynote presentation, and the source code can be found in C++Now's GitHub repo here.

Watch the video...

 

Use CRTP for Polymorphic Chaining -- Marco Arena

Use CRTP for Polymorphic Chaining

by Marco Arena

 

This suggested video reminded me another use of CRTP I described some months ago. It was about how to support method chaining when inheritance is involved.

Consider this simple scenario:

struct Printer
{
   Printer(ostream& stream) : m_stream(stream) {}

   template<typename T>
   Printer& println(T&& message)
   {
      cout << message << endl;
      return *this;
   }

protected:
   ostream& m_stream;
};

struct CoutPrinter : public Printer
{
   CoutPrinter() : Printer(cout) {}

   CoutPrinter& color(Color c) 
   {
      // change console color
      return *this;
   }
}

//...

CoutPrinter printer;
printer.color(red).println("Hello").color( // OOPS

println("Hello") returns a Printer& and not a CoutPrinter&, so color is no longer available.

In this post I wrote a very simple approach based on CRTP to maintain the chaining relationship when inheritance is involved. I also propose a complicated-at-first-sight way to avoid duplication employing a bit of metaprogramming.

Continue reading...

Another Alternative to Lambda Move Capture -- John Bandela

[Ed.: John Bandela correctly points out some hazards of one approach to capture-by-move in C++11 lambdas. In this blog post, he offers his own safer solution.]

Another Alternative to Lambda Move Capture

by John Bandela

Today the post on isocpp.org called Learn How to Capture By Move caught my attention. I found the post informative and thought provoking, and you should go and read it before reading the rest of this post.

The problem is how do we lambda capture a large object we want to avoid copying?

The motivating example is below:

function<void()> CreateLambda()  
{  
  vector<HugeObject> hugeObj;  
  // ...preparation of hugeObj...  
  auto toReturn = [hugeObj] { /*...operate on hugeObj...*/ };  
  return toReturn;  
}

The solution proposed is a template class move_on_copy that is used like this:

auto moved = make_move_on_copy(move(hugeObj));
auto toExec = [moved] { /*...operate on moved.value...*/ };

However, there are problems with this approach, mainly in safety. The move_on_copy acts as auto_ptr and silently performs moves instead of copies (by design).

I present here a different take on the problem which accomplishes much of the above safely, however, with a little more verbosity in exchange for more clarity and safety.

Continue reading...

 

Modules update video available -- Doug Gregor

MP4 video of Doug Gregor's talk on Modules is now available via llvm.org. Combining links here:

Modules (MP4 video) (PDF slides)

Doug Gregor - Apple

The C preprocessor has long been a source of problems for programmers and tools alike.

Programmers must contend with widespread macro pollution and #include-ordering problems due to ill-behaved headers. Developers habitually employ various preprocessor workarounds, such as LONG_MACRO_PREFIXES, #include guards, and the occasional #undef of a library macro to mitigate these problems.

Tools, on the other hand, must cope with the inherent scalability problems associated with parsing the same headers repeatedly, because each different preprocessing context could effect how a header is interpreted – even though the programmer rarely wants it.

Modules seeks to solve this problem by isolating the interface of a particular library and compiling it (once) into an efficient, serialized representation that can be efficiently imported whenever that library is used, improving both the programmer’s experience and the scalability of the compilation process.

 

Compile Time Computations -- Andrzej KrzemieĊ„ski

[Ed.: This is an old post but we feel it has valuable information about constexpr.]

This is a good article about constant expressions and computations at compile time. It gives good coverage of the new constexpr keyword for allowing compile-time and run-time evalution. constexpr is discussed rigorously with information on the rules required by the facility.

Compile-Time Computations

by Andrzej Krzemieński

[...] But now, consider the following example:

const int i = 2;

const char array[ i == 2 ? 64 : throw exception() ];

Throwing an exception (which is an expression) cannot be evaluated at compile-time, but since i == 2 is true, the third argument of conditional operator should not be evaluated; at least at run-time. So is the above a valid code? Not in C++03. It is valid however in C++11. Does this sound incredible? [...]

 

Continue reading...

 

Learn How to Capture by Move -- Marco Arena

[Ed.: C++11 lambdas support capturing local variables by value and by reference... but not by move. This will likely be addressed in the next standard, but in the meantime we need workarounds, usually involving a shared_ptr, to get the right behavior. In this article, Marco Arena looks at the shared_ptr approach and offers another potential helper. Observing the tradeoffs in these workarounds can instructive.]

Learn How to Capture by Move

by Marco Arena

what if I want to capture by moving an object instead of both copying and referencing? Consider this plausible scenario:

function<void()> CreateLambda()

{

    vector<HugeObject> hugeObj;

    // ...preparation of hugeObj...


    auto toReturn = [hugeObj] { ...operate on hugeObj... };

    return toReturn;

}

This fragment of code prepares a vector of HugeObject (e.g. expensive to copy) and returns a lambda which uses this vector (the vector is captured by copy because it goes out of scope when the lambda is returned). Can we do better?

"Yes, of course we can!" – I heard. "We can use a shared_ptr to reference-count the vector and avoid copying it."  ...

Continue reading...

On vector<bool> -- Howard Hinnant

On vector<bool>

by Howard Hinnant

 

vector<bool> has taken a lot of heat over the past decade, and not without reason. However I believe it is way past time to draw back some of the criticism and explore this area with a dispassionate scrutiny of detail.

There are really two issues here:

  1. Is the data structure of an array of bits a good data structure?
  2. Should the aforementioned data structure be named vector<bool>?

I have strong opinions on both of these questions. And to get this out of the way up front:

  1. Yes.
  2. No.

The array of bits data structure is a wonderful data structure. It is often both a space and speed optimization over the array of bools data structure if properly implemented. However it does not behave exactly as an array of bools, and so should not pretend to be one.

First, what's wrong with vector<bool>?

Because vector<bool> holds bits instead of bools, it can't return a bool& from its indexing operator or iterator dereference. This can play havoc on quite innocent looking generic code. For example:

template <class T>
void
process(T& t)
{
    // do something with t
}

template <class T, class A>
void
test(std::vector<T, A>& v)
{
    for (auto& t : v)
        process(t);
}

The above code works for all T except bool. When instantiated with bool, you will receive a compile time error along the lines of:

error: non-const lvalue reference to type 'std::__bit_reference<std::vector<bool, std::allocator<bool>>, true>' cannot bind to
      a temporary of type 'reference' (aka 'std::__bit_reference<std::vector<bool, std::allocator<bool>>, true>')
    for (auto& t : v)
               ^ ~
note: in instantiation of function template specialization 'test<bool, std::allocator<bool>>' requested here
    test(v);
    ^
vector:2124:14: note: selected 'begin' function
      with iterator type 'iterator' (aka '__bit_iterator<std::vector<bool, std::allocator<bool>>, false>')
    iterator begin()
             ^
1 error generated.

This is not a great error message. But it is about the best the compiler can do. The user is confronted with implementation details of vector and in a nutshell says that the vector is not working with a perfectly valid ranged-based for statement. The conclusion the client comes to here is that the implementation of vector is broken. And he would be at least partially correct.

But consider if instead of vector<bool> being a specialization instead there existed a separate class template std::bit_vector<A = std::allocator<bool>> and the coder had written:

template <class A>
void
test(bit_vector<A>& v)
{
    for (auto& t : v)
        process(t);
}

Now one gets a similar error message:

error: non-const lvalue reference to type 'std::__bit_reference<std::bit_vector<std::allocator<bool>>, true>' cannot bind to
      a temporary of type 'reference' (aka 'std::__bit_reference<std::bit_vector<std::allocator<bool>>, true>')
    for (auto& t : v)
               ^ ~
note: in instantiation of function template specialization 'test<std::allocator<bool>>' requested here
    test(v);
    ^
bit_vector:2124:14: note: selected 'begin' function
      with iterator type 'iterator' (aka '__bit_iterator<std::bit_vector<std::allocator<bool>>, false>')
    iterator begin()
             ^
1 error generated.

And although the error message is similar, the coder is far more likely to see that he is using a dynamic array of bits data structure and it is understandable that you can't form a reference to a bit.

I.e. names are important. And creating a specialization that has different behavior than the primary, when the primary template would have worked, is poor practice.

But what's right with vector<bool>?

For the rest of this article assume that we did indeed have a std::bit_vector<A = std::allocator<bool>> and that vector was not specialized on bool. bit_vector<> can be much more than simply a space optimization over vector<bool>, it can also be a very significant performance optimization. But to achieve this higher performance, your vendor has to adapt many of the std::algorithms to have specialized code (optimizations) when processing sequences defined by bit_vector<>::iterators.

find

For example consider this code:

template <class C>
typename C::iterator
test()
{
    C c(100000);
    c[95000] = true;
    return std::find(c.begin(), c.end(), true);
}

How long does std::find take in the above example for:

  1. A hypothetical non-specialized vector<bool>?
  2. A hypothetical bit_vector<> using an optimized find?
  3. A hypothetical bit_vector<> using the unoptimized generic find?

I'm testing on an Intel Core i5 in 64 bit mode. I am normalizing all answers such that the speed of A is 1 (smaller is faster):

  1. 1.0
  2. 0.013
  3. 1.6

An array of bits can be a very fast data structure for a sequential search! The optimized find is inspecting 64 bits at a time. And due to the space optimization, it is much less likely to cause a cache miss. However if the implementation fails to do this, and naively checks one bit at a time, then this giant 75X optimization turns into a significant pessimization.

count

std::count can be optimized much like std::find to process a word of bits at a time:

template <class C>
typename C::difference_type
test()
{
    C c(100000);
    c[95000] = true;
    return std::count(c.begin(), c.end(), true);
}

My results are:

  1. 1.0
  2. 0.044
  3. 1.02

Here the results are not quite as dramatic as for the std::find case. However any time you can speed up your code by a factor of 20, one should do so!

fill

std::fill is yet another example:

template <class C>
void
test()
{
    C c(100000);
    std::fill(c.begin(), c.end(), true);
}

My results are:

  1. 1.0
  2. 0.40
  3. 38.

The optimized fill is over twice as fast as the non-specialized vector<bool>. But if the vendor neglects to specialize fill for bit-iterators the results are disastrous! Naturally the results are identical for the closely related fill_n.

copy

std::copy is yet another example:

template <class C>
void
test()
{
    C c1(100000);
    C c2(100000);
    std::copy(c1.begin(), c1.end(), c2.begin());
}

My results are:

  1. 1.0
  2. 0.36
  3. 34.

The optimized copy is approaches three times as fast as the non-specialized vector<bool>. But if the vendor neglects to specialize fill for bit-iterators the results are not good. If the copy is not aligned on word boundaries (as in the above example), then the optimized copy slows down to the same speed as the copy for A. Results for copy_backward, move and move_backward are similar.

swap_ranges

std::swap_ranges is yet another example:

template <class C>
void
test()
{
    C c1(100000);
    C c2(100000);
    std::swap_ranges(c1.begin(), c1.end(), c2.begin());
}

My results are:

  1. 1.0
  2. 0.065
  3. 4.0

Here bit_vector<> is 15 times faster than an array of bools, and over 60 times as fast as working a bit at a time.

rotate

std::rotate is yet another example:

template <class C>
void
test()
{
    C c(100000);
    std::rotate(c.begin(), c.begin()+c.size()/4, c.end());
}

My results are:

  1. 1.0
  2. 0.59
  3. 17.9

Yet another example of good results with an optimized algorithm and very poor results without this extra attention.

equal

std::equal is yet another example:

template <class C>
bool
test()
{
    C c1(100000);
    C c2(100000);
    return std::equal(c1.begin(), c1.end(), c2.begin());
}

My results are:

  1. 1.0
  2. 0.016
  3. 3.33

If you're going to compare a bunch of bools, it is much, much faster to pack them into bits and compare a word of bits at a time, rather than compare individual bools, or individual bits!

Summary

The dynamic array of bits is a very good data structure if attention is paid to optimizing algorithms that can process up to a word of bits at a time. In this case it becomes not only a space optimization but a very significant speed optimization. If such attention to detail is not given, then the space optimization leads to a very significant speed pessimization.

But it is a shame that the C++ committee gave this excellent data structure the name vector<bool> and that it gives no guidance nor encouragement on the critical generic algorithms that need to be optimized for this data structure. Consequently, few std::lib implementations go to this trouble.