Articles & Books

LINQ-like List Manipulation in C++

Cross platform LINQ for C++ 

LINQ-like List Manipulation in C++

by Gaston Hillar

From the article:

When developers make the move from C# to C++, they definitely miss Language-Integrated Query (LINQ) features. Cpplinq is an excellent open-source project that uses C++11 features to provide extensible LINQ-like list manipulation to C++.

I'm sad to say Farewell Dr. Dobb's. Thank you for all the good articles.

Software Duct Tape: Binding the C++  Universe Together—Glennan Carnie

In case you missed it last week on Sticky Bits:

Software Duct Tape -- Binding the C++  Universe Together

by Glennan Carnie

From the article:

One of the cornerstones of object-oriented design is the concept of objects interacting by sending messages to form mechanisms – units of higher-order (or ‘emergent’) behaviour.

In order to send a message (in this case, invoke a member function) an object must have a ‘link’ to the target object. That link is formed by building in an association between the two classes as part of the type’s definition.

In this article we look at building associations between classes and forming run-time links so objects can communicate...

Lessons to learn from John Carmack and Oculus [about] the “Modern C++” approach—CoderGears Team

Fresh on CoderGears:

Lessons to learn from John Carmack and Oculus development team when using the “Modern C++” approach

by CoderGears Team

From the article:

In the previous post about Doom3, we discovered that it was developed using the “C with Classes” approach. John Carmack is the main developer and it was between 2000 and 2004, what explains why “Modern C++” approach was not adopted. And, even if “C with Classes” is not recommended, he made some good decisions to have clean code.

John Carmack is now the oculus VR CTO, Oculus VR is an American virtual reality technology company founded by Palmer Luckey and Brendan Iribe. Their first product, still in development, is the Oculus Rift, a head-mounted display for immersive virtual reality (VR).

Oculus provides an SDK which includes its source code, it’s developed using C++.  For this project the Modern C++ approach was adopted. Let’s discover some design and implementation choices of John Carmack and his team...

Quick Q: What is constexpr useful for?—StackOverflow

Quick A: When you need the constant to be available at compile time.

Today on SO:

What are 'constexpr' useful for?

I really can't find any use of it. My first idea was that I could use it to implement 'Design by Contract' without using macros like this:

struct S
{  
    S(constexpr int i) : S(i) { static_assert( i < 9, "i must be < 9" ); }

    S(int i); //external defintion

    char *pSomeMemory;
};

But this wouldn't compile. I thought we could also use it to reference same-variable without the need of additional memory to be created when we want to avoid the get/setters in order to make instances to one member from users to be read-only:

class S
{ 
private:
    int _i;

public:
    const int & constexpr i = _i;
};

But none of the above actually compiled. Can someone give me some insight why this keyword was being introduced?

Using assertions—Andrzej Krzemieński

Today on Andrzej's blog:

Using assertions

by Andrzej Krzemieński

From the article:

This post is a response to my recent encounters with fellow programmers who appear to me to be missing the point of assertions and fail to appreciate their usefulness. The first post I have ever written here was on assertions, I still find it good, so there is no need to repeat it; here I will only describe how I observe people treat assertions and why I believe it is wrong.

I am reviewing the following code...

Tutorial on Tag Dispatching—Crazy Eddie

Fresh on Crazy Eddie's Crazy C++ (fortunately not entirely true...):

Tutorial on tag dispatching

by Crazy Eddie

From the article:

... There’s been some confused notions passed around recently that lead me to think there needs to be more information about probably one of the simplest, most powerful metaprogramming techniques that exist in C++: tag dispatching. The reason tag dispatching is powerful is that it leverages the language and the compiler to do work for you so that you don’t have to. It’s a technique whereby you use overload resolution rules to decide between otherwise ambiguous functions.

Concepts and tags

The first part to understand with regard to tag dispatching is the idea of “concepts”...

A Tiny Metaprogramming Library—Manu Sánchez

Manu Sánchez is proposing a little-big collaborative challenge to the C++ community. Learning how to build:

A Tiny Metaprogramming Library

by Manu Sánchez

The proposal:

I hope you like this idea. It’s not only me writing crazy meta-stuff, but everybody developing their own metaprogramming library, learning something new each week, and comparing the different approaches each one is taking. I’m the guy who writes this posts, but I can learn a lot with your Tiny Metaprogramming Libraries and your feedback.

 

Bring named parameters in modern C++—Marco Arena

I’m going to explore some of the classical ways to emulate named parameters in C++ as well as mention new approaches. Part of this work would not have been possible without Davide Di Gennaro's help and suggestions. An entire paragraph was written by Davide.

Bring named parameters in modern C++

by Marco Arena

From the article:

In programming, named parameters refer to a computer language’s support for function calls that clearly state the name of each parameter within the function call itself.[...]Several languages support named parameters (e.g. C#, Objective-C, …). C++ does not.[...]In this post, I’m going to explore some of the classical ways to emulate named parameters in C++ as well as mention new approaches.

A function to visit nodes of a graph with C++14—Adrien Hamelin

Some thoughts about how to visit nodes of a graph:

A function to visit nodes of a graph with C++14

by Adrien Hamelin

From the article:

Visiting a graph is something useful. [...] A classical way to do that is to implement the visitor pattern. [...] it requires to create a Visitor class and to modify each of the possibly visited class to add a virtual accept function. [...] It works well, but surely we can do better...

Five Popular Myths about C++, Part 2

myth.png[For your winter reading pleasure, we're pleased to present this three-part series of new material by Bjarne Stroustrup. Part one is here, and part three will be posted next Monday which will complete the series just in time for the holidays. Enjoy. -- Ed.]

 

Five Popular Myths about C++ (Part 2)

by Bjarne Stroustrup

Morgan Stanley, Columbia University, Texas A&M University

 

1. Introduction (repeated from Part 1)

In this three-part series, I will explore, and debunk, five popular myths about C++:

  1. "To understand C++, you must first learn C"
  2. "C++ is an Object-Oriented Language"
  3. "For reliable software, you need Garbage Collection"
  4. "For efficiency, you must write low-level code"
  5. "C++ is for large, complicated, programs only"

If you believe in any of these myths, or have colleagues who perpetuate them, this short article is for you. Several of these myths have been true for someone, for some task, at some time. However, with today’s C++, using widely available up-to date ISO C++ 2011 compilers, and tools, they are mere myths.

I deem these myths “popular” because I hear them often. Occasionally, they are supported by reasons, but more often they are simply stated as obvious, needing no support. Sometimes, they are used to dismiss C++ from consideration for some use.

Each myth requires a long paper or even a book to completely debunk, but my aim here is simply to raise the issues and to briefly state my reasons.

The first two myths were addressed in my first installment.

4. Myth 3: "For reliable software, you need Garbage Collection"

Garbage collection does a good, but not perfect, job at reclaiming unused memory. It is not a panacea. Memory can be retained indirectly and many resources are not plain memory. Consider:

class Filter { // take input from file iname and produce output on file oname
public:
  Filter(const string& iname, const string& oname); // constructor
  ~Filter();                                        // destructor
  // ...
private:
  ifstream is;
  ofstream os;
  // ...
};

This Filter’s constructor opens two files. That done, the Filter performs some task on input from its input file producing output on its output file. The task could be hardwired into Filter, supplied as a lambda, or provided as a function that could be provided by a derived class overriding a virtual function. Those details are not important in a discussion of resource management. We can create Filters like this:

void user()
{
  Filter flt {“books”,”authors”};
  Filter* p = new Filter{“novels”,”favorites”};
  // use flt and *p
  delete p;
}

From a resource management point of view, the problem here is how to guarantee that the files are closed and the resources associated with the two streams are properly reclaimed for potential re-use.

The conventional solution in languages and systems relying on garbage collection is to eliminate the delete (which is easily forgotten, leading to leaks) and the destructor (because garbage collected languages rarely have destructors and “finalizers” are best avoided because they can be logically tricky and often damage performance). A garbage collector can reclaim all memory, but we need user actions (code) to close the files and to release any non-memory resources (such as locks) associated with the streams. Thus memory is automatically (and in this case perfectly) reclaimed, but the management of other resources is manual and therefore open to errors and leaks.

The common and recommended C++ approach is to rely on destructors to ensure that resources are reclaimed. Typically, such resources are acquired in a constructor leading to the awkward name “Resource Acquisition Is Initialization” (RAII) for this simple and general technique. In user(), the destructor for flt  implicitly calls the destructors for the streams is and os. These constructors in turn close the files and release the resources associated with the streams. The delete would do the same for *p.

Experienced users of modern C++ will have noticed that user() is rather clumsy and unnecessarily error-prone. This would be better:

void user2()
{
  Filter flt {“books”,”authors”};
  unique_ptr<Filter> p {new Filter{“novels”,”favorites”}};
  // use flt and *p
}

Now *p will be implicitly released whenever user() is exited. The programmer cannot forget to do so. The unique_ptr is a standard-library class designed to ensure resource release without runtime or space overheads compared to the use of built-in “naked” pointers.

However, we can still see the new, this solution is a bit verbose (the type Filter is repeated), and separating the construction of the ordinary pointer (using new) and the smart pointer (here, unique_ptr) inhibits some significant optimizations. We can improve this by using a C++14 helper function make_unique that constructs an object of a specified type and returns a unique_ptr to it:

void user3()
{
  Filter flt {“books”,”authors”};
  auto p = make_unique<Filter>(“novels”,”favorites”);
  // use flt and *p
}

Unless we really needed the second Filter to have pointer semantics (which is unlikely) this would be better still:

void user3()
{
  Filter flt {“books”,”authors”};
  Filter flt2 {“novels”,”favorites”};
  // use flt and flt2
}

This last version is shorter, simpler, clearer, and faster than the original.

But what does Filter’s destructor do? It releases the resources owned by a Filter; that is, it closes the files (by invoking their destructors). In fact, that is done implicitly, so unless something else is needed for Filter, we could eliminate the explicit mention of the Filter destructor and let the compiler handle it all. So, what I would have written was just:

class Filter { // take input from file iname and produce output on file oname
public:
  Filter(const string& iname, const string& oname);
  // ...
private:
  ifstream is;
  ofstream os;
  // ...
};

void user3()
{
  Filter flt {“books”,”authors”};
  Filter flt2 {“novels”,”favorites”};
  // use flt and flt2
}

This happens to be simpler than what you would write in most garbage collected languages (e.g., Java or C#) and it is not open to leaks caused by forgetful programmers. It is also faster than the obvious alternatives (no spurious use of the free/dynamic store and no need to run a garbage collector). Typically, RAII also decreases the resource retention time relative to manual approaches.

This is my ideal for resource management. It handles not just memory, but general (non-memory) resources, such as file handles, thread handles, and locks. But is it really general? How about objects that needs to be passed around from function to function? What about objects that don’t have an obvious single owner?

4.1 Transferring Ownership: move

Let us first consider the problem of moving objects around from scope to scope. The critical question is how to get a lot of information out of a scope without serious overhead from copying or error-prone pointer use. The traditional approach is to use a pointer:

X* make_X()
{
  X* p = new X:
  // ... fill X ..
  return p;
}

void user()
{
  X* q = make_X();
  // ... use *q ...
  delete q;
}

Now who is responsible for deleting the object? In this simple case, obviously the caller of make_X() is, but in general the answer is not obvious. What if make_X() keeps a cache of objects to minimize allocation overhead? What if user() passed the pointer to some other_user()? The potential for confusion is large and leaks are not uncommon in this style of program.

I could use a shared_ptr or a unique_ptr to be explicit about the ownership of the created object. For example:

unique_ptr<X> make_X();

But why use a pointer (smart or not) at all? Often, I don’t want a pointer and often a pointer would distract from the conventional use of an object. For example, a Matrix addition function creates a new object (the sum) from two arguments, but returning a pointer would lead to seriously odd code:

unique_ptr<Matrix> operator+(const Matrix& a, const Matrix& b);
Matrix res = *(a+b);

That * is needed to get the sum, rather than a pointer to it. What I really want in many cases is an object, rather than a pointer to an object. Most often, I can easily get that. In particular, small objects are cheap to copy and I wouldn’t dream of using a pointer:

double sqrt(double); // a square root function
double s2 = sqrt(2); // get the square root of 2

On the other hand, objects holding lots of data are typically handles to most of that data. Consider istream, string, vector, list, and thread. They are all just a few words of data ensuring proper access to potentially large amounts of data. Consider again the Matrix addition. What we want is

Matrix operator+(const Matrix& a, const Matrix& b); // return the sum of a and b
Matrix r = x+y;

We can easily get that.

Matrix operator+(const Matrix& a, const Matrix& b)
{
  Matrix res;
  // ... fill res with element sums ...
  return res;
}

By default, this copies the elements of res into r, but since res is just about to be destroyed and the memory holding its elements is to be freed, there is no need to copy: we can “steal” the elements. Anybody could have done that since the first days of C++, and many did, but it was tricky to implement and the technique was not widely understood. C++11 directly supports “stealing the representation” from a handle in the form of move operations that transfer ownership. Consider a simple 2-D Matrix of doubles:

class Matrix {
  double* elem; // pointer to elements
  int nrow;     // number of rows
  int ncol;     // number of columns
public:
  Matrix(int nr, int nc)                  // constructor: allocate elements
    :elem{new double[nr*nc]}, nrow{nr}, ncol{nc}
  {
    for(int i=0; i<nr*nc; ++i) elem[i]=0; // initialize elements
  }

  Matrix(const Matrix&);                  // copy constructor
  Matrix operator=(const Matrix&);        // copy assignment

  Matrix(Matrix&&);                       // move constructor
  Matrix operator=(Matrix&&);             // move assignment

  ~Matrix() { delete[] elem; }            // destructor: free the elements

// …
};

A copy operation is recognized by its reference (&) argument. Similarly, a move operation is recognized by its rvalue reference (&&) argument. A move operation is supposed to “steal” the representation and leave an “empty object” behind. For Matrix, that means something like this:

Matrix::Matrix(Matrix&& a)                   // move constructor
  :nrow{a.nrow}, ncol{a.ncol}, elem{a.elem}  // “steal” the representation
{
  a.elem = nullptr;                          // leave “nothing” behind
}

That’s it! When the compiler sees the return res; it realizes that res is soon to be destroyed. That is, res will not be used after the return. Therefore it applies the move constructor, rather than the copy constructor to transfer the return value. In particular, for

Matrix r = a+b;

the res inside operator+() becomes empty -- giving the destructor a trivial task -- and res’s elements are now owned by r. We have managed to get the elements of the result -- potentially megabytes of memory -- out of the function (operator+()) and into the caller’s variable. We have done that at a minimal cost (probably four word assignments).

Expert C++ users have pointed out that there are cases where a good compiler can eliminate the copy on return completely (in this case saving the four word moves and the destructor call). However, that is implementation dependent, and I don’t like the performance of my basic programming techniques to depend on the degree of cleverness of individual compilers. Furthermore, a compiler that can eliminate the copy, can as easily eliminate the move. What we have here is a simple, reliable, and general way of eliminating complexity and cost of moving a lot of information from one scope to another.

Often, we don’t even need to define all those copy and move operations. If a class is composed out of members that behave as desired, we can simply rely on the operations generated by default. Consider:

class Matrix {
    vector<double> elem; // elements
    int nrow;            // number of rows
    int ncol;            // number of columns
public:
    Matrix(int nr, int nc)    // constructor: allocate elements
      :elem(nr*nc), nrow{nr}, ncol{nc}
    { }

    // ...
};

This version of Matrix behaves like the version above except that it copes slightly better with errors and has a slightly larger representation (a vector is usually three words).

What about objects that are not handles? If they are small, like an int or a complex<double>, don’t worry. Otherwise, make them handles or return them using “smart” pointers, such as unique_ptr and shared_ptr. Don’t mess with “naked” new and delete operations.

Unfortunately, a Matrix like the one I used in the example is not part of the ISO C++ standard library, but several are available (open source and commercial). For example, search the Web for “Origin Matrix Sutton” and see Chapter 29 of my The C++ Programming Language (Fourth Edition) [11] for a discussion of the design of such a matrix.

4.2 Shared Ownership: shared_ptr

In discussions about garbage collection it is often observed that not every object has a unique owner. That means that we have to be able ensure that an object is destroyed/freed when the last reference to it disappears. In the model here, we have to have a mechanism to ensure that an object is destroyed when its last owner is destroyed. That is, we need a form of shared ownership. Say, we have a synchronized queue, a sync_queue, used to communicate between tasks. A producer and a consumer are each given a pointer to the sync_queue:

void startup()
{
  sync_queue* p  = new sync_queue{200};  // trouble ahead!
  thread t1 {task1,iqueue,p};  // task1 reads from *iqueue and writes to *p
  thread t2 {task2,p,oqueue};  // task2 reads from *p and writes to *oqueue
  t1.detach();
  t2.detach();
}

I assume that task1, task2, iqueue, and oqueue have been suitably defined elsewhere and apologize for letting the thread outlive the scope in which they were created (using detatch()). Also, you may imagine pipelines with many more tasks and sync_queues. However, here I am only interested in one question: “Who deletes the sync_queue created in startup()?” As written, there is only one good answer: “Whoever is the last to use the sync_queue.” This is a classic motivating case for garbage collection. The original form of garbage collection was counted pointers: maintain a use count for the object and when the count is about to go to zero delete the object. Many languages today rely on a variant of this idea and C++11 supports it in the form of shared_ptr. The example becomes:

void startup()
{
  auto p = make_shared<sync_queue>(200);  // make a sync_queue and return a stared_ptr to it
  thread t1 {task1,iqueue,p};  // task1 reads from *iqueue and writes to *p
  thread t2 {task2,p,oqueue};  // task2 reads from *p and writes to *oqueue
  t1.detach();
  t2.detach();
}

Now the destructors for task1 and task2 can destroy their shared_ptrs (and will do so implicitly in most good designs) and the last task to do so will destroy the sync_queue.

This is simple and reasonably efficient. It does not imply a complicated run-time system with a garbage collector. Importantly, it does not just reclaim the memory associated with the sync_queue. It reclaims the synchronization object (mutex, lock, or whatever) embedded in the sync_queue to mange the synchronization of the two threads running the two tasks. What we have here is again not just memory management, it is general resource management. That “hidden” synchronization object is handled exactly as the file handles and stream buffers were handled in the earlier example.

We could try to eliminate the use of shared_ptr by introducing a unique owner in some scope that encloses the tasks, but doing so is not always simple, so C++11 provides both unique_ptr (for unique ownership) and shared_ptr (for shared ownership).

4.3 Type safety

Here, I have only addressed garbage collection in connection with resource management. It also has a role to play in type safety. As long as we have an explicit delete operation, it can be misused. For example:

X* p = new X;
X* q = p;
delete p;
// ...
q->do_something();  // the memory that held *p may have been re-used

Don’t do that. Naked deletes are dangerous -- and unnecessary in general/user code. Leave deletes inside resource management classes, such as string, ostream, thread, unique_ptr, and shared_ptr. There, deletes are carefully matched with news and harmless.

4.4 Summary: Resource Management Ideals

For resource management, I consider garbage collection a last choice, rather than “the solution” or an ideal:

  1. Use appropriate abstractions that recursively and implicitly handle their own resources. Prefer such objects to be scoped variables.
  2. When you need pointer/reference semantics, use “smart pointers” such as unique_ptr and shared_ptr to represent ownership.
  3. If everything else fails (e.g., because your code is part of a program using a mess of pointers without a language supported strategy for resource management and error handling), try to handle non-memory resources “by hand” and plug in a conservative garbage collector to handle the almost inevitable memory leaks.

Is this strategy perfect? No, but it is general and simple. Traditional garbage-collection based strategies are not perfect either, and they don’t directly address non-memory resources.

 

Postscript

The first installment of this series addressed

  • Myth 1: "To understand C++, you must first learn C"
  • Myth 2: "C++ is an Object-Oriented Language"

In my next installment I will address

  • Myth 4: "For efficiency, you must write low-level code"
  • Myth 5: "C++ is for large, complicated, programs only"