performance

Building a hybrid spin mutex in C++ -- Foster Brereton

Forster Brereton reports about his first steps to build a hybrid mutex.

Building a hybrid spin mutex in C++

by Foster Brereton

From the article

Blocking Mutexes
A blocking mutex will halt the thread until it acquisition. It is useful because it consumes negligible computer resources while blocked. This leaves the CPU free to perform other tasks, including whatever other task currently owns the mutex. All this goodness is not cheap, however: it takes a decent amount of time to block thread. If your critical section is brief, you could be spending a disproportionate amount of time protecting it instead of running it.
Generally, blocking mutexes should be used when your critical section will take a while, such as I/O operations, calling out to the OS, or doing laundry in a collegiate dorm.


Spinning Mutexes
A spinning mutex will enter into an infinite loop (spin) until acquisition. It is useful because it can resume very quickly once the lock has been obtained, resulting in minimal overhead while protecting a critical section. However, since the thread remains active on the CPU, it can reduce (or eliminate!) the ability of the CPU to do other work††. If your critical section is long, you could be spending a disproportionate amount of time protecting it instead of running it.
Generally, spin mutexes should be used when your critical section is brief, such as reading or writing a memory-resident data structure.

Finding a middle ground
The dichotomy between the two mutex behaviors has left me stuck more than once. What if I was trying to protect a global resource that occasionally required a call to the OS? In those cases a blocking mutex is not a good fit, as modifying the memory-resident structure is pretty quick. However a spin mutex would be equally bad, because I do need to go to the OS time and again, and it would be a pessimization to spike a CPU while doing so.

More Meeting C++ 2016 videos are online!

A week full of video editing brings the first batch of Meeting C++ 2016 videos online:

More videos are online!

by Jens Weller

Meeting C++ 2016 Playlist

From the article:

With today, almost all videos from the A and all videos of the D Track are online. There is a recording issue with one talk in the A track, which might get resolved in 2017. Also since today, the Meeting C++ YouTube channel has more then 400k views!

The full video set you can find in the Meeting C++ 2016 Playlist, the newest videos are easily found by visiting the Meeting C++ YouTube channel or subscribing to this RSS feed.

6 Tips to supercharge C++11 vector performance--Deb Haldar

Discussion on how we can efficiently use std::vector<T> container.

6 Tips to supercharge C++11 vector performance

by Deb Haldar

From the article:

Vector is like the swiss army knife of C++ STL containers. In the words of Bjarne Stroutsoup – “By default, use Vector when you need a container”. For mere mortals like us, we take this as gospel and just run with it. However, Vector is just a tool and like any tool, it can be used both effectively or ineffectively.

In this article we’ll look at 6 ways to optimize usage of vectors. We’ll look at both efficient and inefficient ways to perform the most common programming tasks using vectors, measure the performance gain we obtain by using vectors efficiently and try to understand why we’re getting the performance gain.

Infographics: Operation Costs in CPU Clock Cycles--“No Bugs” Hare

A very interesting article about the cost of our basic operations.

Infographics: Operation Costs in CPU Clock Cycles

by “No Bugs” Hare

From the article:

Whenever we need to optimise the code, we should profile it, plain and simple. However, sometimes it makes sense just to know ballpark numbers for relative costs of some popular operations, so you won’t do grossly inefficient things from the very beginning (and hopefully won’t need to profile the program later �� )...

GoingNative 53: Learning STL Multithreading--Steve Carroll, Augustin Popa and BryanDiLaura

The new GoingNative is out!

GoingNative 53: Learning STL Multithreading

by Steve Carroll, Augustin Popa and BryanDiLaura

From the video:

In this episode, Billy O'Neal and Stephan T. Lavavej (S.T.L.) talk about the Standard Template Library for multithreading, and how to use it properly. We would love to hear some feedback on this episode! If you liked it, let us know and we may make a follow up!

Quick Q: Is std::vector so much slower than plain arrays?

Quick A: A vector isn’t slower than an array when they do the same things. But it lets you do much more…

Some time ago on SO:

Is std::vector so much slower than plain arrays?

Using the following:

g++ -O3 Time.cpp -I <MyBoost>
./a.out
UseArray completed in 2.196 seconds
UseVector completed in 4.412 seconds
UseVectorPushBack completed in 8.017 seconds
The whole thing completed in 14.626 seconds


So array is twice as quick as vector.

But after looking at the code in more detail this is expected; as you run across the vector twice and the array only once. Note: when you resize() the vector you are not only allocating the memory but also running through the vector and calling the constructor on each member.

Re-Arranging the code slightly so that the vector only initializes each object once:

std::vector<Pixel>  pixels(dimensions * dimensions, Pixel(255,0,0));

Now doing the same timing again:

g++ -O3 Time.cpp -I <MyBoost>
./a.out
UseVector completed in 2.216 seconds


The vector now performance only slightly worse than the array. IMO this difference is insignificant and could be caused by a whole bunch of things not associated with the test.

I would also take into account that you are not correctly initializing/Destroying the Pixel object in the UseArrray() method as neither constructor/destructor is not called (this may not be an issue for this simple class but anything slightly more complex (ie with pointers or members with pointers) will cause problems.

CppCon 2015 Boost Units Library for Correct Code--Robert Ramey

Have you registered for CppCon 2016 in September? Don’t delay – Late registration is open now.

While we wait for this year’s event, we’re featuring videos of some of the 100+ talks from CppCon 2015 for you to enjoy. Here is today’s feature:

Boost Units Library for Correct Code

by Robert Ramey

(watch on YouTube) (watch on Channel 9)

Summary of the talk:

I will give a presentation on the Boost Units library.

This library implements a zero runtime facility for performing dimensional analysis checking and automatic units conversion on C++ expressions. I have found this indispensable for coding scientific programs involving a variety of complex physical units. The documentation of the Boost Units library is totally complete and accurate, but totally inpenetrable. I had to spend way too much time figuring out how to use this. By attending this meeting, you're going to avoid this pain and just get the benefit of simpler programs that contain fewer bugs.

CppCon 2015 Work Stealing--Pablo Halpern

Have you registered for CppCon 2016 in September? Don’t delay – Late registration is open now.

While we wait for this year’s event, we’re featuring videos of some of the 100+ talks from CppCon 2015 for you to enjoy. Here is today’s feature:

Work Stealing

by Pablo Halpern

(watch on YouTube) (watch on Channel 9)

Summary of the talk:

If you've used a C++ parallel-programming system in the last decade, you've probably run across the term "work stealing." Work stealing is a scheduling strategy that automatically balances a parallel workload among available CPUs in a multi-core computer, using computation resources with theoretical utilization that is nearly optimal. Modern C++ parallel template libraries such as Intel(R)'s TBB or Microsoft*'s PPL and language extensions such as Intel(R) Cilk(tm) Plus or OpenMP tasks are implemented using work-stealing runtime libraries.

Most C++ programmers pride themselves on understanding how their programs execute on the underlying machine. Yet, when it comes to parallel programming, many programmers mistakenly believe that if you understand threads, then you understand parallel runtime libraries. In this talk, we'll investigate how work-stealing applies to the semantics of a parallel C++ program. We'll look at the theoretical underpinnings of work-stealing, now it achieves near optimal machine utilization, and a bit about how it's implemented. In the process, we'll discover some pit-falls and how to avoid them. You should leave this talk with a deeper appreciation of how parallel software runs on real systems.

Previous experience with parallel programming is helpful but not required. A medium level of expertise in C++ is assumed.

CppCon 2015 Compile-time contract checking with nn--Jacob Potter

Have you registered for CppCon 2016 in September? Don’t delay – Registration is open now.

While we wait for this year’s event, we’re featuring videos of some of the 100+ talks from CppCon 2015 for you to enjoy. Here is today’s feature:

Compile-time contract checking with nn

by Jacob Potter

(watch on YouTube) (watch on Channel 9)

Summary of the talk:

Tony Hoare called null pointers a “billion-dollar mistake”, but nearly every language in wide use today has them. There have been many efforts to reduce the risk of nulls creeping in where they shouldn't be, but most involve attributes or annotations rather than being part of the type system itself. Can we do better? C++'s customizable value types make it possible to solve this sort of problem.

In this talk, I’ll present a non-nullable pointer wrapper, `nn`, that’s found wide use in Dropbox’s C++ code. This helper lets us use the type system to track pointers that can't be null, and express and enforce contracts at compile time. I’ll go into some depth on the template trickery needed to make things “just work”, the toolchain bugs we found along the way, and how this tool has helped us improve our code.