Articles & Books

Parallel Coding: From 90x Performance Loss To 2x Improvement--"No Bugs" Hare

Part 2!

Parallel Coding: From 90x Performance Loss To 2x Improvement

by "No Bugs" Hare

From the article:

In my previous post, we have observed pretty bad results for calculations as we tried to use mutexes and even atomics to do things parallel. OTOH, it was promised to show how parallel <algorithm> CAN be used both correctly and efficiently (that is, IF you need it, which is a separate question); this is what we’ll start discussing within this post...

Quick Q: Why does unary operator & not require a complete type?

Quick A: It only needs to take the address.

Recently on SO:

Why does unary operator & not require a complete type?

What if stru has overloaded operator&()?

Then it is unspecified whether the overload will be called (See Oliv's comment for standard quote).

How can unary operator & not require a complete type?

That's how the standard has defined the language. The built-in address-of operator doesn't need to know the definition of the type, since that has no effect on where to get the address of the object.

One consideration for why it is a good thing: Compatibility with C.
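As a minimal sketch of the point above (the names Widget, g_widget, and address_of_widget are hypothetical and not taken from the question), the built-in address-of operator can be applied while the type is still incomplete, as long as the type does not overload operator&:

```cpp
#include <cassert>

struct Widget;                 // forward declaration: Widget is incomplete here
extern Widget g_widget;        // declares an object of the incomplete type

// The built-in unary & only needs the object's location, not its layout,
// so this compiles before Widget is defined.
Widget* address_of_widget() { return &g_widget; }

// The definition arrives later. It does not overload operator&, so the
// built-in operator is the one used above.
struct Widget { int value = 42; };
Widget g_widget;
```

Member access such as address_of_widget()->value would still need the complete type at the point of use; only taking the address works without it.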

Quick Q: Why does shared_ptr need to hold reference counting for weak_ptr?

Quick A: To know when to deallocate the control block.

Recently on SO:

Why does shared_ptr need to hold reference counting for weak_ptr?

The reference count controls the lifetime of the pointed-to-object. The weak count does not, but does control (or participate in control of) the lifetime of the control block.

If the reference count goes to 0, the object is destroyed, but not necessarily deallocated. When the weak count goes to 0 (or when the reference count goes to 0, if there are no weak_ptrs when that happens), the control block is destroyed and deallocated, and the storage for the object is deallocated if it wasn't already.

The separation between destroying and deallocating the pointed-to-object is an implementation detail you don't need to care about, but it is caused by using make_shared.

If you do

shared_ptr<int> myPtr(new int{10});

you allocate the storage for the int, then pass that into the shared_ptr constructor, which allocates storage for the control block separately. In this case, the storage for the int can be deallocated as early as possible: as soon as the reference count hits 0, even if there is still a weak count.

If you do

auto myPtr = make_shared<int>(10);

then make_shared might perform an optimisation where it allocates the storage for the int and the control block in one go. This means that the storage for the int can't be deallocated until the storage for the control block can also be deallocated. The lifetime of the int ends when the reference count hits 0, but the storage for it is not deallocated until the weak count hits 0.
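The observable half of this behaviour can be sketched as follows. Tracked and destroyed_while_weak_alive are hypothetical names introduced for illustration. The sketch only checks when the object's lifetime ends; when the storage is actually deallocated depends on the implementation and cannot be observed portably this way.

```cpp
#include <cassert>
#include <memory>

// Counts live instances so the destructor call becomes observable.
struct Tracked {
    static int alive;
    Tracked() { ++alive; }
    ~Tracked() { --alive; }
};
int Tracked::alive = 0;

// Takes sole ownership of the object, keeps a weak_ptr to it, then drops
// the last shared_ptr. Returns true if the object was destroyed at that
// point even though the weak_ptr (and so the control block) still exists.
bool destroyed_while_weak_alive(std::shared_ptr<Tracked> owner) {
    std::weak_ptr<Tracked> watcher = owner;  // weak count goes up, reference count does not
    owner.reset();                           // reference count reaches 0
    return Tracked::alive == 0               // destructor has already run
        && watcher.expired()                 // control block reports no owners
        && watcher.lock() == nullptr;        // the object cannot be revived
}
```

Calling it with std::shared_ptr<Tracked>(new Tracked) and with std::make_shared<Tracked>() should give the same answer: the difference between the two forms is only in when the memory is released, not in when the destructor runs.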

Is that clear now?

Using Parallel <algorithm> Without a Clue: 90x Performance Loss Instead of 8x Gain--"No Bugs" Hare

Be careful.

Using Parallel <algorithm> Without a Clue: 90x Performance Loss Instead of 8x Gain

by "No Bugs" Hare

From the article:

With C++17 supporting parallel versions of the std:: algorithms, there are quite a few people saying “hey, it became really simple to write parallel code!”.

Just as one example, [MSDN] wrote: “Only a few years ago, writing parallel code in C++ was a domain of the experts.” (implying that these days, to write parallel code, you don’t need to be an expert anymore).

I always had my extremely strong suspicions about this position being deadly wrong, but recently I made an experiment which demonstrates Big Fat Dangers(tm) of implying that parallelization can be made as simple as just adding a policy parameter to your std:: call...

Guidelines For Rvalue References In APIs--Jonathan Müller

Everything you need to know.

Guidelines For Rvalue References In APIs

by Jonathan Müller

From the article:

I’ll be giving a talk at ACCU about when to use which pointer types and why.

While working on that I made some guidelines for rvalue references in interfaces which didn’t quite fit the talk, so I’m writing about them here.

When should you use rvalue references as function parameters?

When as return types?

What are ref-qualified member functions and when and how should you use them?

Let’s tackle it one by one...
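For readers unfamiliar with the term, here is a small sketch of what a ref-qualified member function looks like. It is not taken from the article; the Holder class and its name() accessor are hypothetical, and the overload pair shown is just one common use.

```cpp
#include <cassert>
#include <string>
#include <utility>

class Holder {
    std::string name_;
public:
    explicit Holder(std::string n) : name_(std::move(n)) {}

    // Chosen when *this is an lvalue: hand out a reference, no copy.
    const std::string& name() const& { return name_; }

    // Chosen when *this is an rvalue (for example a temporary): the object
    // is about to go away, so its string can be moved out instead of copied.
    std::string name() && { return std::move(name_); }
};
```

With this pair, Holder("xyz").name() returns a std::string by value, so the result does not dangle once the temporary is gone, while h.name() on a named Holder h returns a reference to the stored string.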

My Little (String) Optimization, Part 2--Jordan Rose

Performance!

My Little (String) Optimization, Part 2

by Jordan Rose

From the article:

Previously, I talked about how Clang is smart enough to optimize a series of comparisons against constant strings in C++ by starting out with a switch on the length. I left off with the idea that while this is good, you might be able to do better if your strings have a unique character at a certain offset. Today we’re going to see what that looks like.
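To make the idea concrete, here is a hand-written sketch of that shape of dispatch. It is an illustration only, not Clang's generated code and not code from the article; the Keyword enum, the classify function, and the three example strings are all assumptions. The strings share a length of 3 and differ at offset 2, so one length check plus one character switch narrows the input to a single candidate.

```cpp
#include <cassert>
#include <string_view>

enum class Keyword { None, Cat, Cow, Dog };

// Hypothetical matcher for "cat", "cow" and "dog".
Keyword classify(std::string_view s) {
    if (s.size() != 3) return Keyword::None;   // step 1: reject on length
    switch (s[2]) {                            // step 2: the distinguishing offset
        case 't': return s == "cat" ? Keyword::Cat : Keyword::None;
        case 'w': return s == "cow" ? Keyword::Cow : Keyword::None;
        case 'g': return s == "dog" ? Keyword::Dog : Keyword::None;
        default:  return Keyword::None;
    }
}
```

Each input is compared in full against at most one candidate string; the final comparison is still needed because a matching character at offset 2 does not prove the rest matches (for example "cot").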

Freestanding trip report: emBO++ and Jacksonville wg21 2018 experience--Ben Craig

A good experience.

Freestanding trip report: emBO++ and Jacksonville wg21 2018 experience

by Ben Craig

From the article:

I'm the author of P0829, Freestanding Proposal. The tl;dr of the paper is that it standardizes a subset of the library suitable for kernel and embedded programming. R0 of this poorly titled paper was reasonably well received in the Albuquerque 2017 meeting. I was encouraged to send it out to a wider audience... and so I did. One of the people that I sent it to was Odin Holmes, and that got me an invitation to emBO++, my first speech at a public conference. This conference was the week prior to the Jacksonville meeting, so I ended up flying from Bochum to Jacksonville without going home first...