Performance

A zero cost abstraction?--Josh Peterson

Safe and performant?

A zero cost abstraction?

by Josh Peterson

From the article:

Recently Joachim (CTO at Unity) has been talking about “performance by default”, the mantra that software should be as fast as possible from the outset. This is driving the pretty cool stuff many at Unity are doing around things like ECS, the C# job system, and Burst (find lots more about that here).

One question Joachim has asked internally of Unity developers is (I’m paraphrasing here): “What is the absolute lower bound of time this code could use?” This strikes me as a really useful way to think about performance. The question changes from “How fast is this?” to “How fast could this be?”. If the answers to those two questions are not the same, the next question is “Do we really need the additional overhead?”

Another way to think about this is to consider the zero-cost abstraction, a concept much discussed in the C++ and Rust communities. Programmers are always building abstractions, and those abstractions often lead to the difference between “how fast it is” and “how fast it could be”. We want to provide useful abstractions that don’t hurt performance...
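
As a concrete illustration of the idea (a sketch, not from the article), consider two ways of summing a vector. With optimizations enabled, mainstream compilers typically generate equivalent machine code for both, so the higher-level abstraction costs nothing at runtime:

#include <cstddef>
#include <numeric>
#include <vector>

// Hand-written loop: the "how fast it could be" baseline.
int sum_raw(const std::vector<int>& v) {
    int total = 0;
    for (std::size_t i = 0; i < v.size(); ++i)
        total += v[i];
    return total;
}

// Abstraction via iterators and std::accumulate: typically compiles down
// to the same code as the loop above, i.e. a zero-cost abstraction.
int sum_abstracted(const std::vector<int>& v) {
    return std::accumulate(v.begin(), v.end(), 0);
}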

The Amazing Performance of C++17 Parallel Algorithms, is it Possible?--Bartlomiej Filipek

Are you using it?

The Amazing Performance of C++17 Parallel Algorithms, is it Possible?

by Bartlomiej Filipek

From the article:

With the addition of Parallel Algorithms in C++17, you can now easily update your “computing” code to benefit from parallel execution. In the article, I’d like to examine one STL algorithm which naturally exposes the idea of independent computing. If your machine has a 10-core CPU, can you always expect to get a 10x speedup? Maybe more? Maybe less? Let’s play with this topic.
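
As a hedged sketch (the actual benchmark is in the article itself) of the kind of independent, element-wise work these algorithms target, a std::transform only needs an execution policy to run in parallel:

#include <algorithm>
#include <cmath>
#include <execution>
#include <vector>

int main() {
    std::vector<double> in(10'000'000, 0.5), out(in.size());

    // Sequential baseline.
    std::transform(in.begin(), in.end(), out.begin(),
                   [](double x) { return std::sqrt(std::sin(x) * std::cos(x)); });

    // Same algorithm, parallel execution requested via std::execution::par.
    std::transform(std::execution::par, in.begin(), in.end(), out.begin(),
                   [](double x) { return std::sqrt(std::sin(x) * std::cos(x)); });
}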

HPX V1.2 released -- STE||AR Group

The STE||AR Group has released V1.2 of HPX -- A C++ Standard library for parallelism and concurrency.

HPX V1.2 Released

The newest version of HPX (V1.2) is now available for download! Please see here for the release notes. This release is the first in our more frequent release schedule. We are aiming to produce one release every six months in an effort to get new features and stable releases out to users more quickly.

    HPX exposes an API fully conforming to the parts of the C++11/C++14/C++17 standards that are related to parallelism and concurrency, extended and applied to distributed and heterogeneous computing, and aligned with the ongoing standardization discussions.

    HPX is a general purpose parallel C++ runtime system for applications of any scale. It implements all of the related facilities as defined by the C++ Standard. As of this writing, HPX provides the only widely available open-source implementation of the new C++17 parallel algorithms. Additionally, HPX implements functionalities proposed as part of the ongoing C++ standardization process, such as large parts of the C++ Concurrency TS, Parallelism TS V2, data-parallel algorithms, executors, and many more. It also extends the existing C++ Standard APIs to the distributed case (e.g. compute clusters) and for heterogeneous systems (e.g. GPUs).
    HPX seamlessly enables a new asynchronous C++ Standard Programming Model which tends to improve the parallel efficiency of our applications and helps reduce the complexities usually associated with parallelism and concurrency.
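
A minimal sketch of that asynchronous model (header names and API are taken from the HPX documentation rather than the release notes, so treat them as assumptions): hpx::async mirrors std::async but schedules the work on HPX's lightweight threads.

#include <hpx/hpx_main.hpp>       // lets a plain main() run inside the HPX runtime
#include <hpx/include/async.hpp>  // hpx::async
#include <hpx/include/future.hpp> // hpx::future
#include <iostream>

int square(int x) { return x * x; }

int main() {
    // Same shape as std::async/std::future, but executed by the HPX runtime.
    hpx::future<int> f = hpx::async(square, 6);
    std::cout << f.get() << '\n';  // prints 36
    return 0;
}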


Using C++17 Parallel Algorithms for Better Performance--Billy O’Neal

Are you using the parallel capabilities of the standard library?

Using C++17 Parallel Algorithms for Better Performance

by Billy O’Neal

From the article:

C++17 added support for parallel algorithms to the standard library, to help programs take advantage of parallel execution for improved performance. MSVC first added experimental support for some algorithms in 15.5, and the experimental tag was removed in 15.7.

The interface described in the standard for the parallel algorithms doesn’t say exactly how a given workload is to be parallelized. In particular, the interface is intended to express parallelism in a general form that works for heterogeneous machines, allowing SIMD parallelism like that exposed by SSE, AVX, or NEON, vector “lanes” like that exposed in GPU programming models, and traditional threaded parallelism.

Our parallel algorithms implementation currently relies entirely on library support, not on special support from the compiler. This means our implementation will work with any tool currently consuming our standard library, not just MSVC’s compiler. In particular, we test that it works with Clang/LLVM and the version of EDG that powers Intellisense...
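
For example (a sketch, not taken from the post), enabling the parallel version of std::sort is just a matter of passing an execution policy; the policy is a request, not a guarantee, and the library may still fall back to serial execution:

#include <algorithm>
#include <execution>
#include <random>
#include <vector>

int main() {
    std::vector<int> v(1'000'000);
    std::mt19937 gen{42};
    std::uniform_int_distribution<int> dist(0, 1'000'000);
    for (auto& x : v) x = dist(gen);

    // std::execution::par requests parallel execution; the implementation
    // may run serially if parallelism is unavailable or not worthwhile.
    std::sort(std::execution::par, v.begin(), v.end());
}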

First Meeting Embedded Conference Schedule available

Meeting Embedded is a new conference with a focus on embedded, hosting lots of talks connected to embedded & C++, plus a keynote by Dan Saks!

Meeting Embedded 2018

Schedule

Organized by Jens Weller

From the article:

Meeting Embedded 2018 is a one-day event focused on hardware and software development for embedded systems and the IoT. Meeting Embedded will be at Vienna House Andel's Berlin Hotel on the 14th of November, right before Meeting C++!

Quick Q: What are copy elision and return value optimization?

Quick A: they are common optimizations that a compiler can do behind the scenes to avoid copying in certain cases.

Recently on SO:

What are copy elision and return value optimization?

Copy elision is an optimization implemented by most compilers to prevent extra (potentially expensive) copies in certain situations. It makes returning by value or pass-by-value feasible in practice (restrictions apply).

It's the only form of optimization that elides (ha!) the as-if rule - copy elision can be applied even if copying/moving the object has side-effects.

The following example is taken from Wikipedia:

#include <iostream>

struct C {
  C() {}
  C(const C&) { std::cout << "A copy was made.\n"; }
};

C f() {
  return C();
}

int main() {
  std::cout << "Hello World!\n";
  C obj = f();
}

Depending on the compiler & settings, the following outputs are all valid:

Hello World!
A copy was made.
A copy was made.

Hello World!
A copy was made.

Hello World!

This also means fewer objects can be created, so you also can't rely on a specific number of destructors being called. You shouldn't have critical logic inside copy/move-constructors or destructors, as you can't rely on them being called.

If a call to a copy or move constructor is elided, that constructor must still exist and must be accessible. This ensures that copy elision does not allow copying objects which are not normally copyable, e.g. because they have a private or deleted copy/move constructor.
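
For illustration (not part of the original answer, and the NoCopy name is hypothetical), a type with a deleted copy constructor cannot be returned by value under C++14 even though the copy would be elided; under C++17's guaranteed elision for prvalues (below) it compiles:

struct NoCopy {
  NoCopy() {}
  NoCopy(const NoCopy&) = delete;
};

NoCopy make() {
  return NoCopy(); // error in C++14: the deleted copy constructor must still be usable
                   // OK in C++17: elision is guaranteed, no copy is required
}

int main() {
  NoCopy obj = make();
  (void)obj;
}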

C++17: As of C++17, Copy Elision is guaranteed when an object is returned directly:

#include <iostream>

struct C {
  C() {}
  C(const C&) { std::cout << "A copy was made.\n"; }
};

C f() {
  return C(); //Definitely performs copy elision
}

C g() {
  C c;
  return c; //Maybe performs copy elision
}

int main() {
  std::cout << "Hello World!\n";
  C obj = f(); //Copy constructor isn't called
}

CppCon 2017: How to Write a Custom Allocator--Bob Steagall

Have you registered for CppCon 2018 in September? Registration is open now.

While we wait for this year’s event, we’re featuring videos of some of the 100+ talks from CppCon 2017 for you to enjoy. Here is today’s feature:

How to Write a Custom Allocator

by Bob Steagall

(watch on YouTube) (watch on Channel 9)

Summary of the talk:

You'd like to improve the performance of your application with regard to memory management, and you believe this can be accomplished by writing a custom allocator. But where do you start? Modern C++ brings many improvements to the standard allocator model, but with those improvements come several issues that must be addressed when designing a new allocator.

This talk will provide guidance on how to write custom allocators for the C++14/C++17 standard containers. It will cover the requirements specified by the standard, and will describe the facilities provided by the standard to support the new allocator model and allocator-aware containers. We'll look at the issues of allocator identity and propagation, and examine their implications for standard library users, standard library implementers, and custom allocator implementers. We'll see how a container uses its allocator, including when and how a container's allocator instance propagates. This will give us the necessary background to describe allocators that implement unusual semantics, such as a stateful allocator type whose instances compare non-equal. Finally, the talk will provide some guidelines for how to specify a custom allocator's public interface based on the semantics it provides.
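
As a starting point (a minimal sketch, not code from the talk; the LoggingAllocator name is illustrative), the C++11-and-later allocator model only requires value_type, allocate, deallocate, and equality comparison; std::allocator_traits fills in the rest:

#include <cstddef>
#include <iostream>
#include <new>
#include <vector>

// A minimal allocator that forwards to operator new/delete and logs calls.
template <class T>
struct LoggingAllocator {
    using value_type = T;

    LoggingAllocator() = default;
    template <class U>
    LoggingAllocator(const LoggingAllocator<U>&) noexcept {}

    T* allocate(std::size_t n) {
        std::cout << "allocate " << n << " object(s)\n";
        return static_cast<T*>(::operator new(n * sizeof(T)));
    }
    void deallocate(T* p, std::size_t) noexcept {
        ::operator delete(p);
    }
};

// Stateless: all instances are interchangeable, so they compare equal.
template <class T, class U>
bool operator==(const LoggingAllocator<T>&, const LoggingAllocator<U>&) noexcept { return true; }
template <class T, class U>
bool operator!=(const LoggingAllocator<T>&, const LoggingAllocator<U>&) noexcept { return false; }

int main() {
    std::vector<int, LoggingAllocator<int>> v;
    for (int i = 0; i < 5; ++i) v.push_back(i); // growth triggers the custom allocator
}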