performance

The surprisingly high cost of static-lifetime constructors—Arthur O’Dwyer

Today we look at compile time performance.

The surprisingly high cost of static-lifetime constructors

by Arthur O’Dwyer

From the article:

I was looking at HyperRogue again this week (see my previous post). It has a really nice localization framework: every message in the game can be translated just by adding a lookup entry to a single file (like, for the Czech translation, you add entries to language-cz.cpp); and then during the build process, all the language-??.cpp files are collated together and used to produce a single language-data.cpp file with a lookup table from each English message to the same message in every other language. (Seeing all the messages at once allows us to report on how “complete” each translation is, relative to the others.)...

 

Examples of Parallel Algorithms From C++17—Bartlomiej Filipek

Do you know them?

Examples of Parallel Algorithms From C++17

by Bartlomiej Filipek

From the article:

MSVC (VS 2017 15.7, end of June 2018) is as far as I know the only major compiler/STL implementation that has parallel algorithms. Not everything is done, but you can use a lot of algorithms and apply std::execution::par on them!

Have a look at few examples I managed to run...

CppCon 2017: Going Nowhere Faster—Chandler Carruth

Have you registered for CppCon 2018 in September? Early bird registration is open now.

While we wait for this year’s event, we’re featuring videos of some of the 100+ talks from CppCon 2017 for you to enjoy. Here is today’s feature:

Going Nowhere Faster

by Chandler Carruth

(watch on YouTube) (watch on Channel 9)

Summary of the talk:

You care about the performance of your C++ code. You have followed basic patterns to make your C++ code efficient. You profiled your application or server and used the appropriate algorithms to minimize how much work is done and the appropriate data structures to make it fast. You even have reliable benchmarks to cover the most critical and important parts of the system for performance. But you're profiling the benchmark and need to squeeze even more performance out of it... What next?

This talk dives into the performance and optimization concerns of the important, performance critical loops in your program. How do modern CPUs execute these loops, and what influences their performance? What can you do to make them faster? How can you leverage the C++ compiler to do this while keeping the code maintainable and clean? What optimization techniques do modern compilers make available to you? We'll cover all of this and more, with piles of code, examples, and even live demo.

While the talk will focus somewhat on x86 processors and the LLVM compiler, but everything will be broadly applicable and basic mappings for other processors and toolchains will be discussed throughout. However, be prepared for a lot of C++ code and assembly.

uninitialized_tag in C++—Marius Elvert

Optimise or not?

uninitialized_tag in C++

by Marius Elvert

From the article:

No doubt, C++ is one of those languages you can use to squeeze out every last drop of your CPU’s processing power. On the other hand, it also allows a high amount of abstraction. However, micro-optimization seldom works well with nice abstractions...

Parallel Coding: From 90x Performance Loss To 2x Improvement—“No Bugs” Hare

Part 2!

Parallel Coding: From 90x Performance Loss To 2x Improvement

by "No Bugs" Hare

From the article:

In my previous post, we have observed pretty bad results for calculations as we tried to use mutexes and even atomics to do things parallel. OTOH, it was promised to show how parallel <algorithm> CAN be used both correctly and efficiently (that is, IF you need it, which is a separate question); this is what we’ll start discussing within this post...

Using Parallel Without a Clue: 90x Performance Loss Instead of 8x Gain—“No Bugs” Hare

Be careful.

Using Parallel <algorithm> Without a Clue: 90x Performance Loss Instead of 8x Gain

by "No Bugs" Hare

From the article:

With C++17 supporting1 parallel versions of the std:: algorithms, there are quite a few people saying “hey, it became really simple to write parallel code!”.

Just as one example, [MSDN] wrote: “Only a few years ago, writing parallel code in C++ was a domain of the experts.” (implying that these days, to write parallel code, you don’t need to be an expert anymore).

Inquisitive hare:
“I made an experiment which demonstrates Big Fat Dangers(tm) of implying that parallelization can be made as simple as just adding a policy parameter to your std:: call.
I always had my extremely strong suspicions about this position being deadly wrong, but recently I made an experiment which demonstrates Big Fat Dangers(tm) of implying that parallelization can be made as simple as just adding a policy parameter to your std:: call...

My Little (String) Optimization, Part 2—Jordan Rose

Performance!

My Little (String) Optimization, Part 2

by Jordan Rose

From the article:

Previously, I talked about how Clang is smart enough to optimize a series of comparisons against constant strings in C++ by starting out with a switch on the length. I left off with the idea that while this is good, you might be able to do better if your strings have a unique character at a certain offset. Today we’re going to see what that looks like.

HPX V1.1 released—STE||AR Group

The STE||AR Group has released V1.0 of HPX -- A C++ Standard library for parallelism and concurrency.

HPX V1.1 Released

The newest version of HPX (V1.1) is now available for download! Please see here for the release notes.

HPX exposes an API fully conforming to the concurrency related parts of the C++11/C++14/C++17 standards, extended and applied to distributed and heterogeneous computing, and aligned with the ongoing standardization discussions.

    HPX is a general purpose parallel C++ runtime system for applications of any scale. It implements all of the related facilities as defined by the C++ Standard. As of this writing, HPX provides the only widely available open-source implementation of the new C++17 parallel algorithms. Additionally, HPX implements functionalities proposed as part of the ongoing C++ standardization process, such as large parts of the C++ Concurrency TS, task blocks, data-parallel algorithms, executors, index-based parallel for loops, and many more. It also extends the existing C++ Standard APIs to the distributed case (e.g. compute clusters) and for heterogeneous systems (e.g. GPUs).
    HPX seamlessly enables a new asynchronous C++ Standard Programming Model which tends to improve the parallel efficiency of our applications and helps reduce complexities usually associated with concurrency