performance

C++ Weekly Episode 125: The Optimal Way To Return From A Function—Jason Turner

Episode 125 of C++ Weekly.

The Optimal Way To Return From A Function

by Jason Turner

About the show:

In this episode of C++ Weekly, Jason investigates the different ways one of two string values might be returned from a function. Which option is best? Which option can the compiler optimize the most? Do we take one return path or multiple return paths? Should a ternary be used?
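
To make those questions concrete, here is a minimal sketch of the three shapes being compared. The function names and string values are invented for illustration and are not taken from the episode. All three return the same result; which one a given compiler optimizes best is what the episode investigates, so no ranking is implied here.

```cpp
#include <string>

// Hypothetical helpers, named for this sketch only.

// Multiple return paths: each branch constructs the result directly.
std::string pick_multi(bool flag) {
    if (flag) {
        return "first value";
    }
    return "second value";
}

// Single return path: one named local is assigned in a branch
// and returned once at the end.
std::string pick_single(bool flag) {
    std::string result;
    if (flag) {
        result = "first value";
    } else {
        result = "second value";
    }
    return result;
}

// Ternary: the expression yields a const char*, which is converted
// to std::string at the return statement.
std::string pick_ternary(bool flag) {
    return flag ? "first value" : "second value";
}
```

Compiling each variant with optimizations enabled and comparing the generated assembly is one way to see how they differ on a particular toolchain.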

CppCon 2017: Delegate this! Designing with delegates in modern C++—Alfred Bratterud

Have you registered for CppCon 2018 in September? Registration is open now.

While we wait for this year’s event, we’re featuring videos of some of the 100+ talks from CppCon 2017 for you to enjoy. Here is today’s feature:

Delegate this! Designing with delegates in modern C++

by Alfred Bratterud

(watch on YouTube) (watch on Channel 9)

Summary of the talk:

Designing a fast IP stack from scratch is hard. Using delegates made it all easier for IncludeOS, the open source library operating system written from scratch in modern C++. Our header-only delegates are just as fast as C-style function pointers, are compatible with std::function, and allow any object to delegate work to stateful member functions without knowing anything about the class they belong to. We use delegates for everything from routing packets to creating REST endpoints, and most importantly to tie the whole IP stack together. In this talk we’ll show you how we use delegates in IncludeOS, discuss pitfalls and alternatives, and give you all you need to get started.
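
As a rough illustration of the idea (not the IncludeOS implementation), a delegate can be sketched as an object pointer plus a plain function pointer to a small generated stub. The names `Delegate`, `PacketCounter`, and `deliver` below are hypothetical. The call site only sees the signature `int(int)` and never the bound class.

```cpp
// Primary template, intentionally left undefined.
template <typename Signature>
class Delegate;

// Hypothetical minimal delegate: one object pointer and one function pointer.
template <typename R, typename... Args>
class Delegate<R(Args...)> {
public:
    Delegate() = default;

    // Bind a member function of any class T. The class type is erased:
    // only the stub generated here knows about T.
    template <typename T, R (T::*Method)(Args...)>
    static Delegate bind(T& object) {
        Delegate d;
        d.object_ = &object;
        d.stub_ = &invoke<T, Method>;
        return d;
    }

    // True when a target has been bound.
    explicit operator bool() const { return stub_ != nullptr; }

    // Calling goes through a single function pointer.
    R operator()(Args... args) const { return stub_(object_, args...); }

private:
    // Stub that restores the real type and calls the member function.
    template <typename T, R (T::*Method)(Args...)>
    static R invoke(void* object, Args... args) {
        return (static_cast<T*>(object)->*Method)(args...);
    }

    void* object_ = nullptr;
    R (*stub_)(void*, Args...) = nullptr;
};

// Example receiver with state; invented for this sketch.
struct PacketCounter {
    int total = 0;
    int add(int bytes) {
        total += bytes;
        return total;
    }
};

// A call site that knows only the signature int(int).
// Returns -1 when no handler is bound.
inline int deliver(const Delegate<int(int)>& handler, int bytes) {
    return handler ? handler(bytes) : -1;
}
```

A handler would be created with `Delegate<int(int)>::bind<PacketCounter, &PacketCounter::add>(counter)` and passed to `deliver`. This sketch copies arguments and does not manage the lifetime of the bound object, both of which a production delegate would have to address.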

CppCon 2017: Coroutines: what can’t they do?—Toby Allsopp

Coroutines: what can't they do?

by Toby Allsopp

(watch on YouTube) (watch on Channel 9)

Summary of the talk:

Coroutines are coming. They're coming for your asynchronous operations. They're coming for your lazy generators. This much we know. But once they're here, will they be satisfied with these offerings? They will not. They will require feeding, lest they devour our very souls. We present some fun ways to keep their incessant hunger at bay. I, for one, welcome our new coroutine overlords.

The Coroutines Technical Specification is an experimental extension to the C++ language that allows functions to be suspended and resumed, with the primary aim of simplifying code that invokes asynchronous operations. We present a short introduction to Coroutines followed by some possibly non-obvious ways they can help to simplify your code.

Have you ever wanted to elegantly compose operations that might fail? Coroutines can help. Have you ever wished for a zero-overhead type-erased function wrapper? Coroutines can help. We show you how and more.

CppCon 2017: Concurrency, Parallelism and Coroutines—Anthony Williams

Concurrency, Parallelism and Coroutines

by Anthony Williams

(watch on YouTube) (watch on Channel 9)

Summary of the talk:

C++17 is adding parallel overloads of most of the Standard Library algorithms. There is a TS for Concurrency in C++ already published, and a TS for Coroutines in C++ and a second TS for Concurrency in C++ in the works.

What does all this mean for programmers? How are they all related? How do coroutines help with parallelism?

This session will attempt to answer these questions and more. We will look at the implementation of parallel algorithms, and how continuations, coroutines and work-stealing fit together. We will also look at how this meshes with the Grand Unified Executors Proposal, and how you will be able to take advantage of all this as an application developer.

Parallel STL And Filesystem: Files Word Count Example—Bartlomiej Filipek

Now with some more numbers.

Parallel STL And Filesystem: Files Word Count Example

by Bartlomiej Filipek

From the article:

Last week you might have read about a few examples of parallel algorithms. Today I have one more application that combines the ideas from the previous post.

We’ll use parallel algorithms and the standard filesystem to count words in all text files in a given directory...
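
A minimal sketch of the shape of such a program follows; it is not the article's code. The function names are hypothetical, and only `.txt` files directly inside the directory are counted. The sequential overload of `std::transform_reduce` is used to keep the sketch portable. The C++17 parallel form takes `std::execution::par` (from `<execution>`) as an extra first argument, and some toolchains need an additional backend library for it (libstdc++ relies on Intel TBB, for example).

```cpp
#include <cstddef>
#include <filesystem>
#include <fstream>
#include <functional>
#include <iterator>
#include <numeric>
#include <string>
#include <vector>

namespace fs = std::filesystem;

// Count whitespace-separated words in one file.
// A file that cannot be opened counts as zero words.
std::size_t count_words_in_file(const fs::path& file) {
    std::ifstream in(file);
    return static_cast<std::size_t>(std::distance(
        std::istream_iterator<std::string>(in),
        std::istream_iterator<std::string>()));
}

// Collect the .txt files directly inside `dir` (non-recursive).
std::vector<fs::path> list_text_files(const fs::path& dir) {
    std::vector<fs::path> files;
    for (const auto& entry : fs::directory_iterator(dir)) {
        if (entry.is_regular_file() && entry.path().extension() == ".txt") {
            files.push_back(entry.path());
        }
    }
    return files;
}

// Map each file to its word count, then sum the counts.
// Sequential overload; see the note above about std::execution::par.
std::size_t count_words_in_directory(const fs::path& dir) {
    const std::vector<fs::path> files = list_text_files(dir);
    return std::transform_reduce(files.begin(), files.end(),
                                 std::size_t{0},
                                 std::plus<>{},
                                 count_words_in_file);
}
```

Collecting the paths into a vector first matters for the parallel variant, because the parallel overloads require at least forward iterators and `directory_iterator` is only an input iterator.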

The surprisingly high cost of static-lifetime constructors—Arthur O’Dwyer

Today we look at compile-time performance.

The surprisingly high cost of static-lifetime constructors

by Arthur O’Dwyer

From the article:

I was looking at HyperRogue again this week (see my previous post). It has a really nice localization framework: every message in the game can be translated just by adding a lookup entry to a single file (like, for the Czech translation, you add entries to language-cz.cpp); and then during the build process, all the language-??.cpp files are collated together and used to produce a single language-data.cpp file with a lookup table from each English message to the same message in every other language. (Seeing all the messages at once allows us to report on how “complete” each translation is, relative to the others.)...
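
The lookup table described in the excerpt can be pictured with a small sketch. This is not HyperRogue's code: the entry layout, the language set, and the messages are all invented for illustration. It assumes plain `const char*` entries in a `constexpr` array, so the table is constant-initialized and no constructor has to run for it at program startup.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical entry layout: one English message and its translations.
// A null pointer marks a missing translation.
struct Translation {
    const char* english;
    const char* czech;
    const char* german;
};

// Constant-initialized table; the contents are made up for this sketch.
constexpr Translation kTranslations[] = {
    {"Hello", "Ahoj", "Hallo"},
    {"Yes", "Ano", "Ja"},
    {"No", "Ne", "Nein"},
    {"Water", "Voda", nullptr},
};

// Linear lookup by English message; returns nullptr when not found.
inline const Translation* find_translation(const char* english) {
    for (const Translation& t : kTranslations) {
        if (std::strcmp(t.english, english) == 0) {
            return &t;
        }
    }
    return nullptr;
}

// Count how many messages have a translation for one language,
// selected through a pointer to the corresponding data member.
inline std::size_t count_translated(const char* Translation::* language) {
    std::size_t count = 0;
    for (const Translation& t : kTranslations) {
        if (t.*language != nullptr) {
            ++count;
        }
    }
    return count;
}
```

Because every message sits in one table, a completeness figure per language is a single pass, for example `count_translated(&Translation::german)`.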

Examples of Parallel Algorithms From C++17—Bartlomiej Filipek

Do you know them?

Examples of Parallel Algorithms From C++17

by Bartlomiej Filipek

From the article:

MSVC (VS 2017 15.7, end of June 2018) is, as far as I know, the only major compiler/STL implementation that has parallel algorithms. Not everything is done, but you can use a lot of algorithms and apply std::execution::par to them!

Have a look at a few examples I managed to run...

CppCon 2017: Going Nowhere Faster—Chandler Carruth

Going Nowhere Faster

by Chandler Carruth

(watch on YouTube) (watch on Channel 9)

Summary of the talk:

You care about the performance of your C++ code. You have followed basic patterns to make your C++ code efficient. You profiled your application or server and used the appropriate algorithms to minimize how much work is done and the appropriate data structures to make it fast. You even have reliable benchmarks to cover the most critical and important parts of the system for performance. But you're profiling the benchmark and need to squeeze even more performance out of it... What next?

This talk dives into the performance and optimization concerns of the important, performance critical loops in your program. How do modern CPUs execute these loops, and what influences their performance? What can you do to make them faster? How can you leverage the C++ compiler to do this while keeping the code maintainable and clean? What optimization techniques do modern compilers make available to you? We'll cover all of this and more, with piles of code, examples, and even live demo.

While the talk will focus somewhat on x86 processors and the LLVM compiler, everything will be broadly applicable, and basic mappings for other processors and toolchains will be discussed throughout. However, be prepared for a lot of C++ code and assembly.

uninitialized_tag in C++—Marius Elvert

Optimise or not?

uninitialized_tag in C++

by Marius Elvert

From the article:

No doubt, C++ is one of those languages you can use to squeeze out every last drop of your CPU’s processing power. On the other hand, it also allows a high amount of abstraction. However, micro-optimization seldom works well with nice abstractions...
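
The title refers to a tag type that selects a constructor which skips initialization. A minimal sketch of that pattern follows; the `Vec3` class and the `scaled` helper are hypothetical and are not taken from the article. Whether skipping the zeroing is measurably faster depends on the compiler and the call site, since optimizers can often remove the redundant stores on their own.

```cpp
#include <cstddef>

// Empty tag type used only to select a constructor overload.
struct uninitialized_tag {};

// Hypothetical small vector class for this sketch.
class Vec3 {
public:
    // Default: the safe abstraction, all components zeroed.
    Vec3() : data_{0.0f, 0.0f, 0.0f} {}

    // Opt-out: leaves the components indeterminate. The caller must
    // write every component before reading any of them.
    explicit Vec3(uninitialized_tag) {}

    Vec3(float x, float y, float z) : data_{x, y, z} {}

    float& operator[](std::size_t i) { return data_[i]; }
    float operator[](std::size_t i) const { return data_[i]; }

private:
    float data_[3];
};

// Every component of the result is overwritten in the loop,
// so zeroing it first would be redundant work.
inline Vec3 scaled(const Vec3& v, float factor) {
    Vec3 out{uninitialized_tag{}};
    for (std::size_t i = 0; i < 3; ++i) {
        out[i] = v[i] * factor;
    }
    return out;
}
```

The tag makes the unsafe choice visible at the call site: `Vec3 v;` stays zero-initialized, while `Vec3 v{uninitialized_tag{}};` has to be requested explicitly.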

Parallel Coding: From 90x Performance Loss To 2x Improvement—“No Bugs” Hare

Part 2!

Parallel Coding: From 90x Performance Loss To 2x Improvement

by "No Bugs" Hare

From the article:

In my previous post, we observed pretty bad results for calculations as we tried to use mutexes and even atomics to do things in parallel. OTOH, it was promised to show how parallel <algorithm> CAN be used both correctly and efficiently (that is, IF you need it, which is a separate question); this is what we’ll start discussing within this post...