performance

CppCon 2016: Implementing `static` control flow in C++14—Vittorio Romeo

Have you registered for CppCon 2017 in September? Don’t delay – Registration is open now.

While we wait for this year’s event, we’re featuring videos of some of the 100+ talks from CppCon 2016 for you to enjoy. Here is today’s feature:

Implementing `static` control flow in C++14

by Vittorio Romeo

(watch on YouTube) (watch on Channel 9)

Summary of the talk:

There has always been great interest in imperative compile-time control flow: as an example, consider all the existing `static_if` proposals and the recently accepted `constexpr_if` construct for C++17.

What if you were told that it is actually possible to implement imperative control flow in C++14?

In this tutorial, the implementation and design of a compile-time `static_if` branching construct and of a compile-time `static_for` iteration construct will be shown and analyzed. These constructs will then be compared to traditional solutions and upcoming C++17 features, examining advantages and drawbacks.

itCppCon17: C++ executors to enable heterogeneous computing in tomorrow’s C++ today—Michael Wong

Videos of the Italian C++ Conference 2017 are popping up. Here is the keynote:

C++ executors to enable heterogeneous computing in tomorrow's C++ today

by Michael Wong

Slides

Summary of the talk:

For a long time, C++ has been outpaced by other models such as CUDA, OpenMP, OpenCL, HSA, which has offered Heterogeneous computing capability to enable dispatch to GPU, DSP, FPGA or other accelerators.
SG14 is an ISO C++ SG that works on low-latency, in the areas of Games, Financial, Embedded programming. One of their mandate is support of heterogeneous in Native C++, without having to drop to some other model as is needed today.
This talk will describe the first and possibly the most important step towards supporting heterogeneous computing with a description of C++ executors, an interface between concurrency constructs, and the agents/resources, one of which can be a GPU core, or a SIMD unit.  I have led a small group of experts from Google, Nvidia, Codeplay, and NASDAQ, to define a specification to enable separation of concerns in defining where, when, and how execution is done in service of the constructs in existing C++ standard library, Concurrency, Parallelism, Transactional Memory, and Networking TS.
This talk will help you understand these TSes which currently only works on CPUs and how they will tie together in future to enable execution on accelerators based on C++ models we already have today such as SYCL, HPX, Kokkos, and Raja. This will enable native C++ to be usable on machine learning algorithms on future self-driving cars, or medical devices.

CppCon 2016: Constant Fun—Dietmar Kühl

Have you registered for CppCon 2017 in September? Don’t delay – Registration is open now.

While we wait for this year’s event, we’re featuring videos of some of the 100+ talks from CppCon 2016 for you to enjoy. Here is today’s feature:

Constant Fun

by Dietmar Kühl

(watch on YouTube) (watch on Channel 9)

Summary of the talk:

This presentation discusses why it is useful to move some of the processing to compile time and shows some applications of doing so. In particular it shows how to create associative containers created at compile time and what is needed from the types involved to make it possible. The presentation also does some analysis to estimate the costs in terms for compile-time and object file size.

Specifically, the presentation discusses:
- implications of static and dynamic initialization – the C++ language rules for implementing constexpr functions and classes supporting constexpr objects.
- differences in error handling with constant expressions.
- sorting sequences at compile time and the needed infrastructure – creating constant associative containers with compile-time and run-time look-up.

CppCon 2016: The C++17 Parallel Algorithms Library and Beyond—Bryce Adelstein Lelbach

Have you registered for CppCon 2017 in September? Don’t delay – Registration is open now.

While we wait for this year’s event, we’re featuring videos of some of the 100+ talks from CppCon 2016 for you to enjoy. Here is today’s feature:

The C++17 Parallel Algorithms Library and Beyond

by Bryce Adelstein Lelbach

(watch on YouTube) (watch on Channel 9)

Summary of the talk:

One of the major library features in C++17 is a parallel algorithms library (formerly the Parallelism Technical Specification v1). The parallel algorithms library has both parallel versions of the existing algorithms in the standard library and a handful of new algorithms inspired by common patterns from parallel programming (such as std::reduce() and std::transform_reduce()).

We’ll talk about what’s in the parallel algorithms library, and how to utilize it in your code today. Also, we’ll discuss some exciting future developments relating to the parallel algorithms library which are targeted for the second version of the Parallelism Technical Specification – executors, and asynchronous parallel algorithms.

Future Ruminations—Sean Parent

This post is a lengthy answer to a question from Alisdair Meredith via Twitter

Future Ruminations

by Sean Parent

From the article:

The question is regarding the numerous proposals for a better future class template for C++, including the proposal from Felix Petriconi, David Sankel, and myself.

It is a valid question for any endeavor. To answer it, we need to define what we mean by a future so we can place bounds on the solution. We also need to understand the problems that a future is trying to solve, so we can determine if a future is, in fact, a useful construct for solving those problems.

The proposal started with me trying to solve a fairly concrete problem; how to take a large, heavily threaded application, and make it run in a single threaded environment (specifically, compiled to asm.js with the Emscripten compiler) but also be able to scale to devices with many cores. I found the current standard and boost implementation of futures to be lacking. I open sourced my work on a better solution, and discussed this in my Better Code: Concurrency talk. Felix heard my CppCast interview on the topic, and became the primary contributor to the project.

CppCon 2016: Improving Performance Through Compiler Switches…—Tim Haines

Have you registered for CppCon 2017 in September? Don’t delay – Registration is open now.

While we wait for this year’s event, we’re featuring videos of some of the 100+ talks from CppCon 2016 for you to enjoy. Here is today’s feature:

Improving Performance Through Compiler Switches...

by Tim Haines

(watch on YouTube) (watch on Channel 9)

Summary of the talk:

Much attention has been given to what modern optimizing compilers can do with your code, but little is ever said as to how to make the compiler invoke these optimizations. Of course, the answer is compiler switches! But which ones are needed to generate the best code? How many switches does it take to get the best performance? How do different compilers compare when using the same set of switches? I explore all of these questions and more to shed light on the interplay between C++ compilers and modern hardware drawing on my work in high performance scientific computing.

Enabling modern optimizing compilers to exploit current-generation processor features is critical to success in this field. Yet, modernizing aging codebases to utilize these processor features is a daunting task that often results in non-portable code. Rather than relying on hand-tuned optimizations, I explore the ability of today's compilers to breathe new life into old code. In particular, I examine how industry-standard compilers like those from gcc, clang, and Intel perform when compiling operations common to scientific computing without any modifications to the source code. Specifically, I look at streaming data manipulations, reduction operations, compute-intensive loops, and selective array operations. By comparing the quality of the code generated and time to solution from these compilers with various optimization settings for several different C++ implementations, I am able to quantify the utility of each compiler switch in handling varying degrees of abstractions in C++ code. Finally, I measure the effects of these compiler settings on the up-and-coming industrial benchmark High Performance Conjugate Gradient that focuses more on the effects of the memory subsystem than current benchmarks like the traditional High Performance LinPACK suite.

5 years of Meeting C++

Meeting C++ exists now for 5 years, lets celebrate on the blog:

5 years of Meeting C++

by Jens Weller

From the article:

Just a little bit more then 5 years ago, Meeting C++ went public. Since then, it has been a wild ride and huge success. Today, Meeting C++ reaches over 50k in social media, the conference it self has grown from 150 to 600 in its 5 editions...

C++ Weekly Episode 70: C++ IIFE in quick-bench.com—Jason Turner

Episode 70 of C++ Weekly.

C++ IIFE in quick-bench.com

by Jason Turner

About the show:

We are commonly taught to const everything that we can in C++. One way to accomplish this goal in the post-C++11 world is to use a immediately invoked lambda (equivalent to an IIFE in the JavaScript world) that generates a value for us, which we assign to a const value. But what impact does this design decision have on the quality of code generated and the performance? In this episode of C++ Weekly we use the new website quick-bench to test the various options available.

CppCon 2016: Deploying C++ modules to 100s of millions of lines of code—Manuel Klimek

Have you registered for CppCon 2017 in September? Don’t delay – Registration is open now.

While we wait for this year’s event, we’re featuring videos of some of the 100+ talks from CppCon 2016 for you to enjoy. Here is today’s feature:

Deploying C++ modules to 100s of millions of lines of code

by Manuel Klimek

(watch on YouTube) (watch on Channel 9)

Summary of the talk:

Compile times are a pain point for C++ programmers all over the world. Google is no exception.. We have a single unified codebase with hundreds of millions of lines of C++ code, all of it built from source. As the size of the codebase and the depth of interrelated interfaces exposed through textually included headers grew, the scaling of compiles became a critical issue.

Years ago we started working to build technology in the Clang compiler that could help scale builds more effectively than textual inclusion. This is the core of C++ Modules: moving away from the model of textual inclusion. We also started preparing our codebase to migrate to this technology en masse, and through a highly automated process. It's been a long time and a tremendous effort, but we'd like to share where we are as well as what comes next.

In this talk, we will outline the core C++ Modules technology in Clang. This is just raw technology at this stage, not an integrated part of the C++ programming language. That part is being worked on by a large group of people in the ISO C++ standards committee. But we want to share how Google is using this raw technology internally to make today's C++ compiles faster, what it took to get there, and how you too can take advantage of these features. We will cover everything from the details of migrating a codebase of this size to use a novel compilation model to the ramifications for both local and distributed build systems. We hope to give insight into the kinds of benefits that technology like C++ Modules can bring to a large scale C++ development environment.

CppCon 2016: AAAARGH!? Adopting Almost Always Auto Reinforces Good Habits!?—Andy Bond

Have you registered for CppCon 2017 in September? Don’t delay – Registration is open now.

While we wait for this year’s event, we’re featuring videos of some of the 100+ talks from CppCon 2016 for you to enjoy. Here is today’s feature:

AAAARGH!? Adopting Almost Always Auto Reinforces Good Habits!?

by Andy Bond

(watch on YouTube) (watch on Channel 9)

Summary of the talk:

Prominent members of the C++ community are advocating the "almost-always-auto" idiom, but there are understandable concerns from many about its implications. This case study will demonstrate how it may be applied in different situations, suggest ways to avoid performance penalties, introduce algorithms to minimize the "almost" part, and discuss the overall impact.