performance

CppCon 2016: Improving Performance Through Compiler Switches…—Tim Haines

Have you registered for CppCon 2017 in September? Don’t delay – Registration is open now.

While we wait for this year’s event, we’re featuring videos of some of the 100+ talks from CppCon 2016 for you to enjoy. Here is today’s feature:

Improving Performance Through Compiler Switches...

by Tim Haines

(watch on YouTube) (watch on Channel 9)

Summary of the talk:

Much attention has been given to what modern optimizing compilers can do with your code, but little is ever said as to how to make the compiler invoke these optimizations. Of course, the answer is compiler switches! But which ones are needed to generate the best code? How many switches does it take to get the best performance? How do different compilers compare when using the same set of switches? I explore all of these questions and more to shed light on the interplay between C++ compilers and modern hardware drawing on my work in high performance scientific computing.

Enabling modern optimizing compilers to exploit current-generation processor features is critical to success in this field. Yet, modernizing aging codebases to utilize these processor features is a daunting task that often results in non-portable code. Rather than relying on hand-tuned optimizations, I explore the ability of today's compilers to breathe new life into old code. In particular, I examine how industry-standard compilers like those from gcc, clang, and Intel perform when compiling operations common to scientific computing without any modifications to the source code. Specifically, I look at streaming data manipulations, reduction operations, compute-intensive loops, and selective array operations. By comparing the quality of the code generated and time to solution from these compilers with various optimization settings for several different C++ implementations, I am able to quantify the utility of each compiler switch in handling varying degrees of abstractions in C++ code. Finally, I measure the effects of these compiler settings on the up-and-coming industrial benchmark High Performance Conjugate Gradient that focuses more on the effects of the memory subsystem than current benchmarks like the traditional High Performance LinPACK suite.

5 years of Meeting C++

Meeting C++ exists now for 5 years, lets celebrate on the blog:

5 years of Meeting C++

by Jens Weller

From the article:

Just a little bit more then 5 years ago, Meeting C++ went public. Since then, it has been a wild ride and huge success. Today, Meeting C++ reaches over 50k in social media, the conference it self has grown from 150 to 600 in its 5 editions...

C++ Weekly Episode 70: C++ IIFE in quick-bench.com—Jason Turner

Episode 70 of C++ Weekly.

C++ IIFE in quick-bench.com

by Jason Turner

About the show:

We are commonly taught to const everything that we can in C++. One way to accomplish this goal in the post-C++11 world is to use a immediately invoked lambda (equivalent to an IIFE in the JavaScript world) that generates a value for us, which we assign to a const value. But what impact does this design decision have on the quality of code generated and the performance? In this episode of C++ Weekly we use the new website quick-bench to test the various options available.

CppCon 2016: Deploying C++ modules to 100s of millions of lines of code—Manuel Klimek

Have you registered for CppCon 2017 in September? Don’t delay – Registration is open now.

While we wait for this year’s event, we’re featuring videos of some of the 100+ talks from CppCon 2016 for you to enjoy. Here is today’s feature:

Deploying C++ modules to 100s of millions of lines of code

by Manuel Klimek

(watch on YouTube) (watch on Channel 9)

Summary of the talk:

Compile times are a pain point for C++ programmers all over the world. Google is no exception.. We have a single unified codebase with hundreds of millions of lines of C++ code, all of it built from source. As the size of the codebase and the depth of interrelated interfaces exposed through textually included headers grew, the scaling of compiles became a critical issue.

Years ago we started working to build technology in the Clang compiler that could help scale builds more effectively than textual inclusion. This is the core of C++ Modules: moving away from the model of textual inclusion. We also started preparing our codebase to migrate to this technology en masse, and through a highly automated process. It's been a long time and a tremendous effort, but we'd like to share where we are as well as what comes next.

In this talk, we will outline the core C++ Modules technology in Clang. This is just raw technology at this stage, not an integrated part of the C++ programming language. That part is being worked on by a large group of people in the ISO C++ standards committee. But we want to share how Google is using this raw technology internally to make today's C++ compiles faster, what it took to get there, and how you too can take advantage of these features. We will cover everything from the details of migrating a codebase of this size to use a novel compilation model to the ramifications for both local and distributed build systems. We hope to give insight into the kinds of benefits that technology like C++ Modules can bring to a large scale C++ development environment.

CppCon 2016: AAAARGH!? Adopting Almost Always Auto Reinforces Good Habits!?—Andy Bond

Have you registered for CppCon 2017 in September? Don’t delay – Registration is open now.

While we wait for this year’s event, we’re featuring videos of some of the 100+ talks from CppCon 2016 for you to enjoy. Here is today’s feature:

AAAARGH!? Adopting Almost Always Auto Reinforces Good Habits!?

by Andy Bond

(watch on YouTube) (watch on Channel 9)

Summary of the talk:

Prominent members of the C++ community are advocating the "almost-always-auto" idiom, but there are understandable concerns from many about its implications. This case study will demonstrate how it may be applied in different situations, suggest ways to avoid performance penalties, introduce algorithms to minimize the "almost" part, and discuss the overall impact.

CppCon 2016: The strange details of std::string at Facebook—Nicholas Ormrod

Have you registered for CppCon 2017 in September? Don’t delay – Registration is open now.

While we wait for this year’s event, we’re featuring videos of some of the 100+ talks from CppCon 2016 for you to enjoy. Here is today’s feature:

The strange details of std::string at Facebook

by Nicholas Ormrod

(watch on YouTube) (watch on Channel 9)

Summary of the talk:

Standard strings are slowing you down. Strings are everywhere. Changing the performance of std::string has a measurable impact on the speed of real-world C++ programs. But how can you make strings better? In this talk, we'll explore how Facebook optimizes strings, especially with our open-source std::string replacement, fbstring. We'll dive into implementation tradeoffs, especially the storage of data in the struct; examine which standard rules can and cannot be flouted, such as copy-on-write semantics; and share some of the things we've learned along the way, like how hard it is to abolish the null-terminator. War stories will be provided.

CppCon 2016: Bringing Clang and C++ to GPUs: An Open-Source, CUDA-Compatible GPU C++ Compiler—Lebar

Have you registered for CppCon 2017 in September? Don’t delay – Registration is open now.

While we wait for this year’s event, we’re featuring videos of some of the 100+ talks from CppCon 2016 for you to enjoy. Here is today’s feature:

Bringing Clang and C++ to GPUs: An Open-Source, CUDA-Compatible GPU C++ Compiler

by Justin Lebar

(watch on YouTube) (watch on Channel 9)

Summary of the talk:

GPU computing has gone mainstream. It is a dominant part of the performance landscape, providing the initial 10x performance lift to a wide variety of applications. However, programing for GPUs can be extremely challenging. C++ is rarely available in an unmodified form, and there are few portable and open source approaches available. One of the most popular platforms, CUDA, has no production quality open source implementation. As a consequence, its C++ support has lagged behind and it has been a less appealing area for researchers and others that weren’t comfortable relying on NVIDIA’s tooling.

However, today things are different. Clang is now a fully functional open-source GPU compiler. It provides a CUDA-compatible programming model and can compile most of the awesome CUDA libraries out there ranging from Thrust (the CUDA-enabled parallel algorithms library that gave rise to the new parallelism technical specification) to Eigen and TensorFlow.

In this talk we will give an overview of how LLVM and Clang support targeting C++ to GPUs, how they work to be compatible with existing CUDA code, and how you can build your code today to run on GPUs with this open source compiler.

High-Performance and Low-Latency C++ with Herb Sutter

Join us for a 3-day training event with Herb Sutter in London, October 9-11, 2017

High-Performance and Low-Latency C++

About the training:

Welcome to a unique training with Mr Herb Sutter focusing on Efficiency, Concurrency, Parallelism, Modern Hardware, and Modern C++11/14/17. Participants of this intensive 3-day training will be given the knowledge and skills required to write high-performance and low-latency code using modern C++ on today´s systems.

Mr Sutter is the chair of the ISO C++ committee and best-selling author of four books and hundreds of technical papers and articles, including the essay “The Free Lunch Is Over”.

Intermediate to advanced C++ programming experience is required. Some experience with concurrency, parallelism, and/or multiprocessing in e.g. Java, C, C++ or similar language is recommended, but not required.

Don’t miss out on the opportunity to attend this three day course, to be held in London on the 9th – 11th October, 2017. Please notice there are a limited number of seats.

CppCon 2016: Practical Performance Practices—Jason Turner

Have you registered for CppCon 2017 in September? Don’t delay – Registration is open now.

While we wait for this year’s event, we’re featuring videos of some of the 100+ talks from CppCon 2016 for you to enjoy. Here is today’s feature:

Practical Performance Practices

by Jason Turner

(watch on YouTube) (watch on Channel 9)

Summary of the talk:

In the past 6 years ChaiScript's performance has been improved by nearly 100x. This was not accomplished by adding a virtual machine or performing dynamic recompilation. Instead, these increases have been accomplished by moving to more simple, cleaner, idiomatic C++ and by following some simple rules. We will outline these concepts with examples for how they both simplified code while improving performance.

There Is A New Future—Felix Petriconi

Version 1.0 of a new C++ future and channel library has been released.

There Is A New Future

by Sean Parent, Foster Brereton and Felix Petriconi

About the library:

This library provides high level abstractions for implementing algorithms that eases the use of multiple CPU cores while minimizing the contention.

The future implementaton differs in several aspects compared to the C++11/14/17 standard futures: It provides continuations and joins, which were just added in a C++17 TS. But more important this futures propagate values through the graph and not futures. This allows an easy way of creating splits. That means a single future can have multiple continuations into different directions. An other important difference is that the futures support cancellation. So if one is not anymore interested in the result of a future, then one can destroy the future without the need to wait until the future is fullfilled, as it is the case with std::future (and boost::future). An already started future will run until its end, but will not trigger any continuation. So in all these cases, all chained continuations will never be triggered. Additionally the future interface is designed in a way, that one can use build in or custom build executors.

Since one can create with futures only graphs for single use, this library provides as well channels. With these channels one can build graphs, that can be used for multiple invocations.