performance : Standard C++

Great news: since yesterday, both of the keynotes from this year's Meeting C++ conference are on YouTube! Both keynote speakers chose to speak on a specific topic, and delivered very well. There is also a playlist for Meeting C++ 2015.

Careful With That STL Map Insert, Eugene

By Adrien Hamelin | Dec 21, 2015 07:27 AM | Tags: performance intermediate

Some things are not as efficient as we thought:

Careful With That STL Map Insert, Eugene

by Aras Pranckevičius

From the article:

So we had this pattern in some of our code. Some sort of “device/API specific objects” need to be created out of simple “descriptor/key” structures. Think D3D11 rasterizer state or Metal pipeline state, or something similar to them...
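The excerpt stops before showing any code. As a rough sketch of the kind of pattern it describes, assuming a `std::map` is used as the cache from descriptor to created object, a lookup can check for an existing key before constructing or inserting anything. The names `RasterDesc`, `RasterState`, and `StateCache` are hypothetical and are not taken from the article.

```cpp
#include <map>
#include <memory>
#include <tuple>

// Hypothetical descriptor/key structure (not from the article).
struct RasterDesc {
    int cullMode;
    int fillMode;
    bool operator<(const RasterDesc& o) const {
        return std::tie(cullMode, fillMode) < std::tie(o.cullMode, o.fillMode);
    }
};

// Hypothetical stand-in for an expensive device/API specific object.
struct RasterState {
    explicit RasterState(const RasterDesc& d) : desc(d) {}
    RasterDesc desc;
};

class StateCache {
public:
    // Returns the cached object for `desc`, creating it only on a miss.
    RasterState* get(const RasterDesc& desc) {
        auto it = cache_.lower_bound(desc);
        if (it != cache_.end() && !(desc < it->first)) {
            return it->second.get();  // hit: no object or map node is created
        }
        ++created_;
        // Miss: `it` already points at the insertion position, so pass it as a hint.
        it = cache_.emplace_hint(it, desc, std::make_unique<RasterState>(desc));
        return it->second.get();
    }

    // Number of objects created so far (for illustration only).
    int created() const { return created_; }

private:
    std::map<RasterDesc, std::unique_ptr<RasterState>> cache_;
    int created_ = 0;
};
```

Requesting the same descriptor twice returns the same pointer and creates only one object. Whether a plain `insert` on an existing key costs an allocation depends on the standard library implementation, which is why the sketch searches first.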

All Meeting C++ Lightning Talk videos are online

By Meeting C++ | Dec 11, 2015 09:03 AM | Tags: performance intermediate efficiency c++14 c++11 boost basics advanced

Meeting C++ just started a week ago, and I already managed to edit and upload all lightning talks:

Meeting C++ 2015 - all lightning talks are now online at youtube

by Jens Weller

From the article:

This year for the very first time we had lightning talks at the Meeting C++ conference. Two sessions with each 5 lightning talks were held...

HPX version 0.9.11 released -- STE||AR Group

By Hartmut Kaiser | Nov 13, 2015 05:32 AM | Tags: performance parallelism experimental distributed computing c++14 c++11

The STE||AR Group has released V0.9.11 of HPX -- A general purpose parallel C++ runtime system for applications of any scale.

HPX V0.9.11 Released

The newest version of HPX (V0.9.11) is now available for download! Please see here for the release notes.

HPX exposes an API fully conforming to the concurrency related parts of the C++11 and C++14 standards, extended and applied to distributed computing.

From the announcement:

In this release our team has focused on developing higher level C++ programming interfaces which simplify the use of HPX in applications and ensure their portability in terms of code and performance. We paid particular attention to align all of these changes with the existing C++ Standard or with the ongoing standardization work. Other major features include the introduction of executors and various policies which enable customizing the ‘where’ and ‘when’ of task and data placement.
This release consolidates many of the APIs exposed by HPX. We introduced a new uniform way of creating (local and remote) objects, we added distribution policies allowing to manage and customize data placement and migration, we unified the way various types of parallelism are made available to the user.

What is Code Modernization -- Mike Pearce

By Mantosh Kumar | Nov 10, 2015 10:13 PM | Tags: performance intermediate efficiency

A discussion of a systematic approach to code modernization.

What is Code Modernization?

by Mike Pearce (Intel)

From the article:

Random Acts of Optimization -- Tony Albrecht

By Mantosh Kumar | Oct 29, 2015 01:16 AM | Tags: performance efficiency

A discussion of a systematic approach to optimizing code.

Random Acts of Optimization

by Tony Albrecht

From the article:

The three stages mentioned here, while seemingly obvious, are all too often overlooked when programmers seek to optimize. Just to reiterate:
    1. Identification: profile the application and identify the worst performing parts.
    2. Comprehension: understand what the code is trying to achieve and why it is slow.
    3. Iteration: change the code based on step 2 and then re-profile. Repeat until fast enough.
The solution above is not the fastest possible version, but it is a step in the right direction—the safest path to performance gains is via iterative improvements.
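The measure, change, re-measure loop quoted above can be sketched with a small timing helper. This is an illustration under assumptions rather than the article's own example: `time_ms`, `sum_by_columns`, and `sum_by_rows` are hypothetical names, and the "comprehension" step here is noticing that the baseline reads a row-major matrix with a large stride.

```cpp
#include <chrono>
#include <cstddef>
#include <vector>

// Runs `fn` once and returns the elapsed wall-clock time in milliseconds.
template <typename Fn>
double time_ms(Fn&& fn) {
    const auto start = std::chrono::steady_clock::now();
    fn();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}

// Baseline: walks a row-major matrix column by column, so consecutive
// reads are `cols` elements apart in memory.
long long sum_by_columns(const std::vector<int>& m, std::size_t rows, std::size_t cols) {
    long long total = 0;
    for (std::size_t c = 0; c < cols; ++c)
        for (std::size_t r = 0; r < rows; ++r)
            total += m[r * cols + c];
    return total;
}

// Candidate: computes the same sum but reads memory sequentially.
long long sum_by_rows(const std::vector<int>& m, std::size_t rows, std::size_t cols) {
    long long total = 0;
    for (std::size_t r = 0; r < rows; ++r)
        for (std::size_t c = 0; c < cols; ++c)
            total += m[r * cols + c];
    return total;
}
```

Wrapping each call as `time_ms([&] { result = sum_by_columns(m, rows, cols); })` gives a before and after number to compare. A real measurement would repeat each run several times and rely on a profiler to choose what to change in the first place.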

Do You Prefer Fast or Precise? -- Jim Hogg

By Adrien Hamelin | Oct 20, 2015 12:26 PM | Tags: performance advanced

A nice article explaining the pitfalls of floating-point numbers and the effects they can have. It is written about Visual C++, but the same problems apply to other compilers.

Do You Prefer Fast or Precise?

by Jim Hogg

From the article:

Floating Point Basics

In C++, a float can store a value in the 3 (approximate) disjoint ranges { [-E+38, -E-38], 0, [E-38, E+38] }. Each float consumes 32 bits of memory. In this limited space, a float can only store approximately 4 billion different values. It does this in a cunning way, where adjacent values for small numbers lie close together; while adjacent values for big numbers lie far apart. You can count on each float value being accurate to about 7 decimal digits.
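The uneven spacing described above can be inspected directly. The following is a minimal sketch assuming `float` is an IEEE 754 single-precision type, which is the common case but not something the C++ standard guarantees. The helper names `gap_above` and `relative_gap` are hypothetical.

```cpp
#include <cmath>
#include <limits>

// Distance from `x` to the next representable float above it.
float gap_above(float x) {
    return std::nextafter(x, std::numeric_limits<float>::infinity()) - x;
}

// The same gap expressed relative to the magnitude of `x` (x must be non-zero).
float relative_gap(float x) {
    return gap_above(x) / std::fabs(x);
}
```

Under that assumption, `gap_above(1.0f)` is 2^-23 (about 1.19e-7), while `gap_above(1.0e20f)` is 2^43 (about 8.8e12). The relative gap stays near 1e-7 in both cases, which is where the figure of roughly 7 decimal digits comes from.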

Floating Point Calculations

We all understand how a computer calculates with ints. But what about floats? One obvious effect is that if I add a big number and a small number, the small one may simply get lost. For example, E+20 + E-20 results in E+20 – there are not enough bits of precision within a float to represent the precise/exact/correct value...
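The lost small operand is easy to reproduce. The sketch below assumes default (strict) floating-point compiler settings. The function names are hypothetical, and the second pair of functions goes one step beyond the excerpt to show a direct consequence: the result of a float sum can depend on the order of evaluation.

```cpp
// Adds a tiny value to a big one in single precision.
float add(float big, float tiny) {
    return big + tiny;
}

// Sums three floats as (a + b) + c.
float sum_left(float a, float b, float c) {
    return (a + b) + c;
}

// Sums the same three floats as a + (b + c).
float sum_right(float a, float b, float c) {
    return a + (b + c);
}
```

With these, `add(1.0e20f, 1.0e-20f)` returns exactly `1.0e20f`. For the inputs `1.0e20f, -1.0e20f, 1.0f`, `sum_left` returns `1.0f` while `sum_right` returns `0.0f`, because the `1.0f` is absorbed when it is added to `-1.0e20f` first.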

Video available: Chandler Carruth, "Tuning C++: Benchmarks, and CPUs, and Compilers!" -- CppCon

By Felix Petriconi | Sep 29, 2015 06:20 AM | Tags: performance efficiency advanced

Chandler's talk about benchmarking, cheating the compiler's optimizer and optimizing code from the recent CppCon is online.

Tuning C++: Benchmarks, and CPUs, and Compilers! Oh My! (YouTube)

by Chandler Carruth, CppCon 2015

From the talk's outline:

A primary use case for C++ is low latency, low overhead, high performance code. But C++ does not give you these things for free, it gives you the tools to control these things and achieve them where needed. How do you realize this potential of the language? How do you tune your C++ code and achieve the necessary performance metrics?

This talk will walk through the process of tuning C++ code from benchmarking to performance analysis. It will focus on small scale performance problems ranging from loop kernels to data structures and algorithms. It will show you how to write benchmarks that effectively measure different aspects of performance even in the face of advanced compiler optimizations and bedeviling modern CPUs. It will also show how to analyze the performance of your benchmark, understand its behavior as well as the CPU's behavior, and use a wide array of tools available to isolate and pinpoint performance problems. The tools and some processor details will be Linux and x86 specific, but the techniques and concepts should be broadly applicable.
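One common way to keep an optimizing compiler from deleting the work a micro-benchmark is supposed to measure is an empty inline-assembly barrier. The sketch below assumes GCC or Clang, since it relies on their extended `asm` syntax. The outline above does not list the talk's code, so the helper names `escape` and `clobber` and the function `bench_push_back` should be read as illustrative.

```cpp
#include <chrono>
#include <cstddef>
#include <vector>

// Tells the optimizer that the memory behind `p` may be observed elsewhere,
// so stores to it cannot be discarded. Emits no instructions.
static void escape(void* p) {
    asm volatile("" : : "g"(p) : "memory");
}

// Tells the optimizer that any memory may have been read or written.
static void clobber() {
    asm volatile("" : : : "memory");
}

// Times `iterations` rounds of appending `n` ints to a vector and
// returns the average time per round in nanoseconds.
double bench_push_back(std::size_t n, int iterations) {
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i) {
        std::vector<int> v;
        v.reserve(n);
        escape(v.data());  // the buffer is now treated as visible to the outside
        for (std::size_t j = 0; j < n; ++j) {
            v.push_back(static_cast<int>(j));
        }
        clobber();         // forces the pending writes to be treated as observed
    }
    const auto stop = std::chrono::steady_clock::now();
    const double total_ns =
        std::chrono::duration<double, std::nano>(stop - start).count();
    return total_ns / iterations;
}
```

Without the two barriers, an optimizer is free to notice that `v` is never read and remove the loop entirely, which would make the benchmark report the cost of doing nothing.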

Should you be using something instead of what you should use instead? -- Scott Meyers

By Adrien Hamelin | Sep 17, 2015 12:15 AM | Tags: performance c++11

The title is confusing, but the article is not, and it is well worth reading!

Should you be using something instead of what you should use instead?

by Scott Meyers

From the article:

The April 2000 C++ Report included an important article by Matt Austern: "Why You Shouldn't Use set—and What to Use Instead." It explained why lookup-heavy applications typically get better performance from applying binary search to a sorted std::vector than they'd get from the binary search tree implementation of a std::set. Austern reported that in a test he performed on a Pentium III using containers of a million doubles, a million lookups in a std::set took nearly twice as long as in a sorted std::vector...
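The two lookup strategies being compared can be sketched as follows. This shows only the shape of the code, not a reproduction of Austern's test, and it makes no claim about timings on current hardware. The helper names `contains` and `make_sorted` are hypothetical.

```cpp
#include <algorithm>
#include <set>
#include <vector>

// Lookup in a std::set: follows pointers through a node-based tree.
bool contains(const std::set<double>& s, double x) {
    return s.find(x) != s.end();
}

// Lookup in a sorted std::vector: binary search over contiguous storage.
// Precondition: `v` is sorted in ascending order.
bool contains(const std::vector<double>& v, double x) {
    return std::binary_search(v.begin(), v.end(), x);
}

// Builds a sorted, duplicate-free vector so that it can stand in for a set.
std::vector<double> make_sorted(std::vector<double> v) {
    std::sort(v.begin(), v.end());
    v.erase(std::unique(v.begin(), v.end()), v.end());
    return v;
}
```

Both lookups take a logarithmic number of comparisons. The sorted vector pays for its compact layout with linear-time insertion, which is why the approach is aimed at lookup-heavy workloads where the data is built once and queried many times.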