Performance

A zero cost abstraction?--Josh Peterson

Safe and performant?

A zero cost abstraction?

by Josh Peterson

From the article:

Recently Joachim (CTO at Unity) has been talking about “performance by default”, the mantra that software should be as fast as possible from the outset. This is driving the pretty cool stuff many at Unity are doing around things like ECS, the C# job system, and Burst (find lots more about that here).

One question Joachim has asked internally of Unity developers is (I’m paraphrasing here): “What is the absolute lower bound of time this code could use?” This strikes me as a really useful way to think about performance. The question changes from “How fast is this?” to “How fast could this be?”. If the answers to those two questions are not the same, the next question is “Do we really need the additional overhead?”

Another way to think about this is to consider the zero-cost abstraction, a concept much discussed in the C++ and Rust communities. Programmers are always building abstractions, and those abstractions often lead to the difference between “how fast it is” and “how fast it could be”. We want to provide useful abstractions that don’t hurt performance...
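
As a concrete illustration of the idea (a sketch, not from the article), consider two ways of summing a vector. With optimizations enabled, mainstream compilers typically generate equivalent machine code for both, so the higher-level abstraction costs nothing at runtime:

#include <cstddef>
#include <numeric>
#include <vector>

// Hand-written loop: the "how fast it could be" baseline.
int sum_raw(const std::vector<int>& v) {
    int total = 0;
    for (std::size_t i = 0; i < v.size(); ++i)
        total += v[i];
    return total;
}

// Abstraction via iterators and std::accumulate: typically compiles down
// to the same code as the loop above, i.e. a zero-cost abstraction.
int sum_abstracted(const std::vector<int>& v) {
    return std::accumulate(v.begin(), v.end(), 0);
}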

The Amazing Performance of C++17 Parallel Algorithms, is it Possible?--Bartlomiej Filipek

Are you using it?

The Amazing Performance of C++17 Parallel Algorithms, is it Possible?

by Bartlomiej Filipek

From the article:

With the addition of Parallel Algorithms in C++17, you can now easily update your “computing” code to benefit from parallel execution. In the article, I’d like to examine one STL algorithm which naturally exposes the idea of independent computing. If your machine has a 10-core CPU, can you always expect to get a 10x speedup? Maybe more? Maybe less? Let’s play with this topic.
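
As a hedged sketch (the actual benchmark is in the article itself) of the kind of independent, element-wise work these algorithms target, a std::transform only needs an execution policy to run in parallel:

#include <algorithm>
#include <cmath>
#include <execution>
#include <vector>

int main() {
    std::vector<double> in(10'000'000, 0.5), out(in.size());

    // Sequential baseline.
    std::transform(in.begin(), in.end(), out.begin(),
                   [](double x) { return std::sqrt(std::sin(x) * std::cos(x)); });

    // Same algorithm, parallel execution requested via std::execution::par.
    std::transform(std::execution::par, in.begin(), in.end(), out.begin(),
                   [](double x) { return std::sqrt(std::sin(x) * std::cos(x)); });
}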

HPX V1.2 released -- STE||AR Group

The STE||AR Group has released V1.2 of HPX -- A C++ Standard library for parallelism and concurrency.

HPX V1.2 Released

The newest version of HPX (V1.2) is now available for download! Please see here for the release notes. This release is the first in our more frequent release schedule. We are aiming to produce one release every six months in an effort to get new features and stable releases out to users more quickly.

    HPX exposes an API fully conforming to the parts of the C++11/C++14/C++17 standards that are related to parallelism and concurrency, extended and applied to distributed and heterogeneous computing, and aligned with the ongoing standardization discussions.

    HPX is a general purpose parallel C++ runtime system for applications of any scale. It implements all of the related facilities as defined by the C++ Standard. As of this writing, HPX provides the only widely available open-source implementation of the new C++17 parallel algorithms. Additionally, HPX implements functionalities proposed as part of the ongoing C++ standardization process, such as large parts of the C++ Concurrency TS, Parallelism TS V2, data-parallel algorithms, executors, and many more. It also extends the existing C++ Standard APIs to the distributed case (e.g. compute clusters) and for heterogeneous systems (e.g. GPUs).
    HPX seamlessly enables a new asynchronous C++ Standard Programming Model which tends to improve the parallel efficiency of our applications and helps reduce the complexities usually associated with parallelism and concurrency.
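
A minimal sketch of that asynchronous model (header names and API are taken from the HPX documentation rather than the release notes, so treat them as assumptions): hpx::async mirrors std::async but schedules the work on HPX's lightweight threads.

#include <hpx/hpx_main.hpp>       // lets a plain main() run inside the HPX runtime
#include <hpx/include/async.hpp>  // hpx::async
#include <hpx/include/future.hpp> // hpx::future
#include <iostream>

int square(int x) { return x * x; }

int main() {
    // Same shape as std::async/std::future, but executed by the HPX runtime.
    hpx::future<int> f = hpx::async(square, 6);
    std::cout << f.get() << '\n';  // prints 36
    return 0;
}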


Using C++17 Parallel Algorithms for Better Performance--Billy O’Neal

Are you using the parallel capabilities of the standard library?

Using C++17 Parallel Algorithms for Better Performance

by Billy O’Neal

From the article:

C++17 added support for parallel algorithms to the standard library, to help programs take advantage of parallel execution for improved performance. MSVC first added experimental support for some algorithms in 15.5, and the experimental tag was removed in 15.7.

The interface described in the standard for the parallel algorithms doesn’t say exactly how a given workload is to be parallelized. In particular, the interface is intended to express parallelism in a general form that works for heterogeneous machines, allowing SIMD parallelism like that exposed by SSE, AVX, or NEON, vector “lanes” like that exposed in GPU programming models, and traditional threaded parallelism.

Our parallel algorithms implementation currently relies entirely on library support, not on special support from the compiler. This means our implementation will work with any tool currently consuming our standard library, not just MSVC’s compiler. In particular, we test that it works with Clang/LLVM and the version of EDG that powers Intellisense...
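
For example (a sketch, not taken from the post), enabling the parallel version of std::sort is just a matter of passing an execution policy; the policy is a request, not a guarantee, and the library may still fall back to serial execution:

#include <algorithm>
#include <execution>
#include <random>
#include <vector>

int main() {
    std::vector<int> v(1'000'000);
    std::mt19937 gen{42};
    std::uniform_int_distribution<int> dist(0, 1'000'000);
    for (auto& x : v) x = dist(gen);

    // std::execution::par requests parallel execution; the implementation
    // may run serially if parallelism is unavailable or not worthwhile.
    std::sort(std::execution::par, v.begin(), v.end());
}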

First Meeting Embedded Conference Schedule available

Meeting Embedded is a new conference with a focus on embedded, hosting lots of talks connected to embedded & C++, plus a keynote by Dan Saks!

Meeting Embedded 2018

Schedule

Organized by Jens Weller

From the article:

Meeting Embedded 2018 is a one-day event focused on hardware and software development for embedded systems and the IoT. Meeting Embedded will be at Vienna House Andel's Berlin Hotel on the 14th of November, right before Meeting C++!

Quick Q: What are copy elision and return value optimization?

Quick A: they are common optimizations that a compiler can do behind the scenes to avoid copying in certain cases.

Recently on SO:

What are copy elision and return value optimization?

Copy elision is an optimization implemented by most compilers to prevent extra (potentially expensive) copies in certain situations. It makes returning by value or pass-by-value feasible in practice (restrictions apply).

It's the only form of optimization that elides (ha!) the as-if rule - copy elision can be applied even if copying/moving the object has side-effects.

The following example is taken from Wikipedia:

#include <iostream>

struct C {
  C() {}
  C(const C&) { std::cout << "A copy was made.\n"; }
};

C f() {
  return C();
}

int main() {
  std::cout << "Hello World!\n";
  C obj = f();
}

Depending on the compiler & settings, the following outputs are all valid:

Hello World!
A copy was made.
A copy was made.

Hello World!
A copy was made.

Hello World!

This also means fewer objects can be created, so you also can't rely on a specific number of destructors being called. You shouldn't have critical logic inside copy/move-constructors or destructors, as you can't rely on them being called.

If a call to a copy or move constructor is elided, that constructor must still exist and must be accessible. This ensures that copy elision does not allow copying objects which are not normally copyable, e.g. because they have a private or deleted copy/move constructor.
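
For illustration (not part of the original answer, and the NoCopy name is hypothetical), a type with a deleted copy constructor cannot be returned by value under C++14 even though the copy would be elided; under C++17's guaranteed elision for prvalues (below) it compiles:

struct NoCopy {
  NoCopy() {}
  NoCopy(const NoCopy&) = delete;
};

NoCopy make() {
  return NoCopy(); // error in C++14: the deleted copy constructor must still be usable
                   // OK in C++17: elision is guaranteed, no copy is required
}

int main() {
  NoCopy obj = make();
  (void)obj;
}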

C++17: As of C++17, Copy Elision is guaranteed when an object is returned directly:

#include <iostream>

struct C {
  C() {}
  C(const C&) { std::cout << "A copy was made.\n"; }
};

C f() {
  return C(); //Definitely performs copy elision
}

C g() {
  C c;
  return c; //Maybe performs copy elision
}

int main() {
  std::cout << "Hello World!\n";
  C obj = f(); //Copy constructor isn't called
}

CppCon 2017: How to Write a Custom Allocator--Bob Steagall

Have you registered for CppCon 2018 in September? Registration is open now.

While we wait for this year’s event, we’re featuring videos of some of the 100+ talks from CppCon 2017 for you to enjoy. Here is today’s feature:

How to Write a Custom Allocator

by Bob Steagall

(watch on YouTube) (watch on Channel 9)

Summary of the talk:

You'd like to improve the performance of your application with regard to memory management, and you believe this can be accomplished by writing a custom allocator. But where do you start? Modern C++ brings many improvements to the standard allocator model, but with those improvements come several issues that must be addressed when designing a new allocator.

This talk will provide guidance on how to write custom allocators for the C++14/C++17 standard containers. It will cover the requirements specified by the standard, and will describe the facilities provided by the standard to support the new allocator model and allocator-aware containers. We'll look at the issues of allocator identity and propagation, and examine their implications for standard library users, standard library implementers, and custom allocator implementers. We'll see how a container uses its allocator, including when and how a container's allocator instance propagates. This will give us the necessary background to describe allocators that implement unusual semantics, such as a stateful allocator type whose instances compare non-equal. Finally, the talk will provide some guidelines for how to specify a custom allocator's public interface based on the semantics it provides.
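
As a starting point (a minimal sketch, not code from the talk; the LoggingAllocator name is illustrative), the C++11-and-later allocator model only requires value_type, allocate, deallocate, and equality comparison; std::allocator_traits fills in the rest:

#include <cstddef>
#include <iostream>
#include <new>
#include <vector>

// A minimal allocator that forwards to operator new/delete and logs calls.
template <class T>
struct LoggingAllocator {
    using value_type = T;

    LoggingAllocator() = default;
    template <class U>
    LoggingAllocator(const LoggingAllocator<U>&) noexcept {}

    T* allocate(std::size_t n) {
        std::cout << "allocate " << n << " object(s)\n";
        return static_cast<T*>(::operator new(n * sizeof(T)));
    }
    void deallocate(T* p, std::size_t) noexcept {
        ::operator delete(p);
    }
};

// Stateless: all instances are interchangeable, so they compare equal.
template <class T, class U>
bool operator==(const LoggingAllocator<T>&, const LoggingAllocator<U>&) noexcept { return true; }
template <class T, class U>
bool operator!=(const LoggingAllocator<T>&, const LoggingAllocator<U>&) noexcept { return false; }

int main() {
    std::vector<int, LoggingAllocator<int>> v;
    for (int i = 0; i < 5; ++i) v.push_back(i); // growth triggers the custom allocator
}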