performance : Standard C++

Home » Blog » Tags » performance

« Prev Next »

performance

CppCon 2015 Work Stealing--Pablo Halpern

By Adrien Hamelin | Sep 14, 2016 12:18 PM | Tags: performance community

Have you registered for CppCon 2016 in September? Don’t delay – Late registration is open now.

While we wait for this year’s event, we’re featuring videos of some of the 100+ talks from CppCon 2015 for you to enjoy. Here is today’s feature:

Work Stealing

by Pablo Halpern

(watch on YouTube) (watch on Channel 9)

Summary of the talk:

If you've used a C++ parallel-programming system in the last decade, you've probably run across the term "work stealing." Work stealing is a scheduling strategy that automatically balances a parallel workload among available CPUs in a multi-core computer, using computation resources with theoretical utilization that is nearly optimal. Modern C++ parallel template libraries such as Intel(R)'s TBB or Microsoft*'s PPL and language extensions such as Intel(R) Cilk(tm) Plus or OpenMP tasks are implemented using work-stealing runtime libraries.

Most C++ programmers pride themselves on understanding how their programs execute on the underlying machine. Yet, when it comes to parallel programming, many programmers mistakenly believe that if you understand threads, then you understand parallel runtime libraries. In this talk, we'll investigate how work-stealing applies to the semantics of a parallel C++ program. We'll look at the theoretical underpinnings of work-stealing, now it achieves near optimal machine utilization, and a bit about how it's implemented. In the process, we'll discover some pit-falls and how to avoid them. You should leave this talk with a deeper appreciation of how parallel software runs on real systems.

Previous experience with parallel programming is helpful but not required. A medium level of expertise in C++ is assumed.

CppCon 2015 Compile-time contract checking with nn--Jacob Potter

By Adrien Hamelin | Aug 31, 2016 01:56 PM | Tags: performance intermediate

Have you registered for CppCon 2016 in September? Don’t delay – Registration is open now.

While we wait for this year’s event, we’re featuring videos of some of the 100+ talks from CppCon 2015 for you to enjoy. Here is today’s feature:

Compile-time contract checking with nn

by Jacob Potter

(watch on YouTube) (watch on Channel 9)

Summary of the talk:

Tony Hoare called null pointers a “billion-dollar mistake”, but nearly every language in wide use today has them. There have been many efforts to reduce the risk of nulls creeping in where they shouldn't be, but most involve attributes or annotations rather than being part of the type system itself. Can we do better? C++'s customizable value types make it possible to solve this sort of problem.

In this talk, I’ll present a non-nullable pointer wrapper, `nn`, that’s found wide use in Dropbox’s C++ code. This helper lets us use the type system to track pointers that can't be null, and express and enforce contracts at compile time. I’ll go into some depth on the template trickery needed to make things “just work”, the toolchain bugs we found along the way, and how this tool has helped us improve our code.

CppCon 2015 C++ Rcpp: Seamless R and C++ Integration--Matt P. Dziubinski

By Adrien Hamelin | Aug 19, 2016 01:52 PM | Tags: performance community

Have you registered for CppCon 2016 in September? Don’t delay – Registration is open now.

While we wait for this year’s event, we’re featuring videos of some of the 100+ talks from CppCon 2015 for you to enjoy. Here is today’s feature:

Rcpp: Seamless R and C++ Integration

by Matt P. Dziubinski

(watch on YouTube) (watch on Channel 9)

Summary of the talk:

R is an open-source statistical language designed with a focus on data analysis. While its historical roots are in statistical applications, it is currently experiencing a rapid growth in popularity in all fields where data matters: from data science, through bioinformatics and finance, to machine learning. Key strengths contributing to this growth include its rich libraries ecosystem (over 6 thousands packages at the moment of writing) – often authored by the leading researchers in the field, providing early access to the latest techniques; beautiful, high-quality visualizations – supporting seamless exploratory data analysis and producing stunning presentations; all of this available in an interactive environment resulting in high productivity through fast iteration times.

At the same time, there are no free lunches in programming: the dynamic, interactive nature of R does have its costs, including a significant impact on run-time performance. In an era of growing data sizes and increasingly realistic models this concern is only becoming more important.

In this talk we provide an introduction to Rcpp – a library allowing smooth integration of R with C++, combining the productivity benefits of R for data science together with the performance of C++. First released in 2005, today it’s the most popular language extension for R -- used by over 400 packages. We'll also discuss challenges (as well as possible solutions) involved in integrating modern C++ code, and demonstrate the usage of popular C++ libraries in practice. We’ll conclude the talk with the RInside package allowing to embed R in C++.

CppCon 2015 C++ on the Web: Ponies for developers without pwn’ing users--JF Bastien

By Adrien Hamelin | Aug 17, 2016 01:51 PM | Tags: performance community

Have you registered for CppCon 2016 in September? Don’t delay – Registration is open now.

While we wait for this year’s event, we’re featuring videos of some of the 100+ talks from CppCon 2015 for you to enjoy. Here is today’s feature:

C++ on the Web: Ponies for developers without pwn’ing users

by JF Bastien

(watch on YouTube) (watch on Channel 9)

Summary of the talk:

Is it possible to write apps in C++ that run in the browser with native code speed? Yes. Can you do this without the security problems associated with running native code downloaded from the net? Yes and yes. Come to this session to learn how.

We'll showcase some resource-intensive applications that have been compiled to run in the browser. These applications run as fast as native code with access to cornerstone native programming APIs—modern C++ STL, OpenGL, files and processes with full access to C++’s concurrency and parallelism—all in an architecture- and OS-agnostic packaging. Then, we'll describe how we deliver native code on the web securely, so developers get their C++ ponies and users don’t get pwn’d. We’ll also touch on the fuzzing, code randomization, and sandboxing that keep the billions of web users safe.

Quick Q: Efficiency of postincrement v.s. preincrement in C++

By Adrien Hamelin | Aug 15, 2016 12:55 PM | Tags: performance advanced

Quick A: The difference is only marginal with optimizations enabled.

Recently on SO:

Efficiency of postincrement v.s. preincrement in C++

It is true - although perhaps overly strict. Pre increment doesn't necessarily introduce a data dependency - but it can.

A trivial example for exposition:
a = b++ * 2;
Here, the increment can be executed in parallel with the multiplication. The operands of both the increment and the multiplication are immediately available and do not depend on the result of either operation.

Another example:
a = ++b * 2;
Here, the multiplication must be executed after the increment, because one of the operands of the multiplication depends on the result of the increment.

Of course, these statements do slightly different things, so the compiler might not always be able to transform the program from one form to the other while keeping the semantics the same - which is why using the post increment might make a slight difference in performance.

A practical example, using a loop:
for(int i= 0; arr[i++];)
    count++;

for(int i=-1; arr[++i];)
    count++;
One might think that the latter is necessarily faster if they reason that "post-increment makes a copy" - which would have been very true in the case of non-fundamental types. However, due to the data dependency (and because int is a fundamental type with no overload function for increment operators), the former can theoretically be more efficient. Whether it is depends on the cpu architecture, and the ability of the optimizer.

For what it's worth - in a trivial program, on x86 arch, using g++ compiler with optimization enabled, the above loops had identical assembly output, so they are perfectly equivalent in that case.

Rules of thumb:

If the counter is a fundamental type and the result of increment is not used, then it makes no difference whether you use post/pre increment.

If the counter is not a fundamental type and the result of the increment is not used and optimizations are disabled, then pre increment may be more efficient. With optimizations enabled, there is no difference.

If the counter is a fundamental type and the result of increment is used, then post increment can theoretically be marginally more efficient - in some cpu architecture - in some context - using some compiler.

If the counter is a complex type and the result of the increment is used, then pre increment is typically faster than post increment. Also see R Sahu's answer regarding this case.

CppCon 2015 std::allocator Is to Allocation what std::vector Is to Vexation--Andrei Alexandrescu

By Adrien Hamelin | Aug 15, 2016 12:43 PM | Tags: performance efficiency

Have you registered for CppCon 2016 in September? Don’t delay – Registration is open now.

While we wait for this year’s event, we’re featuring videos of some of the 100+ talks from CppCon 2015 for you to enjoy. Here is today’s feature:

std::allocator Is to Allocation what std::vector Is to Vexation

by Andrei Alexandrescu

(watch on YouTube) (watch on Channel 9)

Summary of the talk:

std::allocator has an inglorious past, murky present, and cheerless future. STL introduced allocators as a stop gap for the now antiquated segmented memory models of the 1990s. Their design was limited and in many ways wasn't even aiming at helping allocation that much. Because allocators were there, they simply continued being there, up to the point they became impossible to either uproot or make work, in spite of valiant effort spent by the community.

But this talk aims at spending less time on poking criticism at std::allocator and more on actually defining allocator APIs that work.

Scalable, high-performance memory allocation is a topic of increasing importance in today's demanding applications. For such, std::allocator simply doesn't work. This talk discusses the full design of a memory allocator created from first principles. It is generic, componentized, and composable for supporting application-specific allocation patterns.

CppCon 2016 Workshop and Talk--Anthony Williams

By Adrien Hamelin | Aug 8, 2016 02:05 PM | Tags: performance efficiency

An interesting workshop in perspective!

CppCon 2016 Workshop and Talk

by Anthony Williams

From the announcement:

I will be running my new 2-day workshop on Concurrent Thinking, as well as doing a session on The Continuing Future of Concurrency in C++ at CppCon 2016 in September.

CppCon 2015 constexpr: Applications--Scott Schurr

By Adrien Hamelin | Aug 8, 2016 02:03 PM | Tags: performance c++14

Have you registered for CppCon 2016 in September? Don’t delay – Registration is open now.

While we wait for this year’s event, we’re featuring videos of some of the 100+ talks from CppCon 2015 for you to enjoy. Here is today’s feature:

constexpr: Applications

by Scott Schurr

(watch on YouTube) (watch on Channel 9)

Summary of the talk:

I'm excited about constexpr. It's probably my favorite C++11 feature and it's gotten even better with C++14. But when I talk to other developers about constexpr they seem puzzled. What sorts of useful computations can the compiler possibly do before runtime?

I'd like to take this session to explore some of the capabilities that constexpr brings to the table. We'll look at compile-time parsing, floating-point computations, and containers. We'll also talk about motivations for computing these at compile time.

This session builds on the "constexpr: Introduction" talk.

Quick Q: Why doesn't c++ have a specified order for evaluating function arguments?

By Adrien Hamelin | Aug 5, 2016 02:00 PM | Tags: performance intermediate

Quick A: In order to get the best performance.

recently on SO:

Why doesn't c++ have a specified order for evaluating function arguments?

C++ leaves as much as practical up to the implementation of the compiler, especially where optimization opportunities might occur. Before fixing something like this, existing compiler writers are consulted to see if there is a cost.

Some order of evaluation guarantees surrounding overloaded operators and complete-argument rules where added in C++17. But it remains that which argument goes first is left unspecified. In C++17, it is now specified that the expression giving what to call (the code on the left of the ( of the function call) goes before the arguments, and whichever argument is evaluated first is evaluated fully before the next one is started, and in the case of an object method the value of the object is evaluated before the arguments to the method are. (There may be some minor errors in this description: ask a narrow question about it for a more vetted answer).

These where vetted and determined to not cause significant prolems for existing compilers. Reordering of arguments was considered, and discarded, presumably for good reasons. Or even poor reasons, like existing compilers have a default order and it could break (possibly non-compliant) code on that compiler to force them all to do it in one global order.

In short, C++ leaves lots of freedom to compiler writers. This has let compiler writers find optimization opportunities in strange crannies. The kind of optimizations available today are way beyond what the original writers of C++, let alone C++, may have suspected where practical or be considered reasonable.

CppCon 2015 C++ Atomics: The Sad Story of memory_order_consume--Paul E. McKenney

By Adrien Hamelin | Aug 5, 2016 01:52 PM | Tags: performance experimental

Have you registered for CppCon 2016 in September? Don’t delay – Registration is open now.

While we wait for this year’s event, we’re featuring videos of some of the 100+ talks from CppCon 2015 for you to enjoy. Here is today’s feature:

C++ Atomics: The Sad Story of memory_order_consume: A Happy Ending At Last?

by Paul E. McKenney

(watch on YouTube) (watch on Channel 9)

Summary of the talk:

One of the big advantages of C++ atomics is that developers can now let the compiler know about the intent behind their multi-threaded synchronization mechanisms. At least they can tell the compiler as long as the synchronization mechanism in question is not RCU. You see, all production compilers promote RCU's memory_order_consume to memory_order_acquire. Although this promotion does ensure correctness, it also ensures the additional overhead of memory-barrier instructions on weakly ordered systems and of needlessly suppressed compiler optimizations on all systems.

All previous attempts to resolve this issue have foundered on either standard-committee reluctance to eviscerate the standard for a special case, compiler-writer reluctance to eviscerate their compilers for a special case, and kernel-developers reluctance to eviscerate their source base for late-to-the-party compiler support.

But now there is a glimmer of hope in the guise of a small set of small patches to the Linux kernel that eliminate the most challenging use cases. Will this hope be realized? Come to this talk to here the story, which by September will hopefully have a happy ending!