Efficiency

C++ Compiler Benchmarks -- Imagine Raytracer

On the blog Imagine Raytracer a few days ago:

C++ Compiler Benchmarks

by Imagine Raytracer

From the article:

[...] I decided to try 4.9.2, which had just been released. This seemed to show a fairly serious regression in terms of speed (speed being a pretty important aspect for a renderer), so I decided I'd do a more comprehensive comparison of the latest main compilers for the Linux platform, as back in 2011 and 2012 I used to do compiler benchmarks (GCC, LLVM and ICC) regularly, every six months or so, on my own code (including Imagine) and on the commercial VFX compositor made by the company I worked for at the time, and it had been a while since I'd compared them myself...

biicode 2.0 is out -- biicode Team

biicode, a C++ dependency manager, releases version 2.0

biicode 2.0 is out

by biicode Team

Who's biicode for?

For C/C++ developers who think a dependency manager is needed: biicode is a multiplatform tool and hosting service that lets you build your projects easily, integrate third-party code, and reuse code among projects with just #includes.
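The "just #includes" part can be pictured roughly as follows. This is a minimal sketch only: "someuser/hello" is a placeholder block name and hello() a placeholder function, not a real biicode block or API.

    // Placeholder illustration: the dependency is pulled in simply by
    // #including it with a user/block style path; the tool is then expected
    // to resolve, download and build that block for the project.
    #include "someuser/hello/hello.h"

    int main() {
        hello();   // function provided by the reused block
        return 0;
    }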

 

New optimizations for X86 in upcoming GCC 5.0 -- Evgeny Stupachenko

Fresh on the Intel Developer Zone blog:

New optimizations for X86 in upcoming GCC 5.0

by Evgeny Stupachenko

From the article:

Part 1. Vectorization of load/store groups.

GCC 5.0 significantly improves vector code quality for load groups and store groups. By a load/store group I mean an iterated sequence of consecutive loads or stores. For example:

x = a[i], y = a[i + 1], z = a[i + 2] iterated by “i” is a load group of size 3

...

The most frequent case where load/store groups are applicable is an array of structures (a sketch of such a loop follows this list).
  1. Image conversion (RGB structure to some other) ...
  2. N-dimensional coordinates (normalize an array of XYZ points) ...
  3. Multiplication of vectors by a constant matrix: ...
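As a rough illustration of case 1, a loop of the kind the article describes might look like the sketch below; the RGB struct and the grayscale weights are my own example, not taken from the article. Each iteration reads p[i].r, p[i].g and p[i].b, i.e. a load group of size 3.

    #include <cstddef>
    #include <cstdint>

    struct RGB { std::uint8_t r, g, b; };   // array-of-structures element

    // Each iteration loads three consecutive fields (a load group of size 3)
    // and stores one value; this is the pattern GCC 5.0 is said to vectorize.
    void to_gray(const RGB* p, std::uint8_t* gray, std::size_t n) {
        for (std::size_t i = 0; i < n; ++i)
            gray[i] = static_cast<std::uint8_t>(
                (77 * p[i].r + 150 * p[i].g + 29 * p[i].b) >> 8);
    }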

... GCC 5.0:

  1. Introduces vectorization of load/store groups of size 3
  2. Improves load group vectorization for all supported sizes
  3. Maximizes load/store group performance by generating code that is better tuned to the particular x86 CPU...

 

C++ and Zombies: a moving question

One of the issues I've been thinking about since C++Now: move and move-destruction

C++ and Zombies: a moving question

by Jens Weller

From the article:

This has been on my list of things to think about since C++Now. At C++Now, I realized that we might have zombies in the C++ standard, and that there are two factions: one of them states that it is ok to have well-defined zombies, while some people think that you'd better kill them.
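As a minimal sketch of what the post calls a zombie (my own example, not taken from the post): after a move, the source object is still alive and its destructor will still run, but its value is unspecified.

    #include <string>
    #include <utility>
    #include <vector>

    int main() {
        std::string name = "zombie";
        std::vector<std::string> v;
        v.push_back(std::move(name));

        // 'name' is now a moved-from object: valid, will be destroyed
        // normally, but with an unspecified value. Only operations without
        // preconditions are safe on it, e.g. giving it a new value:
        name = "alive again";
        return 0;
    }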

Insights into new and C++

I've written down some basic thoughts on new and the new standards:

Insights into new and C++

by Jens Weller

From the article:

Every now and then I've been thinking about this, so this blog post is also a summary of my thoughts on the topic of dynamic memory allocation and C++. Since I wrote the blog entries on smart pointers, and with C++14 giving us make_unique, raw new and delete seem to be disappearing from C++ in our future code...
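To illustrate why raw new and delete tend to vanish once make_unique is available (a minimal sketch of my own; the Widget type is just a placeholder):

    #include <memory>

    struct Widget { int value = 0; };

    void old_style() {
        Widget* w = new Widget;   // raw new: we own the cleanup from here on
        w->value = 42;
        // ... if anything in between throws, the delete below never runs ...
        delete w;                 // raw delete: easy to forget or to skip
    }

    void modern_style() {
        auto w = std::make_unique<Widget>();   // C++14: no raw new, no delete
        w->value = 42;
        // the Widget is released automatically, even if an exception is thrown
    }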

The Drawbacks of Implementing Move Assignment in Terms of Swap -- Scott Meyers

Hot off the Meyers press: How would you implement move, and why? Scott Meyers explains two related issues:

The Drawbacks of Implementing Move Assignment in Terms of Swap

by Scott Meyers

From the article:

More and more, I bump into people who, by default, want to implement move assignment in terms of swap. This disturbs me, because (1) it's often a pessimization in a context where optimization is important, and (2) it has some unpleasant behavioral implications as regards resource management.
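For context, here is a minimal sketch of the two implementation styles being compared; the Matrix type is a placeholder of my own, not an example from the article.

    #include <cstddef>
    #include <utility>

    class Matrix {
        double*     data_ = nullptr;
        std::size_t size_ = 0;
    public:
        Matrix() = default;
        explicit Matrix(std::size_t n) : data_(new double[n]{}), size_(n) {}
        Matrix(Matrix&& rhs) noexcept
            : data_(std::exchange(rhs.data_, nullptr)),
              size_(std::exchange(rhs.size_, 0)) {}
        ~Matrix() { delete[] data_; }

        // Style 1: move assignment in terms of swap. The target's old buffer
        // migrates into rhs and is not released until rhs is destroyed -- the
        // kind of resource-management implication the article talks about.
        Matrix& operator=(Matrix&& rhs) noexcept {
            std::swap(data_, rhs.data_);
            std::swap(size_, rhs.size_);
            return *this;
        }

        // Style 2 (shown commented out to keep a single definition): a direct
        // move assignment that releases the old buffer immediately.
        // Matrix& operator=(Matrix&& rhs) noexcept {
        //     delete[] data_;
        //     data_ = std::exchange(rhs.data_, nullptr);
        //     size_ = std::exchange(rhs.size_, 0);
        //     return *this;
        // }
    };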

Vector of Objects vs Vector of Pointers Updated -- Bartlomiej Filipek

More in the "contiguous enables fast" department:

Vector of Objects vs Vector of Pointers Updated

by Bartlomiej Filipek

From the article:

For 1000 particles we need, on average, 2000 cache line reads! This is 78% more cache line reads than in the first case! Additionally, the hardware prefetcher cannot figure out the pattern -- it is random -- so there will be a lot of cache misses and stalls.

In our experiment, the pointer code for 80k particles was more than 266% slower than the contiguous case.
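For reference, the two layouts being compared can be sketched like this; the Particle type and the update loop are my own simplification, not the article's actual benchmark code.

    #include <memory>
    #include <vector>

    struct Particle {
        float x = 0, y = 0, z = 0;
        float vx = 0, vy = 0, vz = 0;
        void update(float dt) { x += vx * dt; y += vy * dt; z += vz * dt; }
    };

    // Contiguous case: particles sit next to each other in memory, so the
    // hardware prefetcher can stream cache lines ahead of the loop.
    void update_all(std::vector<Particle>& ps, float dt) {
        for (auto& p : ps) p.update(dt);
    }

    // Pointer case: each particle is a separate heap allocation, so every
    // iteration may jump to an unrelated address -- the cache-miss pattern
    // described in the article.
    void update_all(std::vector<std::unique_ptr<Particle>>& ps, float dt) {
        for (auto& p : ps) p->update(dt);
    }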

Fast Polymorphic Collections -- Joaquín M López Muñoz

On the theme of "contiguous enables fast":

Fast Polymorphic Collections

by Joaquín M López Muñoz

From the article:

poly_collection behaves excellently and is virtually not affected by the size of the container. For n < 10^5, the differences in performance between poly_collection and a std::vector of std::unique_ptrs are due to worse virtual call branch prediction in the latter case; when n > 10^5, massive cache misses are added to the first degrading factor.
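The storage idea can be sketched roughly as below. This is not the author's poly_collection, only a minimal illustration of keeping each concrete type in its own contiguous segment so that same-type elements are visited back to back, which is what helps branch prediction and the cache.

    #include <vector>

    struct Shape  { virtual ~Shape() = default; virtual double area() const = 0; };
    struct Circle : Shape { double r = 1; double area() const override { return 3.14159 * r * r; } };
    struct Square : Shape { double s = 1; double area() const override { return s * s; } };

    // One contiguous segment per concrete type, instead of a single
    // std::vector<std::unique_ptr<Shape>>. Iteration walks each segment in
    // order, so all Circles are processed together, then all Squares.
    struct TwoTypeCollection {
        std::vector<Circle> circles;
        std::vector<Square> squares;

        template <class F>
        void for_each(F f) const {
            for (const Shape& s : circles) f(s);
            for (const Shape& s : squares) f(s);
        }
    };

    double total_area(const TwoTypeCollection& c) {
        double total = 0;
        c.for_each([&](const Shape& s) { total += s.area(); });
        return total;
    }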

Parsing XML at the Speed of Light -- Arseny Kapoulkine

Some high-performance techniques that you can use for more than just parsing, including this week's darling of memory management:

Parsing XML at the Speed of Light

a chapter from "The Performance of Open Source Applications"
by Arseny Kapoulkine

From the chapter:

This chapter describes various performance tricks that allowed the author to write a very high-performing parser in C++: pugixml. While the techniques were used for an XML parser, most of them can be applied to parsers of other formats or even unrelated software (e.g., memory management algorithms are widely applicable beyond parsers). ...

Optimizing software is hard. In order to be successful, optimization efforts almost always involve a combination of low-level micro-optimizations, high-level performance-oriented design decisions, careful algorithm selection and tuning, balancing among memory, performance, implementation complexity, and more. Pugixml is an example of a library that needs all of these approaches to deliver a very fast production-ready XML parser, even though compromises had to be made to achieve this. A lot of the implementation details can be adapted to different projects and tasks, be it another parsing library or something else entirely.
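One of the techniques the chapter covers is custom memory management. As a rough illustration of the general idea (my own simplified sketch, not pugixml's actual allocator), a bump arena hands out small allocations from large pages and releases everything at once:

    #include <cstddef>
    #include <memory>
    #include <vector>

    // Minimal bump-allocator sketch: carve allocations out of large pages and
    // free them all together when the arena is destroyed. Parsers often use a
    // scheme of this general kind to avoid per-node malloc/free overhead.
    class Arena {
        static constexpr std::size_t kPageSize = 64 * 1024;
        std::vector<std::unique_ptr<char[]>> pages_;
        char*       cur_  = nullptr;
        std::size_t left_ = 0;
    public:
        void* allocate(std::size_t n) {
            // round up so every returned pointer stays suitably aligned
            n = (n + alignof(std::max_align_t) - 1) & ~(alignof(std::max_align_t) - 1);
            if (n > left_) {                       // current page exhausted
                std::size_t page = n > kPageSize ? n : kPageSize;
                pages_.push_back(std::make_unique<char[]>(page));
                cur_  = pages_.back().get();
                left_ = page;
            }
            void* p = cur_;
            cur_  += n;
            left_ -= n;
            return p;
        }
        // No per-allocation free: all pages are released when the Arena dies.
    };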

Continue reading...