CppCon Program Highlights, 14 of N: Parallel Computation on GPUs

The CppCon 2014 conference program has been posted for the upcoming September conference. We've received requests that the program continue to be posted in "bite-sized" posts, a few sessions at a time, to make the 100+ sessions easier to absorb, so here is another set of talks. This series of posts will conclude once the entire conference program has been posted in this way.


In addition to the other performance-focused CppCon talks posted yesterday, CppCon 2014 also has thorough coverage of one very specific form of high-performance parallel code: using GPUs for general-purpose computation (aka GPGPU). This matters because every desktop machine and notebook, and nearly every tablet and smartphone, contains not only multiple CPU cores and vector units but also a "compute-class" GPU. These three forms of hardware parallelism will be part of the mainstream hardware platform, in all form factors, for the foreseeable future. If you have a computationally intensive app, or an app that could benefit from faster local processing, and you're not exploiting the GPU, you're leaving performance (and battery life) on the table and should be sure to attend these sessions.

In this post:

  • Writing Data Parallel Algorithms on GPUs
  • Another fundamental shift in Parallelism Paradigm? OpenMP 4.0 for GPU/Accelerators and other things
  • Introduction to C++ AMP (GPGPU Computing)


Writing Data Parallel Algorithms on GPUs

Today most PCs, tablets and phones have multi-core processors, and most programmers have some familiarity with writing (task) parallel code. Many of those same devices also have GPUs, but writing code to run on a GPU is harder. Or is it?

Getting to grips with GPU programming is really about understanding things in a data parallel way. This talk will look at some of the common patterns for implementing algorithms on today's GPUs, using examples from the C++ AMP Algorithms Library. Along the way it will cover some of the unique aspects of writing code for GPUs and contrast them with more conventional code running on a CPU.

Speaker: Ade Miller. Ade Miller writes C++ for fun. He wrote his first N-body model in BASIC on an 8-bit microcomputer 30 years ago and never really looked back. Recently, he's written two books on parallel programming with C++: "C++ AMP: Accelerated Massive Parallelism with Microsoft Visual C++" and "Parallel Programming with Microsoft Visual C++". Ade spends the long winters in Washington contributing to the open source C++ AMP Algorithms Library as well as a few other projects. His summers are mostly spent crashing expensive bicycles into trees.


Another fundamental shift in Parallelism Paradigm? OpenMP 4.0 for GPU/Accelerators and other things

Another fundamental shift in Parallelism Paradigm? Sure. When was the last time you heard that before?

But seriously, as the number of threads/cores continues to increase, there is growing pressure on applications to exploit more of the available parallelism in their code, including coarse-, medium-, and fine-grain parallelism. OpenMP has been one of the dominant shared-memory programming models, but it is evolving beyond that with a new Mission Statement (no, really!) that makes it well suited for exploiting medium- and fine-grained parallelism.

OpenMP 4.0 has many of these features, supporting the next step in consumer, high-performance and exascale computing. It is one of the world's first programming models with high-level language support for GPUs/accelerators and vector SIMD across not one but three high-level languages: C++, C, and that language whose name we dare not speak, but starts with F.

Speaker: Michael Wong, OpenMP CEO/Architect, IBM/OpenMP. Interests: anything, including C++, Transactional Memory, Parallel Programming, OpenMP, stars, tennis, travel, and the best food.


Introduction to C++ AMP (GPGPU Computing)

Meet C++ AMP (Accelerated Massive Parallelism), an abstraction layer on top of accelerators such as GPUs. In its current version, it allows you to run code on any DX11 GPU, independent of the vendor, and it will even distribute workload across GPUs from different vendors simultaneously. C++ AMP was originally designed by Microsoft but is now an open standard. By using the GPU to perform mathematical calculations, C++ AMP can deliver performance increases of orders of magnitude for certain algorithms. This talk will give a high-level overview of what C++ AMP is and what it can do for you. It is time to start taking advantage of the computing power of GPUs!

Speaker: Marc Gregoire, Nikon Metrology. Marc Gregoire worked for six years as a software engineering consultant for Siemens and Nokia Siemens Networks on critical 2G and 3G software running on Solaris for telecom operators. This required working in international teams stretching from South America and the USA to EMEA and Asia. Now, Marc is working for Nikon Metrology on 3D scanning software. Marc is the author of "Professional C++", Second and Third Editions, published by Wiley/Wrox, is the founder of the Belgian C++ Users Group (www.becpp.org), and has written a number of articles which have been published on CodeGuru and/or his personal blog. He also creates freeware and shareware programs that are distributed through his website at www.nuonsoft.com, and maintains a blog at www.nuonsoft.com/blog/.


Comments (3)


derpyloves said on Aug 21, 2014 07:25 AM:

So, C++14 is here, and all the TSs are in flight leading up to C++17 (my favorite: https://isocpp.org/blog/2014/05/n4021).

So everyone's happy. But as Herb Sutter suggested in his AFDS keynote a while back (http://herbsutter.com/2011/06/16/c-amp-keynote/), if we don't get out of our strongly coupled and (mostly still) homogeneous little hardware box, our days of reveling in C++'s performance and scalability may soon be over.

Heterogeneous computing that reaches across all hardware platforms and scales up to the largest datasets and problems is already a very important area, and will only become more so. Up till now it has basically been the domain of a select few. HPC using commodity consumer GPUs is now an attainable reality, but we still have a long way to go to make it thoroughly usable. IMO, C++ can lead the way in this arena more effectively than any other language conceivably could. But we should all come together on a standard approach for this problem space. Microsoft with C++ AMP, Intel with Shevlin Park (LLVM/Clang & OpenCL), and the GPU vendors (and others like OpenMP and OpenACC, of course) have all made very good progress already, but none of it is ISO standard, and we need some strong leadership in this area so we can all "row together as a team". It will certainly benefit the industry as a whole to have a standard way to approach this area. C++ AMP seems a rather elegant approach to me (as Herb eloquently demonstrated in his talk). Thrust is also a very nice STL-like approach: https://thrust.github.io/ .

I hope that the committee will take up the mantle and bring out a solid solution to this problem, and it would be wonderful if it were available by C++17!

Blog Staff said on Aug 22, 2014 10:55 AM:

Note that the C++ Parallel STL proposal, now in its primary comment ballot, is exactly what you're asking for: a merged proposal from Microsoft, NVIDIA, and Intel that draws heavily on C++ AMP, PPL, TBB, and Thrust in particular. It is covered in Artur Laksberg's CppCon talk and will no doubt also be mentioned in other sessions.

derpyloves said on Aug 22, 2014 04:38 PM:

@Blog Staff: That's great to hear! I didn't realize the Parallelism TS included heterogeneous (read: GPU) processing. I'll look forward to Artur Laksberg's talk.

I've found N3960: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2014/n3960.pdf
Is this the correct document?

Thank you.