New paper: N3554, A Parallel Algorithms Library -- NVIDIA, Microsoft, Intel

A new WG21 paper is available. A copy is linked below, and the paper will also appear in the next normal WG21 mailing. If you are not a committee member, please use the comments section below or the std-proposals forum for public discussion.

Context: In Kona, Redmond, and Portland, Study Group 1 (SG1, Concurrency) discussed language- and library-based ways to add parallelization and vectorization to Standard C++. For Bristol, three participating companies -- NVIDIA, Microsoft, and Intel -- were asked to come up with a combined proposal for a parallel algorithms library as one contribution. This is their first combined paper.

Document number: N3554

Date: 2013-03-15

A Parallel Algorithms Library

by Jared Hoberock et al. (NVIDIA, Microsoft, Intel)

Excerpt:

1 Executive Summary

We introduce a library of algorithms with parallel execution semantics. Some of these algorithms have semantics similar to the existing standard library of sequential algorithms. Some of the algorithms we propose are novel.

We introduce three parallel execution policies for parallel algorithm execution: std::seq, std::par, and std::vec, as well as a facility for vendors to provide non-standard execution policies as extensions. These policy objects may be used to specify how a parallel algorithm should be executed:

std::vector<int> vec = ...

// legacy sequential sort
std::sort(vec.begin(), vec.end());

// explicit sequential sort
std::sort(std::seq, vec.begin(), vec.end());

// parallel sort
std::sort(std::par, vec.begin(), vec.end());

// vectorized sort
std::sort(std::vec, vec.begin(), vec.end());

// sort with dynamically-selected execution
size_t threshold = ...
std::execution_policy exec = std::seq;
if(vec.size() > threshold)
{
  exec = std::par;
}

std::sort(exec, vec.begin(), vec.end());

// parallel sort with non-standard implementation-provided execution policies:
std::sort(vectorize_in_this_thread, vec.begin(), vec.end());
std::sort(submit_to_my_thread_pool, vec.begin(), vec.end());
std::sort(execute_on_that_gpu, vec.begin(), vec.end());
std::sort(offload_to_my_fpga, vec.begin(), vec.end());
std::sort(send_this_computation_to_the_cloud, vec.begin(), vec.end());

Algorithms invoked with std::seq execute internally in sequential order in the calling thread.

Algorithms invoked with std::par are permitted to execute internally in an unordered fashion in unspecified threads. It is the caller’s responsibility to ensure that the invocation does not introduce data races or deadlocks.

Algorithms invoked with std::vec are permitted to execute internally in an unordered fashion in unspecified threads. In addition to the restrictions implied by std::par, it is the caller’s responsibility to ensure that a std::vec invocation does not throw exceptions or attempt to perform synchronization operations.

Algorithms invoked without an execution policy execute as if they were invoked with std::seq.

An implementation may provide additional execution policies besides std::seq, std::par, or std::vec.

This proposal is a pure addition to the existing C++ standard library; we do not believe it alters the semantics of any existing functionality.

2 Design Notes and Outstanding Questions

Before introducing the detailed specification of our proposal, we begin by outlining our rationale and by noting that we have identified a number of outstanding questions which require further work to resolve. ...

Add a Comment

Comments are closed.

Comments (6)

5 0

matt said on Mar 15, 2013 04:41 PM:

Perhaps a stupid question, but why not
std::sequential
,
std::parallel
, or
std::vectorized
instead of
std::seq
,
std::par
, or
std::vec
?

std::sort(std::vec, vec.begin(), vec.end());
looks a bit... funny.
1 0

bkuhns said on Mar 15, 2013 07:54 PM:

Seems odd to put the tag as the first parameter. It seems to put emphasis on the mode of the algorithm and not on the subject. For example,
std::sort(begin(v), end(v), std::vec)
vs the proposed
std::sort(std::vec, begin(v), end(v))
. The former is more clear about **what** is being sorted rather than **how**.
2 0

mmocny said on Mar 15, 2013 09:05 PM:

Am I the only one who does not like the flag-as-first-parameter notation?
In particular, when/if we get ranges, I would really like to have the target range be the first argument.
(aside from personal preference, this is also useful if you hold out hope that C++ one day gets extension methods / globals can be called with the dot syntax).
0 0

ArashPartow said on Mar 16, 2013 11:48 AM:

"in unspecified threads" - that is asinine. I would want complete control over how many threads are used and the priorities and affinities that those threads for that particular parallel process will have, furthermore I want complete control over the type of thread, as I can envision wanting to perform certain computations exclusively on GPU(s) or other non-cpu compute mechanisms.

I don't think anyone with a sane and reasonable view on things would ever trust a "concrt"-like layer to make the best decision for such parallel computing tasks.

Also the syntax really stinks and is confusing, and I'm not saying this because it may seem foreign as I've seen similar attempts before.

The syntax should be explicit about what is happening, like tbb primitives. Please do some real work in real environments and gain some real experience instead of pie-in-the-sky prototyping around product marketing schedules.
0 0

Paul Jurczak said on Mar 18, 2013 11:17 PM:

I share the opinion about unfortunate placement of execution policy as the first argument. Why not make it the last one? Declare std::seq as a default value, which would make new std::sort compatible with existing code.
0 0

andrew.corrigan said on Apr 25, 2013 04:12 AM:

IMHO, this is the single most crucial proposal needed to ensure that C++ is the best programming language for implementing efficient parallel algorithms for scientific computing. I am strongly in favor of how this library is based so heavily of existing algorithms in the standard C++ library and only extends those algorithms when absolutely necessary. I think that the example std::for_each(std::par, vec.begin(), vec.end(), [](int &x){ x += 13; }); makes it clear that execution policy ought to be the first parameter.

I am pleasantly surprised to see NVIDIA + Microsoft + Intel collaborating on this proposal. AMD, where are you? Bolt looks great, but please do contribute to designing this library.

If this is going to be standardized, we need prototype implementations available to apply in real applications to flesh out practical issues:

Microsoft: I'd really love to see support for a prototype version of this library, built on top of C++ AMP. Why not add a C++ AMP backend to Thrust? Can you also work to reduce some of the restrictions of restrict(amp) outlined here: http://stackoverflow.com/questions/9674019/is-restrictamp-more-restrictive-than-cuda-kernel-code . These restrictions inhibit the implementation of significant scientific code in C++AMP.

Intel: Please implement a prototype version for your compilers and architectures (Intel64, MIC). NVIDIA already supports OpenMP and TBB in Thrust, but there is a lot of work to be done in terms of properly inlining Thrust code and achieving vectorization.