1. Motivation
The [P1928R3] proposal outlines a number of constructors and accessors which
can be used to move data in and out of a simd object. However, there are a
number of other data types in the standard C++ libraries which provide a form
of data-parallel value storage, such as std::bitset
and types which implement
the
concept such as
. It is desirable
to be able to easily convert those to and from simd values too. In this
paper we examine the benefits of providing constructors for building simd or
simd_mask values from other types, and of adding accessors which allow them
to be converted back into other types from their equivalent SIMD values. Some
of these proposals were briefly outlined in [P2638R0], and some are new.
2. Bitset
std::bitset has many characteristics in common with a simd_mask value, and it
is useful to be able to use a bitset value as a mask and vice versa. There is
no way for a programmer to perform these interchange conversions both concisely
and efficiently, so we propose that a constructor and an accessor are provided
in basic_simd_mask. Firstly, a constructor can be provided for a bitset:
constexpr basic_simd_mask(const bitset<size()>& b) noexcept;
This constructor allows a bitset with the same number of elements as a SIMD value to be used to build a
simd_mask for that value. Each element in the constructed mask has the same boolean value as the
respective element from the incoming bitset. It is not marked as explicit, so a bitset
can be conveniently used anywhere that a simd_mask could be.
A simd_mask can be converted into an equivalent bitset using a conversion operator:
constexpr basic_simd_mask::operator bitset<size()>() const;
or alternatively through a named method which makes it explicitly clear that the conversion is happening:
constexpr bitset<size()> basic_simd_mask::to_bitset() const;
The output bitset value will have the same size as the simd_mask, and every
element in the bitset will have the same value as its respective simd_mask
element.
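The element-wise correspondence described above can be sketched without any SIMD library. The following is a minimal illustration only: mask_model, mask_from_bitset and mask_to_bitset are hypothetical names introduced here, and a std::array<bool, N> is assumed as a stand-in for a real mask. It shows the semantics of the two conversions, not how an implementation would perform them.

```cpp
#include <array>
#include <bitset>
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a SIMD mask: one bool per element.
template <std::size_t N>
using mask_model = std::array<bool, N>;

// Models the proposed bitset constructor: element i takes the value of b[i].
template <std::size_t N>
mask_model<N> mask_from_bitset(const std::bitset<N>& b) {
    mask_model<N> m{};
    for (std::size_t i = 0; i < N; ++i) {
        m[i] = b[i];
    }
    return m;
}

// Models the proposed conversion back: bit i takes the value of element i.
template <std::size_t N>
std::bitset<N> mask_to_bitset(const mask_model<N>& m) {
    std::bitset<N> b;
    for (std::size_t i = 0; i < N; ++i) {
        b[i] = m[i];
    }
    return b;
}
```

Converting a bitset to the mask model and back yields the original bitset, which is the round-trip property the proposed constructor and accessor are intended to provide.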
2.1. Implementation experience
When working with ABIs which already have compact bit representations (e.g.,
AVX-512 predicate registers), converting to and from bitset is efficient.
Conversion to and from wide-element representations of masks (e.g., SSE or AVX)
is more expensive, but the std::simd library implementation is able to exploit
its internal implementation details to make the conversion more efficient than
anything that the user could do using the public std::simd API.
3. Conversion to and from integral value bit representations
There are several ways that
-like values could be stored, ranging from
wide-element values (e.g., SSE, AVX), compact mask (e.g., ARM or AVX-512
predicates), bitsets, or byte-valued memory regions (e.g.,
). The
API already has the ability
to convert wide elements to and from simd_masks using, for example, the
. However, it can be useful to be able to convert to and from
compact masks represented using raw bits stored in an integral value (something
which std::bitset also supports explicitly). In this section we propose
ways of constructing and accessing packed bits stored in integral values.
3.1. Building a mask from a compact bit representation
When working with SIMD values of fixed sizes (rather than native types whose
size can vary by target) it can be useful to express a mask pattern directly.
For example, suppose a programmer requires a custom bit pattern, such as
. There is currently no easy or direct way to encode
that pattern into a simd_mask value.
We propose that the following constructor could be provided:
constexpr basic_simd_mask(unsigned_integral auto bits) noexcept;
The simd_mask is constructed such that the first (rightmost, least significant)
M bit positions are set to the corresponding bit values of bits, where M is the
smaller of N (the size of the mask) and the number of bits in the value
representation of bits. If M is less than the size of the mask then the
remaining bit positions are initialized to zero.
The issue with using a
as the container for the
input is that it might contain too many bits for a small mask, or too few bits
for a big mask. Without a way of representing an arbitrary number of bits in
an integral value (e.g., C23's bit-precise _BitInt) we are limited to
allowing at most 64 bits to be inserted into a simd_mask, analogous to the
limit that applies to a bitset.
3.2. Compact bit accessor
There is currently no easy way for mask bits to be extracted from a simd_mask
in a compact form. Neither of the existing extraction methods is efficient
when used for this. For example, a mask's values can be extracted as bool-like
values by converting it to a simd containing 8-bit elements.
auto t = simd_mask<uint8_t, 8>(m);
auto asByteSimd = +t; // simd of bool-like objects
In this example the mask contents are converted to elements which are either 0 or 1 according to their respective mask bit. Even in this form there is still no easy step to convert to a compact bit representation.
To make it possible to extract compacted bits as an unsigned integral value we
propose to borrow an idea from std::bitset and provide:
constexpr unsigned long long basic_simd_mask::to_ullong() const noexcept;
This will copy up to 64 mask elements to an output value, storing each mask
element in a single bit. Unfortunately this potentially loses bits, since
to_ullong will only emit those bits which can be contained in a 64-bit
representation. As with the unsigned_integral constructor, this could be solved
if the C23 bit-precise _BitInt type were available in C++.
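The packing behaviour described above, including the loss of elements beyond the width of the result, can be sketched as follows. This is an illustration under assumptions: mask_to_ullong is a hypothetical free function and a std::array<bool, N> stands in for the mask. It models the behaviour described in this section rather than any particular implementation.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <limits>

// Hypothetical model of the proposed accessor: element i becomes bit i of the
// result. Elements that do not fit in unsigned long long are not represented.
template <std::size_t N>
constexpr unsigned long long mask_to_ullong(const std::array<bool, N>& m) {
    constexpr std::size_t width = std::numeric_limits<unsigned long long>::digits;
    constexpr std::size_t count = std::min(N, width);
    unsigned long long out = 0;
    for (std::size_t i = 0; i < count; ++i) {
        if (m[i]) {
            out |= 1ull << i;
        }
    }
    return out;
}
```

For a mask with more elements than unsigned long long has bits, the high elements are simply absent from the result, which is the limitation that a bit-precise integer type would remove.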
3.3. Implementation experience
In some problem domains, such as telecommunications, compact representations of bits as integers are very common and it is very important to be able to efficiently convert to and from this format. Providing an API to make it easy to convert masks to and from this representation proved invaluable in writing concise meaningful code in this problem domain.
On a machine with compact mask representation (e.g., we tested on AVX-512) the masks are already stored in compact form, so converting to and from an integer representation was trivial.
On a machine with wide mask representation (e.g., SSE, AVX, AVX2) it is neither
easy nor efficient to produce a compact representation if only the standard
std::simd API can be used (e.g., converting to byte memory and then from that
to individual bits). Efficient conversion is only possible if target-specific
instructions are used, which requires the programmer to write non-portable
code. For these targets it was better that a uniform API into a compact format
was provided and handled efficiently by std::simd itself.
Constructing and accessing bits through the bitset pathway (e.g.,
) proved also to be inefficient under some
conditions. Even on targets which already had compact mask representation, the
extra step in storing data temporarily in a bitset added to the overhead of the
conversion. The extra step was difficult to remove because data would be moved
in/out of a bitmask in 64-bit blocks, and in/out of a simd_mask in blocks whose
sizes were governed by the operation in progress. This mismatch in sizes made it
difficult to smooth out the data flow across the conversion.
4. Initialising from a list
In earlier revisions of this proposal an initialisation-list-like mechanism was described, to enable something like the following:
simd<float, 4> myLut = {2.f, 9.f, 23.f, 45.f};
Since the earlier revisions, papers [P3299R2] and [P1928R15] have added CTAD and support for range/span constructors. These can be used to achieve an effect similar to that of the previous example:
basic_simd myLut = std::array{2.f, 9.f, 23.f, 45.f}; // simd<float, 4>
Although slightly more verbose, this satisfies our original requirements
without needing additional features in the std::simd proposal, so we have
removed the initialiser-list mechanism.
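The array-based form relies on class template argument deduction for std::array, which fixes both the element type and the element count from the braced values. The following sketch checks only that part and makes no use of a SIMD library, since the simd side depends on an implementation of [P1928R15] being available. The name lut is introduced here purely for illustration.

```cpp
#include <array>
#include <cassert>
#include <type_traits>

// CTAD deduces std::array<float, 4> from four float initialisers.
inline constexpr auto lut = std::array{2.f, 9.f, 23.f, 45.f};

static_assert(
    std::is_same_v<std::remove_cv_t<decltype(lut)>, std::array<float, 4>>,
    "element type and count are both deduced from the initialisers");
```

Because the array type already carries the element type and a compile-time size, a constructor taking it has all the information needed to deduce a matching simd type.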
5. Wording
The following wording is a diff against the current draft standard.
5.1. Modify [simd.mask.overview]
In the header
overview - [simd.mask.overview] - add at the end of the existing list of constructors.
// [simd.mask.ctor], basic_simd_mask constructors
constexpr explicit basic_simd_mask(value_type) noexcept;
template<size_t UBytes, class UAbi>
  constexpr explicit basic_simd_mask(const basic_simd_mask<UBytes, UAbi>&) noexcept;
template<class G> constexpr explicit basic_simd_mask(G&& gen) noexcept;
constexpr basic_simd_mask(const bitset<size()>& b) noexcept;
constexpr basic_simd_mask(unsigned_integral auto bits) noexcept;
Add after the conversion operators:
// [simd.mask.conv], basic_simd_mask conversion operators
template<class U, class A>
  constexpr explicit(sizeof(U) != Bytes) operator basic_simd<U, A>() const noexcept;
constexpr operator bitset<size()>() const noexcept;
// [simd.mask.namedconv], basic_simd_mask named type convertors
constexpr bitset<size()> to_bitset() const noexcept;
constexpr unsigned long long to_ullong() const noexcept;
5.2. Modify [simd.mask.ctor]
At the end of the [simd.mask.ctor] section add two new constructors.
constexpr basic_simd_mask(const bitset<size()>& b) noexcept;
Effects: Initializes the i-th element with b[i] for all i in the range [0, size()).
constexpr basic_simd_mask(unsigned_integral auto val) noexcept;
Effects: Initializes the first M elements to the corresponding bit values in val, where M is the smaller of size() and the number of bits in the value representation ([basic.types.general]) of the type of val. If M < size(), the remaining bit positions are initialized to zero.
5.3. Modify [simd.mask.conv]
At the end of the [simd.mask.conv] section add one new conversion operator.
constexpr operator bitset<size()>() const noexcept;
Returns: A bitset<size()> object where the i-th element is initialized to operator[](i) for all i in the range [0, size()).
5.4. Modify [simd.mask.namedconv]
After the [simd.mask.conv] add a new section called "basic_simd_mask named conversion operators".
basic_simd_mask named conversion operators [simd.mask.namedconv]
constexpr bitset<size()> to_bitset() const noexcept;
Effects: Equivalent to: return operator bitset<size()>();
constexpr unsigned long long to_ullong() const noexcept;
Let N be the number of bits in unsigned long long.
Preconditions: size() <= N, or for all i in the range [N, size()), operator[](i) is false.
Returns: The integral value corresponding to the bits in *this.
6. Revision History
R1 => R2
- Removed initialiser list from the proposal.
- Added wording.
R0 => R1
- Moved range constructor into its own paper [P3299R0].
- Updated discussion of initialiser-list-like constructor.