std::iterator_interface
Document #: | P2727R0 |
Date: | 2022-11-20 |
Project: | Programming Language C++ |
Audience: |
LEWG-I LEWG |
Reply-to: |
Zach Laine <[email protected]> |
Writing STL iterators is surprisingly hard. There are a lot of things that can subtly go wrong. It is also very tedious, which of course makes it error-prone.
Iterators have numerous typedefs and operations, even though all the operations of a given iterator can be implemented in terms of at most four operations (and usually only three). Writing all the other operations yields very similar-looking code that is hard to review, and all but requires that you write full-coverage tests for each iterator.
As an example, say you wanted an iterator that allowed you to iterate over repetitions of a sequence of characters, like:
("foo", 3, 0); // 3 is the length of "foo", 0 is this iterator's position.
repeated_chars_iterator first("foo", 3, 7); // Same as above, but now the iterator's position is 7.
repeated_chars_iterator last::string result;
std::copy(first, last, std::back_inserter(result));
stdassert(result == "foofoof");
Here’s how you might implement it, with and without this proposal:
Before
|
After
|
---|---|
|
|
I used that motivating example in the README file for the Boost library this proposal is based on. One day, someone pointed out that there was a bug in the “before” version of the iterator. The iterator was used extensively in yet another proposed Boost library of mine, and yet this bug was never actually exercised that I know of. This underscores an important point: with very large APIs like the std iterators, one is very likely to use lots of copy pasta, and one is very unlikely to fully test each element of the API.
I’ve reintroduced the bug here for funsies; see how long it takes you to find it by inspection.
Some examples of useful iterators you may want to write:
Because writing iterators is so much work at the moment, most of use avoid it whenever possible. So ad hoc use cases for iterators almost always go unfulfilled. Here’s an example of one such ad hoc iterator use case.
Say you have two or more sequences that you want to treat as a single sequence:
::vector<int> a;
std::list<int> b; std
You’d use something like the
views::concat
proposed in [P2542R2]:
for (auto && x : concat(a, b)) {
// ...
}
That works well, as long as there exists a common reference type
among the iterators used within the
views::concat
implementation.
What if there isn’t? What if the
value_type
s of the various
ranges you pass to concat()
are
essentially compatible, but nevertheless have no common reference
type?
For example, what if I want to concatenate two ranges, one of which
is a range of int
and the other
of which is a range of:
struct my_int
{
() = default;
my_intexplicit my_int(int v) : value_{v} {}
int value_;
};
Clearly, std::common_reference_with<my_int, int>
is false
. However, if I also had
some simple way of projecting from either one to
int
, like this:
int as_int(int i) { return i; }
int as_int(my_int i) { return i.value_; }
I would still expect to be able to construct some view over the
concatenation of a
and
b
– even if it were
read-only.
This example may seem contrived, but it is not. While implementing an
algorithm to support Unicode operations on strings, I ran in to a case
in which a certain step S
of the
algorithm required iterating over two containers
A a
and
B b
, where
A::iterator::reference
and
B::iterator::reference
were
different enough that there was no common reference, but
A::value_type
and
B::value_type
were similar
enough that I could just call a function (analogous to
as_int()
above) that could
project both of them to a single type. To make matters worse, I could
not even break S
up into the
a
-part and the
b
-part, because
S
could not consier elements in
isolation; it required examining adjacent elements in the concatenation
of a
and
b
.
None of the views proposed for the standard has ever supported
projections. I don’t consider this a defect; it’s probably the wrong
kind of customization for a general-purpose view. So, what to do? Write
an iterator that fits this one-off, special case, of course! Using Boost.STLInterfaces,
an implementation of everything in this proposal and more, I wrote
something like this iterator. To keep all the Unicode-y bits out of our
way, I’m showing the
int
/my_int
analogous solution.
template<typename T, typename Proj, typename R1, typename R2>
struct concat_iter : boost::stl_interfaces::proxy_iterator_interface<
<T, Proj, R1, R2>,
concat_iter::bidirectional_iterator_tag,
std>
T{
using first_iterator = decltype(std::declval<R1 &>().begin());
using second_iterator = decltype(std::declval<R2 &>().begin());
struct end_tag{};
() : in_r1_(false) {}
concat_iter(first_iterator it,
concat_iter
first_iterator r1_last,
second_iterator r2_first,) :
Proj proj(r1_last),
r1_last_(it),
it1_(r2_first),
r2_first_(r2_first),
it2_(proj),
proj_(true)
in_r1_{
if (it1_ == r1_last_)
= false;
in_r1_ }
(second_iterator it,
concat_iter
first_iterator r1_last,
second_iterator r2_first,
Proj proj,) :
end_tag(r1_last),
r1_last_(r1_last),
it1_(r2_first),
r2_first_(it),
it2_(proj),
proj_(false)
in_r1_{}
& operator++() noexcept
concat_iter {
if (in_r1_) {
assert(it1_ != r1_last_);
++it1_;
if (it1_ == r1_last_)
= false;
in_r1_ } else {
++it2_;
}
return *this;
}
& operator--() noexcept
concat_iter {
if (!in_r1_) {
if (it2_ == r2_first_) {
= true;
in_r1_ --it1_;
} else {
--it2_;
}
} else {
--it1_;
}
return *this;
}
operator*() const noexcept
T {
return in_r1_ ? proj_(*it1_) : proj_(*it2_);
}
friend bool operator==(concat_iter lhs, concat_iter rhs)
{
return lhs.in_r1_ == rhs.in_r1_ &&
(lhs.in_r1_ ?
.it1_ == rhs.it1_ : lhs.it2_ == rhs.it2_);
lhs}
using base_type =
::stl_interfaces::proxy_iterator_interface<
boost<T, Proj, R1, R2>,
concat_iter::bidirectional_iterator_tag,
std>;
Tusing base_type::operator++;
using base_type::operator--;
private:
first_iterator r1_last_;
first_iterator it1_;
second_iterator r2_first_;
second_iterator it2_;
Proj proj_;bool in_r1_;
};
That’s a pleasantly low amount of code to write, and I was able to avoid implementing most of the boilerplate bits – the operations that need to exist for my type to be conforming bidirectional iterator, but I which I will not explicitly use.
In case you were wondering about the
T
template parameter, that just
gets around the need for some potentially-messy metaprogramming to
determine what concat_iter
’s
value_type
is – it’s whatever
you specify for T
.
With the iterator in hand, it becomes trivial to construct a
concat_view
, and a
concat()
function:
template<typename T, typename R1, typename R2>
struct concat_view
: std::view_interface<concat_view<T, R1, R2>>
{
using iterator = concat_iter<T, R1, R2>;
(iterator first, iterator last) :
concat_view(first), last_(last)
first_{}
() const noexcept { return first_; }
iterator begin() const noexcept { return last_; }
iterator end
private:
iterator first_;
iterator last_;};
template<typename T, typename R1, typename R2, typename Proj>
auto concat(R1 & r1, R2 & r2, Proj proj)
{
using iterator = concat_iter<T, Proj, R1, R2>;
return concat_view<T, R1, R2>(
(r1.begin(), r1.end(), r2.begin(), proj),
iterator(r2.end(), r1.end(), r2.begin(), proj,
iterator::end_tag{}));
iterator}
Finally, here it is at the point of use:
::vector<int> vec = { 0, 58, 48 };
std::list<my_int> list = { 80, 39 };
stdauto const v = concat<int>(vec, list, [](auto x) { return as_int(x); });
assert(std::ranges::find(v, 42) == v.end());
assert(std::ranges::find(v, 58) == std::next(v.begin()));
Some things to note here:
concat_iter::operator++()
intermixed with the algorithm code.When opportunities like this come up – that is, opportunities to write custom iterators – the dominant consideration is what it costs to write an iterator. If the cost is too high, we don’t write one, and the algorithm that we would have written in terms of the unwritten iterator becomes much more complicated (and thus error-prone). If the cost is low, we will start to notice opportunities to write custom iterators in many more places. One thing in particular that easy-to-write iterators enable is easy-to-write views and view adaptors.
Sometimes, you want to take an existing iterator and adapt it for a different use. Since you already have all the iterator operations defined on it, it would be useful to simply use those, and fill in the missing ones, or perhaps replace the ones that you want to work differently.
For example, if we wanted to make a filtering iterator for something
like filter_view
, we could write
it like this:
template<typename Pred>
struct filtered_int_iterator : boost::stl_interfaces::iterator_interface<
<Pred>,
filtered_int_iterator::forward_iterator_tag,
stdint>
{
() : it_(nullptr) {}
filtered_int_iterator(int * it, int * last, Pred pred) :
filtered_int_iterator(it),
it_(last),
last_(std::move(pred))
pred_{ it_ = std::find_if(it_, last_, pred_); }
// A forward iterator based on iterator_interface usually requires
// three user-defined operations. since we are adapting an existing
// iterator (an int *), we only need to define this one. The others are
// implemented by iterator_interface, using the underlying int *.
& operator++()
filtered_int_iterator {
= std::find_if(std::next(it_), last_, pred_);
it_ return *this;
}
// It is really common for iterator adaptors to have a base() member
// function that returns the adapted iterator.
int * base() const { return it_; }
private:
// Provide access to base_reference.
friend boost::stl_interfaces::access;
// These functions are picked up by iterator_interface, and used to
// implement any operations that you don't define above. They're not
// called base() so that they do not collide with the base() member above.
constexpr decltype(auto) base_reference(this auto& self) noexcept
{ return self.it_; }
int * it_;
int * last_;
Pred pred_;};
Though this implementation is limited to
int*
, it’s pretty easy to see
that you could change it to take an
Iterator
template parameter
instead, and after replacing
int*
with
Iterator
throughout, everything
would “just work”.
The proposal is to add a CRTP base template that will ease writing
iterators, in the same way that
std::view_interface
eases the
writing of views today.
iterator_interface_access
Before looking at the main
iterator_interface
template, we
neeed to look at some of the bits that allow that template to function.
First,
iterator_interface_access
.
iterator_interface_access
is
used by the adaptation logic to get access to the underlying iterator
being adapted, which is usually private. The user just needs to friend
iterator_interface_access
from
the iterator they write (just like the friend declaration in the
filtered_int_iterator
example
above).
struct iterator_interface_access
{
template<typename D>
static constexpr auto base(D& d) noexcept
-> decltype(d.base_reference())
{
return d.base_reference();
}
template<typename D>
static constexpr auto base(const D& d) noexcept
-> decltype(d.base_reference())
{
return d.base_reference();
}
};
proxy_arrow_result
Next, proxy_arrow_result
, a
specialization of which is used as the default
pointer
for proxy iterators
created using the
proxy_iterator_interface
alias
template (shown a bit later).
template<typename T>
requires is_object_v<T>
struct proxy_arrow_result
{
constexpr proxy_arrow_result(const T& value) noexcept(
noexcept(T(value))) :
(value)
value_{}
constexpr proxy_arrow_result(T&& value) noexcept(
noexcept(T(std::move(value)))) :
(std::move(value))
value_{}
constexpr const T* operator->() const noexcept { return &value_; }
constexpr T* operator->() noexcept { return &value_; }
private:
T value_;};
iterator_interface
itselftemplate<typename D, typename DifferenceType>
concept has-plus-eq = // exposition only
requires (D d) { d += DifferenceType(1); };
template<typename Pointer, typename T>
requires is_pointer_v<Pointer>
decltype(auto) make-iterator-pointer(T&& value) // exposition only
{ return addressof(value); }
template<typename Pointer, typename T>
auto make-iterator-pointer(T&& value) // exposition only
{ return Pointer(std::forward<T>(value)); }
template<
typename D,
typename IteratorConcept,
typename ValueType,
typename Reference = ValueType&,
typename Pointer = ValueType*,
typename DifferenceType = ptrdiff_t>
requires is_class_v<D> && same_as<D, remove_cv_t<D>>
class iterator_interface
{
private:
constexpr D& derived() noexcept { // exposition only
return static_cast<D&>(*this);
}
constexpr const D& derived() const noexcept { // exposition only
return static_cast<const D&>(*this);
}
public:
using iterator_concept = IteratorConcept;
using iterator_category = iterator_concept;
using value_type = remove_const_t<ValueType>;
using reference = Reference;
using pointer = conditional_t<
<iterator_concept, output_iterator_tag>>,
is_same_vvoid,
>;
Pointerusing difference_type = DifferenceType;
constexpr decltype(auto) operator*()
requires requires (D d) { *iterator_interface_access::base(d); } {
return *iterator_interface_access::base(derived());
}
constexpr decltype(auto) operator*() const
requires requires (const D d) { *iterator_interface_access::base(d); } {
return *iterator_interface_access::base(derived());
}
constexpr auto operator->()
requires requires (D d) { *d; } {
return make-iterator-pointer<pointer>(*derived());
}
constexpr auto operator->() const
requires requires (const D d) { *d; } {
return make-iterator-pointer<pointer>(*derived());
}
constexpr decltype(auto) operator[](difference_type n) const
requires requires (const D d) { d + n; } {
= derived();
D retval += n;
retval return *retval;
}
constexpr decltype(auto) operator++()
requires requires (D d) { ++iterator_interface_access::base(d); } &&
(!has-plus-eq<D, difference_type>) {
++iterator_interface_access::base(derived());
return derived();
}
constexpr decltype(auto) operator++()
requires requires (D d) { d += difference_type(1); } {
return derived() += difference_type(1);
}
constexpr auto operator++(int) requires requires (D d) { ++d; } {
= derived();
D retval ++derived();
return retval;
}
constexpr decltype(auto) operator+=(difference_type n)
requires requires (D d) { iterator_interface_access::base(d) += n; } {
::base(derived()) += n;
iterator_interface_accessreturn derived();
}
friend constexpr auto operator+(D it, difference_type n)
requires requires { it += n; } {
return it += n;
}
friend constexpr auto operator+(difference_type n, D it)
requires requires { it += n; } {
return it += n;
}
constexpr decltype(auto) operator--()
requires requires (D d) { --iterator_interface_access::base(d); } &&
(!has-plus-eq<D, difference_type>) {
--iterator_interface_access::base(derived());
return derived();
}
constexpr decltype(auto) operator--()
requires requires (D d) { d += -difference_type(1); } {
return derived() += -difference_type(1);
}
constexpr auto operator--(int) requires requires (D d) { --d; } {
= derived();
D retval --derived();
return retval;
}
constexpr decltype(auto) operator-=(difference_type n)
requires requires (D d) { d += -n; } {
return derived() += -n;
}
friend constexpr auto operator-(D lhs, D rhs)
requires requires { iterator_interface_access::base(lhs) -
::base(rhs); } {
iterator_interface_accessreturn iterator_interface_access::base(lhs) -
::base(rhs);
iterator_interface_access}
friend constexpr auto operator-(D it, difference_type n)
requires requires { it += -n; } {
return it += -n;
}
friend constexpr bool operator<(D lhs, D rhs)
requires equality_comparable<D> {
return (lhs - rhs) < typename D::difference_type(0);
}
friend constexpr bool operator<=(D lhs, D rhs)
requires equality_comparable<D> {
return (lhs - rhs) <= typename D::difference_type(0);
}
friend constexpr bool operator>(D lhs, D rhs)
requires equality_comparable<D> {
return (lhs - rhs) > typename D::difference_type(0);
}
friend constexpr bool operator>=(D lhs, D rhs)
requires equality_comparable<D> {
return (lhs - rhs) >= typename D::difference_type(0);
}
};
operator==
overloadAdditionally, we want free a
operator==
(and
compiler-provided operator!=
),
shown here. In addition to the constraints shown, they require that
D1
and
D2
are derived from
specializations of
iterator_interface
. This is so
the overload can pick up comparisons of iterators of different, but
interoperable types (like std::vector<int>::const_iterator
and
std::vector<int>::iterator
).
The full constraint is that one iterator is convertible to the other (in
either direction), and either: 1) their adapted bases are comparable; or
2) a D1
is subtractable from a
D1
.
I know those contraints look oddly specific; here’s why they are the way they are:
Every category of iterator besides output need to define
operator==
. For this reason,
each user-defined iterator derived from a specialization of
iterator_interface
will define
its own operator==
, unless it is
an output iterator, or unless it is a random access (or
contiguous) iterator. The exception for random access/contiguous is
because operator-
, which the
user must define for a random access/contiguous iterator, can be used to
implement operator==
– for two
iterators i1 and i2, just check if
i1 - i2 == 0
. This explains part
of the constraints; this free overload is the implementation for
operator==
for random
access/contiguous iterators.
For the base-equality-comparable part, similar logic applies: we
don’t want the user to have to define
operator==
when the underlying
adapted iterator undoubtedly already has it (unless it’s an output
iterator, of course).
template<typename D1, typename D2>
concept base-iter-comparable = // exposition only
requires (D1 d1, D2 d2) {
::base(d1) ==
iterator_interface_access::base(d2);
iterator_interface_access};
template<typename D1, typename D2>
constexpr bool operator==(D1 lhs, D2 rhs)
requires (is_convertible_v<D2, D1> || is_convertible_v<D1, D2>) &&
(base-iter-comparable<D1, D2> ||
requires (D1 d) { d - d; }) {
if constexpr (base-iter-comparable<D1, D2>) {
return (iterator_interface_access::base(lhs) ==
::base(rhs));
iterator_interface_access} else if constexpr (requires (D1 d) { d - d; }) {
return (lhs - rhs) == typename D1::difference_type(0);
}
}
proxy_iterator_interface
Finally, there’s an alias that makes it easier to define proxy iterators:
template<
typename Derived,
typename IteratorConcept,
typename ValueType,
typename Reference = ValueType,
typename DifferenceType = ptrdiff_t>
using proxy_iterator_interface = iterator_interface<
Derived,
IteratorConcept,
ValueType,
Reference,<Reference>,
proxy_arrow_result>; DifferenceType
The template parameter D
for
iterator_interface
may be an
incomplete type. Before any member of the resulting specialization of
iterator_interface
other than
special member functions is referenced,
D
must be complete, and model
std::derived_from<iterator_interface<D>>
.
Since we’re using CRTP, any operation that is provided by default
that you do not want, say because it has the wrong semantics, or because
you know of a more efficient way to do it, you can simply provided that
operation in your type, and the operation you would have inherited from
the iterator_interface
gets
hidden.
I did attempt to use
operator<=>
instead of the
individually-defined relational operators, but it created all kinds of
difficult-to-sort-out errors.
I also attempted to use deduced
this
in
iterator_interface
, but it did
not get rid of the need for the
D
template parameter.
D
is still needed to define the
hidden friend operators.
To get interoperability between user-created
const_iterator
and
iterator
types, the user must
make iterator
s convertible to
const_iterator
s. You can’t
automate everything.
Here is a handy table listing all the user-provided operations
necessary to implement all the various iterator concepts. This is every
user-defined operation, even though no one iterator requires all of
them. A following table wth indicate which operations are needed when
implementing which iterator concept. In the table,
Iter
is a user-defined type
derived from iterator_interface
;
i
and
i2
are objects of type
Iter
;
reference
is the type passed as
the Reference
template parameter
to iterator_interface
;
pointer
is the type passed as
the Pointer
template parameter
to iterator_interface
; and
n
is a value of type
difference_type
.
Expression
|
Return Type
|
Semantics
|
Notes
|
---|---|---|---|
*i |
Convertible to
reference . |
Dereferences i and returns
the result. |
Precondition: i is
dereferenceable. |
i == i2 |
Contextually convertible to
bool . |
Returns true if and only if
i and
i2 refer to the same value. |
Precondition:
(i, i2) is in the domain of
== . |
i2 - i |
Convertible to
difference_type . |
Returns n . |
Precondition: there exists a value
n of type
difference_type such that
i + n == i2 . |
++i |
Iter & |
Increments i . |
|
--i |
Iter & |
Decrements i . |
|
i += n |
Iter & |
difference_type m = n; if (m >= 0) while (m--) ++i; else while (m++) --i; |
This table shows which user-definable operations must be defined to
meet the requirements of each of the iterator concepts.
i
,
i2
, and
n
have the same meaning here as
they do in the previous table.
Concept
|
Operations
|
---|---|
input_iterator |
*i i == i2 ++i |
output_iterator |
*i ++i |
forward_iterator |
*i i == i2 ++i |
bidirectional_iterator |
*i i == i2 ++i --i |
random_access_iterator /continguous_iterator |
*i i - i2 i += n |