1. Revision History
r0: initial revision, based on the rational from [P2406R0]
2. Problem description
Using
on an input range, e.g.
, usually takes
an additional element from the underlying input source, because
uses
(when the range isn’t
) and
increments the internal iterator even if the counter has reached the requested
count.
Due to the nature of input (non-forward) range, where rereading is usually
impossible, taking this additional element means that element is lost forever.
If no additional element exists in the source (and the source wasn’t closed),
this operation hangs forever.
For example [CE-ISTREAM]:
#include <ranges>#include <iostream>#include <sstream>#include <cassert>namespace rn = std :: ranges ; namespace rv = rn :: views ; int main () { auto iss = std :: istringstream ( "012" ); for ( auto c : rn :: istream_view < char > ( iss ) | rv :: take ( 1 )) { std :: cout << c << '\n' ; } auto c = '\0' ; iss >> c ; std :: cout << c << std :: endl ; // flush it in case the assert fails assert ( c == '1' ); // FAILS, c == '2' }
3. This paper vs. P2406
We propose with [P2406R1] new tools that behave correctly with input ranges
(
and
). The existing tools must must not be
used on input ranges, as the behavior is always wrong, so we propse blocking
this usage to remove this footgun. We use a separated paper, so it’s easier to
merge this even if the new tools require more discussions and take longer to be
merged.
3.1. Current usage of counted_iterator
with input iterators
While investigating possible solutions to this problem, we found a bug in
range-v3 implementation of
when used on input iterators (see
the details in [range-v3-PR1664]). We believe that the fact this bug was there
for so long, suggests there is no much usage of input iterators, at least not
with
, so the potential break is minimal (besides our claim
that the behavior is already wrong and broken).
4. Current behavior is what the standard mandates
Under 23.5.6.5 [counted.iter.nav], the standard defines the behavior of
for
as:
Effects: Equivalent to:
It means that even when
becomes 0, the internal iterator is
incremented, thus consuming an additional item from the range, causing the
mentioned issue.
5. Some input iterators are different
behaves differently than
. While the
latter removes the element from the underlying source on
, the former
removes it only on the next
(the read on dereference done directly from the
underlying
). It means that
works flawlessly with
.
For example, we can adapt the previous example to use
and
get the expected behavior [CE-ISTREAMBUF]:
auto iss = std :: istringstream ( "012" ); auto ibuf_it = std :: istreambuf_iterator < char > ( iss ); for ( auto c : rn :: subrange ( ibuf_it , std :: default_sentinel ) | rv :: take ( 1 )) { std :: cout << c << '\n' ; } auto c = '\0' ; iss >> c ; std :: cout << c << std :: endl ; // flush it in case the assert fails assert ( c == '1' ); // SUCCEEDS
The conclusion is that we have to differentiate between "eager" and "lazy" types.
Side note: the differences in
behavior vs. other input
iterators was the source of other issues in the past, e.g. see [P0541] and [LWG2471]
6. Propagating the laziness
Similarly to what was done with
and similar traits, the trait
of being lazy must be propagated by adaptors like
and
.
7. Being lazy is not just for iterators
One of the adaptors that needs to propagated laziness through is
. As
works on any
, not
only on iterators, the lazy trait must be defined on
, not
just
.
Please notice that
(unlike
) is similar to
being single-pass, as its
operator isn’t
equality-preserving.
8. Open design questions
8.1. output_iterator
The suggested wording below allows constructing
from
, because typically its
doesn’t have any side effect. An
iterator is not
if (a) its
affects its source (and
invalidates all other copies) or (b) it isn’t even
. As a
result, an
that it isn’t
is a signal that
its
has side effects, and this is problematic for
, as
described above, so we block them. But for iterators that don’t model
in first place, we simply don’t know if their
might be
problematic. Still, we don’t expect to find such iterators in the wild. Adding
the requirement to enable explicitly each type of
, when we
expect most or all of them to be enabled, seemed redundant.
As we are not 100% sure about it, we seek for additional feedback here.
8.2. filter_view
Let’s consider the next example [CE-ISTREAMBUF-FILTER]:
auto buf = std :: stringbuf ( "a1x2d3f455gh6a" ); for ( auto c : rn :: subrange ( std :: istreambuf_iterator ( & buf ), std :: default_sentinel ) | rv :: filter ( static_cast < int ( & )( int ) > ( std :: isdigit )) | rv :: take ( 3 )) { std :: cout << c << " " ; } char c = buf . sgetc (); std :: cout << c << std :: endl ; // flush before the assert assert ( c == 'f' ); // FAILS, c == '4'
As we can see,
makes things more complex, as in some sense
is
by nature always eager. It takes the non-matching elements from the range even
if the next matching element will never be used.
Even with the planned
, that stops doing
after the last item has
already taken, with
it still does the wrong thing, because
then the last item (
) is left in the
. While this is the intended
behavior for
when used on things like
, usually
the user can decide between (1) using
and remembering to add a call
to
on the underlying iterator at the end to continue working with it, and
(2) using
and everything works as expected.
+
removes too
many elements even from lazy input iterator, making option 2 unviable, so the
user is forced to use option 1, which is error prone due to the manual advancing
required at the end.
The bottom line is that the wording below doesn’t change
, which
effectively means that
or
can’t be used on a
ed
, even on lazy ones. The open question is if we want
to allow this and give the user the option to use
+
on lazy
input iterator.
Side note: While discussing this, we noticed that there is no easy way to
continue using a range after using
on it, even if it’s a
, because even while the filtered-out elements aren’t lost, there
is no easy way to find the element next to the last one taken from the range.
For
it’s possible to move back with the negation of the
filter, but with
we must go over it again to find first the
matching element and then go to the next one. Probably the design of
assumed the whole range is used only in the current pipeline and
whatever left in it will not be reused later. But such reasoning doesn’t work
for
, where elements from the underlying source (e.g.
)
are lost forever.
8.3. join_view
is in some way a kind of
that filters out empty elements
(see the definition of
). It means those empty
elements are lost forever when using
. Similarly to
, we
don’t propose any change to
, which means it’s blocked from
even for lazy input iterators.
8.4. lazy_split_view
Similarly to
,
filters out the
separators (pattern), and those will lost forever when using
.
Again, we don’t propose changes to this iterator.
OTOH,
can be lazy, so users can write:
We think it should be considered lazy unconditionally, because users shouldn’t touch the original source while iterating the view that reads from it, and after the read is over, there are no assumptions on the source state. (REVISIT RATIONAL!)auto iss = std :: istringstream ( "0;1;2" ); auto ibuf_it = std :: istreambuf_iterator < char > ( iss ); for ( auto c : rn :: subrange ( ibuf_it , std :: default_sentinel ) | rv :: lazy_split ( rv :: single ( ';' ))) { std :: cout << c << '\n' ; }
(Unlike
,
works on
only, is
itself and so is its inner range, so no need to touch it here.)
9. Implementation experience
There is (partial) implementation of this proposal on a fork of MSFT STL [MSFT-STL-FORK]
10. Proposed Wording
Under 25.2 [iterator.synopsis], right after
paragraph, add:
Right before
paragraph add:
template < class CharT , class Traits >
inline constexpr bool enable_lazy_weakly_incrementable < istreambuf_iterator < CharT , Traits >> = true; // freestanding
Under 25.3.4 [iterator.concepts], between 25.3.4.4 [iterator.concept.winc] and 25.3.4.5 [iterator.concept.inc] add:
25.3.4.x Concept lazy_weakly_incrementable [iterator.concept.lazywinc]The
concept defines requirements for a type that is
an
and, if it models
too, doesn’t remove the
current element from the underlying source.
Remarks: Pursuant to [namespace.std], users may specialize
for cv-unqualified program-defined types. Such specializations shall be usable
in constant expressions ([expr.const]) and have type
.
[Example 1: Each specialization
of class template
([istreambuf.iterator]) models
because
is specialized to have the value true
, and
doesn’t remove the current element from the underlying
until moving to the next element.
— end example]
Under 25.5.6.1 [counted.iterator]:
Under 26.2 [ranges.syn],
Under 26.7.10.2 [range.take.view]:
11. Effect on current state
This breaks existing code that usescounted_iteraor
or anything based on it
with (non-lazy) input iterator, if there is any.
We believe this is Good Thing as we showed that
always does
the wrong thing for (non-lazy)
s.
At the very least, if this change isn’t accepted, we have to warn users against
using
(and its consumers) with
. We might
want to encourage implementations to produce diagnostic on such a usage.
We think this is a good candidate to be applied as a Defect Report.
12. TODO
Besides going over the C++23 new ranges (e.g. zip), here are the rest of the
changes that aren’t in the wording above yet, about how to specialize
for various adaptors:
-
-move_iterator < I > lazy < I > -
-common_iterator < I , S > lazy < I > -
-counted_iterator < I > lazy < I > -
-istreambuf_iterator < C , T > true
-
-iota_view < W , B >:: iterator lazy < W > -
-elements_view < V , N >:: iterator lazy < iterator < V >> -
-transform_view < V , F >:: iterator lazy < iterator < V >>
13. Acknowledgements
Many thanks to the Israeli NB members for their feedback and support, in particular Inbal Levi, Dvir Yitzchaki and Dan Raviv. Thanks r/cpp Reddit users for their feedback on P2406R0.