1. Revision History
r0: initial revision, based on the rational from [P2406R0]
2. Problem description
Using on an input range, e.g. , usually takes
an additional element from the underlying input source, because uses (when the range isn’t ) and increments the internal iterator even if the counter has reached the requested
count.
Due to the nature of input (non-forward) range, where rereading is usually
impossible, taking this additional element means that element is lost forever.
If no additional element exists in the source (and the source wasn’t closed),
this operation hangs forever.
For example [CE-ISTREAM]:
#include <ranges>#include <iostream>#include <sstream>#include <cassert>namespace rn = std :: ranges ; namespace rv = rn :: views ; int main () { auto iss = std :: istringstream ( "012" ); for ( auto c : rn :: istream_view < char > ( iss ) | rv :: take ( 1 )) { std :: cout << c << '\n' ; } auto c = '\0' ; iss >> c ; std :: cout << c << std :: endl ; // flush it in case the assert fails assert ( c == '1' ); // FAILS, c == '2' }
3. This paper vs. P2406
We propose with [P2406R1] new tools that behave correctly with input ranges
( and ). The existing tools must must not be
used on input ranges, as the behavior is always wrong, so we propse blocking
this usage to remove this footgun. We use a separated paper, so it’s easier to
merge this even if the new tools require more discussions and take longer to be
merged.
3.1. Current usage of counted_iterator with input iterators
While investigating possible solutions to this problem, we found a bug in
range-v3 implementation of when used on input iterators (see
the details in [range-v3-PR1664]). We believe that the fact this bug was there
for so long, suggests there is no much usage of input iterators, at least not
with , so the potential break is minimal (besides our claim
that the behavior is already wrong and broken).
4. Current behavior is what the standard mandates
Under 23.5.6.5 [counted.iter.nav], the standard defines the behavior of for as:
Effects: Equivalent to:
It means that even when becomes 0, the internal iterator is
incremented, thus consuming an additional item from the range, causing the
mentioned issue.
5. Some input iterators are different
behaves differently than . While the
latter removes the element from the underlying source on , the former
removes it only on the next (the read on dereference done directly from the
underlying ). It means that works flawlessly with .
For example, we can adapt the previous example to use and
get the expected behavior [CE-ISTREAMBUF]:
auto iss = std :: istringstream ( "012" ); auto ibuf_it = std :: istreambuf_iterator < char > ( iss ); for ( auto c : rn :: subrange ( ibuf_it , std :: default_sentinel ) | rv :: take ( 1 )) { std :: cout << c << '\n' ; } auto c = '\0' ; iss >> c ; std :: cout << c << std :: endl ; // flush it in case the assert fails assert ( c == '1' ); // SUCCEEDS
The conclusion is that we have to differentiate between "eager" and "lazy" types.
Side note: the differences in behavior vs. other input
iterators was the source of other issues in the past, e.g. see [P0541] and [LWG2471]
6. Propagating the laziness
Similarly to what was done with and similar traits, the trait
of being lazy must be propagated by adaptors like and .
7. Being lazy is not just for iterators
One of the adaptors that needs to propagated laziness through is . As works on any , not
only on iterators, the lazy trait must be defined on , not
just .
Please notice that (unlike ) is similar to being single-pass, as its operator isn’t
equality-preserving.
8. Open design questions
8.1. output_iterator
The suggested wording below allows constructing from , because typically its doesn’t have any side effect. An
iterator is not if (a) its affects its source (and
invalidates all other copies) or (b) it isn’t even . As a
result, an that it isn’t is a signal that
its has side effects, and this is problematic for , as
described above, so we block them. But for iterators that don’t model in first place, we simply don’t know if their might be
problematic. Still, we don’t expect to find such iterators in the wild. Adding
the requirement to enable explicitly each type of , when we
expect most or all of them to be enabled, seemed redundant.
As we are not 100% sure about it, we seek for additional feedback here.
8.2. filter_view
Let’s consider the next example [CE-ISTREAMBUF-FILTER]:
auto buf = std :: stringbuf ( "a1x2d3f455gh6a" ); for ( auto c : rn :: subrange ( std :: istreambuf_iterator ( & buf ), std :: default_sentinel ) | rv :: filter ( static_cast < int ( & )( int ) > ( std :: isdigit )) | rv :: take ( 3 )) { std :: cout << c << " " ; } char c = buf . sgetc (); std :: cout << c << std :: endl ; // flush before the assert assert ( c == 'f' ); // FAILS, c == '4'
As we can see, makes things more complex, as in some sense is
by nature always eager. It takes the non-matching elements from the range even
if the next matching element will never be used.
Even with the planned , that stops doing after the last item has
already taken, with it still does the wrong thing, because
then the last item () is left in the . While this is the intended
behavior for when used on things like , usually
the user can decide between (1) using and remembering to add a call
to on the underlying iterator at the end to continue working with it, and
(2) using and everything works as expected. + removes too
many elements even from lazy input iterator, making option 2 unviable, so the
user is forced to use option 1, which is error prone due to the manual advancing
required at the end.
The bottom line is that the wording below doesn’t change , which
effectively means that or can’t be used on a ed , even on lazy ones. The open question is if we want
to allow this and give the user the option to use + on lazy
input iterator.
Side note: While discussing this, we noticed that there is no easy way to
continue using a range after using on it, even if it’s a , because even while the filtered-out elements aren’t lost, there
is no easy way to find the element next to the last one taken from the range.
For it’s possible to move back with the negation of the
filter, but with we must go over it again to find first the matching element and then go to the next one. Probably the design of assumed the whole range is used only in the current pipeline and
whatever left in it will not be reused later. But such reasoning doesn’t work
for , where elements from the underlying source (e.g. )
are lost forever.
8.3. join_view
is in some way a kind of that filters out empty elements
(see the definition of ). It means those empty
elements are lost forever when using . Similarly to , we
don’t propose any change to , which means it’s blocked from even for lazy input iterators.
8.4. lazy_split_view
Similarly to , filters out the
separators (pattern), and those will lost forever when using .
Again, we don’t propose changes to this iterator.
OTOH, can be lazy, so users can write:
We think it should be considered lazy unconditionally, because users shouldn’t touch the original source while iterating the view that reads from it, and after the read is over, there are no assumptions on the source state. (REVISIT RATIONAL!)auto iss = std :: istringstream ( "0;1;2" ); auto ibuf_it = std :: istreambuf_iterator < char > ( iss ); for ( auto c : rn :: subrange ( ibuf_it , std :: default_sentinel ) | rv :: lazy_split ( rv :: single ( ';' ))) { std :: cout << c << '\n' ; }
(Unlike , works on only, is itself and so is its inner range, so no need to touch it here.)
9. Implementation experience
There is (partial) implementation of this proposal on a fork of MSFT STL [MSFT-STL-FORK]
10. Proposed Wording
Under 25.2 [iterator.synopsis], right after paragraph, add:
Right before paragraph add:
template < class CharT , class Traits > inline constexpr bool enable_lazy_weakly_incrementable < istreambuf_iterator < CharT , Traits >> = true; // freestanding
Under 25.3.4 [iterator.concepts], between 25.3.4.4 [iterator.concept.winc] and 25.3.4.5 [iterator.concept.inc] add:
25.3.4.x Concept lazy_weakly_incrementable [iterator.concept.lazywinc]The concept defines requirements for a type that is
an and, if it models too, doesn’t remove the
current element from the underlying source.
Remarks: Pursuant to [namespace.std], users may specialize for cv-unqualified program-defined types. Such specializations shall be usable
in constant expressions ([expr.const]) and have type .
[Example 1: Each specialization of class template
([istreambuf.iterator]) models because
is specialized to have the value true, and
doesn’t remove the current element from the underlying
until moving to the next element.
— end example]
Under 25.5.6.1 [counted.iterator]:
Under 26.2 [ranges.syn],
Under 26.7.10.2 [range.take.view]:
11. Effect on current state
This breaks existing code that usescounted_iteraor or anything based on it
with (non-lazy) input iterator, if there is any.
We believe this is Good Thing as we showed that always does
the wrong thing for (non-lazy) s.
At the very least, if this change isn’t accepted, we have to warn users against
using (and its consumers) with . We might
want to encourage implementations to produce diagnostic on such a usage.
We think this is a good candidate to be applied as a Defect Report.
12. TODO
Besides going over the C++23 new ranges (e.g. zip), here are the rest of the
changes that aren’t in the wording above yet, about how to specialize for various adaptors:
-
-move_iterator < I > lazy < I > -
-common_iterator < I , S > lazy < I > -
-counted_iterator < I > lazy < I > -
-istreambuf_iterator < C , T > true -
-iota_view < W , B >:: iterator lazy < W > -
-elements_view < V , N >:: iterator lazy < iterator < V >> -
-transform_view < V , F >:: iterator lazy < iterator < V >>
13. Acknowledgements
Many thanks to the Israeli NB members for their feedback and support, in particular Inbal Levi, Dvir Yitzchaki and Dan Raviv. Thanks r/cpp Reddit users for their feedback on P2406R0.