Document #: | P2881R0 |
Date: | 2023-05-18 |
Project: | Programming Language C++ |
Audience: |
EWG EWGI |
Reply-to: |
Jonathan Müller <[email protected]> Barry Revzin <[email protected]> |
We propose language support for internal iteration in a
for
loop by adding a
generator-based for
loop.
Instead of using
begin()
/end()
,
it transforms the body of the
for
loop into a lambda and
passes it to an overloaded
operator()
. The lambda will then
be invoked with each element of the range.
struct generator123
{
auto operator()(auto&& sink) const
{
::control_flow flow;
std
// With P2561: sink(1)??;
= sink(1);
flow if (!flow) return flow;
// With P2561: sink(2)??;
= sink(2);
flow if (!flow) return flow;
return sink(3);
}
};
for (int x : generator123())
{
::print("{}\n", x);
stdif (x == 2)
break;
}
For complex ranges, iterators are a bit annoying to write: There is
an awkward split between returning the value in
operator*
and computing the next
one in operator++
, and any state
needs to be awkwardly stored in the iterator to turn a state
machine.
C++20 coroutines and C++23
std::generator
make it a lot
more convenient. Compare e.g. the implementation of
std::views::filter
,
std::views::join
or the proposed
std::views::concat
with a
generator-based implementation:
template <typename Rng, typename Predicate>
auto filter(Rng&& rng, Predicate predicate) -> std::generator<…>
{
for (auto&& elem : rng)
if (predicate(elem))
co_yield std::forward<decltype(elem)>(elem);
}
template <typename Rng>
auto join(Rng&& rng_of_rng) -> std::generator<…>
{
for (auto&& rng : rng_of_rng)
co_yield std::ranges::elements_of(std::forward<decltype(rng)>(rng));
}
template <typename ... Rng>
auto concat(Rng&& ... rng ) -> std::generator<…>
{
((co_yield std::ranges::elements_of(std::forward<decltype(rng)>(rng))), ...);
}
However, ranges implemented using
co_yield
have three
disadvantages.
std::generator
, coroutines, and
iteratorsThe coroutine transformation requires heap allocation, splitting one function into multiple, and exception machinery. This adds overhead compared to the iterator version or a hand-written version.
In the simple case of benchmarking
std::views::iota
against a
coroutine-based version, the latter is 3x slower on GCC. Note that the
coroutine uses SimpleGenerator
not std::generator
.
SimpleGenerator
already
optimizes heap allocations by using a pre-allocated buffer and
terminates on an unhandled exception. The benchmark is available on quick-bench.
On clang, coroutines are better optimized, but they are still sometimes
slower than a callback-based version, especially when the range is small
and/or the computation per element cheap.
Just writing manual iterators instead isn’t a solution either. Common
state of the range (like
filter_view
’s predicate) is
stored in the view, which the iterator accesses via pointer. As
compilers can rarely proof non-aliasing with pointers, the state needs
to be repeatedly reloaded for every access. The only way to get full
control over performance is to use neither iterators nor coroutines,
which is a shame.
Yes, in principle compilers could optimize coroutines and iterators better, but there is a difference between having something that is only fast because of optimizers and something that is fast by design. Only the latter is a true zero-overhead abstraction.
co_yield
from nested scopeConsider a simple tree type which is either a leaf storing an
int
or a
std::vector
of child nodes.
struct tree
{
using leaf = int;
::variant<leaf, std::vector<tree>> impl;
std};
Writing a coroutine-based generator that yields data cannot directly
use std::visit
, as
co_yield
needs to be in the
top-level scope:
::generator<int> tree_data(const tree& t)
std{
::visit(overloaded([&](int data) {
stdco_yield data; // error
}, [&](const std::vector<tree>& children) {
for (auto& child : children)
co_yield std::ranges::elements_of(tree_data(child)); // error
}), t.impl);
}
Every single lambda needs to be turned into a generator as well, so we can finally yield its elements:
::generator<int> tree_data(const tree& t)
std{
auto sub =
::visit(overloaded([&](int data) -> std::generator<int> {
stdco_yield data;
}, [&](const std::vector<tree>& children) -> std::generator<int> {
for (auto& child : children)
co_yield std::ranges::elements_of(tree_data(child));
}), t.impl);
co_yield std::ranges::elements_of(sub);
}
This particular problem can also be solved using pattern matching,
but the underlying limitation remains: implementing a coroutine
generator cannot use “normal” helper functions as you cannot use
co_yield
in them; they need to
be turned into generators as well. If you can’t control the helper
functions, such as the case of standard library algorithms, you simply
can’t use them.
As specified, std::generator
only produces a single type, so writing a generator of
std::tuple
is not possible. Note
that this is only a limitation of
std::generator
and the iterator
interface it implements, not of coroutines itself.
In the think-cell-library,
this problem is solved using a concept of generator ranges. It
is in some aspects even weaker than an input range – it can only be
iterated over, and once the iteration is stopped, it cannot resume later
on. However, it is enough for algorithms that require only a single
loop, like
std::ranges::for_each
,
std::ranges::copy
, or
std::ranges::fold_left
(plus
variants and fold-like algorithms like
std::ranges::any_of
).
A generator range is implemented as a type with an overloaded
operator()
that takes another
callable, a sink. It will then use internal iteration invoking the sink
with each argument. Early exit
(i.e. break
) is possible by
returning tc::break_
from the
sink, which the operator()
then
detects, stops iteration, and forwards the result.
Range adapters like filter()
,
join()
or
concat()
are still easy to
write:
template <typename Rng, typename Predicate>
auto filter(Rng&& rng, Predicate predicate)
{
// Returned lambda is a generator range.
return [=](auto sink) {
// rng here is an iterator range, not a generator range.
// With this proposal, it can be either, as the range-based for loop will work with both.
// Without it, generator ranges would need to be handled specially.
for (auto&& elem : rng)
if (predicate(elem))
{
auto result = sink(std::forward<decltype(elem)>(elem));
if (result == tc::break_)
return result;
}
return tc::continue_;
};
}
However, we can also yield values from nested functions:
auto tree_data(const tree& t)
{
return [&](auto sink) {
auto flow =
::visit(overloaded([&](int data) {
std// Forward break/exit.
return sink(data);
}, [&](const std::vector<tree>& children) {
for (auto& child : children)
{
auto flow = tree_data(child)(sink);
if (flow == tc::break_)
// Forward early break and do actually break.
return flow;
}
return tc::continue_;
}), t.impl);
return flow;
}
}
As shown in the benchmark, the performance is comparable to the range implementation and in think-cell’s experience often even faster.
The proposed design consists of two mandatory parts and optional syntax sugar.
std::control_flow
namespace std
{
/// A control flow tag type and object that means `continue`.
struct continue_t
{
// Empty.
constexpr operator std::true_type() const noexcept
{
return {};
}
constexpr std::false_type operator!() const noexcept
{
return {};
}
friend std::strong_ordering operator<=>(continue_t, continue_t) noexcept = default;
};
inline constexpr continue_t continue_;
/// A control flow tag type and object that means `break`.
struct [[nodiscard("need to forward break")]] break_t
{
// Empty.
constexpr operator std::false_type() const noexcept
{
return {};
}
constexpr std::true_type operator!() const noexcept
{
return {};
}
friend std::strong_ordering operator<=>(break_t, break_t) noexcept = default;
};
inline constexpr break_t break_;
/// A control flow object that can mean `continue`, `break` or some implementation-defined `break`-like state.
class [[nodiscard("need to forward control flow")]] control_flow
{
public:
/// Create a control flow object that means `continue`.
constexpr control_flow(continue_t) noexcept;
constexpr control_flow() noexcept : control_flow(continue_) {}
/// Create a control flow object that means `break`.
constexpr control_flow(break_t) noexcept;
/// Trivially copyable.
/// Return `true` if the control flow means `continue`, `false` otherwise.
constexpr explicit operator bool() const noexcept;
constexpr friend bool operator==(control_flow, control_flow) noexcept;
constexpr friend std::strong_ordering operator<=>(control_flow, control_flow) noexcept;
};
}
std::control_flow
is a class
that represents a break
or
continue
result of a function.
It is essentially a strongly-typed version of
bool
, where
true
means
continue_
and
false
means
break_
. However, an
implementation may give it additional states such as
return
(which maps to
break
) or
goto label
(which also maps to
break
).
Note that std::continue_
and
std::break_
are not just a named
constant of type
std::control_flow
, but of a
distinct tag type, which has an implicit conversion to
std::true/false_type
and a
likewise constant operator!
.
This ensures that the common case of “always continue” or “always break”
can be encoded in the type system and guarantee optimizations.
The standardization of
std::control_flow
is the bare
minimum to enable writing generator ranges:
struct generator123
{
// `sink` takes the type yielded by the generator range.
// and returns `std::control_flow`, `std::continue_t`, or `std::break_t`.
auto operator()(auto&& sink) const -> std::control_flow
{
::control_flow flow;
std
= sink(1);
flow if (!flow) return flow;
= sink(2);
flow if (!flow) return flow;
return sink(3);
}
};
template <typename Generator>
void use_generator_range(Generator&& generator)
{
([](auto value) {
generator::print("{}\n", value);
std// Unconditionally continue generating.
return std::continue_;
});
([](auto value) -> std::control_flow {
generator// Exit early if we see the value 42.
if (value == 42)
return std::break_;
::print("{}\n", value);
stdreturn std::continue_;
});
}
The remaining features are “just” language support to make this idiom nicer.
for
loop
User code
|
Lowered code
|
---|---|
|
|
Previously, the range-based
for
loop was always lowered to
an iterator-based for
loop if the type has
begin()
and
end()
. We propose that it can
also be lowered to a generator-based
for
loop if the type has an
operator()
that accepts a
callable and returns
std::control_flow
,
std::break_t
or
std::continue_t
. If the type has
both begin()
and
end()
, the generator-based
version is preferred as it is expected to be faster.
The for
loop is lowered to a
call to operator()
on the
generator, passing it the body of the loop transformed into a lambda.
That lambda takes the type specified in the binding of the loop
qualified with &&
.
Reference collapsing rules ensure that it is turned into an lvalue
reference if necessary. The extra step is necessary as the binding of
the for
-loop might be a
structured binding which is not allowed as a lambda argument. The lambda
body then consists of the binding of the range-based
for
loop, the body of the loop,
and a (potentially unreachable)
return std::continue_
.
The loop body is kept as-is, except for control flow statements which need to be translated:
continue;
is
transformed to
return std::continue_
.break;
is
transformed to
return std::break_
.return;
is transformed to
return implementation-defined
,
which has the same effect as
std::break_
but also causes the
compiler to forward the return
after the loop.return expr;
is
transformed to something that evaluates
expr
and stores it
somewhere, before a return implementation-defined
which has the same effect as
std::break_
but also causes the
compiler to forward the return
after the loop.goto
that exits the
range-based for
loop is
transformed to return implementation-defined
,
which has the same effect as
std::break_
but also causes the
compiler to forward the goto
after the loop.throw
is kept as-is.co_await
/co_yield
/co_return
is ill-formed (but see below for a
discussion about that).The return type of the lambda is
std::continue_t
or
std::break_t
if that is the only
returned control-flow case, or
std::control_flow
otherwise. If
the lambda exits with an exception or returns
std::break_
it must not be
called again. The return value of the lambda must be forwarded through
the operator()
.
After the call to operator()
,
the compiler may insert additional code to handle a
return
or
goto
of the body, to ensure that
control flow jumps to the desired location.
User code
|
Lowered code
|
---|---|
|
|
More example lowerings are given on godbolt. Note that the
compiler may do a more efficient job during codegen than what we can
express in C++. In particular,
return expr
must
directly construct the expression in the return slot to enable copy
elision.
The implementation of a generator will necessarily have code that invoke the sink and does an early return:
auto flow = sink(value);
if (!flow) return flow;
This is annoying, but also precisely the pattern [P2561R1] sets out to solve with its error propagation operator. If adopted, we can instead write:
(value)??; sink
Alternatively, we could keep the association with coroutine
generators and use a special
co_yield
-into statement:
co_yield(sink) value;
// or
co_yield(sink, value);
// or
co_yield[sink] value;
// or
co_yield<sink> value;
However, re-using co_yield
in
that way might result in parsing ambiguities with regular coroutine
co_yield
, so that particular
syntax might be unfeasible.
std::ranges
support (future
paper)If there is interest in the feature, a separate paper will add
support for generator ranges to the standard library. This includes
concepts for generator ranges, turning views into generator ranges for
better iteration performance, and adding support for generator ranges to
single-loop range algorithms like
std::ranges::for_each
,
std::ranges::copy
, or
std::ranges::fold_left
(plus
variants and fold-like algorithms like
std::ranges::any_of
).
As proposed, the generator-based
for
loop is triggered when the
range type has an operator()
that takes a sink and returns
std::control_flow
,
std::break_t
, or
std::continue_t
. This spelling
has the advantage that lambdas can directly be used as generators. For
example, if the range adapters were updated to use generator ranges
(like the ones in the think-cell-library), we could write code like
this:
// Copy a C FILE to a buffer.
::ranges::copy([&](auto&& sink) {
std::control_flow flow;
stdwhile (true)
{
auto c = std::fgetc(file);
if (c == EOF)
break;
= sink(c);
flow if (!flow)
break;
}
return flow;
}, buffer);
However, we are open to a different spelling. An early draft of the
paper used operator for
instead
of operator()
, where
operator for
is a new
overloadable operator. This can make it more obvious that a type has
custom behavior when used with a range-based
for
loop.
Using operator for
means that
lambdas no longer work as-is and would need to be wrapped into a type
that has an operator for
which
calls the lambda. That works, but is also a bit unnecessary.
Alternatively, a lambda that takes a sink and returns a type like
std::control_flow
could
automatically get an
operator for
overload. Either
the language provides one as member, or the standard library provides a
generic non-member overload for callables. At that point, we might as
well use operator()
again
though.
auto
in
for
loopWhat should happen if the for
loop uses auto
for the type of
the loop variable?
for (auto x : generator123())
…
With the specification of
operator()
it is very difficult
for the compiler to figure out what is the actual value type of the
operator()
– it has to look in
the body to see what is passed to the callable. There is also no
limitation to only one type; an implementation may invoke the sink with
different types.
There are three approaches to avoid the deduction:
for
loop, they have to specify a
type for the loop variable.operator()
as
that is std::control_flow
or
related.for
loop into a generic lambda with an
auto
parameter instead of a
fixed type.auto
parameter in that
particular compiler-generated lambda then would be more like the
auto
in a return type or
variable declaration: a way to name a specific, yet unknown type, and
not a template.Option 1 is annoying. Option 2 is not easy because the type of the
range depends on the cv-ref qualifications of
*this
. think-cell uses
decltype()
of a member function
call for that purpose:
struct container
{
// Non-const containers yield `int&`.
int& generator_output_type(); // no definition
// const containers yield `const int&`.
const int& generator_output_type() const; // no definition
};
template <typename T>
using generator_output_type = decltype(std::declval<T>().generator_output_type());
However, adding an undefined member function might be a bit too weird for the standard.
Option 3 has the side-effect of allowing iteration of types like
std::tuple
:
template <typename ... T>
struct tuple
{
auto operator()(auto&& sink) const
{
return generator_impl(std::index_sequence_for<T...>{}, sink);
}
template <std::size_t ... Idx, typename Fn>
auto generator_impl(std::index_sequence<Idx...>, Fn&& sink) const
{
::common_type_t<decltype(sink(std::get<Idx>(*this)))...> flow;
std// && short-circuits once we have a break-like control flow.
((flow = sink(std::get<Idx>(*this))) && ...);
return flow;
}
};
…
for (auto elem : std::make_tuple(42, 3.14f, "hello"))
::print("{}\n", elem); std
This is a different way of implementing expansion statements [P1306R1].
Even if EWG doesn’t want to support multiple output types for
for
, the library idiom of using
operator()
naturally supports
it. Limiting the idiom just for the sake of the language feature is
wrong – we still might want to iterate tuples by calling
operator()
manually. It is
better to just constrain the use in
for
but not the general spelling
of the generator function, as done in option 4.
std::stacktrace
and
__func__
What does the following code print?
void foo()
{
for (int x : generator123())
{
::cout << __func__ << ' ' << std::stacktrace::current() << '\n';
std}
}
Does it print the name foo
and the stack trace beginning at
foo
, as that is what it looks
like syntactically, or does it report the true stack trace involving the
compiler generated lambda and arbitrarily many calls caused by
operator()
?
The authors prefer the second choice where a stack trace reports the true stack trace and not the syntactic one.
for
body and
try
-catch
in operator for
We expect that the following code will
throw
on the first iteration
(assuming the range is
[1, 2, 3]
) and thus never call
g()
or
h()
.
for (int x : generator123())
{
if (x == 1) throw 42;
();
g}
(); h
But what if we have a somewhat malicious
operator()
implementation that
swallows all exceptions?
struct generator123
{
auto operator()(auto&& sink) const
{
try
{
return sink(1);
}
catch (...) {}
}
};
As specified, the exception propagates out of
sink()
and is then discarded by
the operator()
, so we will call
h()
.
There are three approaches here:
operator()
, that’s on them.operator()
. Using
some compiler magic, we set a flag or something on the exception which
disables
try
/catch
.
Once the exception has left the
operator()
, the flag is reset.
This makes it impossible to catch exceptions thrown by the sink until
they’re back in the syntactic scope.operator()
. Similar to 2, but
operator for
is allowed to see
the exception, just not to swallow them. After a
catch
, exceptions are
automatically re-thrown.The problem with 2 is that a generator might change itself during
iteration inside operator()
.
When an exception is thrown, it needs to detect that to restore its
previous state. The problem with 3 is potential overhead caused by the
extra exception machinery, which we don’t want to pay only to guard
against malicious actors. Keep in mind that the
operator()
, the call to the
sink
, and a
try
/catch
surrounding the sink might be separated by an arbitrary amount of
intermediate function calls, so it cannot be done statically. As such,
the authors prefer option 1.
co_await
/co_yield
/co_return
in for
bodyWhen a user co_await
s or
co_yield
s inside the body of a
generator-based for
loop, we
have a problem since we’re no longer inside a coroutine, but a
compiler-generated lambda. To handle that properly both the compiler
generated lambda and the
operator()
needs to be turned
into coroutines to properly propagate the
co_await
.
While that could work for the compiler-generated lambda, it would be
problematic for the operator()
,
as we now sometimes need
co_await
on the sink. This
problem relates to the general problem of coroutines with generic code,
and is best solved by a general solution for conditional
co_await
in generic code.
We thus propose to make it ill-formed to use coroutine statements in
the body of a generator-based
for
loop (at least for now).
Alternatively, we could instead silently fallback to the traditional
iterator-based for
loop which
does not suffer from this problem. This works nicely with generic code,
but requires that the type also has
begin()
and
end()
(otherwise it is
ill-formed again).
The implementation in think-cell allows arbitrary return types for
the sink, not just
std::control_flow
and related
types. If a type is not a control flow type including
void
, it is treated as
std::continue_
. That way, if a
sink is written by hand or wraps an existing function, if it doesn’t
want an early return, it doesn’t need to do anything. Otherwise, it
would need to add an unnecessary
return std::continue_
.
If a generator range is only used with the range-based
for
loop, this is not necessary
as the compiler generated sink will always have the right type, but in
think-cell’s experience where generators can be used with arbitrary
algorithms and their hand-written lambdas, it is very convenient if the
work of adjusting the return type is moved from the caller to the
generator implementation.
Relaxing the return type complicates the implementation for sink
calls, which now need to translate an arbitrary type (including void) to
std::continue_t
before
branching. As such, it is useful to have a standardized helper function
for that logic, e.g. a std::control_flow::invoke(sink, value)
:
template <typename Sink, typename ... Args>
constexpr auto std::control_flow::invoke(Sink&& sink, Args&&... args)
{
using result = std::invoke_result_t<Sink&&, Args&&...>;
if constexpr (std::is_same_v<result, std::control_flow>
|| std::is_same_v<result, std::continue_t>
|| std::is_same_v<result, std::break_t>)
{
return std::invoke(std::forward<Sink>(sink), std::forward<Args>(args)...);
}
else
{
::invoke(std::forward<Sink>(sink), std::forward<Args>(args)...);
stdreturn std::continue_;
}
}
Alternatively, the selected syntax sugar could be specified to do the translation as well.