constexpr coroutines
This paper is proposing making coroutines functional during constant evaluation. Even when most of use-cases for coroutines are based on I/O and event based, coroutines are still useful for compile-time based computation, eg. std::generator
.
Changes
- R2 → R3: added motivation text, added AST transform approach examples
- R1 → R2: Updated wording to elide allocator functionality in coroutine promise types. Added discussion about alternative implementation strategies. Added information about decision by various groups which seen this paper.
- R0 → R1: Changed wording mentioning lifetime of coroutines after expr.const 5.2
Timeline
EWG in Wroclaw
SF | F | N | A | SA |
---|---|---|---|---|
4 | 12 | 3 | 0 | 0 |
Result: Consensus
LEWG in Wroclaw
SF | F | N | A | SA |
---|---|---|---|---|
12 | 5 | 0 | 0 | 0 |
Result: Unanimous consent
CWG in Wroclaw
Multiple implementers expressed concerns about resources needed for the implementation of this feature, where the resources might be better spent implementing C++ features where market demand is higher.
EWG is requested to reconsider putting this feature into the standard.
Note: CWG did review of the wording and only question was about what to do with allocation / deallocation functions. Wording was updated to elide these calls during constant evaluation of a coroutine. Concern about resources was reflected by updating the paper to contain informations about alternative implementation strategies.
Based on anecdotal evidence people don't use coroutines for two reasons: one is missing standard library support (which is partially resolved with std::generator
) and second is mutual exclusivity with much more popular constant evaluated code. Having constant evaluation of couroutines will help alleviate this and I do expect seeing libraries implementing lazy coroutine based parsers.
Quote
Well, you just told me coroutines are the best way to solve some problems, so wouldn't I also want to use the Best Way at compile time? (quote from Jason Turner, co-author of "constexpr all the things" talk)
Motivation
Currently C++'s doesn't allow coroutines to be constexpr
. This limitation forces users to choose between having constexpr
compatible library or (maybe) simpler coroutine interface. As a library author I do prefer constexpr
but I do not want to make this decision.
I think this limitation and weak library support for coroutines are main limiting factors for bigger adoption of coroutines. I personally want to be able to write CTRE coroutine interface which allows partially process input on a regular expression state and continue later when it will be resumed.
Coroutines are not just for asynchronous
Most people see uses for coroutines for asynchronous and event based programming. Coroutines are useful for parsers, modeling clock-perfect timing in emulators, and generally any state machines.
Fibonnaci generator
When this paper is merged into the standard, users will be able to use std::generator
-like coroutines to generate or calculate data.
template <typename T> constexpr auto fib() -> std::generator<T> {
T a = 0;
T b = 1;
co_yield a;
do {
co_yield b;
auto tmp = b;
b += a;
a = tmp;
} while (true);
}
template <typename T, size_t N> constexpr auto calculate_fibonnaci() {
auto res = std::array<T, N>{};
std::ranges::copy(fib<T>() | std::views::take(N), res.begin());
return res;
}
constexpr auto cached_fibonnaci = calculate_fibonnaci<unsigned, 20>();
Tokenizer
TODO: tokenizer
Evaluation of constexpr coroutines
Implementation must make sure to avoid stack exhaustion and store evaluation state and coroutine's local variables in a way to avoid it. By simply resuming coroutine and evaluating it on system stack (in case of AST walking implementations) will lead to it.
This is because coroutines can be interlieved and their state must be maintained to be resumed. The stack then contains mixed coroutines and normal code together. To avoid this situation the easiest way to model a coroutine is to use coroutine (or coroutine-like functionality) to store the state (represented with values on stack) somewhere else. Because AST walk is unbounded, obvious first choice is a stackfull coroutine (fiber, not a thread!).
Lifetime of coroutine is bounded to be only within constant evaluation similary as memory allocation. Any coroutine leaking outside boundaries of constant evaluation means whole constant evalution failed.
Implementation experience
Partially implemented in clang available on my github, implementation should be ready for its presentation at Wroclaw meeting, and also will be soon available on compiler explorer (thanks Matt!).
Most of functionality needed was already present in Clang, it was mostly about removing checks in the parser.
Another part was implementing the functionality in Clang's interpreter and there I needed to add fibers (stackfull coroutines) as the interpreter recursive walks over AST. Ability to save interpreter's stack content did minimize impact of the change to only resuming, suspending, and variable storage and life-time management.
At the end of evaluation the interpret needs to check objects holding fibers if there is still any coroutine not released, if there is it report similar error as when there is an unreleased memory allocation.
Hardest problem was implementing local "stack", as createLocal
function was designed around idea of having only one branch of evaluation. This I solved by providing context of currently evaluated coroutine in EvalInfo
and switching it on every suspension / resume of a coroutine.
Alternative implementation approaches
The implementation is using fibers (not threads!) to create a storage for execution of coroutines outside of main stack. Main requirement here is to make sure jumping between coroutines won't run out of the system stack.
Following subpart of paper shows alternative approaches which can model coroutines. All of them have same expressive ability and model coroutines well. These implementation has various advantages and costs. For every model of constant evaluation should be possible found good and usable match.
Byte-code interpreter
Clang implementation is starting to use a byte-code virtual machine based interpreter. In such implementation the implementation approach is identical for actually compiling coroutines to a native code. Only thing you need is ability to change current stack / or reference coroutine variables based on offset to coroutine frame.
Such interpret has many advantages over AST walking one mostly in speed and security. I was told initial measurements in the byte-code interpreter in clang are three orders of magnitude faster over recursive AST walking. Security is inherit to fact of not using system stack for the evaluation at all.
C++ coroutines
For compilers which are using C++20 code itself another implementation strategy is to not use the system stack for walking AST, but change all necessory functions into coroutines which would model ordinary functions with only one return point to its callee. But these functions can be suspended and evaluation can switch to evaluation of a constexpr coroutine. Because all coroutines will be allocated on heap, this also hardens implementation's constexpr evaluator.
AST transformation / split
Last feasible approach is to implement a transformation of coroutine function, into multiple trees, representing all possible paths which can be taken from each suspension point. Everytime such splitted coroutine is suspended, constexpr evaluator can unroll all its recursive functions. Implementation then needs to implement a storage for coroutine local variables which survive suspension. Hardest part is to implement special AST nodes to represent expressions split for suspension in middle.
Note: similar split is already done in compiler (for Clang at LLVM level, GCC just before leaving frontend, for Swift in its frontend)
C++ coroutines are already defined as a set of transformations over a function based on detection of co_await
, co_yield
, and co_return
keywords. This approach is going a bit further but it can still be done manually.
Standard transformation
Simple generator which take a value, yield it and later yield value2.
auto simple(T start) -> std::generator<T> {
co_yield start;
auto quad = start * start;
co_yield quad;
}
This code is current transformed into:
auto simple(T start) -> std::generator<T> {
return std::generator<T>{ coroutine-frame {
std::generator<T>::promise_type _promise{};
auto start = start; // copy arguments
try {
co_await _promise.initial_suspend();
co_await _promise.yield_value(start);
auto quad = start * start;
co_await _promise.yield_value(quad);
// body
} catch (...) {
if (!initial-await-resume-called) {
throw;
}
_promise.promise.unhandled_exception();
}
// final-suspend:
co_await promise.final_suspend() ;
}};
}
As you can see, it's all just transformation. A function into a coroutine body explicitly managing life-time of a promise type, and co_yield
into co_await
. Compiler is already doing a lot of transformation in frontend. This proposed implementation strategy is just making a bit more transformations.
Transformation into void-style continuation
Any coroutine can be transformed into a set of void
returning functions:
coroutine simple(T argument) {
auto result = co_await normal_function(argument);
co_return result;
}
With resulting code looking somehow like this:
struct coroutine_state_base {
// where to jump
void (*resume_point)(coroutine_state_base *) = nullptr;
};
// unique type per each coroutine function
struct __coroutine_state: coroutine_state_base {
// all arguments here...
T argument;
// promise type
coroutine::promise_type promise;
// all variables / temporaries which can survive suspension
initial_awaiter initial_suspend_result;
final_awaiter final_suspend_result;
decltype(normal_function(argument)) tmp;
};
constexpr coroutine simple(T argument) { // <------------- starting point
// create state
auto * state = new __coroutine_state{};
// copy arguments
state->argument = argument;
// construct promise
new (&state->promise) coroutine::promise_type{...};
// construct result
auto result = state->promise.get_return_object();
// call (jump-into) coroutine body
simple_start(state);
// return result object with coroutine handle (usually)
return result;
}
constexpr void simple_start(coroutine_state_base * vstate) {
auto * state = static_cast<__coroutine_state *>(vstate);
// INITIAL SUSPEND
state->initial_suspend_result = state->promise.initial_suspend()
if (!state->initial_suspend_result.await_ready()) {
state->resume_point = &simple_after_init_suspend;
// tail-recursion jump to next coroutine or continue
NEXT-COROUTINE(state->initial_suspend_result.await_suspend(std::coroutine_handle{state}));
}
return simple_after_init_suspend(state); // tail-recursion
}
constexpr void simple_after_init_suspend(coroutine_state_base * vstate) {
auto * state = static_cast<__coroutine_state *>(vstate);
// BEFORE CUSTOM CO_AWAIT
try {
(void)state->initial_suspend.await_resume();
std::destroy_at(&state->initial_suspend_result); // lifetime of temporary must be destroyed
// normal body
state->tmp = normal_function(state->argument);
if (!state->tmp.await_ready()) {
state->resume_point = &simple_after_first_await;
// tail-recursion jump to next coroutine or continue
NEXT-COROUTINE(state->tmp.await_suspend(std::coroutine_handle{state}));
}
return simple_after_first_await(state);
} catch (...) {
std::destroy_at(&state->tmp);
state->promise.unhandled_exception();
return simple_final_suspend(state);
}
}
constexpr void simple_after_first_await(coroutine_state_base * vstate) {
auto * state = static_cast<__coroutine_state *>(vstate);
// RESUMED or just jumped after CUSTOM CO_AWAIT
try {
// get result (which doesn't survive suspension, so it can be put here on stack)
auto result = state->tmp.await_resume();
// co_return result
state->promise.return_value(result);
// end of scope for state->tmp
std::destroy_at(&state->tmp);
} catch (...) {
std::destroy_at(&state->tmp);
state->promise.unhandled_exception();
}
// jump to final suspend
return simple_final_suspend(state);
}
constexpr void simple_final_suspend(coroutine_state_base * vstate) {
auto * state = static_cast<__coroutine_state *>(vstate);
// FINAL SUSPEND
state->final_suspend_result = state->promise.final_suspend();
if (!state->final_suspend_result.await_ready()) {
state->resume_point = nullptr;
// tail-recursion jump to next coroutine or continue
NEXT-COROUTINE(state->final_suspend_result.await_suspend(std::coroutine_handle{state}));
}
// destruction of state after final-suspend
delete &state;
// no-return ... which will return us to our resumer
}
Whole transformation is just changing all edges to standalone void
returning function between which can be jumped with a tail recursion. Each .await_suspend(coroutine_handle)
can return handle to next coroutine, boolean with information if it should be a suspend or not, or just void, which means simple suspend.
// NEXT-COROUTINE(EXPR) is:
using type = decltype(EXPR);
if constexpr (std::is_void_v<type>) {
EXPR;
return; // return to caller
} else if constexpr (std::same_as<type, bool>) {
if (EXPR) {
return; // return to caller
}
// continue in coroutine
} else if constexpr (std::convertible_to<std::coroutine_handle<void>>) {
auto * next_state = static_cast<coroutine_state_base *>(E(XPR).address());
return next_state->resume_point(next_state); // tail-recursion
}
Most of the transformation is lifetime handling and splitting at each co_await
. The suspend transformation is already part of compilers so await_ready()
, await_suspend(std::coroutine_handle)
, await_resume()
is already part of each compiler.
Loops transformation
constexpr coroutine loops() noexcept {
while (co_await check()) {
another_function();
}
}
Code above is transformed into:
constexpr void loops_main(coroutine_state_base * vstate) noexcept {
// initial suspend skipped for readability
auto * state = static_cast<__coroutine_state *>(vstate);
state->tmp = check();
if (!state->tmp.await_ready()) {
state->resume_point = &loops_main_body;
NEXT-COROUTINE(state->tmp.await_suspend(std::coroutine_handle{state}));
}
return loops_main_body(state); // tail-recursion
}
constexpr void loops_main_body(coroutine_handle_base * vstate) noexcept {
auto * state = static_cast<__coroutine_state *>(vstate);
if (state->tmp.await_resume()) {
another_function();
return loops_main(state); // tail-recursion
}
std::destroy_at(&state->tmp);
return loops_final_suspend(state);
}
Tail-recursion
To implement constexpr coroutines with the AST transformation a tail recursion is needed to be supported in constant evaluator.
Comparison
Implementation | Storage for | Area needed to be changed | |
---|---|---|---|
Variables | State | ||
Stackfull coroutines | overlay over local variables | fiber (alternative stack on a heap) | builtins to control coroutine flow, execution state to keep track of allocations |
Byte-code | coroutine frame | instruction pointer | control of instruction pointer |
Stackless coroutines | overlay over local variables | suspended coroutines | convert all AST walking functions into coroutines |
AST transformation | instance of coroutine state | function pointer | custom lifetime handling needs to be added for temporaries / variables across suspend points |
Impact on existing code
None, this is a pure extension, it allows code to be constexpr which wasn't case before.
Intention for wording changes
Remove all obstacles blocking coroutines from being constant evaluatable. Make sure all coroutines are destroyed at end of constant evaluation.
Proposed wording changes
7.7 Constant expressions [expr.const]
- this ([expr.prim.this]), except
- in a constexpr function ([dcl.constexpr]) that is being evaluated as part of E or
- when appearing as the postfix-expression of an implicit or explicit class member access expression ([expr.ref]);
- a control flow that passes through
a declaration of a block variable ([basic.scope.block]) with
static ([basic.stc.static]) or
thread ([basic.stc.thread]) storage duration,
unless that variable is usable in constant expressions;
[Example 1: constexpr char test() { static const int x = 5; static constexpr char c[] = "Hello World"; return *(c + x); } static_assert(' ' == test()); — end example]
- a construction of a coroutine promise object ([dcl.fct.def.coroutine]), unless the coroutine promise object is destroyed within the evaluation of E;
- an invocation of a non-constexpr function;68
- an invocation of an undefined constexpr function;
- an invocation of an instantiated constexpr function that is not constexpr-suitable;
- an invocation of a virtual function ([class.virtual]) for an object whose dynamic type is constexpr-unknown;
- a call to an instance of std::allocator<T>::allocate ([allocator.members]), unless the allocated storage is deallocated within the evaluation of E;
- a call to an instance of std::allocator<T>::deallocate ([allocator.members]), unless it deallocates a region of storage allocated within the evaluation of E;
- an await-expression ([expr.await]);
- a yield-expression ([expr.yield]);
- a three-way comparison ([expr.spaceship]), relational ([expr.rel]), or equality ([expr.eq]) operator where the result is unspecified;
- a throw-expression ([expr.throw]);
9.2.6 The constexpr and consteval specifiers [dcl.constexpr]
- it is not a coroutine ([dcl.fct.def.coroutine]), and
- if the function it is a constructor or destructor whose class has its class does not have any virtual base classes.
- an invocation of a constexpr function can appear in a constant expression ([expr.const]) and
- copy elision is not performed in a constant expression ([class.copy.elision]).
9.5.4 Coroutine definitions [dcl.fct.def.coroutine]
- If the search finds any declarations, overload resolution is performed on a function call created by assembling an argument list.The first argument is the amount of space requested, and is a prvalue of type std::size_t.The lvalues are the successive arguments.If no viable function is found ([over.match.viable]), overload resolution is performed again on a function call created by passing just the amount of space required as a prvalue of type std::size_t.
17.12 Coroutines [support.coroutine]
17.12.1 General [support.coroutine.general]
17.12.2 Header <coroutine> synopsis [coroutine.syn]
17.12.3 Coroutine traits [coroutine.traits]
17.12.3.1 General [coroutine.traits.general]
17.12.3.2 Class template coroutine_traits [coroutine.traits.primary]
17.12.4 Class template coroutine_handle [coroutine.handle]
17.12.4.1 General [coroutine.handle.general]
17.12.4.2 Construct/reset [coroutine.handle.con]
constexpr coroutine_handle() noexcept;
constexpr coroutine_handle(nullptr_t) noexcept;
static constexpr coroutine_handle from_promise(Promise& p);
constexpr coroutine_handle& operator=(nullptr_t) noexcept;
17.12.4.3 Conversion [coroutine.handle.conv]
constexpr operator coroutine_handle<>() const noexcept;
17.12.4.4 Export/import [coroutine.handle.export.import]
constexpr void* address() const noexcept;
static constexpr coroutine_handle<> coroutine_handle<>::from_address(void* addr);
static constexpr coroutine_handle<Promise> coroutine_handle<Promise>::from_address(void* addr);
17.12.4.5 Observers [coroutine.handle.observers]
constexpr explicit operator bool() const noexcept;
constexpr bool done() const;
17.12.4.6 Resumption [coroutine.handle.resumption]
constexpr void operator()() const;
constexpr void resume() const;
constexpr void destroy() const;
17.12.4.7 Promise access [coroutine.handle.promise]
constexpr Promise& promise() const;
17.12.4.8 Comparison operators [coroutine.handle.compare]
constexpr bool operator==(coroutine_handle<> x, coroutine_handle<> y) noexcept;
constexpr strong_ordering operator<=>(coroutine_handle<> x, coroutine_handle<> y) noexcept;
17.12.4.9 Hash support [coroutine.handle.hash]
template<class P> struct hash<coroutine_handle<P>>;
17.12.5 No-op coroutines [coroutine.noop]
17.12.5.1 Class noop_coroutine_promise [coroutine.promise.noop]
struct noop_coroutine_promise {};
17.12.5.2 Class coroutine_handle<noop_coroutine_promise> [coroutine.handle.noop]
17.12.5.2.1 General [coroutine.handle.noop.general]
17.12.5.2.2 Conversion [coroutine.handle.noop.conv]
constexpr operator coroutine_handle<>() const noexcept;
17.12.5.2.3 Observers [coroutine.handle.noop.observers]
constexpr explicit operator bool() const noexcept;
constexpr bool done() const noexcept;
17.12.5.2.4 Resumption [coroutine.handle.noop.resumption]
constexpr void operator()() const noexcept;
constexpr void resume() const noexcept;
constexpr void destroy() const noexcept;
17.12.5.2.5 Promise access [coroutine.handle.noop.promise]
constexpr noop_coroutine_promise& promise() const noexcept;
17.12.5.2.6 Address [coroutine.handle.noop.address]
constexpr void* address() const noexcept;
17.12.5.3 Function noop_coroutine [coroutine.noop.coroutine]
constexpr noop_coroutine_handle noop_coroutine() noexcept;
17.12.6 Trivial awaitables [coroutine.trivial.awaitables]
Feature test macros
15.11 Predefined macro names [cpp.predefined]
__cpp_constexpr_coroutines 2025??L
17.3.2 Header <version> synopsis [version.syn]
#define __cpp_lib_constexpr_coroutines 2025??L // also in <coroutine>