define_static_string and define_static_array
Document #: | P3491R0 |
Date: | 2024-11-05 |
Project: | Programming Language C++ |
Audience: | LEWG |
Reply-to: | Wyatt Childers <wcc@edg.com>, Peter Dimov <pdimov@gmail.com>, Dan Katz <dkatz85@bloomberg.net>, Barry Revzin <barry.revzin@gmail.com>, Daveed Vandevoorde <daveed@edg.com> |
These functions were originally proposed as part of [P2996R7], but are being split off into their own paper.
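There are situations where it is useful to take a string (or array) from compile time and promote it to static storage for use at runtime. We currently have neither non-transient constexpr allocation nor general support for container types as non-type template parameters.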
If we had non-transient constexpr allocation, we could just directly declare a static constexpr variable. And if we could use container types like std::string and std::vector<T> as non-type template parameter types, then we would use those directly too.
But until we have such a language solution, people have over time come up with their own workarounds. For instance, Jason Turner in a recent talk presents what he calls the “constexpr two-step.” It’s a useful pattern, although limited and cumbersome (it also requires specifying a maximum capacity).
Similarly, the lack of general support for non-type template parameters means we couldn’t have a std::string template parameter (even if we had non-transient constexpr allocation), but promoting the contents of a string to an external linkage, static storage duration array of const char means that you can use a pointer to that array as a non-type template parameter just fine.
So having facilities to solve these problems until the general language solution arises is very valuable.
This paper proposes two new additions, std::define_static_string and std::define_static_array, as well as a helper function for dealing with string literals:

    namespace std {
      consteval auto is_string_literal(char const* p) -> bool;
      consteval auto is_string_literal(char8_t const* p) -> bool;

      template <ranges::input_range R> // only if the value_type is char or char8_t
      consteval auto define_static_string(R&& r) -> ranges::range_value_t<R> const*;

      template <ranges::input_range R>
      consteval auto define_static_array(R&& r) -> span<ranges::range_value_t<R> const>;
    }
is_string_literal takes a pointer to either char const or char8_t const. If it’s a pointer to either a string literal V or a subobject thereof, these functions return true. Otherwise, they return false. Note that we can’t necessarily return a pointer to the start of the string literal, because string literals can overlap, and in that case it is unclear which pointer to return.
define_static_string is limited to ranges over char or char8_t and returns a char const* or char8_t const*, respectively. It returns a pointer instead of a string_view (or u8string_view) specifically to make it clear that the result is null-terminated. If define_static_string is passed a string literal that is already null-terminated, the result will not be doubly null-terminated.
define_static_array exists to handle the general case for other types. It has to return a span so that the caller knows how long the result is. This function requires that the underlying type T be copyable, but does not mandate structural.
Technically, define_static_array can be used to implement define_static_string:

    consteval auto define_static_string(string_view str) -> char const* {
        return define_static_array(views::concat(str, views::single('\0'))).data();
    }
But that’s a fairly awkward implementation, and the string use-case is sufficiently common as to merit a more ergonomic solution.
Consider the existence of template <char const*> struct C; and the following two translation units:
TU #1 | TU #2 |
---|---|
|
|
In the specification in [P2996R7], the results of define_static_string were allowed to overlap. That is, a possible result of this program could be:
TU #1 | TU #2 |
---|---|
|
|
This means whether c2 and c4 have the same type is unspecified. They could have the same type if the implementation chooses to not overlap (or no overlap is possible). Or they could have different types.
They would have the same type if the implementation produced a distinct array for each value, more like this (as suggested by [P0424R2]):
TU #1 | TU #2 |
---|---|
|
|
We think that ensuring template argument equivalence is more valuable than the potential size savings from overlap, so this paper guarantees it.
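To see why the identity of the underlying array matters, here is a small standard C++ sketch that does not use define_static_string at all. The array names dup_a and dup_b are hypothetical; they stand in for two arrays that an implementation might produce for the same contents.

```cpp
#include <type_traits>

template <char const* P> struct C {};

// Two distinct arrays that happen to hold the same characters.
inline constexpr char dup_a[] = "dup";
inline constexpr char dup_b[] = "dup";

// Equivalence of pointer template arguments is decided by which object
// is pointed to, not by the characters stored there.
static_assert(std::is_same_v<C<dup_a>, C<dup_a>>);
static_assert(!std::is_same_v<C<dup_a>, C<dup_b>>);
```

Guaranteeing one array per distinct string value is what makes equal calls to define_static_string name the same specialization of C.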
For define_static_array, if the underlying type T is not structural, this isn’t actually feasible: how would we know how to return the same array? If T is structural, we can easily ensure that equal invocations produce the same span result.

But if T is not structural, we have a problem, because T* is, regardless. So we have to answer the question of what to do with:

    template <auto V> struct C { };

    C<define_static_array(r).data()> c1;
    C<define_static_array(r).data()> c2;
Either:

1. c1 and c2 have the same type.
2. define_static_array works, but the resulting pointer is not usable as a non-type template argument (in the same way that string literals are not).
3. define_static_array mandates that the underlying type is structural.

None of these options is particularly appealing. The last prevents some very motivating use-cases, since neither span nor string_view is a structural type yet, which means you cannot reify a vector<string> into a span<string_view>, but hopefully that can be resolved soon ([P3380R0]). You can at least reify it into a span<char const*>.
For now, this paper proposes the last option, as it’s the simplest (and the relative cost will hopefully decrease over time). Allowing the call but rejecting use as non-type template parameters is appealing though.
define_static_string can be nearly implemented with the facilities in [P2996R7]; we just need is_string_literal to handle the different signature proposed in this paper. define_static_array for structural types is similar, but for non-structural types requires a compiler intrinsic:

    template <auto V>
    inline constexpr auto __array = V.data();

    template <size_t N, class T, class R>
    consteval auto define_static_string_impl(R& r) -> T const* {
        array<T, N+1> arr;
        ranges::copy(r, arr.data());
        arr[N] = '\0'; // null terminator
        return extract<T const*>(substitute(^^__array, {meta::reflect_value(arr)}));
    }

    template <ranges::input_range R>
    consteval auto define_static_string(R&& r) -> ranges::range_value_t<R> const* {
        using T = ranges::range_value_t<R>;
        static_assert(std::same_as<T, char> or std::same_as<T, char8_t>);

        if constexpr (not ranges::forward_range<R>) {
            return define_static_string(ranges::to<std::vector>(r));
        } else {
            if constexpr (requires { is_string_literal(r); }) {
                // if it's an array, check if it's a string literal and adjust accordingly
                if (is_string_literal(r)) {
                    return define_static_string(basic_string_view(r));
                }
            }

            auto impl = extract<auto(*)(R&) -> T const*>(
                substitute(^^define_static_string_impl,
                           {
                               meta::reflect_value(ranges::distance(r)),
                               ^^T,
                               remove_reference(^^R)
                           }));
            return impl(r);
        }
    }
Demo.
Note that this implementation gives the guarantee we talked about in the previous section. Two invocations of define_static_string with the same contents will both end up returning a pointer into the same specialization of the (external linkage) variable template __array<V>. We rely on the mangling of V (and std::array is a structural type if T is, which char and char8_t are) to ensure this for us.
    template <const char *P> struct C { };

    const char msg[] = "strongly in favor"; // just an idea..

    C<msg> c1;                          // ok
    C<"nope"> c2;                       // ill-formed
    C<define_static_string("yay")> c3;  // ok
In the absence of general support for non-transient constexpr allocation, such a facility is essential to building utilities like pretty printers.
An example of such an interface might be built as follows:

    template <std::meta::info R> requires (is_value(R))
    consteval auto render() -> std::string;

    template <std::meta::info R> requires (is_type(R))
    consteval auto render() -> std::string;

    template <std::meta::info R> requires (is_variable(R))
    consteval auto render() -> std::string;

    // ...

    template <std::meta::info R>
    consteval auto pretty_print() -> std::string_view {
        return define_static_string(render<R>());
    }
This strategy lies at the core of how the Clang/P2996 fork builds its example implementation of the display_string_of metafunction.
In the Jason Turner talk cited earlier, he demonstrates an example of taking a function that produces a vector<string> and promoting that into static storage, in a condensed way, so that the function

    constexpr std::vector<std::string> get_strings() {
        return {"Jason", "Was", "Here"};
    }

gets turned into an array of string views. We could do that fairly straightforwardly, without even needing to take the function get_strings() as a template parameter:

    consteval auto promote_strings(std::vector<std::string> vs)
        -> std::span<std::string_view const>
    {
        // promote the concatenated strings to static storage
        std::string_view promoted = std::define_static_string(
            std::ranges::fold_left(vs, std::string(), std::plus()));

        // now build up all our string views into promoted
        std::vector<std::string_view> views;
        for (size_t offset = 0; std::string const& s : vs) {
            views.push_back(promoted.substr(offset, s.size()));
            offset += s.size();
        }

        // promote our array of string_views
        return std::define_static_array(views);
    }

    constexpr auto views = promote_strings(get_strings());
Or at least, this will work once string_view becomes structural. Until then, this can be worked around with a structural_string_view type that just has public members for the data and length with an implicit conversion to string_view.
Something like this ([P1306R2]) is not doable without non-transient constexpr allocation:

    constexpr auto f() -> std::vector<int> { return {1, 2, 3}; }

    consteval void g() {
        template for (constexpr int I : f()) {
            // doesn't work
        }
    }
But if we promote the contents of f() first, then this would work fine:

    consteval void g() {
        template for (constexpr int I : define_static_array(f())) {
            // ok!
        }
    }
A number of other papers have been brought up as being related to this problem, so let’s just enumerate them.
- [P3094R5] proposes std::basic_fixed_string<char, N>. It exists to solve the problem that C<"hello"> needs support right now. Nothing in this paper would make C<"hello"> work, although it might affect the way that you would implement the type that makes it work.
- [P3380R0] proposes extending non-type template parameter support, which could make std::string usable as a non-type template parameter. But without non-transient constexpr allocation, this doesn’t obviate the need for this paper.

Given non-transient allocation and a std::string and std::vector that are usable as non-type template parameters, this paper likely becomes unnecessary. Or at least, fairly trivial:

    template <auto V>
    inline constexpr auto __S = V.c_str();

    template <ranges::input_range R>
    consteval auto define_static_string(R&& r) -> ranges::range_value_t<R> const* {
        using T = ranges::range_value_t<R>;
        static_assert(std::same_as<T, char> or std::same_as<T, char8_t>);

        auto S = ranges::to<basic_string<T>>(r);
        return extract<T const*>(substitute(^^__S, {meta::reflect_value(S)}));
    }
The more interesting paper is actually [P0424R2]. If we bring that paper back, then extend the normalization model described in [P3380R0] so that string literals are normalized to external linkage arrays as demonstrated in this paper, then it’s possible that [P3094R5] becomes obsolete instead, since then you could just take char const* template parameters and define_static_string would become a mechanism for producing new string literals.
Add to [meta.syn]:
    namespace std {
    + // [meta.string.literal], checking string literals
    + consteval bool is_string_literal(const char* p);
    + consteval bool is_string_literal(const char8_t* p);

    + // [meta.define.static], promoting to runtime storage
    + template <ranges::input_range R>
    +   consteval const ranges::range_value_t<R>* define_static_string(R&& r);
    + template <ranges::input_range R>
    +   consteval span<const ranges::range_value_t<R>> define_static_array(R&& r);
    }
Add to the new clause [meta.string.literal]:
    consteval bool is_string_literal(const char* p);
    consteval bool is_string_literal(const char8_t* p);

1 Returns: If p points to a string literal or a subobject thereof, true. Otherwise, false.
Add to the new clause [meta.define.static]:
1 The functions in this clause are useful for promoting compile-time storage into runtime storage.
    template <ranges::input_range R>
      consteval const ranges::range_value_t<R>* define_static_string(R&& r);

2 Let CharT be ranges::range_value_t<R>.

3 Mandates: CharT is either char or char8_t.

4 Let Str be the variable template

    template <class T, T... Vs>
    inline constexpr T Str[] = {Vs..., T{}}; // exposition-only

5 Let V be the pack of elements of type CharT in r. If r is a string literal, then V does not include the trailing null terminator of r.

6 Returns: Str<CharT, V...>.

    template <ranges::input_range R>
      consteval span<const ranges::range_value_t<R>> define_static_array(R&& r);

7 Let T be ranges::range_value_t<R>.

8 Mandates: T is a structural type ([temp.param]) and constructible_from<T, ranges::range_reference_t<R>> is true and copy_constructible<T> is true.

9 Let Arr be the variable template

    template <class T, T... Vs>
    inline constexpr T Arr[] = {Vs...}; // exposition-only

10 Let V be the pack of elements of type T constructed from the elements of r.

11 Returns: span(Arr<T, V...>).
Add to 17.3.2 [version.syn]:
#define __cpp_lib_define_static 2024XX // freestanding, also in <meta>