define_static_string and define_static_array

Document #: P3491R0
Date: 2024-11-05
Project: Programming Language C++
Audience: LEWG
Reply-to: Wyatt Childers
Peter Dimov
Dan Katz
Barry Revzin
Daveed Vandevoorde

1 Introduction

These functions were originally proposed as part of [P2996R7], but are being split off into their own paper.

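There are situations where it is useful to take a string (or array) computed at compile time and promote it to static storage for use at runtime. We currently have neither non-transient constexpr allocation ([P1974R0], [P2670R1]) nor general support for class types as non-type template parameters ([P2484R0], [P3380R0]).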

If we had non-transient constexpr allocation, we could just directly declare a static constexpr variable. And if we could use container types like std::string and std::vector<T> as non-type template parameter types, then we would use those directly too.

But until we have such a language solution, people have over time come up with their own workarounds. For instance, Jason Turner in a recent talk presents what he calls the “constexpr two-step.” It’s a useful pattern, although limited and cumbersome (it also requires specifying a maximum capacity).

Similarly, the lack of general support for non-type template parameters means we couldn’t have a std::string template parameter (even if we had non-transient constexpr allocation), but promoting the contents of a string to an external linkage, static storage duration array of const char means that you can use a pointer to that array as a non-type template parameter just fine.

So having facilities to solve these problems until the general language solution arrives is very valuable.

2 Proposal

This paper proposes two additions, std::define_static_string and std::define_static_array, along with a helper function for dealing with string literals:

namespace std {
  consteval auto is_string_literal(char const* p) -> bool;
  consteval auto is_string_literal(char8_t const* p) -> bool;

  template <ranges::input_range R> // only if the value_type is char or char8_t
  consteval auto define_static_string(R&& r) -> ranges::range_value_t<R> const*;

  template <ranges::input_range R>
  consteval auto define_static_array(R&& r) -> span<ranges::range_value_t<R> const>;
}

is_string_literal takes a pointer to either char const or char8_t const. If it points to a string literal or a subobject thereof, the function returns true. Otherwise, it returns false. Note that it cannot instead return a pointer to the start of the string literal, because string literals may overlap: which literal's start would it return?

define_static_string is limited to ranges over char or char8_t and returns a char const* or char8_t const*, respectively. It returns a pointer instead of a string_view (or u8string_view) specifically to make it clear that the result is null-terminated. If define_static_string is passed a string literal, which is already null-terminated, the result will not be doubly null-terminated.

define_static_array exists to handle the general case for other types, and has to return a span so that the caller knows how long the result is. This function requires that the underlying type T be copyable and, as discussed in 2.1 and reflected in the wording, that it be structural.

Technically, define_static_array can be used to implement define_static_string:

consteval auto define_static_string(string_view str) -> char const* {
  return define_static_array(views::concat(str, views::single('\0'))).data();
}

But that’s a fairly awkward implementation, and the string use-case is sufficiently common as to merit a more ergonomic solution.

2.1 To Overlap or Not To Overlap

Consider the existence of template <char const*> struct C; and the following two translation units:

// TU #1
C<define_static_string("dedup")> c1;
C<define_static_string("dup")> c2;

// TU #2
C<define_static_string("holdup")> c3;
C<define_static_string("dup")> c4;

In the specification in [P2996R7], the results of define_static_string were allowed to overlap. That is, a possible result of this program could be:

// TU #1
inline char const __arr_dedup[] = "dedup";
C<__arr_dedup> c1;
C<__arr_dedup + 2> c2;

// TU #2
inline char const __arr_holdup[] = "holdup";
C<__arr_holdup> c3;
C<__arr_holdup + 3> c4;

This means whether c2 and c4 have the same type is unspecified. They could have the same type if the implementation chooses to not overlap (or no overlap is possible). Or they could have different types.

They would have the same type if the implementation produced a distinct array for each value, more like this (as suggested by [P0424R2]):

// TU #1
inline char const __arr_dedup[] = "dedup";
inline char const __arr_dup[] = "dup";
C<__arr_dedup> c1;
C<__arr_dup> c2;

// TU #2
inline char const __arr_holdup[] = "holdup";
inline char const __arr_dup[] = "dup";
C<__arr_holdup> c3;
C<__arr_dup> c4;

We think ensuring template argument equivalence is more valuable than the potential size savings from overlap, so this paper guarantees that equal contents produce the same result.
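The guarantee can be approximated in today's C++ within a single translation unit. The sketch below is modeled on the exposition-only Str variable template from the wording, with an explicit array bound added here for illustration. C1, C2 and C4 are alias names chosen to mirror c1, c2 and c4 above.

```cpp
#include <type_traits>

// One array per distinct sequence of characters, plus a null terminator.
template <class T, T... Vs>
inline constexpr T Str[sizeof...(Vs) + 1] = {Vs..., T{}};

template <char const* P> struct C {};

using C1 = C<Str<char, 'd', 'e', 'd', 'u', 'p'>>;
using C2 = C<Str<char, 'd', 'u', 'p'>>;
using C4 = C<Str<char, 'd', 'u', 'p'>>;

// Equal contents name the same specialization, hence the same array,
// hence the same template argument.
static_assert(std::is_same_v<C2, C4>);
// "dup" is never a pointer into the tail of "dedup".
static_assert(!std::is_same_v<C1, C2>);
```

Because each distinct character sequence names exactly one specialization of an inline variable template, the same reasoning extends across translation units.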

For define_static_array, if the underlying type T is not structural, this isn’t actually feasible: how would we know how to return the same array? If T is structural, we can easily ensure that equal invocations produce the same span result.

But if T is not structural, we have a problem, because T* is, regardless. So we have to answer the question of what to do with:

template <auto V> struct C { };

C<define_static_array(r).data()> c1;
C<define_static_array(r).data()> c2;

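Either:

- the call works and each invocation produces a distinct array, so c1 and c2 have different types,
- the call works, but the resulting pointer is not usable as a non-type template argument (in the same way that string literals are not), or
- the call is rejected: define_static_array mandates that the underlying type is structural.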

None of these options is particularly appealing. The last prevents some very motivating use-cases, since neither span nor string_view is a structural type yet. That means you cannot reify a vector<string> into a span<string_view>, but hopefully that can be resolved soon ([P3380R0]). You can at least reify it into a span<char const*>.

For now, this paper proposes the last option, as it’s the simplest (and the relative cost will hopefully decrease over time). Allowing the call but rejecting use as non-type template parameters is appealing though.

2.2 Possible Implementation

define_static_string can be nearly implemented with the facilities in [P2996R7]; we just need is_string_literal to handle the different signature proposed in this paper.

define_static_array for structural types is similar, but for non-structural types it requires a compiler intrinsic. A sketch of define_static_string:

template <auto V>
inline constexpr auto __array = V.data();

template <size_t N, class T, class R>
consteval auto define_static_string_impl(R& r) -> T const* {
    array<T, N+1> arr;
    ranges::copy(r, arr.data());
    arr[N] = '\0'; // null terminator
    return extract<T const*>(substitute(^^__array, {meta::reflect_value(arr)}));
}

template <ranges::input_range R>
consteval auto define_static_string(R&& r) -> ranges::range_value_t<R> const* {
    using T = ranges::range_value_t<R>;
    static_assert(std::same_as<T, char> or std::same_as<T, char8_t>);

    if constexpr (not ranges::forward_range<R>) {
        return define_static_string(ranges::to<std::vector>(r));
    } else {
        if constexpr (requires { is_string_literal(r); }) {
            // if it's an array, check if it's a string literal and adjust accordingly
            if (is_string_literal(r)) {
                return define_static_string(basic_string_view(r));
            }
        }

        auto impl = extract<auto(*)(R&) -> T const*>(
            substitute(^^define_static_string_impl,
                       {
                           meta::reflect_value(ranges::distance(r)),
                           ^^T,
                           remove_reference(^^R)
                       }));
        return impl(r);
    }
}

Note that this implementation gives the guarantee we talked about in the previous section. Two invocations of define_static_string with the same contents will both end up returning a pointer into the same specialization of the (external linkage) variable template __array<V>. We rely on the mangling of V to ensure this for us: std::array<T, N> is a structural type if T is, and both char and char8_t are.

2.3 Examples

2.3.1 Use as non-type template parameter

template <const char *P> struct C { };

const char msg[] = "strongly in favor";  // just an idea..

C<msg> c1;                          // ok
C<"nope"> c2;                       // ill-formed
C<define_static_string("yay")> c3;  // ok

2.3.2 Pretty-printing

In the absence of general support for non-transient constexpr allocation, such a facility is essential to building utilities like pretty printers.

An example of such an interface might be built as follows:

template <std::meta::info R> requires is_value(R)
  consteval auto render() -> std::string;

template <std::meta::info R> requires is_type(R)
  consteval auto render() -> std::string;

template <std::meta::info R> requires is_variable(R)
  consteval auto render() -> std::string;

// ...

template <std::meta::info R>
consteval auto pretty_print() -> std::string_view {
  return define_static_string(render<R>());
}

This strategy lies at the core of how the Clang/P2996 fork builds its example implementation of the display_string_of metafunction.

2.3.3 Promoting Containers

In the Jason Turner talk cited earlier, he demonstrates an example of taking a function that produces a vector<string> and promoting that into static storage, in a condensed way so that the function

constexpr std::vector<std::string> get_strings() {
    return {"Jason", "Was", "Here"};
}

gets turned into an array of string views. We could do that fairly straightforwardly, without even needing to take the function get_strings() as a template parameter:

consteval auto promote_strings(std::vector<std::string> vs)
    -> std::span<std::string_view const>
{
    // promote the concatenated strings to static storage
    std::string_view promoted = std::define_static_string(
        std::ranges::fold_left(vs, std::string(), std::plus()));

    // now build up all our string views into promoted
    std::vector<std::string_view> views;
    for (size_t offset = 0; std::string const& s : vs) {
        views.push_back(promoted.substr(offset, s.size()));
        offset += s.size();
    }

    // promote our array of string_views
    return std::define_static_array(views);
}

constexpr auto views = promote_strings(get_strings());

Or at least, this will work once string_view becomes structural. Until then, this can be worked around with a structural_string_view type that just has public members for the data and length with an implicit conversion to string_view.

2.3.4 With Expansion Statements

Something like this ([P1306R2]) is not doable without non-transient constexpr allocation:

constexpr auto f() -> std::vector<int> { return {1, 2, 3}; }

consteval void g() {
    template for (constexpr int I : f()) {
        // doesn't work
    }
}

But if we promote the contents of f() first, then this would work fine:

consteval void g() {
    template for (constexpr int I : define_static_array(f())) {
        // ok!
    }
}
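Expansion statements are not available yet, but the role that static storage plays here can be approximated with an index_sequence. In this sketch the array promoted stands in for the result of define_static_array(f()), and square and sum_of_squares are hypothetical names. The point is that each element is usable as a constant expression, which is what constexpr int I requires.

```cpp
#include <array>
#include <cstddef>
#include <utility>

// Stands in for define_static_array(f()): the elements live in static storage.
inline constexpr std::array<int, 3> promoted{1, 2, 3};

// Something that needs its argument as a template argument.
template <int I>
inline constexpr int square = I * I;

// Expand over the elements, using each one as a template argument.
template <std::size_t... Is>
constexpr int sum_of_squares(std::index_sequence<Is...>) {
    return (square<promoted[Is]> + ... + 0);
}

inline constexpr int total =
    sum_of_squares(std::make_index_sequence<promoted.size()>{});
static_assert(total == 14);  // 1 + 4 + 9
```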

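2.4 Related Papers

A number of other papers have been brought up as being related to this problem, so let's just enumerate them:

- [P1974R0] and [P2670R1] address non-transient constexpr allocation.
- [P2484R0] and [P3380R0] address extending support for class types as non-type template parameters.
- [P3094R5] proposes std::basic_fixed_string.
- [P0424R2] proposes string literals as non-type template parameters.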

Given non-transient allocation and a std::string and std::vector that are usable as non-type template parameters, this paper likely becomes unnecessary. Or at least, fairly trivial:

template <auto V>
inline constexpr auto __S = V.c_str();

template <ranges::input_range R>
consteval auto define_static_string(R&& r) -> ranges::range_value_t<R> const* {
    using T = ranges::range_value_t<R>;
    static_assert(std::same_as<T, char> or std::same_as<T, char8_t>);

    auto S = ranges::to<basic_string<T>>(r);
    return extract<T const*>(substitute(^^__S, {meta::reflect_value(S)}));
}

The more interesting paper is actually [P0424R2]. If we bring that paper back, then extend the normalization model described in [P3380R0] so that string literals are normalized to external linkage arrays as demonstrated in this paper, then it's possible that [P3094R5] becomes obsolete instead. You could then just take char const* template parameters, and define_static_string would become a mechanism for producing new string literals.

3 Wording

Add to [meta.syn]:

namespace std {
+ // [meta.string.literal], checking string literals
+ consteval bool is_string_literal(const char* p);
+ consteval bool is_string_literal(const char8_t* p);

+ // [meta.define.static], promoting to runtime storage
+ template <ranges::input_range R>
+   consteval const ranges::range_value_t<R>* define_static_string(R&& r);

+ template <ranges::input_range R>
+   consteval span<const ranges::range_value_t<R>> define_static_array(R&& r);
}

Add to the new clause [meta.string.literal]:

consteval bool is_string_literal(const char* p);
consteval bool is_string_literal(const char8_t* p);

1 Returns: If p points to a string literal or a subobject thereof, true. Otherwise, false.

Add to the new clause [meta.define.static]:

1 The functions in this clause are useful for promoting compile-time storage into runtime storage.

template <ranges::input_range R>
consteval const ranges::range_value_t<R>* define_static_string(R&& r);

2 Let CharT be ranges::range_value_t<R>.

3 Mandates: CharT is either char or char8_t.

4 Let Str be the variable template

template <class T, T... Vs> inline constexpr T Str[] = {Vs..., T{}}; // exposition-only

5 Let V be the pack of elements of type CharT in r. If r is a string literal, then V does not include the trailing null terminator of r.

6 Returns: Str<CharT, V...>.

template <ranges::input_range R>
consteval span<const ranges::range_value_t<R>> define_static_array(R&& r);

7 Let T be ranges::range_value_t<R>.

8 Mandates: T is a structural type ([temp.param]) and constructible_from<T, ranges::range_reference_t<R>> is true and copy_constructible<T> is true.

9 Let Arr be the variable template

template <class T, T... Vs> inline constexpr T Arr[] = {Vs...}; // exposition-only

10 Let V be the pack of elements of type T constructed from the elements of r.

11 Returns: span(Arr<T, V...>).

3.1 Feature-Test Macro

Add to 17.3.2 [version.syn]:

#define __cpp_lib_define_static 2024XX // freestanding, also in <meta>

4 References

[P0424R2] Louis Dionne, Hana Dusíková. 2017-11-14. String literals as non-type template parameters.
https://wg21.link/p0424r2
[P1306R2] Dan Katz, Andrew Sutton, Sam Goodrick, Daveed Vandevoorde. 2024-05-07. Expansion statements.
https://wg21.link/p1306r2
[P1974R0] Jeff Snyder, Louis Dionne, Daveed Vandevoorde. 2020-05-15. Non-transient constexpr allocation using propconst.
https://wg21.link/p1974r0
[P2484R0] Richard Smith. 2021-11-17. Extending class types as non-type template parameters.
https://wg21.link/p2484r0
[P2670R1] Barry Revzin. 2023-02-03. Non-transient constexpr allocation.
https://wg21.link/p2670r1
[P2996R7] Barry Revzin, Wyatt Childers, Peter Dimov, Andrew Sutton, Faisal Vali, Daveed Vandevoorde, Dan Katz. 2024-10-13. Reflection for C++26.
https://wg21.link/p2996r7
[P3094R5] Mateusz Pusz. 2024-10-15. std::basic_fixed_string.
https://wg21.link/p3094r5
[P3380R0] Barry Revzin. 2024-09-10. Extending support for class types as non-type template parameters.
https://wg21.link/p3380r0