Sequential hexadecimal digits

Document number: D4039R0
Date: 2026-02-28
Audience: SG16
Project: ISO/IEC 14882 Programming Languages — C++, ISO/IEC JTC1/SC22/WG21
Reply-to: Jan Schultke <janschultke@gmail.com>
GitHub Issue: wg21.link/P4039/github
Source: github.com/eisenwave/cpp-proposals/blob/master/src/sequential-hex-digits.cow

C2y provides the guarantee that the blocks of characters 'a'..'f' and 'A'..'F' are contiguous. C++ should inherit the same guarantee.

Contents

1. Introduction
2. Motivation
3. Design
4. Impact
5. Wording
6. References

1. Introduction

Among other guarantees, [lex.charset] paragraph 5 ensures the following:

The code unit value of each decimal digit character after the digit 0 (U+0030) is one greater than the value of the previous.

The guarantee is useful because it allows the numeric value of a decimal digit character c to be computed portably as c - '0'.
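A minimal sketch of code that relies on the decimal digit guarantee, using the familiar c - '0' idiom. The function name decimal_digit_value is hypothetical and chosen only for illustration.

```cpp
// Sketch: relies only on the existing [lex.charset] guarantee that
// '0'..'9' have consecutive, ascending code unit values.
// decimal_digit_value is a hypothetical name used for illustration.
constexpr int decimal_digit_value(char c) {
    return c >= '0' && c <= '9' ? c - '0' : -1;
}

static_assert(decimal_digit_value('0') == 0);
static_assert(decimal_digit_value('7') == 7);
static_assert(decimal_digit_value('x') == -1);
```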

Unfortunately, no similar guarantee is provided for other characters, which makes it non-portable to e.g. test whether a character c is a lower-case letter using c >= 'a' && c <= 'z'. This test only works for ASCII-compatible encodings such as UTF-8, and C++ supports encodings such as EBCDIC, where letters are not contiguous.

However, [N3192] Sequential Hexdigits observed that even in EBCDIC, there are blocks of 8 or 9 contiguous letters in the invariant subset. That is, 'a'..'i', 'j'..'r', and 's'..'z' are contiguous blocks; this is analogous for upper-case letters. [N3192] has been merged into the C2y draft, providing the guarantee that 'a'..'f' is a contiguous block, which is not quite as strong as EBCDIC would allow. At least the C2y guarantee should be provided in C++ as well.
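To make the shape of these blocks concrete, the sketch below hard-codes the lower-case letter code units of EBCDIC code page 037 as I understand them ('a' = 0x81, 'j' = 0x91, 's' = 0xA2) and counts the maximal contiguous runs. The table and the helper name count_contiguous_blocks are assumptions made for illustration and are not taken from [N3192].

```cpp
#include <array>
#include <cstddef>

// Assumed code units of 'a'..'z' in EBCDIC code page 037 (illustrative table).
constexpr std::array<unsigned char, 26> ebcdic_lower = {
    0x81, 0x82, 0x83, 0x84, 0x85, 0x86, 0x87, 0x88, 0x89, // a..i
    0x91, 0x92, 0x93, 0x94, 0x95, 0x96, 0x97, 0x98, 0x99, // j..r
    0xA2, 0xA3, 0xA4, 0xA5, 0xA6, 0xA7, 0xA8, 0xA9        // s..z
};

// Counts maximal runs in which each value is one greater than the previous.
constexpr int count_contiguous_blocks(const std::array<unsigned char, 26>& units) {
    int blocks = 1;
    for (std::size_t i = 1; i < units.size(); ++i) {
        if (units[i] != units[i - 1] + 1) {
            ++blocks;
        }
    }
    return blocks;
}

// Three blocks overall, and 'a'..'f' lies entirely within the first one.
static_assert(count_contiguous_blocks(ebcdic_lower) == 3);
static_assert(ebcdic_lower[5] - ebcdic_lower[0] == 5);
```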

2. Motivation

Having the guarantee that the letters 'a'..'f' form a contiguous block is mainly useful for working with hexadecimal digits.

The following function can be used to portably compute the integer value of a hexadecimal digit in C2y, but not in C++:

    int hex_digit_value(char c) {
        return c >= '0' && c <= '9' ? c - '0'       // hex_digit_value('9') → 9
             : c >= 'a' && c <= 'f' ? c - 'a' + 10  // hex_digit_value('f') → 15
             : c >= 'A' && c <= 'F' ? c - 'A' + 10  // hex_digit_value('F') → 15
             : -1;                                  // hex_digit_value('?') → -1
    }
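As a usage sketch, a function like the one above could serve as the building block for parsing a whole hexadecimal string. The helper parse_hex below is hypothetical, does not detect overflow, and is only portable under the contiguity guarantee discussed in this paper.

```cpp
#include <optional>
#include <string_view>

constexpr int hex_digit_value(char c) {
    return c >= '0' && c <= '9' ? c - '0'
         : c >= 'a' && c <= 'f' ? c - 'a' + 10
         : c >= 'A' && c <= 'F' ? c - 'A' + 10
         : -1;
}

// Hypothetical helper: parses a non-empty string of hexadecimal digits.
// Returns std::nullopt for empty input or on the first non-hex character.
// Overflow is not detected in this sketch (unsigned arithmetic wraps).
constexpr std::optional<unsigned long long> parse_hex(std::string_view text) {
    if (text.empty()) {
        return std::nullopt;
    }
    unsigned long long result = 0;
    for (char c : text) {
        const int digit = hex_digit_value(c);
        if (digit < 0) {
            return std::nullopt;
        }
        result = result * 16 + static_cast<unsigned long long>(digit);
    }
    return result;
}

static_assert(parse_hex("ff") == 255);
static_assert(parse_hex("1A") == 26);
static_assert(!parse_hex("xyz").has_value());
```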

There is likely a large amount of C++ code which already relies on the contiguity of that range, written by authors who are possibly unaware of or uninterested in the lack of portability. Hexadecimal digit letters are uniquely interesting because of how frequently they are used and how obviously useful contiguity is.

There is substantially less motivation for the other two EBCDIC letter blocks. While the guarantee allows implementing a test for whether a char is a lowercase letter using three range checks, such an implementation would likely perform worse than a bitset lookup anyway. Locale-specific character tests should either be done using standard library functions, or the user should static_assert that their encoding is ASCII-based.
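A minimal sketch of the last two points: a static_assert that rejects encodings which are not ASCII-based, followed by a table lookup for lower-case letters. The static_assert only checks a few representative characters, which is an approximation rather than a complete test. The sketch uses a std::array<bool, 256> instead of an actual bitset for simplicity and assumes 8-bit char. The names make_is_lower_table, is_lower_table, and is_ascii_lower are hypothetical.

```cpp
#include <array>

// Approximate check that the ordinary literal encoding is ASCII-based.
static_assert('a' == 0x61 && 'z' == 0x7A && 'A' == 0x41 && '0' == 0x30,
              "this code assumes an ASCII-based ordinary literal encoding");

// Builds a 256-entry table at compile time; entry i is true iff i is 'a'..'z'.
constexpr std::array<bool, 256> make_is_lower_table() {
    std::array<bool, 256> table{};
    for (int i = 'a'; i <= 'z'; ++i) {
        table[static_cast<unsigned char>(i)] = true;
    }
    return table;
}

inline constexpr std::array<bool, 256> is_lower_table = make_is_lower_table();

// A single table lookup replaces the range checks.
constexpr bool is_ascii_lower(char c) {
    return is_lower_table[static_cast<unsigned char>(c)];
}

static_assert(is_ascii_lower('q'));
static_assert(!is_ascii_lower('Q'));
static_assert(!is_ascii_lower('{'));
```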

3. Design

The proposed change is to guarantee that the blocks of letters 'a'..'f' and 'A'..'F' are contiguous, similar to '0'..'9'.
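Expressed as code, the proposed guarantee means that checks like the following would have to pass on every conforming implementation. This is only a sketch; is_contiguous_ascending is a hypothetical helper and not part of the proposed wording.

```cpp
#include <cstddef>
#include <string_view>

// True if each character's code unit value is exactly one greater than
// that of the character before it.
constexpr bool is_contiguous_ascending(std::string_view chars) {
    for (std::size_t i = 1; i < chars.size(); ++i) {
        if (chars[i] != chars[i - 1] + 1) {
            return false;
        }
    }
    return true;
}

static_assert(is_contiguous_ascending("0123456789")); // already guaranteed
static_assert(is_contiguous_ascending("abcdef"));     // proposed
static_assert(is_contiguous_ascending("ABCDEF"));     // proposed
```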

Due to the lack of motivation mentioned above, and out of caution not to provide more guarantees than C2y, no guarantee for other blocks of letters is proposed, despite EBCDIC seemingly allowing for a stronger guarantee.

4. Impact

To my understanding and to the understanding of WG14, existing code does not change behavior, and no implementation is affected. For all intents and purposes, the proposed change exists only on paper.

However, the proposed change makes it impossible to create a hypothetical future C++ implementation where 'a'..'f' is not contiguous. This seems like an acceptable sacrifice, especially considering that such a C++ implementation would be incompatible with C.

5. Wording

The changes are relative to [N5032].

A feature-test macro is deliberately omitted.

Change [lex.charset] paragraph 5 as follows:

A literal encoding or a locale-specific encoding of one of the execution character sets ([character.seq]) encodes each element of the basic literal character set as a single code unit with non-negative value, distinct from the code unit for any other such element.

[Note: A character not in the basic literal character set can be encoded with more than one code unit; the value of such a code unit can be the same as that of a code unit for an element of the basic literal character set. — end note]

The U+0000 NULL character is encoded as the value 0. No other element of the translation character set is encoded with a code unit of value 0. [Deleted: The code unit value of each decimal digit character after the digit 0 (U+0030) is one greater than the value of the previous.] [Inserted: The code unit values of characters in any of the ranges 0..9 (U+0030..U+0039), A..F (U+0041..U+0046), or a..f (U+0061..U+0066) are contiguous and ascending.] The ordinary and wide literal encodings are otherwise implementation-defined. For a UTF-8, UTF-16, or UTF-32 literal, the implementation shall encode the Unicode scalar value corresponding to each character of the translation character set as specified in the Unicode Standard for the respective Unicode encoding form.

6. References

[N3192] Alex Celeste. Sequential Hexdigits 2023-11-30 https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3192.pdf
[N5032] Thomas Köppe. Working Draft, Programming Languages — C++ 2025-12-15 https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2025/n5032.pdf