C++26: No More UB in Lexing -- Sandor Dargo

Undefined behavior in C++ is a well-known source of headaches for developers, but surprisingly, even lexing had cases of it until now. Thanks to P2621R3 by Corentin Jabot, the standard now specifies what happens with unterminated strings, universal character names (UCNs) formed through macros, and spliced UCNs, aligning it with real-world compiler behavior.

C++26: No More UB in Lexing

by Sandor Dargo

From the article:

If you have ever used C++, you have surely had to face undefined behaviour. Even though it gives implementers extra freedom, developers dread it because it can wreak havoc in your systems, so it's better to avoid it whenever possible.

Surprisingly, even the lexing process in C++ can result in undefined behaviour. Thanks to Corentin Jabot's work on P2621R3, that won't be the case anymore. Because the paper was accepted as a defect report applying back to C++98, you already benefit from it if you use a recent enough compiler.

Truth be told, compilers didn't do anything dangerous. They handled the cases below safely and deterministically, so this change is really about updating the standard to match what implementers already do.

Let’s quickly see the three cases.

Unterminated strings

     // unterminated string used to be UB
     const char * foo = "

Who would have thought that an unterminated string or character literal was UB?! Despite the permissive standard, all major compilers diagnosed it as ill-formed. From now on, the standard says so too.
 

On Trying to Log an Exception as it Leaves your Scope -- Raymond Chen

A customer attempted to log exceptions using a scope_exit handler, expecting it to capture and process exceptions during unwinding. However, the handler crashed. ResultFromCaughtException requires an actively caught exception, and none is available during unwinding, so the program terminated unexpectedly.

On Trying to Log an Exception as it Leaves your Scope

by Raymond Chen

From the article:

A customer wanted to log exceptions that emerged from a function, so they used the WIL scope_exit object to specify a block of code to run during exception unwinding.

void DoSomething()
{
    auto logException = wil::scope_exit([&] {
        Log("DoSomething failed",
            wil::ResultFromCaughtException());
    });

    ⟦ do stuff that might throw exceptions ⟧

    // made it to the end - cancel the logging
    logException.release();
}

They found, however, that instead of logging the exception, the code in the scope_exit was crashing.

They debugged into the ResultFromCaughtException function, which eventually reaches something like this:

try
{
    throw;
}
catch (⟦ blah blah ⟧)
{
    ⟦ blah blah ⟧
}
catch (⟦ blah blah ⟧)
{
    ⟦ blah blah ⟧
}
catch (...)
{
    ⟦ blah blah ⟧
}

The idea is that the code rethrows the exception, then tries to catch it in various ways, and when it is successful, it uses the caught object to calculate a result code.
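The crash follows from how a bare throw; works. It rethrows the currently handled exception, and if there is none, it calls std::terminate. A scope_exit lambda runs in a destructor during unwinding, and that happens before any catch handler has become active. The sketch below uses standard C++ rather than WIL. Every name in it is hypothetical: g_log, describe_caught_exception, unwinding_probe, throw_through_probe and do_something. It shows what a destructor can and cannot observe during unwinding. It also shows one alternative shape that logs from inside a catch block and then rethrows. This is not necessarily the fix the article recommends.

```cpp
#include <exception>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for the customer's Log() sink.
std::vector<std::string> g_log;

// Hypothetical stand-in for the rethrow-and-catch pattern above. It must only
// be called while an exception is being handled (inside a catch block);
// otherwise the bare "throw;" calls std::terminate.
std::string describe_caught_exception()
{
    try {
        throw; // rethrow the currently handled exception
    } catch (const std::exception& e) {
        return e.what();
    } catch (...) {
        return "unknown exception";
    }
}

// Records what a destructor can observe while the stack is unwinding.
struct unwinding_probe {
    bool* had_current_exception;
    int* uncaught_count;
    ~unwinding_probe()
    {
        // No handler is active yet, so there is no "currently handled"
        // exception (unless we are already nested inside another catch).
        *had_current_exception = (std::current_exception() != nullptr);
        // std::uncaught_exceptions() does report that unwinding is underway.
        *uncaught_count = std::uncaught_exceptions();
    }
};

void throw_through_probe(bool& had_current, int& uncaught)
{
    unwinding_probe probe{&had_current, &uncaught};
    throw std::runtime_error("boom");
}

// One alternative shape: log from inside a catch block, then rethrow.
void do_something(bool fail)
{
    try {
        if (fail) {
            throw std::runtime_error("disk full");
        }
    } catch (...) {
        // A handler is active here, so the rethrow inside the helper is safe.
        g_log.push_back("DoSomething failed: " + describe_caught_exception());
        throw; // let the original exception continue to the caller
    }
}
```

The probe reports no current exception even though std::uncaught_exceptions() is nonzero. That gap is exactly what a rethrow-based helper trips over when it is called from a destructor.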

Bit Fields, Byte Order and Serialization -- Wu Yongwei

Network packets can be represented as bit fields. Wu Yongwei explores some issues to be aware of and offers solutions.

Bit Fields, Byte Order and Serialization

by Wu Yongwei

From the article:

In order to store data as efficiently as possible, the C language has supported bit fields since its early days. While saving a few bytes of memory isn't as critical today, bit fields remain widely used in scenarios like network packets. Endianness adds complexity to bit field handling – especially since network packets are typically big-endian, while most modern architectures are little-endian. This article explores these problems and their solutions, including my reflection-based serialization project.

Memory layout of bit fields

The memory layout of bit fields is implementation-defined. In a typical little-endian environment, bit fields start from the lower bits of the lower byte and extend toward higher bits and bytes. In a typical big-endian environment, bit fields start from the higher bits of the lower byte and extend toward lower bits and higher bytes.
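To see which layout your own compiler chooses, you can copy a bit-field struct into raw bytes. The names in this minimal sketch are hypothetical: Nibbles, nibbles_raw_byte and is_little_endian. The expected values in its comments assume the typical little- and big-endian layouts described above, which the standard does not guarantee.

```cpp
#include <cstdint>
#include <cstring>

// Two 4-bit fields sharing one byte. Which nibble each field occupies is
// implementation-defined; this code only observes the compiler's choice.
struct Nibbles {
    unsigned char first  : 4; // declared first
    unsigned char second : 4; // declared second
};

// Returns the raw byte produced by first = 0x1, second = 0x2.
// Typical little-endian ABIs give 0x21 (first field in the low bits);
// typical big-endian ABIs give 0x12 (first field in the high bits).
unsigned char nibbles_raw_byte()
{
    Nibbles n{}; // value-initialized, so unused bits start at zero
    n.first = 0x1;
    n.second = 0x2;
    unsigned char raw = 0;
    std::memcpy(&raw, &n, 1);
    return raw;
}

// Runtime byte-order check that also works before C++20's std::endian.
bool is_little_endian()
{
    const std::uint16_t probe = 1;
    unsigned char low_address_byte = 0;
    std::memcpy(&low_address_byte, &probe, 1);
    return low_address_byte == 1;
}
```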

Let's consider a practical scenario. Suppose we want to use a 32-bit integer to store a date. How should we achieve this? A simple approach is to store the number of days from a fixed point in time (e.g. 1 January 1900). We can calculate the number of years that can be expressed as follows:
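This excerpt does not reproduce the article's own formula. Assuming an unsigned 32-bit day count and an average Gregorian year of 365.2425 days, the range works out to roughly:

```latex
\frac{2^{32}\ \text{days}}{365.2425\ \text{days/year}}
  = \frac{4\,294\,967\,296}{365.2425}
  \approx 11\,759\,221\ \text{years}
```

With a signed count, that is about 5.88 million years on either side of the epoch.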

However, with this approach, extracting specific year, month, and day information becomes very cumbersome. A simpler way is to store the year, month, and day as bit fields. We can define the following struct, using only 32 bits:

  struct Date {
    int      year  : 23;
    unsigned month : 4;
    unsigned day   : 5;
  };

Our intention is to use a 23-bit signed integer for the year (ranging from -4,194,304 to 4,194,303), a 4-bit unsigned integer for the month (0–15, covering legal values 1–12), and a 5-bit unsigned integer for the day (0–31, covering legal values 1–31). This representation is similarly compact, with a slightly narrower range, but it's quite sufficient and much more convenient for many common uses (except for interval calculations).
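The struct is easy to use directly. This minimal sketch adds a hypothetical make_date helper that is not from the article. It shows that the signed year keeps its sign, and that an out-of-range month silently wraps modulo 16 instead of being rejected.

```cpp
// The Date struct from the article: 23 + 4 + 5 = 32 bits of fields.
// Its exact layout (and sizeof) is implementation-defined.
struct Date {
    int      year  : 23; // signed (plain int bit-fields are signed since C++14)
    unsigned month : 4;  // 0..15, legal values 1..12
    unsigned day   : 5;  // 0..31, legal values 1..31
};

// Hypothetical helper, not from the article. It does no validation:
// assigning an out-of-range value to an unsigned bit-field keeps only the
// low bits (the value modulo 2^width).
Date make_date(int year, unsigned month, unsigned day)
{
    Date d{};
    d.year = year;
    d.month = month;
    d.day = day;
    return d;
}
```

For example, make_date(-753, 4, 21) reads back year -753, month 4 and day 21. make_date(2025, 16, 1) ends up with month 0, so any validation has to happen before the assignment.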

Creating a Generic Insertion Iterator, Part 2 -- Raymond Chen

Last time, our generic insertion iterator failed to meet the requirements of default constructibility and assignability because it stored the lambda directly. To fix this, we now store a pointer to the lambda instead, ensuring the iterator meets standard requirements while still allowing flexible insertion logic.

Creating a Generic Insertion Iterator, Part 2

by Raymond Chen

From the article:

Last time, we tried to create a generic insertion iterator but ran into trouble because our iterator failed to satisfy the iterator requirements of default constructibility and assignability.

We ran into this problem because we stored the lambda as a member of the iterator.

So let’s not do that!

Instead of saving the lambda, we’ll just save a pointer to the lambda.

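#include <iterator>
#include <memory>
#include <type_traits>
#include <utility>

template<typename Lambda>
struct generic_output_iterator
{
    using iterator_category = std::output_iterator_tag;
    using value_type = void;
    using pointer = void;
    using reference = void;
    using difference_type = void;

    // A null pointer makes default construction trivial.
    generic_output_iterator() noexcept = default;

    generic_output_iterator(Lambda&& lambda) noexcept :
        insert(std::addressof(lambda)) {}

    generic_output_iterator& operator*() noexcept
        { return *this; }
    generic_output_iterator& operator++() noexcept
        { return *this; }
    generic_output_iterator& operator++(int) noexcept
        { return *this; }

    // Disabled for the iterator's own type, so that assigning one
    // iterator to another copies the pointer instead of calling the lambda.
    template<typename Value,
        typename = std::enable_if_t<!std::is_same_v<
            std::decay_t<Value>, generic_output_iterator>>>
    generic_output_iterator& operator=(
        Value&& value)
    {
        (*insert)(std::forward<Value>(value));
        return *this;
    }

protected:
    // remove_reference_t: Lambda is an lvalue reference type when an
    // lvalue lambda is passed, and a pointer to a reference is ill-formed.
    std::remove_reference_t<Lambda>* insert = nullptr;

};

template<typename Lambda>
generic_output_iterator<Lambda>
generic_output_inserter(Lambda&& lambda) noexcept {
    return generic_output_iterator<Lambda>(
        std::forward<Lambda>(lambda));
}

template<typename Lambda>
generic_output_iterator(Lambda&&) ->
    generic_output_iterator<Lambda>;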

This requires that the lambda remain valid for the lifetime of the iterator, but that may not be a significant burden. Other iterators also retain references that are expected to remain valid for the lifetime of the iterator. For example, std::back_inserter(v) requires that v remain valid for as long as you use the inserter. And if you use the iterator immediately, then the requirement will be satisfied:
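Here is a self-contained sketch of that immediate-use pattern. It condenses the pointer-storing iterator with two adjustments that are assumptions of this sketch rather than the article's code:

- The stored pointer is std::remove_reference_t<Lambda>*. An lvalue lambda deduces Lambda as a reference type, and this lets that case compile.
- The converting operator= is disabled for the iterator's own type.

word_lengths is a hypothetical example function.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <map>
#include <memory>
#include <string>
#include <type_traits>
#include <utility>
#include <vector>

// Condensed version of the pointer-storing iterator, repeated so this compiles alone.
template<typename Lambda>
struct generic_output_iterator
{
    using iterator_category = std::output_iterator_tag;
    using value_type = void;
    using pointer = void;
    using reference = void;
    using difference_type = void;

    generic_output_iterator() noexcept = default;
    generic_output_iterator(Lambda&& lambda) noexcept
        : insert(std::addressof(lambda)) {}

    generic_output_iterator& operator*() noexcept { return *this; }
    generic_output_iterator& operator++() noexcept { return *this; }
    generic_output_iterator& operator++(int) noexcept { return *this; }

    template<typename Value, typename = std::enable_if_t<
        !std::is_same_v<std::decay_t<Value>, generic_output_iterator>>>
    generic_output_iterator& operator=(Value&& value)
    {
        (*insert)(std::forward<Value>(value)); // forward the value to the lambda
        return *this;
    }

protected:
    std::remove_reference_t<Lambda>* insert = nullptr; // not owned
};

template<typename Lambda>
generic_output_iterator(Lambda&&) -> generic_output_iterator<Lambda>;

// Hypothetical example: record each word's length in a map.
std::map<std::string, std::size_t>
word_lengths(const std::vector<std::string>& words)
{
    std::map<std::string, std::size_t> result;
    // The lambda is a temporary that lives until the end of this
    // full-expression, so it outlives every use std::copy makes of it.
    std::copy(words.begin(), words.end(),
        generic_output_iterator([&](const std::string& w) {
            result.emplace(w, w.size());
        }));
    return result;
}
```

Storing the iterator in a variable and using it after the temporary lambda is gone would leave a dangling pointer. Naming the lambda first, as in auto f = ...; generic_output_iterator(f), avoids that.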

Looking for Employers for the Meeting C++ Job Fair and the C++ Jobs Newsletter

Meeting C++ is looking for C++ employers, as it launches a C++ jobs newsletter and hosts an online C++ job fair in May!

Looking for Employers for the C++ Job Fair and the C++ Jobs Newsletter

by Jens Weller

From the article:

Meeting C++ launches a new jobs newsletter! Share your open positions!

The jobs newsletter already has 1500+ subscribers and aims for a bi-weekly schedule, occasionally going weekly when lots of jobs are submitted. Once a month, Meeting C++ will also include the jobs newsletter in its main newsletter, which has 25k+ subscribers in 2025, so your open positions will be seen by many experienced developers.

 

Creating a Generic Insertion Iterator, Part 1 -- Raymond Chen

In our previous post, we created an inserter iterator for unhinted insertion, and now we're taking it a step further by generalizing it into a boilerplate-only version. This generic output iterator allows custom insertion logic through a lambda, but as we'll see, it doesn't fully satisfy the iterator requirements, something we'll attempt to fix next time.

Creating a Generic Insertion Iterator, Part 1

by Raymond Chen

From the article:

Last time, we created an inserter iterator that does unhinted insertion. We noticed that most of the iterator is just boilerplate, so let’s generalize it into a version that is all-boilerplate.

// Do not use: See discussion
template<typename Lambda>
struct generic_output_iterator
{
    using iterator_category = std::output_iterator_tag;
    using value_type = void;
    using pointer = void;
    using reference = void;
    using difference_type = void;

    generic_output_iterator(Lambda&& lambda) :
        insert(std::forward<Lambda>(lambda)) {}

    generic_output_iterator& operator*() noexcept
        { return *this; }
    generic_output_iterator& operator++() noexcept
        { return *this; }
    generic_output_iterator& operator++(int) noexcept
        { return *this; }

    template<typename Value>
    generic_output_iterator& operator=(
        Value&& value)
    {
        insert(std::forward<Value>(value));
        return *this;
    }

protected:
    std::decay_t<Lambda> insert;

};

template<typename Lambda>
generic_output_iterator<Lambda>
generic_output_inserter(Lambda&& lambda) {
    return generic_output_iterator<Lambda>(
        std::forward<Lambda>(lambda));
}

template<typename Lambda>
generic_output_iterator(Lambda&&) ->
    generic_output_iterator<Lambda>;

For convenience, I provided both a deduction guide and a maker function, so you can use whichever version appeals to you. (The C++ standard library has a lot of maker functions because they predate class template argument deduction (CTAD) and deduction guides.)
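The standard library offers the same two spellings. As a quick illustration, C++17 CTAD deduces the same types that the older maker functions produce. append_both_ways is a hypothetical name.

```cpp
#include <iterator>
#include <type_traits>
#include <utility>
#include <vector>

// Maker function and CTAD produce exactly the same pair type.
static_assert(std::is_same_v<decltype(std::make_pair(1, 2.5)),
                             decltype(std::pair(1, 2.5))>);

void append_both_ways(std::vector<int>& v)
{
    auto made = std::back_inserter(v);    // maker function
    std::back_insert_iterator deduced(v); // CTAD deduces <std::vector<int>>
    static_assert(std::is_same_v<decltype(made), decltype(deduced)>);

    *made = 1;    // v.push_back(1)
    *deduced = 2; // v.push_back(2)
}
```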

Using Senders/Receivers -- Lucian Radu Teodorescu

C++26 will introduce senders/receivers. Lucian Radu Teodorescu demonstrates how to use them to write multithreaded code.

Using Senders/Receivers

by Lucian Radu Teodorescu

From the article:

This is a follow-up to the article in the previous issue of Overload, which introduced the upcoming C++26 senders/receivers framework [WG21Exec]. While the previous article focused on presenting the main concepts and outlining what will be standardized, this article demonstrates how to use the framework to build concurrent applications.

The goal is to showcase examples that are closer to real-world software than to minimal examples. We address three problems that can benefit from multi-threaded execution: computing the Mandelbrot fractal, performing a concurrent sort, and applying a graphical transformation to a set of images.

All the code examples are available on GitHub [ExamplesCode]. We use stdexec [stdexec], the reference implementation for the senders/receivers proposal. Additionally, some features included in the examples are not yet accepted by the standard committee, though we hope they will be soon.

How do I create an inserter iterator for unhinted insertion into std::map? -- Raymond Chen

The C++ standard library provides various inserters like back_inserter, front_inserter, and inserter, but for associative containers like std::map, only inserter is available, requiring a hint. However, if elements arrive in an unpredictable order, providing a hint could be inefficient, so a custom inserter that performs unhinted insertion can be a useful alternative.

How do I create an inserter iterator that does unhinted insertion into an associative container like std::map?

by Raymond Chen

From the article:

The C++ standard library contains various types of inserters:

  • back_inserter(c) which uses c.push_back(v).
  • front_inserter(c) which uses c.push_front(v).
  • inserter(c, it) which uses c.insert(it, v).
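To make the list concrete, here is a small sketch that pushes the same input through each standard inserter. The names inserter_results and copy_three_ways are hypothetical. Note that std::inserter remembers its position: after each insertion it advances past the newly inserted element, and that becomes the hint for the next value.

```cpp
#include <algorithm>
#include <deque>
#include <iterator>
#include <set>
#include <vector>

// Hypothetical names: the same input goes through each standard inserter.
struct inserter_results {
    std::vector<int> back;   // filled via back_inserter  (push_back)
    std::deque<int>  front;  // filled via front_inserter (push_front)
    std::set<int>    hinted; // filled via inserter       (insert(hint, v))
};

inserter_results copy_three_ways(const std::vector<int>& input)
{
    inserter_results r;
    std::copy(input.begin(), input.end(), std::back_inserter(r.back));
    std::copy(input.begin(), input.end(), std::front_inserter(r.front));
    // The hint starts at end(); std::insert_iterator then advances it
    // past each newly inserted element.
    std::copy(input.begin(), input.end(),
              std::inserter(r.hinted, r.hinted.end()));
    return r;
}
```

Take an input of {3, 1, 2}. The vector holds 3, 1, 2. The deque holds 2, 1, 3, because each push_front reverses the order. The set holds 1, 2, 3 regardless of the hints.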

C++ standard library associative containers do not have push_back or push_front methods; your only option is to use the inserter. But we also learned that the hinted insertion can speed up the operation if the hint is correct, or slow it down if the hint is wrong. (Or it might not have any effect at all.)

What if you know that the items are arriving in an unpredictable order? You don’t want to provide a hint, because that’s a pessimization. The inserter requires you to pass a hint. What do you do if you don’t want to provide a hint?
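Here is a minimal sketch of one answer. It assumes a hypothetical unhinted_insert_iterator modeled on std::insert_iterator, which calls the one-argument c.insert(v) instead of c.insert(hint, v). It is not necessarily the iterator the article builds, and unhinted_inserter is likewise a hypothetical maker function.

```cpp
#include <cstddef>
#include <iterator>
#include <memory>
#include <utility>

// Hypothetical iterator, modeled on std::insert_iterator, that calls the
// one-argument c.insert(v) so the container chooses the position itself.
template<typename Container>
class unhinted_insert_iterator
{
public:
    using iterator_category = std::output_iterator_tag;
    using value_type = void;
    using pointer = void;
    using reference = void;
    using difference_type = std::ptrdiff_t;
    using container_type = Container;

    explicit unhinted_insert_iterator(Container& c) noexcept
        : container(std::addressof(c)) {}

    // Unhinted insertion of a copied value.
    unhinted_insert_iterator&
    operator=(const typename Container::value_type& value)
    {
        container->insert(value);
        return *this;
    }
    // Unhinted insertion of a moved value.
    unhinted_insert_iterator&
    operator=(typename Container::value_type&& value)
    {
        container->insert(std::move(value));
        return *this;
    }

    // Output-iterator boilerplate: all of these are no-ops.
    unhinted_insert_iterator& operator*() noexcept { return *this; }
    unhinted_insert_iterator& operator++() noexcept { return *this; }
    unhinted_insert_iterator& operator++(int) noexcept { return *this; }

private:
    Container* container; // not owned; must outlive the iterator
};

// Hypothetical maker function in the style of std::inserter.
template<typename Container>
unhinted_insert_iterator<Container> unhinted_inserter(Container& c) noexcept
{
    return unhinted_insert_iterator<Container>(c);
}
```

As with std::inserter, the container must outlive the iterator. Inserting a key that already exists in a std::map or std::set leaves the existing element unchanged.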