Undefined behavior in C++ is a well-known source of headaches for developers, but surprisingly, even the lexing process contained cases of it—until now. Thanks to P2621R3 by Corentin Jabot, unterminated strings, macro-generated universal character names, and spliced UCNs are now formally defined, aligning the standard with real-world compiler behavior.
C++26: No More UB in Lexing
by Sandor Dargo
From the article:
If you ever used C++, for sure you had to face undefined behaviour. Even though it gives extra freedom for implementers, it’s dreaded by developers as it may cause havoc in your systems and it’s better to avoid it if possible.
Surprisingly, even the lexing process in C++ can result in undefined behaviour. Thanks to Corentin Jabot’s work and his P2621R3 that won’t be the case anymore. As it was accepted as a defect report starting from C++98, in fact, you benefit from this already if you use a new enough compiler.
Truth be told, compilers didn’t do any dangerous. They handled the below cases safely and deterministically. So this change is really about updating the standard and matching implementers’ work.
Let’s quickly see the three cases.
Unterminated strings
// unterminated string used to be UB
const char * foo = "
Who would have thought that an unterminated string or a character was UB?! Despite the permissive standard, all major compilers identified it as ill-formed. From now on, even the standard says so.
Add a Comment
Comments are closed.