Document #: | D2359r0 |
Date: | 2021-04-12 |
Project: | Programming Language C++ |
Audience: |
Evolution Working Group, Evolution Working Group - Incubator, SG22 C Liaison |
Reply-to: |
Mathias Stearn <redbeard0531@gmail.com> |
This paper proposes 3 related improvements to the switch statement:
1. Case ranges such as case 1 ... 10:
2. Labeled break loop_label; and continue loop_label;
3. goto case 'A'; and goto default; to jump to another branch
They are grouped into one paper, both to allow more efficient processing and consideration, and because there are significant overlaps in the wording affected due to the proposals interacting with each other. However, if any sub-proposal lacks consensus to proceed, it can be easily removed, since they are logically independent.
These are all to address some annoyances I ran into frequently in a recent project, so each one has at least one example below which is simplified Real World Code.
This proposal is to simply extend the case syntax to take either a single constant-expression, as it currently does, or an inclusive range of constant-expressions separated by the ... token. This addresses a common need in code to act on any value within a potentially large range, and not just on a single value. One place this comes up is parsing, both for text formats that may want to treat all letters the same, and for binary formats that encode small numbers directly within type-dispatching bytes.
This is already implemented in gcc, clang, icc, and nvcc, but not MSVC. In at least clang and gcc, it is available even in -std=c++20 mode, which generally disables most extensions. While this is purely a quality-of-implementation (QoI) matter, it seems that compilers generate much better code for case ranges than for the equivalent if statements.
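As a rough illustration of the text-parsing use case, the sketch below classifies a character with case ranges where the GNU-style extension is available, and falls back to an equivalent chain of if statements elsewhere. The names CharClass and classify are invented for this example, and the ranges assume an ASCII-compatible execution character set.

```cpp
// Hypothetical classification used only for this sketch.
enum class CharClass { Digit, Letter, Other };

CharClass classify(char c) {
#if defined(__GNUC__)
    // Case ranges as accepted today by gcc and clang as an extension.
    // Note the spaces around `...`.
    switch (c) {
    case '0' ... '9':
        return CharClass::Digit;
    case 'a' ... 'z':
    case 'A' ... 'Z':
        return CharClass::Letter;
    default:
        return CharClass::Other;
    }
#else
    // Portable fallback for compilers without the extension.
    if (c >= '0' && c <= '9') return CharClass::Digit;
    if ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')) return CharClass::Letter;
    return CharClass::Other;
#endif
}
```

Both branches are meant to compute the same result; only the switch form uses the syntax discussed here.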
As an interesting bit of history, it seems that case ranges were added and later removed from C during the standardization process. The primary concern at the time was that case 'a'...'z': may not have the desired semantics on platforms using encodings like EBCDIC where the letters are not mapped to a contiguous numeric range. I think there are at least two good counterarguments to this: 1) at this point the vast majority of developers are targeting platforms that do have contiguous letters, so we shouldn’t make the language harder for them to use just because some historic platforms made poor encoding decisions, and 2) this seems like the perfect kind of thing for an optional warning for people who need to worry about this, and implementations can even emit it by default when targeting such unfortunate platforms.
There is a potential pitfall around the lexing of case 0...10, which is documented by gcc. As proposed, you will need to write this as either 0 ... 10 or (0)...(10). Two issues cause this: 1) the broken pp-number grammar[1] (5.9 [lex.ppnumber]) considers 0...10 to be a single token because it doesn’t limit pp-numbers to a single ., and 2) even if that is solved, maximal munch ([lex.pptoken#3]) means that you would end up with the sequence [0. | . | .10], which is clearly invalid. I think both of these issues are solvable if there is interest. One possible approach is to carve out another exception to maximal munch to say that if a pp-number contains a sequence of two or more dots, the pp-number ends prior to the first dot. I am not too worried about having case ranges without fixing this, because this seems like an easy thing for compilers to generate good diagnostics for (not that they do now, but I think they might if standardized), and because even simple syntax highlighting in editors (and in this paper!) should show that something isn’t quite right. That said, if this paper is accepted I will likely write a follow-up that addresses this; I just don’t think this paper should be blocked on it.
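To make the first lexing issue concrete, here is a minimal sketch of a scanner for the longest pp-number prefix of a string. It is a simplification written for this illustration (it ignores universal-character-names, and pp_number_length is a made-up helper, not part of any compiler), but it follows the shape of the pp-number grammar closely enough to show why 0...10 is consumed as one token.

```cpp
#include <cctype>
#include <cstddef>
#include <string_view>

// Returns the length of the longest prefix of `s` that forms a pp-number,
// or 0 if `s` does not start with one. Simplified: no universal-character-names.
std::size_t pp_number_length(std::string_view s) {
    auto is_digit = [](char c) { return std::isdigit(static_cast<unsigned char>(c)) != 0; };
    auto is_nondigit = [](char c) {
        return std::isalpha(static_cast<unsigned char>(c)) != 0 || c == '_';
    };

    std::size_t i = 0;
    if (!s.empty() && is_digit(s[0])) {
        i = 1;  // pp-number: digit
    } else if (s.size() >= 2 && s[0] == '.' && is_digit(s[1])) {
        i = 2;  // pp-number: . digit
    } else {
        return 0;
    }

    while (i < s.size()) {
        const char c = s[i];
        const bool has_next = i + 1 < s.size();
        if ((c == 'e' || c == 'E' || c == 'p' || c == 'P') && has_next &&
            (s[i + 1] == '+' || s[i + 1] == '-')) {
            i += 2;  // exponent followed by a sign
        } else if (c == '\'' && has_next && (is_digit(s[i + 1]) || is_nondigit(s[i + 1]))) {
            i += 2;  // digit separator
        } else if (is_digit(c) || is_nondigit(c) || c == '.') {
            i += 1;  // any number of dots is accepted, which is the problem
        } else {
            break;
        }
    }
    return i;
}
```

Under this sketch, the whole of 0...10 is a single pp-number, while 0 ... 10 yields a one-character pp-number followed by separate tokens, and (0)...(10) does not start with a pp-number at all.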
Does this conflict with ... in pack expansions?
Since case labels are not pack expansion contexts, and are unlikely to be in the future, this syntax does not conflict.
case 1 ... 1 is not empty and is allowed both by this proposal and current implementations. While it is unlikely to be typed directly, it may happen in practice if the range bounds are dependent on template parameters.
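A single-value range can arise naturally when the bounds come from template parameters. The sketch below assumes the gcc/clang extension accepts value-dependent bounds (with a plain comparison as a fallback elsewhere); in_case_range is a hypothetical helper for illustration only.

```cpp
// Returns true when v lies in the inclusive range [Lo, Hi].
template <int Lo, int Hi>
bool in_case_range(int v) {
#if defined(__GNUC__)
    switch (v) {
    case Lo ... Hi:  // becomes a one-element range when Lo == Hi
        return true;
    default:
        return false;
    }
#else
    return Lo <= v && v <= Hi;  // fallback without the extension
#endif
}
```

Instantiating in_case_range<1, 1> produces the case 1 ... 1 label discussed above without anyone having typed it directly.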
break and continue
The proposal is to allow the jump-statements break and continue to take an optional label identifier. The label must apply to a loop, or in the case of break, a switch statement is also allowed (although for simplicity I’ll also be referring to it as a loop). When a label is present, the jump-statement must be within the labeled-statement, and the jump applies to that loop.
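For comparison, this is roughly what continuing an outer loop looks like in today's C++, with the proposed spelling shown in comments. The function and label names are invented for this sketch.

```cpp
#include <cstddef>
#include <vector>

// Counts the rows that contain no negative entry.
std::size_t count_nonnegative_rows(const std::vector<std::vector<int>>& rows) {
    std::size_t count = 0;
    for (const auto& row : rows) {   // proposed: `outer: for (const auto& row : rows) {`
        for (int v : row) {
            if (v < 0) {
                goto next_row;       // proposed: `continue outer;`
            }
        }
        ++count;
    next_row: ;                      // empty statement so the label has something to apply to
    }
    return count;
}
```

The goto version needs a label at the end of the outer body plus an empty statement, neither of which would be required with a labeled continue.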
This is perhaps best thought of as an improvement to looping constructs, but it affects switch because it also uses break in an unfortunate way that, among its many issues, prevents breaking out of a loop from within a switch, even though it allows using continue on the same loop.
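The asymmetry is easy to see in a small scanner. In the sketch below (names invented for illustration), continue inside the switch already targets the loop, but leaving the loop from a case requires a workaround such as goto; the comments show what the proposed syntax would look like.

```cpp
#include <cstddef>
#include <string_view>

// Counts non-space characters up to, but not including, the first ';'.
std::size_t count_until_semicolon(std::string_view text) {
    std::size_t count = 0;
    for (char c : text) {      // proposed: `scan: for (char c : text) {`
        switch (c) {
        case ';':
            goto done;         // proposed: `break scan;`
        case ' ':
            continue;          // already applies to the enclosing loop today
        default:
            ++count;
            break;             // only leaves the switch
        }
    }
done:
    return count;
}
```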
Why not use goto?
In addition to the social issues around a general distaste for goto label; described below, I think there are some technical reasons why this is superior:
goto, since it requires moving all code inside an extra nested block if any non-trivial variables were declared. It also requires a somewhat awkward empty statement like contin: ; (as seen in 8.7.3 [stmt.cont]), because labels aren’t themselves statements, and need a statement for the label to apply to.
break and continue.
switch statement.
goto case
This proposal will extend goto to allow it to jump to any kind of label (8.2 [stmt.label]), not just the identifier labels.
This addresses a common need where you decide within a case branch that you actually want to leave it and defer to the logic in another. While both goto case constant-expression; and goto default; are allowed (along with case ranges if that proposal is accepted), I will be using goto case to refer to all forms for simplicity. There are a few places I’ve seen this come up:
[[fallthrough]] semantics either implicitly or with the annotation.
default handler.
There are a few workarounds that code can use today, but all have some downsides:
[[fallthrough]] semantics
goto case is more explicit even than using [[fallthrough]], and also less brittle in the face of maintenance.
continue; if you are in a looping context and the next pass will re-run with switch and select the right case. But this is less efficient if you know that only one or a small subset of cases are possible now because it will force consideration of all cases, along with the looping condition.
return from the case, especially conditional returning.
label: above or below the case and using goto label;
goto label;
that leads to developers not using it even when it is the cleanest option, either because they’ve been told not to use it, or because they self-censor, knowing it will be a fight they don’t want to have during code review. While this is a social rather than a technical problem, it is still a problem.
I am proposing goto case largely as a more restricted form of goto that can only be used in a structured manner, so it is easier for a reader to tell this usage apart from the non-structural goto label; syntactically. There is precedent for this in C++: all of the looping constructs other than do-loops have normative definitions that are transitively equivalent to goto label; desugarings.
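A sketch of the label-based workaround may help show what goto case would replace. The status codes, label names, and classify function are made up for this example; the comments show the proposed spelling next to today's equivalent.

```cpp
#include <string>

// Hypothetical dispatcher: some cases defer to the logic of another case.
std::string classify(int status, bool cached) {
    switch (status) {
    case 200:
        if (cached) {
            goto handle_unmodified;  // proposed: `goto case 304;`
        }
        return "ok";
    case 304:
    handle_unmodified:               // extra label needed only for the workaround
        return "not modified";
    case 500:
        goto handle_default;         // proposed: `goto default;`
    default:
    handle_default:
        return "other";
    }
}
```

With the proposed syntax the two extra identifier labels would disappear, and a reader could tell from the jump itself that control stays within the same switch.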
How are nested switch statements handled?
This proposal only allows goto case statements to refer to cases in the innermost switch. If proposal 2 (labeled loops) is accepted, I could see some syntax like goto label: case X to select which switch statement the case refers to, but I am not proposing that.
Should we allow goto default; to do the default no-op if there is no explicit default: label in the innermost switch?
No, use break; instead.
goto for this.
If the committee likes the idea, but not the syntax, I can think of a few alternatives:
switch case constant-expression; / switch default;
switch to case constant-expression; / switch to default; (with to being a contextual keyword following switch)
switchto case constant-expression; / switchto default; (although I don’t know if I want to pay the pound of flesh for a new keyword)
goto case X but goto default seems useful.
goto-like functionality?
goto case runtimeExpr. That would allow a standardized way to access a common optimization for interpreters that currently requires using non-standard compiler extensions.
goto case A...B with the same values as a declared case range. See the next question for more details.
How is a goto case matched with its corresponding case?
goto default is easy: it matches the default: label and only that label. goto case A...B evaluates A and B, and will only match with a case range label with identical values. goto case value is a bit more interesting. The current proposal is to select whichever label contains that value, regardless of whether it is a range or single value, or go to default if there is no matching case. This is similar to how the value would be handled with switch(value); however, it will be ill-formed if there is no matching case and no default. If you want to leave the switch, you can either use break, or put default: ; at the bottom of the block.
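The matching rules above can be modeled as a small lookup over the labels of a switch. This is only a sketch of the rules as described in this section, not of how any compiler implements them; CaseLabel, SwitchLabels, Target, and the two resolve functions are hypothetical names.

```cpp
#include <cstddef>
#include <vector>

// One case label, stored as an inclusive range; a single-value label has lo == hi.
struct CaseLabel {
    long lo;
    long hi;
    bool is_range;  // true for `case lo ... hi:`, false for `case lo:`
};

struct SwitchLabels {
    std::vector<CaseLabel> cases;
    bool has_default;
};

enum class Target { Case, Default, IllFormed };

struct Resolution {
    Target kind;
    std::size_t index;  // meaningful only when kind == Target::Case
};

// Models `goto case value;`: any label containing the value, else default.
Resolution resolve_value(const SwitchLabels& sw, long value) {
    for (std::size_t i = 0; i < sw.cases.size(); ++i) {
        if (sw.cases[i].lo <= value && value <= sw.cases[i].hi) {
            return {Target::Case, i};
        }
    }
    return {sw.has_default ? Target::Default : Target::IllFormed, 0};
}

// Models `goto case lo ... hi;`: only a range label with identical bounds matches.
Resolution resolve_range(const SwitchLabels& sw, long lo, long hi) {
    for (std::size_t i = 0; i < sw.cases.size(); ++i) {
        if (sw.cases[i].is_range && sw.cases[i].lo == lo && sw.cases[i].hi == hi) {
            return {Target::Case, i};
        }
    }
    return {Target::IllFormed, 0};
}
```

For a switch with case 1 ... 10:, case 100:, and default:, this model sends a value of 2 to the range label, sends -1 to default, and rejects the range 2 ... 8 because no label has exactly those bounds.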
Earlier drafts of this proposal used a model where case labels (including those for case ranges) are just like any others; they just use their expressions (rather than their values) for their name. At one point I required the token sequence to match, although that was loosened to using the same definition of “expression equivalence” as in template overloading. This still felt too limiting and inconsistent with how case labels are generally handled. It also would not provide a clean evolutionary path to allowing a runtime value in goto case expr, which clearly needs to allow different expressions than used in the labeled-statement.
One concern I had was that the looser matching using the actual value of the expression would present a burden on implementers, because it does not allow matching a goto to its labeled-statement until template instantiation time: the values used in both the case labels and in goto statements can depend on template parameters. The implementers I contacted did not anticipate this being a significant burden (although there was some hedging), so I switched to the model that seemed to be the best direction for the language. They also pointed out that while matching goto could previously be done prior to template instantiation, it was common to reconstruct the control-flow graph after instantiation anyway due to the need to detect jumps that are illegal due to skipping over non-trivial initialization, which may be template-dependent, and that they are already prepared for case X to be evaluated after instantiation in order to detect duplicate cases.
These are a few examples of the proposed semantics:
enum E {A, B=A};
switch (e) {
case A: return;
default:
    goto case A; // All models allow
    goto case B; // Allowed in value model
    // Not allowed in this proposal, but a natural extension.
    goto case runtime_expr;
}
switch (i) {
case 1 ... 10: return;
case 100: return;
default:
    goto case 1 ... 10; // All models allow
    goto case 100; // All models allow
    goto case 1; // Allowed in value model
    goto case 2; // Allowed in value model
    goto case -1; // Allowed in value model (goes to default:)
    goto case 2 ... 8; // Not allowed in any model
}
switch (i) {
case 1: goto case 0; // Not allowed (no default:)
}
template <int I>
void func(int i) {
    constexpr int J = I;
    switch (i) {
    case I: return;
    case I + 1: return;
    default:
        goto case I; // All models allow
        goto case I + 0; // Allowed in value model
        goto case I + 1; // Allowed in value model
        goto case J; // Allowed in value model
        constexpr int I = J + 1;
        goto case I; // Allowed in value model (goes to I + 1)
    }
}
// This is hopefully not Real World Code, but I am including it
// to explore how edge cases are handled. I believe that this code is
// handled the same in both models, and that the wording should have the
// same outcome as well.
constexpr int Evil = 2;
void X(int i) {
    constexpr int Evil = 1;
    switch (i) {
    case Evil: return; // <1>
    case 101:
        goto case Evil; // Goes to <1>
        // This is allowed here even though declaring a constexpr int isn't.
        using ::Evil; // 😈
    case Evil: return; // <2>
    case 102:
        goto case Evil; // Goes to <2>
    case Evil + 1: return; // <3>
    case 103: {
        constexpr int Evil = ::Evil + 1;
        goto case Evil; // Goes to <3>
    }
    }
}
[ Editor's note: This wording has not yet been reviewed by a core expert, so it may be complete nonsense. ]
Modify 8.2 [stmt.label] as follows:
[ Editor's note: I opted to go with label to refer to labels with an identifier, matching common usage since most people think of default and case as being something else. This meant that I needed general-label to refer to both kinds of labels. If you would prefer, I could use label as the generic term and something like identifier-label for the specific case. ]
1 A statement can be labeled.
case-label:
    case constant-expression ... constant-expression
    case constant-expression
    default
The optional attribute-specifier-seq appertains to the general-label. The only use of a label is as the target of a goto, break, or continue. No two labels in a function shall have the same identifier. A general-label can be used in a goto statement before its introduction by a labeled-statement.
2 case-labels shall occur only in switch statements.
3 A case-label with two constant-expressions shall describe a non-empty, inclusive range bounded at the low end by the first constant-expression and at the high end by the second. A case-label with a single constant-expression describes a point range that both begins and ends at that constant-expression. A case-label consisting of the keyword default does not describe any range.
Modify 8.5.3 [stmt.switch]/2 as follows:
2 The condition shall be of integral type, enumeration type, or class type. If of class type, the condition is contextually implicitly converted to an integral or enumeration type. If the (possibly converted) type is subject to integral promotions, the condition is converted to the promoted type. Any statement within the switch statement can be labeled with one or more case-labels. Any constant-expressions within the case-labels shall be converted constant expressions of the adjusted type of the switch condition. No two case-labels in the same switch shall have intersecting ranges after conversion.
Modify 8.5.3 [stmt.switch]/5 as follows:
5 When the switch statement is executed, its condition is evaluated. If one of the case-labels has a range that includes the condition, control is passed to the statement following the matched case label. If no case constant matches the condition, and if there is a default label, control passes to the statement labeled by the default label. If no case matches and if there is no default then none of the statements in the switch are executed.
[ Editor's note: Everywhere in 8.5.3 [stmt.switch] that refers to a “case or default label” can be changed to use the new case-label production, but I have not made that change here. ]
Modify 8.7.1 [stmt.jump.general]/1 as follows:
1 Jump statements unconditionally transfer control.
break label_opt ;
continue label_opt ;
return expr-or-braced-init-list_opt ;
goto general-label ;
Modify 8.7.2 [stmt.break] as follows:
1 The break statement shall occur only in an iteration-statement or a switch statement and causes termination of the selected statement; control passes to the statement following the terminated statement, if any.
2 A break without the optional label selects the smallest enclosing iteration-statement or switch statement.
3 A break with the optional label selects the statement labeled by that label, which shall be an iteration-statement or switch statement that encloses the break statement.
Modify 8.7.3 [stmt.cont] as follows:
1 The continue statement shall occur only in an iteration-statement and causes control to pass to the loop-continuation portion of the selected iteration-statement, that is, to the end of the loop. More precisely, in each of the statements
while (foo) {        do {                 for (;;) {
  {                    {                    {
    // ...               // ...               // ...
  }                    }                    }
  contin: ;            contin: ;            contin: ;
}                    } while (foo);       }
where contin is a synthetic label unique to each iteration-statement, a continue not contained in an enclosed iteration statement is equivalent to goto contin for the synthetic label of the selected iteration-statement.
[ Editor's note: All uses of contin
above should be styled like contin
to indicate that it is a placeholder even if they weren’t before. ]
2 A continue without the optional label selects the smallest enclosing iteration-statement.
3 A continue with the optional label selects the iteration-statement labeled by that label, which shall enclose the continue statement.
Modify 8.7.6 [stmt.goto] as follows:
1 The goto statement unconditionally transfers control to the statement labeled by a matching general-label.
2 If the general-label is a label, it shall match the label of a labeled-statement located in the current function.
3 If the general-label is a case-label, all constant-expressions shall be converted constant expressions of the adjusted type of the smallest enclosing switch statement. The case-label shall then match a labeled-statement present in that switch statement, determined as follows:
if the case-label is the keyword default, then it matches with a default label;
if the case-label is of the form case constant-expression ... constant-expression, then it matches with a case label of the same form with the same values;
if the case-label is of the form case constant-expression, it selects a case label with a range that contains the value of that expression, or if none match and there is a default label, it selects that. [ Note: This is similar to how the condition in a switch statement is handled (8.5.3 [stmt.switch]); however, if the value does not match a case label and there is no default, the program is ill-formed rather than leaving the switch
statement. — end note ]
[1] I am separately working on an as-yet unpublished paper to fix pp-number (P2180).