Switch Statement Improvements

Document #: D2359r0
Date: 2021-04-12
Project: Programming Language C++
Audience: Evolution Working Group
Evolution Working Group - Incubator
SG22 C Liaison
Reply-to: Mathias Stearn

1 Introduction

This paper proposes three related improvements to the switch statement:

  • Case ranges
  • Labeled loops for break and continue
  • goto case

They are grouped into one paper, both to allow more efficient processing and consideration, and because there are significant overlaps in the affected wording due to the proposals interacting with each other. However, if any sub-proposal lacks consensus to proceed, it can easily be removed, since they are logically independent.

These are all to address some annoyances I ran into frequently in a recent project, so each one has at least one example below which is simplified Real World Code.

2 Detailed proposal

2.1 Case ranges

Note: some function calls are placeholders for a handful of lines of code
Before:

switch (foo) {
    case 1:
    case 2:
    // ... cases 3 through 9 ...
    case 10:
        doSomething();
}

After:

switch (foo) {
    case 1 ... 10:
        doSomething();
}

Before:

auto c = nextChar();
if (c >= 'a' && c <= 'z') {
    doLower(c);
} else if (c >= 'A' && c <= 'Z') {
    doUpper(c);
} else if (c >= '0' && c <= '9') {
    doDigit(c);
} else {
    switch (c) {
        case '{':
        case '}':
            doBrace(c);
            break;
        case '[':
        case ']':
            doBracket(c);
            break;
        default:
            doUnknown(c);
    }
}

After:

switch (auto c = nextChar()) {
    case 'a'...'z': doLower(c); break;
    case 'A'...'Z': doUpper(c); break;
    case '0'...'9': doDigit(c); break;
    case '{':
    case '}':
        doBrace(c);
        break;
    case '[':
    case ']':
        doBracket(c);
        break;
    default:
        doUnknown(c);
}

Before:

uint8_t c = readByte();
if (c >= kMinTinyInt32 && c <= kMaxTinyInt32) {
    return Value(int32_t(c) - kTinyInt32Zero);
} else if (c >= kMinTinyInt64 && c <= kMaxTinyInt64) {
    return Value(int64_t(c) - kTinyInt64Zero);
} else if (c >= kMinTinyStrSize && c <= kMaxTinyStrSize) {
    return Value(readStr(c - kMinTinyStrSize));
} else {
    switch (c) {
        // In real code has 1, 2, and 4 byte versions
        case kInt32:
            return Value(readInt32());

        // In real code has 1, 2, 4 and 8 byte versions
        case kInt64:
            return Value(readInt64());

        case kBigStr:
            return Value(readStr(readStrSize()));

        // ... some more cases here

        default:
            doUnknown(c);
    }
}

After:

switch (uint8_t c = readByte()) {
    // In real code has 1, 2, and 4 byte versions
    case kInt32:
        return Value(readInt32());
    case kMinTinyInt32...kMaxTinyInt32:
        return Value(int32_t(c) - kTinyInt32Zero);

    // In real code has 1, 2, 4 and 8 byte versions
    case kInt64:
        return Value(readInt64());
    case kMinTinyInt64...kMaxTinyInt64:
        return Value(int64_t(c) - kTinyInt64Zero);

    case kBigStr:
        return Value(readStr(readStrSize()));
    case kMinTinyStrSize...kMaxTinyStrSize:
        return Value(readStr(c - kMinTinyStrSize));

    // ... some more cases here

    default:
        doUnknown(c);
}

This proposal is to simply extend the case syntax to take either a single constant-expression, as it currently does, or an inclusive range of constant-expressions separated by an ellipsis (...). This addresses a common need in code to act on any value within a potentially large range, and not just on a single value. One place this comes up is parsing, both for text formats that may want to treat all letters the same, and for binary formats that encode small numbers directly within type-dispatching bytes.
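
As a rough illustration of the syntax described above, the sketch below uses the case-range extension where the compiler identifies itself as GNU-compatible, and otherwise falls back to the equivalent comparisons. The names CharKind, classify, and isSmall are hypothetical and exist only for this example.

```cpp
// Hypothetical names (CharKind, classify, isSmall), for illustration only.
enum class CharKind { Lower, Upper, Digit, Other };

CharKind classify(char c) {
#if defined(__GNUC__) || defined(__clang__)
    // Extension syntax; not standard C++ today.
    switch (c) {
        case 'a' ... 'z': return CharKind::Lower;
        case 'A' ... 'Z': return CharKind::Upper;
        case '0' ... '9': return CharKind::Digit;
        default:          return CharKind::Other;
    }
#else
    // Portable fallback with the same inclusive-range meaning.
    if (c >= 'a' && c <= 'z') return CharKind::Lower;
    if (c >= 'A' && c <= 'Z') return CharKind::Upper;
    if (c >= '0' && c <= '9') return CharKind::Digit;
    return CharKind::Other;
#endif
}

bool isSmall(int n) {
#if defined(__GNUC__) || defined(__clang__)
    switch (n) {
        case 0 ... 10: return true;   // both bounds are included
        default:       return false;
    }
#else
    return n >= 0 && n <= 10;
#endif
}
```

Both bounds are part of the range, so isSmall(10) is true and isSmall(11) is false.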

This is already implemented in gcc, clang, icc, and nvcc, but not MSVC. In at least clang and gcc, it is available even in -std=c++20 mode, which generally disables most extensions. While this is purely a quality-of-implementation (QoI) matter, it seems that compilers generate much better code for case ranges than for if statements.

As an interesting bit of history, it seems that case ranges were added and later removed from C during the standardization process. The primary concern at the time was that case 'a'...'z': may not have the desired semantics on platforms using encodings like EBCDIC, where the letters are not mapped to a contiguous numeric range. I think there are at least two good counterarguments to this: 1) at this point the vast majority of developers are targeting platforms that do have contiguous letters, so we shouldn’t make the language harder for them to use just because some historic platforms made poor encoding decisions, and 2) this seems like the perfect kind of thing for an optional warning for people who need to worry about this, and implementations can even emit it by default when targeting such unfortunate platforms.

There is a potential pitfall around the lexing of case 0...10, which is documented by gcc. As proposed, you will need to write this as either 0 ... 10 or (0)...(10). Two issues cause this: 1) the broken pp-number grammar [1] (5.9 [lex.ppnumber]) considers 0...10 to be a single token because it doesn’t limit pp-numbers to a single ., and 2) even if that is solved, maximal munch ([lex.pptoken#3]) means that you would end up with the sequence [0.|.|.10], which is clearly invalid. I think both of these issues are solvable if there is interest. One possible approach is to carve out another exception to maximal munch: if a pp-number contains a sequence of two or more dots, the pp-number ends prior to the first dot. I am not too worried about having case ranges without fixing this, because this seems like an easy thing for compilers to generate good diagnostics for (not that they do now, but I think they might if standardized), and because even simple syntax highlighting in editors (and in this paper!) should show that something isn’t quite right. That said, if this paper is accepted I will likely write a follow-up that addresses this; I just don’t think this paper should be blocked on it.
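
To make the first issue concrete, the following sketch scans a pp-number using a simplified reading of the grammar in [lex.ppnumber] (universal-character-names are ignored). The function name ppNumberLength is hypothetical; it is only meant to show that the current grammar keeps consuming dots, so 0...10 never gets split into a number, an ellipsis, and another number.

```cpp
#include <cctype>
#include <cstddef>
#include <string_view>

// Hypothetical helper: length of the pp-number at the start of `s`,
// or 0 if `s` does not begin with one. Simplified: no universal-character-names.
std::size_t ppNumberLength(std::string_view s) {
    auto isDigit = [](char c) {
        return std::isdigit(static_cast<unsigned char>(c)) != 0;
    };
    auto isNondigit = [](char c) {
        return std::isalpha(static_cast<unsigned char>(c)) != 0 || c == '_';
    };

    std::size_t i = 0;
    if (!s.empty() && isDigit(s[0])) {
        i = 1;                                   // pp-number: digit
    } else if (s.size() >= 2 && s[0] == '.' && isDigit(s[1])) {
        i = 2;                                   // pp-number: . digit
    } else {
        return 0;
    }

    while (i < s.size()) {
        const char c = s[i];
        const char prev = s[i - 1];
        const bool signAfterExponent =
            (c == '+' || c == '-') &&
            (prev == 'e' || prev == 'E' || prev == 'p' || prev == 'P');
        const bool digitSeparator =
            c == '\'' && i + 1 < s.size() &&
            (isDigit(s[i + 1]) || isNondigit(s[i + 1]));

        if (isDigit(c) || isNondigit(c) || c == '.' || signAfterExponent) {
            ++i;                                 // note: any number of '.' is accepted
        } else if (digitSeparator) {
            i += 2;
        } else {
            break;
        }
    }
    return i;
}
```

With this model, the input 0...10: yields a single six-character pp-number, while 0 ... 10 yields a one-character pp-number followed by separate tokens. Character literals such as 'a'...'z' do not start a pp-number at all, which is why they are unaffected.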

2.1.1 Q&A

Does this conflict with the usage of ... in pack expansions?
Because case labels are not pack expansion contexts, and are unlikely to be in the future, this syntax does not conflict.
How should enums be handled?
If I were designing this from scratch, I would restrict scoped enums to only be usable in case ranges if the entire range is populated with named enumerators. However, since current implementations allow this, I am not proposing this restriction.
Are there any restrictions?
Yes, empty ranges are disallowed. This is one place where existing practice differs, with gcc and clang emitting a warning and icc emitting an error. Since all implementations at least warn, and one errors, I am choosing to enforce this restriction; however, if there is implementer objection to rejecting code they currently accept, I do not feel strongly about this. Note that case 1 ... 1 is not empty and is allowed both by this proposal and by current implementations. While it is unlikely to be typed directly, it may happen in practice if the range bounds are dependent on template parameters.
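
The non-empty rule can be modeled in today's C++ as a compile-time check on template-dependent bounds. This is only a sketch of the intended meaning, not proposed wording; inCaseRange is a hypothetical name.

```cpp
// Hypothetical helper modeling "case Lo ... Hi" with template-dependent bounds.
template <int Lo, int Hi>
constexpr bool inCaseRange(int value) {
    // An empty range (Lo > Hi) is rejected at compile time,
    // mirroring the restriction described above.
    static_assert(Lo <= Hi, "empty case range");
    return Lo <= value && value <= Hi;
}

// A single-point range is not empty, so this instantiation is accepted.
static_assert(inCaseRange<1, 1>(1), "1 ... 1 contains 1");
```

Instantiating inCaseRange<10, 1> would fail to compile, while inCaseRange<1, 1> is fine, matching the treatment of case 1 ... 1.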

2.2 Labeled loops for break and continue

Note: some function calls are placeholders for a handful of lines of code
Before:

bool found_it = false;
for (auto&& row : table) {
    for (auto&& cell : row) {
        if (check(cell)) {
            handle(cell);
            found_it = true;
            break; // only breaks out of inner loop
        }
    }
    if (found_it)
        break; // break out of outer loop
}

After:

outer_loop:
for (auto&& row : table) {
    for (auto&& cell : row) {
        if (check(cell)) {
            handle(cell);
            break outer_loop;
        }
    }
}

Before:

void parseArray() {
    bool hitEnd = false;
    while (haveMoreData() && !hitEnd) {
        switch (auto c = readByte()) {
            case ']':
                hitEnd = true;
                break;
            case '+':
                index += readNum();
                // works now, but behaves differently from break
                continue;
            case '|':
                insertElement();
                break;

            // ... some more cases here
        }
        index++;
    }
    finishArray();
}

After:

void parseArray() {
    reader_loop:
    while (haveMoreData()) {
        switch (auto c = readByte()) {
            case ']':
                break reader_loop;
            case '+':
                index += readNum();
                // same as continue, but explicit
                continue reader_loop;
            case '|':
                insertElement();
                break;

            // ... some more cases here
        }
        index++;
    }
    finishArray();
}

The proposal is to allow the jump-statements break and continue to take an optional label identifier. The label must apply to a loop, or in the case of break, a switch statement is also allowed (although for simplicity I’ll also be referring to it as a loop). When a label is present, the jump-statement must be within the labeled-statement, and the jump applies to that loop.

This is perhaps best thought of as an improvement to looping constructs, but it affects switch because it also uses break in an unfortunate way that, among its many issues, prevents breaking out of a loop from within a switch, even though it allows using continue on the same loop.
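
For comparison, the closest spelling available in current C++ is a goto to a label placed after the outer loop. The sketch below shows that equivalent under stated assumptions: findFirst and after_outer_loop are hypothetical names, and the table is simplified to a vector of vectors of int.

```cpp
#include <cstddef>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical example: position (row, column) of the first cell equal to
// `target`, or an empty optional if there is none.
std::optional<std::pair<std::size_t, std::size_t>>
findFirst(const std::vector<std::vector<int>>& table, int target) {
    std::optional<std::pair<std::size_t, std::size_t>> hit;
    for (std::size_t r = 0; r < table.size(); ++r) {
        for (std::size_t c = 0; c < table[r].size(); ++c) {
            if (table[r][c] == target) {
                hit = std::make_pair(r, c);
                // With this proposal: "break outer_loop;" with the label
                // written at the top of the outer for statement.
                goto after_outer_loop;
            }
        }
    }
after_outer_loop:
    return hit;
}
```

The label here has to sit after the loop it terminates, which is the readability cost the labeled form avoids.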

2.2.1 Q&A

Why don’t you just use a goto?

In addition to the social issues around a general distaste for goto label; described below, I think there are some technical reasons why this is superior:

  • It puts the label at the top of the loop which is generally more informative, and more likely to have been read by a human reader, than the bottom of the loop.
  • It is complicated to replace a continue with a goto, since it requires moving all code inside an extra nested block if any non-trivial variables were declared. It also requires a somewhat awkward empty statement like contin: ; (as seen in 8.7.3 [stmt.cont]), because labels aren’t themselves statements, and need a statement for the label to apply to.
  • If you want to both break and continue from the same loop, you are able to use a single name for the loop, rather than separately named labels for break and continue.
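
The second bullet can be illustrated with a sketch of what replacing an outer continue with goto looks like in current C++. The function countCleanRows is hypothetical; the extra block and the contin: ; statement mirror the shape used in [stmt.cont].

```cpp
#include <vector>

// Hypothetical example: counts the rows that contain no negative cell.
int countCleanRows(const std::vector<std::vector<int>>& table) {
    int clean = 0;
    for (const auto& row : table) {
        {   // extra block, so that any variables declared in the body
            // would be out of scope at the label below
            for (int cell : row) {
                if (cell < 0)
                    goto contin;  // with this proposal: "continue row_loop;"
            }
            ++clean;
        }
    contin: ;  // a label needs a statement to attach to
    }
    return clean;
}
```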
I don’t like the idea of a labeled switch statement.
While it seems useful, I don’t think it is as useful as labeled loops, so if it increases consensus, I’d be ok restricting it to just actual loops.

2.3 goto case

Note: some function calls are placeholders for a handful of lines of code
Before:

switch(elem.type()) {
    case Object:
        if (!elem.empty())
            goto defaultCase;
        encodeEmptyObj();
        break;
    case Array:
        if (!elem.empty())
            goto defaultCase;
        encodeEmptyArr();
        break;
    case String:
        if (elem.length() > 64)
            goto defaultCase;
        encodeShortStr(elem);
        break;
    defaultCase:
    default:
        encodeComplex(elem);
}

After:

switch(elem.type()) {
    case Object:
        if (!elem.empty())
            goto default;
        encodeEmptyObj();
        break;
    case Array:
        if (!elem.empty())
            goto default;
        encodeEmptyArr();
        break;
    case String:
        if (elem.length() > 64)
            goto default;
        encodeShortStr(elem);
        break;

    default:
        encodeComplex(elem);
}

Before:

switch(action) {
    case REPEAT_LAST:
        prepareToRepeat();
        if (lastAction == OPEN_OBJECT)
            goto objectCase;
        goto arrayCase;
    objectCase:
    case OPEN_OBJECT:
        doObject();
        break;
    arrayCase:
    case OPEN_ARRAY:
        doArray();
        break;
    case CLOSE:
        return;

    // ... some more cases here
}

After:

switch(action) {
    case REPEAT_LAST:
        prepareToRepeat();
        if (lastAction == OPEN_OBJECT)
            goto case OPEN_OBJECT;
        goto case OPEN_ARRAY;

    case OPEN_OBJECT:
        doObject();
        break;

    case OPEN_ARRAY:
        doArray();
        break;
    case CLOSE:
        return;

    // ... some more cases here
}

This proposal will extend goto to allow it to jump to any kind of label (8.2 [stmt.label]), not just the identifier labels. This addresses a common need where you decide within a case branch that you actually want to leave it and defer to the logic in another. While both goto case constant-expression; and goto default; are allowed (along with case ranges, if that proposal is accepted), I will be using goto case to refer to all forms for simplicity. There are a few places I’ve seen this come up:

There are a few workarounds that code can use today, but all have some downsides:

I am proposing goto case largely as a more restricted form of goto that can only be used in a structured manner, so it is easier for a reader to tell this usage apart from the non-structural goto label; syntactically. There is precedent for this in C++: all of the looping constructs other than do-loops have normative definitions that are, transitively, equivalent to goto label; desugarings.

2.3.1 Q&A

How are nested switch statements handled?
You can only use goto case statements to refer to cases in the innermost switch. If proposal 2 (labeled loops) is accepted, I could see some syntax like goto label: case X to select which switch statement the case refers to, but I am not proposing that.
Can I use goto default; to do the default no-op if there is no explicit default: label in the innermost switch?
No, use break; instead.
I don’t like using goto for this.

If the committee likes the idea, but not the syntax, I can think of a few alternatives:

  • switch case constant-expression; / switch default;
  • switch to case constant-expression; / switch to default; (with to being a contextual keyword following switch)
  • switchto case constant-expression; / switchto default; (although I don’t know if I want to pay the pound of flesh for a new keyword)
I don’t like the general goto case X but goto default seems useful.
If this is the feeling of the committee, it would be easy to restrict it to just that.
Does this allow computed goto-like functionality?
Not as proposed, because the destination must be a constant-expression. However, if the committee would prefer, I could lift that restriction and allow goto case runtimeExpr. That would allow a standardized way to access a common optimization for interpreters that currently requires using non-standard compiler extensions.
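
For context, the portable baseline that such interpreters start from is a switch inside a loop, where every instruction passes back through the single dispatch point at the top. The sketch below is a hypothetical toy interpreter (Op, run, and the opcodes are invented for illustration); the optimization this answer alludes to would let each handler jump directly to the next handler instead of going around the loop.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical opcodes for a toy accumulator machine.
enum Op : std::uint8_t { OP_INC, OP_DOUBLE, OP_NEG, OP_HALT };

// Standard C++ dispatch: one switch, re-entered on every loop iteration.
int run(const std::vector<std::uint8_t>& code) {
    int acc = 0;
    for (std::size_t pc = 0; pc < code.size(); ++pc) {
        switch (code[pc]) {
            case OP_INC:    acc += 1;   break;
            case OP_DOUBLE: acc *= 2;   break;
            case OP_NEG:    acc = -acc; break;
            case OP_HALT:   return acc;
            default:        return acc;  // unknown opcode: stop
        }
    }
    return acc;
}
```

With a runtime goto case, each break could hypothetically be replaced by a jump on the next opcode; that form is not part of this proposal.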
If proposal 1 is accepted, can I jump to a case range?
Yes. As proposed, you would need to use goto case A...B with the same values as a declared case range. See the next question for more details.
How is a goto case matched with its corresponding case?

goto default is easy: it matches the default: label and only that label. goto case A...B evaluates A and B, and will only match a case range label with identical values. goto case value is a bit more interesting. The current proposal is to select whichever label contains that value, regardless of whether it is a range or a single value, or go to default if there is no matching case. This is similar to how the value would be handled with switch(value); however, it will be ill-formed if there is no matching case and no default. If you want to leave the switch, you can either use break, or put default: ; at the bottom of the block.
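
The matching rule in this answer can be modeled as a small lookup over the labels of a switch. This is a sketch of the rule as I read it, not an implementation strategy for compilers; every name in it (CaseLabel, SwitchModel, Target, Resolution, resolveGotoValue, resolveGotoRange) is hypothetical.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical model of one case label.
struct CaseLabel {
    long lo;        // inclusive lower bound
    long hi;        // inclusive upper bound (equal to lo for "case value")
    bool is_range;  // true when written as "case lo ... hi"
};

// Hypothetical model of the labels present in one switch statement.
struct SwitchModel {
    std::vector<CaseLabel> cases;
    bool has_default;
};

enum class Target { Case, Default, IllFormed };

struct Resolution {
    Target kind;
    std::size_t index;  // index into SwitchModel::cases; only meaningful for Target::Case
};

// Models "goto case value;": pick the label whose range contains the value,
// else default, else the program is ill-formed.
Resolution resolveGotoValue(const SwitchModel& sw, long value) {
    for (std::size_t i = 0; i < sw.cases.size(); ++i) {
        if (sw.cases[i].lo <= value && value <= sw.cases[i].hi)
            return {Target::Case, i};
    }
    return {sw.has_default ? Target::Default : Target::IllFormed, 0};
}

// Models "goto case lo ... hi;": only a range label with identical bounds
// matches; there is no fallback to default.
Resolution resolveGotoRange(const SwitchModel& sw, long lo, long hi) {
    for (std::size_t i = 0; i < sw.cases.size(); ++i) {
        const CaseLabel& c = sw.cases[i];
        if (c.is_range && c.lo == lo && c.hi == hi)
            return {Target::Case, i};
    }
    return {Target::IllFormed, 0};
}
```

For a switch with case 1 ... 10:, case 100:, and default:, this model sends a jump on the value 2 to the range label, a jump on -1 to default, and rejects a jump on the range 2 ... 8.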


Earlier drafts of this proposal used a model where case labels (including case ranges) are just like any other labels, except that they use their expressions (rather than their values) for their name. At one point I required the token sequence to match, although that was loosened to using the same definition of “expression equivalence” as in template overloading. This still felt too limiting and inconsistent with how case labels are generally handled. It also would not provide a clean evolutionary path to allowing a runtime value in goto case expr, which clearly needs to allow different expressions than those used in the labeled-statement.


One concern I had was that the looser matching using the actual value of the expression would present a burden on implementers, because it does not allow matching a goto to its labeled-statement until template instantiation time: the values used in both the case labels and in goto statements can depend on template parameters. The implementers I contacted did not anticipate this being a significant burden (although there was some hedging), so I switched to the model that seemed to be the best direction for the language. They also pointed out that while matching goto could previously be done prior to template instantiation, it was common to reconstruct the control-flow graph after instantiation anyway, due to the need to detect jumps that are illegal because they skip over non-trivial initialization, which may be template-dependent, and that they are already prepared for case X to be evaluated after instantiation in order to detect duplicate cases.


These are a few examples of the proposed semantics:

enum E {A, B=A};
switch (e) {
    case A: return;
    default:
        goto case A; // All models allow
        goto case B; // Allowed in value model

        // Not allowed in this proposal, but a natural extension.
        goto case runtime_expr;
}

switch (i) {
    case 1 ... 10: return;
    case 100: return;
    default:
      goto case 1 ... 10; // All models allow
      goto case 100;      // All models allow
      goto case 1;        // Allowed in value model
      goto case 2;        // Allowed in value model
      goto case -1;       // Allowed in value model (goes to default:)
      goto case 2 ... 8;  // Not allowed in any model
}

switch (i) {
    case 1: goto case 0; // Not allowed (no default:)
}

template <int I>
void func(int i) {
    constexpr int J = I;
    switch (i) {
        case I: return;
        case I + 1: return;
        default:
            goto case I;     // All models allow
            goto case I + 0; // Allowed in value model
            goto case I + 1; // Allowed in value model
            goto case J;     // Allowed in value model

            constexpr int I = J + 1;
            goto case I; // Allowed in value model (goes to I + 1)
    }
}

// This is hopefully not Real World Code, but I am including it 
// to explore how edge cases are handled. I believe that this code is
// handled the same in both models, and that the wording should have the
// same outcome as well.
constexpr int Evil = 2;
void X(int i) {
    constexpr int Evil = 1;
    switch (i) {
        case Evil: return; // <1>
        case 101:
            goto case Evil; // Goes to <1>

        // This is allowed here even though declaring a constexpr int isn't.
        using ::Evil; // 😈

        case Evil: return; // <2>
        case 102:
            goto case Evil; // Goes to <2>

        case Evil + 1: return; // <3>
        case 103: {
            constexpr int Evil = ::Evil + 1;
            goto case Evil; // Goes to <3>
        }
    }
}

3 Proposed Wording

[ Editor's note: This wording has not yet been reviewed by a core expert, so it may be complete nonsense. ]

In the wording below, deleted text is marked [-like this-] and inserted text {+like this+}.

Modify 8.2 [stmt.label] as follows:

[ Editor's note: I opted to go with label to refer to labels with an identifier, matching common usage, since most people think of default and case as being something else. This meant that I needed general-label to refer to both kinds of labels. If you would prefer, I could use label as the generic term and something like identifier-label for the specific case. ]

1 A statement can be labeled.

labeled-statement:
    [-attribute-specifier-seq(opt) identifier : statement-]
    [-attribute-specifier-seq(opt) case constant-expression : statement-]
    [-attribute-specifier-seq(opt) default : statement-]
    {+attribute-specifier-seq(opt) general-label : statement+}

{+general-label:+}
    {+label+}
    {+case-label+}

{+label:+}
    {+identifier+}

{+case-label:+}
    {+case constant-expression ... constant-expression+}
    {+case constant-expression+}
    {+default+}

The optional attribute-specifier-seq appertains to the [-label-]{+general-label+}. The only use of a [-label with an identifier-]{+label+} is as the target of a goto{+, break, or continue+}. No two labels in a function shall have the same identifier. A [-label-]{+general-label+} can be used in a goto statement before its introduction by a labeled-statement.

2 [-Case labels and default labels-]{+case-labels+} shall occur only in switch statements.

{+3 A case-label with two constant-expressions shall describe a non-empty, inclusive range bounded at the low end by the first constant-expression and at the high end by the second. A case-label with a single constant-expression describes a point range that both begins and ends at that constant-expression. A case-label consisting of the keyword default does not describe any range.+}

Modify 8.5.3 [stmt.switch]/2 as follows:

2 The condition shall be of integral type, enumeration type, or class type. If of class type, the condition is contextually implicitly converted to an integral or enumeration type. If the (possibly converted) type is subject to integral promotions, the condition is converted to the promoted type. Any statement within the switch statement can be labeled with one or more [-case labels as follows:-]{+case-labels.+}

[-case constant-expression :-]

[-where the-]{+Any+} constant-expression{+s within the case-labels+} shall be [-a-] converted constant expression{+s+} of the adjusted type of the switch condition. No two of the [-case constants-]{+case-labels+} in the same switch shall have [-the same value-]{+intersecting ranges+} after conversion.

Modify 8.5.3 [stmt.switch]/5 as follows:

5 When the switch statement is executed, its condition is evaluated. If one of the [-case constants-]{+case-labels+} has [-the same value as-]{+a range that includes+} the condition, control is passed to the statement following the matched case label. If no case constant matches the condition, and if there is a default label, control passes to the statement labeled by the default label. If no case matches and if there is no default then none of the statements in the switch [-is-]{+are+} executed.

[ Editor's note: Everywhere in 8.5.3 [stmt.switch] that refers to a “case or default label” can be changed to use the new case-label production, but I have not made that change here. ]

Modify 8.7.1 [stmt.jump.general]/1 as follows:

1 Jump statements unconditionally transfer control.

jump-statement:
    break {+label(opt)+} ;
    continue {+label(opt)+} ;
    return expr-or-braced-init-list(opt) ;
    coroutine-return-statement
    goto [-identifier-]{+general-label+} ;

Modify 8.7.2 [stmt.break] as follows:

1 The break statement shall occur only in an iteration-statement or a switch statement and causes termination of the [-smallest enclosing iteration-statement or switch-]{+selected+} statement; control passes to the statement following the terminated statement, if any.

{+2 A break without the optional label selects the smallest enclosing iteration-statement or switch statement.+}

{+3 A break with the optional label selects the statement labeled by that label, which shall be an iteration-statement or switch statement that encloses the break statement.+}

Modify 8.7.3 [stmt.cont] as follows:

1 The continue statement shall occur only in an iteration-statement and causes control to pass to the loop-continuation portion of the [-smallest enclosing-]{+selected+} iteration-statement, that is, to the end of the loop. More precisely, in each of the statements

while (foo) {    do {               for (;;) {
  {                {                  {       
    // ...           // ...             // ...
  }                }                  }       
contin: ;        contin: ;          contin: ; 
}                } while (foo);     }         

{+where contin is a synthetic label unique to each iteration-statement,+} a continue not contained in an enclosed iteration statement is equivalent to goto contin {+for the synthetic label of the selected iteration-statement+}.

[ Editor's note: All uses of contin above should be styled like contin to indicate that it is a placeholder even if they weren’t before. ]

{+2 A continue without the optional label selects the smallest enclosing iteration-statement.+}

{+3 A continue with the optional label selects the iteration-statement labeled by that label, which shall enclose the continue statement.+}

Modify 8.7.6 [stmt.goto] as follows:

1 The goto statement unconditionally transfers control to the statement labeled by [-the identifier-]{+a matching general-label+}. [-The identifier shall be a label located in the current function.-]

{+2 If the general-label is a label, it shall match the label of a labeled-statement located in the current function.+}

{+3 If the general-label is a case-label, all constant-expressions shall be converted constant expressions of the adjusted type of the smallest enclosing switch statement. The case-label shall then match a labeled-statement present in that switch statement, determined as follows:

  • if the case-label is the keyword default, then it matches with a default label;

  • if the case-label is of the form

    case constant-expression ... constant-expression

    then it matches with a case label of the same form with the same values;

  • if the case-label is of the form

    case constant-expression

    it selects a case label with a range that contains the value of that expression, or if none match and there is a default label, it selects that. [ Note: This is similar to how the condition in a switch statement is handled (8.5.3 [stmt.switch]); however, if the value does not match a case label and there is no default, it is ill-formed rather than leaving the switch statement. end note ]+}


  1. I am separately working on an as-yet unpublished paper to fix pp-number (P2180).