ISO/IEC JTC1 SC22 WG21
N3981
Richard Smith
2014-05-06

Removing trigraphs??!

Case study

The uses of trigraph-like constructs in one large codebase were examined. We discovered:

Trigraphs continue to pose a burden on users of C++.

Proposal

Trigraphs are handled in the first phase of translation:

Physical source file characters are mapped, in an implementation-defined manner, to the basic source character set (introducing new-line characters for end-of-line indicators) if necessary. The set of physical source file characters accepted is implementation-defined. Trigraph sequences (2.4) are replaced by corresponding single-character internal representations.

Note that the mapping from physical source file characters to the basic source character set is implementation-defined. If trigraphs are removed from the language entirely, an implementation that wishes to support them can continue to do so: its implementation-defined mapping from physical source file characters to the basic source character set can include trigraph translation (and can even avoid doing so within raw string literals). We do not need trigraphs in the standard for backwards compatibility.
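As a rough illustration of what such an implementation-defined mapping might look like, the following is a minimal sketch of the replacement of the nine trigraph sequences. The function names trigraph_replacement and replace_trigraphs are hypothetical, and the sketch deliberately ignores raw string literals, which a real implementation would have to treat specially. The source avoids spelling any trigraph literally, so it reads the same whether or not the compiler performs trigraph replacement.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: given the third character of a candidate trigraph,
// returns the single character it stands for, or '\0' if the character
// does not complete one of the nine trigraph sequences.
char trigraph_replacement(char c) {
    switch (c) {
        case '=':  return '#';
        case '/':  return '\\';
        case '\'': return '^';
        case '(':  return '[';
        case ')':  return ']';
        case '!':  return '|';
        case '<':  return '{';
        case '>':  return '}';
        case '-':  return '~';
        default:   return '\0';
    }
}

// Hypothetical sketch of trigraph replacement as one step of the
// phase 1 mapping. Scans left to right; two question marks followed by
// a completing character collapse to the corresponding single character.
// Assumption: no special handling of raw string literals.
std::string replace_trigraphs(const std::string& src) {
    std::string out;
    out.reserve(src.size());
    for (std::size_t i = 0; i < src.size(); ++i) {
        if (src[i] == '?' && i + 2 < src.size() && src[i + 1] == '?') {
            const char r = trigraph_replacement(src[i + 2]);
            if (r != '\0') {
                out.push_back(r);
                i += 2;  // skip the remaining two characters of the trigraph
                continue;
            }
        }
        out.push_back(src[i]);
    }
    return out;
}
```

Because the scan proceeds left to right, three question marks followed by an equals sign leave the first question mark in place and replace only the last three characters.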

This paper proposes that trigraphs be removed entirely.

Proposed wording

Change in 2.2 (lex.phases) paragraph 1 bullet 1:

Physical source file characters are mapped, in an implementation-defined manner, to the basic source character set (introducing new-line characters for end-of-line indicators) if necessary. The set of physical source file characters accepted is implementation-defined. [Deleted: Trigraph sequences (2.4) are replaced by corresponding single-character internal representations.] Any source file character not in the basic source character set (2.3) is replaced by […]

Delete subclause 2.4 (lex.trigraph) "Trigraph sequences"

Change in 2.5 (lex.pptoken) paragraph 3 bullet 1:

If the next character begins a sequence of characters that could be the prefix and initial double quote of a raw string literal, such as R", the next preprocessing token shall be a raw string literal. Between the initial and final double quote characters of the raw string, any transformations performed in phases 1 and 2 ([deleted: trigraphs,] universal-character-names, and line splicing) are reverted; this reversion shall apply before any d-char, r-char, or delimiting parenthesis is identified. […]

Change footnote 24 in 2.14.3 (lex.ccon) paragraph 3:

Using an escape sequence for a question mark [deleted: can avoid accidentally creating a trigraph] [inserted: is supported for compatibility with ISO C++14 and ISO C].
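To illustrate the escape sequence this footnote refers to, here is a small sketch showing two ways of writing a string whose value contains two adjacent question marks without a trigraph ever appearing in the source text. The variable names escaped and split are hypothetical.

```cpp
#include <string>

// Escaping the second question mark keeps the two question marks from
// being adjacent in the source, so no trigraph is formed.
const std::string escaped = "really?\?!";

// Adjacent string literals are concatenated in a later translation phase,
// after any trigraph replacement would already have happened.
const std::string split = "really?" "?!";
```

Both spellings yield the same nine-character string on implementations with and without trigraph support.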

Add a subclause to Annex C:

Clause 2: lexical conventions [diff.cpp14.lex]

Change: Removal of trigraphs.
Rationale: Undesirable feature that prevents some uses of ?? in non-raw string literals and comments.
Effect on original feature: Valid C++2014 code that uses trigraphs may not be valid or may have different semantics. Implementations may choose to provide trigraphs as part of the implementation-defined mapping from physical source file characters to the basic source character set, but are encouraged not to do so.

Feature test macro

No feature test macro is provided for this feature. Code that wishes to be portable between implementations that provide trigraphs and those that do not should avoid using basic source character sequences containing trigraphs.
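One way a project might check for such sequences is a simple scan of its source text. The following is a minimal sketch under that assumption; the names completes_trigraph and lines_with_trigraphs are hypothetical, and the scan does not distinguish raw string literals from other text.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: true if `c` is the third character of one of the
// nine trigraph sequences.
bool completes_trigraph(char c) {
    return std::string("=/'()!<>-").find(c) != std::string::npos;
}

// Hypothetical sketch: returns the 1-based numbers of the lines in `src`
// that contain at least one trigraph sequence. Each line is reported at
// most once. Assumption: lines are separated by '\n'.
std::vector<std::size_t> lines_with_trigraphs(const std::string& src) {
    std::vector<std::size_t> lines;
    std::size_t line = 1;
    for (std::size_t i = 0; i < src.size(); ++i) {
        if (src[i] == '\n') {
            ++line;
            continue;
        }
        if (src[i] == '?' && i + 2 < src.size() && src[i + 1] == '?' &&
            completes_trigraph(src[i + 2])) {
            if (lines.empty() || lines.back() != line) {
                lines.push_back(line);
            }
        }
    }
    return lines;
}
```

Lines reported by such a scan could then be rewritten using an escaped question mark or adjacent string literals.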