Document #: | N4131 |
---|---|
Date: | 2014-08-09 |
Reply to: | Filip Roséen <[email protected]> |
Summary: | Arguments for not allowing return {expr} to call an explicit constructor. |
explicit return-statement

If one were to agree with the contents of N4074, the following snippet should compile without diagnostics:
```cpp
struct Type1 { explicit Type1 (int); };

Type1 example_f1 () { return { 0 }; }
```
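For comparison, the following sketch shows what the current rules require of the author of example_f1. The definition of Type1 and its value member are hypothetical additions so that the result can be inspected. std::is_convertible checks copy-initialization, which, like the copy-list-initialization performed by return { 0 };, may not use an explicit constructor. std::is_constructible checks direct-initialization, which may.

```cpp
#include <type_traits>

// Hypothetical definition of the paper's Type1, with a member to inspect.
struct Type1 {
  explicit Type1(int v) : value(v) {}
  int value;
};

// Copy-initialization from int may not use the explicit constructor...
static_assert(!std::is_convertible<int, Type1>::value,
              "int does not implicitly convert to Type1");
// ...while direct-initialization may.
static_assert(std::is_constructible<Type1, int>::value,
              "Type1 can be explicitly constructed from int");

// Under the current rules the type is named at the point of initialization,
// which is the explicit consent the constructor asks for.
Type1 example_f1() { return Type1{ 0 }; }
```

The only difference from the snippet above is that the type is written out in the return-statement.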
The main arguments of N4074:

- return { expr } cannot mean anything besides that we, explicitly, want to initialize the return-value.
- Having to spell out the type to permit explicit initialization in a return-statement is redundant; both the compiler and the developer know what is going on.
This paper will try to show, through several lines of argument, why the change to ISO C++ proposed in N4074 should not be allowed.
The meaning of explicit
Marking a constructor as explicit is often equivalent to saying: "such initialization is certainly possible, but it's potentially not what you want; if you really want to do this, go ahead, but I won't let it happen without your explicit consent."
If a developer would like to use our explicit constructor, we'd like him to go the extra mile and explicitly show us that this is the case. We'd like him to show some effort and, more specifically, to consider whether this is really what he wants.
Constructors marked explicit are, by the invisible contract involved, potentially dangerous.
```cpp
// meaning-of-explicit.example.1
#include <memory>

template<class T>
std::unique_ptr<T> func () {
  static T x;
  return { &x }; // error: chosen constructor is explicit in copy-initialization
}
```
There's no way for an implementation to force a developer to actually walk around the block every time he tries to initialize an object using an explicit constructor; instead, we require him to explicitly state his request by writing out the type he'd like to initialize at the point where such initialization takes place.
"I'll refuse to do this unless you show some effort."
N4074 would effectively make the previously described contract disappear in the context of return { expr }, which means that we would completely disregard the original intent expressed by the author of said constructor.
```cpp
// meaning-of-explicit.example.1, with N4074 applied
#include <memory>

template<class T>
std::unique_ptr<T> func () {
  static T x;
  return { &x }; // compiles, but triggers undefined behavior
}                // if/when the unique_ptr is destroyed
```
If the author didn't want the user to "walk the extra mile", the author wouldn't have marked the constructor as explicit.
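As a sketch of what "walking the extra mile" looks like with std::unique_ptr, the block below checks that a raw pointer does not implicitly become an owning pointer, and then spells out the type in a hypothetical helper, make_owned, where handing over ownership is actually correct because the pointer comes from new. In C++14, std::make_unique<T>() serves the same purpose.

```cpp
#include <memory>
#include <type_traits>

// unique_ptr's constructor from a raw pointer is explicit: no implicit T* -> owner.
static_assert(!std::is_convertible<int*, std::unique_ptr<int>>::value,
              "a raw pointer does not implicitly become a unique_ptr");
// Naming the type (direct-initialization) is allowed.
static_assert(std::is_constructible<std::unique_ptr<int>, int*>::value,
              "explicit construction is allowed");

// Hypothetical helper: the type is written out, and the pointee was
// allocated with new, so transferring ownership to unique_ptr is sound.
template<class T>
std::unique_ptr<T> make_owned() {
  return std::unique_ptr<T>(new T());  // new T() value-initializes the object
}
```

Writing the type does not make the static-object example safe; what it does is force the developer to stop and think about ownership at the point where it is transferred.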
A braced-init-list is often referred to as a means of uniform initialization, meaning that all types can be initialized using the same syntax. It doesn't matter whether we are initializing a fundamental type or a user-defined type that is initialized with one or several arguments; the initialization is uniform.
Current practice, backed up by the Standard, does not treat uniform initialization as a way to bypass the rules associated with initializing an object of type T; we merely have a way to express initialization of any type.
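A minimal sketch of this point, using hypothetical types Point and Handle: one braced syntax initializes a fundamental type, an aggregate, a library class and a container, yet the explicit constructor of Handle is still only usable when the type is stated (direct-list-initialization), not in copy-list-initialization.

```cpp
#include <string>
#include <vector>

// Hypothetical types used only for illustration.
struct Point { int x; int y; };                                // an aggregate
struct Handle { explicit Handle(int v) : fd(v) {} int fd; };   // explicit constructor

// The same braced syntax for very different kinds of types:
int answer{ 42 };
Point origin{ 0, 0 };
std::string greeting{ "hello" };
std::vector<int> primes{ 2, 3, 5, 7 };

// Uniform syntax, unchanged rules:
Handle h{ 3 };          // direct-list-initialization: explicit constructor allowed
// Handle g = { 3 };    // copy-list-initialization: ill-formed, constructor is explicit
```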
Another point worth noting is that developers often state that one of the greatest perks of using a braced-init-list is that it's equivalent to saying: "Dear compiler, if you know what type I'm trying to initialize... please, go ahead."
It is important to note the use of "you know": nowhere does it imply that both the compiler and the developer know the type. When an initialization requires the use of an explicit constructor, the compiler certainly knows, but with the meaning of explicit in mind, an implementation should be worried that the developer doesn't, which is why we get a diagnostic in such a case.
There are many rules in C++, some more complicated than others, but what really makes people go "hmpf" is when seemingly equivalent constructs behave differently.
Allowing return { ... } to use an explicit constructor contradicts the current, far simpler rule: "Unless a braced-init-list has a {type, object, cast} explicitly stated where it is being used, a potential conversion must be one that can happen implicitly."
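The rule can be illustrated with a hypothetical type Celsius and a hypothetical function degrees: when the type is stated where the braced-init-list is used, the explicit constructor is usable in a variable definition, a function argument and a return-statement alike. Without the type, all three are copy-list-initialization and all three are ill-formed today; under N4074 only the return-statement would become well-formed.

```cpp
// Hypothetical type with an explicit constructor.
struct Celsius {
  explicit Celsius(double v) : value(v) {}
  double value;
};

double degrees(Celsius c) { return c.value; }

// Type stated where the braced-init-list is used: allowed in every context.
Celsius room{ 21.5 };                           // variable
double outside = degrees(Celsius{ 19.0 });      // function argument
Celsius forecast() { return Celsius{ 25.0 }; }  // return-statement

// Type not stated: copy-list-initialization, ill-formed in every context today.
//   Celsius c = { 21.5 };
//   degrees({ 19.0 });
//   return { 25.0 };
```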
Is the change proposed by N4074 really worth it?
There is a very close relationship between narrowing conversions and the use of a constructor marked as explicit.
If an object of a fundamental type T is list-initialized with a compile-time known value which isn't representable in that type, or with an object of type U which can potentially hold a value that isn't representable in T, a diagnostic is required.
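A short sketch of the narrowing rule with unsigned char, assuming the usual 8-bit char. A constant expression whose value fits is accepted; a constant that does not fit, or a non-constant int, is rejected in list-initialization. A static_cast states the intent explicitly, and the conversion then reduces the value modulo 256.

```cpp
// Constant expression whose value fits in unsigned char: not narrowing.
const int small = 100;
unsigned char fits{ small };

// unsigned char too_big{ 256 };      // ill-formed: 256 is not representable

int runtime = 300;
// unsigned char c{ runtime };        // ill-formed: an int may hold values that don't fit

// Explicit cast: the developer takes responsibility; 300 % 256 == 44 (8-bit char).
unsigned char wrapped = static_cast<unsigned char>(runtime);
```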
The introduction of narrowing conversions in C++ was, and is, a very good step towards increased type-safety. It prevents developers from making mistakes that can potentially result in a program that behaves in a manner which was never intended.
```cpp
// narrowing-conversions.example.1
#include <cstddef>

std::size_t multiply (int x, int y) {
  return { x * y }; // error: non-constant-expression cannot be narrowed from
}                   // type 'int' to 'std::size_t'
```
It is certainly possible to initialize a std::size_t with the result of x * y, but since std::size_t cannot represent negative numbers, this is potentially unsafe.
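To see what the diagnostic protects against, the sketch below makes the conversion explicit with static_cast in a hypothetical multiply_cast. A negative product does not fail; it silently wraps to a huge value, because conversion to an unsigned type is performed modulo 2^N.

```cpp
#include <cstddef>
#include <limits>

// Hypothetical variant of multiply with the narrowing made explicit.
std::size_t multiply_cast(int x, int y) {
  return static_cast<std::size_t>(x * y);  // -6 becomes 2^N - 6, not an error
}
```

For example, multiply_cast(-2, 3) yields std::numeric_limits<std::size_t>::max() - 5.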
If we play with the idea of writing a wrapper around std::size_t, we could end up with something like the following:
```cpp
// narrowing-conversions.example.2
struct SizeType {
  explicit SizeType (  signed int);
           SizeType (unsigned int);
  …
};

SizeType multiply (int x, int y) {
  return { x * y }; // error: chosen constructor is explicit in
}                   // copy-initialization
```
SizeType (signed int) is marked explicit for the same reason that we rely on diagnostics to inform us of potential narrowing conversions: we rely on the compiler to tell us when we are doing something that might lead to unforeseen consequences.
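A hypothetical completion of SizeType, sketched under the assumption that the elided members are a public value and a range check in the signed constructor (neither is specified in the paper). The signed path requires naming the type, which is the moment the developer opts in; the unsigned path is selected implicitly. The helper names multiply_signed and multiply_unsigned are also hypothetical.

```cpp
#include <cstddef>
#include <stdexcept>

struct SizeType {
  // Signed input may be negative: only usable with explicit consent.
  explicit SizeType(signed int v) : value(checked(v)) {}
  // Unsigned input always fits: usable implicitly.
  SizeType(unsigned int v) : value(v) {}

  std::size_t value;

 private:
  // Hypothetical policy: reject negative values instead of wrapping.
  static std::size_t checked(signed int v) {
    if (v < 0) throw std::out_of_range("negative size");
    return static_cast<std::size_t>(v);
  }
};

// The type is named, opting in to the explicit constructor.
SizeType multiply_signed(int x, int y) { return SizeType{ x * y }; }

// Overload resolution picks SizeType(unsigned int), which is not explicit.
SizeType multiply_unsigned(unsigned int x, unsigned int y) { return { x * y }; }
```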
Since C++11, return { expr } has become almost synonymous with "safe initialization of any return-type"; if N4074 is approved, this will no longer be true. This would be one of the scarier forms of a breaking change: one that nothing but a watchful eye can catch.
return-statement

```cpp
T func1 () { return expression-or-braced-init-list; }
```
As the name implies, a return-statement is used to return a value to the caller of a function. However, it is of utmost importance that we understand that we never directly return the value of the expression-or-braced-init-list associated with the statement; we merely say that it is to be used as the initializer for the returned value.
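A minimal illustration, using a hypothetical type Meters with a non-explicit constructor: the operand of return is not itself the returned value; it copy-initializes the returned object, exactly as the declaration Meters result = 2.5; would.

```cpp
// Hypothetical type with a non-explicit constructor.
struct Meters {
  Meters(double v) : value(v) {}
  double value;
};

// 2.5 is the initializer of the returned Meters object (copy-initialization).
Meters distance() { return 2.5; }
```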
The return-type of a function is by definition a distant type; one cannot know the actual return-type by only inspecting the expression-or-braced-init-list used to initialize it. The reverse also applies; one cannot know the initializers for the return-value by only inspecting the return-type.
Given this relationship between the return-type and its initializer(s), there are consequences that one has to consider carefully:
A developer should be allowed to change the return-type of a function without having to review every return-statement in its body. The expected behavior is that such a change results in a diagnostic unless every initialization of the new return-type follows the rules of strict type-safety (meaning that a potentially dangerous initialization should not apply implicitly).
In the example below, a developer mistakenly thought "ms" was the SI symbol for microseconds; long story short, it's not ("ms" denotes milliseconds). The error is, however, caught during compilation.
```cpp
// return-statement.example.1
#include <functional>

/*!
 * \brief  Benchmark `f()`
 * \return The duration in ms spent evaluating `f()`
 * */
unsigned long benchmark (std::function<void()> f) {
  …
}
```
commit message:

```
* updating codebase to C++11, `benchmark` now returns the appropriate duration type from <chrono>
```

commit diff:

```diff
--- benchmark.cpp 2014-07-28 03:56:32.255764544 +0200
+++ benchmark.cpp 2014-07-28 03:56:53.175682956 +0200
@@ -5,6 +5,6 @@
  * \return The duration in ms spent evaluating `f()`
  * */
-unsigned long benchmark (std::function<void()> f) {
+std::chrono::microseconds benchmark (std::function<void()> f) {
   …
 }
```
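The body of benchmark is elided in the paper. Assuming, hypothetically, that it ended in something like return { elapsed_ms }; with elapsed_ms an unsigned long count, the changed return type makes that line ill-formed today, because std::chrono::duration's constructor from a count is explicit; under N4074 it would compile and silently reinterpret milliseconds as microseconds. The sketch below checks both conversions and shows one possible body that keeps the unit attached to the value.

```cpp
#include <chrono>
#include <functional>
#include <type_traits>

// A raw count does not implicitly become a duration (explicit constructor).
static_assert(!std::is_convertible<unsigned long, std::chrono::microseconds>::value,
              "count -> duration requires explicit construction");
// A lossless conversion between durations is implicit.
static_assert(std::is_convertible<std::chrono::milliseconds, std::chrono::microseconds>::value,
              "milliseconds -> microseconds is lossless and implicit");

// Hypothetical body: time f() with a monotonic clock, return a typed duration.
std::chrono::microseconds benchmark(std::function<void()> f) {
  auto start = std::chrono::steady_clock::now();
  f();
  auto stop = std::chrono::steady_clock::now();
  return std::chrono::duration_cast<std::chrono::microseconds>(stop - start);
}
```

With typed durations, a value that really is in milliseconds converts to microseconds correctly and implicitly, while a bare number is rejected unless its unit is stated.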
A developer might not know the return-type of a function when he writes his return-statement; therefore, he should have a mechanism to disable initializations that might do something which was never intended, no matter whether such initialization uses one argument or several.
```cpp
// return-statement.example.2
#include <chrono>
#include <initializer_list>

template<class T>
struct Vector {
  explicit Vector (int size, int capacity = 0);
           Vector (std::initializer_list<T> data);
};

template<class T, class... Ts>
Vector<T> make_vector (Ts... args) {
  return { args... };
}

int main () {
  using secs = std::chrono::seconds;

  auto x = make_vector< int> (1, 5, 10);
  auto y = make_vector<secs> (10, 20); // error: chosen constructor is explicit in copy-initialization
}
```
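A hypothetical completion of Vector, backed by std::vector so that the chosen constructor can be observed. When the elements really are durations, passing secs values selects the initializer_list constructor; a caller who really wants ten elements with a capacity of twenty has to name the type, which is exactly the consent the explicit constructor asks for. Neither path depends on relaxing copy-list-initialization in return-statements.

```cpp
#include <chrono>
#include <cstddef>
#include <initializer_list>
#include <vector>

// Hypothetical completion of the paper's Vector.
template<class T>
struct Vector {
  explicit Vector(int size, int capacity = 0)
      : data(static_cast<std::size_t>(size)) {       // size value-initialized elements
    if (capacity > size) data.reserve(static_cast<std::size_t>(capacity));
  }
  Vector(std::initializer_list<T> init) : data(init) {}  // one element per initializer

  std::vector<T> data;
};

template<class T, class... Ts>
Vector<T> make_vector(Ts... args) {
  return { args... };  // copy-list-initialization, as in the paper
}

using secs = std::chrono::seconds;
```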
Even though I agree with the opinion raised by N4074, that a developer should know the return-type and the return-paths of the function he is working on, I find it more valuable that the compiler is able to stop potential brainfarts from ever making it as far as runtime.
Neither of the two previous examples would be caught during compilation if N4074 is approved. This means that these somewhat trivial errors would leak out into the world of runtime, something which the strict type-safety of C++ has saved us from in the past.
The changes proposed by N4074 violate one of the fundamental type-safety philosophies of C++: if it's not clear that a potentially unsafe conversion can happen, we, as developers, would like the compiler to diagnose the potential error. It doesn't make sense for the rules of copy-list-initialization to differ in return-statements, since we are by definition initializing a distant type, and with that, a distant value.
If N4074 is approved, there are other cases where such a change would need to propagate for it to make sense. Following the philosophy expressed by N4074, private member-functions of a class are maintained by the same developer who is calling them (as they are implementation details). Should we then allow explicit constructors to be used when invoking such a function with copy-list-initialized arguments? After all, the developer should know what is going on.
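The case above can be sketched with a hypothetical class Widget, whose private helper scaled takes a hypothetical Factor with an explicit constructor. Today the caller, even though it is the same class, must name the type; passing a bare { 3 } is ill-formed for the same reason return { 3 } is.

```cpp
// Hypothetical class with a private helper taking an explicitly-constructed type.
class Widget {
  struct Factor {
    explicit Factor(int v) : value(v) {}
    int value;
  };
  int scaled(Factor f) const { return f.value * 10; }

 public:
  int run() const {
    return scaled(Factor{ 3 });  // type named: the explicit constructor may be used
    // return scaled({ 3 });     // ill-formed: explicit constructor in copy-list-initialization
  }
};
```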