Clarifying Memory Allocation

ISO/IEC JTC1 SC22 WG21 N3537 - 2013-03-12

Lawrence Crowl, [email protected], [email protected]
Chandler Carruth, [email protected]

Introduction
Problem
Solution
    Memory
    Data Races
Wording
    3.7.4 Dynamic storage duration [basic.stc.dynamic]
    3.7.4.1 Allocation functions [basic.stc.dynamic.allocation]
    5.3.4 New [expr.new]
    17.6.4.6 Replacement functions [replacement.functions]
    18.6.1.4 Data races [new.delete.dataraces]
Revision History
References

Introduction

The allocation and deallocation of memory has become a significant expense in modern systems. The optimization of that process is important to good performance. However, it is important to distinguish between micro-optimization of the calls and macro-optimization of the allocation strategy. In particular, good system performance may well require adapting the allocation stragegy to the dynamic behavior of the application, or even to hints provided by the application.

Problem

As strict reading of the current C and C++ standards may lead one to conclude that the allocation strategy shall not consider any information not derivable from the sequence of allocation and deallocation calls. In essence, the standards may exclude macro-optimization of allocation.

On the other hand, a strict reading of the standards may lead one to conclude that the implementation must make an implementation function call for each and every abstract call. This reading may exclude micro-optimization of allocation.

Solution

We propose to replace existing mechanistic wording with wording more precisely focused on essential requirements. The intent is to enable behavior that some existing memory allocators already have. For example, see TCMalloc [TCM].

Memory

An essential requirement on implementations is that they deliver usable memory, not that they have a particular sequence of calls. We propose to explicitly decouple the implementation calls from the abstract calls.

  1. The number of implementation calls is not part of the observable behavior of the program.

  2. The parameters and return values of implementation calls is not part of the observable behavior of the program, except that the sum of the size parameters of live implementation allocation calls shall not exceed the sum of the size parameters, adjusted for alignment, of live abstract allocation calls. (That is, implementations may "round up" size parameters, as they already do.)

Together these changes enable implementations to reduce the number of malloc calls by avoiding them or fusing them. However, it would only enable fusing mallocs together into larger mallocs provided it can prove that both mallocs have overlapping lifetimes (ended by corresponding calls to free) such that the peak allocated memory of the program remains unchanged.

Because C++ class-specific memory allocators are often tuned to specific class sizes, we do not apply this relaxation to those allocators.

Data Races

An essential requirement on implementations is that they be data-race free, yet the standards do not say so directly. We propose to replace the current wording with direct wording, thus explicitly enabling an implementation to consider information beyond the strict sequence of allocation and deallocation calls.

Wording

The wording in this section is relative to WG21 N3337.

3.7.4 Dynamic storage duration [basic.stc.dynamic]

Paragraph 2 is unchanged.

The library provides default definitions for the global allocation and deallocation functions. Some global allocation and deallocation functions are replaceable (18.6.1). A C++ program shall provide at most one definition of a replaceable allocation or deallocation function. Any such function definition replaces the default version provided in the library (17.6.4.6). The following allocation and deallocation functions (18.6) are implicitly declared in global scope in each translation unit of a program.

void* operator new(std::size_t);
void* operator new[](std::size_t);
void operator delete(void*);
void operator delete[](void*);

These implicit declarations introduce only the function names operator new, operator new[], operator delete, and operator delete[]. [Note: The implicit declarations do not introduce the names std, std::size_t, or any other names that the library uses to declare these names. Thus, a new-expression, delete-expression or function call that refers to one of these functions without including the header <new> is well-formed. However, referring to std or std::size_t is ill-formed unless the name has been declared by including the appropriate header. —end note] Allocation and/or deallocation functions can also be declared and defined for any class (12.5).

Add a new paragraph between the existing paragraphs 2 and 3.

A live allocation call is one in which the call has occured but the corresponding deallocation call has not occured. An abstract allocation call is a call executed by the abstract machine (1.9 [intro.execution]). An implemented allocation call is a call executed by the physical machine. When allocation calls are relaxed, the number of live implementation allocation calls may be less than the number of live abstract allocation calls. That is, two or more relaxed abstract allocation calls may be merged into a single implementation allocation, but only provided that the sum of the size parameters of the live implementation calls shall not exceed the sum of the size parameters of live abstract calls, where each parameter is rounded up to a multiple of the corresponding alignment. Corresponding deallocation calls are likewise merged. Non-class non-array allocation calls are relaxed with respect to each other. Non-class array allocation calls are relaxed with respect to each other. Other allocation calls are not relaxed. [Note: Placement new expressions (5.3.4 [expr.new]) and pseudo destructor call expressions (5.2.4 [expr.pseudo]) do not call allocators and are therefore unaffected. —end note]

Paragraph 3 is unchanged.

Any allocation and/or deallocation functions defined in a C++ program, including the default versions in the library, shall conform to the semantics specified in 3.7.4.1 and 3.7.4.2.

3.7.4.1 Allocation functions [basic.stc.dynamic.allocation]

Paragraph 2 is unchanged.

The allocation function attempts to allocate the requested amount of storage. If it is successful, it shall return the address of the start of a block of storage whose length in bytes shall be at least as large as the requested size. There are no constraints on the contents of the allocated storage on return from the allocation function. The order, contiguity, and initial value of storage allocated by successive calls to an allocation function are unspecified. The pointer returned shall be suitably aligned so that it can be converted to a pointer of any complete object type with a fundamental alignment requirement (3.11) and then used to access the object or array in the storage allocated (until the storage is explicitly deallocated by a call to a corresponding deallocation function). Even if the size of the space requested is zero, the request can fail. If the request succeeds, the value returned shall be a non-null pointer value (4.10) p0 different from any previously returned value p1, unless that value p1 was subsequently passed to an operator delete. The effect of dereferencing a pointer returned as a request for zero size is undefined. [Footnote: The intent is to have operator new() implementable by calling std::malloc() or std::calloc(), so the rules are substantially the same. C++ differs from C in requiring a zero request to return a non-null pointer. —end footnote]

5.3.4 New [expr.new]

Paragraph 8 is unchanged, but note that the number of allocation implementation calls may be less than the number of abstract calls specified below via the provision of 3.7.4/new.

A new-expression obtains storage for the object by calling an allocation function (3.7.4.1). If the new-expression terminates by throwing an exception, it may release storage by calling a deallocation function (3.7.4.2). If the allocated type is a non-array type, the allocation function's name is operator new and the deallocation function's name is operator delete. If the allocated type is an array type, the allocation function's name is operator new[] and the deallocation function's name is operator delete[]. [Note: an implementation shall provide default definitions for the global allocation functions (3.7.4, 18.6.1.1, 18.6.1.2). A C++ program can provide alternative definitions of these functions (17.6.4.6) and/or class-specific versions (12.5). —end note]

Paragraph 8 is unchanged, but note that the number of allocation implementation calls may be less than the number of abstract calls specified below via the provision of 3.7.4/new.

A new-expression passes the amount of space requested to the allocation function as the first argument of type std::size_t. That argument shall be no less than the size of the object being created; it may be greater than the size of the object being created only if the object is an array. For arrays of char and unsigned char, the difference between the result of the new-expression and the address returned by the allocation function shall be an integral multiple of the strictest fundamental alignment requirement (3.11) of any object type whose size is no greater than the size of the array being created. [Note: Because allocation functions are assumed to return pointers to storage that is appropriately aligned for objects of any type with fundamental alignment, this constraint on array allocation overhead permits the common idiom of allocating character arrays into which objects of other types will later be placed. —end note]

17.6.4.6 Replacement functions [replacement.functions]

Paragraph 2 is unchanged.

A C++ program may provide the definition for any of eight dynamic memory allocation function signatures declared in header <new> (3.7.4, 18.6):

Paragraph 3 is unchanged.

The program's definitions are used instead of the default versions supplied by the implementation (18.6). Such replacement occurs prior to program startup (3.2, 3.6). The program's definitions shall not be specified as inline. No diagnostic is required.

18.6.1.4 Data races [new.delete.dataraces]

Edit paragraph 1 as follows.

For purposes of determining the existence of data races, the library versions of operator new, user replacement versions of global operator new, and the C standard library functions calloc and malloc shall behave as though they accessed and modified only the storage referenced by the return value. shall conform to the provisions of 17.6.5.9 [res.on.data.races]. The library versions of operator delete, user replacement versions of operator delete, and the C standard library function free shall behave as though they accessed and modified only the storage referenced by their first argument. shall conform to the provisions of 17.6.5.9 [res.on.data.races]. The C standard library function realloc shall behave as though it accessed and modified only the storage referenced by its first argument and by its return value. shall conform to the provisions of 17.6.5.9 [res.on.data.races]. Calls to these functions that allocate or deallocate a particular unit of storage p shall occur in a single total order, and each such deallocation call shall happen before (1.10 [intro.multithread]) the next allocation (if any) in this order. Programs shall ensure that a call that allocates a region p of memory happens before all accesses to p and that all modifications of p happen before the call that deallocates p.

Revision History

This paper revises N3433 - 2012-09-23 as follows.

References

[TCM]
TCMalloc : Thread-Caching Malloc, http://goog-perftools.sourceforge.net/doc/tcmalloc.html.