Audience: LEWG, SG14, WG21
Document number: P0447R20
Date: 2022-06-04
Project: Introduction of std::hive to the standard library
Reply-to: Matthew Bentley <[email protected]>

Introduction of std::hive to the standard library

Table of Contents

  1. Introduction
  2. Questions for the committee
  3. Motivation and Scope
  4. Impact On the Standard
  5. Design Decisions
  6. Technical Specification
  7. Acknowledgments
  8. Appendices:
    1. Basic usage examples
    2. Reference implementation benchmarks
    3. Frequently Asked Questions
    4. Specific responses to previous committee feedback
    5. Typical game engine requirements
    6. Time complexity requirement explanations
    7. Original reference implementation differences and link
    8. User experience reports
    9. Brief guide for selecting an appropriate container based on usage and performance

Revision history

I. Introduction

The purpose of a container in the standard library cannot be to provide the optimal solution for all scenarios. Inevitably in fields such as high-performance trading or gaming, the optimal solution within critical loops will be a custom-made one that fits that scenario perfectly. However, outside of the most critical of hot paths, there is a wide range of applications for more generalized solutions.

Hive is a formalisation, extension and optimization of what is typically known as a 'bucket array' container in game programming circles; similar structures exist in various incarnations across the high-performance computing, high-performance trading, 3D simulation, physics simulation, robotics, server/client application and particle simulation fields (see: https://groups.google.com/a/isocpp.org/forum/#!topic/sg14/1iWHyVnsLBQ).

The concept of a bucket array is: you have multiple memory blocks of elements, and a boolean token for each element which denotes whether that element is 'active' or 'erased'. The collection of these tokens is commonly known as a skipfield. If an element is 'erased', it is skipped over during iteration. When all elements in a block are erased, the block is removed, so that iteration does not lose performance by having to skip empty blocks. If an insertion occurs when all the blocks are full, a new memory block is allocated.
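As an illustration only, the following is a minimal sketch of such a bucket array for int elements. The names (bucket_array, block, block_capacity) and the fixed block size of 4 are assumptions made for the example, and the removal of fully-erased blocks is left out to keep it short.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical, simplified bucket array of ints: fixed-size blocks plus one
// boolean per slot recording whether that slot has been erased.
struct bucket_array {
    static constexpr std::size_t block_capacity = 4; // fixed block size (assumption)

    struct block {
        int         elements[block_capacity] = {};
        bool        erased[block_capacity]   = {}; // the boolean skipfield
        std::size_t used = 0;                      // slots handed out so far
    };

    std::vector<std::unique_ptr<block>> blocks;

    // Insert at the back; a new block is allocated only when the last one is
    // full. Existing elements never move, so the returned pointer stays valid.
    int* insert(int value) {
        if (blocks.empty() || blocks.back()->used == block_capacity)
            blocks.push_back(std::make_unique<block>());
        block& b = *blocks.back();
        b.elements[b.used] = value;
        return &b.elements[b.used++];
    }

    // Erase by flagging the slot; nothing is relocated.
    void erase(std::size_t block_index, std::size_t slot) {
        assert(block_index < blocks.size() && slot < blocks[block_index]->used);
        blocks[block_index]->erased[slot] = true;
    }

    // Iterate over all blocks, branching on the flag for every slot visited.
    long sum() const {
        long total = 0;
        for (const auto& b : blocks)
            for (std::size_t i = 0; i != b->used; ++i)
                if (!b->erased[i]) total += b->elements[i];
        return total;
    }
};
```

Inserting six values creates two blocks, and a pointer obtained from the first insertion remains usable afterwards because the blocks themselves are never reallocated.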

The advantages of this structure are as follows: because a skipfield is used, no reallocation of elements is necessary upon erasure. Because the structure uses multiple memory blocks, insertions to a full container also do not trigger reallocations. This means that element memory locations stay stable and iterators stay valid regardless of erasure/insertion. This is highly desirable, for example, in game programming because there are usually multiple elements in different containers which need to reference each other during gameplay and elements are being inserted or erased in real time.

Problematic aspects of a typical bucket array are that they tend to have a fixed memory block size, do not re-use memory locations from erased elements, and utilize a boolean skipfield. The fixed block size (as opposed to block sizes with a growth factor) and lack of erased-element re-use lead to far more allocations/deallocations than necessary. Given that allocation is a costly operation in most operating systems, this becomes important in performance-critical environments. The boolean skipfield makes iteration time complexity undefined, as there is no way of knowing ahead of time how many erased elements occur between any two non-erased elements. This can create variable latency during iteration. It also requires branching code, which may cause issues on processors with deep pipelines and poor branch-prediction failure performance.

A hive uses a non-boolean method for skipping erased elements, which allows for O(1) amortized iteration time complexity and more-predictable iteration performance than a bucket array. It also utilizes a growth factor for memory blocks and reuses erased element locations upon insertion, which leads to fewer allocations/reallocations. Because it reuses erased element memory space, the exact location of insertion is undefined. In most implementations it's likely (for performance reasons) that when no erasures have occurred, or when an equal number of erasures and insertions have occurred, the insertion location will be the back of the container; otherwise an erased element's location will be re-used. The container is therefore considered unordered but sortable. Lastly, because there is no way of predicting in advance where erasures ('skips') may occur during iteration, an O(1) time complexity [ ] operator is not necessarily possible (depending on implementation) and therefore, the container is bidirectional but not random-access.

There are two patterns for accessing stored elements in a hive: the first is to iterate over the container and process each element (or skip some elements using the advance/prev/next/iterator ++/-- functions). The second is to store the iterator returned by the insert() function (or a pointer derived from the iterator) in some other structure and access the inserted element in that way. To better understand how insertion and erasure work in a hive, see the following images.

Insertion to back

The following images demonstrate how insertion works in a hive compared to a vector when size == capacity (note: the images use the old name for this proposal, colony; it is the same container).

[Image: visual demonstration of inserting to a full vector]
[Image: visual demonstration of inserting to a full colony]

Non-back erasure

The following images demonstrate how non-back erasure works in a hive compared to a vector.

[Image: visual demonstration of randomly erasing from a vector]
[Image: visual demonstration of randomly erasing from a colony]

There is additional introductory information about the container's structure in this CppCon talk, though some of its information is out of date (hive/colony no longer uses a stack but a free list instead, benchmark data is out of date, etc), and more detailed implementation information is available in this C++Now talk.

II. Questions for the Committee

None at present.

III. Motivation and Scope

Note: Throughout this document I will use the term 'link' to denote any form of referencing between elements whether it be via ids/iterators/pointers/indexes/references/etc.

There are situations where data is heavily interlinked, iterated over frequently, and changing often. An example is the typical video game engine. Most games will have a central generic 'entity' or 'actor' class, regardless of their overall schema (an entity class does not imply an ECS). Entity/actor objects tend to be 'has a'-style objects rather than 'is a'-style objects, which link to, rather than contain, shared resources like sprites, sounds and so on. Those shared resources are usually located in separate containers/arrays so that they can be re-used by multiple entities. Entities are in turn referenced by other structures within a game engine, such as quadtrees/octrees, level structures, and so on.

Entities may be erased at any time (for example, a wall gets destroyed and no longer is required to be processed by the game's engine, so is erased) and new entities inserted (for example, a new enemy is spawned). While this is all happening, the links between entities, resources and superstructures such as levels and quadtrees must stay valid in order for the game to run. The order of the entities and resources themselves within the containers is, in the context of a game, typically unimportant, so an unordered container is okay.

Unfortunately the container with the best iteration performance in the standard library, vector[1], loses pointer validity to elements within it upon insertion, and pointer/index validity upon erasure. This tends to lead to sophisticated and often restrictive workarounds when developers attempt to utilize vector or similar containers under the above circumstances.

std::list and the like are not suitable due to their poor locality, which leads to poor cache performance during iteration. This is however an ideal situation for a container such as hive, which has a high degree of locality. Even though that locality can be punctuated by gaps from erased elements, it still works out better in terms of iteration performance[1] than every existing standard library container other than deque/vector, regardless of the ratio of erased to non-erased elements.

Some more specific requirements for containers in the context of game development are listed in the appendix.

As another example, particle simulation (weather, physics etcetera) often involves large clusters of particles which interact with external objects and each other. The particles each have individual properties (spin, momentum, direction etc) and are being created and destroyed continuously. Therefore the order of the particles is unimportant; what is important is the speed of erasure and insertion. No current standard library container has both strong insertion and non-back erasure speed, so again this is a good match for hive.

Reports from other fields suggest that, because most developers aren't aware of containers such as this, they often end up using solutions which are sub-par for iterative performance such as std::map and std::list in order to preserve pointer validity, when most of their processing work is actually iteration-based. So, introducing this container would both create a convenient solution to these situations and increase awareness of better-performing approaches in general. It will also ease communication across fields, as opposed to the current scenario where each field uses a similar container but each has a different name for it.

IV. Impact On the Standard

This is purely a library addition, requiring no changes to the language.

V. Design Decisions

The three core aspects of a hive from an abstract perspective are:

  1. A collection of element memory blocks + metadata, to prevent reallocation during insertion (as opposed to a single memory block)
  2. A method of skipping erased elements in O(1) time during iteration (as opposed to reallocating subsequent elements during erasure)
  3. An erased-element location recording mechanism, to enable the re-use of memory from erased elements in subsequent insertions, which in turn increases cache locality and reduces the number of block allocations/deallocations

Each memory block houses multiple elements. The metadata about each block may or may not be allocated with the blocks themselves (could be contained in a separate structure). This metadata should include at a minimum, the number of non-erased elements within each block and the block's capacity - which allows the container to know when the block is empty and needs to be removed from the iterative chain, and also allows iterators to judge when the end of one block has been reached. A non-boolean method of skipping over erased elements during iteration while maintaining O(1) amortized iteration time complexity is required (amortized due to block traversal, which would typically require a few more operations). Finally, a mechanism for keeping track of elements which have been erased must be present, so that those memory locations can be reused upon subsequent element insertions.

The following aspects of a hive must be implementation-defined in order to allow for variance and possible performance improvement, and to conform with possible changes to C++ in the future:

However the implementation of these is significantly constrained by the requirements of the container (lack of reallocation, stable pointers to non-erased elements regardless of erasures/insertions).

In terms of the original reference implementation (current reference implementation here) the specific structure and mechanisms have changed many times over the course of development, however the interface to the container and its time complexity guarantees have remained largely unchanged (with the exception of the time complexity for updating skipfield nodes - which has not impacted significantly on performance). So it is reasonably likely that regardless of specific implementation, it will be possible to maintain this general specification without obviating future improvements in implementation, so long as time complexity guarantees for the above list are implementation-defined.

Below I explain the reference implementation's approach in terms of the three core aspects described above, along with descriptions of some alternative implementation approaches.

1. Collection of element memory blocks + metadata

In the reference implementation this is essentially a doubly-linked list of 'group' structs containing (a) a dynamically-allocated element memory block, (b) memory block metadata and (c) a dynamically-allocated skipfield. The memory blocks and skipfields have a growth factor of 2 from one group to the next. The metadata includes information necessary for an iterator to iterate over hive elements, such as the last insertion point within the memory block, and other information useful to specific functions, such as the total number of non-erased elements in the node. This approach keeps the operation of freeing empty memory blocks from the hive container at O(1) time complexity. Further information is available here.
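A rough sketch of this kind of layout follows. It is not the reference implementation: the names (group, group_list, append, unlink), the int element type, the unsigned short skipfield node type and the initial capacity of 8 are all assumptions chosen for illustration. It shows the growth factor of 2 between successive groups and why unlinking an empty group from a doubly-linked list is an O(1) operation.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

// Hypothetical 'group' node: (a) an element memory block, (b) metadata and
// (c) a skipfield, linked to its neighbours in both directions.
struct group {
    std::unique_ptr<int[]>            elements;     // (a) element memory block
    std::unique_ptr<unsigned short[]> skipfield;    // (c) one node per element, plus one
    std::size_t                       capacity;     // (b) metadata from here on
    std::size_t                       size = 0;     // number of non-erased elements
    std::size_t                       group_number; // position in the sequence
    group*                            previous = nullptr;
    group*                            next = nullptr;

    group(std::size_t cap, std::size_t number)
        : elements(new int[cap]()),
          skipfield(new unsigned short[cap + 1]()),
          capacity(cap),
          group_number(number) {}
};

struct group_list {
    group* head = nullptr;
    group* tail = nullptr;

    group_list() = default;
    group_list(const group_list&) = delete;
    group_list& operator=(const group_list&) = delete;
    ~group_list() { while (head != nullptr) unlink(head); }

    // Append a group with twice the capacity of the current tail.
    // first_capacity is only used for the very first group.
    group* append(std::size_t first_capacity = 8) {
        group* g = (tail != nullptr)
            ? new group(tail->capacity * 2, tail->group_number + 1)
            : new group(first_capacity, 0);
        g->previous = tail;
        if (tail != nullptr) tail->next = g; else head = g;
        tail = g;
        return g;
    }

    // Remove a group from the sequence. Only the neighbours' links change,
    // so the cost does not depend on how many groups exist.
    void unlink(group* g) {
        if (g->previous != nullptr) g->previous->next = g->next; else head = g->next;
        if (g->next != nullptr) g->next->previous = g->previous; else tail = g->previous;
        delete g;
    }
};
```

Because other groups are never moved when one is unlinked, pointers to elements in the remaining groups are unaffected.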

Using a vector of group structs with dynamically-allocated element memory blocks, using the swap-and-pop idiom where groups need to be erased from the iterative sequence, would not work. To explain, when a group becomes empty of elements, it must be removed from the sequence of groups, because otherwise you end up with highly-variable latency during iteration due to the need to skip over an unknown number of empty groups when traversing from one non-empty group to the next. Simply erasing the group will not suffice, as this would create a variable amount of latency during erasure when the group becomes empty, based on the number of groups after that group which would need to be reallocated backward in the vector. But even if one swapped the to-be-erased group with the back group, and then pop'd the to-be-erased group off the back, this would not solve the problem, as iterators require a stable pointer to the group they are traversing in order to traverse to the next group in the sequence. If an iterator pointed to an element in the back group, and the back group was swapped with the to-be-erased group, this would invalidate the iterator.

A vector of pointers to group structs is more feasible. Erasing groups would still have highly-variable latency due to reallocation, however the cost of reallocating pointers may be negligible depending on architecture. While the number of pointers can be expected to be low in most cases due to the growth factor in memory blocks, if the user has defined their own memory block capacity limits, the number of pointers could be large, and this has to be taken into consideration. In this case using a swap-and-pop idiom is still not possible, because while it would not necessarily invalidate the internal references of an iterator pointing to an element within the back group, the sequence of blocks would be changed and therefore the iterator would be moved backwards in the iterative sequence.

A vector of memory blocks, as opposed to a vector of pointers to memory blocks or a vector of group structs with dynamically-allocated memory blocks, would also not work, both due to the above points and because it would (a) disallow a growth factor in the memory blocks and (b) invalidate pointers to elements in subsequent blocks when a memory block became empty of elements and was therefore removed from the vector. In short, it would negate hive's beneficial aspects.

2. A non-boolean method of skipping erased elements in O(1) time during iteration

The reference implementation currently uses a skipfield pattern called the Low complexity jump-counting pattern. This effectively encodes the length of runs of consecutive erased elements, into a skipfield, which allows for O(1) time complexity during iteration. Since there is no branching involved in iterating over the skipfield aside from end-of-block checks, it can be less problematic computationally than a boolean skipfield (which has to branch for every skipfield read) in terms of CPUs which don't handle branching or branch-prediction failure efficiently (eg. Core2). It also does not have the variable latency associated with a boolean skipfield.

The pattern stores and modifies the run-lengths during insertion and erasure with O(1) time complexity. It has a lot of similarities to the High complexity jump-counting pattern, which was a pattern previously used by the reference implementation. Using the High complexity jump-counting pattern is an alternative, though the skipfield update time complexity guarantees for that pattern are effectively undefined, or between O(1) and O(skipfield length) for each insertion/erasure. In practice those updates result in one memcpy operation which may resolve to a much smaller number of SIMD copies at the hardware level. But it is still a little slower than the Low complexity jump-counting pattern. The method you use to skip erased elements will typically also have an effect on the type of memory-reuse mechanism you can utilize.
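To make the idea of encoding run lengths concrete, here is a small sketch of a jump-counting skipfield for a single block. It assumes a scheme in which only the first and last node of each run of erased elements are kept up to date (both hold the run length), nodes of non-erased elements hold 0, and one extra zero node sits past the end of the block. The type and function names are hypothetical, only erasure is shown (not insertion into a run), and this is not taken from the reference implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct jump_skipfield {
    std::vector<std::size_t> nodes; // capacity + 1 nodes; the extra node stays 0

    explicit jump_skipfield(std::size_t capacity) : nodes(capacity + 1, 0) {}

    std::size_t capacity() const { return nodes.size() - 1; }

    // Mark index i as erased, merging with any adjacent runs. Constant time:
    // two neighbouring nodes are read and at most three nodes are written.
    void erase(std::size_t i) {
        assert(i < capacity() && nodes[i] == 0);
        // i is not erased, so i - 1 (if erased) is the end node of its run
        // and i + 1 (if erased) is the start node of its run.
        const std::size_t left   = (i > 0) ? nodes[i - 1] : 0;
        const std::size_t right  = nodes[i + 1];
        const std::size_t length = left + 1 + right;
        nodes[i]         = length; // keeps every erased node non-zero
        nodes[i - left]  = length; // start node of the merged run
        nodes[i + right] = length; // end node of the merged run
    }

    // Advance from a non-erased index to the next non-erased index without
    // branching on the skipfield value. Returns capacity() at end of block.
    std::size_t next(std::size_t i) const {
        assert(i < capacity());
        ++i;
        return i + nodes[i];
    }
};
```

For a block of 8 elements, erasing indexes 2, 4 and then 3 produces one run of length 3, after which next(1) jumps straight to index 5 with a single skipfield read.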

A pure boolean skipfield is not usable because it makes iteration time complexity undefined - it could for example result in thousands of branching statements + skipfield reads for a single ++ operation in the case of many consecutive erased elements. In the high-performance fields for which this container was initially designed, this brings with it unacceptable latency. However another strategy using a combination of a jump-counting and boolean skipfield, which saves memory at the expense of computational efficiency, is possible as follows:

  1. Instead of storing the data for the low complexity jump-counting pattern in its own skipfield, have a boolean bitfield indicating which elements are erased. Store the jump-counting data in the erased element's memory space instead (possibly alongside free list data).
  2. When iterating, check whether the element is erased or not using the boolean bitfield; if it is not erased, do nothing. If it is erased, read the jump value from the erased element's memory space and skip forward the appropriate number of nodes both in the element memory block and the boolean bitfield.

This approach has the advantage of still performing O(1) iterations from one non-erased element to the next, unlike a pure boolean skipfield approach, but compared to a pure jump-counting approach introduces 3 additional costs per iteration via (1) a branch operation when checking the bitfield, (2) an additional read (of the erased element's memory space) and (3) a bitmasking operation + bitshift to read the bit. But it does reduce the memory overhead of the skipfield to 1 bit per-element. In the early days of hive/colony I experimented with using both byte-based boolean skipfields and bit-based boolean skipfields. The bit-based ones were always slower, regardless of the technique. And the jump-counting skipfield was faster than both of those.

Another method worth mentioning is the use of a referencing array - for example, having a vector of elements, together with a vector of either indexes or pointers to those elements. When an element is erased, the vector of elements itself is not updated - no elements are reallocated. Meanwhile the referencing vector is updated and the index or pointer to the erased element is erased. When iteration occurs it iterates over the referencing vector, accessing each element in the element vector via the indexes/pointers. The disadvantages of this technique are (a) much higher memory usage, particularly for small elements and (b) highly-variable latency during erasure due to reallocation in the referencing array. Since one of the goals of hive is predictable latency, this is likely not suitable.
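A minimal sketch of the referencing-array technique, using indexes rather than pointers, might look as follows. The names (referenced_store, live, erase_nth) are hypothetical and the element type is fixed to int for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct referenced_store {
    std::vector<int>         elements; // erased elements are left in place
    std::vector<std::size_t> live;     // indexes of the non-erased elements

    void insert(int value) {
        elements.push_back(value);
        live.push_back(elements.size() - 1);
    }

    // Erase the n-th live element. The element vector is untouched, but every
    // later entry of the referencing vector is shifted down, so the cost
    // varies with the position of the erased entry.
    void erase_nth(std::size_t n) {
        assert(n < live.size());
        live.erase(live.begin() + static_cast<std::ptrdiff_t>(n));
    }

    // Iteration goes through the referencing vector, one indirection per element.
    long sum() const {
        long total = 0;
        for (std::size_t index : live) total += elements[index];
        return total;
    }
};
```

The sketch makes both disadvantages visible: one extra std::size_t is stored per element, and erase_nth relocates a variable number of index entries.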

Packed arrays are not worth mentioning as the iteration method is considered separate from the referencing mechanism, making them unsuitable for a std:: container.

3. Erased-element location recording mechanism

There are two valid approaches here; both involve per-memory-block free lists, utilizing the memory space of erased elements. The first approach forms a free list of all erased elements. The second forms a free list of the first element in each run of consecutive erased elements ("skipblocks", in terms of the terminology used in the jump-counting pattern papers). The second can be more efficient, but requires a doubly-linked free list rather than a singly-linked free list, at least with a low-complexity jump-counting skipfield - otherwise it would become an O(N) operation to update links in the skipfield, when a skipblock expands or contracts during erasure or insertion.
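The first approach can be sketched as below: a singly-linked, per-block free list of all erased elements, where the link is stored in the erased element's own memory. This sketch uses a union rather than reinterpret_cast to stay within well-defined C++, fixes the element type to double, and uses hypothetical names (slot_block, next_free, npos). It deliberately omits the skipfield, so it only shows location re-use, not iteration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct slot_block {
    static constexpr std::uint16_t npos = 0xFFFF; // "no next" marker (assumption)

    union slot {
        double        value;     // live element
        std::uint16_t next_free; // when erased: index of the next erased slot
    };

    std::vector<slot> slots;
    std::uint16_t     free_head = npos; // most recently erased slot
    std::uint16_t     used = 0;         // slots handed out at the back so far

    explicit slot_block(std::uint16_t capacity) : slots(capacity) {}

    // Push the erased slot onto the free list, overwriting the dead element.
    void erase(std::uint16_t index) {
        assert(index < used);
        slots[index].next_free = free_head;
        free_head = index;
    }

    // Re-use an erased slot if there is one, otherwise take a fresh slot at
    // the back. Returns the index used, or npos if the block is full.
    std::uint16_t insert(double value) {
        std::uint16_t index;
        if (free_head != npos) {
            index = free_head;
            free_head = slots[index].next_free;
        } else if (used < slots.size()) {
            index = used++;
        } else {
            return npos;
        }
        slots[index].value = value;
        return index;
    }
};
```

Using 16-bit indexes instead of pointers for the links keeps the free list node smaller than most element types, which is the same motivation given for index-based links in the reference implementation.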

The reference implementation currently uses the second approach, using three things to keep track of erased element locations:

  1. Metadata for each memory block includes a 'next block with erasures' pointer. The container itself contains a 'blocks with erasures' list-head pointer. These are used by the container to create an intrusive doubly-linked list of memory blocks with erased elements which can be re-used for future insertions.
  2. Metadata for each memory block also includes a 'free list head' index number, which records the index (within the memory block) of the first element of the last-created skipblock - the 'head' skipblock.
  3. The memory space of the first erased element in each skipblock is reinterpret_cast'd via pointers as two index numbers, the first giving the index of the previous skipblock in that memory block, the second giving the index of the next skipblock in the sequence. In the case of the 'head' skipblock in the sequence, a unique number is used for the 'next' index. This forms a free list of runs of erased element memory locations which may be re-used.

Using indexes for next and previous links, instead of pointers, reduces the necessary bit-depth of the next and previous links, thereby reducing the necessary over-alignment of the container's element type. If a global (ie. all memory blocks) free list were used, pointers would be necessary, as hive is bidirectional and does not support the [ ] operator. This would potentially increase the necessary over-alignment of the element type to 128 bits for a doubly-linked free list. A global free list would also decrease cache locality when traversing the free list by jumping between memory blocks.

Previous versions of the reference implementation used a singly-linked free list of erased elements instead of a doubly-linked free list of skipblocks. This was possible with the High complexity jump-counting pattern, but not possible using the Low complexity jump-counting pattern as it cannot calculate a skipblock's start node location from a middle node's value like the High complexity pattern can. But using free-lists of skipblocks is a more efficient approach as it requires fewer free list nodes. In addition, re-using only the start or end nodes of a skipblock is faster because it never splits a single skipblock in two (which would require adding a new skipblock to the free list).

The main reason a doubly-linked free list is necessary is for when you erase an element which is in between two skipblocks. In that case two skipblocks must be combined into one skipblock, and the previous secondary skipblock must be removed from that block's free list of skipblocks. If the free list is singly-linked, the hive must do a linear search through the free list to find the skipblock prior to the secondary skipblock mentioned, in order to update that free list node's "next" index link. This is a non-O(1) operation. However if a doubly-linked free list is used, the prior skipblock is linked to from the secondary skipblock mentioned, making updating all free list links O(1). One could however revert to a singly-linked free list of skipblocks for very small value_type's, in order to reduce the overalignment necessary to store both free list nodes in the element memory space, and eat the cost of that reduction.

One cannot use a stack of pointers (or similar) to erased elements for this mechanism, as early versions of the reference implementation did, because this can create allocations during erasure, which changes the exception guarantees of erase(). One could instead scan all skipfields until an erased location was found, or simply have the first item in the list above and then scan the first available block, though both of these approaches would be slow.

In terms of the alternative boolean + jump-counting skipfield approach described in the erased-element-skip-method section above, one could store both the jump-counting data and free list data in any given erased element's memory space, provided of course that elements are aligned to be wide enough to fit both.

A final note: time complexity requirements vary strongly between the different ways of implementing these three core aspects of the container. Where there is likely to be significant variance based on implementation, the time complexity will therefore not be included in the technical specification.

Implementation of iterator class

Any iterator implementation is going to be dependent on the erased-element-skipping mechanism used. The reference implementation's iterator stores a pointer to the current 'group' struct mentioned above, plus a pointer to the current element and a pointer to its corresponding skipfield node. An alternative approach is to store the group pointer + an index, since the index can indicate both the offset from the memory block for the element, as well as the offset from the start of the skipfield for the skipfield node. However multiple implementations and benchmarks across many processors have shown this to be worse-performing than the separate pointer-based approach, despite the increased memory cost for the iterator class itself.

++ operation is as follows, utilising the reference implementation's Low-complexity jump-counting pattern:

  1. Add 1 to the existing element and skipfield pointers.
  2. Dereference skipfield pointer to get value of skipfield node, add value of skipfield node to both the skipfield pointer and the element pointer. If the node is erased, its value will be a positive integer indicating the number of nodes until the next non-erased node; if it is not erased, its value will be zero.
  3. If element pointer is now beyond end of element memory block, change group pointer to next group, element pointer to the start of the next group's element memory block, skipfield pointer to the start of the next group's skipfield. In case there is a skipblock at the beginning of this memory block, dereference skipfield pointer to get value of skipfield node and add value of skipfield node to both the skipfield pointer and the element pointer. There is no need to repeat the check for end of block, as the block would have been removed from the iteration sequence if it were empty of elements.

-- operation is the same except both step 1 and 2 involve subtraction rather than adding, and step 3 checks to see if the element pointer is now before the beginning of the memory block. If so it traverses to the back of the previous group, and subtracts the value of the back skipfield node from the element pointer and skipfield pointer.
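The ++ and -- steps above can be sketched as follows. This is an illustration under several assumptions rather than the reference implementation's code: the names (group, cursor, increment, decrement) are hypothetical, every group is assumed to be filled to capacity, each skipfield has one extra zero node past the end, and the end position is represented as one-past-the-end of the last group. The decrement works on a signed offset so that "before the beginning of the block" can be detected without forming an out-of-range pointer.

```cpp
#include <cassert>
#include <cstddef>

struct group {
    int*            elements;  // start of the element memory block
    unsigned short* skipfield; // capacity + 1 nodes; the last one is always 0
    std::size_t     capacity;
    group*          previous;
    group*          next;
};

struct cursor {
    group*          g;
    int*            element;
    unsigned short* skip;
};

void increment(cursor& it) {
    // Step 1: move one slot forward.
    ++it.element;
    ++it.skip;
    // Step 2: add the node's value (0 if the slot is live) to both pointers.
    const unsigned short jump = *it.skip;
    it.element += jump;
    it.skip += jump;
    // Step 3: past the end of this block, so move to the next group and
    // skip any run of erased elements at its beginning.
    if (it.element == it.g->elements + it.g->capacity && it.g->next != nullptr) {
        it.g = it.g->next;
        const unsigned short lead = *it.g->skipfield;
        it.element = it.g->elements + lead;
        it.skip = it.g->skipfield + lead;
    }
}

void decrement(cursor& it) {
    std::ptrdiff_t index = it.element - it.g->elements;
    --index;                                         // step 1
    if (index >= 0) index -= it.g->skipfield[index]; // step 2
    if (index < 0 && it.g->previous != nullptr) {    // step 3
        it.g = it.g->previous;
        index = static_cast<std::ptrdiff_t>(it.g->capacity) - 1;
        index -= it.g->skipfield[index];             // skip a run at the back
    }
    assert(index >= 0); // decrementing the first element is outside this sketch
    it.element = it.g->elements + index;
    it.skip = it.g->skipfield + index;
}
```

Neither function loops: each performs a fixed number of skipfield reads regardless of how many consecutive elements have been erased.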

Iterators are bidirectional but also provide constant time complexity >, <, >=, <= and <=> operators for convenience (eg. in for loops when skipping over multiple elements per loop and there is a possibility of going past a pre-determined end element). This is achieved by keeping a record of the order of memory blocks. In the reference implementation this is done by assigning a number to each memory block in its metadata. In an implementation using a vector of pointers to memory blocks instead of a linked list, one could use the position of the pointers within the vector to determine this. Comparing relative order of the two iterators' memory blocks via this number, then comparing the memory locations of the elements themselves, if they happen to be in the same memory block, is enough to implement all greater/lesser comparisons.
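A sketch of such a comparison is shown below, with hypothetical names (group, cursor, less_than) and a per-group number standing in for the block-order record. The remaining comparisons can be derived from less_than by swapping or negating the arguments.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>

struct group {
    std::size_t group_number; // order of this memory block in the sequence
    int*        elements;     // start of the element memory block
};

struct cursor {
    const group* g;
    const int*   element;
};

// Constant time: compare block order first, and element addresses only when
// both cursors refer to the same memory block.
bool less_than(const cursor& a, const cursor& b) {
    if (a.g != b.g) return a.g->group_number < b.g->group_number;
    return std::less<const int*>{}(a.element, b.element);
}
```

Comparing addresses only within one block matters because blocks are allocated separately, so the addresses of elements in different blocks say nothing about their order in the sequence.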

Additional notes on specific functions

Results of implementation

In practical application the reference implementation is generally faster for insertion and (non-back) erasure than current standard library containers, and generally faster for iteration than any container except vector and deque. For full details, see benchmarks.

VI. Technical Specification

Suggested location of hive in the standard is Sequence Containers.

Header <version> synopsis [version.syn]

#define __cpp_lib_hive <editor supplied value> // also in <hive>

General [containers.general]

Containers library summary [tab:containers.summary]

Subclause                           Header
Requirements
Sequence containers                 <array>, <deque>, <forward_list>, <list>, <vector>, <hive>
Associative containers              <map>, <set>
Unordered associative containers    <unordered_map>, <unordered_set>
Container adaptors                  <queue>, <stack>
Views                               <span>

Sequence containers [sequence.reqmts]

  1. A sequence container organizes a finite set of objects, all of the same type, into a strictly linear arrangement. The library provides four basic kinds of sequence containers: vector, forward_list, list, and deque. In addition, array and hive are provided as sequence containers which provide limited sequence operations, in array's case because it has a fixed number of elements, and in hive's case because insertion order is unspecified. The library also provides container adaptors that make it easy to construct abstract data types, such as stacks or queues, out of the basic sequence container kinds (or out of other kinds of sequence containers that the user defines).
  2. [Note 1: The sequence containers offer the programmer different complexity trade-offs. vector is appropriate in most circumstances. array has a fixed size known during translation. list or forward_list support frequent insertions and deletions from the middle of the sequence. deque supports efficient insertions and deletions taking place at the beginning or at the end of the sequence. When choosing a container, remember vector is best; leave a comment to explain if you choose from the rest! -end note]

Despite the cute poem, section 2 is removed because the note is inaccurate, misleading and over-simplified. See Appendix J for a more complete basic guide to container selection. Such a guide is too large to fit in the standard, and its usefulness would be subject to changes in architecture. The note above conflates time complexity and performance, and oddly recommends vector as a default without mentioning its time complexity, unlike the other containers. Time complexity's relevance to both performance and latency is situational and subject to architectural differences. If the committee chose to re-include such a note, it should simply describe time complexity, not make recommendations.

Header <hive> synopsis [hive.syn]

#include <initializer_list> // see [initializer.list.syn]
#include <compare> // see [compare.syn]

namespace std {
  // class template hive

  struct hive_limits;

  template <class T, class Allocator = allocator<T>> class hive;

  template<class T, class Allocator>
    void swap(hive<T, Allocator>& x, hive<T, Allocator>& y)
      noexcept(noexcept(x.swap(y)));

  template<class T, class Allocator, class U>
    typename hive<T, Allocator>::size_type
      erase(hive<T, Allocator>& c, const U& value);

  template<class T, class Allocator, class Predicate>
    typename hive<T, Allocator>::size_type
      erase_if(hive<T, Allocator>& c, Predicate pred);

  namespace pmr {
    template <class T>
      using hive = std::hive<T, polymorphic_allocator<T>>;
  }
}

22.3.14 Class template hive [hive]

22.3.14.1 Class template hive overview [hive.overview]

  1. A hive is a sequence container that allows constant-time insert and erase operations. Insertion position is not specified, but will in most implementations typically be the back of the container when no erasures have occurred, or when erasures have occurred it will re-use existing erased element memory locations. Storage management is handled automatically and is specifically organized in multiple blocks of sequential elements.
  2. Erasures use unspecified techniques to mark erased elements as skippable, as opposed to relocating subsequent elements during erasure as is expected in a vector or deque. These elements are subsequently skipped during iteration. If a memory block becomes empty of unskipped elements as the result of an erasure, that memory block is removed from the iterative sequence. The same, or different unspecified techniques may be used to record the locations of erased elements, such that those locations may be reused later during insertions.
  3. Operations pertaining to the updating of any data associated with the erased-element skipping mechanism or erased-element location-recording mechanism are not factored into individual function time complexity. The time complexity of these unspecified techniques is implementation-defined and may be constant, linear or otherwise defined.
  4. Memory block element capacities have an unspecified growth factor greater than 1, for example a new block's capacity could be equal to the summed capacities of the existing blocks.
  5. Limits can be placed on the minimum and maximum element capacities of memory blocks, both by a user and by an implementation. In neither case shall minimum capacity be greater than maximum capacity. When limits are not specified by a user, the implementation's default limits are used. The default limits of an implementation are not guaranteed to be the same as the minimum and maximum possible values for an implementation's limits. The latter are defined as hard limits. If user-specified limits are supplied to a function which are not within an implementation's hard limits, or if the user-specified minimum is larger than the user-specified maximum capacity, behaviour is undefined.
  6. Memory blocks can be removed from the iterative sequence [Example: by erase or clear - end example] without being deallocated. Other memory blocks can be allocated without becoming part of the iterative sequence [Example: by reserve - end example]. These are both referred to as reserved blocks. Blocks which form part of the iterative sequence of the container are referred to as active blocks.
  7. A hive conforms to the requirements for Containers ([container.reqmts]), with the exception of operators ==, != and <=>. A hive also meets the requirements of a reversible container ([container.rev.reqmts]), of an allocator-aware container ([container.alloc.reqmts]), and some of the requirements of a sequence container, including several of the optional sequence container requirements ([sequence.reqmts]). Descriptions are provided here only for operations on hive that are not described in that table or for operations where there is additional semantic information.
  8. Hive iterators meet the Cpp17BidirectionalIterator requirements but also provide relational operators <, <=, >, >= and <=> which compare the relative ordering of two iterators in the sequence of a hive instance.
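As a non-normative illustration of paragraphs 1 and 2, the following is a minimal sketch of the underlying idea: elements live in multiple blocks, erasure only marks a slot as skippable, and a later insertion may reuse that slot. The name toy_hive, its members and the fixed block capacity are all hypothetical and are not part of this proposal or of the reference implementation. The sketch uses a plain boolean flag per slot, requires T to be default-constructible and copy-assignable, and never removes empty blocks, all of which are simplifications.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory>
#include <vector>

// Hypothetical toy, not the reference implementation: fixed-capacity blocks,
// one boolean skip flag per slot, and a list of erased slots kept for reuse.
template <class T, std::size_t BlockCapacity = 4>
class toy_hive {
  struct block {
    T slots[BlockCapacity] = {};
    bool active[BlockCapacity] = {};
    std::size_t used = 0;  // slots handed out so far in this block
  };

  std::vector<std::unique_ptr<block>> blocks_;  // blocks never move once allocated
  std::vector<T*> erased_;                      // erased locations awaiting reuse

  // Find the skip flag for the slot p points at (linear in the number of blocks).
  bool* flag_for(const T* p) {
    std::less_equal<const T*> le;  // well-defined order even for unrelated pointers
    for (auto& b : blocks_) {
      const T* first = b->slots;
      const T* last = first + BlockCapacity - 1;
      if (le(first, p) && le(p, last)) return &b->active[p - first];
    }
    return nullptr;
  }

 public:
  // Reuse an erased slot when one exists; otherwise append to the last block,
  // allocating a new block if every existing block is full.
  T* insert(const T& value) {
    if (!erased_.empty()) {
      T* p = erased_.back();
      erased_.pop_back();
      *p = value;
      *flag_for(p) = true;
      return p;
    }
    if (blocks_.empty() || blocks_.back()->used == BlockCapacity) {
      blocks_.push_back(std::make_unique<block>());
    }
    block& b = *blocks_.back();
    b.slots[b.used] = value;
    b.active[b.used] = true;
    return &b.slots[b.used++];
  }

  // Mark the slot as skippable. No other element is relocated.
  void erase(T* p) {
    bool* flag = flag_for(p);
    assert(flag != nullptr && *flag);  // p must point at an active element
    *flag = false;
    erased_.push_back(p);
  }

  // Visit active elements in block order, skipping erased slots.
  template <class F>
  void for_each(F f) const {
    for (const auto& b : blocks_) {
      for (std::size_t i = 0; i != b->used; ++i) {
        if (b->active[i]) f(b->slots[i]);
      }
    }
  }
};

// Returns true when the behaviour described above is observed.
bool toy_hive_demo() {
  toy_hive<int> h;
  std::vector<int*> p;
  for (int i = 0; i != 6; ++i) p.push_back(h.insert(i));  // fills 1.5 blocks

  h.erase(p[1]);
  h.erase(p[4]);

  int sum = 0;
  h.for_each([&sum](int v) { sum += v; });  // visits 0, 2, 3 and 5 only

  int* reused = h.insert(10);  // lands in the most recently erased slot
  return sum == 10 && reused == p[4] && *p[5] == 5 && *p[0] == 0;
}
```

Pointers to the surviving elements (p[0] and p[5] in the demo) are unaffected by both the erasures and the later insertion, which is the property the overview relies on.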
namespace std {

struct hive_limits
{
  size_t min;
  size_t max;
  constexpr hive_limits(size_t minimum, size_t maximum) noexcept : min(minimum), max(maximum) {}
};



template <class T, class Allocator = allocator<T>>
class hive {
private:
  hive_limits current-limits = implementation-defined; // exposition only

public:

  // types
  using value_type = T;
  using allocator_type = Allocator;
  using pointer = typename allocator_traits<Allocator>::pointer;
  using const_pointer = typename allocator_traits<Allocator>::const_pointer;
  using reference = value_type&;
  using const_reference = const value_type&;
  using size_type = implementation-defined; // see [container.requirements]
  using difference_type = implementation-defined; // see [container.requirements]
  using iterator = implementation-defined; // see [container.requirements]
  using const_iterator = implementation-defined; // see [container.requirements]
  using reverse_iterator = std::reverse_iterator<iterator>; // see [container.requirements]
  using const_reverse_iterator = std::reverse_iterator<const_iterator>; // see [container.requirements]



  constexpr hive() noexcept(noexcept(Allocator())) : hive(Allocator()) { }
  explicit hive(const Allocator&) noexcept;
  explicit hive(hive_limits block_limits) noexcept(noexcept(Allocator())) : hive(block_limits, Allocator()) { }
  hive(hive_limits block_limits, const Allocator&) noexcept;
  explicit hive(size_type n, const Allocator& = Allocator());
  hive(size_type n, hive_limits block_limits, const Allocator& = Allocator());
  hive(size_type n, const T& value, const Allocator& = Allocator());
  hive(size_type n, const T& value, hive_limits block_limits, const Allocator& = Allocator());
  template<class InputIterator>
    hive(InputIterator first, InputIterator last, const Allocator& = Allocator());
  template<class InputIterator>
    hive(InputIterator first, InputIterator last, hive_limits block_limits, const Allocator& = Allocator());
  template<container-compatible-range<T> R>
    hive(from_range_t, R&& rg, const Allocator& = Allocator());
  template<container-compatible-range<T> R>
    hive(from_range_t, R&& rg, hive_limits block_limits, const Allocator& = Allocator());
  hive(const hive& x);
  hive(hive&&) noexcept;
  hive(const hive&, const type_identity_t<Allocator>&);
  hive(hive&&, const type_identity_t<Allocator>&);
  hive(initializer_list<T> il, const Allocator& = Allocator());
  hive(initializer_list<T> il, hive_limits block_limits, const Allocator& = Allocator());
  ~hive();
  hive& operator=(const hive& x);
  hive& operator=(hive&& x) noexcept(allocator_traits<Allocator>::propagate_on_container_move_assignment::value || allocator_traits<Allocator>::is_always_equal::value);
  hive& operator=(initializer_list<T>);
  template<class InputIterator>
    void assign(InputIterator first, InputIterator last);
  template<container-compatible-range <T> R>
    void assign_range(R&& rg);
  void assign(size_type n, const T& t);
  void assign(initializer_list<T>);
  allocator_type get_allocator() const noexcept;



  // iterators
  iterator               begin() noexcept;
  const_iterator         begin() const noexcept;
  iterator               end() noexcept;
  const_iterator         end() const noexcept;
  reverse_iterator       rbegin() noexcept;
  const_reverse_iterator rbegin() const noexcept;
  reverse_iterator       rend() noexcept;
  const_reverse_iterator rend() const noexcept;

  const_iterator         cbegin() const noexcept;
  const_iterator         cend() const noexcept;
  const_reverse_iterator crbegin() const noexcept;
  const_reverse_iterator crend() const noexcept;


  // capacity
  [[nodiscard]] bool empty() const noexcept;
  size_type size() const noexcept;
  size_type max_size() const noexcept;
  size_type capacity() const noexcept;
  void reserve(size_type n);
  void shrink_to_fit();
  void trim_capacity() noexcept;
  void trim_capacity(size_type n) noexcept;


  // modifiers
  template <class... Args> iterator emplace(Args&&... args);
  iterator insert(const T& x);
  iterator insert(T&& x);
  void insert(size_type n, const T& x);
  template<class InputIterator>
    void insert(InputIterator first, InputIterator last);
  template<container-compatible-range <T> R>
    void insert_range(R&& rg);
  void insert(initializer_list<T> il);
  iterator erase(const_iterator position);
  iterator erase(const_iterator first, const_iterator last);
  void swap(hive&) noexcept(allocator_traits<Allocator>::propagate_on_container_swap::value || allocator_traits<Allocator>::is_always_equal::value);
  void clear() noexcept;


  // hive operations
  void splice(hive& x);
  void splice(hive&& x);
  size_type unique();
  template<class BinaryPredicate>
    size_type unique(BinaryPredicate binary_pred);

  hive_limits block_capacity_limits() const noexcept;
  static constexpr hive_limits block_capacity_hard_limits();
  void reshape(hive_limits block_limits);

  iterator get_iterator(const_pointer p) noexcept;
  const_iterator get_iterator(const_pointer p) const noexcept;
  bool is_active(const_iterator it) const noexcept;

  void sort();
  template <class Compare> void sort(Compare comp);
};


template<class InputIterator, class Allocator = allocator<iter-value-type<InputIterator>>>
  hive(InputIterator, InputIterator, Allocator = Allocator())
    -> hive<iter-value-type<InputIterator>, Allocator>;

template<class InputIterator, class Allocator = allocator<iter-value-type<InputIterator>>>
  hive(InputIterator, InputIterator, hive_limits block_limits, Allocator = Allocator())
    -> hive<iter-value-type<InputIterator>, Allocator>;

template<ranges::input_range R, class Allocator = allocator<ranges::range_value_t<R>>>
  hive(from_range_t, R&&, Allocator = Allocator())
    -> hive<ranges::range_value_t<R>, Allocator>;

template<ranges::input_range R, class Allocator = allocator<ranges::range_value_t<R>>>
  hive(from_range_t, R&&, hive_limits block_limits, Allocator = Allocator())
    -> hive<ranges::range_value_t<R>, Allocator>;
}

An incomplete type T may be used when instantiating hive if the allocator meets the allocator completeness requirements ([allocator.requirements.completeness]). T shall be complete before any member of the resulting specialization of hive is referenced.

hive constructors, copy, and assignment [hive.cons]

explicit hive(const Allocator&) noexcept;
  1. Effects: Constructs an empty hive, using the specified allocator.
  2. Complexity: Constant.

hive(hive_limits block_limits, const Allocator&) noexcept;
  1. Effects: Constructs an empty hive, with the specified Allocator. Initializes current-limits with block_limits.
  2. Complexity: Constant.

explicit hive(size_type n, const Allocator& = Allocator());
hive(size_type n, hive_limits block_limits, const Allocator& = Allocator());
  1. Preconditions: T is Cpp17DefaultInsertable into hive.
  2. Effects: Constructs a hive with n default-inserted elements, using the specified allocator. If the second overload is called, also initializes current-limits with block_limits.
  3. Complexity: Linear in n. Creates at most (n / current-limits.max) + 1 element block allocations.

hive(size_type n, const T& value, const Allocator& = Allocator());
hive(size_type n, const T& value, hive_limits block_limits, const Allocator& = Allocator());
  1. Preconditions: T is Cpp17CopyInsertable into hive.
  2. Effects: Constructs a hive with n copies of value, using the specified allocator. If the second overload is called, also initializes current-limits with block_limits.
  3. Complexity: Linear in n. Creates at most (n / current-limits.max) + 1 element block allocations.


template<container-compatible-range<T> R>
  hive(from_range_t, R&& rg, const Allocator& = Allocator());
template<container-compatible-range<T> R>
  hive(from_range_t, R&& rg, hive_limits block_limits, const Allocator& = Allocator());
  1. Preconditions: T is Cpp17EmplaceConstructible into hive from *ranges::begin(rg).
  2. Effects: Constructs a hive object with the elements of the range rg. If the second overload is called, also initializes current-limits with block_limits.
  3. Complexity: Linear in ranges::distance(rg). Creates at most (ranges::distance(rg) / current-limits.max) + 1 element block allocations.

hive(initializer_list<T> il, const Allocator& = Allocator());
hive(initializer_list<T> il, hive_limits block_limits, const Allocator& = Allocator());
  1. Preconditions: T is Cpp17CopyInsertable into hive.
  2. Effects: Constructs a hive object with the elements of il. If the second overload is called, also initializes current-limits with block_limits.
  3. Complexity: Linear in il.size(). Creates at most (il.size() / current-limits.max) + 1 element block allocations.
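As a worked example of the allocation bound used in the Complexity clauses above, the sketch below evaluates (n / current-limits.max) + 1 for a few values. The function name is hypothetical, and reading '/' as integer division is an assumption, since the wording does not state it explicitly.

```cpp
#include <cassert>
#include <cstddef>

// Upper bound on element block allocations: (n / block_max) + 1, with '/'
// taken as integer division (an assumption, see the text above).
// block_max stands for current-limits.max and must be non-zero.
std::size_t max_block_allocations(std::size_t n, std::size_t block_max) {
  assert(block_max != 0);
  return n / block_max + 1;
}
```

With a maximum block capacity of 256, constructing 1000 elements gives 1000 / 256 + 1 = 4 allocations at most, and 10 elements gives at most 1. The formula is an upper bound rather than an exact count: for n = 1024 it yields 5, although four full blocks of 256 would suffice.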

hive capacity [hive.capacity]

size_type capacity() const noexcept;
  1. Returns: The total number of elements that the hive can hold without requiring allocation of more element memory blocks.
  2. Complexity: Constant time.

void reserve(size_type n);
  1. Effects: A directive that informs a hive of a planned change in size, so that it can manage the storage allocation accordingly. Does not cause reallocation of elements. Iterators to elements in *this remain valid. If n <= capacity() there are no effects.
  2. Complexity: It does not change the size of the sequence and creates at most (n / block_capacity_limits().max) + 1 element block allocations.
  3. Throws: length_error if n > max_size().
    [Note: reserve() uses Allocator::allocate(), which may throw an appropriate exception. - end note]
  4. Postconditions: capacity() >= n is true.


void shrink_to_fit();
  1. Preconditions: T is Cpp17MoveInsertable into hive.
  2. Effects: shrink_to_fit is a non-binding request to reduce capacity() to be closer to size().
    [ Note: The request is non-binding to allow latitude for implementation-specific optimizations. - end note ]
    It does not increase capacity(), but may reduce capacity(). It may reallocate elements.
  3. Complexity: If reallocation happens, linear in the size of the sequence.
  4. Remarks: Reallocation invalidates all the references, pointers, and iterators referring to the elements in the sequence as well as the past-the-end iterator.
    [Note: If no reallocation happens, they remain valid. - end note] [Note: This operation may change the iterative order of the elements in *this. - end note]

void trim_capacity() noexcept;
void trim_capacity(size_type n) noexcept;
  1. Effects: Removes and deallocates reserved blocks created by prior calls to reserve(), clear() or erase(). If such blocks are present, for the first overload capacity() is reduced. For the second overload, capacity() is reduced to no less than n, and may not be reduced at all.
  2. Complexity: Linear in the number of reserved blocks deallocated.
  3. Remarks: Does not reallocate elements and no iterators or references to elements in *this are invalidated.
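The effect of the second overload on capacity() can be modelled with plain numbers. The sketch below is a hypothetical model, not an implementation. It assumes that capacity() is the capacity of the active blocks plus that of the reserved blocks, and it visits the reserved blocks in the order given, whereas a real implementation is free to choose any order.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model of trim_capacity(n). Deallocates each reserved block, in
// the order given, whenever doing so keeps the total capacity at or above n.
// Returns the resulting capacity; 'reserved' is left holding the survivors.
std::size_t trim_capacity_model(std::size_t active_capacity,
                                std::vector<std::size_t>& reserved,
                                std::size_t n) {
  std::size_t capacity = active_capacity;
  for (std::size_t c : reserved) capacity += c;

  std::vector<std::size_t> kept;
  for (std::size_t c : reserved) {
    assert(capacity >= c);  // a reserved block is always part of the total
    if (capacity - c >= n) {
      capacity -= c;        // block deallocated
    } else {
      kept.push_back(c);    // removing it would drop capacity below n
    }
  }
  reserved = kept;
  return capacity;
}
```

For an active capacity of 100 and reserved blocks of 50, 200 and 25 elements, a request of n = 160 removes the 50 and 25 element blocks and leaves a capacity of 300. With n = 0 the model behaves like the first overload and removes every reserved block. With n = 1000 nothing is removed, matching "may not be reduced at all".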

hive modifiers [hive.modifiers]


template <class... Args>
  iterator emplace(Args&&... args);
iterator insert(const T& x);
iterator insert(T&& x);
void insert(size_type n, const T& x);
template<class InputIterator>
  void insert(InputIterator first, InputIterator last);
template<container-compatible-range <T> R>
  void insert_range(R&& rg);
void insert(initializer_list<T> il);
  1. Complexity: For emplace, and for insert overloads 1 and 2, takes constant amortized time and exactly one call to a constructor of T.
    For insert overload 3, equivalent to a.insert(x) for n elements.
    For insert overload 4, equivalent to a.insert(x) for each element in [first,last).
    For insert overload 5, equivalent to a.insert(x) for each element in range rg.
    For insert overload 6, equivalent to a.insert(x) for il.size() elements.
  2. Remarks: For emplace and for insert overloads 1 and 2, if an exception is thrown there are no effects on *this.
    For all other insert overloads, if an exception is thrown for a given individual element insertion, all prior insertions during that function call remain valid.
    For all functions, invalidates the past-the-end iterator.

iterator erase(const_iterator position);
  1. Effects: Invalidates iterators and references to the erased element. An erase operation that erases the last element of a hive also invalidates the past-the-end iterator. If the active block which position is located within becomes empty of non-erased elements as a result of the function call, that active block is either deallocated or transformed into a reserved block.
  2. Complexity: Constant if the active block which position is located within does not become empty as a result of this function call. If it does become empty, at worst linear in the number of subsequent blocks in the iterative sequence.

iterator erase(const_iterator first, const_iterator last);
  1. Effects: Invalidates only the iterators and references to the erased elements. An erase operation that erases the last element in *this also invalidates the past-the-end iterator. If any active blocks within the range [first, last) become empty of elements as a result of the function call, those active blocks are either deallocated or transformed into reserved blocks.
  2. Complexity: Linear in the number of elements erased. In addition, if any active blocks within the range [first, last) become empty of elements as a result of the function call, at worst linear in the number of subsequent blocks in the iterative sequence.

void swap(hive& x) noexcept(allocator_traits<Allocator>::propagate_on_container_swap::value || allocator_traits<Allocator>::is_always_equal::value);
  1. Effects: Exchanges the contents and capacity() of *this with that of x.
  2. Complexity: Constant.

void clear() noexcept;
  1. Effects: Destroys all elements in *this. Invalidates all references, pointers, and iterators referring to the elements of *this and may invalidate the past-the-end iterator. Transforms active blocks into reserved blocks.
  2. Postconditions: empty() returns true.
  3. Complexity: Linear in the size of the sequence for non-trivially-destructible types. Additionally, at worst linear in the number of active element blocks.

Operations [hive.operations]

  1. In this subclause, arguments for a template parameter named Predicate or BinaryPredicate shall meet the corresponding requirements in [algorithms.requirements]. The semantics of i + n and i - n, where i is an iterator into the hive and n is an integer, are the same as those of next(i, n) and prev(i, n), respectively. For sort, the definitions and requirements in [alg.sorting] apply.
  2. hive provides a splice operation that destructively moves all elements from one hive to another. The behavior of splice operations is undefined if get_allocator() != x.get_allocator().
void splice(hive &x);
void splice(hive &&x);
  1. Preconditions: addressof(x) != this is true.
  2. Effects: Inserts the contents of x into *this and x becomes empty. Pointers and references to the moved elements of x now refer to those same elements but as members of *this. Iterators referring to the moved elements shall continue to refer to their elements, but they now behave as iterators into *this, not into x.
  3. Complexity: At worst, linear in the number of active and reserved blocks in x + the number of active and reserved blocks in *this.
  4. Throws: length_error if any of x's element memory block capacities are outside of the current minimum and maximum element memory block capacity limits of *this.
  5. Remarks: The behavior of splice operations is undefined if get_allocator() != x.get_allocator(). Reserved blocks in x are not transferred into *this.

size_type unique();
template<class BinaryPredicate>
  size_type unique(BinaryPredicate binary_pred);
  1. Let binary_pred be equal_to<>{} for the first overload.
  2. Preconditions: binary_pred is an equivalence relation.
  3. Effects: Erases all but the first element from every consecutive group of equivalent elements. That is, for a nonempty hive, erases all elements referred to by the iterator i in the range [begin() + 1, end()) for which binary_pred(*i, *(i - 1)) is true. Invalidates only the iterators and references to the erased elements.
  4. Returns: The number of elements erased.
  5. Throws: Nothing unless an exception is thrown by the predicate.
  6. Complexity: If empty() is false, exactly size() - 1 applications of the corresponding predicate, otherwise no applications of the predicate.
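The Effects and Complexity of unique can be sketched against any container whose erase() returns the iterator following the erased element and invalidates only iterators to the erased element. In the sketch below, std::list is used purely as a stand-in for a hive, and unique_sketch is a hypothetical free function, not proposed wording.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <list>

// Erases all but the first element of every consecutive group of equivalent
// elements, and returns the number of elements erased.
template <class Container, class BinaryPredicate>
typename Container::size_type unique_sketch(Container& c, BinaryPredicate pred) {
  typename Container::size_type erased = 0;
  if (c.empty()) return erased;

  auto prev = c.begin();  // last element known to survive
  for (auto it = std::next(prev); it != c.end();) {
    if (pred(*it, *prev)) {
      it = c.erase(it);  // prev stays valid: only the erased element is invalidated
      ++erased;
    } else {
      prev = it;
      ++it;
    }
  }
  return erased;
}

// Returns true when the result and the predicate call count match the text.
bool unique_sketch_demo() {
  std::list<int> values{1, 1, 2, 2, 2, 3, 1};
  std::size_t calls = 0;
  auto erased = unique_sketch(values, [&calls](int a, int b) {
    ++calls;
    return a == b;
  });
  // One predicate call per element after the first: size() - 1 == 6.
  return erased == 3 && calls == 6 && values == std::list<int>{1, 2, 3, 1};
}
```

The trailing 1 survives because it is not adjacent to the leading group of 1s: only consecutive equivalent elements are erased.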

hive_limits block_capacity_limits() const noexcept;
  1. Returns: current-limits.
  2. Complexity: Constant.

static constexpr hive_limits block_capacity_hard_limits();
  1. Returns: A hive_limits struct with the min and max members set to the implementation's hard limits.
  2. Complexity: Constant.
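Since supplying limits outside the hard limits, or a minimum larger than the maximum, is undefined behaviour ([hive.overview] paragraph 5), a caller may wish to validate candidate limits before passing them to a constructor or to reshape. The sketch below shows one way to do so. limits_sketch is a stand-in for hive_limits, and both helper functions are hypothetical, not part of the proposed interface.

```cpp
#include <cassert>
#include <cstddef>

// Stand-in with the same members as hive_limits in the synopsis.
struct limits_sketch {
  std::size_t min;
  std::size_t max;
};

// Hypothetical pre-check: true when 'requested' respects both rules in
// [hive.overview] paragraph 5 (min <= max, and within the hard limits).
constexpr bool limits_are_valid(limits_sketch requested, limits_sketch hard) {
  return requested.min <= requested.max &&
         requested.min >= hard.min &&
         requested.max <= hard.max;
}

// Returns 'requested' unchanged after asserting that it is valid, so that the
// check can wrap an argument at the call site.
inline limits_sketch checked_limits(limits_sketch requested, limits_sketch hard) {
  assert(limits_are_valid(requested, hard));
  return requested;
}
```

With a real hive, the second argument would be the value returned by block_capacity_hard_limits(). Any concrete hard limits used when trying out this sketch, such as {3, 65535}, are illustrative only, since the actual values are implementation-defined.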

void reshape(hive_limits block_limits);
  1. Preconditions: T shall be Cpp17MoveInsertable into hive.
  2. Effects: Sets minimum and maximum element memory block capacities to the min and max members of the supplied hive_limits struct. If the hive is not empty, adjusts existing memory block capacities to conform to the new minimum and maximum block capacities, where necessary. If existing memory block capacities are within the supplied minimum/maximum range, no reallocation of elements takes place. If they are not within the supplied range, elements are reallocated to new or existing memory blocks which fit within the supplied range, and the old memory blocks are deallocated. If elements are reallocated, all iterators and references to reallocated elements are invalidated.
  3. Complexity: At worst linear in the number of active and reserved blocks in *this. If reallocation occurs, also linear in the number of elements reallocated.
  4. Throws: If reallocation occurs, uses Allocator::allocate() which may throw an appropriate exception.
    [Note: This operation may change the iterative order of the elements in *this. - end note]

iterator get_iterator(const_pointer p) noexcept;
const_iterator get_iterator(const_pointer p) const noexcept;
  1. Complexity: At worst linear in the number of active blocks in *this.
  2. Returns: An iterator or const_iterator pointing to the same element as p. If p does not point to an element in *this, the past-the-end iterator is returned.
  3. Remarks: If p points to an erased element, behaviour is undefined.

bool is_active(const_iterator it) const noexcept;
  1. Complexity: At worst linear in the number of active blocks in *this.
  2. Returns: true if and only if it points to an element in *this.
    [Note: Returns false if the past-the-end iterator is supplied. - end note]

void sort();
template <class Compare>
  void sort(Compare comp);
  1. Preconditions: T is Cpp17MoveInsertable into hive and Cpp17MoveAssignable. Lvalues of type T are swappable.
  2. Effects: Sorts the hive according to the operator < or a Compare function object. If an exception is thrown, the order of the elements in *this is unspecified. Iterators and references to elements may be invalidated.
  3. Complexity: N log N comparisons, where N == size().
  4. Throws: bad_alloc if it fails to allocate any memory necessary for the sort process. comp may also throw.
  5. Remarks: May allocate. Any allocation uses Allocator::allocate(), which may throw an appropriate exception.
    [Note: Not required to be stable ([algorithm.stable]) - end note]

Erasure [hive.erasure]


template<class T, class Allocator, class U>
  typename hive<T, Allocator>::size_type
    erase(hive<T, Allocator>& c, const U& value);
  1. Effects: Equivalent to:
    return erase_if(c, [&](auto& elem) { return elem == value; });

template<class T, class Allocator, class Predicate>
  typename hive<T, Allocator>::size_type
    erase_if(hive<T, Allocator>& c, Predicate pred);
  1. Effects: Equivalent to:
    
    auto original_size = c.size();
    for (auto i = c.begin(), last = c.end(); i != last; ) {
      if (pred(*i)) {
        i = c.erase(i);
      } else {
        ++i;
      }
    }
    return original_size - c.size();
    

VII. Acknowledgments

Matt would like to thank: Glen Fernandes and Ion Gaztanaga for restructuring advice, Robert Ramey for documentation advice, various Boost and SG14 members for support, critiques and corrections, Baptiste Wicht for teaching me how to construct decent benchmarks, Jonathan Wakely, Sean Middleditch, Jens Maurer (very nearly a co-author at this point really), Patrice Roy and Guy Davidson for standards-compliance advice and critiques, support, representation at meetings and bug reports, Henry Miller for getting me to clarify why the intrusive list/free list approach to memory location reuse is the most appropriate, Ville Voutilainen and Gašper Ažman for help with the colony/hive rename paper, Ben Craig for his critique of the tech spec, that ex-Lionhead guy for annoying me enough to force me to implement the original skipfield pattern, Arthur O'Dwyer for his bug-testing of the hive implementation, Jon Blow for some initial advice and Mike Acton for some influence, the community at large for giving me feedback and bug reports on the reference implementation.
Also Nico Josuttis for doing such a great job in terms of explaining the general format of the structure to the committee.

VIII. Appendices

Appendix A - Basic usage examples

Using reference implementation.

#include <iostream>
#include <numeric>
#include "plf_hive.h"

int main(int argc, char **argv)
{
  plf::hive<int> i_hive;

  // Insert 100 ints:
  for (int i = 0; i != 100; ++i)
  {
    i_hive.insert(i);
  }

  // Erase every second element: erase() returns an iterator to the next element, which ++it then steps over:
  for (plf::hive<int>::iterator it = i_hive.begin(); it != i_hive.end(); ++it)
  {
    it = i_hive.erase(it);
  }

  std::cout << "Total: " << std::accumulate(i_hive.begin(), i_hive.end(), 0) << std::endl;
  std::cin.get();
  return 0;
} 
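The erase loop in the example above depends on erase() returning an iterator to the element after the erased one. The loop's ++it then steps over one surviving element, so every second element is erased. It terminates cleanly only because the element count (100) is even: with an odd count, the final erase() would return end() and the loop would then increment the past-the-end iterator. The sketch below shows a variant that is safe for any count. It uses std::list purely as a stand-in with the same erase() return convention, and both function names are hypothetical.

```cpp
#include <cassert>
#include <list>
#include <numeric>

// Erase every second element, starting with the first, from any container
// whose erase() returns the iterator following the erased element.
template <class Container>
void erase_every_other(Container& c) {
  for (auto it = c.begin(); it != c.end();) {
    it = c.erase(it);         // now refers to the next surviving element
    if (it != c.end()) ++it;  // step over it, but never past end()
  }
}

// Fill a container with 0 .. count-1, erase every second element, and return
// the sum of what remains.
int sum_after_erasing_every_other(int count) {
  assert(count >= 0);
  std::list<int> values;
  for (int i = 0; i != count; ++i) values.push_back(i);
  erase_every_other(values);
  return std::accumulate(values.begin(), values.end(), 0);
}
```

For 100 elements the survivors are the odd numbers 1 to 99, whose sum is 2500. For 5 elements the survivors are 1 and 3.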

Example demonstrating pointer stability

#include <iostream>
#include "plf_hive.h"

int main(int argc, char **argv)
{
  plf::hive<int> i_hive;
  plf::hive<int>::iterator it;
  plf::hive<int *> p_hive;
  plf::hive<int *>::iterator p_it;

  // Insert 100 ints to i_hive and pointers to those ints to p_hive:
  for (int i = 0; i != 100; ++i)
  {
    it = i_hive.insert(i);
    p_hive.insert(&(*it));
  }

  // Erase every second int: erase() returns an iterator to the next element, which ++it then steps over:
  for (it = i_hive.begin(); it != i_hive.end(); ++it)
  {
    it = i_hive.erase(it);
  }

  // Erase every second int pointer in the same way, leaving the pointers to the surviving ints:
  for (p_it = p_hive.begin(); p_it != p_hive.end(); ++p_it)
  {
    p_it = p_hive.erase(p_it);
  }

  // Total the remaining ints via the pointer hive (pointers will still be valid even after insertions and erasures):
  int total = 0;

  for (p_it = p_hive.begin(); p_it != p_hive.end(); ++p_it)
  {
    total += *(*p_it);
  }

  std::cout << "Total: " << total << std::endl;

  if (total == 2500)
  {
    std::cout << "Pointers still valid!" << std::endl;
  }

  std::cin.get();
  return 0;
} 

Appendix B - Reference implementation benchmarks

Benchmark results for the colony (hive) reference implementation under GCC on an Intel Xeon E3-1241 (Haswell) are here.

Old benchmark results for an earlier version of colony under MSVC 2015 update 3, on an Intel Xeon E3-1241 (Haswell) are here. There is no commentary for the MSVC results.

Appendix C - Frequently Asked Questions

  1. Where is it worth using a hive in place of other std:: containers?

    See the final appendix for a more thorough answer to this question. As a brief overview, a hive is worthwhile for performance reasons in situations where:

    1. Insertion order is unimportant,
    2. Insertions and erasures to the container occur frequently in performance-critical code, and
    3. Links to non-erased container elements must not be invalidated by insertion or erasure.

    Under these circumstances a hive will generally out-perform other std:: containers. In addition, because it never invalidates pointer references to container elements (except when the element being pointed to has been previously erased) it may make many programming tasks involving inter-relating structures in an object-oriented or modular environment much faster, and could be considered in those circumstances.

  2. What are some examples of situations where a hive might improve performance?

    Some ideal situations to use a hive: cellular/atomic simulation, persistent octrees/quadtrees, game entities or destructible-objects in a video game, particle physics, anywhere where objects are being created and destroyed continuously. Also, anywhere where a vector of pointers to dynamically-allocated objects or a std::list would typically end up being used in order to preserve pointer stability but where order is unimportant.

  3. Is it similar to a deque?

    A deque is reasonably dissimilar to a hive - being a double-ended queue, it requires a different internal framework. In addition, being a random-access container, having a growth factor for memory blocks in a deque is problematic (though not impossible). A deque and hive have no comparable performance characteristics except for insertion (assuming a good deque implementation). Deque erasure performance varies wildly depending on the implementation, but is generally similar to vector erasure performance. A deque invalidates pointers to subsequent container elements when erasing elements, which a hive does not, and guarantees ordered insertion.

  4. What are the thread-safety guarantees?

    Unlike a std::vector, a hive can be read from and inserted into at the same time (assuming different locations for read and write), however it cannot be iterated over and written to at the same time. If we look at a (non-concurrent implementation of) std::vector's thread-safe matrix to see which basic operations can occur at the same time, it reads as follows (please note push_back() is the same as insertion in this regard):

    std::vector   Insertion   Erasure   Iteration   Read
    Insertion     No          No        No          No
    Erasure       No          No        No          No
    Iteration     No          No        Yes         Yes
    Read          No          No        Yes         Yes

    In other words, multiple reads and iterations over iterators can happen simultaneously, but the potential reallocation and pointer/iterator invalidation caused by insertion/push_back and erasure means those operations cannot occur at the same time as anything else.

    Hive on the other hand does not invalidate pointers/iterators to non-erased elements during insertion and erasure, resulting in the following matrix:

    hive          Insertion   Erasure   Iteration   Read
    Insertion     No          No        No          Yes
    Erasure       No          No        No          Mostly*
    Iteration     No          No        Yes         Yes
    Read          Yes         Mostly*   Yes         Yes

    * Erasures will not invalidate iterators unless the iterator points to the erased element.

    In other words, reads may occur at the same time as insertions and erasures (provided that the element being erased is not the element being read), multiple reads and iterations may occur at the same time, but iterations may not occur at the same time as an erasure or insertion, as either of these may change the state of the skipfield which is being iterated over, if a skipfield is used in the implementation. Note that iterators pointing to end() may be invalidated by insertion.

    So, hive could be considered more inherently thread-safe than a (non-concurrent implementation of) std::vector, but still has some areas which would require mutexes or atomics to navigate in a multithreaded environment.

  5. Any pitfalls to watch out for?

    Because erased-element memory locations may be reused by insert() and emplace(), insertion position is essentially random unless no erasures have been made, or an equal number of erasures and insertions have been made.

  6. What is hive's Abstract Data Type (ADT)?

    Though I am happy to be proven wrong, I suspect hives/colonies/bucket arrays are their own abstract data type. Some have suggested its ADT is of type bag. I would somewhat dispute this, as it does not have typical bag functionality such as searching based on value (you can use std::find but it's O(n)), and adding this functionality would slow down other performance characteristics. Multisets/bags are also not sortable (by means other than automatically by key value). hive does not utilize key values, is sortable, and does not provide the sort of functionality frequently associated with a bag (e.g. counting the number of times a specific value occurs).

  7. Why must blocks be removed from the iterative sequence when empty?

    Two reasons:

    1. Standards compliance: if blocks aren't removed then ++ and -- iterator operations become undefined in terms of time complexity, making them non-compliant with the C++ standard. At the moment they are O(1) amortized; in the reference implementation this typically constitutes one update for both skipfield and element pointers, but two if a skipfield jump takes the iterator beyond the bounds of the current block and into the next block. But if empty blocks are allowed, there could be anywhere between 1 and block_capacity_limits().max empty blocks between the current element and the next. Essentially you get the same scenario as you do when iterating over a boolean skipfield. It would be possible to move these to the back of the hive as trailing blocks, or house them in a separate list or vector for future usage, but this may create performance issues if any of the blocks are not at their maximum size (see below).
    2. Performance: iterating over empty blocks is slower than them not being present, of course - but also if you have to allow for empty blocks while iterating, then you have to include a while loop in every iteration operation, which increases cache misses and code size. The strategy of removing blocks when they become empty also statistically removes (assuming randomized erasure patterns) smaller blocks from the hive before larger blocks, which has a net result of improving iteration, because with a larger block, more iterations within the block can occur before the end-of-block condition is reached and a jump to the next block (and subsequent cache miss) occurs. Lastly, pushing to the back of a hive, provided there is still space and no new block needs to be allocated, will be faster than recycling memory locations as each subsequent insertion occurs in a subsequent memory location (which is cache-friendlier) and also less computational work is necessary. If a block is removed from the iterative sequence its recyclable memory locations are also not usable, hence subsequent insertions are more likely to be pushed to the back of the hive.
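
    The loop referred to in item 2 can be sketched with a plain boolean skipfield. This is an illustration only, not the reference implementation's mechanism; the function name next_active and the use of std::vector<bool> as the skipfield are assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: returns the index of the next non-erased slot after
// `current`, or erased.size() when no such slot exists.
std::size_t next_active(const std::vector<bool>& erased, std::size_t current)
{
    assert(current < erased.size());
    std::size_t i = current + 1;
    // One step per erased slot: the cost of a single ++ depends on the length
    // of the erased run, so it is not bounded by a constant.
    while (i < erased.size() && erased[i])
        ++i;
    return i;
}
```

    With a skipfield of {active, erased, erased, erased, active}, advancing from index 0 takes three loop steps to reach index 4. Allowing empty blocks in the iterative sequence would reintroduce the same kind of loop at block granularity.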
  8. Why not reserve all empty memory blocks for future use during erasure, or none, rather than leaving this decision undefined by the specification?

    In my view the default scenario, for reasons of predictability and memory use, should be to free the memory block in most cases. But future implementations may find better strategies, somehow, and it is best not to overly constrain potential implementation. For the reasons described in the design decisions section on erase(), retaining the back block at least has performance and latency benefits, in the current implementation. Therefore retaining no memory blocks is non-optimal in cases where the user is not using a custom allocator. Meanwhile, retaining all memory blocks is bad for performance as many small memory blocks will be retained, which decreases iterative performance due to lower cache locality. However, one perspective is that if a scenario calls for retaining all memory blocks, this should be left to an allocator to manage. This is an open topic for discussion.

  9. Why is there no default constructor for hive_limits?

    The user must obtain the block capacity hard limits of the implementation (via block_capacity_hard_limits()) prior to supplying their own limits as part of a constructor or reshape(), so that they do not trigger undefined behaviour by supplying limits which are outside of the hard limits. Hence it was not perceived by LEWG that there would be a reason for a hive_limits struct to ever be used with non-user-supplied values, e.g. zero.

  10. Memory block capacities - what are they based on, how do they expand?

    There are 'hard' capacity limits, 'default' capacity limits and user-defined capacity limits. Default limits (what a hive is instantiated with if user-defined capacity limits are not supplied) and user-defined limits are not allowed to go outside of an implementation's hard limits. Newly-allocated blocks also have a non-1 implementation-defined growth factor.

    While implementations are free to choose their own limits and strategies here, in the reference implementation memory block sizes start from either the dynamically-defined default minimum size (8 elements, larger if the type stored is small) or an amount defined by the end user (with a minimum of 3 elements, as there is enough metadata per-block that less than 3 elements is generally a waste of memory unless the element type is extremely large).

    Subsequent block sizes then increase the total capacity of the hive by a factor of 2 (so, 1st block 8 elements, 2nd 8 elements, 3rd 16 elements, 4th 32 elements etcetera) until the current maximum block size is reached. The default maximum block size in the reference implementation is 255 (if the type sizeof is < 10 bytes) or 8192, based on multiple benchmark comparisons between different maximum block capacities, with different sized types. For larger-than-10-byte types the skipfield bitdepth is (at least) 16 so the maximum capacity 'hard' limit would be 65535 elements in that context, for < 10-byte types the skipfield bitdepth is (at least) 8, making the maximum capacity hard limit 255.
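
    The growth pattern described above can be sketched as simple arithmetic. This is not code from the reference implementation; the function name block_capacities and its parameters are hypothetical, and the sketch assumes each new block is sized to double the hive's total capacity, clamped to the current minimum and maximum block capacities.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: capacities of the first `blocks` memory blocks when
// every new block doubles the total capacity, clamped to [min_cap, max_cap].
std::vector<std::size_t> block_capacities(std::size_t min_cap,
                                          std::size_t max_cap,
                                          std::size_t blocks)
{
    assert(min_cap > 0 && min_cap <= max_cap);
    std::vector<std::size_t> result;
    std::size_t total = 0; // total capacity allocated so far
    for (std::size_t i = 0; i < blocks; ++i)
    {
        // The first block uses the minimum; later blocks match the running
        // total, which doubles the overall capacity.
        const std::size_t wanted = (total == 0) ? min_cap : total;
        const std::size_t next = std::clamp(wanted, min_cap, max_cap);
        result.push_back(next);
        total += next;
    }
    return result;
}
```

    Under these assumptions, block_capacities(8, 8192, 5) yields 8, 8, 16, 32, 64, matching the sequence given above, and the block size stops growing once the maximum is reached.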

  11. What are user-defined memory block minimum and maximum capacities good for?

    1. Minimum block capacity limits allow the user to prevent the hive from creating too-small blocks in the case where they know the minimum number of elements they will have.
    2. The combination of minimum and maximum block capacity limits allows the user to ensure that memory blocks entirely fit within a given processor's cache-lines or memory pathway sizes.
    3. The limits also allow the user to optimize for specific scenarios where they know their erasure/insertion patterns and can figure out how much memory will be wasted or performance lost for a given block capacity and element type. If a block size is large and many erasures have occurred but the block is not completely empty of elements, in addition to having a lot of wasted unused memory, iterative performance will suffer due to the large memory gaps between any two non-erased elements and subsequent drops in data locality and cache performance. In this scenario the user would desire to experiment with benchmarking and limiting the minimum/maximum capacities of the blocks, such that memory blocks are freed earlier given the user's specific erasure patterns, and performance is improved.
  12. Why are hive_limits specified in constructors and not relegated to a secondary function?

    1. They have always been required in range/fill constructors for the obvious reason that otherwise the user must construct, call reshape and then call range-assign/insert. This is obviously slower and more cumbersome for use.
    2. They were originally not in the default constructors due to creating ambiguity with the fill constructors, but users have asked for this since 2016. One reason for this is consistency. Another is usage with non-movable/copyable types, which cannot be used with reshape(). The guarantees of reshape have to be specified in a concrete way, so it must be able to reallocate elements when the existing blocks do not fit within the user-supplied range, and throw when it cannot do so (either due to lack of memory or some other problem). It cannot be respecified for non-movable/copyable types. The non-noexcept status of this function also caused problems for some.
    3. In 2020 the issue was discussed in SG14 and Jens Maurer suggested using an external struct to make the calls unambiguous (link here), and this has been the ongoing solution. This meets the needs of those using non-movable/copyable types and is unambiguous in terms of specification.
    4. Lastly, block capacity limits are a first-order feature of the container, and something which users have repeatedly thanked me for. They are needed & will not be removed. As a side-note, it is of annoyance to many developers that similar functionality was never specified for deque, as this has led to all of the major deque implementations being unusable for large sizeof types.
  13. Can a hive be used with SIMD instructions?

    No and yes. Yes if you're careful, no if you're not.
    On platforms which support scatter and gather operations via hardware (e.g. AVX512) you can use hive with SIMD as much as you want, using gather to load elements from disparate or sequential locations, directly into a SIMD register, in parallel. Then use scatter to push the post-SIMD-process values elsewhere after. On platforms which do not support this in hardware, you would need to manually implement a scalar gather-and-scatter operation which may be significantly slower.

    In situations where gather and scatter operations are too expensive, which require elements to be contiguous in memory for SIMD processing, this is more complicated. When you have a bunch of erasures in a hive, there's no guarantee that your objects will be contiguous in memory, even though they are sequential during iteration. Some of them may also be in different memory blocks to each other. In these situations if you want to use SIMD with hive, you must do the following:

    Generally if you want to use SIMD without gather/scatter, it's probably preferable to use a vector or an array.

  14. Why were container operators ==, != and <=> removed?

    Since this is a container where insertion position is unspecified, situations such as the following may occur:
    hive<int> t = {1, 2, 3, 4, 5}, t2 = {6, 1, 2, 3, 4};
    t2.erase(t2.begin());
    t2.insert(5);

    In this case it is implementation-defined as to whether or not t == t2, if the == operator is order-sensitive.
    If the == operator is order-insensitive, there is only one reasonable way to compare the two containers, which is with is_permutation. is_permutation has a worst-case time complexity of O(n^2), which, while in keeping with how other unordered containers are implemented, was considered to be out of place for hive, which is a container where performance and consistent latency are a focus and most operations are O(1) as a result. While there are order-insensitive comparison operations which can be done in O(n log n) time, these allocate, which again was considered inappropriate for a == operator. Those operations may become the subject of a future paper.

    In light of all of this the bulk of SG14 and LEWG considered it more appropriate to remove the ==, != and <=> operators entirely, as these were unlikely to be used significantly with hive anyway. This gives the user the option of using is_permutation if they want an order-insensitive comparison, or std::equal if they want an order-sensitive comparison. In either case, this removes ambiguity about what kind of operation they are expecting, and the time complexity associated with that operation.
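
    The two alternatives mentioned above can be sketched as follows. std::vector is used purely as a stand-in so that the example compiles on current toolchains; the helper names equal_in_order, equal_any_order and comparison_example are hypothetical. The contents of t2 show only one possible outcome of the earlier example, in which the inserted 5 re-uses the location of the erased 6.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Order-sensitive comparison: same values in the same iteration order, O(n).
template <class Container>
bool equal_in_order(const Container& a, const Container& b)
{
    return std::equal(a.begin(), a.end(), b.begin(), b.end());
}

// Order-insensitive comparison: same values in any order, worst case O(n^2).
template <class Container>
bool equal_any_order(const Container& a, const Container& b)
{
    return std::is_permutation(a.begin(), a.end(), b.begin(), b.end());
}

// One possible state after the erase/insert sequence in the example above.
inline void comparison_example()
{
    const std::vector<int> t  = {1, 2, 3, 4, 5};
    const std::vector<int> t2 = {5, 1, 2, 3, 4};
    assert(!equal_in_order(t, t2)); // iteration order differs
    assert(equal_any_order(t, t2)); // same multiset of values
}
```

    Spelling out the algorithm at the call site makes both the intended semantics and the cost visible to the reader.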

  15. What functions can potentially stop a hive from being sort()'ed?

    insert, emplace, reshape, splice and operator = (where *this is destination).

  16. Why was memory() removed?

    This was a convenience function to allow programmers to find current container memory usage without using a debugger or profiler, however this was considered out of keeping with current standard practice, i.e. unordered_map also uses a lot of additional memory but we don't provide such a function. In addition, the context where it would've been useful in realtime, i.e. determining whether or not it's worth calling trim_capacity(), is better approached by comparing size() to capacity() (although this is not foolproof either since this doesn't tell us anything about whether the empty capacity is in empty blocks or between elements).

  17. Why was the Priority template parameter removed?

    This was a hint to the implementation to prioritize for lowered memory usage or performance specifically. In the reference implementation this told the container which skipfield type to use (smaller types limited block sizes due to the constraints of the jump-counting skipfield pattern). In other implementations this could've taken the form of using a bitfield with jump-counting information pushed into the erased element memory space, for the memory usage priority. However, prior to a particular LEWG meeting there had not been sufficient benchmarking and memory testing done on this - all benchmarking had been done at an earlier time without checking memory usage.

    When more thorough benchmarking including memory measurements was done it was found that the vast bulk of unnecessary memory usage came from erased elements in hive when an element memory block is not yet empty (and therefore freed to the OS or retained depending on implementation), rather than the skipfield type itself. This meant that assuming a randomised erasure pattern, smaller block capacities had far more to do with how much memory was wasted than the skipfield type, as they were more likely to be freed than larger ones. And block capacities could already be specified by the user. Further, the better performance in some benchmarks was primarily related to this fact - reusing erased element memory space in existing blocks was much faster than having to deallocate/reserve blocks and subsequently allocate/unreserve new blocks.

    The only caveat to this was when using low sizeof types such as scalars, where the additional memory from the skipfield (proportional to the type sizeof) was significant, and this use-case can be worked around at compile-time by switching to a smaller skipfield type (or bitfield as described) based on sizeof(type), either using concepts and overloads or another mechanism. Unfortunately, I personally think the priority parameter would've also been useful for a number of other compile-time decision processes, such as deciding what block retention strategy to use when erasing and a block becomes empty of elements.

    And since smaller block sizes result in less memory usage, using unsigned char for the jump-counting skipfield type and forcing the max block capacity down to 255, as the reference did for priority::memory_use, still met the requirement of lowering memory usage - though the same thing can be achieved with slightly more memory use by using priority::performance (unsigned short) and specifying the block size directly. Lastly, having a priority tag gave the ability to specify new priority values in future as part of the standard, potentially allowing for new and better changes without breaking ABI in implementations. So these are the pros and cons, but the committee has made its decision for better or for worse.

  18. Why does a bidirectional container have iterator operators > <, >=, <= and <=>?

    These are useful for several reasons:

    1. They can be used in situations where the user is looping over data and has a non-1 addition/subtraction to the iterator per-cycle (or a potential non-1 addition), so that a for-loop end condition of != end() or similar would not necessarily work.
    2. They are used by the distance() implementation to determine whether first > last, and to correctly calculate distance without crashing.
    3. Because hive insert location is unspecified, if you have a specific range of elements which you've calculated the distance between, iterator ops </>/<=/>=/<=> are the only way to determine whether an element you just inserted is within that range. Likewise if external objects/entities are removing elements from a hive via stored pointers/iterators, those ops are the only way to determine if the element was within that range.
  19. Why is the time complexity of updating the erased-element-skipping mechanism not factored into time complexity for erase and insertion operations?

    Primarily to allow for implementation improvement. While the reference implementation uses a mechanism that is O(1), it is possible that someone in future could come up with a mechanism that was not O(1), but still out-performed the reference implementation's mechanism substantially and was never slower. In that case we would want to switch to that mechanism. Further, since the effects of time complexity on implementation performance are constrained by both hardware and software, its relevance is situation-dependent. But its relevance to, for example, reallocation of elements is more reliably understood in this context, as this can be calculated based on sizeof(value_type), whereas the erased-element skipping mechanism could take many forms.

    The same logic applies for the erased-element recording mechanism.

  20. Why is the insertion time complexity for singular insert/emplace O(1) amortized as opposed to O(1)?

    Two reasons. One is that a new block may have to be allocated or transferred from the reserved blocks if all active blocks are full. The second reason is that (at least in the current reference implementation at time of writing) in the event of a block allocation or transfer there is an update of block numbers which may occur if the last current active block has a group number == std::numeric_limits<size_type>::max(). The occurrence of this event is 1 in every std::numeric_limits<size_type>::max() block allocations/transfers. It updates the group numbers in every active block. The number of active blocks at this point could be small or large.

    In addition, if a hive were implemented as a vector of pointers to groups, rather than a linked list of groups, this would also necessitate amortized time complexity as when the vector became full, all group pointers would need to be reallocated to a new memory block.

    Hence O(1) amortized.

Appendix D - Specific responses to previous committee feedback

  1. Naming

    See paper number 2332R0.

  2. "Unordered and no associative lookup, so this only supports use cases where you're going to do something to every element."

    As noted the container was originally designed for highly object-oriented situations where you have many elements in different containers linking to many other elements in other containers. This linking can be done with pointers or iterators in hive (insert returns an iterator which can be dereferenced to get a pointer, pointers can be converted into iterators with the supplied functions (for erase etc)) and because pointers/iterators stay stable regardless of insertion/erasure, this usage is unproblematic. You could say the pointer is equivalent to a key in this case (but without the overhead). That is the first access pattern, the second is straight iteration over the container, as you say. Secondly, the container does have (typically better than O(n)) advance/next/prev implementations, so multiple elements can be skipped.

  3. "Prove this is not an allocator"

    I'm not really sure how to answer this, as I don't see the resemblance, unless you count maps, vectors etc as being allocators also. The only aspect of it which resembles what an allocator might do, is the memory re-use mechanism. It would be impossible for an allocator to perform a similar function while still allowing the container to iterate over the data linearly in memory, preserving locality, in the manner described in this document.

  4. "If this is for games, won't game devs just write their own versions for specific types in order to get a 1% speed increase anyway?"

    This is true for many/most AAA game companies who are on the bleeding edge, but they also do this for vector etc, so they aren't the target audience of std:: for the most part; sub-AAA game companies are more likely to use third party/pre-existing tools. As mentioned earlier, this structure (bucket-array-like) crops up in many, many fields, not just game dev. So the target audience is probably everyone other than AAA gaming, but even then, it facilitates communication across fields and companies as to this type of container, giving it a standardized name and understanding.

  5. "Is there active research in this problem space? Is it likely to change in future?"

    The only current analysis has been around the question of whether it's possible for this specification to fail to allow for a better implementation in future. This is unlikely given the container's requirements and how this impacts on implementation. Bucket arrays have been around since the 1990s, but there's been no significant innovation in them until now. I've been researching/working on hive since early 2015, and while I can't say for sure that a better implementation might not be possible, I am confident that no change should be necessary to the specification to allow for future implementations, if it is done correctly. This is in part because of the C++ container requirements and how these constrain implementation.

    The requirement of allowing no reallocations upon insertion or erasure truncates possible implementation strategies significantly. Memory blocks have to be independently allocated so that they can be removed (when empty) without triggering reallocation of subsequent elements. There is a limited number of ways to do that and keep track of the memory blocks at the same time. Erased element locations must be recorded (for future re-use by insertion) in a way that doesn't create allocations upon erasure, and there is a limited number of ways to do this also. Multiple consecutive erased elements have to be skipped in O(1) time in order for the iterator to meet the C++ iterator O(1) function requirement, and again there are limits to how many ways you can do that. That covers the three core aspects upon which this specification is based. See Design Decisions for the various ways these aspects can be designed.

    The time complexity of updates to whatever erased-element skipping mechanism is used should, I think, be left implementation-defined, as defining time complexity may obviate better solutions which are faster but are not necessarily O(1). These updates would likely occur during erasure, insertion, splicing and container copying.

  6. "Why not support push_back and push_front?"

    1. Ordered insertion would create performance penalties due to not reusing previously-erased element locations, which in turn increases the number of block allocations necessary and reduces iteration speed due to wider gaps between active elements and the resultant reduced cache locality. This negates the performance benefits of using this container.
    2. Newcomers will get confused and use push_back instead of insert, because they will assume this is faster based on their experience of other containers, and the function call itself may actually be faster in some circumstances. But it will also inhibit performance for the reasons above. Further, explaining how the container works and operates has proved to be difficult even with C++ committee members, so being able to explain it adequately to novices such that they avoid this pitfall is in no way guaranteed.
    3. It should be unambiguous as to its interface and how it works, and what guarantees are in place. Making insertion guarantees straightforward is key to performant usage. Having fewer constraints is also important for allowing future, potentially-faster, implementation.
    4. Supporting push_back and push_front introduces other performance disadvantages in addition to those mentioned above. As one example if you support push_front you have to maintain another variable which records the point at the beginning of the container beyond which nothing has yet been inserted (usually but not always begin() minus 1) and be cognisant of it in your insert and erase functions. Then you also have to check in all insert/assign functions whether or not there is empty space at the front of the container prior to begin() which can be used prior to creating a new block.
    5. There are other, better containers for ordered insertion, even ones which support contiguous allocation and the re-use of erased element memory (eg. plf::list).
  7. "Why not constexpr?"

    UPDATE May 2022: Re-testing these assumptions, later compiler versions appear to not suffer from the issues around codegen and speed (although codegen is still very much different for versions of the container with constexpr specified for all functions). There are still however issues with compilation and I think a staggered approach to rolling out constexpr containers is still the right one, so that any potential "gotcha"s can be addressed before ABIs in the various STL flavours essentially become frozen. Original text follows.

    TLDR; this may be possible in future with better understanding of constexpr container problems and advantages. Presently both compiler support and programmer experience are lacking, and tests result in less performance in runtime code when constexpr is enabled on all of hive's container methods. Updating at a later stage is unproblematic, and I think this is a wait-and-see scenario.

    non-TLDR: I am somewhat awkwardly forced into a position where I have to question and push back slightly against the current enthusiasm around constexpr containers. At the time of writing there are no compilers which both support constexpr non-trivial destructors and also have a working implementation of a constexpr container. Until that is remedied, we won't really know what we're dealing with. My own testing in terms of making hive functions constexpr has not been encouraging. 2% performance decrease in un-altered benchmark code was common, and I suspect the common cause of this was the caching of return values from functions called at compile-time when it was cheaper to calculate them on-the-fly than to return them from main memory. This suspicion was based on the substantial increases in executable size in the constexpr versions. It is also a well-known situation in modern game development.

    For an example of the latter, think about size() in std::vector. This can be calculated in most implementations by (vector.end_iterator.pointer - vector.memory_block), both of which will most likely be in cache at the time of calling size(). That's if size isn't a member variable or something. Calculating a minus operation on stuff that's already in cache is about 100x faster than making a call out to main memory for a compile-time-stored value of this function, if that is necessary. Hence calculating size() will typically be faster than storing it, but a constexpr implementation and compiler currently won't make that distinction.

    None of which is an issue if a container is being entirely used within a constexpr function which has been determined to be evaluated at compile time. The problems occur when constexpr containers are used in runtime code, but certain functions such as size() are determined to be able to be evaluated at compile time, and therefore have their results cached. If there was a mechanism which specified that for a given class instance, its constexpr functions may not be evaluated at compile time, then I would give the go-ahead. Or if there were a rule which stated that a class instance's member functions may only be evaluated at compile time if the class instance is entirely instantiated and destructed at compile time, I would give the go-ahead. This does not appear to be the situation we have, as far as I can tell at the moment. If however my benchmark results above are in fact the result of compiler bugs, I will eat my words.

    Time may sort these issues out, though I am personally happier for std::array and std::vector to be the "canaries in the coalmine" here.

  8. Licensing for the reference implementation (zLib) - is this compatible with libstdc++/libc++/MS-STL usage?

    Yes. zLib license is compatible with both GPL3 and Apache licenses (libc++/MS-STL). zLib is a more permissive license than all of these, only requiring the following:

    This software is provided 'as-is', without any express or implied warranty. In no event will the authors be held liable for any damages arising from the use of this software.

    Permission is granted to anyone to use this software for any purpose, including commercial applications, and to alter it and redistribute it freely, subject to the following restrictions:

    1. The origin of this software must not be misrepresented; you must not claim that you wrote the original software. If you use this software in a product, an acknowledgment in the product documentation would be appreciated but is not required.
    2. Altered source versions must be plainly marked as such, and must not be misrepresented as being the original software.
    3. This notice may not be removed or altered from any source distribution.

    Please note that "product" in this instance doesn't mean 'source code', as in a library, but a program or executable. This is made clear by line 3 which clearly differentiates source distributions from products.

    In addition, high-level representatives from libc++, libstdc++ and MS-STL have stated they will either use the reference or may use it as a starting point and that licensing is unproblematic (with the exception of libc++ who stated they would need to run it past LLVM legal reps). However, if licensing becomes problematic in any case, as the sole author of the reference implementation I am in a position to grant use of the code under other licenses as I see fit.

  9. How does hive solve the ABA problem where a given iterator/pointer points to a given element, then that element is erased, another element is inserted and re-uses the previous element's memory location? We now have invalidated iterators/pointers which point to valid elements.

    It doesn't. Detecting these cases is down to the end user, as it is in deque or vector when elements are erased. In the case of hive I would recommend the use of either a generation counter or some other kind of unique ID within the element itself. The end user can build their own "handle" wrapper around a pointer or iterator which stores a copy of this ID, then compares it against the element itself upon accessing it.
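
    A minimal sketch of such a handle follows. Everything here is hypothetical and for illustration only: the names entity, entity_handle and entity_pool are invented, and entity_pool is a simplified stand-in for a hive (a std::deque whose freed slots are recycled), chosen so that the example is self-contained and a stale handle never reads a destroyed object.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>
#include <vector>

// Hypothetical element type carrying its own unique ID.
struct entity
{
    std::uint64_t id = 0; // 0 marks a slot that is currently unused
    int health = 0;
};

// Hypothetical handle: a pointer plus a copy of the ID seen at insertion.
struct entity_handle
{
    entity* ptr = nullptr;
    std::uint64_t id = 0;

    // Returns the element only if it is still the one this handle refers to.
    entity* get() const
    {
        return (ptr != nullptr && ptr->id == id) ? ptr : nullptr;
    }
};

// Stand-in container: element addresses are stable and erased slots are re-used.
class entity_pool
{
public:
    entity_handle insert(int health)
    {
        entity* slot = nullptr;
        if (!free_slots_.empty())
        {
            slot = free_slots_.back(); // recycle a previously erased slot
            free_slots_.pop_back();
        }
        else
        {
            slot = &storage_.emplace_back(); // deque keeps existing addresses valid
        }
        assert(slot->id == 0); // fresh and recycled slots are always unused
        slot->id = ++next_id_; // IDs are never re-used
        slot->health = health;
        return entity_handle{slot, slot->id};
    }

    void erase(entity_handle h)
    {
        if (entity* e = h.get())
        {
            e->id = 0;
            free_slots_.push_back(e);
        }
    }

private:
    std::deque<entity> storage_;
    std::vector<entity*> free_slots_;
    std::uint64_t next_id_ = 0;
};
```

    If one element is inserted and erased, and a second insertion then re-uses the same slot, get() on the first handle returns nullptr because the stored ID no longer matches, while the second handle resolves normally.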

    In terms of guarantees that an element has not been replaced via hive usage, replacement may occur if:

    1. Any number of erasures have occurred, and then at least one insertion has occurred.
    2. clear() has been called and then at least one insertion has occurred.
    3. shrink_to_fit(), reshape(), assign(), unique(), std::erase_if(), std::swap() or swap() have been called.

Appendix E - Typical game engine requirements

Here are some more specific requirements with regards to game engines, verified by game developers within SG14:

  1. Elements within data collections refer to elements within other data collections (through a variety of methods - indices, pointers, etc). These references must stay valid throughout the course of the game/level. Any container which causes pointer or index invalidation creates difficulties or necessitates workarounds.
  2. Order is unimportant for the most part. The majority of data is simply iterated over, transformed, referred to and utilized with no regard to order.
  3. Erasing or otherwise "deactivating" objects occurs frequently in performance-critical code. For this reason methods of erasure which create strong performance penalties are avoided.
  4. Inserting new objects in performance-critical code (during gameplay) is also common - for example, a tree drops leaves, or a player spawns in an online multiplayer game.
  5. It is not always clear in advance how many elements there will be in a container at the beginning of development, or at the beginning of a level during play. Genericized game engines in particular have to adapt to considerably different user requirements and scopes. For this reason extensible containers which can expand and contract in realtime are necessary.
  6. Due to the effects of cache on performance, memory storage which is more-or-less contiguous is preferred.
  7. Memory waste is avoided.

std::vector in its default state does not meet these requirements due to:

  1. Poor (non-fill) single insertion performance (regardless of insertion position) due to the need for reallocation upon reaching capacity
  2. Insert invalidates pointers/iterators to all elements
  3. Erase invalidates pointers/iterators/indexes to all elements after the erased element

Game developers therefore either develop custom solutions for each scenario or implement workarounds for vector. The most common workarounds are most likely the following or derivatives:

  1. Using a boolean flag or similar to indicate the inactivity of an object (as opposed to actually erasing from the vector). Elements flagged as inactive are skipped during iteration.

    Advantages: Fast "deactivation". Easy to manage in multi-access environments.
    Disadvantages: Can be slower to iterate due to branching.
  2. Using a vector of data and a secondary vector of indexes. When erasing, the erasure occurs only in the vector of indexes, not the vector of data. When iterating it iterates over the vector of indexes and accesses the data from the vector of data via the remaining indexes.

    Advantages: Fast iteration.
    Disadvantages: Erasure still incurs some reallocation cost which can increase jitter.
  3. Combining a swap-with-back-element-and-pop approach to erasure with some form of dereferenced lookup system to enable contiguous element iteration (sometimes called a Packed array).
    Advantages: Iteration is at standard vector speed.
    Disadvantages: Erasure will be slow if objects are large and/or non-trivially copyable, thereby making swap costs large. All link-based access to elements incur additional costs due to the dereferencing system.
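
As an illustration of the third workaround, here is a minimal sketch of a packed array. It is one possible arrangement among many, not a description of any particular engine's code: the class name packed_array and its members are hypothetical, and handles are never reused in this sketch, so the lookup table grows with every insertion.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical packed array: contiguous data plus a handle-to-index lookup table.
template <class T>
class packed_array {
    static constexpr std::size_t npos = static_cast<std::size_t>(-1);
    std::vector<T> data_;                // contiguous, iterated at vector speed
    std::vector<std::size_t> handle_of_; // data index -> handle
    std::vector<std::size_t> index_of_;  // handle -> data index, or npos if erased

public:
    // Returns a handle which stays valid until the element is erased.
    std::size_t insert(T value) {
        const std::size_t handle = index_of_.size();
        index_of_.push_back(data_.size());
        handle_of_.push_back(handle);
        data_.push_back(std::move(value));
        return handle;
    }

    bool contains(std::size_t handle) const {
        return handle < index_of_.size() && index_of_[handle] != npos;
    }

    // Every handle-based access pays for the extra lookup.
    T& get(std::size_t handle) {
        assert(contains(handle));
        return data_[index_of_[handle]];
    }

    // Swap-with-back-and-pop: the cost is one swap of T plus the table fix-up.
    void erase(std::size_t handle) {
        assert(contains(handle));
        const std::size_t idx = index_of_[handle];
        const std::size_t last = data_.size() - 1;
        if (idx != last) {
            using std::swap;
            swap(data_[idx], data_[last]);
            const std::size_t moved = handle_of_[last];
            handle_of_[idx] = moved;
            index_of_[moved] = idx;
        }
        data_.pop_back();
        handle_of_.pop_back();
        index_of_[handle] = npos;
    }

    std::size_t size() const { return data_.size(); }
};
```

Both disadvantages listed above are visible here: erase() swaps two whole elements, and get() goes through index_of_ before reaching the data.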

Hive brings a more generic solution to these contexts. While some developers, particularly AAA developers, will almost always develop a custom solution for specific use-cases within their engine, I believe most sub-AAA and indie developers are more likely to rely on third party solutions. Regardless, standardising the container will allow for greater cross-discipline communication.

Appendix F - Time complexity requirement explanations

Insert (single): O(1)

One of the requirements of hive is that pointers to non-erased elements stay valid regardless of insertion/erasure within the container. For this reason the container must use multiple memory blocks. If a single memory block were used, like in a std::vector, reallocation of elements would occur when the container expanded (and the elements were copied to a larger memory block). Instead, hive will insert into existing memory blocks when able, and create a new memory block when all existing memory blocks are full. This keeps insertion at O(1).
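
The multiple-block idea can be sketched as below. This is a heavily simplified illustration under stated assumptions, not the reference implementation: block_store is a hypothetical name, blocks have a fixed capacity, T must be default-constructible, and erasure and reuse of erased locations are omitted entirely. The block records are kept in a std::vector for brevity, so only the element storage is stable here. A real implementation would link its blocks in a way that does not itself reallocate.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical insert-only store made of fixed-capacity memory blocks.
template <class T, std::size_t BlockCapacity = 4>
class block_store {
    struct block {
        std::unique_ptr<T[]> slots{new T[BlockCapacity]};
        std::size_t used = 0;
    };
    std::vector<block> blocks_;  // block records only; elements live in 'slots'

public:
    // Inserts into the back block when it has room, otherwise allocates a new block.
    // Existing elements are never moved, so pointers to them stay valid.
    T* insert(const T& value) {
        if (blocks_.empty() || blocks_.back().used == BlockCapacity)
            blocks_.emplace_back();
        block& b = blocks_.back();
        b.slots[b.used] = value;
        return &b.slots[b.used++];
    }

    std::size_t block_count() const { return blocks_.size(); }
};
```

A pointer returned by insert() continues to refer to the same element however many further insertions follow, which is the property a single reallocating memory block cannot provide.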

Insert (multiple): O(N)

Multiple insertions may allow an implementation to reserve suitably-sized memory blocks in advance, reducing the number of allocations necessary (whereas singular insertion would generally follow the implementation's block growth pattern, possibly allocating more than necessary). However, in terms of time complexity it has no advantage over singular insertion: it is linear in the number of elements inserted.

Erase (single): O(1)

Erasure is a simple matter of destructing the element in question and updating whatever data is associated with the erased-element skipping mechanism, eg. the skipfield. Since we use a skipping mechanism to pass over erased elements during iteration, no reallocation of subsequent elements is necessary and the process is O(1). Additionally, when using a Low-complexity jump-counting pattern the skipfield update is also always O(1).

Note: When a memory block becomes empty of non-erased elements it must be freed to the OS (or reserved for future insertions, depending on implementation) and removed from the hive's sequence of memory blocks. If it were not, we would end up with non-O(1) iteration, since there would be no way to predict how many empty memory blocks there would be between the current memory block being iterated over and the next memory block with non-erased (active) elements in it.
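
A minimal sketch of single erasure with block removal follows. The assumptions are considerable: block, block_list and erase_element are hypothetical names, the element type is int (so there is no destructor to run), and the skipfield is a plain boolean one, which illustrates the bookkeeping but does not by itself give O(1) skipping of runs of erased elements during iteration.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <vector>

// Hypothetical memory block with a boolean skipfield and a count of non-erased elements.
struct block {
    std::vector<int> slots;
    std::vector<bool> erased;
    std::size_t active;
};

// A linked sequence of blocks, so that unlinking a block does not move the others.
using block_list = std::list<block>;

// Marks one element as erased. Returns true if its block became empty and was removed.
// No other element is touched, so the work done does not depend on container size.
bool erase_element(block_list& blocks, block_list::iterator b, std::size_t slot) {
    assert(slot < b->erased.size() && !b->erased[slot]);
    b->erased[slot] = true;       // skipfield update
    if (--b->active == 0) {       // block now has no non-erased elements
        blocks.erase(b);          // remove it from the iteration sequence
        return true;
    }
    return false;
}
```

The active count is what makes it possible to tell, without scanning the skipfield, that a block has become empty and must be removed.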

Erase (multiple): O(N) for non-trivially-destructible types, for trivially-destructible types between O(1) and O(N) depending on range start/end

In this case, where the element is non-trivially destructible, the time complexity is O(N), with infrequent deallocation necessary from the removal of an empty memory block as noted above. However where the elements are trivially-destructible, if the range spans an entire memory block at any point, that block and its metadata can simply be removed without doing any individual writes to its metadata or individual destruction of elements, potentially making this an O(1) operation.

In addition (when dealing with trivially-destructible types) for those memory blocks where only a portion of elements are erased by the range, if no prior erasures have occurred in that memory block you may be able to erase that range in O(1) time, as, for example, if you are using a skipfield there will be no need to check the skipfield within the range for previously erased elements. The reason you would need to check for previously erased elements within that portion's range is so you can update the metadata for that memory block to accurately reflect how many non-erased elements remain within the block. The non-erased element-count metadata is necessary because there is no other way to ascertain when a memory block is empty of non-erased elements, and hence needs to be removed from the hive's iteration sequence. The reasoning for why empty memory blocks must be removed is included in the Erase(single) section, above.

However in most cases the erase range will not perfectly match the size of all memory blocks, and with typical usage of a hive there are usually some prior erasures in most memory blocks. So, for example, when dealing with a hive of a trivially-destructible type, you might end up with a tail portion of the first memory block in the erasure range being erased in O(N) time, the second and intermediary memory block being completely erased and freed in O(1) time, and only a small front portion of the third and final memory block in the range being erased in O(N) time. Hence the time complexity for trivially-destructible elements is between O(1) and O(N) depending on the start and end of the erasure range.

std::find: O(N)

This relies on basic iteration so is O(N).

splice: O(1)

Hive only does full-container splicing, not partial-container splicing (use range-insert with std::make_move_iterator to achieve the latter, albeit with the loss of pointer validity to the moved range). When splicing, the memory blocks from the source hive are transferred to the destination hive without processing the individual elements. These blocks may be placed either at the front of the destination hive or at the end, depending on how full the source back block is compared to the destination back block. If the destination back block is more full, ie. there is less unused space in it, it is better to place the destination's blocks before the source's blocks, as otherwise a larger gap must be skipped during iteration, which in turn affects cache locality. If there are unused element memory spaces at the back of the destination container (ie. the final memory block is not full) and a skipfield is used, the skipfield nodes corresponding to those empty spaces must be altered to indicate that these are skipped elements.

Iterator operators ++ and --: O(1) amortized

Generally the time complexity is O(1), and if a skipfield pattern is used it must allow for O(1) skipping of multiple erased elements. However every so often iteration will involve a transition to the next/previous memory block in the hive's sequence of blocks, depending on whether we are doing ++ or --. At this point a read of the next/previous memory block's corresponding skipfield would be necessary, in case the front/back element(s) in that memory block are erased and hence skipped. So for every block transition, 2 reads of the skipfield are necessary instead of 1. Hence the time complexity is O(1) amortized.

If skipfields are used they must be per-element-memory-block and independent of subsequent/previous memory blocks, as otherwise you end up with a vector for a skipfield, which would need a range erased every time a memory block was removed from the hive (see notes under Erase, above), and reallocation to a larger skipfield memory block when a hive expanded. Both of these procedures carry reallocation costs, meaning you could have thousands of skipfield nodes needing to be reallocated based on a single erasure (from within a memory block which only had one non-erased element left and hence would need to be removed from the hive). This is unacceptable latency for any field involving high timing sensitivity (all of SG14).

begin()/end(): O(1)

For any implementation these should generally be stored as member variables and so returning them is O(1).

advance/next/prev: between O(1) and O(N), depending on current iterator location, distance and implementation. Average for reference implementation approximates O(log N).

The reasoning for this is similar to that of Erase(multiple), above. Complexity is dependent on state of hive, position of iterator and length of distance, but in many cases will be less than linear. It is necessary in a hive to store metadata both about the capacity of each block (for the purpose of iteration) and how many non-erased elements are present within the block (for the purpose of removing blocks from the iterative chain once they become empty). For this reason, intermediary blocks between the iterator's initial block and its final destination block (if these are not the same block, and if the initial block and final block are not immediately adjacent) can be skipped rather than iterated linearly across, by subtracting the "number of non-erased elements" metadata from distance for those blocks.

This means that the only linear time operations are any iterations within the initial block and the final block. However if either the initial or final block have no erased elements (as determined by comparing whether the block's capacity metadata and the block's "number of non-erased elements" metadata are equal), linear iteration can be skipped for that block and pointer/index math used instead to determine distances, reducing complexity to constant time. Hence the best case for this operation is constant time, the worst is linear to the distance.
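
The block-skipping logic described above can be sketched as follows, for forward movement only. This is an illustrative model and not the reference implementation: block_meta, position and advance_forward are hypothetical names, a boolean skipfield stands in for whatever skipping mechanism an implementation uses, and the caller is assumed to guarantee that at least n non-erased elements follow the starting position.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical per-block metadata: capacity is erased.size().
struct block_meta {
    std::vector<bool> erased;  // true = erased element
    std::size_t active;        // number of non-erased elements
};

struct position {
    std::size_t block;
    std::size_t slot;  // must refer to a non-erased element
};

// Moves pos forward by n non-erased elements.
position advance_forward(const std::vector<block_meta>& blocks, position pos, std::size_t n) {
    if (n == 0) return pos;

    // Initial block: index math if it has no erasures, otherwise a linear walk.
    const block_meta& first = blocks[pos.block];
    if (first.active == first.erased.size()) {
        const std::size_t remaining = first.erased.size() - 1 - pos.slot;
        if (n <= remaining) return {pos.block, pos.slot + n};
        n -= remaining;
    } else {
        for (std::size_t s = pos.slot + 1; s < first.erased.size(); ++s)
            if (!first.erased[s] && --n == 0) return {pos.block, s};
    }

    // Intermediary blocks: each is skipped in constant time via its active count.
    std::size_t b = pos.block + 1;
    while (true) {
        assert(b < blocks.size());  // precondition: enough elements remain
        if (n <= blocks[b].active) break;
        n -= blocks[b].active;
        ++b;
    }

    // Final block: find the n-th non-erased element (counting from 1).
    const block_meta& last = blocks[b];
    if (last.active == last.erased.size()) return {b, n - 1};
    for (std::size_t s = 0; s < last.erased.size(); ++s)
        if (!last.erased[s] && --n == 0) return {b, s};
    assert(false);  // unreachable if the active counts are accurate
    return pos;
}
```

Only the first and last loops are linear, and each is bypassed when the corresponding block's active count equals its capacity.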

distance: between O(1) and O(N), depending on current iterator location, distance and implementation. Average for reference implementation approximates O(log N).

The same considerations which apply to advance, prev and next also apply to distance - intermediary blocks between iterator1 and iterator2's blocks can be skipped in constant time, if they exist. iterator1's block and iterator2's block (if these are not the same block) must be linearly iterated across using ++ unless either block has no erased elements, in which case the operation becomes pointer/index math and is reduced to constant time for that block. In addition, if iterator1's block is not the same as iterator2's block, and iterator2 is equal to end() or (end() - 1), or is the last element in that block, iterator2's block's elements can also be counted from the metadata rather than by iteration.

Appendix G - Original reference implementation differences and link

This proposal, its reference implementation and the original reference implementation have several key differences. One is that the original is named 'colony', for historical and userbase reasons. Other differences follow:

Appendix H - Some user experience reports

Richard, Creative Assembly:

"I'm the lead of the Editors team at Creative Assembly, where we make tools for the Total War series of games. The last game we released was Three Kingdoms, currently doing quite well on Steam. The main tool that I work on is the map creation editor, kind of our equivalent of Unreal Editor, so it's a big tool in terms of code size and complexity.

The way we are storing and rendering entities in the tool currently is very inefficient: essentially we have a quadtree which stores pointers to the entities, we query that quadtree to get a list of pointers to entities that are in the frustum, then we iterate through that list calling a virtual draw() function on each entity. Each part of that process is very cache-unfriendly: the quadtree itself is a cache-unfriendly structure, with nodes allocated on the heap, and the entities themselves are all over the place in memory, with a virtual function call on top.

So, I have made a new container class in which to store the renderable versions of the entities, and this class has a bunch of colonies inside, one for each type of 'renderable'. On top of this, instead of a quadtree, I now have a virtual quadtree. So each renderable contains the index of the quadtree node that it lives inside. Then, instead of asking the quadtree what entities are in the frustum, I ask the virtual quadtree for a node mask of the nodes what are in the frustum, which is just a bit mask. So when rendering, I iterate through all the renderables and just test the relevant bit of the node mask to see if the renderable is in the frustum. (Or more accurately, to see if the renderable has the potential to be in the frustum.) Nice and cache friendly.

When one adds an entity to the container, it returns a handle, which is just a pointer to the object inside one of the colonies returned as a std::uintptr_t. So I need this to remain valid until the object is removed, which is the other reason to use a colony."

Andrew Shuvalov, MongoDB:

"I implemented a standalone open source project for the thread liveness monitor: https://github.com/shuvalov-mdb/thread-liveness-monitor. Also, I've made a video demo of the project: https://youtu.be/uz3uENpjRfA

The benchmarks are in the doc, and as expected the plf::colony was extremely fast. I do not think it's possible to replace it with any standard container without significant performance loss. Hopefully, this version will be very close to what we will put into the MongoDB codebase when this project is scheduled."

Daniel Elliot, Weta Digital:

"I'm using it as backing storage for a volumetric data structure (like openvdb). Its sparse so each tile is a 512^3 array of float voxels.

I thought that having colony will allow me to merge multiple grids together more efficiently as we can just splice the tiles and not copy or reallocate where the tiles dont overlap. Also adding and removing tiles will be fast. Its kind of like using an arena allocator or memory pool without having to actually write one."
Note: this is a private project Daniel is working on, not one for Weta Digital.

Gašper Ažman, Citadel Securities:

"Internally we use it as a slab allocator for objects with very different lifetime durations where we want aggressive hot memory reuse. It lets us ensure the algorithms are correct after the fact by being able to iterate over the container and verify what's alive.

It's a great single-type memory pool, basically, and it allows iteration for debugging purposes :)

Where it falls slightly short of expectation is having to iterate/delete/insert under a lock for multithreaded operation - for those usecases we had to do something different and lock-free, but for single-threaded applications it's amazing."

Appendix I - A brief and incomplete guide for selecting the appropriate container from inside/outside the C++ standard library, based on performance characteristics, functionality and benchmark results

Guides and flowcharts I've seen online have either been performance-agnostic or incorrect. This is not a perfect guide, nor is it designed to suit all participants, but it should be largely correct in terms of its focus. Note, this guide does not cover:

  1. All known C++ containers
  2. Multithreaded usage/access patterns in any depth
  3. All scenarios
  4. The vast variety of map variants and their use-cases
  5. Examinations of technical nuance (eg. at which sizeof threshold on a given processor does a type qualify as large enough to consider not using it in a vector if there is non-back erasure?). For that reason I'm not going to qualify 'Very large' or 'large' descriptors in this guide.

These are broad strokes and can be treated as such. Specific situations with specific processors and specific access patterns may yield different results. There may be bugs or missing information. The strong insistence on arrays/vectors where possible is to do with code simplicity, ease of debugging, and performance via cache locality. I am purposefully avoiding any discussion of the virtues/problems of C-style arrays vs std::array or vector here, for reasons of brevity. The relevance of all assumptions is subject to architecture. The benchmarks this guide is based upon are available here, here. Some of the map/set data is based on google's abseil library documentation.

Start!

a = yes, b = no

0. Is the number of elements you're dealing with a fixed amount?
0a. If so, is all you're doing either pointing to and/or iterating over elements?
0aa. If so, use an array (either static or dynamically-allocated).
0ab. If not, can you change your data layout or processing strategy so that pointing to and/or iterating over elements would be all you're doing?
0aba. If so, do that and goto 0aa.
0abb. If not, goto 1.
0b. If not, is all you're doing inserting-to/erasing-from the back of the container and pointing to elements and/or iterating?
0ba. If so, do you know the largest possible maximum capacity you will ever have for this container, and is the lowest possible maximum capacity not too far away from that?
0baa. If so, use vector and reserve() the highest possible maximum capacity. Or use boost::static_vector for small amounts which can be initialized on the stack.
0bab. If not, use a vector and reserve() either the lowest possible, or most common, maximum capacity. Or boost::static_vector.
0bb. If not, can you change your data layout or processing strategy so that back insertion/erasure and pointing to elements and/or iterating would be all you're doing?
0bba. If so, do that and goto 0ba.
0bbb. If not, goto 1.


1. Is the use of the container stack-like, queue-like or ring-like?
1a. If stack-like, use plf::stack, if queue-like, use plf::queue (both are faster and configurable in terms of memory block sizes). If ring-like, use ring_span or ring_span lite.
1b. If not, goto 2.


2. Does each element need to be accessible via an identifier, ie. a key? In other words, is the data associative?
2a. If so, is the number of elements small and the type sizeof not large?
2aa. If so, is the value of an element also the key?
2aaa. If so, just make an array or vector of elements, and sequentially-scan to lookup elements. Benchmark vs absl:: sets below.
2aab. If not, make a vector or array of key/element structs, and sequentially-scan to lookup elements based on the key. Benchmark vs absl:: maps below.
2ab. If not, do the elements need to have an order?
2aba. If so, is the value of the element also the key?
2abaa. If so, can multiple keys have the same value?
2abaaa. If so, use absl::btree_multiset.
2abaab. If not, use absl::btree_set.
2abab. If not, can multiple keys have the same value?
2ababa. If so, use absl::btree_multimap.
2ababb. If not, use absl::btree_map.
2abb. If no order needed, is the value of the element also the key?
2abba. If so, can multiple keys have the same value?
2abbaa. If so, use std::unordered_multiset or absl::btree_multiset.
2abbab. If not, is pointer stability to elements necessary?
2abbaba. If so, use absl::node_hash_set.
2abbabb. If not, use absl::flat_hash_set.
2abbb. If not, can multiple keys have the same value?
2abbba. If so, use std::unordered_multimap or absl::btree_multimap.
2abbbb. If not, is on-the-fly insertion and erasure common in your use case, as opposed to mostly lookups?
2abbbba. If so, use robin-map.
2abbbbb. If not, is pointer stability to elements necessary?
2abbbbba. If so, use absl::flat_hash_map<Key, std::unique_ptr<Value>>. Use absl::node_hash_map if pointer stability to keys is also necessary.
2abbbbbb. If not, use absl::flat_hash_map.
2b. If not, goto 3.

Note: if iteration over the associative container is frequent rather than rare, try the std:: equivalents to the absl:: containers or tsl::sparse_map. Also take a look at this page of benchmark conclusions for more definitive comparisons across more use-cases and hash map implementations.


3. Are stable pointers/iterators/references to elements which remain valid after non-back insertion/erasure required, and/or is there a need to sort non-movable/copyable elements?
3a. If so, is the order of elements important and/or is there a need to sort non-movable/copyable elements?
3aa. If so, will this container often be accessed and modified by multiple threads simultaneously?
3aaa. If so, use forward_list (for its lowered side-effects when erasing and inserting).
3aab. If not, do you require range-based splicing between two or more containers (as opposed to splicing of entire containers, or splicing elements to different locations within the same container)?
3aaba. If so, use std::list.
3aabb. If not, use plf::list.
3ab. If not, use hive.
3b. If not, goto 4.


4. Is the order of elements important?
4a. If so, are you almost entirely inserting/erasing to/from the back of the container?
4aa. If so, use vector, with reserve() if the maximum capacity is known in advance.
4ab. If not, are you mostly inserting/erasing to/from the front of the container?
4aba. If so, use deque.
4abb. If not, is insertion/erasure to/from the middle of the container frequent when compared to iteration or back erasure/insertion?
4abba. If so, is it mostly erasures rather than insertions, and can the processing of multiple erasures be delayed until a later point in processing, eg. the end of a frame in a video game?
4abbaa. If so, try the vector erase_if pairing approach listed at the bottom of this guide, and benchmark against plf::list to see which one performs best. Use deque with the erase_if pairing if the number of elements is very large.
4abbab. If not, goto 3aa.
4abbb. If not, are elements large or is there a very large number of elements?
4abbba. If so, benchmark vector against plf::list, or if there is a very large number of elements benchmark deque against plf::list.
4abbbb. If not, do you often need to insert/erase to/from the front of the container?
4abbbba. If so, use deque.
4abbbbb. If not, use vector.
4b. If not, goto 5.


5. Is non-back erasure frequent compared to iteration?
5a. If so, is the non-back erasure always at the front of the container?
5aa. If so, use deque.
5ab. If not, is the type large, non-trivially copyable/movable or non-copyable/movable?
5aba. If so, use hive.
5abb. If not, is the number of elements very large?
5abba. If so, use a deque with a swap-and-pop approach (to save memory vs vector - assumes standard deque implementation of fixed block sizes) ie. when erasing, swap the element you wish to erase with the back element, then pop_back(). Benchmark vs hive.
5abbb. If not, use a vector with a swap-and-pop approach and benchmark vs hive.
5b. If not, goto 6.


6. Can non-back erasures be delayed until a later point in processing eg. the end of a video game frame?
6a. If so, is the type large or is the number of elements large?
6aa. If so, use hive.
6ab. If not, is consistent latency more important than lower average latency?
6aba. If so, use hive.
6abb. If not, try the erase_if pairing approach listed below with vector, or with deque if the number of elements is large. Benchmark this approach against hive to see which performs best.
6b. If not, use hive.


Vector erase_if pairing approach:
Try pairing the type with a boolean, in a vector, then marking this boolean for erasure during processing, and then use erase_if with the boolean to remove multiple elements at once at the designated later point in processing. Alternatively if there is a condition in the element itself which identifies it as needing to be erased, try using this directly with erase_if and skip the boolean pairing. If the maximum is known in advance, use vector with reserve().
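
A minimal sketch of the pairing approach follows. The names flagged and purge_flagged are hypothetical. The removal is written with std::remove_if followed by erase so that it compiles before C++20; in C++20 the same effect is obtained with a single call to std::erase_if on the vector.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical pairing of an element with its "erase later" boolean.
template <class T>
struct flagged {
    T value;
    bool erase_pending = false;
};

// Removes every flagged element in one pass, preserving the order of the rest.
// Returns the number of elements removed.
template <class T>
std::size_t purge_flagged(std::vector<flagged<T>>& v) {
    const std::size_t old_size = v.size();
    v.erase(std::remove_if(v.begin(), v.end(),
                           [](const flagged<T>& f) { return f.erase_pending; }),
            v.end());
    return old_size - v.size();
}
```

During processing, code sets erase_pending on the elements it wants gone. purge_flagged is then called once at the designated later point, for example the end of a frame, so that each surviving element is moved at most once regardless of how many elements were flagged.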