Recently added to Bjarne Stroustrup's FAQ:
Are lists evil?
by Bjarne Stroustrup
From the FAQ:
According to some corners of the Web, I am under the impression that vectors are always better than linked lists and that I don't know about other data structures, such as trees (e.g., std::set) and hash tables (e.g., std::unordered_map). Obviously, that's absurd.
The problem seems to be an interesting little exercise that John Bentley once proposed to me: Insert a sequence of random integers into a sorted sequence, then remove those elements one by one as determined by a random sequence of positions. Do you use a vector (a contiguously allocated sequence of elements) or a linked list? For example, see Software Development for Infrastructure. I use this example to illustrate some points, encourage thought about algorithms, data structures, and machine architecture, concluding:
- don't store data unnecessarily,
- keep data compact, and
- access memory in a predictable manner.
Note the absence of "list" and "vector" in the conclusion. Please don't confuse an example with what the example is meant to illustrate.
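For concreteness, a minimal sketch of the exercise might look like the following. This is not the code behind the measurements; the element count, seed, and value range are arbitrary choices for illustration.

    #include <cstddef>
    #include <iterator>
    #include <list>
    #include <random>
    #include <vector>

    template <typename Seq>
    void run_exercise(Seq& seq, std::size_t n)
    {
        std::mt19937 rng(12345);
        std::uniform_int_distribution<int> value(0, 999999);

        // Phase 1: insert n random integers, keeping the sequence sorted.
        // The insertion point is found by linear traversal for BOTH
        // containers, so vector and list do equivalent abstract work.
        for (std::size_t i = 0; i != n; ++i) {
            const int v = value(rng);
            auto pos = seq.begin();
            while (pos != seq.end() && *pos < v) ++pos;
            seq.insert(pos, v);
        }

        // Phase 2: remove the elements one by one at random positions.
        while (!seq.empty()) {
            std::uniform_int_distribution<std::size_t> index(0, seq.size() - 1);
            auto pos = seq.begin();
            std::advance(pos, index(rng));
            seq.erase(pos);
        }
    }

    int main()
    {
        std::vector<int> v;
        std::list<int> l;
        run_exercise(v, 100000);   // time this ...
        run_exercise(l, 100000);   // ... against this
    }

Both versions perform the same number of comparisons, and the list never moves an element, yet on ordinary hardware the vector wins by a large margin: its traversal streams through contiguous, cache-friendly memory, while the list chases pointers.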
I used that example in several talks, and one recording has been popular: it has been downloaded more than 250K times (plus another 50K+ times at various other sites). My impression is that many viewers failed to understand that the purpose of the example is to illustrate some general principles and to make people think. Initially, most people say "List, of course!" (I have tried asking that question many times) because of the many insertions and deletions "in the middle" (lists are good at that). That answer is completely and dramatically wrong, so it is good to know why.
I have been using the example for years, and had graduate students implement and measure dozens of variants of this exercise and different exercises. Examples and measurements by others can be found on the Web. Of course,
- I have tried maps (they are much better than lists, but still slower than vectors)
- I have tried much larger element sizes (eventually lists come into their own)
- I have used binary search and direct insertion for vectors (yes, they speed up even further; a sketch appears below)
- I checked my theory (no, I'm not violating any big-O complexity rule; it is just that some operations can be dramatically more expensive for one data structure compared to another)
- I have preallocated links (that's better than std::list, but the traversal still kills performance)
- I have used singly-linked lists, forward_lists (that doesn't make much difference, but makes it a bit harder to ensure that the user code is 100% equivalent)
- I know (and say) that 500K lists are not common (but that doesn't matter for my main point). We use many structures (large and small) where there is a choice between linked and contiguous representation.
- I know that for insertion push_front() is faster for std::lists and push_back() is faster for vectors. You can construct examples to illustrate that, but this example is not one of those.
My point is not about lists as such. They have their uses, but this example isn't one of them. Please don't confuse the example with what the example is used to illustrate. This example is about use of memory: we very often create a data structure, do some computation on it requiring access (often traversal), and then delete it. The ordered sequence is simply an example of such use, and the example is presented to get people to think about what matters in such cases. My suggestion is:
- don't store data unnecessarily,
- keep data compact, and
- access memory in a predictable manner.
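The binary-search variant for vectors mentioned in the list above is, in sketch form (not the measured code), simply:

    #include <algorithm>
    #include <vector>

    // Find the insertion point in O(log n) comparisons, then insert.
    // The insert still shifts the tail, but moving contiguous memory is cheap.
    void sorted_insert(std::vector<int>& v, int value)
    {
        v.insert(std::lower_bound(v.begin(), v.end(), value), value);
    }

A list gains nothing comparable: std::lower_bound over list iterators still has to step through the nodes one by one.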
I emphasize the importance of cache effects. In my experience, all but true experts tend to forget those when algorithms are discussed.
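The cache effect is easy to demonstrate in isolation. The following sketch (an illustration written for this post, not code from the talks) reads the same array twice: once sequentially, and once by chasing a single-cycle chain of randomly ordered indices, which is roughly what traversing a heap-scattered linked list does to the memory system.

    #include <chrono>
    #include <cstddef>
    #include <iostream>
    #include <numeric>
    #include <random>
    #include <utility>
    #include <vector>

    int main()
    {
        const std::size_t n = 1 << 24;            // ~16M elements, far beyond cache
        std::vector<std::size_t> next(n);
        std::iota(next.begin(), next.end(), 0);

        // Sequential pass: the hardware prefetcher sees it coming.
        auto t0 = std::chrono::steady_clock::now();
        std::size_t sum1 = 0;
        for (std::size_t i = 0; i != n; ++i) sum1 += next[i];
        auto t1 = std::chrono::steady_clock::now();

        // Sattolo's algorithm: turn next[] into one big cycle, so chasing it
        // visits every element exactly once, in a cache-hostile random order.
        std::mt19937_64 rng(42);
        for (std::size_t i = n - 1; i > 0; --i) {
            std::uniform_int_distribution<std::size_t> d(0, i - 1);
            std::swap(next[i], next[d(rng)]);
        }

        // Pointer-chasing pass: the same loads, but each is a likely cache miss.
        auto t2 = std::chrono::steady_clock::now();
        std::size_t sum2 = 0, j = 0;
        for (std::size_t i = 0; i != n; ++i) { sum2 += j; j = next[j]; }
        auto t3 = std::chrono::steady_clock::now();

        using ms = std::chrono::milliseconds;
        std::cout << "sequential: " << std::chrono::duration_cast<ms>(t1 - t0).count() << " ms\n"
                  << "chased:     " << std::chrono::duration_cast<ms>(t3 - t2).count() << " ms\n"
                  << (sum1 + sum2) << '\n';       // keep the sums observable
    }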
And, yes, my recommendation is to use std::vector by default. More generally, use a contiguous representation unless there is a good reason not to. Like C, C++ is designed to do that by default.
Also, please don't make statements about performance without measurements. I have seen a case where changing a zero-to-two-element list to a zero-to-two-element vector made a factor-of-two difference to an algorithm. I didn't expect that. Nor did other experts looking at the code.
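In that spirit, a measurement need not be elaborate. Here is a minimal sketch of a timing harness; the workload shown is a placeholder, to be replaced by the code whose performance is being claimed.

    #include <chrono>
    #include <iostream>

    // Time a callable and return milliseconds. steady_clock is monotonic,
    // so wall-clock adjustments cannot corrupt the measurement.
    template <typename F>
    long long time_ms(F f)
    {
        const auto t0 = std::chrono::steady_clock::now();
        f();
        const auto t1 = std::chrono::steady_clock::now();
        return std::chrono::duration_cast<std::chrono::milliseconds>(t1 - t0).count();
    }

    int main()
    {
        // Placeholder workload; substitute the code under discussion.
        std::cout << time_ms([] {
                         volatile long long s = 0;
                         for (long long i = 0; i < 100000000; ++i) s = s + i;
                     })
                  << " ms\n";
    }

Run it several times, with optimizations enabled and realistic data sizes; the zero-to-two-element surprise above is exactly the kind of thing such a harness catches.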