Are lists evil?—Bjarne Stroustrup

Recently added to Bjarne Stroustrup's FAQ:

Are lists evil?

by Bjarne Stroustrup

From the FAQ:

According to some corners of the Web, I am under the impression that vectors are always better than linked lists and that I don't know about other data structures, such as trees (e.g., std::set) and hash tables (e.g., std::unordered_map). Obviously, that's absurd.

The problem seems to be an interesting little exercise that John Bentley once proposed to me: Insert a sequence of random integers into a sorted sequence, then remove those elements one by one as determined by a random sequence of positions. Do you use a vector (a contiguously allocated sequence of elements) or a linked list? For example, see Software Development for Infrastructure. I use this example to illustrate some points, encourage thought about algorithms, data structures, and machine architecture (a code sketch of the exercise appears after the conclusion below), concluding:

  • don't store data unnecessarily,
  • keep data compact, and
  • access memory in a predictable manner.

Note the absence of "list" and "vector" in the conclusion. Please don't confuse an example with what the example is meant to illustrate.
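
To make the exercise concrete, here is one minimal way it could be coded (an illustrative sketch, not the benchmark code from the talks; the element count, value range, and timing scaffolding are arbitrary choices):

    #include <chrono>
    #include <iostream>
    #include <iterator>
    #include <list>
    #include <random>
    #include <vector>

    // Insert n random integers into a sorted sequence, then erase elements
    // one by one at random positions, and report how long the whole run took.
    template<class Container>
    long long run(int n, unsigned seed)
    {
        std::mt19937 gen{seed};
        std::uniform_int_distribution<int> value_dist{0, 1000000};

        auto start = std::chrono::steady_clock::now();

        Container c;
        for (int i = 0; i < n; ++i) {                  // insertion phase
            int v = value_dist(gen);
            auto pos = c.begin();
            while (pos != c.end() && *pos < v) ++pos;  // linear search in both containers
            c.insert(pos, v);
        }
        for (int size = n; size > 0; --size) {         // removal phase
            std::uniform_int_distribution<int> pos_dist{0, size - 1};
            auto pos = c.begin();
            std::advance(pos, pos_dist(gen));
            c.erase(pos);
        }

        auto stop = std::chrono::steady_clock::now();
        return std::chrono::duration_cast<std::chrono::milliseconds>(stop - start).count();
    }

    int main()
    {
        const int n = 50000;                           // try larger n to see the gap grow
        std::cout << "vector: " << run<std::vector<int>>(n, 42) << " ms\n";
        std::cout << "list:   " << run<std::list<int>>(n, 42) << " ms\n";
    }

Both containers do the same linear traversal to find each insertion and removal point, so the asymptotic work is identical; the measured difference comes from the vector's contiguous storage versus the pointer-chasing and scattered allocations of the list.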

I used that example in several talks, notably in one whose video has been popular: it has been downloaded more than 250K times (plus another 50K+ times at various other sites). My impression is that many viewers failed to understand that the purpose of that example is to illustrate some general principles and to make people think. Initially, most people say "List, of course!" (I have tried asking that question many times) because of the many insertions and deletions "in the middle" (lists are good at that). That answer is completely and dramatically wrong, so it is good to know why.

I have been using the example for years, and had graduate students implement and measure dozens of variants of this exercise and different exercises. Examples and measurements by others can be found on the Web. Of course,

  • I have tried maps (they are much better than lists, but still slower than vectors)
  • I have tried much larger element sizes (eventually lists come into their own)
  • I have used binary search and direct insertion for vectors (yes, they speed up even further; see the sketch after this list)
  • I checked my theory (no, I'm not violating any big-O complexity rule; it is just that some operations can be dramatically more expensive for one data structure compared to another)
  • I have preallocated links (that's better than std::list but the traversal still kills performance)
  • I have used singly-linked lists, i.e., forward_list (that doesn't make much difference, but makes it a bit harder to ensure that the user code is 100% equivalent)
  • I know (and say) that 500K lists are not common (but that doesn't matter for my main point). We use many structures (large and small) where there is a choice between linked and contiguous representation.
  • I know that for insertion push_front() is faster for std::list and push_back() is faster for std::vector. You can construct examples to illustrate that, but this example is not one of those.
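
For instance, the "binary search and direct insertion" variant for vectors mentioned above might look like this (a minimal sketch using std::lower_bound, not the exact benchmark code):

    #include <algorithm>
    #include <vector>

    // Keep the vector sorted: find the insertion point by binary search
    // (O(log n) comparisons), then let insert() shift the tail elements.
    // A list cannot use this trick because it has no random access.
    void insert_sorted(std::vector<int>& v, int value)
    {
        auto pos = std::lower_bound(v.begin(), v.end(), value);
        v.insert(pos, value);
    }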

My point is not about lists as such. They have their uses, but this example isn't one of them. Please don't confuse the example with what the example is used to illustrate. This example is about use of memory: We very often create a data structure, do some computation on it requiring access (often, traversal), and then delete it. The ordered sequence is simply an example of such use and the example is presented to get people to think about what matters in such cases. My suggestion is:

  • don't store data unnecessarily,
  • keep data compact, and
  • access memory in a predictable manner.

I emphasize the importance of cache effects. In my experience, all but true experts tend to forget those when algorithms are discussed.

And, yes, my recommendation is to use std::vector by default. More generally, use a contiguous representation unless there is a good reason not to. Like C, C++ is designed to do that by default.

Also, please don't make statements about performance without measurements. I have seen a case where changing a zero-to-two-element list to a zero-to-two-element vector made a factor-of-two difference to an algorithm. I didn't expect that. Nor did other experts looking at the code.

Comments (4)

derekxgl said on Jun 6, 2014 08:26 AM:

what about vector vs deque?

jbruni said on Jun 6, 2014 10:47 AM:

deques are not guaranteed to have contiguous memory as vectors are.
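For example (process here stands for some hypothetical C function expecting a raw buffer): a vector's data() can be handed over directly, while a deque's elements would first have to be copied into contiguous storage.

    #include <cstddef>
    #include <deque>
    #include <vector>

    extern "C" void process(const int* data, std::size_t n);  // hypothetical C API

    void pass_vector(const std::vector<int>& v)
    {
        process(v.data(), v.size());               // vector storage is guaranteed contiguous
    }

    void pass_deque(const std::deque<int>& d)
    {
        std::vector<int> tmp(d.begin(), d.end());  // deque storage is chunked,
        process(tmp.data(), tmp.size());           // so copy to a contiguous buffer first
    }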

NoSenseEtAl said on Jun 6, 2014 07:51 PM:

" I checked my theory (no I'm not violating any big-O complexity rule; it is just that some operations can be dramatically more expensive for one data structure compared to another)"
This is not clear, but anyway this entire list story is slightly (to put it mildly) misleading...
operations on a list and on a vector in this case have the SAME complexity, because O(n+1) = O(n+n)
But to contribute something interesting...
if you generate a list and a vector and do measurements...
after that, do the measurements again where you permute the list first... according to Stepanov's lectures the difference will be big, because the memory allocator gives you "relatively nearby" memory for sequential requests...
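a rough sketch of how that experiment could be set up (my reading of it; the node count and the timed traversal are arbitrary): the same nodes are spliced into a shuffled order, so only the memory layout seen during traversal changes...

    #include <algorithm>
    #include <list>
    #include <numeric>
    #include <random>
    #include <vector>

    // Build a list whose nodes are allocated one after the other.
    std::list<int> make_sequential(int n)
    {
        std::list<int> l;
        for (int i = 0; i < n; ++i) l.push_back(i);
        return l;
    }

    // Splice the same nodes into a shuffled logical order: no element is
    // copied or reallocated, only the order in which a traversal visits
    // memory changes.
    std::list<int> shuffle_nodes(std::list<int>& src)
    {
        std::vector<std::list<int>::iterator> its;
        for (auto it = src.begin(); it != src.end(); ++it) its.push_back(it);
        std::shuffle(its.begin(), its.end(), std::mt19937{123});

        std::list<int> shuffled;
        for (auto it : its) shuffled.splice(shuffled.end(), src, it);
        return shuffled;
    }

    // The traversal to time on both versions (e.g., with <chrono>).
    long long traverse(const std::list<int>& l)
    {
        return std::accumulate(l.begin(), l.end(), 0LL);
    }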

Clayton Weaver said on Jun 8, 2014 09:58 PM:

This shows how anyone can take a simple thing and twist it. An example is just to help express a point, but people can misunderstand and run with it. Get enough people who agree with the invalid argument and you suddenly have people claiming Dr. Stroustrup likes, hates, or <insert fallacy here> something. Dr. Stroustrup has always struck me as an upfront honest person, so I would expect him to write a paper stating he didn't like it and why rather than leaving it open for interpretation. Lastly, even if he had disliked something, that doesn't mean you have to dislike it too. Everything in the C++ language has its use and not everything will be used for your programs. This was still an interesting read.