Are lists evil? -- Bjarne Stroustrup

Recently added to Bjarne Stroustrup's FAQ:

Are lists evil?

by Bjarne Stroustrup

Some corners of the Web claim that I believe vectors are always better than linked lists and that I don't know about other data structures, such as trees (e.g., std::set) and hash tables (e.g., std::unordered_map). Obviously, that's absurd.

The problem seems to be an interesting little exercise that John Bentley once proposed to me: Insert a sequence of random integers into a sorted sequence, then remove those elements one by one as determined by a random sequence of positions: Do you use a vector (a contiguously allocated sequence of elements) or a linked list? For example, see Software Development for Infrastructure. I use this example to illustrate some points and to encourage thought about algorithms, data structures, and machine architecture, concluding:

  • don't store data unnecessarily,
  • keep data compact, and
  • access memory in a predictable manner.

Note the absence of "list" and "vector" in the conclusion. Please don't confuse an example with what the example is meant to illustrate.
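
For concreteness, here is one way the exercise can be coded. This is an illustrative sketch (my names and structure, not the code behind the published measurements); the key point is that the algorithm is identical for both containers:

    // Illustrative sketch of the exercise: insert n random integers into a
    // sorted sequence, then erase elements one by one at random positions.
    // Sequence is std::vector<int> or std::list<int>.
    #include <chrono>
    #include <iostream>
    #include <iterator>
    #include <list>
    #include <random>
    #include <vector>

    template <typename Sequence>
    long long run_exercise(int n, unsigned seed)
    {
        std::mt19937 gen(seed);
        auto start = std::chrono::steady_clock::now();

        Sequence seq;
        for (int i = 0; i < n; ++i) {
            int value = static_cast<int>(gen());
            auto pos = seq.begin();
            // Linear search for the insertion point in both cases,
            // so the user code is the same for vector and list.
            while (pos != seq.end() && *pos < value) ++pos;
            seq.insert(pos, value);
        }
        for (int size = n; size > 0; --size) {
            auto pos = seq.begin();
            std::advance(pos, gen() % size);   // random position
            seq.erase(pos);
        }

        auto stop = std::chrono::steady_clock::now();
        return std::chrono::duration_cast<std::chrono::milliseconds>(stop - start).count();
    }

    int main()
    {
        const int n = 100000;
        std::cout << "vector: " << run_exercise<std::vector<int>>(n, 42) << " ms\n";
        std::cout << "list:   " << run_exercise<std::list<int>>(n, 42) << " ms\n";
    }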

I used that example in several talks, notably my 2012 "Going Native" keynote.

This video has been popular: It has been downloaded more than 250K times (plus another 50K+ times at various other sites). My impression is that many viewers failed to understand that the purpose of that example is to illustrate some general principles and to make people think. Initially, most people say "List, of course!" (I have tried asking that question many times) because of the many insertions and deletions "in the middle" (lists are good at that). That answer is completely and dramatically wrong, so it is good to know why.

I have been using the example for years, and had graduate students implement and measure dozens of variants of this exercise and different exercises. Examples and measurements by others can be found on the Web. Of course,

  • I have tried maps (they are much better than lists, but still slower than vectors)
  • I have tried much larger element sizes (eventually lists come into their own)
  • I have used binary search and direct insertion for vectors (yes, they speed up even further; see the sketch after this list)
  • I checked my theory (no, I'm not violating any big-O complexity rule; it is just that some operations can be dramatically more expensive for one data structure compared to another)
  • I have preallocated links (that's better than std::list, but the traversal still kills performance)
  • I have used singly-linked lists (std::forward_list); that doesn't make much difference, but it makes it a bit harder to ensure that the user code is 100% equivalent
  • I know (and say) that 500K lists are not common (but that doesn't matter for my main point). We use many structures (large and small) where there is a choice between linked and contiguous representation.
  • I know that for insertion push_front() is faster for std::list and push_back() is faster for vectors. You can construct examples to illustrate that, but this example is not one of those.
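
The binary-search variant for vectors can look like this (an illustrative sketch; std::lower_bound finds the insertion point in O(log n) comparisons, and the insert itself still shifts the tail, but that shift is a fast, cache-friendly block move):

    // Sketch of sorted insertion into a vector using binary search.
    #include <algorithm>
    #include <vector>

    void sorted_insert(std::vector<int>& v, int value)
    {
        // Find the first position where value can go without breaking the order.
        auto pos = std::lower_bound(v.begin(), v.end(), value);
        v.insert(pos, value);   // elements after pos are shifted one slot right
    }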

My point is not about lists as such. They have their uses, but this example isn't one of them. Please don't confuse the example with what the example is used to illustrate. This example is about use of memory: We very often create a data structure, do some computation on it requiring access (often, traversal), and then delete it. The ordered sequence is simply an example of such use and the example is presented to get people to think about what matters in such cases. My suggestion is:

  • don't store data unnecessarily,
  • keep data compact, and
  • access memory in a predictable manner.

I emphasize the importance of cache effects. In my experience, all but true experts tend to forget those when algorithms are discussed.

And, yes, my recommendation is to use std::vector by default. More generally, use a contiguous representation unless there is a good reason not to. Like C, C++ is designed to do that by default.

Also, please don't make statements about performance without measurements. I have seen a case where changing a zero-to-two-element list to a zero-to-two-element vector made a factor-of-two difference to an algorithm. I didn't expect that. Nor did other experts looking at the code.
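
In that spirit, even a minimal timing harness is often enough to check such claims. The sketch below is illustrative (the names are mine, not from the article):

    // Minimal timing harness: measure before making performance claims.
    #include <chrono>
    #include <iostream>

    template <typename F>
    double time_ms(F f)
    {
        auto start = std::chrono::steady_clock::now();
        f();   // run the code under test
        auto stop = std::chrono::steady_clock::now();
        return std::chrono::duration<double, std::milli>(stop - start).count();
    }

    int main()
    {
        std::cout << time_ms([] { /* code under test */ }) << " ms\n";
    }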

Comments (5)


derekxgl said on Jun 6, 2014 08:26 AM:

What about vector vs. deque?

jbruni said on Jun 6, 2014 10:47 AM:

Deques are not guaranteed to be contiguous in memory the way vectors are.

NoSenseEtAl said on Jun 6, 2014 07:51 PM:

" I checked my theory (no I'm not violating any big-O complexity rule; it is just that some operations can be dramatically more expensive for one data structure compared to another)"
This is not clear, but anyway this entire list story is slightly (to put it mildly) misleading...
The operations on the list and on the vector in this case have the SAME complexity, because O(n+1) = O(n+n) = O(n).
But to contribute something interesting...
Generate a list and a vector and do the measurements...
Then repeat the measurements after permuting the nodes of the list... according to Stepanov's lectures the difference will be big, because the memory allocator gives you "relatively nearby" memory for sequential requests...
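
A sketch of that experiment (illustrative code, not from the Stepanov lectures; it uses std::list::splice to reorder nodes without reallocating, so only the memory layout visited by the traversal changes):

    // Compare traversing a list whose nodes were allocated sequentially
    // against the same nodes spliced into a random order.
    #include <algorithm>
    #include <chrono>
    #include <iostream>
    #include <list>
    #include <random>
    #include <vector>

    static double sum_ms(const std::list<int>& l)
    {
        auto start = std::chrono::steady_clock::now();
        long long sum = 0;
        for (int x : l) sum += x;          // pure traversal
        auto stop = std::chrono::steady_clock::now();
        volatile long long sink = sum;     // keep the loop from being optimized away
        (void)sink;
        return std::chrono::duration<double, std::milli>(stop - start).count();
    }

    int main()
    {
        const int n = 1000000;
        std::list<int> l;
        for (int i = 0; i < n; ++i) l.push_back(i);   // nodes allocated in order
        std::cout << "sequential nodes: " << sum_ms(l) << " ms\n";

        // Splice the same nodes into a random order: no new allocations,
        // same values, but the traversal now jumps around in memory.
        std::vector<std::list<int>::iterator> its;
        for (auto it = l.begin(); it != l.end(); ++it) its.push_back(it);
        std::mt19937 gen(42);
        std::shuffle(its.begin(), its.end(), gen);
        std::list<int> shuffled;
        for (auto it : its) shuffled.splice(shuffled.end(), l, it);
        std::cout << "shuffled nodes:   " << sum_ms(shuffled) << " ms\n";
    }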

Clayton Weaver said on Jun 8, 2014 09:58 PM:

This shows how anyone can take a simple thing and twist it. An example is just meant to help express a point, but people can misunderstand it and run with it. Get enough people who agree with the invalid argument and you suddenly have people claiming Dr. Stroustrup likes, hates, or <insert fallacy here> something. Dr. Stroustrup has always struck me as an upfront, honest person, so I would expect him to write a paper stating he didn't like something and why, rather than leaving it open to interpretation. Lastly, even if he had disliked something, that doesn't mean you have to dislike it too. Everything in the C++ language has its use, and not everything will be used in your programs. This was still an interesting read.

ramey said on Aug 17, 2015 03:51 PM:

I am the person who asked the first question on this topic when you made your presentation on this subject at Going Native 2012. I failed to articulate my concern as well as I wanted to, and you responded that you thought you understood what I might be getting at, but that I was wrong. It's no fun being told you're wrong by Bjarne Stroustrup in front of 500 of one's peers - but it comes with the territory.

I think I understand what you were getting at, but that your example was ill chosen. I'm guessing that your point was that one cannot ignore lower-level considerations when choosing between alternative data structures, and that the optimal one might not be what at first seems to be the obvious choice.

Your example used a list and a vector to implement the same functionality and showed that the vector might be faster even though it did some "list like" things. My concern was that we were losing the original abstraction: a list is a thing optimized for insertions and deletions in places other than at the end, while a vector is a thing for which this operation is not generally done. If we can't use these concepts of lists and vectors for choosing the appropriate data structures, then we've lost something important and would need something to replace it with. You were unconvinced. The information on this page suggests to me that the example was ill-chosen to make the point and has created confusion in minds other than my own.

Soooo - this begs the question: what would be a better example? I think it's hard to find a good one. If one has a situation where there wouldn't be much conceptual difference between using a list or a vector (example: a large number of collections, each with only a few elements), then I think most people would likely choose the vector in any case. If it's a situation where there are lots of insertions/deletions and the element size is large (example: 100,000 items, each 1000 bytes long), I think most people would expect that a list would be more efficient, and they would be correct.

Actually, I really can't think of a good example (meaning one that I might actually come across in the real world) where this analysis would apply. I'm thinking that the original column was an interesting curiosity, but rarely relevant to the problems programmers in the real world actually encounter.

Robert Ramey