Doc. No.: |
WG21/N3709 |
Date: |
2013-8-29 |
Reply to: |
Hans-J. Boehm |
Notes taken by: |
Jared Hoberock, Alasdair Mackintosh |
Email: |
[email protected] |
N3709: Minutes for July 2013 Santa Clara SG1 Meeting
This meeting took place July 25-26, in Santa Clara, CA. We thank
Nvidia for hosting the meeting.
Physical Attendees
- Hans Boehm
- Chandler Carruth
- Jeffrey Yasskin
- Lawrence Crowl
- Chris Mysen
- Michael Wong
- Herb Sutter
- Alasdair Mackintosh
- Adam Berkan
- Niklas Gustafsson (Thursday only)
- Artur Laksberg
- Jared Hoberock
- Olivier Giroux
- Clark Nelson
- Mark Moir (Friday only)
- Kit Barton
Remote Attendees
- Torvald Riegel
- Pablo Halpern
- Howard Hinnant
- Paul McKenney
Minutes
Thursday 2013-07-25
- Concurrency terminology (slides attached to wiki)
- Lawrence presenting
- Lawrence: want to get our terminology story straight
- we run into lots of X-free terms -- what do these all mean
- wait-free -- every method makes progress; RMW sits in this category
- lock-free -- weaker; some method makes progress; compare-exchange loop is in this category
- Clark: is there a difference between method & thread?
- Lawrence: want to make distinction between thread is the mechanism which execs method
- treat method/operation as abstract entity
- Hans: definition of progress is tricky - not the same as execing instructions
- Lawrence: threads are isolated from others long enough to do meaningful work
- obstruction-free -- every method makes progress, provided that the thread that's running is isolated long enough; load-locked-store-conditional loops in this category
- Hans: this is a non-trivial statement, not obvious
- Herb: any operation is guaranteed to complete in finite steps if run in isolation
- Lawrence: clash-free -- some method makes progress, not clear what this is useful for
- starvation-free -- every method makes progress, queue locks are an example
- deadlock-free -- some method makes progress
- Jeffery: multiprocess operations need to be safe using the "isolated enough" mechanism
- Torvalds - starvation-free, deadlock-free, these are blocking mechanisms
- Lawrence: there are several places in the std which say "lock-free" but this isn't defined anywhere
- Lawrence: i guess what we mean is what this taxonomy calls "obstruction-free"
- Jeffrey: agree
- Hans: not completely certain that's right, is an LLSC loop only obstruction-free or is it lock-free
- Jeffrey: on actual hw, it's obstruction-free, the cache line containing the atomic variable, unless the variable is padded, the cache line may contain something else
- Chandler: on most hw, you can cause 2 different cache lines to share the same cache protocol, not sure there's actually a way to make a guarantee to make sure some other thread won't interrupt your progress
- Hans: makes sense, I think i understand. should be at least one LWG issue that has to do with this
- Lawrence: allocation-free, common to say this is a wait-free algorithm which uses a global allocator. it's an important attribute to say whether or not a concurrent thing does an allocation
- Lawrence: condition-free - method won't wait on a condition variable, unclear whether or not std::condition_variable allow implementations to wait on a notify
- Jeffrey: what do you mean about wait on notify - some acquire a lock, does that count?
- Lawrence: Not sure i'd call that waiting
- Herb: are you saying it gives up its quantum, but it doesn't actually block?
- Lawrence: the length of time you're in that critical section now can become arbitrarily delayed
- Herb: wait means block, but that's not what you're describing
- Lawrence: in essence, give up the processor
- Herb: so maybe you mean yield
- Olivier: should we have a yield-free freedom?
- Lawrence: failure-free - method never fails to complete the intended task
- Michael: how about address-free?
- Lawrence: that's in the working draft
- Michael: not easily spoken about by the standard bc there's no notion of program
- Lawrence: the std says that certain things are lock-free, but it never defines what that means
- Pablo: if we like this terminology, what do we do about the fact that we're using "lock-free" to mean something not really lock-free?
- Herb: normally lock-free, wait-free, obstruction-free to be a characteristic of an algorithm. so the std says that every algorithm is obstruction-free? which are they?
- Hans: we're talking about lock-free atomic
- Lawrence: that actually means "obstruction-free"
- Herb: so the uses of atomics are not guaranteed to make progress?
- Lawrence: yes, if obstruction-free
- Chander: yes, if they are able to run in some amt of isolation. problem is the existence of machines & circumstances where this can't be guaranteed. the whole point of obstruction-free is that they will eventually make progress
- Herb: I thought the whole idea is that it could interfere forever
- Jeffrey: on x86, we can get lock-free. on ppc, only obstruction-free
- Chandler: claiming it could interfere forever would not be a trivial state to enter - it's a probabilistic thing
- Herb: as a user, he doesn't care about probabilities, he wants a firm guarantee that the state cannot occur
- Jeffrey: it's a probability of staying in the state
- Lawrence: you have a low probability of staying in the loop
- Herb: those to me are implementation details. programmer only cares about a firm progress guarantee. should the std actually say "not guaranteed to make progress" ?
- Olivier: are alpha particles covered by the std? it's the same scale of probability we're talking about
- Hans: i'm a little bit with Herb bc it's fairly easy to get into these states where different threads are synced. is there some config where in the absence of intteruption which you indefinitely make no progress bc threads are in sync & interfere?
- Olivier: any sort of chatter on the bus will nudge it out of that state
- Herb: i like the alpha particle example, but what i'm hearing is that's not where we are with lock-free
- Chandler: the only concern i have there is people will believe that load-linked-store conditional hw is insufficient to implement c++ correctly
- Herb: my understanding of what we said earlier is that yeah but it probably won't happen, close enough
- Lawrence: load-locked-store-conditional with a hostile scheduler might not meet lock-free guarantee
- Jeffrey: i do expect attacks, but it doesn't seem like this would happen accidentally
- Hans: getting things to run in lock-step isn't unlikely
- Chandler: this requires getting the cache coherency protocol to run in lock-step, you don't have control over that
- Hans: what we're describing is that the hw has mechanisms which make these states very unlikely
- Chandler: i see 2 options:
- the abstract machine in c++ is lock-free, try to make it work as best as possible in your implementation
- we don't think you should have this difference bt the abstract machine and the actual hw for this load-linked condition, and we don't allow any difference for conforming implementations
- Herb: stall for a while is not making forward progress. stall for a while and then continue is lock-free. what about atomic<big_struct> that's going to use a mutex under the covers
- Lawrence: the difference is that you could use obstruction-free inside a signal handler
- Herb: atomic<big_struct> is guaranteed to make progress, but atomic<int> is only obstruction-free, no guarantee of fwd prog
- Lawrence: no circumstance where an obstruction-free will be worse than mutex-based one
- Pablo: re: obstruction-free/lock-free - one is not a refinement of the other
- Herb: obstruction-free is strictly weaker than lock-free
- Torvalds: it's not strictly weaker because one is blocking, the other isn't
- Chandler: the big_struct is guaranteed to make progress if it gets scheduled
- Herb: obstruction-free is trying to say that it's possible for progress never to happen
- Chandler: i think this gives the risk of being biased against llsc
- Jeffrey: would you be happy with a note that says "yes, llsc is fine"
- Hans: we need proposed wording and then evaluate
- Chandler: we need to have a note about the abstract machine and then say we expect implementations to implement abstract machine differently
- Michael: i strongly agree
- Torvalds: You can't say this thing is implemented obstruction-free and then say it behaves as if its lock-free
- Herb: is there a reason for a programmer who uses atomics to use a different algorithm based on knowing whether he's on an x86 or an llsc machine?
- Chandler: programmers should think in terms of the abstract machine
- Hans: we still have the problem on x86 that we still don't have a guarantee that there's a finite delay between executing a store on one processor and seeing it elsewhere
- Lawrence: what is the cardinality of concurrency that we're allowing. how many invocations on an object may appear without a happens-before edge between them?
- Lawrence: joint cardinality - the presence of one invocation may affect the permitted cardinality of other operations
- Olivier: we have language for where data races are banned, you have a data race that 2 operations conflict if they are non-atomic, and they conflict if one is a store
- Lawrence: our current data race allows a write to conflict with a read, in the distributed counter case a write does not conflict with a read, only another write. we need to be clear when we specify what the rules are for the user
- Hans: those things are expressable, it's not hard
- Lawrence: as long as we're paying attention, it will happen. it's really easy to not think about these things carefully
- Olivier: so is this analogous to synchronizes-with as something that the library can use, is there a races-with notion that on the interface these methods race with these methods, and all races are disallowed
- Chandler: "conflicts-with" is much better than "races-with"
- Lawrence: inherent-jointness -- the atomic constructor invocation must happen before regular method invocation, which must happen before destructor invocation
- Hans: this is specified in several different places
- Olivier: surely the lifetime of an object begins before constructor returns
- Clark: problem is that there are two different notions of lifetime
- Lawrence: it becomes a real object when its invariants are established, but these happen in the middle of a constructor. this is a problem in a concurrent world
- Michael: is there a way to talk about it using something like a strong safety guarantee?
- Chandler: i think the wording in the std would benefit from a systematic and defined terminology. i'd like to see the definitions of these terms expanded
- Jeffrey: didn't see the term "blocking" and heard two conflicting definitions of the word during the discussion
- Michael: i think something like this would be helpful. do we like obstruction-free or lock-free?
- Hans: lock-free should suggest lock-free, but weasel word it to mean that an obstruction-free implementation is fine
- Lawrence: i could imagine people doing hard real time would care
- Pablo: one of my concerns is that i don't know that we need all of these terms standardized, we need to look at which terms apply to our guarantees or requirements
- N3637 (future destructors) implementation experience
- Niklas presenting
- proposal to remove the requirement that ~future or ~shared_future blocks waiting for the associated state to become ready iff future came from std::async
- motivation for removing this requirement was that you can't predict statically whether a future will block or not
- there are contexts where it is forbidden to block, making using std::future difficult
- there are very good reasons why you have to be able to get join behavior on the future so that you can confidently know that any thread locals that were constructed as a side effect of std::async have destroyed by the time you proceed
- having std::future be a signal for that was one such way
- at Bristol decided we wanted to bifurcate the functionality such that we had std::future & std::waiting_future
- std::future::~future would never block and std::waiting_future would
- std::async would return std::waiting_future (breaking change)
- if you statically mention future as a type you accept from std::async, your code would no longer compile
- you'd get a compilation error
- there'd be an explicit way to convert a waiting_future from a future similar to future::share (we propose future::detach)
- there's a diagram of type conversions between our 4 proposed future types
- proposal succeeded in preliminary round but failed in final round at Bristol
- implementation of proposal was trivial
- async<t> returns waiting_future<t> instead of future<t>
- waiting_future copy of future with addition of detach function, change type of what share returns, add constructor taking future &&
- Lawrence: detach - does it move out of the original future?
- Niklas: yes, the original is no longer valid
- Niklas: future represents two things - placeholder for resulting value of computation and a handle which represents side effects of computation. by having one, it's hard to separate concerns
- Lawrence: could detach produce a future but leave behind enough state in the original such when the original was destroyed you would wait on the thread of the async?
- Chandler: i suspect that Howard had raised a more fundamental objection which was the change to the return type of std::async at all
- Lawrence: i understand the concern, but it's not that big a deal
- Howard: the way we usually handle this sort of thing is to introduce the desired functionality with a new name, and deprecate the old one
- Howard: even better is to make the new functionality even better to entice them to use the new functionality. nervous about changing this policy
- Howard: in Bristol, the argument was that the standard hasn't been out that long. i don't buy that argument. I don't want to tell users of future & async that they will have to recompile when they flip the switch
- Lawrence: from the standpoint of someone who may have to modify code, that's reasonable. would that approach address MS's concerns?
- Herb: there are two issues.
- right now future is this type where i don't know what it will do; defeats composability. one solution to that from our proposal was to have separate future & waiting_future types.
- what about std::async? that's much of a lesser problem bc you can deprecate it. seems like there's an opportunity bc there's low usage
- Herb: as long as there are statically detectable breaking changes, that's OK with SG1. LWG has a different view. I'm happy to fix something at a faster cadence and not leave deprecated stuff floating around.
- Herb: problem is that std::future is such a fundamental type and it needs to be composable
- Niklas: we can switch the names around. to the extent that libraries are using code, code will break no matter what. there doesn't seem to be a clean solution that avoids breaking existing code.
- Niklas: we can avoid silent breaking changes, but not breaking changes
- Chandler: i'm sympathetic about changing the nature of std::async after the fact. my ideal solution to achieving a non-blocking future type whatever its name may be is actually to deprecate std::async. it has a litany of problems that we've been discussing, we're not actually finding solutions
- Chandler: we'd simultaneously allow people to migrate to the new thing on their own schedule. high order bit is that we do need a non-blocking composable future type.
- Chanlder: everyone that i've talked to about std::future thinks std::future is that type. normal programmers don't understand the problem; only SG1 does
- Chandler: everyone intends std::future to be a non-blocking future. i'm happy with deprecating std::async and including waiting_future if that's what people want
- Herb: i just care that ~future doesn't block. it says that it doesn't unless it comes from std::async
- Pablo: the problem that i see with deprecating std::async is that it's still standard and std::future is still not a composable type. i'm not really sure that deprecating async is enough
- Pablo: the only thing you can do is protect yourself by querying some attribute of the future that you could at least assert on
- Herb: having an attribute on the future was discussed, but it's too burdensome
- Herb: the best way to deal with export was to simply remove it from the standard, not deprecate
- Michael: why do we have to deprecate async? is it absolutely necessary?
- Herb: yes, you can deprecate things but they tend to stick around. deprecation is no guarantee that the feature will ever go away. yes you can remove features, that's fine
- Herb: we do have the option of removing it without deprecation. there's no ISO requirement for deprecation before removal
- Hans: can't deprecate it without an alternative
- Jeffrey: i agree with deprecating async. i don't think we're discussing c++14 at this point. we should deprecate for c++17.
- Herb: no, deprecating std::async is on the table for c++14
- Jeffrey: in the long run i would like to deprecate std::async. a bunch of functions are written assuming that ~future doesn't block. some instances violate that precondition. it would be nice to have a way to check that precondition. but there's plenty of functions that have preconditions but don't check them.
- Chandler: advice: don't use std::async unless you need to. don't pass the returned valud of the async call across an interface boundary. once it passes a boundary, people will assume it is non-blocking
- Chandler: i want the ability to wrap future<t> in another future<t> which would make that distinction
- Chandler: whatever we do, we should do it by c++14
- Niklas: there are two things we can do with a future: pass it across an interface, drop it on the floor. the only thing we have to do is teach people not to use std::async
- Niklas: we have to just say you cannot use std::async and you don't know whether a library uses std::async or not
- Herb: Howard & other implementors, would you be okay with removing std::async in c++14 and leaving it as a conforming extension?
- Howard: thanks for asking. my recollection is that one of the reasons that EDG said let's get rid of export was bc they didn't have a sufficient customer base that was using it. i do have customers that are using it. removing without warning would be rude
- Herb: EDG wanted to just remove it because even though they did have customers that depended on it they could migrate customers away from it. if they were to deprecate export, they'd be forced to continue supporting it after customers had migrated away
- Howard: i've got enough customers who use it that i will have to support it indefinitely either way. if we deprecate/remove and don't offer an alternative it won't work
- Lawrence: there's nothing that's easy to do that replaces std::async. having to use threads or write your own library to do the equivalent is difficult
- Howard: add a method to future that would tell it not to block whether or not they knew this thing was from async
- Lawrence: i want to avoid the scenario where we detach threads and produce undefined behavior. problem i have with detach is the same problem we have with the situation of thread locals being destroyed after the standard library
- Pablo: obv what we have is a bunch of alternatives all of which have some flaw. we could change std::async, we could remove std::async without replacement, the other alternative is to remove async and put in a replacement for c++14.
- Pablo: can a new feature be a small enough delta from std::async that it's considered a new feature. is there something simple close enough to std::async that we could draft between now and the final draft that will not be considered a new feature at the last minute
- Chandler: how long did we spend trying to get std::async correct?
- Howard: we don't know yet
- Herb: the core problem is that future has to be composable, everyone expects it not to block. that's the problem we want to solve
- Herb: the bristol proposal solves that by modifying async. or we could just eliminate async. the more i think about deprecating std::async the more i think it wouldn't solve the problem
- Herb: another option would be to remove it outright, replace it with a clone with a new name, and put it in deprecated
- Howard: if we remove std::async vendors will still have to ship it after the std removes it at least for three years. during that period clients still won't be able to know that their futures don't block because i'm still shipping it as specified in c++11
- Niklas: the core concern is not just about blocking future, it's also about lack of separation of concerns. seems like waiting_future may not be the solution at all
- Niklas: the perception of future is that it's a placeholder. any solution we do come up with as a replacement should make that separation of concerns clear.
- Niklas: adding waiting_future now just perpetuates the problem
- Herb: is it really necessary to have a replacement now?
- Howard: look at the auto_ptr thing. if we deprecated it on the spot, people would have used it because there was no replacement
- Howard: strstream is the classic example of deprecating something but not having a complete replacement for it. we can't remove strstream
- Herb: I agree in general that you deprecate without replacement people will keep using it
- Herb: with async, it also has the additional problem of causing collateral damage to future. to me, future >> async because it's a composability type
- Jeffrey: i'd rather have async documented in the standard as being able to produce these futures rather than removing it from the std. i vote for deprecating over removal
- Chandler: where do we want to be in ten years? in a long enough time span that we could actually remove something. the consensus that i keep hearing is that we want to have std::future be a non-blocking interface.
- Chandler: the other thing that seems clear is we want something that is different in how it behaves than std::async. not clear what behavior exactly
- Chandler: the only thing that is left unclear is how we go about it
- Hans: i don't really understand the preference for removing async completely vs changing its return type. seems that changing its return type is that there's a lot of simple code that people can write that continues to work
- Chandler: there's no way to solve the ABI break though
- Hans: ?
- Chandler: the return type of async is an observable thing
- Hans: so is async going away entirely
- Chandler: removing async from the std doesn't imply that the libraries would remove it
- Herb: the argument for removing it is that we have a consistent std where future means doesn't block
- Howard: but all the shipping futures will block
- so what if async returns a new type that casts to future
- Chandler: can't do that without breaking API. there are people who can use std::async in their current context but there could be contexts where moving the future would cause problems
- Chandler: we really can't change the return type without forcing every library vendor to break their ABI. not willing to do that when one library vendor says no
- Herb: are there any other breaking changes in c++14?
- Lawrence: we did break the API for std::string in c++11
- Herb: have we done it in c++14?
- Chandler: no, there are no known breakages
- Pablo: there's not a way in the language to make it illegal to return a future to a higher-level function, but we could make it undefined behavior
- Herb: i really hate undefined behavior, but this is seductive bc if we want to say don't use a thing, that's the strongest way to say that
- Hans: there's a way to take an std::async generated future and decompose it into two things: placeholder and task handle
- Chandler: there's no path that causes us to have a solution quickly. we should standardize practice rather than invent
- Chandler: i would be very opposed to a newly-invented construct going into c++14
- Niklas: await should produce two results anyway - a placeholder and a task handle
- Lawrence: splitting a future is ok because you're only allowed to call get on it once
- Herb: i think where we want to be is std::future doesn't block and we have something better than std::async. it sounds like it's tentatively possible might be to accept N3637 and remove std::future<t> std::async and reintroduce std::waiting_future<t> std::async deprecated
- Howard: it sounds like you're suggesting removing async and then reinserting it
- Herb: with a new name to avoid the ABI breakage. N3637 except instead of change async in-place, reinsert a new async with a different name
- Howard: I could live with that except that removing async in c++14 is not going to do what you're going to do. you're going to have a std that's just wishful thinking
- Pablo: when something gets removed it takes a few years to go away. if you remove it in five years, it takes five years plus a few years
- Herb: we can't start sooner than now and one of the nice properties of this suggestion is that it's a minor delta away from N3637 so it's likely to satisfy folks not in attendance
- Chandler: I think we should move towards straw polls, but will be difficult to clarify what we're polling
- Chandler: there's a large number of variants which are composable. Should we have a period of deprecation? When should we take such an action? Should we provide a differently-named function which behaves as std::async was specified in N3637?
- Lawrence: concerned about semantics of future::detach because you now have no idea about lifetime of TLS
Straw poll: N3637 + add async2, remove async in c++14
Straw poll: N3637 + add async2, deprecate async in 14
Straw poll: N3637
Straw poll: just deprecate async
Straw poll: N3637 + deprecate async, no async changes
-
- Lawrence: i want to replace the detach() feature with a split() feature to split the concerns. i dont think a new type is strictly necessary. I'm concerned that detach throws the thread into the weeds
- Michael: i agree
- N3634 Improvements to std::future<t> and Related APIs
- Niklas presenting
- Some small improvements since Bristol
- unwrapping: future<future<t> > -> future<t>
- added implicit unwrapping through construction
- do implicit unwrapping in .then
- other push back in Bristol on future::is_ready()
- two pieces of push back: 1. we proposed a function named future::ready() - feedback that empty() precedent was a mistake, needs the is_ in front
- 2. there's more info besides whether it's ready that was interesting to know. e.g., if the future was deferred.
- might be a better idea to generalize the idea of future introspection
- Jeffrey: those changes sound fine to me
- Chandler: i was hoping for future::then itself to do the implicit unwrapping
- Niklas: on the input or the output?
- Chandler: the one coming out
- Niklas: i don't recall
- Niklas: You have a future<t>, T being anything, it's hard to say whether or not it should be unwrapped -- you need context
- Niklas: you have this context in future.then(), so it's reasonable to unwrap
- Artur: you may have a function returning a future
- Artur: unwrapping is not consistent with the behavior of std::async
- Artur: what if you don't desire unwrapping
- Chandler: asymmetry with async doesn't matter
- Niklas: you could say that if the then's lambda takes a future<future<t> > then there is no unwrapping on the output
- Niklas: on the other hand, if the then's lambda takes a future<t> then there is a context for unwrapping on the output
- Artur: the implicit unwrapping occurs in the PPL, but there is no unwrapping in the managed library
- then runs in the same context as the original future?
- Niklas: it depends. This proposal dovetails with the executor proposal from Bristol. if by context you mean thread id, maybe, maybe not. if by context you mean executor, then yes.
- Niklas: there's also an overload of .then that takes an explicit scheduler context
- Niklas: the proposal is meaningless without an abstract Executor context
- Michael: Does MS's PPL already implement implicit unwrapping?
- Niklas: Not our future prototype, but our PPL tasks do
- Michael: What's your experience?
- Niklas: implicit unwrapping is no more expensive than explicit unwrapping. Code looks clean
- Artur: one piece of feedback is that whenever i'm explaining implicit unwrap, i have to explain why it exists at all
- Niklas: in terms of unwrapping, we're back to where we were in Bellevue
- Niklas: does anyone feel strongly about is_ready
- Chandler, Pablo: i want is_ready
- Jared: why does lambda take a future instead of then taking continuation & error handler?
- Artur: no common exception type
- Jeffrey: exception_ptr
- Chandler: i think it would be nice if there was an option to take T as the arg, and if there is an exception, std::terminate
- Niklas: it would complicate the case for generic lambdas -- you can't look at the signature to determine whether to unwrap or not
- Niklas: with PPL we have several years of experience dealing with having a single function handling continuation & exception
- Jeffrey: I move that we accept this
- Hans: the intention is to move this to a TS
- Pablo: what are we moving to a TS?
- Niklas: implicit unwrapping on output from .then and renaming ready to is_ready
- Hans: i have my doubts about the name because of the future status argument
- Niklas: these are separate concerns
- Chandler: summarize that issue?
- Niklas: there are other things that you may want to know about the status of a future that you may want to interrogate, including whether or not it's deferred, for example
- Niklas: instead of future::is_ready, we could have future::status that returns status as an enumeration
- Chanlder: i'd like to see a straw poll about it
- Clark: i think is_ready ought to stay and if somebody wants to propose something more, that's fine
- Herb: ready or is_ready is fine, a status function that returns an enum would be fine if somebody wants to write it up
- Hans: my concern is if you have it returning a bool, what do you do with a third state? deferred doesn't really fit the description
- Chandler: the argument for having is_ready is from writing code -- it's inside of an assert. it's so much easier to read assert(f.is_ready()) than assert(f.status() == std::future_status::ready)
- Niklas: the cost of having two separate functions is minimal, and the readability of something that has a verb in it (is_ready) is high
- Pablo: for containers we have .size() and .empty(); .empty() is not fundamental, but we still have it
- Herb: even if we don't take a straw poll, if we're in agreement, could we record the agreement
- Hans: let's just ask for a three-way poll. what your top choice is
- Chandler: any objections to is_ready()?
- Hans: want to ask for the other one separately
- Hans: any objections to bool is_ready()? (no)
- Hans: any objections to also having a status function?
- Chandler: I object b/c i haven't seen a proposal
- Howard: i second that
- Niklas: to summarize, we're going to do implicit unwrapping on output from .then(), we didn't discuss whether to remove the explicit unwrap or not, name is changed to is_ready and returns bool, if someone wants to propose a status getter, then that's unrelated to this proposal.
- when_any returns all the futures, even the ones that aren't ready?
- Niklas: when_any returns a vector of the futures
- Hans: what does this proposal say about waiting_future?
- Niklas: it says nothing
- Hans: these APIs basically work for waiting_future
- N3650 Resumable Functions
- Niklas presenting
- there's a great deal of discussion about language vs library at Bristol
- reflector discussions mostly focused on whether we can generalize the resumable function concept to also include scenarios that are more generator-like (like in python generators with yield)
- another use case for resumable fcns is to generate sequences from stateful logic
- you could imagine that you call a fcn that returns an object and that calling .begin() on that object advances to the first .yield point in that resumable function
- object++ continues to advance to the next yield point
- the resumable function proposal started out in Kona with a very general definition untied to std::future
- made more narrow in Bellevue & Bristol by tying it to std::future
- it could be with some added complexity to be generalized
- what does it mean to produce a sequence from asynchronous input?
- you're producing a sequence from async input without blocking -- so at least .begin() and ++ need to be asynchronous operations
- the aspect of a generalization that would require the most thinking
- something that was brought up in Bristol that has not been addressed in discussions following was when you do an await, you will be executing the continuation on the same context (executor) that was used prior to the continuation
- there may be a reason to explicitly pick an executor that will be used for the continuation code -- we left this unresolved how you would do this
- there should be some sort of library-based mechanism to do that
- visual studio will ship a CTP release of resumable functions sometime this year as an extension
- Herb: what's been implemented is not the generator part of it, just basic await (__await & __async) on the function
- Niklas: one of the reasons i argued in bristol for a language solution is because all the libraries i've seen is that they all tie the application code very directly to the implementation
- Niklas: it's very hard to migrate from one implementation to another
- Chandler: there are researchers at Google who are building a system which would be an alternative to this
- Chandler: it's nothing other than threads except that they've demonstrated the scalability of 100Ks threads live on the system concurrently without a whole bunch of drawbacks
- Chandler: there's nothing implementation-specific about anything you write
- Jeffrey: in the long run, i think this may make this unnecessary, but i don't think it has totally proven itself, or that its existence makes this proposal unnecesary in the near term
- Niklas: putting it in the language lets you have many flowers bloom
- Niklas: the strongest argument for language lets you tell the programmer explicitly what they need to be concerned about
- Chandler: if user mode scheduling begins to propagate, then the argument can be made that none of this is necessary
- Artur: there are two arguments for async: 1. the scalability of threads 2. GUI frameworks that are mostly single threaded today
- Torvald: if you resume and then the thread that is used for something else, then not all the OS state for the thread is restored
- Niklas: yes, the langauge proposal does not concern itself with that - that's entirely the concern of executors
- Niklas: the proposal for resumable functions depends on the .then proposal
- Torvald: you implicitly spawn a new execution agent, but you don't get a full OS thread
- Jeffrey: you get a detailed description of how you can use the OS primitives. you can switch threads between awaits. in the period between awaits, you can use TLS as you'd expect. you don't switch unexpectedly
- Torvald: so really you have more fine-grained control over how fat your execution agents are. in the end, this seems like an optimization to save some space. how many space do you actually save?
- Herb: There are perf optimizations where the compiler can help you, but that's not the motivation of the proposal. the motivation is that yes you can write everything with .then, but some things are super hard.
- Herb: you don't preserve the program structure you had before with .then
- Torvald: if OS threads didn't have any overhead, you wouldn't need any of this at all. so we're just trying to optimize the overhead that you have with OS threads
- Niklas: green threads are not that lightweight. what is lightweight is linked stacks. there's a spectrum of solutions here, some do well when integrating with I/O, some not
- Torvald: are we really looking at lightweight execution agents here?
- Niklas: there's several ways of looking at this. one is just being resigned to how libraries are best likely to represent async with futures
- Niklas: futures represent a composition challenge that they do not compose well with control flow constructs
- Niklas: let's imagine that we do introduce language and some of these library approaches are fruitful, then await reduces to .get(). meanwhile, you have been able to write code as an app dev that runs on a number of platforms
- Niklas: worst case here we introduce language that reduces to .get() 15 years from now
- Torvald: the complexity that we add in could reduce to a no-op in the future
- Torvald: the other way to go would be to see how much we could get by introducing lightweight execution agents
- Pablo: i started implementing a library version of this and AFAICT it seems like this should not be particularly complex or ugly
- Pablo: i'd be happy if this could be generalized to be more of a generator/coroutine solution than the specialized thing we have here
- Niklas: if we didn't have to worry about threads, contiguous stacks being expensive, or UI threads existing, then i think absolutely this is an approach to optimizing away some space
- Niklas: the reality is that threads are expensive
- Niklas: until we have low-cost threads, there are important compiler optimizations that will be necessary
- Pablo: Intel & MS seems to be arguing the opposite thing depending on the proposal at hand. e.g. MS wants parallelism as a library even though it totally requires compiler support, but here I am arguing the opposite for this proposal
- Chandler: i don't think that's an accurate characterization of the discussion taking place. they have presented some compelling arguments that there are syntactical differences that aren't expressable in the library due to control inversion
- Niklas: it is dealing with how apps to written to OS threads.
- Pablo: imagine await were a function call which could return on a different thread
- Jeffrey: that's not what functions do. a lot of the data parallel stuff act like functions, they do a lot of magic underneath, but they have a functional interface
- Olivier: even if we accepted that a function could return on another thread, isn't the problem that the curly braces are in the wrong place?
- Chandler: that's control inversion, that's a syntactical thing. there's no optimization, no critical change introduced
- Torvald: the goal is an optimization, that's what i'm saying
- Niklas: no, the goal is that there is an existing reality of asynchronous APIs. that's what we want to solve - asynchronous ops don't compose with control flow
- Herb: language vs library, i would love to see a library and show me side by side examples that are as easy to use
- Herb: the only reason we prefer a library to cilk spawn is that we think we can do better with a task group
- Herb: this is being proposed as a TS, it just lets us gain experience
- Herb: people have complained that C++ doesn't have await -- this is our biggest weakness in Win8 C++
- Herb: customer ported a major C++ code to Win8, had to rewrite in C# due to C++ missing await
- Clark: its for a TS, not a std
- Olivier: niklas, you said it's a valid implementation of await just to resolve to .get().
- Niklas: in a potential future where you don't care about the cost of resources
- Olivier: say apple wanted to implement await -- they could use .get()?
- Niklas: GUI threads are a precious resource. you don't want many 1MB or 3MB stacks sitting around
- Niklas: if we can make resources not so expensive, then async APIs can go away. if you have a resumable function which takes an await, you can see that as just a .get()
- Torvald: i suggest that it would be helpful to clarify what the proposal is getting us. specify precisely what properties of the execution agent are preserved if you use a resumable functions.
- Torvald: in other words, are they full OS threads or not? if they are full threads, there's very little you can do
- Niklas: sure
- Pablo: which threads does resume after the await? the thread that fulfills the future that we're awaiting on?
- Niklas: the intent is that that's where the executors come in. the resumable function has an affinity to the executor that starts it. it is an executor policy how it is resumed
- Niklas: unresolved at bristol was how to allow a developer to choose specifically another executor to run the continuation
- Torvald: i'd be interesting in seeing a specification of execution agents, or context switches, or how ever you do it
- Pablo: there's a garbage collection question. what happens if you discard the future that gets returned and nobody every fulfills the future that you're awaiting on?
- Niklas: if the future goes away, and the promise goes away, then the await goes away
- Chandler: a lot of this discussion is very confusion bc people seem to think there's some new functionality being introduced. AFACT, that's not the case.
- Chandler: what we have is a syntactically convenient way to write a function that eventually does a .then() and some magic to manage the lifetime of variables in that world
- Chandler: i don't understand why people think these questions are intrinsically tied to resumable functions
- Chandler: we already have the ability to write synchronous and asynchronous APIs, and people trying to make threads cheap enough to avoid asynchronous APIs, and yet we still have async APIs
- Niklas: on our platform, there's no imminent death of the async API. if it's not suited to a broader use, then we'll have it as an extension
- Chandler: I'm confident that there will always be async and sync APIs
- Jeffrey: I agree with Pablo that it would be nice if await were more powerful. i think i'm in favor because i see an evolutionary path toward making it more powerful
- Chandler: does anyone have anything bad to say about resumable functions?
- Chandler: this will be the first time that i have to teach a programmer that their local variable may be accessed by their caller during the lifetime of the function
- Niklas: The caller and callee run concurrently and can pass pointers between
- Herb: they run concurrently, and the resumable function has its own stack. so of course i can get ptrs from one stack to the other
- Chandler: i wonder whether it would help if calling a resumable fcn did not syntactically look like calling a non resumable fcn
- Herb: they need decoration on callee and call site
- Chandler: other languages have coroutines that decorate the call site
- Niklas: it separates your program into two worlds - synchrony & asynchrony. the decoration is where you cross that line
- Herb: by having a requirement to annotate the caller, there should be some benefit
- Chandler: the language i'm thinking of is go - there is a decorator on the call site
- Chandler: this proves useful because this function does not execute solely within the bounds of my call into it
- Herb: if you didn't have await then inside your callee is you'd call .then(). the caller doesn't have to care now, why should he care with await?
- Herb: as soon as the caller has to care about something, it defeats composability
- Niklas: today, if a fcn returns future<t>, that's red flag that there may be concurrency. this proposal is more of the same
- Hans: seems like this is non-trivial to implement, implementors, is it hard?
- Michael: don't have an option on whether this doable or not
- Niklas: the proposal has two implementations -- naive and fiber-based, more sophisticated implementations are possible
- Michael: what was the implementation effort?
- Niklas: the fiber-based one was straightforward. frontend is straightforward. the runtime that is most complex. a couple of man months of work.
- Niklas: a linked stack implementation would require more work, security issues require OS work on Windows
- Herb: the feature is simple enough that we're implementing it multiple times. all of the implementations put together are probably less effort than variadic templates
- Michael: changing the OS is a big problem
- Niklas: there's not necessarily a kernel, but if we tried to do linked-stacks today, it would be flagged as malware
- Herb: the main issue is asking the OS to do something. e.g. they insist that exception handlers have increasing addresses
- Herb: we can switch the stack pointers ourselves, but it's nice if the windows folks gave us the functionality
- break
- Jeffrey: My concern is that this produces functions that must be called exactly once. You get a function that represents the middle of a function. It's like what setjmp would give you.
- Alasdair: Google is full of code like that
- Niklas: I don't understand
- Jeffrey: If you write await somewhere in the middle of your fcn, you've got locals, some initialized, some not. The continuation of your function gets passed as a callback to .then().
- The future is required to call that fcn. As long as this is only bound to futures, it doesn't matter much. If this gets generalized, then it becomes more of a concern.
- Jeffrey: if then was every overrideable, then you get an extra constraint
- Niklas: if you were to expand it to other containers, then you would have to place the constraint on those
- Chandler: await is tied to future - this is a problem from a language/library design perspective. We need to define some sort of concept which has a .then, the .then returns a particular type
- Artur: we have a working implementation that does this
- Niklas: our internal implementation does that
- Jeffrey: will it be hard to implement the language feature if it is tied to a particular name in the library
- Niklas: in the absence of concepts, it's hard to
- Chandler: new expressions tie to language keywords
- Jared: doesn't range-based for do this?
- Chandler: yes, but it causes lots of problems
- Niklas: Should we work on generalized in terms of types that can be consumed & produced by resumable functions so that it is not tied to future? Is that desirable?
- Clark: How is this different from the way it was brought before Kona?
- Niklas: it isn't, it's going to back to the generality from before
- Clark: will the objections from Kona resurface?
- Niklas: The consistent feedback from Kona was that the proposal was too abstract, for pedagogical reasons, it would be useful to limit it to std::future
- Clark: maybe the problems is that we don't have concepts
- Chandler: the problem is that if you specify in terms of patterns, it becomes difficult to specify. It is very useful to specify language & library interactions in terms of abstractions of the library components
- Jeffrey: we could just say that this is just futures for the first version
- Chandler: the abstraction isn't to make it work with things that aren't future, but to be able to identify the surface area of interaction
- Niklas: we will rewrite the paper for Chicago
- N3666 C++ Latches and Barriers
- alasdair presenting, referred to draft paper attached to wiki.
- we're planning a series of straightforward lightweight idioms that would be useful to include in the std
- we'll focus on latch & barrier classes
- thread coordination mechanisms that allow a thread to block until an op has completed
- a latch is a single-use barrier
- create a latch with an internal counter
- one or more threads wait until that counter decrements to zero
- threads call count_down to decrement the counter by 1
- wait blocks the caller until count is 0
- try_wait returns true if count is 0, false otherwise, but doesn't block
- count_down_and_wait decrements and waits until 0
- barrier has an internal counter
- threads can decrement and block waiting until 0
- variants of constructor take either a void function or size_t function
- functions invoked when counter reaches 0
- size_t function returns a new value for the count
- can only reset the count when the barrier hits 0 (can't add threads to the barrier)
- allowing resize at arbitrary times is hard
- count_down_and_wait decrements by 1 and the caller blocks until 0
- count_down_and_wait invokes the completion function
- if you did not pass in a completion function, then the counter is reset to the initial value
- a thread can call count_down_and_wait in a tight loop. the next count_down_and_wait is applied to the next iteration of the counter
- Hans: why aren't count down and wait not separable?
- Olivier: there's an implicit reset when you hit 0. If you have separate count down and waits, you don't know which iteration of the counter you're on
- Michael: is this very similar to OpenMP-style barriers?
- Alasdair: one of the objections from Bristol was the way we handled reset. We've changed that so you have to explicitly register a completion function.
- Alasdair: sample use cases: latch - doing work in a thread pool and want to block until all threads are finished
- Alasdair: initializing a whole set of data structures. spawn a whole series of threads to do some work, then all wait on a latch.
- Alasdair: barrier is useful when coordinating a series of threads doing a repetitive task. example of a task doing something which runs in multiple stages
- Olivier: I believe your example is illegal based on some earlier preconditions -- the reset function is not allowed to return 0
- Alasdair: you're right
- ?: If you do return 0, maybe you can't use it again
- Olivier: then you'd have to define what it means to call count_down_and_wait when the counter is 0
- ?: it's undefined. you should never count down at 0
- Olivier: I like the improvements, and I wonder if we want a template argument which is taking in a conceptually callable which can be just a template function or in order to avoid type erasure of the reset function
- Chandler: I very much think it should be a Callable template argument because it's going to end up being a lambda so often. Callable lets us do that, std::function doesn't
- Jeffrey: You need make_barrier or something
- Chandler: no, we need template argument deduction for constructors
- Artur: how do you store the std::function?
- Chandler: as an instance of the Callable
- Olivier: it could default to a completely empty callable type that does nothing when called
- Olivier: my motivation is that not only does NV believe that barriers are useful, we've built it in silicon. we want the compiler to figure out that this whole thing flattens to a barrier instruction when available
- ?: opinions on names or flavors of names of latches? The only reason barrier is different is because you can reset it.
- Jeffrey: Is it possible for one thread calling wait on a latch, is it then safe for the thread to destroy the latch?
- Jeffrey: They need to specify that count down is an atomic operation
- Chandler: I would like to see RAII helpers built in for calling things like count down at the close of a scope because otherwise people will consistently fail to call count_down late enough
- Jeffrey: just add scope_guard
- Chandler: i'm talking about order of destruction problems. there's a destructor that needs to run before the counter reaches 0
- Lawrence: i'd rather not add ad hoc constructions for this in every c++ concurrency type
- Herb: something general like scope_guard would come from LWG. Something like finally would come from Evolution. Since this group is sensitive to these things, we may want to coordinate with them. Who volunteers to write a scope_guard paper?
- Chandler: I know that Google has a scope_guard, I hope that the folks from Google pushing this proposal would go ahead and do it
- Hans: I detect a consensus that we should get this into a TS
- Chandler: i would take this to LWG. it's not clear there are important interactions between Executors, futures, etc. that would benefit from latch_guard & barrier. These seem separable
- Lawrence: In principle I agree, but if we're going to make a mistake, it's more likely to be when we push it out before the rest of it is settled
- Lawrence: There's a number of things in the upcoming concurrency TS that will interact in weird ways. We should gain experience before standardization
- Olivier: i could imagine an Executor that created bulk parallel tasks with a shared barrier
- Olivier: The scheduling of the threads could be tied to the barrier. The barrier might not promise that it activates other threads. That's one of the most motivating examples of integrating the creation of the work with the creation of the barrier
- Michael: How much more efficient is latch than the barrier?
- ?: the barrier is pretty much two latches. but resetting a barrier is hard. There are implementations of barriers which are not two latches, particular a hw implementation which would be more interesting.
- Hans: I hear a consensus that we should proceed with this proposal in some sort of vehicle. Every library that I know of includes barriers, maybe not latches
Friday 2013-07-26
- N3425
Concurrent Unordered Associative Containers C++
- Arch presenting
- still looking at concurrent_unordered_map
- more from STL would be OK
- concurrent erasure is difficult to support
- looking at value-oriented alternatives
- concurrent queue interface
- considering a range of interfaces fairly close to java's ConcurrentHashMap
- maybe a form for doing transactions and reductions
- noticed that concurrent_queue returns item as a reference
- we received feedback to use optional<t> to return an item
- without optional<t>, issues with exception safety
- Lawrence: i don't think optional<t> is sufficient for the need, need something which is status + optional<t>
- Lawrence: has potential to reduce amount of interface
- Arch: what did concurrent_queue return?
- Lawrence: returns a status
- Arch: a boolean suffices for us
- Lawrence: an abstract interface will require more than a boolean
- Chandler: why?
- Lawrence: in the lock-free queue, you can return a status that says empty or conflict, couldn't complete operation. both are failures but for different reasons
- ?: closing the queue then pop and the queue is closed is different from just empty
- Chandler: using optional<t> to build the return type - status_or. implementation would be status + optional<>
- Lawrence: we just need a proposal on that type
- Chandler: will it be the same type for all containers, or specific to a container?
- Lawrence: i'd like to see it be a general thing
- Chandler: i think they should be idiomatically related. i suspect the set of statuses will not be common
- Arch: closing a hash table doesn't seem as motivated as closing a queue
- Chandler: I suspect the bool return to be enough for hash tables bc they don't have the stateful properties that queues have. queues also do synchronization
- ?: you could end up in a state where there's a difference between "i couldn't do that" and "the item's not there"
- Chandler: i'd prefer that distinction not exist in the interface
- Arch: i'd like feedback based on use cases that involve concurrent erasure
- Arch: one example is a server example which records state based on different threads
- Arch: another is software caches where one would like to do evictions from the cache
- Michael: we've been looking at an alternative to concurrent erasure. we'd like to consider a different model using read-copy-update. this is why paul is here today
- Paul: another use case is that you're trying to represent something in the external world - you're always out of date. when you're trying to faithfully represent the
- state of the world, erasure is important
- Arch: server case is a special case of that. it's useful as a general motivation
- Paul: If you're looking at routing in network, data collection on e.g. cars. i suggest to allow implementors to choose what deferred destruction mechanism they prefer
- Paul: if gc is available, they can use it. if hazard pointers work, they can do that
- Michael: if TM is available in the future, they could do that too
- Arch: we were concerned about the complexity of allowing that.
- Paul: one way would be to prepare to get me a reference on this object. this would give you the reference or tell you "nope, can't do it". you don't need that last for RCU or GC, but you do need it for hazard pointers
- Paul: you just have an internal method you use internally, maybe the client supplies it
- Arch: I think i see your point. we were also considering generalizing insertion as well
- Herb: sounds like policy-based design. we have other types like this, e.g. smart pointers. these haven't really taken off. it'd be interesting to see how this avoids or is affected by the known issues in policy-based design.
- Herb: e.g. the policy sometimes changes the semantics of the type. another one is that you get different types when the policy changes
- Paul: I'm not suggesting a template parameter
- Herb: It would be interesting to see how this would be affected by the known strengths and drawbacks of policy-based design
- Paul: one thing you can do is disallow some customization. if we choose carefully which semantics we provide, it should work fine
- Arch: one of our designers always wants to enable the "no mechanism at all" in order to ensure the table isn't engaging in concurrent deletion
- Chandler: i feel like it's important that we have the flexibility to allow concurrent deletion in different ways. i want to talk about the case where you don't have concurrent deletion, or objects aren't worth deleting
- Chandler: common case is a hash map of integers or pointers not owned by the container. it's essential that this container provides an optimal interface for this use case.
- Chandler: a copy-based interface is perfect for this. users are disappointed with the interface when the element is simple.
- Chandler: what if the copy-based interface is the only exported interface. and the policy-based deferred destruction interface is managed by container-optimized smart pointers
- Chandler: wrap an allocated instance in a wrapper that provides special semantics. you get the copy-based interface, but the copy you get is the wrapper interface that implements these semantics
- Chandler: the copy-out type is a wrapper type providing memory management behavior
- Chandler: lifetime management will be handled in a much better way
- Paul: is it semantically required to be a copy?
- Chandler: the return type of an accessor method has a value type (not ref or ptr), however the value type could be a wrapper containing a ptr to the shared obj
- Chandler: if you mutate through, you take the shared obj. do mutations to this instance affect other threads? is the obj the obj or just a view?
- Paul: that's my question. i'd like to see an implementation. when you say shared_ptr, i'm thinking expensive. if you're willing to accept that you make a copy and changes you make aren't visible elsewhere
- Chandler: we'd receive some smart ptr (not necessarily shared_ptr) which you can look through to mutate the shared obj. it has a deferred destruction semantic attached. you do deferred destruction however
- Paul: when you say smart ptr, what you're saying is it has semantics, but at runtime it has the cost of just a ptr? or do we go through methods every time we access?
- Chandler: for accesses to the underlying obj, it'd be going through a ptr. it would have methods which don't touch the underlying obj, but have methods to handle its lifetime
- Paul: OK, let me summarize. the operation of reference acquisition and the operation of saying i'm done with this ptr would execute code. but just saying i just want this field of this obj is just a ptr deref
- Chandler: absolutely
- Paul: interesting, i'd like to see more detail
- Chandler: it's easy to say it's a value-based map, but it's not. it's a value-based view to a map. the interfaces for deferred destruction will be customized smart ptrs
- Paul: do you envision this as standalone, or would you have hash tables of hash tables?
- Arch: if you ask for the value, you'd copy the entire table out
- Paul: unless you used an interface that didn't copy the whole thing
- Arch: if you have large objs, you make it a table of ptrs to the objs
- Chandler: this use case is rare
- Paul: it's not unusual to see hash tables of lists in the linux kernel
- Arch: what we don't want is to push retries on users
- Paul: hazard ptrs give you that. they give you much smaller mem footprint, non-blocking for updates as well as for reads
- Arch: for collisions, we're thinking of passing in a functor to resolve the collision
- Hans: I'm not sure i understand
- Arch: if you inserting in a serial STL container, you insert you get an iterator. if there's a duplicate you decide what to do
- Arch: in a concurrent environment, it's gotta be a single transaction
- Hans: the java interfaces have stuff like insert if absent. we're moving away from an STL interface anyway if it's a value-based interface
- Lawrence: we're gonna need the equivalent of compare-exchange operations. we can't decompose this
- Arch: actually that's a form of reduction
- Herb: what if we had GC in c++?
- Paul: you don't necessarily need atomics for release of an item. it depends on what deletion method you use
- Herb: what if we had GC in c++ of the following form: sometimes for perf reasons garbage collection is nice. what if you had something like a make_gc function
- which returns a garbage collector. all the roots are gc_ptr. if you had the ability to garbage collect, how would it affect this proposal? sometimes it would
- be really convenient to have a gc-enabled allocation
- Arch: in other words another version of a smart pointer
- Lawrence: one of the things i'd like to see is a separation of the abstract inteface from the concrete impl
- Jeffery: we need several concrete interfaces before we can abstract out the common interface
- Lawrence: part of the problem with concurrency is that the perf changes resolving from interface changes are very different.
- Lawrence: i don't think we're going to get away with one concrete type in the concurrent world bc the perf characteristics are very different
- ?: they probably have largely the same interface, but small differences. esp. with maps.
- Arch: being able to concurrent block while insertions are happening - some implementations can, others can't
- Paul: I believe gc is attractive, but it can't replace the other forms of destruction - can't compete on perf. i'm very strongly against having gc as the only destruction policy
- Hans: it seems that gc would play nicely with value-based, smart ptr-based interface
- Chandler: let me summarize Herb's suggestion: a mandatory available garbage collector, not a mandatory gc
- Herb: one of the unfortunate things about gc is that the systems which popularized it forced it on users
- Herb: the existing c++11 hooks for gc are unrelated to what i'm suggesting
- Herb: the idea is that it's opt-in, available in the standard. not required to use it. how would the availability affect this design?. this could make concurrent erasure possible
- Paul: the benefits of the other deferred destruction policies is that they eliminate the ABA problem
- Jeffrey: is the suggestion that a future version of this proposal show a family of implementations of this pointer in order to abstract the common interface?
- Lawrence: yes, that's what we did with the queue paper.
- Hans: we have some reasonable suggestions to pursue. do you need any more?
- Arch: any interest in a transactional interface to allow transactions on more than one table?
- Herb: when you have multiple shared objs of any kind it's the owner's responsibility to lock
- Hans: agreed. don't try to address in individual containers
- Torvald: what about the choice of linearizability as the correctness criteria. in the case of maps, this could be too strict. should we guarantee having total orders for accesses to the same element?
- Hans: i'd argue in favor of something that's linearizable by default. whether we want weaker containers is another question
- Jeffrey: if the correctness rule is not linearizability, what is it?
- Michael: why is linearizability a correctness criterion, i thought it's just a property
- Arch: if it's not linearizable, it's surprising
- Torvald: not everything is seq consistent
- Hans: it is, you can't tell
- Torvald: for mutexes, yes. for the operations, i think there is a difference. if you used acquire/release to synchronize on a certain element of the map, you get a different behavior if you use something which is not seq consistent
- Torvald: we could say we want total orders. if we say we have total orders on the full hash map (as the proposal does) or we could say we have a total orders on single element of the hash map
- Hans: so far the intent is to have seq consistency period (i.e., on all elements)
- Torvald: would it make sense to have something weaker?
- Hans: possibly with explicit relaxation
- Jeffery: one option is to treat each element as a memory location
- Hans: we've avoided that because you get implicity violations of seq consistency
- Lawrence: it's reasonable to have concrete implementations that provide different guarantees. some apps may accept a non-linearizable hash table. that'll be used by people who know what they're doing
- Torvald: that sounds reasonable. how do we provide both? one option is to provide two concrete implementations that give weak or strong guarantees. another is to make memory order a parameter to these functions
- Hans: the answer isn't obv to me. can't tell without looking at a design and implementation consequences
- Arch: you'd also have to look at perf gain in practice
- Hans: the worst case is probably power arm v7
- Hans: the consensus i hear is to explore value-based design and provide the seq consistent version by default. possibly providing other options if there is some perf benefit
- N3563 MapReduce
- Chris presenting
- this is a much higher-level construct than most of the discussions we've been having
- we took the Google version of MapReduce and apply it in a parallel way
- the Google implementations are all distributed, but there are obv use cases for data parallelism in a single node
- lot of things that Google does that don't apply in a parallel environment
- this is a process of removing all the cruft that Google has but still allowing it as an add-on
- what you do in a single machine is battling contention problems you run into in a single machine
- you have key/value pairs which get moved around. key is an ID, value is the payload
- the mapper stage is a super-high data parallel operation. shuffle rearranges by key for a separate stage
- lot of people use map reduce just for parallel for or map. real value is the shuffle & reduce
- the map reduce controller which does thread creation, blocking, is where all the meaningful work is done
- the tricky part is optimizing it in terms of number of threads
- a lot of the value of map reduce is that you can optimize aggressively. all of the contention is in the shuffle stage
- the mapper phase divides input into key/values
- shuffle sorts
- the reduce takes all the values for a single key and creates an output
- a shard is a chunk of data to group data together to improve throughput
- combiner can prereduce all inputs
- the MapReducer is the job controlling everything
- there's a mapper class and a reducer class. basically just templates. some functions are optional. classes passed in as template parameters to the map_reducer
- one created per thread
- the start(), flush(), abort() come from a more fault-tolerant view of the world
- when you call run() on a map_reduce obj, you receive some sort of structure (currently an input iterator) and create iterators for each mapper thread
- once all of those values are consumed, the shuffle stage starts and recombines values by key
- once the shuffle is complete, the reduce is called
- if you have commutative & associative, you don't need to see all the values, count words e.g. each mapper does the precounting, spits out one value per key
- the input splitter is a way of making the Google version work. gives you more parallelism, lets you chunk your data
- there's a shard function which controls how data is rearranged. controlling the shard function is useful for changing the perf characteristics of the framework
- here's an example implementation for word count
- file in, iterator for output. walk through and output values to iterator
- run_async is a non-blocking version, there's a wait function as well
- there's a done function to check for completion
- Olivier: why go with an object instead of more like an algorithm? one returns a future, one doesn't
- Chris: i wasn't thinking too much.
- Olivier: this last one could be a map_reduce algorithm which looks and feels more like the rest
- Chris: most template parameters will be deduced
- Jeffrey: there's nothing really better about passing 8 template parameters to class than a function
- Olivier: just something you see more often
- Chris: i go into some detail about how this could become a distributed version. the main difference is that you need serialization
- Chris: the implementation as it is is really more meant for the in-memory parallel version
- Chris: the main question is whether this is an interesting direction? what could be changed? it's a fairly high-level construct. how does this jive with the pipeline proposal?
- break
- Chris: fairly complex algo - can roll into Oliver et al's parallel alto concept? We think MR a useful use case for our own needs. Any thoughts?
- Adam: right level of controls/options? We took some out from the Google version.
- Jared: we tried to avoid knobs/controls in our parallel algos - what effect if MR did that?
- Oliver: our paper had serial or parallel option. Maybe have just this by default, plus optional extra knobs?
- Chris: we have MR Options container
- Artur: how extensible is this?
- Chris: basic alto fairly well specified. Maybe scheduling options (auto thread creation, etc.) A more fundamental change would be a change in data structure between the map and reduce phase - would be more complex to add that to the police (Do we use arrays? Queues?)
- Adam: Note that in theory this interface applies to both a single machine MR and a distributed one. Could at least re-use the concepts if going to multi-machine.
- Jared: Suppose stages were specified as algorithms? That's what Nidia's does.
- Chris: shuffle does real work - hard to specify independently.
- Lawrence: one concern with separating them out is that we'd have a barrier between stages - could affect scalability. Also, if we start with a simple interface that turns out to be inadequate we'd have a binary incompatibility going forwards.
- Chris: shuffle needs to be a barrier.
- Remote: How does shuffle interact with executors?
- Counters: Lawrence presenting
- D3706 is attached to wiki
- Lawrence: Note that inrements operations return void - we separate incrementing from reading. That's where we get performance. Also have an exchange op.
- Simplest type is Simplex - single global conter
- Jeffrey: Etymology of simplex?
- Lawrence: simplex and duplex protocols.
- Jeffrey: not sure that meaning is clear.
- Lawrence: clearer when you see duplex. Also, assume people will use namespace. Lots of redundancy in identifiers - assume namespace will help.
- Lawrence: for simplex, lots of threads means lots of cache line contention. So we have buffers that bring in a local variable that tracks counts. When goes out of scope, values are pushed to the main counter. Also an option to explicitly push up. BUT buffers add a latency to the total. SO, duplex counters distribute the count, but when you read you get the agregated value.
- Create a broker attached to the main counter. Broker can be thread-local. Main counter gets values form brokes whe doing a load. But requires write access to the broker, so needs to be threadsafe. Weak duplex does not provide an exchange.
- Feedback was that people wanted arrays. Can have a simple fixed-size array, but for concurrent access better to provide a specific array abstraction.
- Atomicity can be specified - none, some or full. Paper had a table summarising their use.
- Open issues - Do we want a dynarray for the counter array? (Internally we use this, but it's not exposed.)
- Core of implementation is the bumper - basic incrementable class.
- Clark: lots of similaryth betwn thsi and a Cilc adding-reducer. (Jeffrey asks if these are only usable inside a Cilc environment? Not sure - Pablo would know.)
- Lawrence: these can be useful for analysing the current stae/health of your system.
- Adam: dupluex and simplex counttrs have same intrface, so easy to switch if you discover thah simplexes are too slow?
- Mark: Has new scalable statistics counter paper. Probablistic, but can specify accuracy.
- Lawrence: these counters don't lose counts.
- Mark: we also have precise ones.
- Adam: at leaf you could only have +- 1, but would +N for batching.
- Mark: what's the actual guarantee? I don't know at the time I read it.
- Lawrence: when the system's quiescent. Eventual consistency.
- Lawrence: note that these counters imply no memory synchronisation.
- Jeffrey: in practice you'll see the results if you know that an external event has happened (e.g. a thread has produced output.) (Mark will send Lawrence a copy of the paper.)
- Naming suggestions - simple? Single (a single word of memory).
- Hans: the form you're most likely to use (thread-local) seems hard to use. (See 'thread_red' example in the paper.)
- Lawrence: no way to say 'give me a thread-local view of this global' in C++.
- Adam: could we have a factory?
- Lawrence: no way to make member vars threadlocal.
- Jeffrey mention Google's internal threadlocal class that can be used as a member.
- Lawrence will think about this - better way of handling threadlocal counters. Maybe something based on the Google class?
- Jeffrey: for naming, use tree metaphor - leaf, node, root?
- Lawrence: stacking a bit more commplex - can stack broker on duplex, but not broker on broker.
- Hans: any memory synchronisation issues? Is this sequentially consistent? Think not. Also, someone might try to use these as flags, but it won't work.
- Lawrence: not what it was designed for.
- Jeffrey: could use aquire-release - what would effect on ARM or Power PC be?
- Adam: note that there are atomicity options?
- Jeffrey: would aquire-release satify Hans?
- Lawrence: increment of buffer just happens on the stack. Then get transferred to the thread-local at end of function. At some point this needs to be a release.
- Lawrence: aquire and relase mutex on constrict/destruct of broker, and when do a load. Increments don't need a local relaease. Neex a mutex on load so that the set of brokers douesn't change. Don't want to add a relsase on every increment.
- Hans: concerned you can write buggy code becasue this may not be sequentially consistent.
- Lawrence: not a big deal to extentd increments to add this.
- Hans: worth considering. Concern that we are exposing sequential consistentcy issues that we have tried not to expose in other APIs.
- Jeffrey: are aquire-release sematics enough?
- Hans: probably OK - would be much happier with this.
- Jeffrey: maybe atomicity::none could turn this off.
- Should have longer offline discussion on this. Also look at Mark's paper. Next version should also mention Cilc counters. Also a section on naming options. (Please send Lawrence suggestions.) Slight concern about size of interface, but the various classes seem necessary.
- Lawrence: simplest approach is to just use the simplex counter. Only use something more if you have performance issues.
- N3696 - Bronek presenting
- Bronek ran through the paper. Followup questions were:
- Jeffrey: Does Nvidia implement some of this in hardware? Oliver: we have fetch min/max.
- Hans: may make sense to add ones with common h/w support. We have compare/exchange supported in most hardware. Is min and max above/below the bar?
- Jeffrey: The priority update fun is atomic store-if. Impls that supprt additional types can represent these as template types.
- Chandler: Compiler can recogise pattern when they show up at machine level. How to specify 'update'? Prefer store-if or predicated update. Not sure of best way to encode this, but would like to see a well-specified way of doing this.
- Mark: this code can go into an infinite loop with no concurrent things going on. If I read a value that matches the predicate but isn't in the memory location I'll keep looping.
- Bronek: See paper's reference [1] page 4. (Fig 2: Priority Update Implementation)
- Chandler: What should we name this?
- Hans: Agree that this is useful, but not sure it's simple enough to implement to be worth adding.
- Chandler: Concerned that he has seen the same approach before, but subtly broken. Easy to write almost-correct version that seems to work. Would be good to have an official, properly described version.
- Jeffrey: Store-if not a bad name, but could be better.
- Jeffrey: Conditions for an compare-exchange loop to finish (other thread doesn't modify) this returns and is lock-free. Don't have a use for this without a strict weak-ordering predicate.
- Chandler: Hesitant to call it store-if without a strict weak-ordering predicate. Maybe call std:monotonic_update
- Mark: We seem to disagree as to what this does/is used for. Think it completes in constant time in absence of another thread. E.g. I want to replace the stored value only if it's odd.
- Jeffrey: What's the goal for using this with a no strict-weak ordering predicate?
- Jeffrey: Predicate could be any function. Hence prefer store over update.
- Bronek: conditional-store-strong?
- Jeffrey: Don't have strength of store. Also, not that std library uses if with a predicate.
- Mark: Think this would be naming for a subclass of use-cases - this isn't just for monotonic.
- Jeffrey: Asking for other use-cases - would help us name it correctly.
- Mark: E.g. if 0 put my ID into it.
- Jeffrey: Predicate that says "put my value if not 0" is a strict-weak ordering.
- Chandler: Want to narrow this to the set of use-cases that works, and doesn't have bugs.
- Adam: Maybe specialised version for max and min.
- Jeffrey: store-if-less?
- Lawrence: Fine to have template on all of the atomics. Not sure fetch min/max needed on all of them.
- Jeffrey: Want these as members or helper functions?
- Jeffrey: reason fetch-add is there on atomic was to help compilers.
- Herb: Shouldn't rely on optimization on semantics.
- Herb: Store-if and fetch-add belong on atomic interface -they aren't algorithms.
- Chandler: First issue is can we turn special cases of this into a single instruction on certain compilers/CPUs. Second issue is do we want precise hand-written code for special cases.
- Chandler: Don't want to rely on template-specialisation for optimisation.
- Lawrence: WE have programmer reasons for special min/max impls. (I.e. special names for these functions.)
- Jeffrey: atomic_compare_exchange_weak an example of a free function working on shared pointer
- Hans: Should write this up as a standalone algorithm that also works on shared pointer. Should also have memory-order argument.
- Jeffrey: Shared pointer has explicit atomicity - should do this here for consistency.
- Hans: Should have free functions atomic_store_if, store_if_explicit.
-
N3678: C++ Stream Guards, Lawrence presenting:
- Lawrence: Stream guard paper sent to library committee, who were unhappy. Basic idea - can set up stream guard and write to it. Implementation either buffers output, or locks a mutex. Problem is recursion: Mixing mutexes and buffers would give different output depending on impl. Another proposal to try and fix race conditions, but this failed So we currently have no solution for concurrent output.
- Lawrence: suggest that stream guard is best approach. Apart from contention has no real issues.
- Hans: Agreed - we need something, and this seems to be the fallback.
- Lawrence: Problem with buffering is we have multiple buffers attached to a stream.
- Torvald: Can you flatten the buffers?
- Lawrence: if in same thread, probably. If not same thread, what?
- Torvald: Yes, that's a problem. Would that be a real issue?
- Lawrence: Still have effects if you mix them - see recursive locking section.
- Lawrence: If we want reliable flattening, need unbounded buffers.
- Torvald: If we offer to grab mutex when buffer gets too big?
- Lawrence: Then we get interleaving. Once you go into the buffer you're replicating the implelentation cost of streams.
- Torvald: Why separate stream mutex and stream guard?
- Lawrence: 3 proposals at Bristol. 1) Stream mutex (no buffer), 2) Stream guard - buffering, plus grabs mutex. 3) Weaker approach extending the granualrity of the buffering.
- Lawrence: Consider guard and mutex as two separate proposals. Was never intent to have both adopted. (Intent is to solve basic output problem.) Mutex rejected because of contention. But guard still needs a mutext eventually.
- Jeffrey: Have we considered a stream buffer that just does a flush on destruction. Do we have a guarantee that a single write call is atomic?
- Lawrence: Need to check. Should be atomic with e.g. stdout.
- Lawrence: If buffering is explicit (e.g. writing to stringstream) then no surprises.
- Lawrence: Stream mutex seems to solve problem. If you're doing that much I/O then you have other problems.
- Chandler: Concerns about side-effects of stream mutex - suppose code relies (perhaps unknowingly) on this. You turn off logging and your code starts breaking. (Or you run your code through a race detector and it says you're fine. Then you switch libraries and, because of the lack of happens-before relationships, you're now full of races.)
- Lawrence: Could live with explicit buffering. Implicit buffering causes trouble. State bits need specifying an implementing. Make sure each thread has exactly one buffer.
- Lawrence: Can we ask for an atomic I/O array dump?
- Chandler: On Linux definitely. Dinkumware would be harder. Would need to check about Posix & Apple.
- Lawrence: If we have an explicit buffer we can get around many of these issues.
- Hans: What state bits do we need to worry a out?
- Lawrence: E.g. base of outputted numbers.
- Clark: Thought that LWG objection was that if you had both mutax and buffering we'd have issues. If proposal was specific it would have been more acceptable.
- Conclusion: Explicit buffering seems like the best approach. Implicit buffering is confusing, stream mutexes have conention issues and Chandler's concernn about races.
Next Steps:
-
- Herb: Note that we expect to have a Library Extensions paper for Chicago. Also include vector "loops" (algorithms) as a library. Task-based parallelism should also be showing up.
- Chandler: Might want to split into multiple papers?
- Herb: Some concepts related.
- Michael: Is task-based the same as 'fork-join' parallelism?
- Chandler: Suggest a concurrency/synchronisation TS combining Futures, Executors, Resumable Functions. Another on data structures - queues, collections, counters. Finally an algorithm-focussed TS, MapReduce, Pipelines. Finally (?) a TS for fine-grained parallelism, e,g, Vector Loops, Cilk-based parallelism. Other stuff should be ready for their own TSes: Latches/Barriers, Queues.
- Pablo: Suggest executors as their own.
- Chandler: Executors straddle concurrency and parallelism - thinks Executors really need to go with Futures and Resumables.
- Herb: Futures/Resumables blocked by Executors.
- Jeffrey: Suggest splitting into sections.
- Chandler: Can allow vendors to say we support library but not language parts of TS.
- Lawrence: Concern about splitting algorithms and data structures, given the way that C++ std lib combines them. (I.e. uses iterators as inputs to algorithms.) We might lose something if we split TSes.
- Jeffrey - use concurrent vector as an example.
- Chandler: Note that these data structures are for concurrency, not parallelism.
- Herb: Can we invent parallelism, concurrency and sharing/synchronisation buckets? Then put stuff in that.
After various discussions we indentified Concurrency and Parallelism TSes. The Synchronisation TS is small, and may be rolled into Concurrency.
Concurrency TS
Part 1: (Publish First, 2014?)
- Future Extensions (then, wait_any, wait_all)
- Executors
- Resumable Functions, await (with futures)
Part 2:
- Counters
- Queues
- Concurrent Vector
- Unordered Associatve Containers
(Possibly in a separate Synchronisation TS?)
- Latches and Barriers
- upgrade_lock?
Parallelism TS, possibly with any of first 3?
- Parallel Algorithms (iterators, splittable range?)
- Data-Based Parallelism. (Vector, SIMD, ...)
- Task-based parallelism (cilk, fork-join)