[Concurrency] async/await + actors


(Pierre Habouzit) #21

My currently not very well formed opinion on this subject is that GCD queues are just what you need with these possibilities:
- this Actor queue can be targeted at other queues by the developer when they mean for these actors to execute in an existing execution context / locking domain,
- we disallow Actors from ever being targeted directly at GCD global concurrent queues,
- for the other ones we create a new abstraction with stronger and better guarantees (typically limiting the number of possible threads servicing actors to a low number, not greater than NCPU).

Is there a specific important use case for being able to target an actor to an existing queue? Are you looking for advanced patterns where multiple actors (each providing disjoint mutable state) share an underlying queue? Would this be for performance reasons, for compatibility with existing code, or something else?

Mostly for interaction with current designs where being on a given bottom serial queue gives you the locking context for resources naturally attached to it.

Ok. I don’t understand the use-case well enough to know how we should model this. For example, is it important for an actor to be able to change its queue dynamically as it goes (something that sounds really scary to me) or can the “queue to use” be specified at actor initialization time?

I think I need to read more on actors, because the same way you're not an OS runtime expert, I'm not (or rather no longer, I started down that path a lifetime ago) a language expert at all, and I feel like I need to understand your world better to try to explain this part better to you.

No worries. Actually, after thinking about it a bit, I don’t think that switching underlying queues at runtime is scary.

The important semantic invariant which must be maintained is that there is only one thread executing within an actor context at a time. Switching around underlying queues (or even having multiple actors on the same queue) shouldn’t be a problem.

OTOH, you don’t want an actor “listening” to two unrelated queues, because there is nothing to synchronize between the queues, and you could have multiple actor methods invoked at the same time: you lose the protection of a single serial queue.

The only concern I’d have with an actor switching queues at runtime is that you don’t want a race condition where an item on QueueA goes to the actor, then it switches to QueueB, then another item from QueueB runs while the actor is already doing something for QueueA.

I think what you said made sense.

Ok, I captured this in yet-another speculative section:
https://gist.github.com/lattner/31ed37682ef1576b16bca1432ea9f782#intra-actor-concurrency

Great. BTW I agree 100% with:

That said, this is definitely a power-user feature, and we should understand, build, and get experience using the basic system before considering adding something like this.

Private concurrent queues are not a success in dispatch and cause several issues: these queues are second-class citizens in GCD in terms of the features they support, and building something with concurrency *within* them is hard. I would keep it as "that's where we'll go some day" but not attempt it until we've built the simpler (or rather less hard) purely serial case first.

Right, I agree this is not important for the short term. To clarify though, I meant to indicate that these actors would be implemented completely independently of dispatch, not that they’d build on private concurrent queues.

Another problem I haven't touched either is kernel-issued events (inbound IPC from other processes, networking events, etc...). Dispatch for the longest time used an indirection through a manager thread for all such events, and that had two major issues:

- the thread hops it caused: networking workloads would utilize up to 15-20% more CPU time than an equivalent manually made pthread parked in kevent(), because networking, even when busy, idles back all the time as far as the CPU is concerned, so dispatch queues never stay hot, and the context switch is not only a scheduled context switch but also carries the cost of bringing a thread up

- if you deliver all possible events this way, you also deliver events that cannot possibly make progress because the execution context that will handle them is already "locked" (as in busy running something else).

It took us several years to get to the point we presented at WWDC this year where we deliver events directly to the right dispatch queue. If you only have very anonymous execution contexts then all this machinery is wasted and unused. However, this machinery has been evaluated and saves full percents of CPU load system-wide. I'd hate for us to go back 5 years here.

I don’t have anything intelligent to say here, but it sounds like you understand the issues well :slight_smile: I agree that giving up 5 years of progress is not appealing.

TBH our team needs to explain in more depth how eventing works on Darwin; the same way we (or maybe it's just me, I don't want to disparage my colleagues here :P) need to understand Actors and what they mean better, I think the Swift core team (and whoever works on concurrency) needs to be able to understand what I explained above.

Makes sense. In Swift 6 or 7 or whenever actors are a plausible release goal, a lot of smart people will need to come together to scrutinize all the details. Iteration and improvement over the course of a release cycle is also sensible.

The point of the document is to provide a long term vision of where things could go, primarily to unblock progress on async/await in the short term. For a very long time now, the objection to doing anything with async/await has been that people don’t feel that they know how and whether async/await would fit into the long term model for Swift concurrency. Indeed, the discussions around getting “context” into the async/await design are a concrete example of how considering the long term direction is important to help shape the immediate steps.

That said, we’re still a ways off from actually implementing an actor model, so we have some time to sort it out.

(2) is what scares me: the kernel has stuff to deliver, and kernels don't like to hold on to data on behalf of userspace forever, because this burns wired memory. This means that this doesn't quite play nice with a global anonymous pool that can be starved at any time. Especially if we're talking XPC connections in a daemon, where super-high-priority clients such as the frontmost app (or even worse, SpringBoard) can ask you questions that you'd better answer as fast as you can.

The solution we recommend to developers (1st and 3rd party) is to root all the XPC connections for your clients at the same bottom queue, which represents your "communication" subsystem; you can imagine it being the "Incoming request multiplexer" Actor or something. This one is in the 2nd category and is known to the kernel, so that the kernel can instantiate the Actor itself without asking userspace for permission, directly make the execution context, and rely on the scheduler to get the priorities right.

Thanks for the explanation, I understand a lot better what you’re talking about now. To me, this sounds like the concern of a user space framework author (e.g. the folks writing Kitura), not the users of the frameworks. As such, I have no problem with making it “more difficult” to set up the right #2 abstractions.

I strongly disagree with this for several reasons.

Some Frameworks have a strong need for a serial context, some don't

It completely depends on the framework. If your framework is, say, a networking subsystem, which is very asynchronous by nature, then yes, having the framework set up a #2 kind of guy inside it and have callbacks from/to this isolated context is just fine (and incidentally what your networking stack does).

However, for some frameworks it makes very little sense to do this; they're better served using the "location" provided by their client and having some internal synchronization (locks) for the shared state they have. Too much framework code today creates its own #2 queue (if not queue*s*) all the time out of fear of being "blocked" by the client, but this leads to terrible performance.

[ disclaimer: I don't know whether Security.framework works this way or not, this is a hypothetical ]

For example, if you're using Security.framework stuff (which requires some state, such as, say, your current security ephemeral keys and whatnot), using a private context instead of the caller's is really terribly bad because it causes tons of context switches: such a framework should really *not* use a context itself, but a traditional lock to protect global state. The reason here is that the global state is really just a few keys and mutable contexts, but the big part of the work is the CPU time to (de)cipher, and you really want to parallelize as much as you can here; the shared state is not reason enough to hop.

It is tempting to say that we could still use a private queue to hop through to get the shared state and back to the caller. That'd be great if the caller would tail-call into the async to the Security framework and allow the runtime to do a lightweight switch to the other queue, and then back. The problem is that real-life code never does that: it will rarely tail-call into the async (though with Swift async/await it would), but more importantly there's other stuff on the caller's context, so the OS will want to continue executing that, and then you will inevitably ask for a thread to drain that Security.framework async.

In our experience, the runtime can never optimize this Security async pattern to the point of not using an extra thread for the Security work.

Top level contexts are a fundamental part of App (process) design

It is actually way better for the app developer to decide what the subsystems of the app are, and create well-known #2 contexts for these. In our WWDC talk we took the hypothetical example of News.app, which fetches stuff from RSS feeds, has a database to know what to fetch and what you've read, the UI thread, and some networking parts to interact with the internet.

Such an app should upfront create 3 "#2" guys:
- the main thread for UI interactions (this one is made for you obviously)
- the networking handling context
- the database handling context

The flow of most of the app is: UI triggers action, which asks the database subsystem (brain) what to do, which possibly issues networking requests.
When a networking request is finished and the assets have been reassembled on the network handling queue, it passes them back to the database/brain to decide how to redraw the UI, and issues the command to update the UI back to the UI context.

At the OS layer we believe strongly that these 3 places should be made upfront and have strong identities. And it's not an advanced need, it should be made easy. The Advanced need is to have lots of these, and have subsystems that share state that use several of these contexts.

For everything else, I agree this hypothetical News.app can use an anonymous pool or reuse any of the top-level contexts it created, until that creates a scalability problem, in which case by [stress] testing the app you can figure out which new subsystem needs to emerge. For example, maybe in a later version News.app wants beautiful articles and needs to precompute a bunch of things at the time the article is fetched, and that starts to take enough CPU that doing it on the networking context doesn't scale anymore. Then you just create a new top-level "Article Massaging" context and migrate some of the workload there.

Why this manual partitioning?

It is our experience that the runtime cannot figure these partitions out by itself. And it's not only us; like I said earlier, Go can't either.

The runtime can't possibly know about locking domains, what your code may or may not hit (I mean it's equivalent to the termination problem so of course we can't guess it), or just data affinity which on asymmetric platforms can have a significant impact on your speed (NUMA machines, some big.LITTLE stuff, ...).

The default anonymous pool is fine for best effort work, no doubt we need to make it good, but it will never beat carefully partitioned subsystems.

we need to embrace it and explain to people that, everywhere in a traditional POSIX world they would have used a real pthread_create()d thread to perform the work of a given subsystem, they should create one such category #2 bottom queue that represents this thread (and make this subsystem an Actor),

Makes sense. This sounds like a great opportunity for actors to push the world even farther towards sensible designs, rather than cargo culting the old threads+channels model.

It is, and this is exactly why I focus on your proposal a lot; I see a ton of value in it that goes way beyond the expressiveness of the language.

Also, I think we should strongly encourage pure async “fire and forget” actor methods anyway - IOW, we should encourage push, not pull

I almost agree. We should strongly encourage the `pure async "account for, fire and forget" actor methods`. The `account for` is really backpressure, where you actually don't fire if the remote queue is full and instead rely on some kind of reactive pattern to pull from you. (But I know you wrote that in your proposal and you're aware of it.)

Yep, I was trying to get across the developer mindset of “push, not pull” when it comes to decomposing problems and setting up the actor graph.

I think that - done right - the remote queue API can be done in a way where it looks like you’re writing naturally “push” code, but that the API takes care of making the right thing happen.

- since they provide much stronger guarantees in general.

It depends which guarantees you're talking about. I don't think this statement is true. Async work has good and strong properties when you write code in the "normal" priority ranges, what we refer to as "in the QoS world" on Darwin (from background up to UI work).

"stronger guarantees” is probably not the right way to express this. I’m talking about things like “if you don’t wait, it is much harder to create deadlocks”. Many problems are event-driven or streaming, which are naturally push. I can’t explain why I think this, but it seems to me that push designs encourage more functional approaches, but pull designs tend to be more imperative/stateful. The later feels familiar, but encourages the classical bugs we’re all used to :slight_smile:

However, there are tons of snowflakes on any platform that can't be in that world:
- media rendering (video/audio)
- HID (touch, gesture recognition, keyboards, mice, trackpads, ...)
- some use cases of networking (bluetooth is a very good example, you hate when your audio drops with your bluetooth headset don't you?)
- ...

And these use cases are many, and run in otherwise regular processes all the time.

I think there is some misunderstanding here. I’m not saying that sync is bad, I’m only talking about the default abstraction and design patterns that people should reach for first.

The general design I'm shooting for here is to provide a default abstraction that works 80%+ of the time, giving developers a natural first step to reach for when they build their code. However, any single abstraction will have limitations and problems in some use cases, and some of those snowflakes are SO important (media is a great example) that it isn't acceptable to take any hit. This is why I think it is just as important to have an escape hatch. The biggest escape hatches we've talked about are multithreaded actors, but folks could also simply "not use actors" if they aren't solving problems for them.

Swift aims to be pragmatic, not dogmatic. If you’re implementing a media decoder, write the thing in assembly if you want. My feelings won’t be hurt :slight_smile:

My concern was not about how they write their code; for all I care, they could use any language. It's how they interact with the Swift world that I'm worried about.

Assuming these subsystems exist already and are implemented, it is our experience that it is completely impractical to ask these subsystems to never interact with the rest of the world except through very gated interfaces. Eventually they need to use some kind of common/shared infrastructure, whether it's logging, some security/DRM decoding thing that needs to delegate to the SEP or some daemon, etc..., and some of these generic OS layers would likely, with time, use Swift Actors.

Since await is an asynchronous wait (IOW, as my C-addicted brain translates it, equivalent to dispatch_group_notify(group, queue, ^{ tell me when what I'm 'waiting' on is done please })), that doesn't fly.
Those subsystems need to block synchronously, with wait (no "a"), on a given Actor.

However, it is a fact of life that these subsystems have to interact with generic subsystems sometimes, and that means they need to be able to synchronously wait on an actor, so that this actor's priority is elevated. And you can't wave this off; there are tons of legitimate reasons for very-high-priority subsystems to have to interact with and wait on regular-priority work.

I understand completely, which is why synchronous waiting is part of the model. Despite what I say above, I really don’t want people to avoid actors or write their code in assembly. :slight_smile:

My point about the push model is that it seems like the right *default* for people to reach for, not that it should be the only or exclusive mechanic proposed. This is one reason why I think it is important to introduce async/await before actors: so we have the right mechanic to build this waiting on top of.

I 100% agree with you that if *everything* was asynchronous and written this way, our lives would be great. I don't however think it's possible on real life operating system to write all your code this way. And this is exactly where things start to be *very* messy.

+1, again, this pragmatism is exactly why the proposal describes actor methods returning values, even though it is not part of the standard actor calculus that academia discusses:
https://gist.github.com/lattner/31ed37682ef1576b16bca1432ea9f782#extending-the-model-through-await

If you have some suggestion for how I can clarify the writing to address the apparent confusion here, please let me know and I’ll be happy to fix it.

Unless I misunderstood what await is dramatically, then I don't see where your write up addresses synchronous waiting anywhere yet.
Or is it that await turns into a synchronous wait if the function you're awaiting from is not an actor function? that would seem confusing to me.

-Pierre

···

On Sep 4, 2017, at 10:36 AM, Chris Lattner via swift-evolution <swift-evolution@swift.org> wrote:
On Sep 3, 2017, at 12:44 PM, Pierre Habouzit <phabouzit@apple.com <mailto:phabouzit@apple.com>> wrote:


(Pierre Habouzit) #22

Hello,

I have a little question about the actors.

On WWDC 2012 Session 712 one of the most important tips (for me at least) was: Improve Performance with Reader-Writer Access

Basically:
• Use concurrent subsystem queue: DISPATCH_QUEUE_CONCURRENT
• Use synchronous concurrent “reads”: dispatch_sync()
• Use asynchronous serialized “writes”: dispatch_barrier_async()

Example:
// ...
   _someManagerQueue = dispatch_queue_create("SomeManager", DISPATCH_QUEUE_CONCURRENT);
// ...

And then:

- (id)getSomeArrayItem:(NSUInteger)index {
    __block id importantObj = nil;   // __block so the block assigns the outer variable
    dispatch_sync(_someManagerQueue, ^{
        importantObj = [_importantArray objectAtIndex:index];
    });
    return importantObj;
}
- (void)removeSomeArrayItem:(id)object {
    dispatch_barrier_async(_someManagerQueue, ^{
        [_importantArray removeObject:object];
    });
}
- (void)addSomeArrayItem:(id)object {
    dispatch_barrier_async(_someManagerQueue, ^{
        [_importantArray addObject:object];
    });
}

That way you ensure that whenever you read a piece of information (e.g. an array), all the "changes" have been made or are "waiting". And every time you write a piece of information, your program will not be blocked waiting for the operation to be completed.

That way, if you use several threads, none will have to wait for another to get any value unless one of them is "writing", which is the right thing to do.

How will this be composed using actors? I see a lot of discussion about using serial queues, and I have not seen any mechanism similar to dispatch_barrier_async being discussed here or in other threads.

Actors are serial and exclusive, so this concurrent queue thing is not relevant.
Also, in the QoS world, using reader-writer locks or private concurrent queues this way is not terribly great.
Lastly, for a simple writer like that you want dispatch_barrier_sync(), not async (async will create a thread, and that's terribly wasteful for so little work).

We covered these subtleties in this year's WWDC GCD session.

-Pierre

···

On Sep 4, 2017, at 7:27 AM, Wallacy via swift-evolution <swift-evolution@swift.org> wrote:

On Mon, Sep 4, 2017 at 08:20, Daniel Vollmer via swift-evolution <swift-evolution@swift.org <mailto:swift-evolution@swift.org>> wrote:
Hello,

first off, I’m following this discussion with great interest, even though my background (simulation software on HPC) has a different focus than the “usual” paradigms Swift seeks to (primarily) address.

> On 3. Sep 2017, at 19:26, Chris Lattner via swift-evolution <swift-evolution@swift.org <mailto:swift-evolution@swift.org>> wrote:
>> On Sep 2, 2017, at 11:09 PM, Pierre Habouzit <phabouzit@apple.com <mailto:phabouzit@apple.com>> wrote:
>>> On Sep 2, 2017, at 12:19 PM, Pierre Habouzit <pierre@habouzit.net <mailto:pierre@habouzit.net>> wrote:
>>>
>>> Is there a specific important use case for being able to target an actor to an existing queue? Are you looking for advanced patterns where multiple actors (each providing disjoint mutable state) share an underlying queue? Would this be for performance reasons, for compatibility with existing code, or something else?
>>
>> Mostly for interaction with current designs where being on a given bottom serial queue gives you the locking context for resources naturally attached to it.
>
> Ok. I don’t understand the use-case well enough to know how we should model this. For example, is it important for an actor to be able to change its queue dynamically as it goes (something that sounds really scary to me) or can the “queue to use” be specified at actor initialization time?

I’m confused, but that may just be me misunderstanding things again. I’d assume each actor has its own (serial) queue that is used to serialize its messages, so the queue above refers to the queue used to actually process the messages the actor receives, correct?

Sometimes it'd probably make sense (or even be required) to fix this to a certain queue (in the thread(-pool?) sense), but at other times it may just make sense to execute the messages in-place by the sender if they don't block, so that no context switch is incurred.

> One plausible way to model this is to say that it is a “multithreaded actor” of some sort, where the innards of the actor allow arbitrary number of client threads to call into it concurrently. The onus would be on the implementor of the NIC or database to implement the proper synchronization on the mutable state within the actor.
>>
>> I think what you said made sense.
>
> Ok, I captured this in yet-another speculative section:
> https://gist.github.com/lattner/31ed37682ef1576b16bca1432ea9f782#intra-actor-concurrency

This seems like an interesting extension (where the actor-internal serial queue is not used / bypassed).

        Daniel.
_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org <mailto:swift-evolution@swift.org>
https://lists.swift.org/mailman/listinfo/swift-evolution


(Pierre Habouzit) #23

-Pierre

Sometimes it'd probably make sense (or even be required) to fix this to a certain queue (in the thread(-pool?) sense), but at other times it may just make sense to execute the messages in-place by the sender if they don't block, so that no context switch is incurred.

Do you mean kernel context switch? With well behaved actors, the runtime should be able to run work items from many different queues on the same kernel thread. The “queue switch cost” is designed to be very very low. The key thing is that the runtime needs to know when work on a queue gets blocked so the kernel thread can move on to servicing some other queues work.

My understanding is that a kernel thread can't move on to servicing a different queue while a block is executing on it. The runtime already knows when a queue is blocked, and the only way it has to mitigate the problem is to spawn another kernel thread to service the other queues. This is what causes the kernel-thread explosion.

I’m not sure what you mean by “executing on it”. A work item that currently has a kernel thread can be doing one of two things: “executing work” (like number crunching) or “being blocked in the kernel on something that GCD doesn’t know about”.

However, the whole point is that work items shouldn’t do this: as you say it causes thread explosions. It is better for them to yield control back to GCD, which allows GCD to use the kernel thread for other queues, even though the original *queue* is blocked.

You're forgetting two things:

First off, when the work item stops doing work and gives up control, the kernel thread doesn't become instantaneously available. If you want the thread to be reusable to execute some asynchronously waited on work that the actor is handling, then you have to make sure to defer scheduling this work until the thread is in a reusable state.

Second, there may be other work enqueued already in this context, in which case, even if the current work item yields, what it's waiting on will create a new thread because the current context is used.

The first issue is something we can optimize (despite GCD not doing it) with tons of techniques, so let's not rathole into a discussion on it.
The second one is not something we can "fix". There will be cases when the correct thing to do is to linearize, and cases when it's not. And you can't know upfront what the right decision is.

Something else I realized is that this code is fundamentally broken in Swift:

actor func foo() {
    let lock = NSLock()
    lock.lock()

    let compute = await someCompute() // <-- this really breaks `foo` into two pieces of code that can execute on two different physical threads
    lock.unlock()
}

The reason why it is broken is that mutexes (whether NSLock, pthread_mutex, or os_unfair_lock) have to be unlocked from the same thread that took them. The await right in the middle here means that we can't guarantee that.

There are numerous primitives that can't be used across an await call in this way:
- things that use the calling context's identity in some object (such as locks, mutexes, ...)
- anything that attaches data to the context (TSDs)

The things in the first category probably have to be typed in a way that using them across an async or await is disallowed at compile time.
The things in the second category are Actor-unsafe and need to move to other ways of doing the same.

-Pierre

···

On Sep 4, 2017, at 9:10 AM, Chris Lattner via swift-evolution <swift-evolution@swift.org> wrote:

On Sep 4, 2017, at 9:05 AM, Jean-Daniel <mailing@xenonium.com <mailto:mailing@xenonium.com>> wrote:


(Gwendal Roué) #24

I tend to believe that such read/write optimization could at least be implemented using the "Intra-actor concurrency" described by Chris Lattner at https://gist.github.com/lattner/31ed37682ef1576b16bca1432ea9f782#intra-actor-concurrency.

But you generally ask the question of reader vs. writer actor methods, that could be backed by dispatch_xxx/dispatch_barrier_xxx. I'm not sure it's as simple as mutating vs. non-mutating. For example, a non-mutating method can still cache the result of some expensive computation without breaking the non-mutating contract. Unless this cache is itself a read/write-safe actor, such non-mutating method is not a real reader method.

That's a very interesting topic, Wallacy!

Gwendal

···

On Sep 4, 2017, at 16:28, Wallacy via swift-evolution <swift-evolution@swift.org> wrote:

Hello,

I have a little question about the actors.

On WWDC 2012 Session 712 one of the most important tips (for me at least) was: Improve Performance with Reader-Writer Access

Basically:
• Use concurrent subsystem queue: DISPATCH_QUEUE_CONCURRENT
• Use synchronous concurrent “reads”: dispatch_sync()
• Use asynchronous serialized “writes”: dispatch_barrier_async()

[...]

With this will it be composed using actors? I see a lot of discussion about using serial queues, and I also have not seen any mechanism similar to dispatch_barrier_async being discussed here or in other threads.


(Wallacy) #25

"Actors are serial and exclusive, so this concurrent queue thing is not relevant."

Always? That is something I can't understand. The proposal actually cites "Intra-actor concurrency".

"Also, in the QoS world, using reader-writer locks or private concurrent queues this way is not terribly great."

This I understand, makes sense.

"Lastly, for a simple writer like that you want dispatch_barrier_sync(), not async (async will create a thread and it's terribly wasteful for so little work)."

Yes, dispatch_barrier_sync makes more sense here...

My point is:

The proposal already defines something like actor var, in other words "a special kind of var", and "Improve Performance with Reader-Writer Access" is not only a "special case" in the concurrency world; if done the right way, it is the only reasonable way to use a "class variable" (an actor is a special class, right?) in a multithreaded environment. If I'm not wrong (again), queues (concurrent/serial) help with the "lock hell" problem.

It is just something to be considered before a final model is defined, thus avoiding a big refactoring in the future to solve something that was not considered now.

It's okay to start small; I'm just trying to better visualize what may be necessary in the future, to make sure that what is done now will be compatible with it.

Thanks.

···

Em seg, 4 de set de 2017 às 16:06, Pierre Habouzit <phabouzit@apple.com> escreveu:

On Sep 4, 2017, at 7:27 AM, Wallacy via swift-evolution < > swift-evolution@swift.org> wrote:

Hello,

I have a little question about the actors.

On WWDC 2012 Session 712 one of the most important tips (for me at least)
was: Improve Performance with Reader-Writer Access

Basically:
• Use concurrent subsystem queue: DISPATCH_QUEUE_CONCURRENT
• Use synchronous concurrent “reads”: dispatch_sync()
• Use asynchronous serialized “writes”: dispatch_barrier_async()

Example:

// ...
   _someManagerQueue = dispatch_queue_create("SomeManager", DISPATCH_QUEUE_CONCURRENT);// ...

And then:

- (id) getSomeArrayItem:(NSUInteger) index {
    id importantObj = NULL;
    dispatch_sync(_someManagerQueue,^{
        id importantObj = [_importantArray objectAtIndex:index];
     });
   return importantObj;
}- (void) removeSomeArrayItem:(id) object {
     dispatch_barrier_async(_someManagerQueue,^{
         [_importantArray removeObject:object];
     });
}- (void) addSomeArrayItem:(id) object {
     dispatch_barrier_async(_someManagerQueue,^{
         [_importantArray addObject:object];
     });
}

That way you ensure that whenever you read a piece of information (e.g. an
array), all the "changes" have either been made or are "waiting". And every
time you write a piece of information, your program will not be blocked
waiting for the operation to be completed.

That way, if you use several threads, none has to wait for another to get
any value unless one of them is "writing", which is the right thing to do.

With this will it be composed using actors? I see a lot of discussion
about using serial queues, and I also have not seen any mechanism similar
to dispatch_barrier_async being discussed here or in other threads.

Actors are serial and exclusive, so this concurrent queue thing is not
relevant.
Also, in the QoS world, using reader-writer locks or private concurrent
queues this way is not terribly great.
Lastly, for a simple writer like that you want dispatch_barrier_sync(), not
async (async will create a thread, which is terribly wasteful for so little
work).

We covered these subtleties in this year's WWDC GCD session.

-Pierre
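As an editorial aside: the reader-writer pattern discussed above, with the dispatch_barrier_sync() correction applied, might be sketched in Swift roughly as follows (the type and names are illustrative, not from the thread):

```swift
import Dispatch

// Sketch of the reader-writer pattern: concurrent reads, barrier-protected
// exclusive writes. Writes use a *synchronous* barrier, per the advice
// above, so no thread is brought up for such small work items.
final class SomeManager {
    private let queue = DispatchQueue(label: "SomeManager", attributes: .concurrent)
    private var items: [String] = []

    func item(at index: Int) -> String {
        // Concurrent read: many readers may execute at once.
        queue.sync { items[index] }
    }

    func add(_ item: String) {
        // Exclusive write: the barrier waits out in-flight reads.
        queue.sync(flags: .barrier) { items.append(item) }
    }

    func remove(_ item: String) {
        queue.sync(flags: .barrier) {
            if let i = items.firstIndex(of: item) { items.remove(at: i) }
        }
    }

    var count: Int { queue.sync { items.count } }
}
```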

On Mon, Sep 4, 2017 at 08:20, Daniel Vollmer via swift-evolution <swift-evolution@swift.org> wrote:

Hello,

first off, I’m following this discussion with great interest, even though
my background (simulation software on HPC) has a different focus than the
“usual” paradigms Swift seeks to (primarily) address.

> On 3. Sep 2017, at 19:26, Chris Lattner via swift-evolution <swift-evolution@swift.org> wrote:
>> On Sep 2, 2017, at 11:09 PM, Pierre Habouzit <phabouzit@apple.com> wrote:
>>> On Sep 2, 2017, at 12:19 PM, Pierre Habouzit <pierre@habouzit.net> wrote:
>>>
>>> Is there a specific important use case for being able to target an
actor to an existing queue? Are you looking for advanced patterns where
multiple actors (each providing disjoint mutable state) share an underlying
queue? Would this be for performance reasons, for compatibility with
existing code, or something else?
>>
>> Mostly for interaction with current designs where being on a given
bottom serial queue gives you the locking context for resources naturally
attached to it.
>
> Ok. I don’t understand the use-case well enough to know how we should
model this. For example, is it important for an actor to be able to change
its queue dynamically as it goes (something that sounds really scary to me)
or can the “queue to use” be specified at actor initialization time?

I’m confused, but that may just be me misunderstanding things again. I’d
assume each actor has its own (serial) queue that is used to serialize its
messages, so the queue above refers to the queue used to actually process
the messages the actor receives, correct?

Sometimes it’d probably make sense (or even be required) to fix this to a
certain queue (in the thread(-pool?) sense), but at other times it may just
make sense to execute the messages in place by the sender if they don’t
block, so that no context switch is incurred.

> One plausible way to model this is to say that it is a “multithreaded
actor” of some sort, where the innards of the actor allow arbitrary number
of client threads to call into it concurrently. The onus would be on the
implementor of the NIC or database to implement the proper synchronization
on the mutable state within the actor.
>>
>> I think what you said made sense.
>
> Ok, I captured this in yet-another speculative section:
>
https://gist.github.com/lattner/31ed37682ef1576b16bca1432ea9f782#intra-actor-concurrency

This seems like an interesting extension (where the actor-internal serial
queue is not used / bypassed).

        Daniel.
_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org
https://lists.swift.org/mailman/listinfo/swift-evolution



(Chris Lattner) #26

Agreed, this is just as broken as:

func foo()
{
    let lock = NSLock()
    lock.lock()

    someCompute {
      // broken: this closure may run on a different thread
      // than the one that called lock()
      lock.unlock()
    }
}

and it is just as broken as trying to do the same thing across queues. Stuff like this, or the use of TLS, is just inherently broken, both with GCD and with any sensible model underlying actors. Trying to fix this is not worth it IMO, it is better to be clear that they are different things and that (as a programmer) you should *expect* your tasks to run on multiple kernel threads.
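To make the safe shape concrete, here is a sketch (with a stand-in someCompute helper, not an API from the thread) where the lock is acquired and released entirely inside the closure, so both calls happen on whichever single thread runs it:

```swift
import Foundation
import Dispatch

var protectedCounter = 0
let counterLock = NSLock()

// Stand-in for an API that runs work on some other thread.
func someCompute(_ body: @escaping () -> Void) {
    DispatchQueue.global().async(execute: body)
}

func fooFixed() {
    someCompute {
        // Safe: lock() and unlock() both execute in this closure,
        // i.e. on the same thread, with no suspension in between.
        counterLock.lock()
        protectedCounter += 1
        counterLock.unlock()
    }
}
```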

BTW, why are you using a lock in a single threaded context in the first place??? :wink:

-Chris

···

On Sep 4, 2017, at 12:18 PM, Pierre Habouzit <phabouzit@apple.com> wrote:

Something else I realized, is that this code is fundamentally broken in swift:

actor func foo()
{
    let lock = NSLock()
    lock.lock()

    let compute = await someCompute() // <-- this really breaks `foo` into two pieces
                                      //     of code that can execute on two different
                                      //     physical threads.
    lock.unlock()
}

The reason why it is broken is that mutexes (whether NSLock, pthread_mutex, or os_unfair_lock) have to be unlocked from the same thread that locked them. The await right in the middle here means that we can't guarantee it.


(Chris Lattner) #27

Hello,

I have a little question about the actors.

On WWDC 2012 Session 712 one of the most important tips (for me at least) was: Improve Performance with Reader-Writer Access

Basically:
• Use concurrent subsystem queue: DISPATCH_QUEUE_CONCURRENT
• Use synchronous concurrent “reads”: dispatch_sync()
• Use asynchronous serialized “writes”: dispatch_barrier_async()

[...]

With this will it be composed using actors? I see a lot of discussion about using serial queues, and I also have not seen any mechanism similar to dispatch_barrier_async being discussed here or in other threads.

I tend to believe that such read/write optimization could at least be implemented using the "Intra-actor concurrency" described by Chris Lattner at https://gist.github.com/lattner/31ed37682ef1576b16bca1432ea9f782#intra-actor-concurrency.

Right.

But you raise the general question of reader vs. writer actor methods, which could be backed by dispatch_xxx/dispatch_barrier_xxx. I'm not sure it's as simple as mutating vs. non-mutating. For example, a non-mutating method can still cache the result of some expensive computation without breaking the non-mutating contract. Unless this cache is itself a read/write-safe actor, such a non-mutating method is not a real reader method.

Right. A further concern is that while it is possible to encode reader/writer concerns into the type system, it makes the system as a whole more complicated. I’m really trying to start with an intentionally very simple model, because the following years of feature creep will provide ample opportunity to look at the problems that occur in practice and address them. Preemptively trying to solve problems that may not manifest in practice (or that occur in different ways) can lead to designing solutions for problems that don’t exist, or designing the wrong solution.

-Chris
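Gwendal's caching scenario can be made concrete with a hypothetical type (not from the thread): it looks read-only from the outside yet mutates a cache, so without internal synchronization two concurrent "readers" would race:

```swift
import Foundation

// Hypothetical "reader that isn't really a reader": the accessor is
// non-mutating in spirit but writes to a cache, so it needs a lock.
final class ExpensiveTable {
    private var cache: [Int: Int] = [:]
    private let lock = NSLock()

    func value(for key: Int) -> Int {
        lock.lock()
        defer { lock.unlock() }
        if let cached = cache[key] { return cached }
        let result = key * key // stand-in for an expensive computation
        cache[key] = result    // hidden mutation inside a "read"
        return result
    }
}
```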

···

On Sep 4, 2017, at 7:53 AM, Gwendal Roué via swift-evolution <swift-evolution@swift.org> wrote:

On Sep 4, 2017, at 16:28, Wallacy via swift-evolution <swift-evolution@swift.org> wrote:


(Pierre Habouzit) #28

"Actors are serial and exclusive, so this concurrent queue thing is not relevant."

Always? That is something i can't understand. The proposal actually cites the "Intra-actor concurrency"

As a future extension yes, I don't think we should rush there :wink:
Dispatch has clearly failed at making intra-queue concurrency a first class citizen atm.

···

On Sep 5, 2017, at 9:29 AM, Wallacy via swift-evolution <swift-evolution@swift.org> wrote:

"Also, int he QoS world, using reader writer locks or private concurrent queues this way is not terribly great."

This I understand, makes sense.

"lastly for a simple writer like that you want dispatch_barrier_sync() not async (async will create a thread and it's terribly wasteful for so little work)."

Yes, dispatch_barrier_sync makes more sense here...

My point is:

The proposal already defines something like an actor var, in other words "a special kind of var", and "Improve Performance with Reader-Writer Access" is not only a "special case" in the concurrency world but, if done in the right way, the only reasonable way to use a "class variable" (an actor is a special kind of class, right?) in a multithreaded environment. If I'm not wrong (again), queues (concurrent/serial) help with the "lock hell" problem.

It is only something to be considered before the final model is defined, so as to avoid needing a big refactor in the future to solve something that was not considered now.

It's okay to start small; I'm just trying to visualize what may be necessary in the future, to make sure that what is done now will be compatible with it.

Thanks.



(Wallacy) #29

Fair enough! Thanks!

···

On Tue, Sep 5, 2017 at 13:48, Pierre Habouzit <phabouzit@apple.com> wrote:

On Sep 5, 2017, at 9:29 AM, Wallacy via swift-evolution <swift-evolution@swift.org> wrote:

"Actors are serial and exclusive, so this concurrent queue thing is not
relevant."

Always? That is something i can't understand. The proposal actually cites
the "Intra-actor concurrency"

As a future extension yes, I don't think we should rush there :wink:
Dispatch has clearly failed at making intra-queue concurrency a first
class citizen atm.

"Also, int he QoS world, using reader writer locks or private concurrent
queues this way is not terribly great."

This I understand, makes sense.

"lastly for a simple writer like that you want dispatch_barrier_sync() not
async (async will create a thread and it's terribly wasteful for so little
work)."

Yes, dispatch_barrier_sync makes more sense here...

My point is:

The proposal already defines something like an actor var, in other words
"a special kind of var", and "Improve Performance with Reader-Writer
Access" is not only a "special case" in the concurrency world but, if done
in the right way, the only reasonable way to use a "class variable" (an
actor is a special kind of class, right?) in a multithreaded environment.
If I'm not wrong (again), queues (concurrent/serial) help with the "lock
hell" problem.

It is only something to be considered before the final model is defined,
so as to avoid needing a big refactor in the future to solve something
that was not considered now.

It's okay to start small; I'm just trying to visualize what may be
necessary in the future, to make sure that what is done now will be
compatible with it.

Thanks.



(Elliott Harris) #30

My currently not very well formed opinion on this subject is that GCD queues are just what you need with these possibilities:
- this Actor queue can be targeted at other queues by developers when they mean for these actors to be executed in an existing execution context / locking domain,
- we disallow Actors to be directly targeted to GCD global concurrent queues ever
- for the other ones we create a new abstraction with stronger and better guarantees (typically limiting the number of possible threads servicing actors to a low number, not greater than NCPU).
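The first possibility (targeting an actor's queue at an existing one) corresponds to what GCD queue targeting allows today; a minimal sketch, with illustrative labels:

```swift
import Dispatch

// A per-actor serial queue targeted at an existing "bottom" serial
// queue: the block below runs inside bottomQueue's locking domain,
// mutually exclusive with anything else serialized on bottomQueue.
let bottomQueue = DispatchQueue(label: "com.example.subsystem")
let actorQueue = DispatchQueue(label: "com.example.actorA", target: bottomQueue)

actorQueue.async {
    // ... work that shares bottomQueue's locking context ...
}
```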

Is there a specific important use case for being able to target an actor to an existing queue? Are you looking for advanced patterns where multiple actors (each providing disjoint mutable state) share an underlying queue? Would this be for performance reasons, for compatibility with existing code, or something else?

Mostly for interaction with current designs where being on a given bottom serial queue gives you the locking context for resources naturally attached to it.

Ok. I don’t understand the use-case well enough to know how we should model this. For example, is it important for an actor to be able to change its queue dynamically as it goes (something that sounds really scary to me) or can the “queue to use” be specified at actor initialization time?

I think I need to read more on actors, because the same way you're not an OS runtime expert, I'm not (or rather no longer, I started down that path a lifetime ago) a language expert at all, and I feel like I need to understand your world better to try to explain this part better to you.

No worries. Actually, after thinking about it a bit, I don’t think that switching underlying queues at runtime is scary.

The important semantic invariant which must be maintained is that there is only one thread executing within an actor context at a time. Switching around underlying queues (or even having multiple actors on the same queue) shouldn’t be a problem.

OTOH, you don’t want an actor “listening” to two unrelated queues, because there is nothing to synchronize between the queues, and you could have multiple actor methods invoked at the same time: you lose the protection of a single serial queue.

The only concern I’d have with an actor switching queues at runtime is that you don’t want a race condition where an item on QueueA goes to the actor, then it switches to QueueB, then another item from QueueB runs while the actor is already doing something for QueueA.

I think what you said made sense.

Ok, I captured this in yet-another speculative section:
https://gist.github.com/lattner/31ed37682ef1576b16bca1432ea9f782#intra-actor-concurrency

Great. BTW I agree 100% with:

That said, this is definitely a power-user feature, and we should understand, build, and get experience using the basic system before considering adding something like this.

Private concurrent queues are not a success in dispatch and cause several issues; these queues are second-class citizens in GCD in terms of the features they support, and building something with concurrency *within* is hard. I would keep it as "that's where we'll go some day" but not attempt it until we've built the simpler (or rather less hard) purely serial case first.

Right, I agree this is not important for the short term. To clarify though, I meant to indicate that these actors would be implemented completely independently of dispatch, not that they’d build on private concurrent queues.

Another problem I haven't touched either is kernel-issued events (inbound IPC from other processes, networking events, etc...). Dispatch for the longest time used an indirection through a manager thread for all such events, and that had two major issues:

- the thread hops it caused: networking workloads utilized up to 15-20% more CPU time than an equivalent manually-made pthread parked in kevent(), because networking, even when busy, constantly idles back as far as the CPU is concerned, so dispatch queues never stay hot, and the context switch is not just a scheduled context switch but also carries the cost of a thread bring-up

- if you deliver all possible events this way, you also deliver events that cannot possibly make progress because the execution context that will handle them is already "locked" (as in busy running something else).

It took us several years to get to the point we presented at WWDC this year where we deliver events directly to the right dispatch queue. If you only have very anonymous execution contexts then all this machinery is wasted and unused. However, this machinery has been evaluated and saves full percents of CPU load system-wide. I'd hate for us to go back 5 years here.

I don’t have anything intelligent to say here, but it sounds like you understand the issues well :slight_smile: I agree that giving up 5 years of progress is not appealing.

TBH our team has to explain in more depth how eventing works on Darwin, the same way we (or maybe it's just I, I don't want to disparage my colleagues here :P) need to understand Actors and what they mean better. I think the swift core team (and whoever works on concurrency) needs to be able to understand what I explained above.

Makes sense. In Swift 6 or 7 or whenever actors are a plausible release goal, a lot of smart people will need to come together to scrutinize all the details. Iteration and improvement over the course of a release cycle is also sensible.

The point of the document is to provide a long term vision of where things could go, primarily to unblock progress on async/await in the short term. For a very long time now, the objection to doing anything with async/await has been that people don’t feel that they know how and whether async/await would fit into the long term model for Swift concurrency. Indeed, the discussions around getting “context” into the async/await design are a concrete example of how considering the long term direction is important to help shape the immediate steps.

That said, we’re still a ways off from actually implementing an actor model, so we have some time to sort it out.

(2) is what scares me: the kernel has stuff to deliver, and kernels don't like to hold on to data on behalf of userspace forever, because this burns wired memory. This means that this doesn't play nicely with a global anonymous pool that can be starved at any time. Especially if we're talking about XPC connections in a daemon, where super-high-priority clients such as the frontmost app (or even worse, SpringBoard) can ask you questions that you'd better answer as fast as you can.

The solution we recommend to developers (1st and 3rd party) is to have all the XPC connections for your clients rooted at the same bottom queue, which represents your "communication" subsystem; you can imagine it being the "incoming request multiplexer" Actor or something. This one is in the 2nd category and is known to the kernel, so that the kernel can instantiate the Actor itself without asking permission from userspace, directly make the execution context, and rely on the scheduler to get the priorities right.

Thanks for the explanation, I understand a lot better what you’re talking about now. To me, this sounds like the concern of a user space framework author (e.g. the folks writing Kitura), not the users of the frameworks. As such, I have no problem with making it “more difficult” to set up the right #2 abstractions.

I think Pierre has some really interesting points here, and I wanted to highlight them with some context from a real application / framework. Perhaps it can help a broader audience understand some of these points and help influence the overall design of the concurrency model.

A few years ago, we moved Camera and some related internal frameworks from being pretty naive clients of GCD to embracing the philosophy Pierre has outlined. We learned a lot.

I strongly disagree with this for several reasons.

Some Frameworks have a strong need for a serial context, some don't

It completely depends on the framework. If your framework is, say, a networking subsystem, which is very asynchronous by nature, then yes, having the framework set up a #2 kind of guy inside it and have callbacks from/to this isolated context is just fine (and incidentally what your networking stack does).

However, for some frameworks it makes very little sense to do this; they're better served using the "location" provided by their client and having some internal synchronization (locks) for the shared state they have. Too much framework code today creates its own #2 queue (if not queue*s*) all the time out of fear of being "blocked" by the client, but this leads to terrible performance.

[ disclaimer I don't know that Security.framework works this way or not, this is an hypothetical ]

For example, if you're using Security.framework stuff (which requires some state, such as, say, your current ephemeral security keys and whatnot), using a private context instead of the caller's is really terribly bad because it causes tons of context switches: such a framework should really *not* use a context itself, but a traditional lock to protect global state. The reason here is that the global state is really just a few keys and mutable contexts, but the big part of the work is the CPU time to (de)cipher, and you really want to parallelize as much as you can here; the shared state is not reason enough to hop.

It is tempting to say that we could still use a private queue to hop through to get the shared state and back to the caller, that'd be great if the caller would tail-call into the async to the Security framework and allow for the runtime to do a lightweight switch to the other queue, and then back. The problem is that real life code never does that: it will rarely tail call into the async (though with Swift async/await it would) but more importantly there's other stuff on the caller's context, so the OS will want to continue executing that, and then you will inevitably ask for a thread to drain that Security.framework async.

In our experience, the runtime can never optimize this Security async pattern by never using an extra thread for the Security work.

This was a really common problem for us. We would often, without really realizing it, spin up a thread for very trivial work. In a lot of cases, this was the result of over-zealous asynchronous API design. We were so terrified of blocking, and with GCD we could “very easily” avoid it. We didn’t stop and think about the resource usage of what we were asking the system to do. We didn’t really need a thread to be spun up for us to update a dictionary.

After reading the proposal, I came away wanting a better understanding of when async/await might involve that sort of resource usage. Understandably, that may be more of a runtime question. However, a goal of the proposal was a solution which scaled to millions of tasks. If async/await is proposed and implemented in Swift 5, we will need to tackle this with a solution that does not involve Actors.

Is this something we’re interested in discussing right now? I’m still pretty curious how we practically get to millions of tasks, even though I agree that GCD is up to the challenge. I don’t think it will be on the back of millions of queues though.

Top level contexts are a fundamental part of App (process) design

It is actually way better for the app developer to decide what the subsystems of the app are, and create well-known #2 contexts for these. In our WWDC talk we took the hypothetical example of News.app, which fetches stuff from RSS feeds, has a database to know what to fetch and what you've read, the UI thread, and some networking parts to interact with the internet.

Such an app should upfront create 3 "#2" guys:
- the main thread for UI interactions (this one is made for you obviously)
- the networking handling context
- the database handling context

The flow of most of the app is: the UI triggers an action, which asks the database subsystem (the brain) what to do, which possibly issues networking requests.
When a networking request finishes and the assets have been reassembled on the networking queue, they are passed back to the database/brain, which decides how the UI should change and issues the update commands back to the UI context.

At the OS layer we believe strongly that these 3 places should be made upfront and have strong identities. And it's not an advanced need, it should be made easy. The Advanced need is to have lots of these, and have subsystems that share state that use several of these contexts.
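The three upfront contexts and the UI → database → networking flow described above can be sketched with plain Dispatch. Everything here is a hypothetical illustration: the labels, QoS classes, and `refreshFeed` are invented, and the "UI" queue is passed in by the caller rather than hard-coding the main queue.

```swift
import Dispatch

// Three well-known, fixed serial contexts created up front.
enum AppContexts {
    // The main thread is the UI context and already exists.
    static let networking = DispatchQueue(label: "com.example.news.networking",
                                          qos: .utility)
    static let database = DispatchQueue(label: "com.example.news.database",
                                        qos: .userInitiated)
}

// UI triggers the action; the database context decides what to fetch; the
// networking context fetches and reassembles; results flow back through
// the database context and finally to the UI context.
func refreshFeed(notifyOn uiQueue: DispatchQueue,
                 completion: @escaping ([String]) -> Void) {
    AppContexts.database.async {
        let urls = ["https://example.com/feed.rss"]          // decide what to fetch
        AppContexts.networking.async {
            let articles = urls.map { "article from \($0)" } // fetch + reassemble
            AppContexts.database.async {
                // decide what changed, then issue the UI update
                uiQueue.async { completion(articles) }
            }
        }
    }
}
```

Each hop lands on one of the three named contexts, so every piece of state has an obvious home.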

Having experienced the benefits of embracing this design philosophy, I’m inclined to agree on the overall approach. I’ve had a lot of success with this approach on Darwin, but I’m not that familiar with other platforms. I would think it’s universally applicable, but maybe others could chime in on other approaches?

Actors and CoroutineContext (see: "Contextualizing async coroutines”) both seem to provide some facilities for achieving this design. Maybe it’s enough for one or both of those to encourage this design somehow, if we decide it is worth encouraging.

Pierre, were you thinking something different than either of those two approaches?

For everything else, I agree this hypothetical News.app can use an anonymous pool or reuse any of the top-level contexts it created, until it creates a scalability problem, in which case by [stress] testing the app you can figure out which new subsystem needs to emerge. For example, maybe in a later version News.app wants beautiful articles and needs to precompute a bunch of things at the time an article is fetched, and that starts to take enough CPU that doing it on the networking context doesn't scale anymore. Then you just create a new top-level "Article Massaging" context and migrate some of the workload there.

The level of re-architecture involved in encountering a scalability problem is usually pretty scary. As a rule, you are usually introduced to the problem at the worst possible time too. In our case, it prompted our adoption of the architecture you’re evangelizing.

I think there’s a lot of work to do here. Ideally, it would be just that simple, but those two systems likely grew entangled together while they were in close proximity. A great deal of discipline would have been needed to make that transition easy. Hopefully we can solve this some day.

Personally, I think this leads down a really interesting line of thought about how to construct software, but that’s probably best for another thread or over a drink.

Why this manual partitioning?

It is our experience that the runtime cannot figure these partitions out by itself. And it's not only us; like I said earlier, Go can't either.

The runtime can't possibly know about locking domains, what your code may or may not hit (I mean it's equivalent to the termination problem so of course we can't guess it), or just data affinity which on asymmetric platforms can have a significant impact on your speed (NUMA machines, some big.LITTLE stuff, ...).

The default anonymous pool is fine for best effort work, no doubt we need to make it good, but it will never beat carefully partitioned subsystems.

Spending some time thinking about how we might encourage people to trend toward good concurrent software design as a part of this proposal seems worthwhile. I’m not sure this proposal is quite opinionated enough yet, but I know it’s very early. :slight_smile:

we need to embrace it and explain to people that everywhere in a traditional POSIX world they would have used a real pthread_create()'d thread to perform the work of a given subsystem, they should create one such category #2 bottom queue that represents this thread (and make this subsystem an Actor),

Makes sense. This sounds like a great opportunity for actors to push the world even farther towards sensible designs, rather than cargo culting the old threads+channels model.

It is, and this is exactly why I focus on your proposal a lot, I see a ton of value in it that go way beyond the expressiveness of the language.

Also, I think we should strongly encourage pure async “fire and forget” actor methods anyway - IOW, we should encourage push, not pull

I almost agree. We should strongly encourage the `pure async "account for, fire and forget" actor methods`. The `account for` is really backpressure, where you actually don't fire if the remote queue is full and instead rely on some kind of reactive pattern to pull from you (but I know you wrote about that in your proposal and you're aware of it).
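A minimal sketch of what "account for, fire and forget" could look like: the sender must get a slot before firing, and when the mailbox is full it doesn't fire, instead registering a callback so the consumer can pull from it once there's room. `BoundedMailbox` and its API are entirely hypothetical, not anything from the proposal.

```swift
import Dispatch

// Hypothetical bounded mailbox with backpressure.
final class BoundedMailbox<Message> {
    private let lock = DispatchSemaphore(value: 1)
    private var buffer: [Message] = []
    private var onSpaceAvailable: (() -> Void)?
    private let capacity: Int

    init(capacity: Int) { self.capacity = capacity }

    /// Fire and forget, but accounted for: refuses to grow without bound.
    func trySend(_ message: Message, whenDrained: @escaping () -> Void) -> Bool {
        lock.wait(); defer { lock.signal() }
        guard buffer.count < capacity else {
            onSpaceAvailable = whenDrained  // reactive: consumer pulls from us
            return false
        }
        buffer.append(message)
        return true
    }

    func receive() -> Message? {
        lock.wait()
        let message = buffer.isEmpty ? nil : buffer.removeFirst()
        let resume = onSpaceAvailable
        onSpaceAvailable = nil
        lock.signal()
        resume?()  // tell the backed-off producer it can push again
        return message
    }
}
```

The producer side still reads like "push" code; the accounting only surfaces when the consumer falls behind.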

Yep, I was trying to get across the developer mindset of “push, not pull” when it comes to decomposing problems and setting up the actor graph.

I think that - done right - the remote queue API can be done in a way where it looks like you’re writing naturally “push” code, but that the API takes care of making the right thing happen.

- since they provide much stronger guarantees in general.

It depends which guarantees you're talking about. I don't think this statement is true. Async work has good and strong properties when you write code in the "normal" priority ranges, what we refer to as "in the QoS world" on Darwin (from background up to UI work).

"stronger guarantees” is probably not the right way to express this. I’m talking about things like “if you don’t wait, it is much harder to create deadlocks”. Many problems are event-driven or streaming, which are naturally push. I can’t explain why I think this, but it seems to me that push designs encourage more functional approaches, while pull designs tend to be more imperative/stateful. The latter feels familiar, but encourages the classical bugs we’re all used to :slight_smile:

However, there are tons of snowflakes on any platform that can't be in that world:
- media rendering (video/audio)
- HID (touch, gesture recognition, keyboard, mouses, trackpads, ...)
- some use cases of networking (bluetooth is a very good example, you hate when your audio drops with your bluetooth headset don't you?)
- ...

And these use cases are many, and run in otherwise regular processes all the time.

I think there is some misunderstanding here. I’m not saying that sync is bad, I’m only talking about the default abstraction and design patterns that people should reach for first.

The general design I’m shooting for here is to provide a default abstraction that works 80%+ of the time, allowing developers to have a natural first step to reach for when they build their code. However, any single abstraction will have limitations and problems in some use cases, and some of those snowflakes are SO important (media is a great example) that it isn’t acceptable to take any hit. This is why I think it is just as important to have an escape hatch. The biggest escape hatches we’ve talked about are multithreaded actors, but folks could also simply “not use actors” if they aren’t solving problems for them.

Swift aims to be pragmatic, not dogmatic. If you’re implementing a media decoder, write the thing in assembly if you want. My feelings won’t be hurt :slight_smile:

My concern was not about how you write their code, for all I care, they could use any language. It's about how they interact with the Swift world that I'm worried about.

Assuming these subsystems already exist and are implemented, it is our experience that it is completely impractical to ask them to never interact with the rest of the world except through very gated interfaces. Eventually they need to use some kind of common/shared infrastructure, whether it's logging, some security/DRM decoding thing that needs to delegate to the SEP or some daemon, etc., and some of these generic OS layers would likely with time use Swift Actors.

Since await is asynchronous wait (IOW, as my C-addicted brain translates it, equivalent to dispatch_group_notify(group, queue, ^{ tell me when what I'm 'waiting' on is done please })), that doesn't fly.
Those subsystems need to block synchronously with wait (no 'a') on a given Actor.
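The two flavors of waiting being contrasted here can be sketched in GCD terms, treating a serial queue as a stand-in for an actor's context. The queue name and functions are illustrative; the `sync` variant is the "wait (no 'a')" case, where blocking in place lets Dispatch's QoS machinery boost the queue toward the waiter's priority.

```swift
import Dispatch

// A serial queue standing in for an actor's execution context.
let actorQueue = DispatchQueue(label: "com.example.some-actor")

// "await": asynchronous wait -- tell me when it's done, don't block me.
func asyncStyle(completion: @escaping (Int) -> Void) {
    actorQueue.async {
        let result = 42        // the actor does its work
        completion(result)     // the caller resumes later, on some thread
    }
}

// "wait (no 'a')": synchronous wait -- block right here until done.
// Because the waiter is known, its priority can be lent to the queue.
func syncStyle() -> Int {
    return actorQueue.sync { 42 }
}
```

A high-priority subsystem calling `syncStyle()` gets the elevation behavior Pierre describes; `asyncStyle` callers get no such relationship.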

Is that because they’re trying to provide a synchronous API on something that is inherently asynchronous (all Actor communication)?

However it is a fact of life that these subsystems have to interact with generic subsystems sometimes, and that means they need to be able to synchronously wait on an actor, so that this actor's priority is elevated. And you can't wave this off; there are tons of legitimate reasons for very-high-priority subsystems to have to interact with and wait on regular-priority work.

I understand completely, which is why synchronous waiting is part of the model. Despite what I say above, I really don’t want people to avoid actors or write their code in assembly. :slight_smile:

My point about pull model is that it seems like the right *default* for people to reach for, not that it should be the only or exclusive mechanic proposed. This is one reason that I think it is important to introduce async/await before actors - so we have the right mechanic to build this waiting on top of.

I 100% agree with you that if *everything* was asynchronous and written this way, our lives would be great. I don't however think it's possible on a real-life operating system to write all your code this way. And this is exactly where things start to get *very* messy.

This is one of the topics I’d like to explore more on a separate thread – asynchrony is viral. We’ve got a lot of supporting evidence of this and, as proposed, async/await look to continue this trend.

Cheers,
Elliott

···

On Sep 4, 2017, at 11:40 AM, Pierre Habouzit via swift-evolution <swift-evolution@swift.org> wrote:

On Sep 4, 2017, at 10:36 AM, Chris Lattner via swift-evolution <swift-evolution@swift.org <mailto:swift-evolution@swift.org>> wrote:
On Sep 3, 2017, at 12:44 PM, Pierre Habouzit <phabouzit@apple.com <mailto:phabouzit@apple.com>> wrote:

+1, again, this pragmatism is exactly why the proposal describes actor methods returning values, even though it is not part of the standard actor calculus that academia discusses:
https://gist.github.com/lattner/31ed37682ef1576b16bca1432ea9f782#extending-the-model-through-await

If you have some suggestion for how I can clarify the writing to address the apparent confusion here, please let me know and I’ll be happy to fix it.

Unless I dramatically misunderstood what await is, I don't see where your write-up addresses synchronous waiting yet.
Or is it that await turns into a synchronous wait if the function you're awaiting from is not an actor function? That would seem confusing to me.

-Pierre

_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org <mailto:swift-evolution@swift.org>
https://lists.swift.org/mailman/listinfo/swift-evolution


(Pierre Habouzit) #31

Something else I realized, is that this code is fundamentally broken in swift:

actor func foo()
{
    let lock = NSLock()
    lock.lock()

    let compute = await someCompute() // <--- this will really break `foo` in two pieces of code that can execute on two different physical threads.
    lock.unlock()
}

The reason why it is broken is that mutexes (whether it's NSLock, pthread_mutex, os_unfair_lock) have to be unlocked from the same thread that took it. the await right in the middle here means that we can't guarantee it.

Agreed, this is just as broken as:

func foo()
{
    let lock = NSLock()
    lock.lock()

    someCompute {
      lock.unlock()
    }
}

and it is just as broken as trying to do the same thing across queues. Stuff like this, or the use of TLS, is just inherently broken, both with GCD and with any sensible model underlying actors. Trying to fix this is not worth it IMO, it is better to be clear that they are different things and that (as a programmer) you should *expect* your tasks to run on multiple kernel threads.
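For contrast, a sketch of the shape that stays correct: all lock-protected work completes synchronously, on one thread, before anything that may hop threads (an await, or here a completion-style async call). `fixedFoo`, `sharedCounter`, and `someCompute` are hypothetical names echoing the broken examples above.

```swift
import Foundation

let lock = NSLock()
var sharedCounter = 0

// The critical section never spans a suspension point: lock and unlock
// both happen on the same thread, before any work that may hop threads.
func fixedFoo(someCompute: @escaping (Int) -> Void) {
    lock.lock()
    sharedCounter += 1
    let snapshot = sharedCounter
    lock.unlock()                 // same thread that took the lock

    someCompute(snapshot)         // only now do work that may hop threads
}
```

The lock's same-thread unlock requirement is satisfied by construction, whatever threads `someCompute` ends up using.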

BTW, why are you using a lock in a single threaded context in the first place??? :wink:

I don't do locks, I do atomics for a living.

Joke aside, it's easy to write this bug; we should try to have the compiler/analyzer help here with these broken patterns.
TSD is IMO less of a problem because people using it are aware of its sharp edges. Not so much for locks.

-Pierre

···

On Sep 11, 2017, at 9:00 PM, Chris Lattner via swift-evolution <swift-evolution@swift.org> wrote:
On Sep 4, 2017, at 12:18 PM, Pierre Habouzit <phabouzit@apple.com <mailto:phabouzit@apple.com>> wrote:

-Chris



(Pierre Habouzit) #32

My currently not very well formed opinion on this subject is that GCD queues are just what you need with these possibilities:
- this Actor queue can be targeted at other queues by the developer when they mean for these actors to be executed in an existing execution context / locking domain,
- we disallow Actors to be directly targeted to GCD global concurrent queues ever
- for the other ones we create a new abstraction with stronger and better guarantees (typically limiting the number of possible threads servicing actors to a low number, not greater than NCPU).

Is there a specific important use case for being able to target an actor to an existing queue? Are you looking for advanced patterns where multiple actors (each providing disjoint mutable state) share an underlying queue? Would this be for performance reasons, for compatibility with existing code, or something else?

Mostly for interaction with current designs where being on a given bottom serial queue gives you the locking context for resources naturally attached to it.

Ok. I don’t understand the use-case well enough to know how we should model this. For example, is it important for an actor to be able to change its queue dynamically as it goes (something that sounds really scary to me) or can the “queue to use” be specified at actor initialization time?

I think I need to read more on actors, because the same way you're not an OS runtime expert, I'm not (or rather no longer, I started down that path a lifetime ago) a language expert at all, and I feel like I need to understand your world better to try to explain this part better to you.

No worries. Actually, after thinking about it a bit, I don’t think that switching underlying queues at runtime is scary.

The important semantic invariant which must be maintained is that there is only one thread executing within an actor context at a time. Switching around underlying queues (or even having multiple actors on the same queue) shouldn’t be a problem.

OTOH, you don’t want an actor “listening” to two unrelated queues, because there is nothing to synchronize between the queues, and you could have multiple actor methods invoked at the same time: you lose the protection of a single serial queue.

The only concern I’d have with an actor switching queues at runtime is that you don’t want a race condition where an item on QueueA goes to the actor, then it switches to QueueB, then another item from QueueB runs while the actor is already doing something for QueueA.
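One concrete shape of "multiple actors on the same queue" that GCD supports today is queue targeting: two serial queues targeted at one bottom serial queue share a single locking domain, so the one-thread-at-a-time invariant holds across both. The queue labels below are illustrative.

```swift
import Dispatch

// Two "actor" queues sharing one bottom serial queue (the locking domain).
let bottomQueue = DispatchQueue(label: "com.example.subsystem.bottom")
let actorA = DispatchQueue(label: "com.example.actorA", target: bottomQueue)
let actorB = DispatchQueue(label: "com.example.actorB", target: bottomQueue)

// Only ever touched from the shared bottom queue, so no lock is needed,
// even though two different "actors" append to it.
var log: [String] = []
actorA.async { log.append("A1") }
actorB.async { log.append("B1") }
actorA.async { log.append("A2") }

// Drain both queues before inspecting the shared state.
actorA.sync { }
actorB.sync { }
```

Per-queue FIFO order is preserved (A1 before A2), and at most one block from either queue runs at a time.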

I think what you said made sense.

Ok, I captured this in yet-another speculative section:
https://gist.github.com/lattner/31ed37682ef1576b16bca1432ea9f782#intra-actor-concurrency

Great. BTW I agree 100% with:

That said, this is definitely a power-user feature, and we should understand, build, and get experience using the basic system before considering adding something like this.

Private concurrent queues are not a success in dispatch and cause several issues; these queues are second-class citizens in GCD in terms of the features they support, and building something with concurrency *within* is hard. I would keep it as "that's where we'll go some day" but not attempt it until we've built the simpler (or rather less hard) purely serial case first.

Right, I agree this is not important for the short term. To clarify though, I meant to indicate that these actors would be implemented completely independently of dispatch, not that they’d build on private concurrent queues.

Another problem I haven't touched on is kernel-issued events (inbound IPC from other processes, networking events, etc.). Dispatch for the longest time used an indirection through a manager thread for all such events, and that had two major issues:

- the thread hops it caused: networking workloads utilized up to 15-20% more CPU time than an equivalent manually made pthread parked in kevent(), because networking, even when busy, idles back all the time as far as the CPU is concerned, so the dispatch queues never stay hot, and the context switch is not only a scheduled context switch but also carries the cost of a thread bring-up

- if you deliver all possible events this way, you also deliver events that cannot possibly make progress, because the execution context that will handle them is already "locked" (as in busy running something else).

It took us several years to get to the point we presented at WWDC this year where we deliver events directly to the right dispatch queue. If you only have very anonymous execution contexts then all this machinery is wasted and unused. However, this machinery has been evaluated and saves full percents of CPU load system-wide. I'd hate for us to go back 5 years here.
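The direct-delivery model Pierre describes is visible in the Dispatch source API: an event source is bound at creation to a well-known queue, so events land straight on that context instead of bouncing through an intermediate thread. The sketch below uses a timer source for portability (a `DispatchSourceRead` on a file descriptor has the same shape); the queue label is illustrative.

```swift
import Dispatch

// A well-known serial context that owns event handling.
let networkingQueue = DispatchQueue(label: "com.example.networking")
let done = DispatchSemaphore(value: 0)
var firedOnNetworkingQueue = false

// The source is bound to its home queue at creation time.
let source = DispatchSource.makeTimerSource(queue: networkingQueue)
source.schedule(deadline: .now() + .milliseconds(10))
source.setEventHandler {
    // The handler runs directly on networkingQueue, the event's home context.
    dispatchPrecondition(condition: .onQueue(networkingQueue))
    firedOnNetworkingQueue = true
    done.signal()
}
source.resume()
done.wait()
source.cancel()
```

Because the kernel-side machinery knows the target queue up front, no anonymous context has to be woken just to forward the event.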

I don’t have anything intelligent to say here, but it sounds like you understand the issues well :slight_smile: I agree that giving up 5 years of progress is not appealing.

TBH our team has to explain in more depth how eventing works on Darwin; the same way we (or maybe it's just I, I don't want to disparage my colleagues here :P) need to understand Actors and what they mean better, I think the Swift core team (and whoever works on concurrency) needs to understand what I explained above.

Makes sense. In Swift 6 or 7 or whenever actors are a plausible release goal, a lot of smart people will need to come together to scrutinize all the details. Iteration and improvement over the course of a release cycle is also sensible.

The point of the document is to provide a long term vision of where things could go, primarily to unblock progress on async/await in the short term. For a very long time now, the objection to doing anything with async/await has been that people don’t feel that they know how and whether async/await would fit into the long term model for Swift concurrency. Indeed, the discussions around getting “context” into the async/await design are a concrete example of how considering the long term direction is important to help shape the immediate steps.

That said, we’re still a ways off from actually implementing an actor model, so we have some time to sort it out.

(2) is what scares me: the kernel has stuff to deliver, and kernels don't like to hold on to data on behalf of userspace forever, because this burns wired memory. This means that this doesn't quite play nice with a global anonymous pool that can be starved at any time. Especially if we're talking XPC connections in a daemon, where super-high-priority clients such as the frontmost app (or even worse, SpringBoard) can ask you questions that you'd better answer as fast as you can.

The solution we recommend to developers (1st and 3rd party) is to root all the XPC connections to your clients at the same bottom queue, which represents your "communication" subsystem, so you can imagine it as the "Incoming request multiplexer" Actor or something. This one is in the 2nd category and is known to the kernel, so the kernel can instantiate the Actor itself without asking userspace for permission, directly create the execution context, and rely on the scheduler to get the priorities right.

Thanks for the explanation, I understand a lot better what you’re talking about now. To me, this sounds like the concern of a user space framework author (e.g. the folks writing Kitura), not the users of the frameworks. As such, I have no problem with making it “more difficult” to set up the right #2 abstractions.

I think Pierre has some really interesting points here, and I wanted to highlight them with some context from a real application / framework. Perhaps it can help a broader audience understand some of these points and help influence the overall design of the concurrency model.

A few years ago, we moved Camera and some related internal frameworks from being pretty naive clients of GCD to embracing the philosophy Pierre has outlined. We learned a lot.

I strongly disagree with this for several reasons.

Some Frameworks have a strong need for a serial context, some don't

It completely depends on the framework. If your framework is, say, a networking subsystem, which is very asynchronous by nature, then yes, having the framework set up a #2 kind of guy inside it and having callbacks from/to this isolated context is just fine (and incidentally what your networking stack does).

However for some frameworks it makes very little sense to do this; they're better served using the "location" provided by their client and some internal synchronization (locks) for the shared state they have. Too much framework code today creates its own #2 queue (if not queue*s*) all the time out of fear of being "blocked" by the client, but this leads to terrible performance.

[disclaimer: I don't know whether Security.framework works this way or not, this is a hypothetical]

For example, if you're using Security.framework stuff (that requires some state, such as say your current security ephemeral keys and what not), using a private context instead of the caller's is really terribly bad because it causes tons of context switches: such a framework should really *not* use a context itself, but a traditional lock to protect global state. The reason is that the global state is really just a few keys and mutable contexts, while the big part of the work is the CPU time to (de)cipher, and you really want to parallelize as much as you can here; the shared state is not reason enough to hop.

It is tempting to say that we could still use a private queue to hop through to get the shared state and back to the caller. That would be great if the caller tail-called into the async to the Security framework, allowing the runtime to do a lightweight switch to the other queue and back. The problem is that real-life code never does that: it will rarely tail-call into the async (though with Swift async/await it would), but more importantly there's other stuff on the caller's context, so the OS will want to continue executing that, and then you will inevitably ask for a thread to drain that Security.framework async.

In our experience, the runtime can never optimize this Security async pattern to avoid using an extra thread for the Security work.

This was a really common problem for us. We would often, without really realizing it, spin up a thread for very trivial work. In a lot of cases, this was the result of over-zealous asynchronous API design. We were so terrified of blocking, and with GCD we could “very easily” avoid it. We didn’t stop and think about the resource usage of what we were asking the system to do. We didn’t really need a thread to be spun up for us to update a dictionary.

After reading the proposal, I came away wanting a better understanding of when async/await might involve that sort of resource usage. Understandably, that may be more of a runtime question. However, a goal of the proposal was a solution which scaled to millions of tasks. If async/await is proposed and implemented in Swift 5, we will need to tackle this with a solution that does not involve Actors.

Is this something we’re interested in discussing right now? I’m still pretty curious how we practically get to millions of tasks, even though I agree that GCD is up to the challenge. I don’t think it will be on the back of millions of queues though.

Top level contexts are a fundamental part of App (process) design

It is actually way better for the app developer to decide what the subsystems of the app are, and create well-known #2 contexts for these. In our WWDC talk we took the hypothetical example of News.app, which fetches stuff from RSS feeds, has a database to know what to fetch and what you've read, the UI thread, and some networking parts to interact with the internet.

Such an app should upfront create 3 "#2" guys:
- the main thread for UI interactions (this one is made for you obviously)
- the networking handling context
- the database handling context

The flow of most of the app is: the UI triggers an action, which asks the database subsystem (the brain) what to do, which possibly issues networking requests.
When a networking request finishes and the assets have been reassembled on the networking queue, they are passed back to the database/brain, which decides how the UI should change and issues the update commands back to the UI context.

At the OS layer we believe strongly that these 3 places should be made upfront and have strong identities. And it's not an advanced need, it should be made easy. The Advanced need is to have lots of these, and have subsystems that share state that use several of these contexts.

Having experienced the benefits of embracing this design philosophy, I’m inclined to agree on the overall approach. I’ve had a lot of success with this approach on Darwin, but I’m not that familiar with other platforms. I would think it’s universally applicable, but maybe others could chime in on other approaches?

For any OS where the only threading primitive is a pthread-like thing, I think this is universally applicable. That covers Linux, macOS, *BSD, ...

I know that Windows has a fiber concept that's supposed to be leaner (I suspect they built that for C#?), but I know nothing about it past the name, so I can't speak for them.

Actors and CoroutineContext (see: "Contextualizing async coroutines”) both seem to provide some facilities for achieving this design. Maybe it’s enough for one or both of those to encourage this design somehow, if we decide it is worth encouraging.

Pierre, were you thinking something different than either of those two approaches?

No, this is exactly what I'm advocating for.

For everything else, I agree this hypothetical News.app can use an anonymous pool or reuse any of the top-level contexts it created, until it creates a scalability problem, in which case by [stress] testing the app you can figure out which new subsystem needs to emerge. For example, maybe in a later version News.app wants beautiful articles and needs to precompute a bunch of things at the time an article is fetched, and that starts to take enough CPU that doing it on the networking context doesn't scale anymore. Then you just create a new top-level "Article Massaging" context and migrate some of the workload there.

The level of re-architecture involved in encountering a scalability problem is usually pretty scary. As a rule, you are usually introduced to the problem at the worst possible time too. In our case, it prompted our adoption of the architecture you’re evangelizing.

I think there’s a lot of work to do here. Ideally, it would be just that simple, but those two systems likely grew entangled together while they were in close proximity. A great deal of discipline would have been needed to make that transition easy. Hopefully we can solve this some day.

This is exactly why we need to help at the language level to think about these problems the right way upfront. It's obvious to me that retrofitting this design into existing code is challenging, and it's best not to make mistakes around here when you start. It's a bit like error handling in that regard, and I think Swift is doing an awesome job at making error handling something you can retrofit more easily into existing codebases, thoroughly, by making it something the language reasons about.

This is exactly why I'm trying to push for not purely async/await but some reasoning about a "top level context" at the language level so that somehow the compiler helps you with this. I don't *know* which form it should take though, I just know there's a need here.

···

On Sep 5, 2017, at 5:29 PM, Elliott Harris via swift-evolution <swift-evolution@swift.org> wrote:

On Sep 4, 2017, at 11:40 AM, Pierre Habouzit via swift-evolution <swift-evolution@swift.org <mailto:swift-evolution@swift.org>> wrote:

On Sep 4, 2017, at 10:36 AM, Chris Lattner via swift-evolution <swift-evolution@swift.org <mailto:swift-evolution@swift.org>> wrote:
On Sep 3, 2017, at 12:44 PM, Pierre Habouzit <phabouzit@apple.com <mailto:phabouzit@apple.com>> wrote:

Personally, I think this leads down a really interesting line of thought about how to construct software, but that’s probably best for another thread or over a drink.

Why this manual partitioning?

It is our experience that the runtime cannot figure these partitions out by itself. And it's not only us; like I said earlier, Go can't either.

The runtime can't possibly know about locking domains, what your code may or may not hit (I mean it's equivalent to the termination problem so of course we can't guess it), or just data affinity which on asymmetric platforms can have a significant impact on your speed (NUMA machines, some big.LITTLE stuff, ...).

The default anonymous pool is fine for best effort work, no doubt we need to make it good, but it will never beat carefully partitioned subsystems.

Spending some time thinking about how we might encourage people to trend toward good concurrent software design as a part of this proposal seems worthwhile. I’m not sure this proposal is quite opinionated enough yet, but I know it’s very early. :slight_smile:

we need to embrace it and explain to people that everywhere in a traditional POSIX world they would have used a real pthread_create()'d thread to perform the work of a given subsystem, they should create one such category #2 bottom queue that represents this thread (and make this subsystem an Actor),

Makes sense. This sounds like a great opportunity for actors to push the world even farther towards sensible designs, rather than cargo culting the old threads+channels model.

It is, and this is exactly why I focus on your proposal a lot, I see a ton of value in it that go way beyond the expressiveness of the language.

Also, I think we should strongly encourage pure async “fire and forget” actor methods anyway - IOW, we should encourage push, not pull

I almost agree. We should strongly encourage the `pure async "account for, fire and forget" actor methods`. The `account for` is really backpressure, where you actually don't fire if the remote queue is full and instead rely on some kind of reactive pattern to pull from you (but I know you wrote about that in your proposal and you're aware of it).

Yep, I was trying to get across the developer mindset of “push, not pull” when it comes to decomposing problems and setting up the actor graph.

I think that - done right - the remote queue API can be done in a way where it looks like you’re writing naturally “push” code, but that the API takes care of making the right thing happen.

- since they provide much stronger guarantees in general.

It depends which guarantees you're talking about. I don't think this statement is true. Async work has good and strong properties when you write code in the "normal" priority ranges, what we refer to as "in the QoS world" on Darwin (from background up to UI work).

"stronger guarantees” is probably not the right way to express this. I’m talking about things like “if you don’t wait, it is much harder to create deadlocks”. Many problems are event-driven or streaming, which are naturally push. I can’t explain why I think this, but it seems to me that push designs encourage more functional approaches, but pull designs tend to be more imperative/stateful. The later feels familiar, but encourages the classical bugs we’re all used to :slight_smile:

However, there are tons of snowflakes on any platform that can't be in that world:
- media rendering (video/audio)
- HID (touch, gesture recognition, keyboards, mice, trackpads, ...)
- some use cases of networking (bluetooth is a very good example, you hate when your audio drops with your bluetooth headset don't you?)
- ...

And these use cases are many, and run in otherwise regular processes all the time.

I think there is some misunderstanding here. I’m not saying that sync is bad, I’m only talking about the default abstraction and design patterns that people should reach for first.

The general design I’m shooting for here is to provide a default abstractions that work 80%+ of the time, allowing developers to have a natural first step to reach for when they build their code. However, any single abstraction will have limitations and problems in some use cases, and some of those snowflakes are SO important (media is a great example) that it isn’t acceptable to take any hit. This is why I think it is just as important to have an escape hatch. The biggest escape hatches we’ve talked about are multithreaded actors, but folks could also simply “not use actors” if they aren’t solving problems for them.

Swift aims to be pragmatic, not dogmatic. If you’re implementing a media decoder, write the thing in assembly if you want. My feelings won’t be hurt :slight_smile:

My concern was not about how they write their code; for all I care, they could use any language. It's how they interact with the Swift world that I'm worried about.

Assuming these subsystems exist already and are implemented, it is our experience that it is completely impractical to ask these subsystems to never interact with the rest of the world except through very gated interfaces. Eventually they need to use some kind of common/shared infrastructure, whether it's logging, some security/DRM decoding thing that needs to delegate to the SEP or some daemon, etc., and some of these generic OS layers would likely with time use Swift Actors.

Since await is an asynchronous wait (IOW, as my C-addicted brain translates it, equivalent to dispatch_group_notify(group, queue, ^{ tell me when what I'm 'waiting' on is done please })), that doesn't fly.
Those subsystems need to block synchronously with wait (no 'a') on a given Actor.

Is that because they’re trying to provide a synchronous API on something that is inherently asynchronous (all Actor communication)?

However it is a fact of life that these subsystems have to interact with generic subsystems sometimes, and that means they need to be able to synchronously wait on an actor, so that this actor's priority is elevated. And you can't wave this off; there are tons of legitimate reasons for very-high-priority subsystems to have to interact with and wait on regular-priority work.

I understand completely, which is why synchronous waiting is part of the model. Despite what I say above, I really don’t want people to avoid actors or write their code in assembly. :slight_smile:

My point about pull model is that it seems like the right *default* for people to reach for, not that it should be the only or exclusive mechanic proposed. This is one reason that I think it is important to introduce async/await before actors - so we have the right mechanic to build this waiting on top of.

I 100% agree with you that if *everything* were asynchronous and written this way, our lives would be great. I don't, however, think it's possible on a real-life operating system to write all your code this way. And this is exactly where things start to get *very* messy.

This is one of the topics I’d like to explore more on a separate thread – asynchrony is viral. We’ve got a lot of supporting evidence of this and, as proposed, async/await looks to continue this trend.

Cheers,
Elliott

+1, again, this pragmatism is exactly why the proposal describes actor methods returning values, even though it is not part of the standard actor calculus that academia discusses:
https://gist.github.com/lattner/31ed37682ef1576b16bca1432ea9f782#extending-the-model-through-await

If you have some suggestion for how I can clarify the writing to address the apparent confusion here, please let me know and I’ll be happy to fix it.

Unless I dramatically misunderstood what await is, I don't see where your write-up addresses synchronous waiting anywhere yet.
Or is it that await turns into a synchronous wait if the function you're awaiting from is not an actor function? That would seem confusing to me.

-Pierre

_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org <mailto:swift-evolution@swift.org>
https://lists.swift.org/mailman/listinfo/swift-evolution



(John McCall) #33

Maybe we could somehow mark a function to cause a warning/error when directly using it from an async function. You'd want to use that on locks, synchronous I/O, probably some other things.

Trying to hard-enforce it would pretty quickly turn into a big, annoying effects-system problem, where even a program not using async at all would suddenly have to mark a ton of functions as "async-unsafe". I'm not sure this problem is worth that level of intrusion for most programmers. But a soft enforcement, maybe an opt-in one like the Clang static analyzer, could do a lot to prod people in the right direction.

John.

···

On Sep 12, 2017, at 2:19 AM, Pierre Habouzit via swift-evolution <swift-evolution@swift.org> wrote:

On Sep 11, 2017, at 9:00 PM, Chris Lattner via swift-evolution <swift-evolution@swift.org <mailto:swift-evolution@swift.org>> wrote:

On Sep 4, 2017, at 12:18 PM, Pierre Habouzit <phabouzit@apple.com <mailto:phabouzit@apple.com>> wrote:

Something else I realized, is that this code is fundamentally broken in swift:

actor func foo()
{
    let lock = NSLock()
    lock.lock()

    let compute = await someCompute() // <--- this will really break `foo` into two pieces of code that can execute on two different physical threads.
    lock.unlock()
}

The reason why it is broken is that mutexes (whether it's NSLock, pthread_mutex, or os_unfair_lock) have to be unlocked from the same thread that took them. The await right in the middle here means that we can't guarantee it.

Agreed, this is just as broken as:

func foo()
{
    let lock = NSLock()
    lock.lock()

    someCompute {
      lock.unlock()
    }
}

and it is just as broken as trying to do the same thing across queues. Stuff like this, or the use of TLS, is just inherently broken, both with GCD and with any sensible model underlying actors. Trying to fix this is not worth it IMO, it is better to be clear that they are different things and that (as a programmer) you should *expect* your tasks to run on multiple kernel threads.

BTW, why are you using a lock in a single threaded context in the first place??? :wink:

I don't do locks, I do atomics as a living.

Joke aside, it's easy to write this bug; we should try to have the compiler/analyzer help with these broken patterns.
TSD is IMO less of a problem because people using it are aware of its sharp edges. Not so much for locks.


(Eagle Offshore) #34

OK, I've been watching this thing for a couple weeks.

I've done a lot of GCD network code. Invariably my completion method starts with

dispatch_async(queue_want_to_handle_this_on,....)

Replying on the same queue would be nice I guess, only often all I need to do is update the UI in the completion code.

OTOH, I have situations where the reply is complicated and I need to persist a lot of data, then update the UI.

So honestly, any assumption you make about how this is supposed to work is going to be wrong about half the time unless....

you let me specify the reply queue directly.

That is the only thing that works all the time. Even then, I'm very apt to make the choice to do some of the work off the main thread and then queue up the minimal amount of work onto the main thread.

Finally, I don't think this is properly a language feature. I think it's a library feature. I think Swift's tendency is to push way too much into the language rather than the library, and I personally STRONGLY prefer tiny languages with rich libraries rather than the opposite.

That's my $0.02. I don't care about key words for async stuff. I'm more than happy with GCD the library as long as I have the building blocks (closures) to take advantage of it.

···

On Sep 5, 2017, at 6:06 PM, Pierre Habouzit via swift-evolution <swift-evolution@swift.org> wrote:

On Sep 5, 2017, at 5:29 PM, Elliott Harris via swift-evolution <swift-evolution@swift.org <mailto:swift-evolution@swift.org>> wrote:

On Sep 4, 2017, at 11:40 AM, Pierre Habouzit via swift-evolution <swift-evolution@swift.org <mailto:swift-evolution@swift.org>> wrote:

On Sep 4, 2017, at 10:36 AM, Chris Lattner via swift-evolution <swift-evolution@swift.org <mailto:swift-evolution@swift.org>> wrote:

On Sep 3, 2017, at 12:44 PM, Pierre Habouzit <phabouzit@apple.com <mailto:phabouzit@apple.com>> wrote:

My currently not very well formed opinion on this subject is that GCD queues are just what you need with these possibilities:
- this Actor queue can be targeted to other queues by the developer when they mean for these actors to be executed in an existing execution context / locking domain,
- we disallow Actors to be directly targeted to GCD global concurrent queues ever
- for the other ones we create a new abstraction with stronger and better guarantees (typically limiting the number of possible threads servicing actors to a low number, not greater than NCPU).

Is there a specific important use case for being able to target an actor to an existing queue? Are you looking for advanced patterns where multiple actors (each providing disjoint mutable state) share an underlying queue? Would this be for performance reasons, for compatibility with existing code, or something else?

Mostly for interaction with current designs where being on a given bottom serial queue gives you the locking context for resources naturally attached to it.

Ok. I don’t understand the use-case well enough to know how we should model this. For example, is it important for an actor to be able to change its queue dynamically as it goes (something that sounds really scary to me) or can the “queue to use” be specified at actor initialization time?

I think I need to read more on actors, because the same way you're not an OS runtime expert, I'm not (or rather no longer, I started down that path a lifetime ago) a language expert at all, and I feel like I need to understand your world better to try to explain this part better to you.

No worries. Actually, after thinking about it a bit, I don’t think that switching underlying queues at runtime is scary.

The important semantic invariant which must be maintained is that there is only one thread executing within an actor context at a time. Switching around underlying queues (or even having multiple actors on the same queue) shouldn’t be a problem.

OTOH, you don’t want an actor “listening” to two unrelated queues, because there is nothing to synchronize between the queues, and you could have multiple actor methods invoked at the same time: you lose the protection of a single serial queue.

The only concern I’d have with an actor switching queues at runtime is that you don’t want a race condition where an item on QueueA goes to the actor, then it switches to QueueB, then another item from QueueB runs while the actor is already doing something for QueueA.

I think what you said made sense.

Ok, I captured this in yet-another speculative section:
https://gist.github.com/lattner/31ed37682ef1576b16bca1432ea9f782#intra-actor-concurrency

Great. BTW I agree 100% with:

That said, this is definitely a power-user feature, and we should understand, build, and get experience using the basic system before considering adding something like this.

Private concurrent queues are not a success in dispatch and cause several issues; these queues are second-class citizens in GCD in terms of the features they support, and building something with concurrency *within* them is hard. I would keep it as "that's where we'll go some day" but not try to attempt it until we've built the simpler (or rather less hard) purely serial case first.

Right, I agree this is not important for the short term. To clarify though, I meant to indicate that these actors would be implemented completely independently of dispatch, not that they’d build on private concurrent queues.

Another problem I haven't touched either is kernel-issued events (inbound IPC from other processes, networking events, etc...). Dispatch for the longest time used an indirection through a manager thread for all such events, and that had two major issues:

- the thread hops it caused: networking workloads utilized up to 15-20% more CPU time than an equivalent manually made pthread parked in kevent(), because networking, even when busy, idles back all the time as far as the CPU is concerned, so dispatch queues never stay hot, and the context switch is not only a scheduled context switch but also carries the cost of a thread bring-up

- if you deliver all possible events this way, you also deliver events that cannot possibly make progress because the execution context that will handle them is already "locked" (as in busy running something else).

It took us several years to get to the point we presented at WWDC this year where we deliver events directly to the right dispatch queue. If you only have very anonymous execution contexts then all this machinery is wasted and unused. However, this machinery has been evaluated and saves full percents of CPU load system-wide. I'd hate for us to go back 5 years here.

I don’t have anything intelligent to say here, but it sounds like you understand the issues well :slight_smile: I agree that giving up 5 years of progress is not appealing.

TBH our team has to explain in more depth how eventing works on Darwin, the same way we (or maybe it's just I, I don't want to disparage my colleagues here :P) need to understand Actors and what they mean better; I think the swift core team (and whoever works on concurrency) needs to be able to understand what I explained above.

Makes sense. In Swift 6 or 7 or whenever actors are a plausible release goal, a lot of smart people will need to come together to scrutinize all the details. Iteration and improvement over the course of a release cycle is also sensible.

The point of the document is to provide a long term vision of where things could go, primarily to unblock progress on async/await in the short term. For a very long time now, the objection to doing anything with async/await has been that people don’t feel that they know how and whether async/await would fit into the long term model for Swift concurrency. Indeed, the discussions around getting “context” into the async/await design are a concrete example of how considering the long term direction is important to help shape the immediate steps.

That said, we’re still a ways off from actually implementing an actor model, so we have some time to sort it out.

(2) is what scares me: the kernel has stuff to deliver, and kernels don't like to hold on to data on behalf of userspace forever, because this burns wired memory. This means that this doesn't quite play nicely with a global anonymous pool that can be starved at any time, especially if we're talking about XPC connections in a daemon, where super-high-priority clients such as the frontmost app (or even worse, SpringBoard) can ask you questions that you'd better answer as fast as you can.

The solution we recommend to developers (1st and 3rd parties) is to root all the XPC connections for your clients at the same bottom queue that represents your "communication" subsystem; you can imagine it being the "Incoming request multiplexer" Actor or something. This one is in the 2nd category and is known to the kernel, so that the kernel can instantiate the Actor itself without asking userspace for permission, directly make the execution context, and rely on the scheduler to get the priorities right.

Thanks for the explanation, I understand a lot better what you’re talking about now. To me, this sounds like the concern of a user space framework author (e.g. the folks writing Kitura), not the users of the frameworks. As such, I have no problem with making it “more difficult” to set up the right #2 abstractions.

I think Pierre has some really interesting points here, and I wanted to highlight them with some context from a real application / framework. Perhaps it can help a broader audience understand some of these points and help influence the overall design of the concurrency model.

A few years ago, we moved Camera and some related internal frameworks from being pretty naive clients of GCD to embracing the philosophy Pierre has outlined. We learned a lot.

I strongly disagree with this for several reasons.

Some Frameworks have a strong need for a serial context, some don't

It completely depends on the framework. If your framework is, say, a networking subsystem, which is very asynchronous by nature, then yes, having the framework set up a #2 kind of guy inside it and have callbacks from/to this isolated context is just fine (and incidentally that's what your networking stack does).

However, for some frameworks it makes very little sense to do this; they're better served using the "location" provided by their client and having some internal synchronization (locks) for the shared state they have. Too much framework code today creates its own #2 queue (if not queue*s*) all the time out of fear of being "blocked" by the client, but this leads to terrible performance.

[ disclaimer: I don't know whether Security.framework works this way or not, this is a hypothetical ]

For example, if you're using Security.framework stuff (that requires some state, such as, say, your current security ephemeral keys and whatnot), using a private context instead of the caller's is really terribly bad because it causes tons of context switches: such a framework should really *not* use a context itself, but a traditional lock to protect global state. The reason here is that the global state is really just a few keys and mutable contexts, but the big part of the work is the CPU time to (de)cipher, and you really want to parallelize as much as you can here; the shared state is not reason enough to hop.

It is tempting to say that we could still use a private queue to hop through to get the shared state and back to the caller. That'd be great if the caller would tail-call into the async to the Security framework and allow the runtime to do a lightweight switch to the other queue and back. The problem is that real-life code never does that: it will rarely tail-call into the async (though with Swift async/await it would), but more importantly there's other stuff on the caller's context, so the OS will want to continue executing that, and then you will inevitably ask for a thread to drain that Security.framework async.

In our experience, the runtime can never optimize this Security async pattern by never using an extra thread for the Security work.

This was a really common problem for us. We would often, without really realizing it, spin up a thread for very trivial work. In a lot of cases, this was the result of over-zealous asynchronous API design. We were so terrified of blocking, and with GCD we could “very easily” avoid it. We didn’t stop and think about the resource usage of what we were asking the system to do. We didn’t really need a thread to be spun up for us to update a dictionary.

After reading the proposal, I came away wanting a better understanding of when async/await might involve that sort of resource usage. Understandably, that may be more of a runtime question. However, a goal of the proposal was a solution which scaled to millions of tasks. If async/await is proposed and implemented in Swift 5, we will need to tackle this with a solution that does not involve Actors.

Is this something we’re interested in discussing right now? I’m still pretty curious how we practically get to millions of tasks, even though I agree that GCD is up to the challenge. I don’t think it will be on the back of millions of queues though.

Top level contexts are a fundamental part of App (process) design

It is actually way better for the app developer to decide what the subsystems of the app are, and create well-known #2 contexts for these. In our WWDC talk we took the hypothetical example of News.app, which fetches stuff from RSS feeds, has a database to know what to fetch and what you've read, the UI thread, and some networking parts to interact with the internet.

Such an app should upfront create 3 "#2" guys:
- the main thread for UI interactions (this one is made for you obviously)
- the networking handling context
- the database handling context

The flow of most of the app is: the UI triggers an action, which asks the database subsystem (the brain) what to do, which possibly issues networking requests.
When a networking request is finished and the assets have been reassembled on the network handling queue, it passes them back to the database/brain to decide how to redraw the UI, and issues the command to update the UI back to the UI context.

At the OS layer we believe strongly that these 3 places should be made upfront and have strong identities. And it's not an advanced need, it should be made easy. The Advanced need is to have lots of these, and have subsystems that share state that use several of these contexts.

Having experienced the benefits of embracing this design philosophy, I’m inclined to agree on the overall approach. I’ve had a lot of success with this approach on Darwin, but I’m not that familiar with other platforms. I would think it’s universally applicable, but maybe others could chime in on other approaches?

For any OS where the only threading primitive is a pthread-like thing, I think this is universally applicable. That covers Linux, macOS, *BSD, ...

I know that Windows has a fiber concept that's supposed to be leaner (I suspect they built that for C#?), but I know nothing about it past the name, so I can't say for them.

Actors and CoroutineContext (see: "Contextualizing async coroutines”) both seem to provide some facilities for achieving this design. Maybe it’s enough for one or both of those to encourage this design somehow, if we decide it is worth encouraging.

Pierre, were you thinking something different than either of those two approaches?

No, this is exactly what I'm advocating for.

For everything else, I agree this hypothetical News.app can use an anonymous pool or reuse any of the top-level contexts it created, until it creates a scalability problem, in which case, by [stress] testing the app, you can figure out which new subsystem needs to emerge. For example, maybe in a later version News.app wants beautiful articles and needs to precompute a bunch of things at the time the article is fetched, and that starts to take enough CPU that doing this on the networking context doesn't scale anymore. Then you just create a new top-level "Article Massaging" context and migrate some of the workload there.

The level of re-architecture involved in encountering a scalability problem is usually pretty scary. As a rule, you are usually introduced to the problem at the worst possible time too. In our case, it prompted our adoption of the architecture you’re evangelizing.

I think there’s a lot of work to do here. Ideally, it would be just that simple, but those two systems likely grew entangled together while they were in close proximity. A great deal of discipline would have been needed to make that transition easy. Hopefully we can solve this some day.

This is exactly why we need to help at the language level to think about these problems the right way upfront. It's obvious to me that retrofitting this design in existing code is challenging, and it's best not to make mistakes around here when you start. It's a bit like error handling in that regard, and I think Swift is doing an awesome job at making error handling something you can retrofit thoroughly in existing codebases, by making it something the language reasons about.

This is exactly why I'm trying to push for not purely async/await but some reasoning about a "top level context" at the language level so that somehow the compiler helps you with this. I don't *know* which form it should take though, I just know there's a need here.

Personally, I think this leads down a really interesting line of thought about how to construct software, but that’s probably best for another thread or over a drink.

Why this manual partitionning?

It is our experience that the runtime cannot figure these partitions out by itself. and it's not only us, like I said earlier, Go can't either.

The runtime can't possibly know about locking domains, what your code may or may not hit (I mean it's equivalent to the termination problem so of course we can't guess it), or just data affinity which on asymmetric platforms can have a significant impact on your speed (NUMA machines, some big.LITTLE stuff, ...).

The default anonymous pool is fine for best effort work, no doubt we need to make it good, but it will never beat carefully partitioned subsystems.

Spending some time thinking about how we might encourage people to trend toward good concurrent software design as a part of this proposal seems worthwhile. I’m not sure this proposal is quite opinionated enough yet, but I know it’s very early. :slight_smile:

we need to embrace it and explain to people that everywhere in a traditional POSIX world they would have used a real pthread_create()d thread to perform the work of a given subsystem, they create one such category #2 bottom queue that represents this thread (and you make this subsystem an Actor),

Makes sense. This sounds like a great opportunity for actors to push the world even farther towards sensible designs, rather than cargo culting the old threads+channels model.

It is, and this is exactly why I focus on your proposal a lot, I see a ton of value in it that go way beyond the expressiveness of the language.

Also, I think we should strongly encourage pure async “fire and forget” actor methods anyway - IOW, we should encourage push, not pull

I almost agree. We should strongly encourage the `pure async "account for, fire and forget" actor methods`. The `account for` is really backpressure, where you actually don't fire if the remote queue is full and instead rely on some kind of reactive pattern to pull from you. (but I know you wrote that on your proposal and you're aware of it).

Yep, I was trying to get across the developer mindset of “push, not pull” when it comes to decomposing problems and setting up the actor graph.

I think that - done right - the remote queue API can be done in a way where it looks like you’re writing naturally “push” code, but that the API takes care of making the right thing happen.

- since they provide much stronger guarantees in general.

It depends which guarantees you're talking about. I don't think this statement is true. Async work has good and strong properties when you write code in the "normal" priority ranges, what we refer as to "in the QoS world" on Darwin (from background up to UI work).

"stronger guarantees” is probably not the right way to express this. I’m talking about things like “if you don’t wait, it is much harder to create deadlocks”. Many problems are event-driven or streaming, which are naturally push. I can’t explain why I think this, but it seems to me that push designs encourage more functional approaches, but pull designs tend to be more imperative/stateful. The later feels familiar, but encourages the classical bugs we’re all used to :slight_smile:

However, there are tons of snowflakes on any platform that can't be in that world:
- media rendering (video/audio)
- HID (touch, gesture recognition, keyboard, mouses, trackpads, ...)
- some use cases of networking (bluetooth is a very good example, you hate when your audio drops with your bluetooth headset don't you?)
- ...

And these use cases are many, and run in otherwise regular processes all the time.

I think there is some misunderstanding here. I’m not saying that sync is bad, I’m only talking about the default abstraction and design patterns that people should reach for first.

The general design I’m shooting for here is to provide a default abstractions that work 80%+ of the time, allowing developers to have a natural first step to reach for when they build their code. However, any single abstraction will have limitations and problems in some use cases, and some of those snowflakes are SO important (media is a great example) that it isn’t acceptable to take any hit. This is why I think it is just as important to have an escape hatch. The biggest escape hatches we’ve talked about are multithreaded actors, but folks could also simply “not use actors” if they aren’t solving problems for them.

Swift aims to be pragmatic, not dogmatic. If you’re implementing a media decoder, write the thing in assembly if you want. My feelings won’t be hurt :slight_smile:

My concern was not about how you write their code, for all I care, they could use any language. It's about how they interact with the Swift world that I'm worried about.

Assuming these subsystem exist already and are implemented, it is our experience that it is completely impractical to ask from these subsystems to not ever interact with the rest of the world except through very gated interfaces. Eventually they need to use some kind of common/shared infrastructure, whether it's logging, some security/DRM decoding thing that needs to delegate to the SEP or some daemon, etc... and some of these generic OS layers would likely with time use Swift Actors.

Since await is asynchronous wait (IOW as my C-addicted brain translates it, equivalent to dispatch_group_notify(group, queue, ^{ tell me when what I'm 'waiting' on is done please })), that doesn't fly.
Those subsystem need to block synchronously with wait (no a) on a given Actor.

Is that because they’re trying to provide a synchronous API on something that is inherently asynchronous (all Actor communication)?

However it is a fact of life that these subsystems, have to interact with generic subsystems sometimes, and that mean they need to be able to synchronously wait on an actor, so that this actor's priority is elevated. And you can't waive this off, there are tons of legitimate reasons for very-high priorities subsystems to have to interact and wait on regular priority work.

I understand completely, which is why synchronous waiting is part of the model. Despite what I say above, I really don’t want people to avoid actors or write their code in assembly. :slight_smile:

My point about pull model is that it seems like the right *default* for people to reach for, not that it should be the only or exclusive mechanic proposed. This is one reason that I think it is important to introduce async/await before actors - so we have the right mechanic to build this waiting on top of.
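As a sketch of what “building this waiting on top” could look like, here is a minimal semaphore-based helper layered on a callback API (the name and shape are invented for illustration; note that a plain semaphore blocks the caller without telling the system whom it is waiting on, so it provides none of the priority elevation Pierre is asking for):

```swift
import Dispatch

// Hypothetical helper: synchronously wait for a callback-based operation.
// A naive semaphore wait gives no priority donation, which is exactly why
// the runtime has to be involved for the real thing.
func syncWait<T>(_ start: (@escaping (T) -> Void) -> Void) -> T {
    let semaphore = DispatchSemaphore(value: 0)
    var result: T?
    start { value in
        result = value
        semaphore.signal()  // wake the blocked caller
    }
    semaphore.wait()        // block this thread until the callback fires
    return result!
}
```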

I 100% agree with you that if *everything* were asynchronous and written this way, our lives would be great. I don't, however, think it's possible on a real-life operating system to write all your code this way. And this is exactly where things start to get *very* messy.

This is one of the topics I’d like to explore more on a separate thread – asynchrony is viral. We’ve got a lot of supporting evidence of this and, as proposed, async/await looks to continue this trend.

Cheers,
Elliott

+1, again, this pragmatism is exactly why the proposal describes actor methods returning values, even though that is not part of the standard actor calculus that academia discusses:
https://gist.github.com/lattner/31ed37682ef1576b16bca1432ea9f782#extending-the-model-through-await

If you have some suggestion for how I can clarify the writing to address the apparent confusion here, please let me know and I’ll be happy to fix it.

Unless I have dramatically misunderstood what await is, I don't see where your write-up addresses synchronous waiting yet.
Or is it that await turns into a synchronous wait when the function you're awaiting from is not an actor function? That would seem confusing to me.

-Pierre

_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org
https://lists.swift.org/mailman/listinfo/swift-evolution


(Pierre Habouzit) #35

Something else I realized is that this code is fundamentally broken in Swift:

actor func foo() {
    let lock = NSLock()
    lock.lock()

    let compute = await someCompute() // <-- this really breaks `foo` into two pieces of code that can execute on two different physical threads
    lock.unlock()
}

The reason why it is broken is that mutexes (whether NSLock, pthread_mutex, or os_unfair_lock) have to be unlocked from the same thread that locked them. The await right in the middle here means we can't guarantee that.

Agreed, this is just as broken as:

func foo()
{
    let lock = NSLock()
    lock.lock()

    someCompute {
      lock.unlock()
    }
}

and it is just as broken as trying to do the same thing across queues. Stuff like this, or the use of TLS, is just inherently broken, both with GCD and with any sensible model underlying actors. Trying to fix this is not worth it IMO, it is better to be clear that they are different things and that (as a programmer) you should *expect* your tasks to run on multiple kernel threads.
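For completeness, the cross-queue variant of the same bug looks like this (a sketch; the queue label is made up):

```swift
import Foundation
import Dispatch

let someQueue = DispatchQueue(label: "com.example.another")
let lock = NSLock()

lock.lock()
someQueue.async {
    // This closure may run on a different thread than the one that called
    // lock(). For pthread-based mutexes, unlocking from another thread is
    // undefined behavior; Darwin's unfair lock would abort.
    lock.unlock()
}
```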

BTW, why are you using a lock in a single-threaded context in the first place??? :wink:

I don't do locks, I do atomics for a living.

Joke aside, it's easy to write this bug, so we should try to have the compiler/analyzer help with these broken patterns.
TSD is IMO less of a problem, because people using it are aware of its sharp edges. Not so much for locks.

Maybe we could somehow mark a function to cause a warning/error when directly using it from an async function. You'd want to use that on locks, synchronous I/O, probably some other things.

Well, the problem is not quite using them at all (malloc takes locks internally, e.g., and there's no way around that); it's that you don't want to hold a lock across an await.

Trying to hard-enforce it would pretty quickly turn into a big, annoying effects-system problem, where even a program not using async at all would suddenly have to mark a ton of functions as "async-unsafe". I'm not sure this problem is worth that level of intrusion for most programmers. But a soft enforcement, maybe an opt-in one like the Clang static analyzer, could do a lot to prod people in the right direction.
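A sketch of what such opt-in, analyzer-style enforcement could look like (the attribute name and diagnostics are invented for illustration; nothing like this exists in the proposal):

```
// Hypothetical annotation on lock(), synchronous I/O, and friends:
@awaitUnsafe func lock()
@awaitUnsafe func unlock()

actor func foo() {
    lock()                         // analyzer warning: @awaitUnsafe call in an async context
    let x = await someCompute()    // the lock would be held across this suspension point
    unlock()                       // and released from a potentially different thread
}
```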

Sure, I'm worried about the fact that because POSIX is a piece of cr^W^W^W^Wso beautifully designed, if you unlock a mutex from a different thread than the one that locked it, you're not allowed to crash to tell the client they made a programming mistake.

The unfair lock on Darwin will abort if you try to do something like that, though.

We'll see how much users will make these mistakes I guess.

-Pierre

···

On Sep 12, 2017, at 12:31 PM, John McCall via swift-evolution <swift-evolution@swift.org> wrote:

On Sep 12, 2017, at 2:19 AM, Pierre Habouzit via swift-evolution <swift-evolution@swift.org> wrote:

On Sep 11, 2017, at 9:00 PM, Chris Lattner via swift-evolution <swift-evolution@swift.org> wrote:
On Sep 4, 2017, at 12:18 PM, Pierre Habouzit <phabouzit@apple.com> wrote:


(Chris Lattner) #36

I (think that I) understand what you’re saying here, but I don’t think that we’re talking about the same thing.

You seem to be making an argument about what is most *useful* (being able to vector a completion handler to a specific queue), but I’m personally concerned about what is most *surprising*, and therefore unnatural and prone to introduce bugs and misunderstandings by people who haven’t written the code. To make this more concrete, shift from the “person who writes the code” to the “person who has to maintain someone else's code”:

Imagine you are maintaining a large codebase, and you come across this (intentionally abstract) code:

  foo()
  await bar()
  baz()

Regardless of what is the most useful, I’d argue that it is only natural to expect baz() to run on the same queue/thread/execution-context as foo and bar. If, in the same model, you see something like:

  foo()
  await bar()
  anotherQueue.async {
    baz()
  }

Then it is super clear what is going on: an intentional queue hop from whatever foo/bar are run on to anotherQueue.

I interpret your email as arguing for something like this:

  foo()
  await(anotherQueue) bar()
  baz()

I’m not sure if that’s exactly the syntax you’re arguing for, but anything like this presents a number of challenges:

1) It is “just sugar” over the basic model, so we could argue to add it at any time (and I would argue strongly to defer it out of this round of discussions).

2) We’d have to find a syntax that implies that baz() runs on anotherQueue, but bar() runs on the existing queue. The syntax I sketched above does NOT provide this indication.

-Chris

···

On Sep 5, 2017, at 7:31 PM, Eagle Offshore via swift-evolution <swift-evolution@swift.org> wrote:

OK, I've been watching this thing for a couple weeks.

I've done a lot of GCD network code. Invariably my completion method starts with

dispatch_async(queue_want_to_handle_this_on,....)

Replying on the same queue would be nice I guess, only often all I need to do is update the UI in the completion code.

OTOH, I have situations where the reply is complicated and I need to persist a lot of data, then update the UI.

So honestly, any assumption you make about how this is supposed to work is going to be wrong about half the time unless....

you let me specify the reply queue directly.

That is the only thing that works all the time. Even then, I'm very apt to make the choice to do some of the work off the main thread and then queue up the minimal amount of work onto the main thread.


(David Hart) #37

OK, I've been watching this thing for a couple weeks.

I've done a lot of GCD network code. Invariably my completion method starts with

dispatch_async(queue_want_to_handle_this_on,....)

Replying on the same queue would be nice I guess, only often all I need to do is update the UI in the completion code.

OTOH, I have situations where the reply is complicated and I need to persist a lot of data, then update the UI.

So honestly, any assumption you make about how this is supposed to work is going to be wrong about half the time unless....

you let me specify the reply queue directly.

That is the only thing that works all the time. Even then, I'm very apt to make the choice to do some of the work off the main thread and then queue up the minimal amount of work onto the main thread.

I (think that I) understand what you’re saying here, but I don’t think that we’re talking about the same thing.

You seem to be making an argument about what is most *useful* (being able to vector a completion handler to a specific queue), but I’m personally concerned about what is most *surprising* and therefore unnatural and prone to introduce bugs and misunderstandings by people who haven’t written the code. To make this more concrete, shift from the “person who writes the code” to the “person who has to maintain someone else's code”:

Imagine you are maintaining a large codebase, and you come across this (intentionally abstract) code:

   foo()
   await bar()
   baz()

Regardless of what is the most useful, I’d argue that it is only natural to expect baz() to run on the same queue/thread/execution-context as foo and bar. If, in the same model, you see something like:

   foo()
   await bar()
   anotherQueue.async {
       baz()
   }

Couldn’t it end up being:

foo()
await bar()
await anotherQueue.async()
// on another queue

···

On 7 Sep 2017, at 07:05, Chris Lattner via swift-evolution <swift-evolution@swift.org> wrote:

On Sep 5, 2017, at 7:31 PM, Eagle Offshore via swift-evolution <swift-evolution@swift.org> wrote:

Then it is super clear what is going on: an intentional queue hop from whatever foo/bar are run on to anotherQueue.

I interpret your email as arguing for something like this:

   foo()
   await(anotherQueue) bar()
   baz()

I’m not sure if that’s exactly the syntax you’re arguing for, but anything like this presents a number of challenges:

1) It is “just sugar” over the basic model, so we could argue to add it at any time (and I would argue strongly to defer it out of this round of discussions).

2) We’d have to find a syntax that implies that baz() runs on anotherQueue, but bar() runs on the existing queue. The syntax I sketched above does NOT provide this indication.

-Chris


(Marc Schlichte) #38

But what if `bar` was defined like this in a pre async/await world:

`bar(queue: DispatchQueue, continuation: (value: Value?, error: Error?) -> Void)`

^ There are several existing APIs which use this pattern of explicitly providing the queue on which the continuation should run.

My expectation (especially as a maintainer) would be that the async/await version exhibits the same queueing semantics as the `old` CPS style, whatever that was (implicitly on the main queue, implicitly on some background queue, or explicitly on a provided queue).
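Concretely, in the straw-man syntax under discussion (illustrative only; `bar`, `Value`, and `updateUI` are placeholders):

```
// Pre-async/await, with an explicit reply queue:
bar(queue: .main) { value, error in
    guard let value = value else { return }
    updateUI(value)
}

// Straw-man async/await translation; which queue the code after `await`
// resumes on is exactly the question being debated:
let value = try await bar()
updateUI(value)
```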

Also, a related question I have: will / should it be possible to mix and match CPS and async/await style for system APIs? I would say yes, so that we can transition to the new async/await style at our own pace.

Cheers
Marc

···

On Sep 7, 2017, at 07:05, Chris Lattner via swift-evolution <swift-evolution@swift.org> wrote:

Imagine you are maintaining a large codebase, and you come across this (intentionally abstract) code:

  foo()
  await bar()
  baz()

Regardless of what is the most useful, I’d argue that it is only natural to expect baz() to run on the same queue/thread/execution-context as foo and bar.


(Howard Lovatt) #39

I would argue that given:

    foo()
    await bar()
    baz()

That foo and baz should run on the same queue (using queue in the GCD sense), but bar should determine which queue it runs on. I say this because:

   1. foo and baz are running synchronously with respect to each other (though they could be running asynchronously with respect to some other process if all the lines shown are inside an async function).
   2. bar is running asynchronously relative to foo and baz, potentially on a different queue.

I say bar is potentially on a different queue because the user of bar, the person who wrote the 3 lines above, cannot be presumed to be the writer of foo, baz, and particularly not bar, and therefore has no detailed knowledge of which queue is appropriate.

Therefore I would suggest either using a Future or expanding async so that you can say:

    func bar() async(qos: .userInitiated) { ... }

You also probably need the ability to specify a timeout and queue type, e.g.:

    func bar() async(type: .serial, qos: .utility, timeout: .seconds(10)) throws { ... }

If a timeout is specified then await would have to throw to enable the timeout, i.e. the call would become:

    try await bar()

Defaults could be provided for qos (.default works well), timeout (1 second works well), and type (.concurrent works well).

However a Future does all this already :).

  -- Howard.

···

On 7 September 2017 at 15:13, David Hart via swift-evolution <swift-evolution@swift.org> wrote:

> On 7 Sep 2017, at 07:05, Chris Lattner via swift-evolution <swift-evolution@swift.org> wrote:
>
>
>> On Sep 5, 2017, at 7:31 PM, Eagle Offshore via swift-evolution <swift-evolution@swift.org> wrote:
>>
>> OK, I've been watching this thing for a couple weeks.
>>
>> I've done a lot of GCD network code. Invariably my completion method
starts with
>>
>> dispatch_async(queue_want_to_handle_this_on,....)
>>
>> Replying on the same queue would be nice I guess, only often all I need
to do is update the UI in the completion code.
>>
>> OTOH, I have situations where the reply is complicated and I need to
persist a lot of data, then update the UI.
>>
>> So honestly, any assumption you make about how this is supposed to work
is going to be wrong about half the time unless....
>>
>> you let me specify the reply queue directly.
>>
>> That is the only thing that works all the time. Even then, I'm very
apt to make the choice to do some of the work off the main thread and then
queue up the minimal amount of work onto the main thread.
>
> I (think that I) understand what you’re saying here, but I don’t think
that we’re talking about the same thing.
>
> You seem to be making an argument about what is most *useful* (being
able to vector a completion handler to a specific queue), but I’m
personally concerned about what is most *surprising* and therefore
unnatural and prone to introduce bugs and misunderstandings by people who
haven’t written the code. To make this more concrete, shift from the
“person who writes to code” to the “person who has to maintain someone
else's code”:
>
> Imagine you are maintaining a large codebase, and you come across this
(intentionally abstract) code:
>
> foo()
> await bar()
> baz()
>
> Regardless of what is the most useful, I’d argue that it is only natural
to expect baz() to run on the same queue/thread/execution-context as foo
and bar. If, in the same model, you see something like:
>
> foo()
> await bar()
> anotherQueue.async {
> baz()
> }

Couldn’t it end up being:

foo()
await bar()
await anotherQueue.async()
// on another queue

> Then it is super clear what is going on: an intentional queue hop from
whatever foo/bar are run on to anotherQueue.
>
> I interpret your email as arguing for something like this:
>
> foo()
> await(anotherQueue) bar()
> baz()
>
> I’m not sure if that’s exactly the syntax you’re arguing for, but
anything like this presents a number of challenges:
>
> 1) it is “just sugar” over the basic model, so we could argue to add it
at any time (and would argue strongly to defer it out of this round of
discussions).
>
> 2) We’d have to find a syntax that implies that baz() runs on
anotherQueue, but bar() runs on the existing queue. The syntax I sketched
above does NOT provide this indication.
>
> -Chris
>
>



(Chris Lattner) #40

Imagine you are maintaining a large codebase, and you come across this (intentionally abstract) code:

  foo()
  await bar()
  baz()

Regardless of what is the most useful, I’d argue that it is only natural to expect baz() to run on the same queue/thread/execution-context as foo and bar.

But what if `bar` was defined like this in a pre async/await world:

`bar(queue: DispatchQueue, continuation: (value: Value?, error: Error?) -> Void)`

^ There are several existing APIs which use this pattern of explicitly providing the queue on which the continuation should run.

My expectation (especially as a maintainer) would be that the async/await version exhibits the same queueing semantics as the `old` CPS style, whatever that was (implicitly on the main queue, implicitly on some background queue, or explicitly on a provided queue).

I can understand that expectation shortly after the migration from Swift 4 to Swift 5 (or whatever). However, in 6 months or a year, when you’ve forgotten about the fact that it happened to be implemented with callbacks, this will not be obvious. Nor would it be obvious to the people who maintain the code but were never aware of the original API.

We should design around the long term view, not momentary transition issues IMO.

Also, a related question I have: will / should it be possible to mix and match CPS and async/await style for system APIs? I would say yes, so that we can transition to the new async/await style at our own pace.

The proposal does not include any changes to system APIs at all, such a design will be the subject of a follow-on proposal.

-Chris

···

On Sep 11, 2017, at 4:19 PM, Marc Schlichte <marc.schlichte@googlemail.com> wrote:

On Sep 7, 2017, at 07:05, Chris Lattner via swift-evolution <swift-evolution@swift.org> wrote: