[Concurrency][Pitch] Task Local Values

ktoso · December 15, 2020, 3:58am

Hello everyone,

We'd like to share with you this pitch introducing Task Local Values to Swift's concurrency model.

Task local values provide a much-needed missing piece in the Task infrastructure puzzle. They enable instrumentation, profiling and tracing tool authors to build truly great contextualized experiences for debugging, profiling and tracing codebases that use asynchronous functions and actors.

At the same time this proposal avoids pitfalls of similar APIs thanks to embracing Swift's Structured Concurrency approach.

Please refer to the complete & up-to-date pitch document here.

Lantua · December 15, 2020, 5:15am

About the @Environment comparison:

and we may need to set multiple values at the same time, the API design becomes not as clean. It is not trivial to provide a variadic yet type-safe version of the Task.with(key1:value1:...keyN:valueN:) API

What if we drop the key and directly use the type:

protocol TaskLocalProtocol {
  associatedtype Value
  static var defaultValue: Value { get }
}

enum Key1: TaskLocalProtocol {
  static var defaultValue: Int { ... }
}

...

await Task.with(Key1.bound(to: 1)) {
  ...
}

which should be essentially the same as the example:

await Task.with(example.bound(to: "A"),
                luckyNumber.bound(to: 13)) {
  // ... 
}
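
One way the variadic form could stay type-safe is for bound(to:) to check the value against the key's Value and only then erase it, so the variadic function receives a homogeneous list. A minimal sketch, assuming a hypothetical BoundValue wrapper, a hypothetical read(from:) helper, and a free function with(_:body:) standing in for Task.with (none of these are part of the pitch, and no real task storage is involved):

```swift
protocol TaskLocalProtocol {
    associatedtype Value
    static var defaultValue: Value { get }
}

// Hypothetical type-erased (key, value) pair; not part of the pitch.
struct BoundValue {
    let key: ObjectIdentifier
    let value: Any
}

extension TaskLocalProtocol {
    // The compiler checks `value` against `Value` here, before erasure.
    static func bound(to value: Value) -> BoundValue {
        BoundValue(key: ObjectIdentifier(Self.self), value: value)
    }

    // Hypothetical read helper: the last matching binding wins,
    // otherwise the key's default value is returned.
    static func read(from bindings: [BoundValue]) -> Value {
        for binding in bindings.reversed() where binding.key == ObjectIdentifier(Self.self) {
            if let typed = binding.value as? Value { return typed }
        }
        return defaultValue
    }
}

enum Example: TaskLocalProtocol {
    static var defaultValue: String { "<none>" }
}

enum LuckyNumber: TaskLocalProtocol {
    static var defaultValue: Int { 0 }
}

// Stand-in for Task.with(...): variadic, yet every argument was type-checked.
func with<R>(_ bindings: BoundValue..., body: ([BoundValue]) -> R) -> R {
    body(bindings)
}

let summary = with(Example.bound(to: "A"), LuckyNumber.bound(to: 13)) { bound in
    "\(Example.read(from: bound)) \(LuckyNumber.read(from: bound))"
}
print(summary) // A 13
```

In this sketch, LuckyNumber.bound(to: "A") is rejected at compile time, because the check happens inside bound(to:) rather than in the variadic function.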

Could even net us some neat wrapper:

@TaskLocal<Key1> var ...

royhsu · December 15, 2020, 5:55am

Maybe use SwiftUI-like modifier to compose local values without variadic generics?

await Task
  .local(\.example, "A")
  .local(\.luckyNumber, 13)

ktoso · December 15, 2020, 8:15am

Thanks for chiming in, let us analyze the API proposal:

The write side looks okay here and it's an interesting idea, but I feel it breaks down a bit when we look at it holistically.

I'm totally open to other shapes of the API by the way if people have better ideas – it would help a lot if when pitching alternative shapes we consider all "sides" of the API: declaration, read side, write side (including multiple values), otherwise it's easy to make one of the sides look great at the expense of other ones.

Write side: Yes, you're right, the write side ends up equivalent to what the value handles express. It's the same amount of boilerplate and has the same challenges around "why not just Key.bound(to: ...) { ... }?" which I'll answer pre-emptively: because I'm trying to keep all "creates and modifies tasks" Task APIs on the Task namespace, so they are simple to discover. Maybe that's not so important though.

Read side: With your proposal to just use a key type, the read side becomes a bit ugly:

let x = await Task.local(Key.self)

Which is the shape that SwiftUI and Baggage from Distributed Tracing specifically avoided, after many months of bikeshedding.

What those libraries then end up doing is asking developers to make the keys private, and control access via a computed property like this:

private enum ThingKey: ... {}
extension ... { 
  var thing: Thing { 
    get { ... } 
    set { ... } 
  }
}

which is how one arrives at those \.thing APIs eventually.

The issue is that... we can't express what we need using such API shape!

... because access to a task local value must only be performed from within a task, i.e. the functions for reading and binding it must be async functions. Swift does not allow for async accessors. And even if it did (maybe we'll allow async getters), then the "set" operation is also wrong, since we must introduce new scopes when we bind values -- we cannot just "set" them (it'd break the model explained in Detailed design).

I also think that with a more realistic type and key it becomes tricky to decide how we'd namespace those things. Let's try to stick to RequestID as that's a pretty simple concept but it also includes its own type already (say we have some RequestID type), then the example above becomes:

struct RequestID { ... } 

enum RequestIDKey: TaskLocalKey {
  static var defaultValue: RequestID? { nil }
}

since we had to disambiguate the actual type from the key we use to refer to it... So I guess we'd need enum TaskLocalValues {} onto which people can put the keys – that's not too bad, and somewhat SwiftUI consistent.

I'm not sure we like the "ugly read" or if we should go all-in on await TaskLocalValues.RequestID.get(). That could work.

Looking forward to more feedback to get a feel for what API people would be comfortable with. I'm really not that married to the API shapes here, as long as the internal design keeps the guarantees as outlined in the proposal.

I'll play around with this some more, and would welcome more complete examples how people feel such shape could work out well in practice.

I don't think property wrappers are the right way to approach this feature.

It is not really right to think about them being "stored" anywhere else other than "in the task." I tend to think about SwiftUI's environment as "top/down" while Task Locals are more like "from beginning to end of task", and that's a small difference on paper, but a huge one in how those values are used.

To clarify though: A property wrapper implies there has to be a property defined for it somewhere, meaning, there is some storage in some type allocated for it -- but that storage is always a lie. It cannot contain the actual task local value, because that depends on who (what task) calls it. And even if we said the property wrapper exposes an async get() only, we cannot prevent people from looking at the $storage property -- which would always contain nonsense.

ktoso · December 15, 2020, 8:21am

I'm not entirely sure which side of the API this is pitching, but I assume this is about the binding operation?

// There are inherently 3 operations to task locals: declarations, reading, binding.

So in practice this would mean we'd do:

// declare 
struct Example: TaskLocalKey {
  static var defaultValue: String? { nil }
}
extension TaskLocalValues { 
  var example: Example { .init() } 
}

// bind
await Task
  .local(\.example, "A")
  .local(\.luckyNumber, 13) { 
...
}

// get 
let x = await Task.local(\.example)

which overall seems okay, but I'm not in love with that chained builder... I guess it is inspired by how one sets attributes on views in SwiftUI?

Would people feel this is a good looking API? We'd get auto-completion for both the read and bind sides which is nice. I can definitely give this a shot as well.

royhsu · December 15, 2020, 8:29am

Yes, this is completely inspired by the SwiftUI-way to set attributes.
I'm thinking more libraries will take the same approach to define their APIs but with different semantics and context.
Eventually, everyone will feel familiar with this chaining style.

ktoso · December 15, 2020, 8:47am

Yeah, that might be true — I’ll give it a shot and see how it looks in some examples.

ktoso · December 15, 2020, 11:35am

Updated the surface API based on feedback, it reads pretty well actually -- thanks for the quick feedback.

The semantics and all rules remain the same as previously.

ktoso:

// declare
extension TaskLocalValues {
  
    public struct RequestIDKey: TaskLocalKey {
      public static var defaultValue: String? { nil }
    }
    public var requestID: RequestIDKey { .init() }
  
}

// read
let id = await Task.local(\.requestID) ?? "<unknown>"

// bind
await Task.withLocal(\.requestID, boundTo: "1234-5678") {
  await printRequestID() // 1234-5678
}
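
To make the role of the key path concrete: it is never evaluated, it only lets the compiler pick the key type, and through it the value type. A minimal synchronous sketch of that idea, assuming a hypothetical FakeTask class as a stand-in for a single task's storage (the real operations are async and live on Task, so this models only the typing and the scoping, not the implementation):

```swift
protocol TaskLocalKey {
    associatedtype Value
    static var defaultValue: Value { get }
}

struct TaskLocalValues {
    struct RequestIDKey: TaskLocalKey {
        static var defaultValue: String? { nil }
    }
    var requestID: RequestIDKey { .init() }
}

// Hypothetical stand-in for one task's storage; the innermost binding is last.
final class FakeTask {
    private var bindings: [(key: ObjectIdentifier, value: Any)] = []

    // The key path is used only for type inference: it selects `Key`.
    func local<Key: TaskLocalKey>(_ path: KeyPath<TaskLocalValues, Key>) -> Key.Value {
        for binding in bindings.reversed() where binding.key == ObjectIdentifier(Key.self) {
            if let typed = binding.value as? Key.Value { return typed }
        }
        return Key.defaultValue
    }

    // Binding pushes a new scope and pops it when `body` returns.
    func withLocal<Key: TaskLocalKey, R>(
        _ path: KeyPath<TaskLocalValues, Key>,
        boundTo value: Key.Value,
        body: () -> R
    ) -> R {
        bindings.append((key: ObjectIdentifier(Key.self), value: value as Any))
        defer { bindings.removeLast() }
        return body()
    }
}

let task = FakeTask()
var seen: [String] = []
seen.append(task.local(\.requestID) ?? "<unknown>")
task.withLocal(\.requestID, boundTo: "1234-5678") {
    seen.append(task.local(\.requestID) ?? "<unknown>")
}
seen.append(task.local(\.requestID) ?? "<unknown>")
print(seen) // ["<unknown>", "1234-5678", "<unknown>"]
```

The binding is visible only inside the closure, which is the scoping behaviour a plain "set" could not express.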

I'm now working on the implementation, while people have time to chime in and ask questions.

Lantua · December 15, 2020, 1:19pm

Maybe we can have simpler declaration:

// declare
extension Task.DefaultLocalValues {
  public var requestID: String? { nil }
}

Though I didn't understand the original purpose of EnvironmentKey either, so I could be missing something.

I first thought it'd work if we apply it only to local variables (which isn't available yet). I'll need to think about this some more.

Lantua · December 15, 2020, 1:54pm

There's also the role of nil during the read operation. Previously, nil represented the lack of a binding. Now that everything has a default value, maybe we can remove that, and use a default of nil for those keys that need it instead.

Karl · December 15, 2020, 2:52pm

Could you please document the API you are pitching? It's an interesting manifesto-style document, but a proposal (or proposal draft) needs specifics - declarations.

I will presume from headings like "value.bound(to:body:) implementation" and "get(key:) implementation" that there are functions called bound and get... uhmmm... somewhere. Apparently there is also a TaskLocalValues type and TaskLocalKey protocol, but again - I can't find any specifics, only a general discussion about the approach.

ktoso · December 16, 2020, 1:29pm

Sure, I'll add the API more explicitly, though the reason for this draft pitch is really to get initial feedback on the shape/direction of the API, and that's just what happened up-thread, so that's good. Getting on the same page about the feature and internal details is as important as, if not more so than, the surface of it, I thought.

I'll update the pitch doc, along with an implementation shortly.

pertti · December 17, 2020, 1:41pm

Could this be used to set the executor of a task, something like

// here we are in some executor E
await Task.withLocal(\.executor, boundTo: .UI) {
    // here we are in the UI executor
}
// here we are back to executor E

(Yes, this is exactly how Kotlin does this. I will immediately admit that I have used Kotlin coroutines extensively, which probably colours my perspective somewhat.)

Besides allowing for jumping between executors like that, having the executor always specified by a Task local would eliminate any ambiguity in how the executor is chosen in different scenarios. Unless otherwise specified, child tasks "inherit" their parent's executor, which is exactly what I would expect to happen. The default executor is explicitly the default value of the Task local, again unsurprising. IIUC this would be an alternative to the Global Actors proposed as part of the Actors proposal, which in my opinion do not really feel like actors.

This would, of course, make Task locals more prominent. We should avoid ending up in a world where every async function that cares about its executor starts with an await Task.withLocal call. Perhaps it could be possible to design a generic way of expressing Task local bindings with function annotations:

@Executor(.UI) func runsOnUIThread() async {}

Task.runDetached could also allow for binding Task locals to somewhat reduce boilerplate (and improve performance by removing the initial hop to the default executor):

let task = Task.runDetachedWithLocal(\.executor, boundTo: myExecutor)
               .withLocal(\.requestId, boundTo: "1234") {
    // No need to Task.withLocal here,
    // task is immediately running on myExecutor
}

This would also make for a somewhat ergonomic yet explicit parallelism API:

async let foo = Task.withLocal(\.executor, boundTo: .threadPool) {
    calcFoo()
}
async let bar = Task.withLocal(\.executor, boundTo: .threadPool) {
    calcBar()
}
frobnicate(foo: await foo, bar: await bar)

Mordil · December 17, 2020, 8:27pm

I'd like to see this separated into its own proposal (as an extension of this pitch) with an explicit and dedicated method call:

Task.withExecutor(myExecutor)
    .withLocal(\.requestID, boundTo: "1234") {
    // No need to Task.withLocal here,
    // task is immediately running on myExecutor
}

ktoso · December 18, 2020, 12:20am

pertti:

// here we are in some executor E
await Task.withLocal(\.executor, boundTo: .UI) {
    // here we are in the UI executor
}
// here we are back to executor E
(Yes, this is exactly how Kotlin does this. I will immediately admit that I have used Kotlin coroutines extensively, which probably colours my perspective somewhat.)

Besides allowing for jumping between executors like that, having the executor always specified by a Task local would eliminate any ambiguity in how the executor is chosen in different scenarios. Unless otherwise specified, child tasks "inherit" their parent's executor, which is exactly what I would expect to happen.

Yes I think that's another very good use case and indeed something we're thinking about with these - I should mention in the use-cases, thanks!

We don't want to jump deep into the executor configuration details in the task locals proposal itself, since we don't really have the executor stuff fleshed out yet. But that is absolutely one of the use cases.

ktoso · December 18, 2020, 10:57am

Okay, first batch of updates:

It seems that everyone I've shown it to (so far) quite likes the API that emerged from this short thread, and it feels quite natural. So I've committed to it and am using it throughout the proposal and the actual implementation. Thanks @royhsu @Lantua
Added additional use-cases that folks externally and internally mentioned: executor configuration, instruments etc. Thanks @pertti @Mordil
Added more details about how this plays into synchronous functions. This will be doable after the ABI of async functions is locked in during 2021; we can then add small new APIs for accessing task locals in non-async functions. This would be an addition, and is not strictly necessary for step 1 of these APIs.

Implementation is going pretty well too; I think we'll have a functional version of this before the year ends.

Remaining work is about various tradeoffs of the storage strategies. I'm also going to explore a CoW approach, which is interesting but may cause a lot of copying in tracing scenarios, because it's quite frequent to mutate just a single trace identifier while leaving the rest untouched, which the chain approach handles very cheaply. It's tricky since we don't have tons of existing projects using async to measure the tradeoffs well on a large scale, so we'll have to estimate with synthetic workloads a little bit. The good news is that this does not impact the usage of this feature.
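
To illustrate the tradeoff between the two storage strategies, here is a rough sketch, assuming a hypothetical ChainNode class and plain String keys and values as simplifications (the actual storage layout is not specified in this thread). In the chain, rebinding one key allocates a single node that points at the untouched parent bindings; with a copy-on-write dictionary, lookups are direct, but the first mutation of a shared copy duplicates the storage:

```swift
// Hypothetical chain node: each binding points at the bindings it shadows.
final class ChainNode {
    let key: String
    let value: String
    let parent: ChainNode?

    init(key: String, value: String, parent: ChainNode?) {
        self.key = key
        self.value = value
        self.parent = parent
    }

    // Walks from the innermost binding outwards; cost grows with chain depth.
    func lookup(_ key: String) -> String? {
        var node: ChainNode? = self
        while let current = node {
            if current.key == key { return current.value }
            node = current.parent
        }
        return nil
    }
}

// Chain approach: the parent bindings are shared, never copied.
let root = ChainNode(key: "requestID", value: "1234", parent: nil)
let first = ChainNode(key: "traceID", value: "span-1", parent: root)
let second = ChainNode(key: "traceID", value: "span-2", parent: first)

print(second.lookup("traceID") ?? "nil")   // span-2
print(second.lookup("requestID") ?? "nil") // 1234
print(first.lookup("traceID") ?? "nil")    // span-1

// CoW approach: the copy shares storage until it is mutated.
let parentBindings = ["requestID": "1234", "traceID": "span-1"]
var childBindings = parentBindings
childBindings["traceID"] = "span-2" // the shared storage is copied here

print(parentBindings["traceID"] ?? "nil") // span-1
print(childBindings["traceID"] ?? "nil")  // span-2
```

Both variants give the child its own view without affecting the parent; they differ in where the cost lands (lookup depth for the chain, copying on rebinding for CoW).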

Looking forward to further comments and thoughts, thanks in advance!

Lantua · December 18, 2020, 12:13pm

extension TaskLocalValues {
  public struct RequestIDKey: TaskLocalKey {
    public var defaultValue: String? { nil }
  }
  public var requestID: RequestIDKey { .init() }
}

I asked earlier, but do we really need TaskLocalKey? I think having a path in a namespace TaskLocalDefaultValues might be just enough:

extension TaskLocalDefaultValues {
  var requestID: String? { nil }
}

I'm not even sure why we need to create a RequestIDKey (EnvironmentKey uses static function).

ktoso · December 18, 2020, 12:31pm

Thanks for spotting that, fixed now in the snippet you refer to -- that's what I get for writing proposal samples by hand rather than copying from the implementation.

Yeah, the defaultValue is meant to be static (and already is in the proposal, in the protocol definition https://github.com/ktoso/swift-evolution/blob/wip-tasklocals/proposals/nnnn-task-locals.md#declaring-task-local-values ) I missed updating the snippet above, thanks for spotting it.

We indeed don't use actual instances of the keys at all.

I do think though that having the key type is nice; it feels a bit weird to use the value of the computed var as "this happens to be the default value if value was not set", whereas with keys we have a place to spell out what this value is. Though I guess that's a minor "win"; it depends on what shape the core team would be comfortable with. Having a type leaves more room for future extension though.

Lantua · December 18, 2020, 2:26pm

I don't like that the variable returns a key instance. It's purely ceremonial, as you can just fatalError. I think having it in a TaskLocalDefaultValues should be enough not to get the user confused.

If we want to leave room for extension, maybe we should go full EnvKey, and provide a main subscript endpoint?

struct TaskLocalValues {
  subscript<Key: TaskLocalKey>(_: Key.Type) -> Key.Value {
    ...
  }
}

Then when we declare:

extension TaskLocalValues {
  private enum K: TaskLocalKey { ... }

  var newValue: K.Value { self[K.self] }
}

though I'm not sure what kind of extension you have in mind. It's kinda hard to judge.
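
For reference, a fuller sketch of that subscript shape, modelled on how SwiftUI's EnvironmentValues hides its keys. The dictionary storage, the LuckyNumberKey key and the setters are assumptions made only so the example runs on its own; a real task-local design would bind values through a scoped operation rather than a setter:

```swift
protocol TaskLocalKey {
    associatedtype Value
    static var defaultValue: Value { get }
}

struct TaskLocalValues {
    // Hypothetical storage, keyed by the identity of the key type.
    private var storage: [ObjectIdentifier: Any] = [:]

    // Main entry point: falls back to the key's default when nothing is stored.
    subscript<Key: TaskLocalKey>(_ key: Key.Type) -> Key.Value {
        get { storage[ObjectIdentifier(key)] as? Key.Value ?? Key.defaultValue }
        set { storage[ObjectIdentifier(key)] = newValue }
    }
}

extension TaskLocalValues {
    // The key stays private; callers only ever see the property.
    private enum LuckyNumberKey: TaskLocalKey {
        static var defaultValue: Int { 0 }
    }

    var luckyNumber: Int {
        get { self[LuckyNumberKey.self] }
        set { self[LuckyNumberKey.self] = newValue }
    }
}

var values = TaskLocalValues()
print(values.luckyNumber) // 0
values.luckyNumber = 13
print(values.luckyNumber) // 13
```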

gnuoyd · December 21, 2020, 3:25am

Hi Konrad,

Your proposal brings to mind nested environments in Scheme,

https://groups.csail.mit.edu/mac/ftpdir/scheme-7.4/doc-html/scheme_2.html#SEC9

Suppose that a Swift program starts with an initial environment.
Programs create new environments either implicitly or explicitly:

Implicitly

A new Task always starts with a new environment that is a copy of
its parent's environment. When a Task completes, its environment
is destroyed.

Explicitly

A program creates a new environment and runs a closure in it. The
closure's environment is initialized with a snapshot of the parent
environment and augmented with new bindings:

withEnvironment(/* optional bindings */) { /* ... */ }

When the closure completes, its environment is destroyed.

A program can modify the bindings in its environment, but modifications
in an environment cannot be seen in parent or sibling environments.
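
A rough sketch of those rules, assuming a hypothetical Environment
struct with String keys and Double values, and a withEnvironment
signature invented for this example. Because the environment is a
value type, copying it yields the snapshot described above:

```swift
// Hypothetical environment: a value type, so a copy is an independent snapshot.
struct Environment {
    var bindings: [String: Double] = [:]
}

// Runs `body` in a snapshot of `parent` augmented with one new binding.
func withEnvironment<R>(
    _ parent: Environment,
    binding key: String,
    to value: Double,
    body: (inout Environment) -> R
) -> R {
    var child = parent           // snapshot of the parent environment
    child.bindings[key] = value  // augmented with the new binding
    return body(&child)          // `child` is discarded when this returns
}

let outer = Environment(bindings: ["epsilon": 0.1])

let inner = withEnvironment(outer, binding: "epsilon", to: 0.001) { env -> Double in
    env.bindings["scale"] = 2.0  // modification local to this environment
    return env.bindings["epsilon"] ?? 0
}

print(inner)                          // 0.001
print(outer.bindings["epsilon"] ?? 0) // 0.1
print(outer.bindings["scale"] == nil) // true
```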

It seems like an environment of this kind can subsume the functions
of your task-local values without relying on tasks.

One use that I see for environments-minus-Tasks is to
establish some kind of arithmetic mode or floating-point parameters.
E.g.,

withEnvironment(\.epsilon, boundTo: .001) { x == y }

or

withEnvironment(\.roundingMode, boundTo: .down) { speed * time }

Are environments of this kind very different from what you propose?

Dave