SE-0311: Task-local values

ktoso · April 19, 2021, 1:26pm

This isn't a security thing; it's a "c'mon, don't do that" nudging

Anything in the same process is screwed if an attacker gets enough control to inspect and poke around in its memory like this.

I don't agree we should be exposing such API unless some real and strong use-cases arise for it, and so far over discussions with tracing, instruments and other teams the need did not arise.

Tools like debuggers and instruments of course could allow inspecting the task locals, but they don't need a public user-facing API to do so.

slashmo · April 19, 2021, 2:11pm

What is your evaluation of the proposal?

+1 game-changer for context propagation

Is the problem being addressed significant enough to warrant a change to Swift?

Does this proposal fit well with the feel and direction of Swift?

Yes. For distributed tracing, we already tried going the manual context passing route but doing so implicitly makes for a much cleaner API.

How much effort did you put into your review? A glance, a quick reading, or an in-depth study?

I read through the proposal multiple times and poked around with the implementation.

Use-Case: Distributed Tracing

This is really great for distributed tracing and context propagation in general. I was able to implement "automagic" Baggage propagation through task locals (Add async versions of `withSpan` API by slashmo · Pull Request #49 · apple/swift-distributed-tracing · GitHub). This allows libraries to add tracing support without changing their public API. For example, let's say a web framework wants to add tracing and propagate a span context to its handler functions. Previously, the handlers would need to accept a LoggingContext. Now that Baggage is stored inside a task local the handler can simply start a new span and it automatically is the child of the root span created by the web framework:

// web framework
await InstrumentationSystem.tracer.withSpan(
  routeString, 
  ofKind: .server, 
  at: request.timestamp
) { span in
  let response = try await route
      .handler(request, requestContext)
      .convertToResponse(request: request, context: requestContext)

  context.eventLoop.execute {
    context.writeAndFlush(self.wrapOutboundOut(response), promise: nil)
  }
}

// handler function
private func getItems(request: Request, context: RequestContext) async throws -> Response {
  let items = try await InstrumentationSystem.tracer.withSpan("Get all items") {
    try await self.database.getItems()
  }

  return try Response(encoding: items, using: jsonEncoder, request: request, context: context)
}

The RequestContext argument still remains to pass along things like the eventLoop and allocator. Other frameworks may choose to include these in their Request object directly.

The above example results in 3 spans:

/items
- Get all items
  - SELECT storage.items

Without the user having to worry about propagating Baggage at all.

Swift Distributed Tracing: Async PR
Swift Distributed Tracing Baggage Core: task-locals branch
Swift FruitStore Example (Contains the above snippet and more)

Lantua · April 19, 2021, 2:29pm

Thanks, that clears things up a bit. I think we're losing some "structure" this way, though. Since Task 3 can outlive the Task.withLocal that setup value2. That's why I first thought that ChildTask created this way takes local storage from the withTaskGroup that spawns it.

orobio · April 19, 2021, 3:10pm

Would it be possible to fallback to a thread specific task, so a task is guaranteed to always be available?

I can imagine this does not fit the model, because Task seems very much connected with async. Another way could be to not add the task-local values API to Task, but introduce a Context type, which can be used regardless of a task being available or not. In the case of a task being available, Context will be task-local. If not, it will be thread-local.

ktoso · April 19, 2021, 3:43pm

Oh boy, we may have noticed a real issue here... I'll give it some sleep and revisit in the morning, thank you for being persistent on that one. Not unsolvable but perhaps needs more implementation smarts...

ktoso · April 21, 2021, 3:01am

Following up on:

Lantua:

    Task.withLocal(\.key, value2) { // ktoso: Task 1
      // value2 // ktoso: yes
      group.spawn { // ktoso: Task 3
        // value1 ? // ktoso: no; 
                    // by the same rules as in Task 2, it is value2
      }
    }

Thank you for spotting the issue @Lantua

We discussed this a little in the team and I poked around what would be the best way to handle it.

We concluded: this is illegal and will proactively crash. Because we're not able to guarantee the rigid task-allocation bound structure of task local values as the task "escapes" the scope of the value binding, at which point the task local item will be deallocated.

Specifically, the following pattern will be made to crash with an informative error (complete error listed below):

    await withTaskGroup(...) { group in
        await <task-local>.withValue(1234) { // BAD!
            group.spawn { ... }
        }
    }

while the following ones are correct:

a) Setting the value for the entire group:

    await <task-local>.withValue(1234) { // OK!
        await withTaskGroup(...) { group in
            group.spawn { ... }
        }
    }

b) Binding the value inside of the specific child task:

    await withTaskGroup(...) { group in
        group.spawn {
            await <task-local>.withValue(1234) { // OK!
                ...
            }
        }
    }

I will add wording about this to the proposal and have the implementation adjustment prepared already.

Continue reading for more details and discussion:

In theory this could be "solved" (scary quotes on purpose, as none of those solutions are viable solutions) in two ways:

a) reads of the task-local value are racy; and depending on if it still is stored in the parent or not, it might be there in the child. This is terrible, it breaks the promise of "just works", forces locks into lookup implementation and generally is just a bad idea to have lookups be racy.
b) it would be possible to detect that we're running inside of a task group, and then heap allocate the task items, reference count them and keep them retained until all tasks that MAY observe these values have been destroyed. This is very complex and causes a huge performance cliff from an implementation of tasks and task locals which otherwise never have to hit global allocation.

None of these "solutions" are complete or viable solutions. And the core of the issue is that such usage is just incorrect and against our notion of structured concurrency - for better or worse.

The reality of this pattern though is... that it's wrong, and thus we should crash. It is a bit hard to make this a static check, we today we opt to just crash in this bad situation and direct the developer to do the right thing.

Thanks to @Joe_Groff for suggesting that this is a fine way to handle this.

The error will also show up more nicely in Xcode UI (as we're using the same technique in other places of the async runtime).

An error would look something like this (subject to change, it's quite long but explains the crash very well):

error: task-local value: detected illegal task-local value binding at /Users/ktoso/code/swift-project/swift/stdlib/public/Concurrency/TaskLocal.swift:68.
Task-local values must only be set in a structured-context, such as: around any (synchronous or asynchronous function invocation), around an 'async let' declaration, or around a 'with(Throwing)TaskGroup(...){ ... }' invocation. 

Notably, binding a task-local value is illegal *within the body* of a withTaskGroup invocation. 

For example:

    await withTaskGroup(...) { group in
        await <task-local>.withValue(1234) { // BAD!
            group.spawn { ... }
        }
    }

is illegal and should be replaced with either:

a) Setting the value for the entire group:

    await <task-local>.withValue(1234) { // OK!
        await withTaskGroup(...) { group in
            group.spawn { ... }
        }
    }

b) Binding the value inside of the specific child task:

    await withTaskGroup(...) { group in
        group.spawn {
            await <task-local>.withValue(1234) { // OK!
                ...
            }
        }
    }

Zhu_Shengqi · April 21, 2021, 3:26am

I have some doubts about using property wrapper for task-local values declaration:

IMO property wrapper is just some syntactic sugar in this case. Although it may be convenient but we may also need a more formal way (maybe more verbose) to declare keys for such an important concurrency feature.
With the property wrapper approach, we can only define default value once. What if different callers of some specific task-local value want different default values on each call site? An optional value can give some advantages here:
1. we can tell from the call site whether a specific task-local value exists;
2. we can use Task.local(#someKey) ?? defaultValue pattern to designate a default value easily on every call site.

ktoso · April 21, 2021, 3:37am

I shared my own concerns about it as well; the type-key style is much safer potential misuse. I don't think this feature necessarily warrants new keywords in the language, we'll quickly explode in complexity there as keys need to have options etc.

At this point I'm really looking forward to the core-team's preference here, so we'll see.

With any approach that defines a default value you define the default value once. Be it the wrapper or any other mode - including the initial proposal with types as keys.

You're free to define a value as optional and make the default nil -- giving control over the type allows the author of a type control over this. I.e. you can achieve what you're asking for like so:

@TaskLocal static var hello: String? = nil

with the usage being Library.hello ?? "default hello", as simple as that and no special things have to be introduced for it.

I argue that for many cases this does not matter, as many values inherently have some "default or undefined" value. While priority isn't a task local (because it has optimized storage as it's so frequently accessed), if it were, it'd be just that: @TaskLocal var priority: Task.Priority = .unspecified. The same is popular for "container types" such as "headers" or "permissions"; often it is much simpler to work with the empty-object rather than every single call site having to ?? .empty.

The design with allowing users control over the task local type, allows authors who have a very good understanding of their key's usage to define it in useful ways: be it as optional or not.

Chris_Lattner3 · April 21, 2021, 4:02am

Thank you for the iteration on this!

+1, me too! This is shaping up very nicely

Awesome.

Ok, I see three options here:

Reject these things by requiring async (current plan).
Make this a dynamic error (like a force unwrap) when done in a sync context but not on a task.
Try to define the behavior when on a sync context without a task.

I think it would be very valuable to make it so we can do sets from sync functions.

Have you investigated path #3 at all? On the surface, it seems unwise because the global legacy world has no structure. On the other hand, unifying and defining a model for that would make a lot of other things that build on top of TLS simpler, and Swift over time will converge to "everything being on a task" even though Swift-initially will not be there. #2 and #3 are both "safe" in this respect, but I'm curious if there is a model that is definable that works here. I assume that all the "setters" for TLS have to be synchronization safe anyway, because subtasks could be concurrently accessing the structures, so defining a model here (even if it is weird for the legacy case) may not actually be a problem.

If that isn't practical, then I think it is worth considering #2. While we could work around it, this will also drive a bunch of functions to be artificially marked as async, which then drives async pollution up the call chain. It would be really great if we could fix this and have a unified approach, either with a trap on the bad case or by defining the bad case to being "merely weird, but defined".

I think it is also worth considering what the global/static variable model would/should be, because the most natural way to handle it is to make globals/statics be implicitly task local when accessed from a task, but be the actual process-global value when not. I am not sure what current thinking is about this and we haven't seen a proposal here. This is currently /the/ big memory safety problem with actors and structured concurrency, and I haven't heard an alternate model that seems likely to be accepted.

Just to share my train of thought - my intuition on this was that the TLS design is ultimately like a threadsafe scoped dictionary, and Dictionary in Swift has getters and setters. The getter does have a "with default value" version, but the default one returns a optional with a nil on a missing entry. This seems like the conceptual model we should align around.

That said, if the complexity can be handled with a property wrapper approach, then that could significant change the calculus by defining away a lot of the boilerplate. I'll look for the next iteration to see how this settles out.

To argue against myself, "global" variables with initializers in Swift have lazily initialized default values, so unifying TLS and globals together in a single model would argue for allowing programatic default values.

awesome

Which keys do you expect to be privileged, and how do you get people to admit that they shouldn't be privileged? This seems very similar to static constructors - everyone building a library thinks it is ok if they use them, but think that no one else should. In practice, libraries get accumulated into massive applications and nothing is really that important ;-)

That works for me. This seems like the right thing to suffer. I think that your approach makes this work: it isn't the default, and copying them over makes the lifetime of the other values more clear.

Sounds like the right tradeoff. N

I'd also prefer something like that, but I'll let y'all iterate on it before I share an opinion about one vs the other. Kneejerk reaction is that exposing the $ name seems sad. Maybe the property wrapper should have a get/set member explicitly (yuck) or expose some other fancy projection that handles it.

-Chris

orobio · April 21, 2021, 7:19am

This is what I meant earlier, although reading it back now, I might've been a bit cryptic about it.

If we view a thread as being just a single task, task-local values seem to map into that just fine. However, since Task is connected to async, in that case it's perhaps better to separate this from the Task type and use for example a Context type:

// Based on the originally proposed syntax without property wrappers

func asyncPrintRequestID() async {
  let id = Context[\.requestID]
  print("request-id: \(id)")
}

func syncPrintRequestID() {
  let id = Context[\.requestID]
  print("request-id: \(id)")
}

// Setting the context in an async function will
// always set a task-local value
func asyncPrintRequestTestCase() async {
  withContext(\.requestID, boundTo: "1234-5678") {
    await asyncPrintRequestID() // prints: 1234-5678
    syncPrintRequestID()        // prints: 1234-5678
  }
}

// Setting the context in a sync function will set 
// a task-local value or a thread-local value depending
// on whether a task is available or not
func syncPrintRequestTestCase() {
  withContext(\.requestID, boundTo: "1234-5678") {
    syncPrintRequestID()        // prints: 1234-5678
  }
}

This unifies the concept for sync and async contexts, and has the advantage that await can be dropped when setting the value (which was synchronous anyway).

I'm not familiar with the implementation, so this might not be implementable at all in a reasonable way, but conceptually this looks good to me. Or did I miss anything that makes this unsound?

ktoso · April 22, 2021, 12:35pm

Good morning everyone,
I took all the feedback from the thread and implemented all the main new features, API changes and runtime checks.

The compile time check wrt. a task local property wrapper only being allowed on a static property we cannot check just yet, but if we decide we want to, this also would be doable.

I asked the team how to best handle such large change to the proposal surface API and they suggested we focus on the revised proposal in the same thread.

Review of updated proposal, version 4

This revision introduces a property wrapper based key declaration style, as many requested in the thread:

enum ExampleTrace { 
  @TaskLocal(default: TraceID.empty)
  static var traceID
}

struct TraceID {  
  static var empty: TraceID? = nil
}

// ------
ExampleTrace.traceID.get() // -> TraceID?
ExampleTrace.traceID.withValue(TraceID(...)) { ... }

Please review the updated proposal:

updated proposal, version 4: (Pull Request and diff #1337) and readable view of updated proposal

And you can have a look at the associated implementation changes:

implementation changes and tests aligned with the proposal: https://github.com/apple/swift/pull/36959

The main questions are:

the property wrapper approach offers less boilerplate but adds some costs:
- it is possible to make the mistake of not making the property static; this may be confusing to debug because each instance of such declaration is an unique key, so no lookups will work; We should diagnose this as a compile time error if possible yet no mechanism exists for this yet. We could extend the property wrapper "_enclosingInstance" mechanism to support this requirement.
the property wrapper declarations are very short but a bit weird with having to omit the return type and leaving it to type inference to fill in the right type.
It is now possible to bind task-locals within a non async context via the UnsafeCurrentTask this approach was always planned but is now also implemented; Opinions on exact shape of this API are welcome though it must use the UnsafeCurrentTask.

Thanks in advance and please let me know @John_McCall if we should split off this iteration somehow more formally. @Douglas_Gregor suggested just marking with a "big post" in the thread would be good enough for these kinds of iteration; if necessary, extending the review period though I'd hope we can fit in the initial timeframe (?).

bzamayo · April 22, 2021, 12:49pm

This immediately stood out to me as awkward. Is it not possible to declare like @TaskLocal static var traceID: TraceID?

ktoso · April 22, 2021, 12:51pm

That is not possible because of how property wrappers work.

The actual type is explained in the proposal, it is TaskLocal<TraceID?>.Access.

Without this we end up in terrible use-site APIs which sometimes have to write $traceID and sometimes just traceID. It also makes the "read" operation seem trivial: Example.traceID which is not something I think we should do for this API.

bzamayo · April 22, 2021, 1:02pm

I see. Well, in that case, I'd say the previous approach was better :/

I guess you can argue that the locals will be used in code more often than they are declared, so maybe we should optimise for the use-site ...

pertti · April 22, 2021, 6:23pm

I feel like I'm missing something important here, but if the property wrapper solution becomes that weird, wouldn't it be better to just have variables of type TaskLocal<T> instead? That example would then become something like

enum ExampleTrace {
    static let traceID = TaskLocal<TraceID?>()
}
ExampleTrace.traceID.get() // -> TraceID?
ExampleTrace.traceID.withValue(TraceID(...)) { ... }

Rolls off the tongue better, IMO. TaskLocal would have an init(defaultValue: T), and when T is Optional, it would also have an init() that sets nil as the default value.

pyrtsa · April 22, 2021, 6:49pm

How about a property wrapper where wrappedValue: Value was made unavailable and all access thus basically forced to go through projectedValue (of type TaskLocal<Value>.Access)?

Then code using the task local would—for better or worse—all go through a $-prefixed variable, but on the other hand you could set the default value with a natural looking syntax via init(wrappedValue:) such as:

@TaskLocal var request = RequestID.empty
@TaskLocal var traceID: TraceID?  // implicit nil for ExpressibleByNilLiteral

ktoso · April 22, 2021, 11:36pm

That’s an interesting idea - thank you.

I don’t love the $ prefix, but at least it is consistent then, and error messages guide people to the right thing to do. It seems like a viable approach to me.

How do people feel about that?

Alejandro_Martinez · April 23, 2021, 12:52pm

Personally I feel like declaring a global variable (or inside a "namespace" enum that a dev makes up) seems too magical. I know there is a type/wrapper but still, is hard to see the connection with what's going on.

Instead I feel that the proposed solution was more clear since you extend an existing system provided namespace, so at least you see a connection. Maybe this is just because of the familiarity with SwiftUI's api.

Chris_Lattner3 · April 25, 2021, 5:35pm

I'm not very worried about this. We can always have the compiler "know" about specific property wrappers and have extra semantic checks for Swift.TaskLocal structs. The proper check is that we don't want values of Swift.TaskLocal to be declared as non-global properties or local variables. This can be enforced in the type checker with an adhoc check if needed. It isn't really associated with the property wrapper - that is just the sugar that provides the type declaration.

About the design, why do you need the redundancy with two declarations:

enum ExampleTrace { 
  @TaskLocal(default: TraceID.empty)
  static var traceID
}

struct TraceID {  
  static var empty: TraceID? = nil
}

Is it possible to condense this down to something like this?

enum ExampleTrace { 
  @TaskLocal
  static var traceID : Int
}

ExampleTrace.traceID.get() // -> Int?

This gets back to my question above about "why have a distinguished 'default' value" instead of using optionals? TLS is fundamentally a scoped key/value store like a dictionary. Dictionaries can have values of arbitrary type, but handle the missing case by returning a "optional none" instead of storing a default value. I don't understand why TLS would diverge from that widely used practice.

-Chris

ktoso · April 26, 2021, 12:55am

Yeah, it's pretty trivial to add this and I was about to do so right away.

I did get some some unhappy reactions about doing such one off check for these. It would be nice to get the core team chime in on this check explicitly, because various people on the team seem to have different opinions if this is a good idea or a terrible idea.

Here specifically it is to get the Int? inferred. We could not just default it to nil and get an Int? inferred.

This is the follow change mentioned here, so yes it's possible but it will look like this on use sites:

// @TaskLocal static var empty: TraceID? = nil
ExampleTrace.$traceID.get() // -> Int?
ExampleTrace.$traceID.withValue(1234) {
}

If we do this style of API, it's at least consistent that we always have to use the $. I don't see any way around it, because how types must be used in a property wrapper and it's projectedValue / wrappedValue. As in, we can't have it work with traceID.functions as that must be the wrapped value, and the wrapped value must be the "Access".

All in all it's "okey..." at least it would be consistent to always have to access them via $ for both reading and writing, albeit is is somewhat ugly. It seems though people are used to such APIs already from existing property wrapper APIs which are very prominent in SwiftUI apps.