Un-pyramid-of-doom the `with` style methods

Spinning off the tangent:

Guard seems to be good fit for this task as it establishes a new scope without nesting, can we somehow tweak it so instead of:

func fugazi<T>(_ data: Data, variable: T) -> Int {
    var variable = variable
    return data.withUnsafeBytes { a in
        withUnsafeTemporaryAllocation(of: UInt8.self, capacity: data.count) { b in
            withUnsafeBytes(of: &variable) { c in
                let ap = a.baseAddress!.assumingMemoryBound(to: UInt8.self)
                let bp = b.baseAddress!
                let cp = c.baseAddress!.assumingMemoryBound(to: UInt8.self)
                var sum = 0
                for i in 0 ..< data.count {
                    sum += Int(ap[i]) + Int(bp[i]) + Int(cp[i])
                }
                return sum
            }
        }
    }
}

we could write:

// pseudocode:
func bugazi<T>(_ data: Data, variable: T) -> Int {
    var variable = variable
    guard let a = data.withUnsafeBytes else { fatalError() }
    guard let b = withUnsafeTemporaryAllocation(of: UInt8.self, capacity: data.count) else { fatalError() }
    guard let c = withUnsafeBytes(of: &variable) else { fatalError() }
    let ap = a.baseAddress!.assumingMemoryBound(to: UInt8.self)
    let bp = b.baseAddress!
    let cp = c.baseAddress!.assumingMemoryBound(to: UInt8.self)
    var sum = 0
    for i in 0 ..< data.count {
        sum += Int(ap[i]) + Int(bp[i]) + Int(cp[i])
    }
    return sum
}
3 Likes

Don't we need a notion of non-escapable variable? Such that a variable can only be accessed in the current scope.

withXYZ { ... }-style methods give you scoped access to a resource. There are a couple of reasons why you might need that, and there are already some sketches of ways we could improve them. They rely on transforming these methods in to yield-once coroutines.

So instead of:

func withXYZ<Result>(
  _ body: (SomeResource) throws -> Result
) rethrows -> Result {

  // setup.
  let resource = SomeResource(...)
  // cleanup.
  defer { ... }
  // body.
  return try body(resource)
}

// Caller requires closure scope:

someValue.withXYZ { resource in
  resource.doSomething()
}

You have:

var xyz: SomeResource {
  read {
    // setup.
    let resource = SomeResource(...)
    // cleanup.
    defer { ... }
    // body.
    yield resource
  }
}

// Caller does not need to write a closure scope:

someValue.xyz.doSomething()

That's the basic idea. Of course, to fully tackle scoped resources, it needs to do more:

  1. read/modify/yield do not yet officially exist.

  2. The yield finishes when the property access completes. We need the ability to extend the access, so we can perform multiple operations on the resource without invoking the coroutine multiple times or introducing a closure scope. Basically, we need to bind it to a name, do some work on it, then drop it to end the access.

  3. read/modify (as currently implemented) only apply to properties. Sometimes you need parameters to configure a scoped resource, so it would be cool if methods could also become yield-once coroutines.

  4. Closure scopes are sometimes used as a visual indication that the provided resource has a limited lifetime. Yielded values would lack that.

Happily, there are ideas which could solve all of these issues. For #2, we might introduce borrow variables:

For #3, my hope is that we extend the idea of ref/inout variables to function return types.

For example, I'm currently working on an API for WebURL which provides a view of a URL component as a list of key-value pairs. The view can be configured with a URL component and schema, but this unfortunately means it can't take advantage of modify accessors (properties don't support parameters). This API is only possible using a closure scope:

// Example: working with W3C media fragments:

var url = WebURL("http://www.example.com/example.ogv#track=french&t=10,20")!

url.withMutableKeyValuePairs(in: .fragment, schema: .percentEncoded) { kvps in
   kvps["track"] = "german"
}

print(url)
// ✅ "http://www.example.com/example.ogv#track=german&t=10,20"
//                                              ^^^^^^

I hope users will eventually be able to write something like:

var url = WebURL("http://www.example.com/example.ogv#track=french&t=10,20")!

inout kvps = url.keyValuePairs(in: .fragment, schema: .percentEncoded)
kvps["track"] = "german"
// <drop 'kvps' somehow to end the access to 'url'>

print(url)
// ✅ "http://www.example.com/example.ogv#track=german&t=10,20"
//                                              ^^^^^^

Or even one-liners like:

url.keyValuePairs(in: .fragment, schema: .percentEncoded)["track"] = "german"

To tackle #4, there are sketches for @nonescaping read and modify coroutines, which should help express the idea that the yielded value has a limited lifetime.

So the issue has definitely been noticed. Besides making code more difficult to read, closure scopes are just not enough for important things like enforcing lifetimes. As we gain these new expressive capabilities, we'll hopefully be able to replace uses of the withXYZ { ... } pattern.

12 Likes

Note that the scopes of a, b & c variables in the above two examples is (almost) identical:

i think this assumes the cleanup can happen synchronously, but this is often not the case — i have many APIs that rely on performing a sort of “goodbye” handshake, and this can only be implemented with a with-style closure.

Can you provide a simple example of that?

Edit: I mean something with withUnsafeBytes / withUnsafeTemporaryAllocation / or similar being discussed.

here is an real API i am currently working on:

i commented the “cleanup” lines with // cleanup

extension Mongo.DriverBootstrap
{
    /// Sets up a session pool and executes the given closure passing the pool
    /// as a parameter. Waits for the closure to return, then attempts to gracefully
    /// drain and shut down the session pool, returning only when the shutdown
    /// procedure has either completed or failed.
    ///
    /// When this method returns, all sockets have been closed and it is safe to
    /// immediately re-initialize another session pool, without needing to worry
    /// about any “hangover effects” from the previous session pool.
    ///
    /// Shutdown failures are silent, as they only imply a failure to log out of
    /// active server sessions, which is inevitable in situations such as a network
    /// outage. This means that if the passed `body` closure throws an error, that
    /// error will be the same error observed by the caller of this method.
    public
    func withSessionPool<Success>(seedlist:Set<MongoTopology.Host>,
        _ body:(Mongo.SessionPool) async throws -> Success) async rethrows -> Success
    {
        let monitor:Mongo.Monitor = .init(bootstrap: self)
        await monitor.seed(with: seedlist)
        let pool:Mongo.SessionPool = .init(cluster: monitor.cluster)
        do
        {
            let success:Success = try await body(pool)
            await monitor.cluster.end(sessions: await pool.drain()) // cleanup
            await monitor.unseed() // cleanup
            return success
        }
        catch let error
        {
            await monitor.cluster.end(sessions: await pool.drain()) // cleanup
            await monitor.unseed() // cleanup
            throw error
        }
    }
}

It's definitely a topic worthy of a design and solution...

We're often forced into "with..." style when some cleanup is needed, but the fact that things are then in a closure can be quite problematic.

actor X {

    var state: [Int: Int] = [:]

    func take(_: Bar) async {
        await PseudoSpan.$current.withValue("updating state") {
            self.state[1] = 1 // ERROR: Actor-isolated property 'state' cannot be passed 'inout' to 'async' function call
            return await test()
        }
    }
}

The error is "right" in the sense that if the call suspends, you would have an exclusivity violation on actor state. But really, the only reason such TaskLocal using APIs are using closures is because they need to ensure cleanup, like this:

actor X {

    var state: [Int: Int] = [:]

    func take(_: Bar) async {
        // push a task-local Span
          self.state[1] = 1
          return await test()
        // pop a task-local Span
    }
}

So this is very unfortunate since it makes traces much harder to use when we also mutate state like this.

If we had a form of:


actor X {

    var state: [Int: Int] = [:]

    func take(_: Bar) async {
       let span = using PseudoSpan.start("updating state")
       // setup:
       // push task-local Span
          self.state[1] = 1
          return await test()
        // cleanup:
        // pop a task-local Span
    }
}

The important thing here being that span must be kept alive until the end of the scope, and it would be semantically wrong to clean it up earlier. If it is just a variable we don't really have such strong guarantees where it'll be deinitialized; Though @Andrew_Trick may need to double check me there...

Relying on people manually doing a "pop the task-local" is unsafe since they'll / I'll forget and it'll corrupt the task-local stack. So... currently there is no good way to do this, unless we had some way to ensure "live until the end of this scope" for a variable, be it some using ... or other marker or something else...

So... yeah, I think this is an important topic worthy of a solution.

5 Likes

I don't think there's any reason that yield-once coroutines couldn't accommodate async setup/cleanup work. There might be suspension points when the access begins or ends, but it's not entirely dissimilar to async properties and async let variables.

For instance, I could imagine us introducing something like async borrow/async inout references as siblings of async let.

It’s at least not possible with the current implementation of _read and _modify, for two reasons:

  1. async set and async _modify aren’t allowed, only async get (and maybe async _read, but I haven’t checked yet).
  2. code after the yield is not run if the access throws. This is why code that uses _modify so often also uses defer. You can’t defer { await cleanup() } because you can’t defer await, but you also can’t yield &value; await cleanup() because then it might not always be run. The only option I can think of is defer { Task { await cleanup() } }, which spawns an extra task, and doesn’t even necessarily work depending on how the cleanup function is defined.