Un-pyramid-of-doom the `with` style methods

tera · January 3, 2023, 10:36am

Spinning off the tangent:

Guard seems to be good fit for this task as it establishes a new scope without nesting, can we somehow tweak it so instead of:

func fugazi<T>(_ data: Data, variable: T) -> Int {
    var variable = variable
    return data.withUnsafeBytes { a in
        withUnsafeTemporaryAllocation(of: UInt8.self, capacity: data.count) { b in
            withUnsafeBytes(of: &variable) { c in
                let ap = a.baseAddress!.assumingMemoryBound(to: UInt8.self)
                let bp = b.baseAddress!
                let cp = c.baseAddress!.assumingMemoryBound(to: UInt8.self)
                var sum = 0
                for i in 0 ..< data.count {
                    sum += Int(ap[i]) + Int(bp[i]) + Int(cp[i])
                }
                return sum
            }
        }
    }
}

we could write:

// pseudocode:
func bugazi<T>(_ data: Data, variable: T) -> Int {
    var variable = variable
    guard let a = data.withUnsafeBytes else { fatalError() }
    guard let b = withUnsafeTemporaryAllocation(of: UInt8.self, capacity: data.count) else { fatalError() }
    guard let c = withUnsafeBytes(of: &variable) else { fatalError() }
    let ap = a.baseAddress!.assumingMemoryBound(to: UInt8.self)
    let bp = b.baseAddress!
    let cp = c.baseAddress!.assumingMemoryBound(to: UInt8.self)
    var sum = 0
    for i in 0 ..< data.count {
        sum += Int(ap[i]) + Int(bp[i]) + Int(cp[i])
    }
    return sum
}

tclementdev · January 3, 2023, 12:51pm

Don't we need a notion of non-escapable variable? Such that a variable can only be accessed in the current scope.

Karl · January 3, 2023, 1:31pm

withXYZ { ... }-style methods give you scoped access to a resource. There are a couple of reasons why you might need that, and there are already some sketches of ways we could improve them. They rely on transforming these methods in to yield-once coroutines.

So instead of:

func withXYZ<Result>(
  _ body: (SomeResource) throws -> Result
) rethrows -> Result {

  // setup.
  let resource = SomeResource(...)
  // cleanup.
  defer { ... }
  // body.
  return try body(resource)
}

// Caller requires closure scope:

someValue.withXYZ { resource in
  resource.doSomething()
}

You have:

var xyz: SomeResource {
  read {
    // setup.
    let resource = SomeResource(...)
    // cleanup.
    defer { ... }
    // body.
    yield resource
  }
}

// Caller does not need to write a closure scope:

someValue.xyz.doSomething()

That's the basic idea. Of course, to fully tackle scoped resources, it needs to do more:

read/modify/yield do not yet officially exist.
The yield finishes when the property access completes. We need the ability to extend the access, so we can perform multiple operations on the resource without invoking the coroutine multiple times or introducing a closure scope. Basically, we need to bind it to a name, do some work on it, then drop it to end the access.
read/modify (as currently implemented) only apply to properties. Sometimes you need parameters to configure a scoped resource, so it would be cool if methods could also become yield-once coroutines.
Closure scopes are sometimes used as a visual indication that the provided resource has a limited lifetime. Yielded values would lack that.

Happily, there are ideas which could solve all of these issues. For #2, we might introduce borrow variables:

A roadmap for improving Swift performance predictability: ARC improvements and ownership control

Borrow variables

When working with deep object graphs, it’s natural to want to assign a local variable to a property deeply nested within the graph:
let greatAunt = mother.father.sister
greatAunt.sayHello()
greatAunt.sayGoodbye()
However, with shared mutable objects, such as global variables and class instances, these local variable bindings necessitate a copy of the value out of the object; either mother or mother.father above could be mutated by code anywhere else in the program referencing the same objects, so the value of mother.father.sister must be copied to the local variable greatAunt to preserve it independently of changes to the object graph from outside the local function. Even with value types, other mutations in the same scope would force the variable to copy to preserve the value at the time of binding. We may want to prevent such mutations while the binding is active, in order to be able to share the value in-place inside the object graph for the lifetime of the variable referencing it. We can do this by introducing a new kind of local variable binding that binds to the value in place without copying, while asserting a borrow over the objects necessary to access the value in place:
// `ref` comes from C#, as a strawman starting point for syntax.
// (It admittedly isn't a perfect name, since unlike C#'s ref, this would
// actively prevent mutation of stuff borrowed to form the reference, and
// if the right hand side involves computed properties and such, it may not
// technically be a reference)
ref greatAunt = mother.father.sister
greatAunt.sayHello()
mother.father.sister = otherGreatAunt // error, can't mutate `mother.father.sister` while `greatAunt` borrows it
greatAunt.sayGoodbye()
[...]

It’s also highly desirable to be able to perform multiple mutations on part of an object graph in a single access. If we write something like:
mother.father.sister.name = "Grace"
mother.father.sister.age = 115
then not only is that repetitive, but it’s also inefficient, since the get/set, or read/modify, sequence to access mother, then mother.father, then mother.father.sister must be repeated twice, in case there were any intervening mutations of the shared state between operations. As above, we really want to make a local variable, that asserts exclusive access to the value being modified for the scope of the variable, allowing us to mutate it in-place without repeating the access sequence to get to it:
inout greatAunt = &mother.father.sister
greatAunt.name = "Grace"
mother.father.sister = otherGreatAunt // error, can't access `mother.father.sister` while exclusively borrowed by `greatAunt`
greatAunt.age = 115

For #3, my hope is that we extend the idea of ref/inout variables to function return types.

For example, I'm currently working on an API for WebURL which provides a view of a URL component as a list of key-value pairs. The view can be configured with a URL component and schema, but this unfortunately means it can't take advantage of modify accessors (properties don't support parameters). This API is only possible using a closure scope:

// Example: working with W3C media fragments:

var url = WebURL("http://www.example.com/example.ogv#track=french&t=10,20")!

url.withMutableKeyValuePairs(in: .fragment, schema: .percentEncoded) { kvps in
   kvps["track"] = "german"
}

print(url)
// ✅ "http://www.example.com/example.ogv#track=german&t=10,20"
//                                              ^^^^^^

I hope users will eventually be able to write something like:

var url = WebURL("http://www.example.com/example.ogv#track=french&t=10,20")!

inout kvps = url.keyValuePairs(in: .fragment, schema: .percentEncoded)
kvps["track"] = "german"
// <drop 'kvps' somehow to end the access to 'url'>

print(url)
// ✅ "http://www.example.com/example.ogv#track=german&t=10,20"
//                                              ^^^^^^

Or even one-liners like:

url.keyValuePairs(in: .fragment, schema: .percentEncoded)["track"] = "german"

To tackle #4, there are sketches for @nonescaping read and modify coroutines, which should help express the idea that the yielded value has a limited lifetime.

A roadmap for improving Swift performance predictability: ARC improvements and ownership control

Contiguous collections like Array can provide new subscript operators, allowing access to part of their in-place contents via a BufferView:
extension Array {
    subscript<Range: RangeExpression>(bufferView range: Range) -> BufferView<Element> {
        @nonescaping read
        @nonescaping modify
    }
}
Note that these APIs use the @nonescaping modifier on read and modify coroutines, indicating that when client code accesses the BufferView, it cannot copy or otherwise prolong the lifetime of the view outside of the duration of the accessor coroutine.

[...]

The nonescaping constraint allows BufferView to be safe, while having overhead on par with UnsafeBufferPointer, and @nonescaping coroutines that produce BufferViews provide a more expressive alternative to the withUnsafe { } closure-based pattern used in Swift’s standard library today. Multiple BufferViews from multiple data structures can be worked with in the same scope without a “pyramid of doom” of closure literals.

So the issue has definitely been noticed. Besides making code more difficult to read, closure scopes are just not enough for important things like enforcing lifetimes. As we gain these new expressive capabilities, we'll hopefully be able to replace uses of the withXYZ { ... } pattern.

tera · January 4, 2023, 1:05am

Note that the scopes of a, b & c variables in the above two examples is (almost) identical:

taylorswift · January 4, 2023, 7:19pm

i think this assumes the cleanup can happen synchronously, but this is often not the case — i have many APIs that rely on performing a sort of “goodbye” handshake, and this can only be implemented with a with-style closure.

tera · January 4, 2023, 7:30pm

Can you provide a simple example of that?

Edit: I mean something with withUnsafeBytes / withUnsafeTemporaryAllocation / or similar being discussed.

taylorswift · January 4, 2023, 7:33pm

here is an real API i am currently working on:

i commented the “cleanup” lines with // cleanup

extension Mongo.DriverBootstrap
{
    /// Sets up a session pool and executes the given closure passing the pool
    /// as a parameter. Waits for the closure to return, then attempts to gracefully
    /// drain and shut down the session pool, returning only when the shutdown
    /// procedure has either completed or failed.
    ///
    /// When this method returns, all sockets have been closed and it is safe to
    /// immediately re-initialize another session pool, without needing to worry
    /// about any “hangover effects” from the previous session pool.
    ///
    /// Shutdown failures are silent, as they only imply a failure to log out of
    /// active server sessions, which is inevitable in situations such as a network
    /// outage. This means that if the passed `body` closure throws an error, that
    /// error will be the same error observed by the caller of this method.
    public
    func withSessionPool<Success>(seedlist:Set<MongoTopology.Host>,
        _ body:(Mongo.SessionPool) async throws -> Success) async rethrows -> Success
    {
        let monitor:Mongo.Monitor = .init(bootstrap: self)
        await monitor.seed(with: seedlist)
        let pool:Mongo.SessionPool = .init(cluster: monitor.cluster)
        do
        {
            let success:Success = try await body(pool)
            await monitor.cluster.end(sessions: await pool.drain()) // cleanup
            await monitor.unseed() // cleanup
            return success
        }
        catch let error
        {
            await monitor.cluster.end(sessions: await pool.drain()) // cleanup
            await monitor.unseed() // cleanup
            throw error
        }
    }
}

ktoso · January 4, 2023, 11:59pm

It's definitely a topic worthy of a design and solution...

We're often forced into "with..." style when some cleanup is needed, but the fact that things are then in a closure can be quite problematic.

actor X {

    var state: [Int: Int] = [:]

    func take(_: Bar) async {
        await PseudoSpan.$current.withValue("updating state") {
            self.state[1] = 1 // ERROR: Actor-isolated property 'state' cannot be passed 'inout' to 'async' function call
            return await test()
        }
    }
}

The error is "right" in the sense that if the call suspends, you would have an exclusivity violation on actor state. But really, the only reason such TaskLocal using APIs are using closures is because they need to ensure cleanup, like this:

actor X {

    var state: [Int: Int] = [:]

    func take(_: Bar) async {
        // push a task-local Span
          self.state[1] = 1
          return await test()
        // pop a task-local Span
    }
}

So this is very unfortunate since it makes traces much harder to use when we also mutate state like this.

If we had a form of:


actor X {

    var state: [Int: Int] = [:]

    func take(_: Bar) async {
       let span = using PseudoSpan.start("updating state")
       // setup:
       // push task-local Span
          self.state[1] = 1
          return await test()
        // cleanup:
        // pop a task-local Span
    }
}

The important thing here being that span must be kept alive until the end of the scope, and it would be semantically wrong to clean it up earlier. If it is just a variable we don't really have such strong guarantees where it'll be deinitialized; Though @Andrew_Trick may need to double check me there...

Relying on people manually doing a "pop the task-local" is unsafe since they'll / I'll forget and it'll corrupt the task-local stack. So... currently there is no good way to do this, unless we had some way to ensure "live until the end of this scope" for a variable, be it some using ... or other marker or something else...

So... yeah, I think this is an important topic worthy of a solution.

Karl · January 5, 2023, 4:42am

I don't think there's any reason that yield-once coroutines couldn't accommodate async setup/cleanup work. There might be suspension points when the access begins or ends, but it's not entirely dissimilar to async properties and async let variables.

For instance, I could imagine us introducing something like async borrow/async inout references as siblings of async let.

bbrk24 · January 7, 2023, 2:54pm

It’s at least not possible with the current implementation of _read and _modify, for two reasons:

async set and async _modify aren’t allowed, only async get (and maybe async _read, but I haven’t checked yet).
code after the yield is not run if the access throws. This is why code that uses _modify so often also uses defer. You can’t defer { await cleanup() } because you can’t defer await, but you also can’t yield &value; await cleanup() because then it might not always be run. The only option I can think of is defer { Task { await cleanup() } }, which spawns an extra task, and doesn’t even necessarily work depending on how the cleanup function is defined.