Passing large amounts of data between actors without making copies

I've got a Swift 6 strict concurrency question. I'm trying to figure out how to pass a large amount of non-Sendable data between actors without making copies and without making the data Sendable (and without using any of the "unchecked"/"unsafe" escape hatches). Here's the situation in my toy example app:

I've got a @MainActor class AppState and a Worker actor. From AppState I call Worker's generateProducts() method. The Worker makes a big array of non-Sendable WorkProduct objects and then returns them.

Here's the code where AppState calls Worker to generate the products:

And here's the code for Worker's generateProducts() method:

This compiles and works in Swift 6 language mode. Yay! I've got one actor making a bunch of non-Sendable things and returning them to a @MainActor class! Success!

Later, I want my AppState object to pass the array of non-Sendable WorkProduct objects back to the Worker actor for processing, and then have the Worker return the processed WorkProduct objects to AppState.

Here's what I've tried so far, none of which works:

I feel like what I'm doing should be possible—without making WorkProduct Sendable. I guess I'm searching for some equivalent of the borrowing/consuming stuff from non-copyable types, maybe?

AppState and Worker never need to have the big array of non-Sendable WorkProduct objects at the same time. Either one has it or the other one has it. It should never be shared.

It seems like it should be possible to pass this big chunk of data back and forth without making copies (and therefore keeping the RAM footprint from getting bigger than is needed for a single copy of the array).

Anyone have any advice or ideas?

Have you tried nonisolated(unsafe) let localProducts = self.products? It’s a somewhat tricky way to turn off the checks, because the compiler cannot reason about safety there (and I’m not sure it will ever be able to).

I believe this might be one of the use-cases for Span types. I have not looked too closely at Span… but you might want to check this out if you are targeting modern OS platforms.

sending is probably the basic tool you're looking for — if you want to communicate that a particular value is required to be disconnected from other objects, and so it's safe to transfer it to a different isolation, that's what sending is for. It's not unlikely that you'll run into limits of what Swift can prove about it, though, so please keep track of any shortcomings and bring them up here.

As you can see in Worker's generateProducts() method, I am using sending to successfully accomplish the first part: having one actor make a bunch of non-Sendable things and then return them to a @MainActor class.
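For anyone following along, a minimal sketch of that first part might look like the following. The shapes of WorkProduct, Worker, and AppState are assumptions based on the description in this thread, not the sample project's actual code, and the count parameter is hypothetical.

```swift
import Foundation

// Assumed shape: a plain, non-Sendable reference type.
final class WorkProduct {
    var id = UUID()
    var name: String? = nil
}

actor Worker {
    // `sending` on the result tells the caller that the returned array is
    // disconnected from the actor's state, so it may cross isolation domains.
    func generateProducts(count: Int) -> sending [WorkProduct] {
        var products: [WorkProduct] = []
        for index in 0..<count {
            let product = WorkProduct()
            product.name = "product-\(index)"
            products.append(product)
        }
        return products
    }
}

@MainActor
final class AppState {
    private let worker = Worker()
    private(set) var products: [WorkProduct] = []

    func generate() async {
        // The array arrives on the main actor without WorkProduct being Sendable.
        products = await worker.generateProducts(count: 3)
    }
}
```

The key point is that the array is built from scratch inside the method and never stored on the actor, which is what lets the compiler accept it as a sending result.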

It's the second part that's tripping me up: passing the array of non-Sendable WorkProduct objects back to the Worker actor for processing, and then having the Worker return the processed WorkProduct objects to AppState. I have not found an application of sending to the methods involved in this part that seems to make it work.

Ah, yeah, I should have clarified, I'm also trying to avoid using any of the "unsafe"/"unchecked" escape hatches.

Yeah, so ideally what you’d do here is flag that the values have to stay disconnected while they’re stored on the actor, and that is just a feature we don’t have yet. CC’ing @Michael_Gottesman.

As a general rule, you shouldn’t feel bad about using unsafe escape hatches when you need them. I always say that the right way of understanding unsafe features is that there’s a correctness condition that the language isn’t smart enough to take over responsibility for yet. Write your code in as safely-structured a way as you can right now, documenting the preconditions and assumptions that make it actually correct. The goal should be that you know exactly what feature would let you write this safely, such that you could immediately adopt it when it’s available.

My understanding is that we need the ability to express something like [sending Element] to be able to write this without unsafe opt-outs. Otherwise I’m not sure Swift would be able to reason that the reference-type instances in the array are safe to pass.

[sending Element] would be very tricky, and I'm not sure how that substitution would have to interact with generics. Fortunately, I don't think we actually need that here, because the array value seems to be transferred as a whole. You just need the ability to declare something like var products: sending [WorkProduct].

Out of curiosity, why avoid making it Sendable?

I tried your example with one modification, making WorkProduct Sendable (by protecting its internal state with a lock), and everything became easier: it compiled out of the box, and on top of that I was also able to remove sending.

I haven't looked at this class in detail, but just protecting the internal state with a lock is not generally going to provide any meaningful data race protection. You have to actually redesign a class around providing a more transactional API. Without that, I mean, yes, you won't have the undefined behavior of low-level data races, but your code will still be wrong. Sometimes that's the right move, but often it isn't: the cleaner and more compositional thing to do really is to just assume that an object is only being used from a single context at once.
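To illustrate the distinction, here is a small sketch using a hypothetical Tally type (not from the sample project). It assumes NSLock.withLock from Foundation is available. The value property locks each read and each write separately, so value += 1 would still be two independent critical sections; increment() is the transactional form, where the whole read-modify-write happens under one lock.

```swift
import Foundation

// Hypothetical type for illustration only.
final class Tally: @unchecked Sendable {
    private var storage = 0
    private let lock = NSLock()

    // Per-property locking: each individual read or write is race-free,
    // but a read followed by a write is not atomic as a unit.
    var value: Int {
        get { lock.withLock { storage } }
        set { lock.withLock { storage = newValue } }
    }

    // Transactional API: the read-modify-write happens under a single lock.
    func increment() {
        lock.withLock { storage += 1 }
    }
}

func countConcurrently(iterations: Int) -> Int {
    let tally = Tally()
    DispatchQueue.concurrentPerform(iterations: iterations) { _ in
        // Writing `tally.value += 1` here would be a locked read followed by
        // a separate locked write, so concurrent increments could be lost
        // even though there is no low-level data race.
        tally.increment()
    }
    return tally.value
}
```

With increment(), countConcurrently(iterations: 1_000) always returns 1_000. With the property-based form the result could come up short, which is the "code will still be wrong" case described above.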

Could you please clarify?

The usage of that class (in the provided example) is basically:

    product.name = generateRandomString(length: 5)

which I changed to:

            product.withLock { product in
                product.name = generateRandomString(length: 5)
            }

And this is the sendability change I made:

Originally it was:

nonisolated final class WorkProduct {
    var id = UUID()
    var name : String? = nil
    var data = Data(repeating: 0xFF, count: 1024)
}

Changed to be Sendable:

nonisolated final class WorkProduct: @unchecked Sendable {
    struct State {
        var id = UUID()
        var name : String? = nil
        var data = Data(repeating: 0xFF, count: 1024)
    }
    private var state = State()
    private var lock = NSLock()
    
    func withLock<R>(_ body: (inout State) -> R) -> R {
        lock.withLock {
            body(&state)
        }
    }
}

What could go wrong with this change?

Or does this qualify as a redesign now that the class API is slightly different?

So, obviously, this is a toy example. My real situation is that I have a potentially very large volume of data that takes a long time to generate, so of course it happens off the main thread (in an actor, in my case).

Later, I present a user interface that allows the properties of this giant collection of data to be viewed and modified in a single UI. (Think of a giant table view with controls embedded in each row.) That user interface is all happening on the main thread, of course, and so it needs access to all the data in order to power the UI.

Later still, all that data will be processed in another time-consuming operation, which again happens off the main thread in the actor.

The large amount of data is non-Sendable, and making it Sendable is nontrivial. If I try to make it Sendable by converting it all to value types, it becomes very difficult to make a read/write UI for all the data. With value types, the SwiftUI Views will either have copies of the data, in which case making changes does not work as expected, or else it's @Bindings as far as the eye can see, which is easy to mess up, IME.

And if I wrap all read/write properties in mutexes to make the data Sendable, it adds quite a bit of mutex-related overhead during the generation and processing steps, even though during those steps only the actor should have access to the data, so there should be no actual contention.

When it comes to ownership of the large amount of data, there are a bunch of ways to handle it. I could make the actor the "owner" of this large amount of data, but then it is a bit inconvenient to reference that data in a bunch of MainActor SwiftUI Views, especially without making copies of the data.

Or I could make the MainActor the owner. But since the data is both generated and (later) processed in the actor off the main thread, then I have to move that data between actor and the main thread and back—again, preferably without making copies.

What I'm actually doing in my app right now is that the actor "owns" the data, but it uses smaller Sendable containers to ferry the needed data between the actor and the main thread to power the UI. This, of course, involves copying data, but at least they're smaller copies of a subset of the data. And when updates happen in the UI, I do them both in the "local" (smaller) data as well as in the "real" data owned by the actor, which is tedious and an easy place to make mistakes.
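A rough sketch of that "ferrying" arrangement, for concreteness. All names here (ProductSummary, summaries(), rename(id:to:)) are hypothetical and the real app's types are certainly more involved. The actor owns the full objects and hands out small Sendable value-type snapshots.

```swift
import Foundation

// Assumed shape: the full, non-Sendable data stays inside the actor.
final class WorkProduct {
    let id = UUID()
    var name: String? = nil
    var data = Data(repeating: 0xFF, count: 1024)
}

// Hypothetical lightweight snapshot that is cheap to copy and Sendable.
struct ProductSummary: Sendable, Identifiable {
    let id: UUID
    var name: String?
}

actor Worker {
    private var products: [WorkProduct] = []

    func generateProducts(count: Int) {
        products = (0..<count).map { index in
            let product = WorkProduct()
            product.name = "item-\(index)"
            return product
        }
    }

    // Copies only the fields the UI needs, not the large `data` payload.
    func summaries() -> [ProductSummary] {
        products.map { ProductSummary(id: $0.id, name: $0.name) }
    }

    // Every UI edit has to be mirrored back into the "real" data.
    func rename(id: UUID, to newName: String?) {
        products.first(where: { $0.id == id })?.name = newName
    }
}
```

The duplication is visible even at this size: each editable field needs a mirror in the summary type plus a method to write it back.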

So that's why I'm looking for a better solution that lets me keep the large volume of (non-Sendable) data as-is while passing it back and forth between the actor and the main thread as needed, never having more than one copy of it in memory.

Reference types with mutable properties are great for powering a UI without making copies of your data. But reference types are not Sendable without careful work to protect their state. (And sometimes the data is out of our control entirely and we just have to deal with the types we get.)

I think removing all the existing operations from the class and doing everything from closures passed to withLock is likely to be a pretty major redesign for a non-toy example, yes.

Got you. The following version keeps the API intact:

nonisolated final class WorkProduct: @unchecked Sendable {
    struct State {
        var id = UUID()
        var name : String? = nil
        var data = Data(repeating: 0xFF, count: 1024)
    }
    private var state = State()
    private let queue = DispatchQueue(label: "WorkProductQueue")
    
    var id: UUID {
        get { queue.sync { state.id }}
        set { queue.sync { state.id = newValue }}
    }
    var name : String? {
        get { queue.sync { state.name }}
        set { queue.sync { state.name = newValue }}
    }
    var data: Data {
        get { queue.sync { state.data }}
        set { queue.sync { state.data = newValue }}
    }
}

Although I'm not sure how well that will work with thousands of WorkProduct instances.

Sure. And thank you for stripping the code down to the minimal example and a thorough explanation!

Hmm... If you are sure that only one actor accesses the data, what would happen if you just made the class @unchecked Sendable without actually protecting the state? And if you made a mistake, would the Swift exclusivity checker tell you?

Yeah, like I said earlier, I could use @unchecked Sendable and similar constructs that say "trust me, I'm handling it," but I prefer to let the language/compiler handle it for me.

When I started working on this, I assumed it was a common situation: a large amount of data with time-consuming work done on it, so of course it happens off the main thread. But, of course, the UI for presenting/changing that data is on the main thread.

I thought it would be so common that there'd be an obvious solution/pattern for dealing with this situation without making copies of the large amount of data (and without requiring all the data to be Sendable). But if there is one, I haven't found it yet.

Making the type @unchecked Sendable is definitely not the right solution, because it really is important to learn about places where you might be sharing these objects unwittingly between concurrent contexts.

In this case, here's what you're trying to do:

  1. transfer the array out of self.products,
  2. send it to worker.processProducts,
  3. receive the new array back from worker.processProducts, and
  4. assign that back into self.products.

So you want to have an invariant that self.products holds a disconnected value, and then you want processProducts to both take and return a disconnected value. For the latter, you're just missing the sending return value. For the former, this is something that we can't express yet in Swift, but you seem to understand what the rules on it would need to be, so I'd say to just encapsulate your uses of it like so:

private func claimProducts() -> sending [WorkProduct] {
  // There's an invariant that self.products remains disconnected,
  // but Swift doesn't let us express that yet, so we have to use this
  // workaround.
  nonisolated(unsafe) let products = self.products
  self.products = []
  return products
}

private func setProducts(_ products: sending [WorkProduct]) {
  self.products = products
}

You could also just encapsulate that in a little ~Copyable type. Eventually, Swift will support sending properties directly, and you'll be able to get rid of this.

What happens with the UI (of the real app) if it wants to re-render (for some reason) and needs to access products?

    // Approach 1:
    //self.products = await worker.processProducts(self.products)

With this approach, if a UI update happens at the same time, you'll get a "simultaneous access" violation, no?

    // Approach 2:
    let localProducts = self.products
    self.products = []
    self.products = await worker.processProducts(localProducts)

And with this approach, the UI (should it want to update for some reason) will render an empty list?

It helps to consider the case where "processing" takes a long time (say, a minute). What happens to the UI during that time (if you scroll, etc.)? It must show something, right?

It doesn't matter whether it's SwiftUI or UIKit: if the UI wants the data on the main actor and that data is currently "owned" by another actor, that's a violation of your desired goal... Looks like you have to pick either "copying" or "synchronising" (whichever is less evil in your case).

The original sending SE proposed a Disconnected type for the stdlib, which would translate sending into ~Copyable & Sendable and back again; something like:

public struct Disconnected<Value: ~Copyable>: ~Copyable, @unchecked Sendable { // swiftlint:disable:this unchecked_sendable
    private nonisolated(unsafe) var value: Value

    public init(_ value: consuming sending Value) {
        self.value = value
    }

    public consuming func consume() -> sending Value {
        value
    }

    public mutating func swap(_ other: consuming sending Value) -> sending Value {
        let result = self.value
        self = Disconnected(other)
        return result
    }

    public mutating func take() -> sending Value where Value: ExpressibleByNilLiteral {
        let result = self.value
        self = Disconnected(nil)
        return result
    }

    public mutating func withValue<R: ~Copyable, E: Error>(
        _ work: (inout sending Value) throws(E) -> sending R
    ) throws(E) -> sending R {
        try work(&self.value)
    }
}

Writing it yourself obviously violates your "no unsafe code" maxim, but it is probably the piece you need.

You can already use Mutex to provide this functionality, though (you could use Mutex to make an inefficient but all-safe implementation of Disconnected).

All provided you don't run afoul of "Can't send into Mutex" (swiftlang/swift issue #81546) or "Can't forward `inout sending` argument to another function" (swiftlang/swift issue #82553), though! sending is pretty broken.

@John_McCall I'm not sure how to incorporate your claimProducts() and setProducts() approach into the code of my sample project. Can you elaborate?
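One way the two helpers might slot in, as a sketch only. The Worker, WorkProduct, and AppState shapes below are assumptions (WorkProduct is trimmed to a single property), and generate() and process() are hypothetical method names. The claimProducts() and setProducts() bodies are the ones posted above.

```swift
// Assumed shape: a non-Sendable reference type, trimmed for brevity.
final class WorkProduct {
    var name: String? = nil
}

actor Worker {
    func generateProducts(count: Int) -> sending [WorkProduct] {
        var products: [WorkProduct] = []
        for index in 0..<count {
            let product = WorkProduct()
            product.name = "product-\(index)"
            products.append(product)
        }
        return products
    }

    // Takes a disconnected array and hands a disconnected array back.
    func processProducts(_ products: sending [WorkProduct]) -> sending [WorkProduct] {
        for product in products {
            product.name = product.name?.uppercased()
        }
        return products
    }
}

@MainActor
final class AppState {
    private let worker = Worker()
    private(set) var products: [WorkProduct] = []

    private func claimProducts() -> sending [WorkProduct] {
        // Invariant (not checked by the compiler): self.products is disconnected.
        nonisolated(unsafe) let products = self.products
        self.products = []
        return products
    }

    private func setProducts(_ products: sending [WorkProduct]) {
        self.products = products
    }

    func generate() async {
        let fresh = await worker.generateProducts(count: 2)
        setProducts(fresh)
    }

    func process() async {
        let claimed = claimProducts()   // self.products is now empty
        let processed = await worker.processProducts(claimed)
        setProducts(processed)          // ownership returns to AppState
    }
}
```

While process() is suspended, self.products is empty, so any UI that reads it during that window would see an empty list. The single nonisolated(unsafe) stays confined to claimProducts(), which is the only place that relies on the "products is disconnected" invariant.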