The addition of async/await and actors has greatly enhanced Swift for handling concurrency, and the algorithms project adds numerous useful data types to the language. In this proposal, I want to push the idea that we can start to build on these to add data structures explicitly designed for concurrent systems: shared data types.
Currently, the philosophy of working with data and concurrency in Swift is to copy the data to guarantee exclusivity, pass it off to another thread/queue/task, and fetch the results at the end. In essence, it is a message passing approach. It is encouraged further by the widespread use of value types, which are automatically copied on a function call.
This approach to working with data in a concurrent system is established, and works well, but it isn't the only approach. There is room for safe shared data structures, which can be read and written safely from any thread, and are not susceptible to races, locking delays or clobbering of updates.
To give you a feel for what I am on about, I want to propose such a type here: a branching resource.
A BranchingResource would be a type with a generic parameter for the payload it carries (i.e. the resource). The resource would begin with a single branch, called "main" or "trunk" or "truth". The app could add as many auxiliary, named branches as it likes.
In pseudo Swift, the main API would look something like this:
struct Branch: Hashable {
    let name: String
    static let trunk = Branch(name: "trunk")
}

// Declarations only; bodies are omitted. Resolver is the merge protocol described below.
struct BranchingResource<Payload> {
    init(payload: Payload, auxiliaryBranches: Set<Branch>)
    private(set) var auxiliaryBranches: Set<Branch>

    // Reading and writing the payload of a single branch
    func payload(in branch: Branch) throws -> Payload
    mutating func update(_ branch: Branch, with payload: Payload) throws

    // Merging is always between the trunk and one auxiliary branch
    mutating func mergeIntoTrunk(auxiliaryBranch: Branch, resolvingConflictsWith resolver: Resolver) throws
    mutating func mergeTrunk(intoAuxiliaryBranch auxiliaryBranch: Branch, resolvingConflictsWith resolver: Resolver) throws
}
Think of this as a small piece of Git functionality, but for in-app data. (Note that, unlike Git, it does not keep a complete history of changes.)
The Payload could be simply Data, or Codable types, or even files on disk. The BranchingResource should include enough hooks for these possibilities. (It would even be possible to add asynchronous API that carries out tasks like querying a network resource during merging.)
Without going into the implementation in detail, such a type tracks the branches and, most importantly, ensures that each branch keeps a so-called "common ancestor" with the trunk. This decoupling of the trunk from other branches is what makes this data structure lockless and safely sharable. What happens on one branch is independent of what is happening on another. A merge is carried out only when a particular branch wants to get the latest changes from other branches, and it is completely at the discretion of the branch owner.
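To make the bookkeeping a bit more concrete, here is a minimal sketch of how the per-branch state might be stored. The storage layout, the name MiniBranchingResource, the BranchError type and the commonAncestor(of:) accessor are all my own assumptions for illustration, not part of the pitch, and merging is left out entirely.

```swift
struct Branch: Hashable {
    let name: String
    static let trunk = Branch(name: "trunk")
}

// Hypothetical error type, invented for this sketch.
enum BranchError: Error { case unknownBranch }

// Hypothetical storage layout: the trunk payload, plus for each auxiliary
// branch its current payload and the trunk payload it last agreed with.
struct MiniBranchingResource<Payload> {
    private var trunk: Payload
    private var auxiliary: [Branch: (payload: Payload, ancestor: Payload)]

    init(payload: Payload, auxiliaryBranches: Set<Branch>) {
        trunk = payload
        auxiliary = [:]
        // Every branch starts out identical to the trunk.
        for branch in auxiliaryBranches {
            auxiliary[branch] = (payload: payload, ancestor: payload)
        }
    }

    func payload(in branch: Branch) throws -> Payload {
        if branch == .trunk { return trunk }
        guard let entry = auxiliary[branch] else { throw BranchError.unknownBranch }
        return entry.payload
    }

    // Updating one branch never touches the trunk, another branch,
    // or the recorded common ancestor.
    mutating func update(_ branch: Branch, with payload: Payload) throws {
        if branch == .trunk { trunk = payload; return }
        guard let entry = auxiliary[branch] else { throw BranchError.unknownBranch }
        auxiliary[branch] = (payload: payload, ancestor: entry.ancestor)
    }

    // Assumed accessor, only here so the ancestor can be inspected.
    func commonAncestor(of branch: Branch) throws -> Payload {
        guard let entry = auxiliary[branch] else { throw BranchError.unknownBranch }
        return entry.ancestor
    }
}

let ui = Branch(name: "ui")
var resource = MiniBranchingResource(payload: 1, auxiliaryBranches: [ui])
try resource.update(ui, with: 2)

let trunkValue = try resource.payload(in: .trunk)   // trunk is unaffected
let uiValue = try resource.payload(in: ui)          // the branch sees its own update
let uiAncestor = try resource.commonAncestor(of: ui) // still the original payload
```

Because an update only ever touches its own branch entry, the branches stay independent until someone explicitly asks for a merge.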
The merging is a 3-way merge, like in Git. Given the two new branch payloads, and the common ancestor, the Resolver can choose how to merge the payloads. The Resolver would be a protocol so that users could design custom merge logic, but some standard resolvers could be included, like last-change-wins.
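As a rough sketch of what such a protocol might look like: the resolve(ancestor:trunk:branch:) requirement, the associated type and the PreferChangedSide resolver below are all names I made up for illustration. Note that it is not a true last-change-wins resolver, which would need timestamps; it simply keeps whichever side changed relative to the ancestor and lets the branch win a real conflict.

```swift
// Hypothetical shape for the protocol; the pitch only says Resolver
// "would be a protocol", so this requirement is an assumption.
protocol Resolver {
    associatedtype Payload
    /// Three-way merge: combine two diverged payloads, given their common ancestor.
    func resolve(ancestor: Payload, trunk: Payload, branch: Payload) -> Payload
}

/// Stand-in for a "standard" resolver, for Equatable payloads:
/// keep the side that changed; if both changed, the branch wins.
struct PreferChangedSide<Payload: Equatable>: Resolver {
    func resolve(ancestor: Payload, trunk: Payload, branch: Payload) -> Payload {
        if branch == ancestor { return trunk }   // only the trunk changed (or nothing did)
        if trunk == ancestor { return branch }   // only the branch changed
        return branch                            // both changed: conflict, branch wins
    }
}

let resolver = PreferChangedSide<String>()
let onlyTrunkChanged = resolver.resolve(ancestor: "a", trunk: "b", branch: "a")
let onlyBranchChanged = resolver.resolve(ancestor: "a", trunk: "a", branch: "c")
let bothChanged = resolver.resolve(ancestor: "a", trunk: "b", branch: "c")
```

The common ancestor is what lets the resolver tell "only one side changed" apart from a genuine conflict; with just the two payloads, that distinction is lost.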
The advantage of an abstraction like a branching resource is that it decouples components of an app. Is there a component downloading something, but at the same time your UI is busy in some modal state which makes it difficult to handle the downloaded data? No problem, make a BranchingResource for the downloaded data. The "network" branch keeps the downloaded data until you are ready to merge it into the "ui" branch. If the app user also made changes while the downloading was taking place, they won't be clobbered by the network download. Each merge has a common ancestor, and you can always determine with a diff what has happened in each branch, and merge at the desired granularity.
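Here is a small standalone sketch of that scenario, merging at the granularity of individual fields. The Profile type, its fields and the mergeField and merge helpers are hypothetical, invented purely for illustration; a real BranchingResource would hand the three payloads to a Resolver rather than to free functions.

```swift
// Hypothetical payload type for the example.
struct Profile: Equatable {
    var name: String
    var avatarURL: String
}

// Per-field three-way merge: if the UI left the field untouched, take the
// network value; otherwise keep the user's edit (the UI wins a conflict).
func mergeField<T: Equatable>(ancestor: T, ui: T, network: T) -> T {
    if ui == ancestor { return network }
    return ui
}

func merge(ancestor: Profile, ui: Profile, network: Profile) -> Profile {
    Profile(
        name: mergeField(ancestor: ancestor.name, ui: ui.name, network: network.name),
        avatarURL: mergeField(ancestor: ancestor.avatarURL, ui: ui.avatarURL, network: network.avatarURL)
    )
}

// Both branches start from the same common ancestor.
let ancestor = Profile(name: "Ann", avatarURL: "old.png")

// The user edits the name while a download is in flight...
var uiBranch = ancestor
uiBranch.name = "Anna"

// ...and the download brings in a new avatar.
var networkBranch = ancestor
networkBranch.avatarURL = "new.png"

// When the UI is ready, it merges; neither change is lost.
let merged = merge(ancestor: ancestor, ui: uiBranch, network: networkBranch)
```

With a plain "overwrite with the downloaded value" approach, the user's rename would have been replaced by the stale name in the network copy.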
I want to preempt a question I'm sure many will have: many branches does not necessarily mean many payloads. In the worst case, you can have a copy of the payload for each branch, in addition to common ancestors. Although that situation can arise, it is usually short-lived, because once the trunk is merged into a branch, and the branch merged back into the trunk, the two are exactly the same. The branch payload, the trunk payload, and the common ancestor are all the same, so only one payload is stored. In a situation where all branches are fully merged into the trunk, there is only a single payload, exactly the same size as a non-branching resource.
I realize this is quite a break from the types of data structures already implemented in Swift, and from the concurrency aspects too. A BranchingResource brings the two together. Just as actors help to make concurrency simpler, a BranchingResource type would help make working with shared data safer and simpler.
Ideas?