Swift-CowBox: Easy Copy-on-Write Semantics for Structs

Swift-CowBox 0.1.0

Swift-CowBox is a simple set of Swift Macros for adding easy copy-on-write semantics to Swift Structs.

Background

Since the early days of Swift, most engineers have had an important choice to make when modeling the basic building blocks of their state: do we choose structs (value types), or do we choose classes (reference types)?[1] Suppose we have a Contacts application for storing collections of People. We might have a simple model type to represent one person:

//  Here is a Person Struct.
public struct Person {
  public let id: String
  public var name: String
}
//  Here is a Person Class.
public class Person {
  public let id: String
  public var name: String
}

It's not an arbitrary distinction; structs and classes come with legit tradeoffs. One of the important benefits from modeling data with immutable value types (like structs) is what James Dempsey calls “local reasoning”:

Assigning a value [type] to a constant or variable, or passing a value into a function or method, always makes a copy of the value. […] Being able to look through code in a single spot and figure out what is going on is called local reasoning. […] One advantage of using value types is that you can be certain no other place in your program can affect the value. You can reason about the code in front of you without needing to know what else is happening elsewhere.[2]

When choosing between structs and classes, Apple recommends choosing structs by default.[3] In addition to the benefits of local reasoning, we also see performance benefits from using structs in Swift Collections (like Array) that can opt out of the expensive bridging needed to support objects.[4]

One benefit of object-oriented programming we might lose by modeling data with value types is the ability to quickly copy objects by reference. If our data was modeled as a class, passing data from one place to another means copying one pointer (8 bytes on a 64-bit platform). Our Person example is simple, but suppose we have a larger model type with (potentially) hundreds (or thousands) of bytes saved in stored properties. Passing a struct from one place to another means we are copying all those bytes. If we have very large data types (or we are copying many times), this memory pressure can lead to the system terminating other apps that might be running in the background, or even terminating our app while running in the foreground.[5]

If we want to keep the benefits of modeling our data with an immutable value type (like the ability to reason locally about our code), but we want to leverage object-oriented programming for faster copying, a copy-on-write data structure might be the right direction for us.[6] With a copy-on-write data structure, we preserve value semantics while leveraging object-oriented programming “under the hood”. To put it another way: the “interface” of our type “presents” as an immutable value type, but the private “implementation” of our type is an object reference.

If you've used the Swift Standard Library Collections (like Array), then you’ve already seen copy-on-write in action! The Array is a value type from the perspective of the public interface, but it’s built on an object reference internally.[7] When we pass an instance of an Array “by-value”, the Array instance copies an object reference. We don’t actually copy all N objects in the Array until a mutation occurs.
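A minimal sketch of that behavior, assuming a native Swift Array of integers. The `bufferAddress` helper is hypothetical (it is not part of any library) and only exists to peek at where the elements live:

```swift
// Hypothetical helper: returns the address of the array's element buffer.
func bufferAddress(_ array: [Int]) -> UnsafePointer<Int>? {
  array.withUnsafeBufferPointer { $0.baseAddress }
}

let original = [1, 2, 3]
var copy = original   // copies one reference; no elements are copied yet

// Before any mutation, both values are expected to share one buffer.
let sharedBeforeWrite = bufferAddress(original) == bufferAddress(copy)

copy.append(4)        // first mutation: `copy` moves to its own buffer

// After the mutation, the two values no longer share a buffer.
let sharedAfterWrite = bufferAddress(original) == bufferAddress(copy)

print(sharedBeforeWrite, sharedAfterWrite)
print(original, copy)
```

The original value is never affected by the append, which is the value-semantics guarantee that copy-on-write preserves.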

Writing our own copy-on-write data structures has always been an option, but it meant writing (and maintaining) a lot of repetitive boilerplate code. Leveraging Swift Macros[8], we can finally make it easy to add copy-on-write semantics in just a few steps.

Requirements

Swift-CowBox builds from Swift 5.9.2 (and up) and Swift-Syntax 509.0.0 (up to 600.0.0). There are no explicit platform requirements (other than what is required from Swift-Syntax). Please file a GitHub issue if you encounter any compatibility issues while building or deploying.

Usage

Start by importing the Swift-CowBox package as a dependency. Here is an example from Swift Package Manager:

// swift-tools-version: 5.9.2

import PackageDescription

let package = Package(
  name: "MyPackage",
  platforms: [
    .macOS(.v10_15),
    .iOS(.v13),
    .tvOS(.v13),
    .watchOS(.v6),
    .macCatalyst(.v13),
  ],
  dependencies: [
    .package(
      url: "https://github.com/swift-cowbox/swift-cowbox.git",
      from: "0.1.0"
    )
  ],
  targets: [
    .target(
      name: "MyPackage",
      dependencies: [
        .product(
          name: "CowBox",
          package: "swift-cowbox"
        )
      ]
    ),
  ]
)

Let’s see the macro in action. Suppose we define a simple Swift Struct:

public struct Person {
  public let id: String
  public var name: String
}

This struct is a Person with two stored variables: a non-mutable id and a mutable name. Let’s see how we can use the CowBox macros to give this struct copy-on-write semantics:

import CowBox

@CowBox public struct Person {
  @CowBoxNonMutating public var id: String
  @CowBoxMutating public var name: String
}

Our CowBoxNonMutating macro attaches to a stored property to indicate we synthesize a getter (we must transform the let to var before attaching an accessor). We use CowBoxMutating to indicate we synthesize a getter and a setter. Let’s expand this macro to see the code that is generated for us:

public struct Person {
  public var id: String {
    get {
      self._storage.id
    }
  }
  public var name: String {
    get {
      self._storage.name
    }
    set {
      if Swift.isKnownUniquelyReferenced(&self._storage) == false {
        self._storage = self._storage.copy()
      }
      self._storage.name = newValue
    }
  }
  
  private final class _Storage: @unchecked Sendable {
    let id: String
    var name: String
    init(id: String, name: String) {
      self.id = id
      self.name = name
    }
    func copy() -> _Storage {
      _Storage(id: self.id, name: self.name)
    }
  }
  
  private var _storage: _Storage
  
  public init(id: String, name: String) {
    self._storage = _Storage(id: id, name: name)
  }
}

extension Person: CowBox {
  public func isIdentical(to other: Person) -> Bool {
    self._storage === other._storage
  }
}

All of this boilerplate to manage and access the underlying storage object reference is provided by the macro. The macro also provides a memberwise initializer. An isIdentical function is provided for quickly confirming two struct values point to the same storage object reference.
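To see how the generated accessors behave at a call site, here is a small sketch. So that it runs without the package, it uses a hand-written stand-in that mirrors the expansion above (the `Storage` and `storage` names are simplified, and the CowBox protocol conformance is left out); it is an illustration, not the macro's actual output:

```swift
// Hand-written stand-in for the expansion shown above.
struct Person {
  private final class Storage {
    let id: String
    var name: String
    init(id: String, name: String) {
      self.id = id
      self.name = name
    }
    func copy() -> Storage {
      Storage(id: id, name: name)
    }
  }

  private var storage: Storage

  init(id: String, name: String) {
    storage = Storage(id: id, name: name)
  }

  var id: String { storage.id }

  var name: String {
    get { storage.name }
    set {
      // Copy the storage only if another value still shares it.
      if !isKnownUniquelyReferenced(&storage) {
        storage = storage.copy()
      }
      storage.name = newValue
    }
  }

  func isIdentical(to other: Person) -> Bool {
    storage === other.storage
  }
}

let p1 = Person(id: "1", name: "Alice")
var p2 = p1                                        // shares p1's storage
let identicalBeforeWrite = p1.isIdentical(to: p2)

p2.name = "Bob"                                    // first write copies the storage
let identicalAfterWrite = p1.isIdentical(to: p2)

print(identicalBeforeWrite, identicalAfterWrite, p1.name, p2.name)
```

The two values share one storage object until the first write through `p2`, after which `p1` keeps its original name.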

CowBox also knows how to provide support for some common Swift Protocols you might choose to adopt:

@CowBox public struct Person: CustomStringConvertible, Hashable, Codable {
  @CowBoxNonMutating public var id: String
  @CowBoxMutating public var name: String
}

If you adopt one of these protocols in your CowBox, the macro will synthesize the conformance for you. If you provide your own conformance, CowBox will respect the custom implementation you provided.

The following protocols are currently supported with CowBox:

  • CustomStringConvertible
  • Equatable
  • Hashable
  • Decodable
  • Encodable
  • Codable

Benchmarks

How does CowBox affect performance? How does CowBox improve CPU or memory usage?

Let's start with an experiment inspired by Jared Khan.[9] We’ll define a simple Swift Struct with ten 64-bit integers stored as properties:

struct StructElement {
  // A struct with about 80 bytes
  let a: Int64
  let b: Int64
  let c: Int64
  let d: Int64
  let e: Int64
  let f: Int64
  let g: Int64
  let h: Int64
  let i: Int64
  let j: Int64
}

A little quick math tells us every instance of this struct should need at least 640 bits (or 80 bytes) of memory.

Suppose we now build a CowBox version of this. What would that look like?

@CowBox struct CowBoxElement {
  @CowBoxNonMutating var a: Int64
  @CowBoxNonMutating var b: Int64
  @CowBoxNonMutating var c: Int64
  @CowBoxNonMutating var d: Int64
  @CowBoxNonMutating var e: Int64
  @CowBoxNonMutating var f: Int64
  @CowBoxNonMutating var g: Int64
  @CowBoxNonMutating var h: Int64
  @CowBoxNonMutating var i: Int64
  @CowBoxNonMutating var j: Int64
}

What does the memory usage look like now? We can assume that creating an instance of CowBoxElement from scratch should need at least 88 bytes of memory. We need 640 bits (or 80 bytes) to store the original ten properties. We also need (assuming we are running on a 64 bit platform) an additional 64 bits (or 8 bytes) for a pointer. That’s the memory of our first instance. What about our second instance (assuming we are copying without making any mutations)? The second instance needs a pointer (8 bytes), but the storage object reference itself is shared between both instances. Our two CowBox struct instances need (in the aggregate) at least 96 bytes, but our two simple Swift struct instances need at least 160 bytes.
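The inline sizes in that arithmetic can be checked with `MemoryLayout`. This is a rough sketch, assuming a hand-written box (`ElementStorage` and `BoxedElement` are hypothetical names, not what the macro generates). It only measures the inline size of each value; the heap object behind the reference also carries its own object header, which is why the figures above are lower bounds:

```swift
// Same shape as the StructElement above: ten 64-bit integers stored inline.
struct StructElement {
  let a, b, c, d, e, f, g, h, i, j: Int64
}

// Hypothetical hand-written box: the payload lives in a heap object.
final class ElementStorage {
  let value: StructElement
  init(_ value: StructElement) {
    self.value = value
  }
}

// The struct itself stores a single object reference.
struct BoxedElement {
  private let storage: ElementStorage
  init(_ value: StructElement) {
    storage = ElementStorage(value)
  }
}

let inlineSize = MemoryLayout<StructElement>.size
let boxedSize = MemoryLayout<BoxedElement>.size
let pointerSize = MemoryLayout<UnsafeRawPointer>.size

print(inlineSize, boxedSize, pointerSize)
```

Copying a `BoxedElement` moves only the reference, while copying a `StructElement` moves all ten integers.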

Let’s continue with this experiment and see how these two types perform in large arrays. We’ll start by adding ten million instances of our simple Swift struct to a standard Swift.Array, and then try making one mutation on a copy of that array (we append one additional element). This mutation will cause Array to copy all N elements over to a new instance. We’ll use the Ordo One package for benchmarking memory and CPU.[10]

Memory (resident peak)
╒══════════════════════════════════════════════╤═════════╤═════════╤═════════╤═════════╤═════════╤═════════╤═════════╤═════════╕
│ Test                                         │      p0 │     p25 │     p50 │     p75 │     p90 │     p99 │    p100 │ Samples │
╞══════════════════════════════════════════════╪═════════╪═════════╪═════════╪═════════╪═════════╪═════════╪═════════╪═════════╡
│ Benchmarks:Array<StructElement> (M)          │     665 │     809 │     809 │     809 │     809 │     809 │     809 │     100 │
├──────────────────────────────────────────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
│ Benchmarks:Array<StructElement> Copy (M)     │    1531 │    1609 │    1609 │    1609 │    1609 │    1609 │    1609 │     100 │
╘══════════════════════════════════════════════╧═════════╧═════════╧═════════╧═════════╧═════════╧═════════╧═════════╧═════════╛

Time (total CPU)
╒══════════════════════════════════════════════╤═════════╤═════════╤═════════╤═════════╤═════════╤═════════╤═════════╤═════════╕
│ Test                                         │      p0 │     p25 │     p50 │     p75 │     p90 │     p99 │    p100 │ Samples │
╞══════════════════════════════════════════════╪═════════╪═════════╪═════════╪═════════╪═════════╪═════════╪═════════╪═════════╡
│ Benchmarks:Array<StructElement> (μs) *       │   52704 │   55869 │   56132 │   57311 │   57475 │   57770 │   58006 │     100 │
├──────────────────────────────────────────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
│ Benchmarks:Array<StructElement> Copy (μs) *  │   51749 │   53281 │   53772 │   54067 │   54690 │   58491 │   59300 │     100 │
╘══════════════════════════════════════════════╧═════════╧═════════╧═════════╧═════════╧═════════╧═════════╧═════════╧═════════╛

As we expected, the first array needs approximately 800MB of memory (ten million elements each needing 80 bytes). When we make a copy of that array and mutate our copy, the two arrays need (collectively) approximately 1600MB of memory.

Let's try this same experiment with a CowBox struct to see how this affects performance:

Memory (resident peak)
╒══════════════════════════════════════════════╤═════════╤═════════╤═════════╤═════════╤═════════╤═════════╤═════════╤═════════╕
│ Test                                         │      p0 │     p25 │     p50 │     p75 │     p90 │     p99 │    p100 │ Samples │
╞══════════════════════════════════════════════╪═════════╪═════════╪═════════╪═════════╪═════════╪═════════╪═════════╪═════════╡
│ Benchmarks:Array<CowBoxElement> (M)          │    1054 │    1057 │    1057 │    1057 │    1057 │    1057 │    1057 │     100 │
├──────────────────────────────────────────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
│ Benchmarks:Array<CowBoxElement> Copy (M)     │    1107 │    1137 │    1137 │    1137 │    1137 │    1137 │    1137 │     100 │
╘══════════════════════════════════════════════╧═════════╧═════════╧═════════╧═════════╧═════════╧═════════╧═════════╧═════════╛

Time (total CPU)
╒══════════════════════════════════════════════╤═════════╤═════════╤═════════╤═════════╤═════════╤═════════╤═════════╤═════════╕
│ Test                                         │      p0 │     p25 │     p50 │     p75 │     p90 │     p99 │    p100 │ Samples │
╞══════════════════════════════════════════════╪═════════╪═════════╪═════════╪═════════╪═════════╪═════════╪═════════╪═════════╡
│ Benchmarks:Array<CowBoxElement> (μs) *       │  142901 │  145752 │  146670 │  148111 │  148898 │  149946 │  152233 │     100 │
├──────────────────────────────────────────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
│ Benchmarks:Array<CowBoxElement> Copy (μs) *  │   30470 │   31130 │   31392 │   32145 │   32653 │   35881 │   44306 │     100 │
╘══════════════════════════════════════════════╧═════════╧═════════╧═════════╧═════════╧═════════╧═════════╧═════════╧═════════╛

What do we see here? Creating one array of our CowBox struct elements uses approximately 30 percent more memory (1057MB against 809MB) and takes about 2.6 times as long (roughly 147ms against 56ms at p50) compared with creating one array of our simple Swift struct elements. The savings come when we need to perform our copies.

Creating one array of our simple Swift struct elements and performing a mutation on a copy needs approximately 1600MB of memory. Performing that same operation on an array of our CowBox struct elements needs only 1100MB of memory. For speed, we spent approximately 110ms creating (and mutating a copy of) our original struct array. We spent approximately 178ms creating (and mutating a copy of) our CowBox array.

Let's assume we repeat this pattern many times. How can we expect this to perform after many copies? Here's what the (estimated) cumulative time spent looks like across multiple copy operations (with zero copies implying the time spent to create our first array):

Copies   Struct Array   CowBox Array
0        56.132ms       146.67ms
1        109.904ms      178.062ms
2        163.676ms      209.454ms
3        217.448ms      240.846ms
4        271.22ms       272.238ms
5        324.992ms      303.63ms

We spend a lot more time creating CowBox elements from scratch, but if those elements are large, and we expect to copy those elements several times, we quickly come out ahead when measuring the cumulative time spent on those operations.
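The break-even point can be worked out from the p50 timings in the tables above. This is a back-of-the-envelope sketch that assumes every copy costs the same as the first one (the same linear estimate used for the cumulative figures), so treat the result as an approximation:

```swift
// p50 timings in milliseconds, taken from the benchmark tables above.
let structCreate = 56.132
let structCopy = 53.772
let cowBoxCreate = 146.670
let cowBoxCopy = 31.392

// Estimated cumulative time: one creation plus `copies` copy-and-mutate steps.
func cumulative(create: Double, copy: Double, copies: Int) -> Double {
  create + copy * Double(copies)
}

// First number of copies at which the CowBox array is ahead overall.
let breakEven = (0...100).first { copies in
  cumulative(create: cowBoxCreate, copy: cowBoxCopy, copies: copies)
    < cumulative(create: structCreate, copy: structCopy, copies: copies)
}

print(breakEven ?? -1)
```

With these numbers the CowBox array is still slightly behind after four copies and pulls ahead on the fifth.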

Another side effect of the CowBox macro is we get a cheap and easy way to test for equality when two struct values wrap the same storage object reference. Instead of performing an equality comparison against all stored properties, if we know that two CowBox struct instances point to the same storage object reference, the instances must be equal by value. Let’s see how much time that can save us.

As discussed earlier, Swift.Array implements copy-on-write semantics: if one Array instance is copied (without any mutations), both of those instances point to the same storage object reference. This means that an equality check against those two references can return in constant time (without needing to linearly check through all N elements).[11] To opt out of this behavior (and benchmark the performance of our elements), we create two different Array instances from scratch (created from the same elements).

When we try this experiment (comparing an Array built from simple Swift structs against an Array built from CowBox structs), we see that the Array built from simple Swift structs performs its equality check over five times slower than the Array built from CowBox structs.

Time (total CPU)
╒══════════════════════════════════════════════╤═════════╤═════════╤═════════╤═════════╤═════════╤═════════╤═════════╤═════════╕
│ Test                                         │      p0 │     p25 │     p50 │     p75 │     p90 │     p99 │    p100 │ Samples │
╞══════════════════════════════════════════════╪═════════╪═════════╪═════════╪═════════╪═════════╪═════════╪═════════╪═════════╡
│ Benchmarks:Array<StructElement> Equal (μs) * │   29070 │   29360 │   29426 │   29606 │   29786 │   29966 │   30058 │     100 │
├──────────────────────────────────────────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
│ Benchmarks:Array<CowBoxElement> Equal (μs) * │    4834 │    5005 │    5026 │    5075 │    5186 │    5476 │    5528 │     100 │
╘══════════════════════════════════════════════╧═════════╧═════════╧═════════╧═════════╧═════════╧═════════╧═════════╧═════════╛

Many more benchmarks are defined in the Benchmarks package. If you choose to experiment with CowBox in your own project, you can start with trying to benchmark your current simple Swift structs for memory and CPU. Then, try and benchmark those same structs using the CowBox macro. You would expect to measure the biggest performance improvements with complex struct elements that need to be copied many times through the course of your app lifecycle.

Known Issues

Please reference the CowBoxClient executable for examples of known issues and limitations of the macro (along with some suggested workarounds).

Please file a GitHub issue for any new issues or limitations you encounter.

Thanks!

Copyright

Copyright 2024 North Bronson Software

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

  1. Value and Reference Types - Swift Blog - Apple Developer

  2. Swift.org - Value And Reference Types In Swift

  3. Choosing Between Structures and Classes | Apple Developer Documentation

  4. swift/docs/OptimizationTips.rst at swift-5.10-RELEASE · apple/swift · GitHub

  5. Reducing terminations in your app | Apple Developer Documentation

  6. Immutable object - Wikipedia

  7. mikeash.com: Friday Q&A 2015-04-17: Let's Build Swift.Array

  8. Documentation

  9. Swift’s Copy-on-write Optimisation | Jared Khan

  10. GitHub - ordo-one/package-benchmark: Swift benchmark runner with many performance metrics and great CI support

  11. swift/stdlib/public/core/Array.swift at release/5.10 · apple/swift · GitHub


Shouldn’t isIdentical be ===?


Maybe! If we did add a === operator to this struct… we would be "overloading" the existing convention that === is for comparing reference types (unless there is precedent that === can be used to compare value types). That might make sense… but it also might make sense to not give the impression that our CowBox struct presents as a reference type conforming to AnyObject.

For now… I thought launching the isIdentical function made the most sense… but if the community wants to add a === binary operator we could always discuss that. Thanks!


Hi, great work!

Having a macro for CoW makes it easier to implement and less error-prone. Moreover, the Benchmarks package would be a very useful tool.

Some points to discuss:

  • Are there plans to support mutating functions?
  • Besides CustomStringConvertible, Equatable, Hashable, Decodable, Encodable, Codable, it would be worth adding support for these protocols:
    • Sendable
    • CustomDebugStringConvertible
    • Error
    • LocalizedError
    • CustomNSError
  • What are your thoughts about making all var properties to have the behaviour of @CowBoxMutating by default? This way CowBoxMutating can be abolished and only @CowBoxNonMutating will be needed. So only the struct itself is needed to be marked with macro in most cases and @CowBoxNonMutating will then be rarely used. All together this will make code less verbose and clean.
  • final class _Storage in current implementation is private. There are cases when it is needed to be fileprivate/ internal / package

Are there plans to support mutating functions?

I'm not sure I completely understand the question. WRT mutating functions defined on your struct (before applying the CowBox macro) those mutating functions will access the stored properties through the new getter and setter accessors (that forward to the storage object reference). There's not much out-of-the-box support from the CowBox itself other than synthesizing the correct accessors. If a mutating function that worked before CowBox is now broken after CowBox… that looks like a bug (please file a GitHub issue).

Besides CustomStringConvertible, Equatable, Hashable, Decodable, Encodable, Codable, it would be worth adding support for these protocols:

  • Sendable
  • CustomDebugStringConvertible
  • Error
  • LocalizedError
  • CustomNSError

AFAIK these protocols do not come with an existing derived protocol synthesis from the compiler.[1] The CustomDebugStringConvertible does not come with a derived conformance, but the "default" implementation constructs a description string almost identical to the existing CustomStringConvertible description.[2] A client of the CowBox macro is free to add their own adoption and provide their own implementation of these protocols (or their own custom protocols) and the CowBox macro does not interfere. If CowBox does interfere with a protocol adoption in unexpected ways, please file a GitHub issue.

The Sendable conformance is interesting… CowBox leverages isKnownUniquelyReferenced in a way that appears to be "safe" for concurrency (copying values across actor or thread boundaries leads to a strong retain).[3] Which implies that a CowBox can be Sendable. If anything about this implementation means a CowBox must not be Sendable, please file a GitHub issue.

What are your thoughts about making all var properties to have the behaviour of @CowBoxMutating by default?

As a matter of style, I prefer keeping the attributes and behavior explicit for now. If this is what blocks the community from adopting the macro, we can have a discussion about whether or not to try something different.

final class _Storage in current implementation is private. There are cases when it is needed to be fileprivate/ internal / package

The _Storage class and _storage variable are private by design. The synthesized getters and setters attached to the struct are what provide access (and those accessors respect the access control in your original declaration). There might be some more advanced use cases where an engineer would need to directly manipulate the underlying storage from outside the struct… but that would also mean some kind of "guard-rails" should need to be in place to defend against other engineers writing bugs (where storage mutates in unexpected ways leading to value types not presenting with consistent state). The use-case I'm optimizing for (in this first version) is keeping things easy and simple… but if the community wants to unlock those more advanced use cases we could always discuss that. Thanks!


  1. swift/include/swift/AST/Decl.h at swift-5.10-RELEASE · apple/swift · GitHub

  2. swift/stdlib/public/core/OutputStream.swift at swift-5.10-RELEASE · apple/swift · GitHub

  3. Michael Tsai - Blog - Why Swift’s Copy-on-Write Is Safe

The previous post is mostly for opening a discussion, it was not for concrete desires.

I should be more concrete. There are two points about mutating funcs:

  • A mutating func can access many stored properties. With the current macro implementation, access to each property leads to a call to Swift.isKnownUniquelyReferenced and can trigger .copy() of the underlying RefBox, which may be undesirable. A single call to Swift.isKnownUniquelyReferenced / RefBox.copy() is a more efficient pattern.
  • A mutating func may, by design, never mutate any stored property, but calling such a function triggers the get / set observers of the enclosing instance, which can be used for making a copy.
    It's also worth mentioning properties with mutating get and nonmutating set for discussion.

I'm also for explicitness in general. For me personally it is not a blocker for adopting the library. My main rationale here is to make code less noisy providing a reasonable default.
The nearest reference is @Observable macro where @ObservationIgnored is used as a marker when properties shouldn't be tracked.

As an example I can refer to Swift Collections OrderedSet.ReserveCapacity method and some more usages of _HashTable.Storage class in several files.
There are two points here:

  • it allows to split implementation to several files
  • it allows to make extensions for _Storage class
    • in the same file when _Storage is fileprivate
    • in other files when _Storage is internal

This suggestion is also meant to open the discussion with others. I'm not sure it is reasonable to use such a macro in libraries similar to swift-collections. In particular, swift-collections is not a very good example, because _HashTable.Storage doesn't rely on stored properties and uses only managedBuffer under the hood. I refer to it only as an example of file structure organisation. Maybe there are downsides, and in such cases it is better not to use this macro.

PS: for now, let's leave macro compilation speed aside and focus only on the implementation design when different ACLs are used for _Storage.

In my own practice I only need _Storage to be fileprivate.

A mutating func can access many stored properties. With the current macro implementation, access to each property leads to a call to Swift.isKnownUniquelyReferenced and can trigger .copy() of the underlying RefBox, which may be undesirable. A single call to Swift.isKnownUniquelyReferenced / RefBox.copy() is a more efficient pattern.

Hmm… if an engineer defines a CowBox struct with two (or more) mutable stored properties… and then defines a mutating "wrapper" function that mutates both properties… I'm not sure how this could lead to different behavior than if the two mutations had just been inlined directly… the first mutation attempted when the CowBox has a shared storage is what begins the copy:

@CowBox struct Person {
  @CowBoxNonMutating var id: String
  @CowBoxMutating var name: String
  @CowBoxMutating var location: String
  
  mutating func eraser() {
    self.name = ""
    self.location = ""
  }
}

let p1 = Person(id: "id", name: "name", location: "location")
var p2 = p1
p2.eraser()

Calling self.name = "" in this code leads to the isKnownUniquelyReferenced check against self._storage… that check leads to a new instance of _Storage assigned to the self._storage property. Calling self.location = "" then leads to the isKnownUniquelyReferenced check against self._storage. At that point self._storage only has one strong reference… it was just created. Which means we do not create a copy of self._storage (we don't have to).
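One way to check that reasoning is to count copies. The sketch below is a hand-written stand-in (not the macro's output) with a hypothetical `generation` counter on the storage that goes up by one on every copy; the uniqueness check is factored into a `makeUnique` helper here, whereas the generated setters inline it:

```swift
// Hand-written stand-in with hypothetical instrumentation to count copies.
struct Person {
  private final class Storage {
    let id: String
    var name: String
    var location: String
    let generation: Int   // how many copies led to this storage object

    init(id: String, name: String, location: String, generation: Int) {
      self.id = id
      self.name = name
      self.location = location
      self.generation = generation
    }

    func copy() -> Storage {
      Storage(id: id, name: name, location: location, generation: generation + 1)
    }
  }

  private var storage: Storage

  init(id: String, name: String, location: String) {
    storage = Storage(id: id, name: name, location: location, generation: 0)
  }

  // Hypothetical: exposes the copy counter for this experiment only.
  var copyCount: Int { storage.generation }

  private mutating func makeUnique() {
    if !isKnownUniquelyReferenced(&storage) {
      storage = storage.copy()
    }
  }

  var name: String {
    get { storage.name }
    set { makeUnique(); storage.name = newValue }
  }

  var location: String {
    get { storage.location }
    set { makeUnique(); storage.location = newValue }
  }

  mutating func eraser() {
    self.name = ""       // storage is shared here, so this write copies
    self.location = ""   // storage is now unique, so this write does not
  }
}

let p1 = Person(id: "id", name: "name", location: "location")
var p2 = p1
p2.eraser()

print(p1.copyCount, p2.copyCount)
print(p1.name, p1.location)
```

If the second setter had copied again, the counter on `p2` would read two instead of one.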

There might be some concurrency issues to think through if a copy had been made "inline" with the execution of an asynchronous mutating func… but just from thinking through the synchronous example I don't see how this could ever lead to unnecessary copies being made.

mutating func can never mutate any stored property by design, but calling such function triggers get / set observers of enclosing instance, which can be used for making a copy.

I'm not sure I follow. The example is a function marked as mutating that does not actually mutate the property?

@CowBox struct Person {
  @CowBoxNonMutating var id: String
  @CowBoxMutating var name: String
  @CowBoxMutating var location: String
  
  mutating func eraser() {
    self.name = ""
    self.location = ""
  }
  
  mutating func nothing() {
    
  }
}

let p1 = Person(id: "id", name: "name", location: "location")
var p2 = p1
p2.nothing()
precondition(p1.isIdentical(to: p2))  //  true

This is ok… what would be the unexpected behavior with this example that would need to be guarded against or what additional functionality would need to be added before this behaves as expected?

It's also worth mentioning properties with mutating get and nonmutating set for discussion.

The CowBox macro currently does not attach to computed properties (it only attaches to stored instance properties). I'm not sure this macro would make sense on a computed property… but if the community has more examples that this is important then we could discuss that.

If what you were suggesting is that a computed property (from before the CowBox was attached) could then touch the storage object reference directly… that behavior is not supported for now (the storage object reference is still considered private).

The nearest reference is the @Observable macro, where @ObservationIgnored is used as a marker when properties shouldn't be tracked.

I'm actually still a little confused about how the ObservationTracked should (or should not) be directly used by clients.[1] That conversation also contributed to me choosing to keep the CowBox property macros explicit for now. If we did move to making the Mutating property macro optional in the future (and make mutating property the default)… I would prefer to "not break" any code where an engineer chose to attach the Mutating property macro directly.

It allows splitting the implementation across several files.
It allows writing extensions for the _Storage class:
  in the same file when _Storage is fileprivate
  in other files when _Storage is internal

Hmm… I'm still not completely sure I see the clear use case to enable engineers to access the Storage class (or storage property) directly. Is the missing functionality being able to directly force a copy of the storage from any arbitrary place (outside the setters where this copying is already enabled)?

If there was a specific use-case that needed the underlying storage to be accessed… my preference would be to expose extra functionality on the CowBox protocol (similar to how we expose isIdentical to check reference equality against the storage). Making that storage variable anything other than private isn't worth it IMO (since it means code outside the struct can then touch it directly). I can't even think of any reason for someone to touch the storage inside the CowBox struct itself (other than from the codegen added by the macro)… but there's no formal way to enforce that AFAIK (other than the informal convention that the underscored variable should be considered off-limits).
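As a rough illustration of that preference, the sketch below keeps the storage reference private and exposes only a narrow query on the struct. It assumes isIdentical(to:) is a plain reference comparison, as described above; the actual macro expansion may differ in its details.

```swift
// Hand-written sketch: the storage stays private, and the only thing
// clients can learn about it is whether two values share it.
final class _Storage {
  var name: String
  init(name: String) { self.name = name }
}

struct Person {
  private var _storage: _Storage

  init(name: String) {
    _storage = _Storage(name: name)
  }

  var name: String {
    get { _storage.name }
    set {
      if !isKnownUniquelyReferenced(&_storage) {
        _storage = _Storage(name: _storage.name)
      }
      _storage.name = newValue
    }
  }

  // Reference equality against the storage, without exposing the storage.
  func isIdentical(to other: Person) -> Bool {
    self._storage === other._storage
  }
}

let p1 = Person(name: "name")
var p2 = p1
print(p1.isIdentical(to: p2))  // true
p2.name = "other"
print(p1.isIdentical(to: p2))  // false
```

Any further capability (for example, forcing a copy) could be added the same way, as a method on the struct, rather than by widening the access level of the storage property.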


  1. Unable to manually attach ObservationTracked. Is this intended behavior?

Sure, we can keep unnecessary copies out of the discussion for now as an edge case.
Let's focus on duplicated isKnownUniquelyReferenced calls:

@CowBox struct Person {
  @CowBoxMutating var firstName: String
  @CowBoxMutating var lastName: String
  @CowBoxMutating var middleName: String
  @CowBoxMutating var nickName: String
  @CowBoxMutating var birthDate: String

  @CowBoxMutating var deliveryAddress: String
  @CowBoxMutating var location: String
  @CowBoxMutating var phone: String
  @CowBoxMutating var contactEmail: String
  @CowBoxMutating var notifications: [String]
  @CowBoxMutating var permissions: [String]
  @CowBoxMutating var lastVisit: String

  mutating func updateWith(args: ...) {
    if condition1 {
      // erase all 12 properties
    } else if condition2 {
      lastVisit = ""
    } else {
      nickName = ""
    }
  }

  mutating func triggerEnclosingDidSet() {
    // no stored property mutations in fact
  }
}


final class Foo {
  var person: Person {
    get {}
    set {}
  }
}

let foo = Foo.init(...)
// 1)

foo.person.updateWith(args: ...)
// Up to 12 properties can be mutated, resulting in 12 calls of `isKnownUniquelyReferenced`
// The overhead of consecutive `isKnownUniquelyReferenced` calls can be eliminated
// and replaced by one call
// 2)

import Foundation

struct PersonA {
  var __uniquenessToken = NSObject()
  
  mutating func triggerEnclosingDidSet() {
    // Replace the token whenever it is shared with another copy.
    if !isKnownUniquelyReferenced(&__uniquenessToken) {
      __uniquenessToken = NSObject()
    }
  }
}

struct PersonB {
  var __uniquenessToken = NSObject()
  
  mutating func triggerEnclosingDidSet() {
  }
}

class Foo {
  private var _personA: PersonA
  var personA: PersonA {
    get {
      print("Foo.get personA \(_personA.__uniquenessToken)")
      return _personA
    }
    set {
      print("Foo.set personA \(_personA.__uniquenessToken) -> \(newValue.__uniquenessToken)")
      _personA = newValue
    }
  }
  
  private var _personB: PersonB
  var personB: PersonB {
    get {
      print("Foo.get personB \(_personB.__uniquenessToken)")
      return _personB
    }
    set {
      print("Foo.set personB \(_personB.__uniquenessToken) -> \(newValue.__uniquenessToken)")
      _personB = newValue
    }
  }

  init() {
    _personA = PersonA()
    _personB = PersonB()
  }
}

let foo = Foo()

print("will call personA.triggerEnclosingDidSet()")
foo.personA.triggerEnclosingDidSet()
print("will call personB.triggerEnclosingDidSet()")
foo.personB.triggerEnclosingDidSet()

// prints:
will call personA.triggerEnclosingDidSet()
Foo.get personA <NSObject: 0x60000000c4b0>
Foo.set personA <NSObject: 0x60000000c4b0> -> <NSObject: 0x60000000c570>
Foo.get personA <NSObject: 0x60000000c570>
will call personB.triggerEnclosingDidSet()
Foo.get personB <NSObject: 0x60000000c410>
Foo.set personB <NSObject: 0x60000000c410> -> <NSObject: 0x60000000c410>
Foo.get personB <NSObject: 0x60000000c410>

As we can see, the underlying object is changed after the mutation of PersonA and is not changed after the mutation of PersonB.

Without the @CowBox macro attached, the call to .triggerEnclosingDidSet() can be described by the following steps:

  1. A get call is made, resulting in an implicit copy of the person instance.
  2. This copy is then mutated via the call to .triggerEnclosingDidSet(). If the Person instance has stored properties that are mutated, then at this point we have two Person instances that differ by equality. In other words, two unique instances.
  3. The updated instance is set back on foo.

My point is that attaching the @CowBox macro should possibly keep the same behaviour: a call to mutating func should make a unique copy even if there is no mutation of stored properties during this call.
I'm not 100% sure it is the right default behaviour, though, and want to know what others think.

It is clearly stated in the @Observable proposal that @ObservationTracked should not normally be used. It is an option for specific situations.
At the same time, @ObservationIgnored is meant to be used routinely, as it is quite common for some properties not to need tracking.

Exactly. I'm bringing this up for discussion. It would be interesting to hear other opinions about support for computed properties, and for mutating get / nonmutating set especially.
In my concrete case I have a couple of computed properties where some logic is performed inside the set body. The computed property is fileprivate and its underlying underscored stored property is private. An update of the underlying stored property must always be accompanied by recalculation of several other values. That is the reason it is wrapped by a computed property. So in this case the macro potentially should not wrap the underlying stored property; it should wrap the computed property.
Besides get / set there are also _read / _modify, which are more complicated and not officially supported yet. For this reason I see them only as a future direction for now, though I would be interested in a conversation about them.
To be clear, I do not insist on supporting computed properties. I want to draw your attention to this example and hear what experience others have. There is always the option to do cow manually, as we do currently, or to fork the library if the manual work turns into a nightmare.
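For readers trying to picture the computed-property case described above, here is a small manual copy-on-write sketch. The Basket type and its items and total members are hypothetical stand-ins, not taken from any real code base; the point is only that the uniqueness check and the recalculation both live in the setter of the fileprivate computed property.

```swift
// Manual copy-on-write where every write to `items` must also refresh a
// derived value (`total`). All names here are illustrative assumptions.
final class _Storage {
  var items: [Int]
  var total: Int

  init(items: [Int]) {
    self.items = items
    self.total = items.reduce(0, +)
  }
}

struct Basket {
  private var _storage: _Storage

  init(items: [Int]) {
    _storage = _Storage(items: items)
  }

  // The computed property is the single gate for writes: it makes the
  // storage unique, stores the new value, and recalculates the total.
  fileprivate var items: [Int] {
    get { _storage.items }
    set {
      if !isKnownUniquelyReferenced(&_storage) {
        _storage = _Storage(items: _storage.items)
      }
      _storage.items = newValue
      _storage.total = newValue.reduce(0, +)
    }
  }

  var total: Int { _storage.total }

  mutating func add(_ price: Int) {
    items.append(price)  // goes through get, then set
  }
}

let original = Basket(items: [1, 2])
var updated = original
updated.add(3)
print(original.total, updated.total)  // 3 6
```

A macro that wrapped only the stored property would not know about the recalculation step, which is why wrapping the computed property is the interesting question here.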

Some examples from Swift Collections:

// 0. Main File
  internal var _table: _HashTable? {
    get { __storage.map { _HashTable($0) } }
    set { __storage = newValue?._storage }
  }

  internal mutating func _ensureUnique() {
    if __storage == nil { return }
    if isKnownUniquelyReferenced(&__storage) { return }
    _table = _table!.copy()
  }

// 1. OrderedSet+ReserveCapacity.swift
internal mutating func _reserveCapacity(..) {
    _ensureUnique()
    if _reservedScale != reservedScale {
      __storage!.header.reservedScale = reservedScale
    }
}

// 2. OrderedSet+UnorderedView.swift
extension OrderedSet.UnorderedView: Equatable {
  public static func ==(left: Self, right: Self) -> Bool {
    if left._base.__storage != nil,
       left._base.__storage === right._base.__storage
    {
      return true
    }
    guard left._base.count == right._base.count else { return false }
    ...
    return true
  }
}

Thanks for sharing your work. Actually, I and one of my colleagues are developing a functionality-wise identical package (mainly for the internal clients from our company for now) here: GitHub - WeZZard/COWMacro: A copy-on-write macro for Swift structs.

I haven't read the sources yet; by just reading the post I would like to raise some topics to discuss:

  1. Explicit @CowBoxMutating on stored properties is definitely way too verbose for client developers. From our experience, they prefer solutions that are as unintrusive as possible.
  2. We use storage class to wrap an internal struct and rely on the compiler to synthesize protocol conformance on it, i.e. not expanding stored properties directly in the storage class. By doing so we save a great deal of effort imitating the expected behavior of the compiler (while needing to fight a handful of related compiler bugs, especially when dealing with Equatable). However, we recently found that accessing a var struct (the actual "storage" in our solution) introduces swift_beginAccess overhead, and we are considering workarounds, including switching back to traditional direct class storage like your solution.
  3. private storage inhibits cross-module inlining and has a noticeable performance drop compared to non-COW counterparts in terms of reading of properties. We choose internal + @usableFromInline for optimal reading performance, since it is the most common use case. This may not be your concern though since your solution expands direct field accesses and does not use _read and _modify (which we expand for stored properties instead of get and set).
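The third point can be sketched roughly as follows. This is an assumption about the general shape of an internal + @usableFromInline layout, not the actual expansion of either macro package, and it uses plain get / set rather than the unofficial _read / _modify accessors.

```swift
// Sketch: the storage is internal but marked @usableFromInline, so the
// @inlinable accessors can be inlined into client modules.
@usableFromInline
final class _Storage {
  @usableFromInline var name: String

  @usableFromInline
  init(name: String) {
    self.name = name
  }
}

public struct Person {
  @usableFromInline
  internal var _storage: _Storage

  public init(name: String) {
    _storage = _Storage(name: name)
  }

  @inlinable
  public var name: String {
    get { _storage.name }
    set {
      // Same copy-on-write check as before; only the visibility differs.
      if !isKnownUniquelyReferenced(&_storage) {
        _storage = _Storage(name: _storage.name)
      }
      _storage.name = newValue
    }
  }
}

let p1 = Person(name: "name")
var p2 = p1
p2.name = "other"
print(p1.name, p2.name)  // name other
```

The tradeoff is that the storage layout becomes part of what other modules may inline against, which is the opposite of the strictly private storage preferred earlier in the thread.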

// Up to 12 properties can be mutated, resulting in 12 calls of isKnownUniquelyReferenced
// The overhead of consecutive isKnownUniquelyReferenced calls can be eliminated
// and replaced by one call

Ahh… yes. This is correct. The CowBox macro (currently) offers no ability to "batch" this operation. One mutating function that sets N stored properties would lead to (potentially) N different calls to check reference count (unless the compiler has the option to perform some magic to optimize them down to just one). I have not benchmarked the performance of that isKnownUniquelyReferenced function… but if this becomes a bottleneck at scale then there could be a new API from CowBox to perform a batch operation.
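One possible shape for such a batch operation is sketched below in hand-written form. The eraserBatched function, the _ensureUnique helper, and the uniquenessChecks counter are all hypothetical; CowBox does not currently offer this, and the counter exists only to make the number of checks visible.

```swift
final class _Storage {
  var name: String
  var location: String

  init(name: String, location: String) {
    self.name = name
    self.location = location
  }

  func copy() -> _Storage {
    _Storage(name: name, location: location)
  }
}

struct Person {
  private var _storage: _Storage
  // Illustration-only: counts uniqueness checks performed on this value.
  private(set) var uniquenessChecks = 0

  init(name: String, location: String) {
    _storage = _Storage(name: name, location: location)
  }

  private mutating func _ensureUnique() {
    uniquenessChecks += 1
    if !isKnownUniquelyReferenced(&_storage) {
      _storage = _storage.copy()
    }
  }

  var name: String {
    get { _storage.name }
    set { _ensureUnique(); _storage.name = newValue }
  }

  var location: String {
    get { _storage.location }
    set { _ensureUnique(); _storage.location = newValue }
  }

  // Goes through the setters: one check per property.
  mutating func eraser() {
    name = ""
    location = ""
  }

  // Hypothetical batched variant: one check, then direct storage writes.
  mutating func eraserBatched() {
    _ensureUnique()
    _storage.name = ""
    _storage.location = ""
  }
}

let original = Person(name: "name", location: "location")
var a = original
a.eraser()
var b = original
b.eraserBatched()
print(a.uniquenessChecks, b.uniquenessChecks)  // 2 1
```

Whether the saving matters in practice would need benchmarking, since the compiler may already be able to coalesce some of these checks.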

a call to mutating func should make a unique copy even if there is no mutation of stored properties during this call.

Hmm… I believe I understand the concept you are describing… but I don't see any clear or compelling reason why that would be the preferred (or expected) default behavior. Could you think of an example where the current behavior (a mutating func that performs no mutations returns with an identical storage instance) would lead to any unexpected behavior from a client that expected this CowBox struct to behave like any other simple struct?

I and one of my colleagues are developing a functionality-wise identical package (mainly for the internal clients from our company for now) here: GitHub - WeZZard/COWMacro: A copy-on-write macro for Swift structs.

Ahh… wow! I did a search last month and didn't find any projects doing the same thing. TBH… I would like to see Swift Cow Types become a legit first-class citizen one day (official language support). No need for macros or codegen at that point!

We use storage class to wrap an internal struct and rely on the compiler to synthesize protocol conformance on it, i.e. not expanding stored properties directly in the storage class.

Ahh… interesting! I use stored variables directly on a class. No need for an extra struct (for now)!