How to check two `Array` instances for identity equality in constant time?

vanvoorden · March 25, 2025, 12:13am

I'm looking for some public API on standard library collections that might deliver some concept of identity equality or "value might have changed" in constant time.

Somewhat inspired by the recent changes to Observation in 6.2 to condition observation events on a value equality check for certain observable values,^[1] I was trying to then think about alternatives to value equality that might still optimize performance from clients of this observable type.

Suppose we test two Array values for equality. These array values are copy-on-write data structures. An important optimization is to block our linear-time value equality operation with a constant time identity equality operation.^[2] If the identity of the underlying buffer reference is equal these two Array instances must be equal by value. We return true in constant time: no need to iterate through all N elements.

The underlying buffer is a private property… but I'm wondering if there are any other creative ways I might leverage the identity of the buffer to perform a "identity equality" check on two Array instances.

What is the use case? If an observable type is publishing notifications on an Array property and I set a new value I can choose to try and block the observation event on a constant time check instead of linear. If these two values are not equal by identity I could fire an observation event. At that point the client of my observable (which might be SwiftUI) might then choose to perform a linear time value equality check on these two values. But I have optimized what would have been two linear time equality checks (one from the Observation infra and one from the SwiftUI infra) down to one constant time operation and one linear time operation. This might have some big perf wins at scale.

This could benefit not just Observable Array properties… Dictionary and Set and any additional copy-on-write data structures that offered some public ability to check for identity equality could deliver similar performance optimizations.

Does this sound familiar to any previous discussions that might have taken place around these collection types from standard library? Could there be any public APIs (like withUnsafeBufferPointer or withUnsafeBytes) that might provide some stable and reliable ability to check for equality by identity?

dnadoba · March 25, 2025, 4:31am

I have wanted something like this already a couple of times for similar purposes. I think it would make sense to have some kind of protocol or similar abstraction that would allow to check for this. It would potentially make sense to reuse the trip equal === operator for that which today only works for AnyObject

bbrk24 · March 25, 2025, 4:40am

The only stored member of Array is its buffer, so you should be able to unsafeBitCast to Int and compare those:

  1> MemoryLayout<Array<Any>>.size
$R0: Int = 8
  2> let a = [1]
a: [Int] = 1 value {
  [0] = 1
}
  3> let b = a
b: [Int] = 1 value {
  [0] = 1
}
  4> unsafeBitCast(a, to: Int.self) == unsafeBitCast(b, to: Int.self)
$R1: Bool = true
  5> let c = [1]
c: [Int] = 1 value {
  [0] = 1
}
  6> unsafeBitCast(a, to: Int.self) == unsafeBitCast(c, to: Int.self)
$R2: Bool = false

After playing around a bit, I discovered that the pointed-to buffer contains, in order:

Two words that are inscrutable when viewed as Ints, probably the class header containing refcount and such
One word for size
One word for capacity
The contents

jflan · March 25, 2025, 4:56am

Array does that check in its implementation of ==

github.com/swiftlang/swift

stdlib/public/core/Array.swift

2f970b69d


      
          /// - Parameters:
          ///   - lhs: An array to compare.
          ///   - rhs: Another array to compare.
          @inlinable
          public static func ==(lhs: Array<Element>, rhs: Array<Element>) -> Bool {
            let lhsCount = lhs.count
            if lhsCount != rhs.count {
              return false
            }
          
            // Test referential equality.
            if unsafe lhsCount == 0 || lhs._buffer.identity == rhs._buffer.identity {
              return true
            }
          
          
            _internalInvariant(lhs.startIndex == 0 && rhs.startIndex == 0)
            _internalInvariant(lhs.endIndex == lhsCount && rhs.endIndex == lhsCount)
          
            // We know that lhs.count == rhs.count, compare element wise.
            for idx in 0..<lhsCount {

vanvoorden · March 25, 2025, 5:40am

Right… but I want that value. The Array instances are performing a check for identity equality as an optimization to return early from their check for value equality. I want to know if identity equality is false.

jflan · March 25, 2025, 2:09pm

As an academic exercise it's an interesting question, but if I were guessing I'd bet that the use case you outlined above would actually result in a performance degradation in common scenarios given Arrays existing internal fast paths.

The only case that's likely to perform a non-trivial linear time check is:

two long arrays of the same length (first fast path exits if lhs.count != rhs.count)
with different underlying buffers
where the first difference in their content is far enough into the array that looping over the matching prefixes takes relevant time

In the use case you described above you could skip that third bullet point, but at the cost of potentially causing the downstream consumer to do extra work. Let's say the downstream consumer of the updates performs a network request when there's an updated value. If you send a change event that's not actually a change the consumer either a) has to always recheck equality itself before sending the request because it might get false updates, or b) assume that the observation stream only emits when values actually change, which is the default behavior of the API, and fire off the network request for every update even if they're not actual changes because the linear time check was skipped.

If you're making an update in your @Observable model that you know, or strongly suspect, is going to cause a change you can explictly skip the check in that case and fire the update event yourself.

@Observable
class TestModel {

  var array = [1, 2, 3]

  func changeArray() {
    // Skip @Observable's equality check and directly trigger the mutation ourself
    withMutation(keyPath: \.array) {
      _array = [2, 3, 4]
    }
  }
}

Ben_Cohen · March 25, 2025, 2:47pm

Can I recap the requirement again to make sure I follow:

You want a "maybe different" comparison, where "definitely equal" is a fast path, and "maybe different" will fire sometimes even when nothing differs.

The motivation being, an observer might be about to do some linear operation anyway when values have changed, and that linear operation is fairly similar in effort to an equality check in cases of false change notifications, so they might as well do it even when nothing has changed rather than pay the price of first checking for definitive inequality, then performing that work.

Slava_Pestov · March 25, 2025, 4:59pm

I am not completely certain but I think the following is safe and correct:

func definitelyEqual<T>(_ lhs: [T], _ rhs: [T]) -> Bool {
  return lhs.withUnsafeBufferPointer { x in
    return rhs.withUnsafeBufferPointer { y in
      return x.baseAddress == y.baseAddress && x.count == y.count
    }
  }
}

let a = [1, 2, 3]

print(definitelyEqual(a, a))

jrose · March 25, 2025, 5:34pm

I thought of that too, but I’m still confused about what you do with the answer. It’s not exactly equivalent to “when did copy-on-write happen” because arrays need to get reallocated for other reasons sometimes (though maybe you know you’re not doing those, it won’t happen capriciously), and if you were “just” tracking buffer addresses to give an Array buffer identity, (1) please don’t, and (2) you wouldn’t phrase that in terms of a comparison.

EDIT: I understand now, I did not read carefully enough!

Cyberbeni · March 25, 2025, 5:53pm

It's only correct until a future Swift version changes the behavior to give a different virtual address every time an unsafe pointer is created (probably only in debug mode) to avoid escaping those pointers from their intended scope. In which case, you could get a different virtual address inside the 2 callbacks even when the real address is the same. (If there is no special handling of this case in the compiler)

The real world use case for such a change would be to catch possible errors related to calling into C code where the implicitly created pointer you pass in gets saved and used later. In this case you need to explicitly create an UnsafeMutablePointer and initialize it instead of implicitly creating a new one every time you pass a pointer to a struct into the C function.

vanvoorden · March 25, 2025, 6:15pm

Sure… but SwiftUI is (we think) eagerly performing the value equality check itself through its diffing reconciliation process. That comes from SwiftUI infra… it's not something product engineers always have an easy and natural ability to opt-out or control.

vanvoorden · March 25, 2025, 6:18pm

Correct. And this is something I might also like to generalize across arbitrary copy-on-write data structures. Some ability to trade accuracy for speed and deliver a "maybe different" that might return true even if both instances are equal by value.

hassila · March 25, 2025, 6:56pm

Analogous to the semantics of a bloom filter… which is very useful.

Slava_Pestov · March 25, 2025, 7:43pm

While this is technically possible, it seems that this would make Array.withUnsafeBufferPointer into an O(n) operation, which isn’t likely to happen, even in debug builds.

EDIT: and for a non-Copyable type, this would be completely impossible, because you can’t make a copy of the buffer.

David_Smith · March 25, 2025, 8:04pm

This seems like a pretty reasonable, if very niche, thing to want, and is certainly quite implementable. My main concern is how to name it in a way that doesn't lead to people who aren't looking for it accidentally stumbling over it and getting something they didn't expect.

Cyberbeni · March 25, 2025, 8:10pm

Let's say the array's buffer is at the address range of 0x20-0x40. We tell the OS to create a new virtual memory address range for us that points to this address range, it gives us 0x2020-0x2040. The developer reads/writes to 0x2020-0x2040 inside the withUnsafeBufferPointer block (which the OS translates into writes to the same physical address that the 0x20-0x40 virtual address range points to). Once the block ends, we tell the OS to invalidate the 0x2020-0x2040 address range and crash our application if we try to access it again.

There would be no copying, so it would work with ~Copyable just as well.

Slava_Pestov · March 25, 2025, 8:15pm

This would only be valid if the buffer’s element type is bitwise movable. Otherwise, if the interpretation of a value depends on its address, you can’t do this anymore. (That rules out weak references, mutexes, imported C++ types that contain internal pointers and have a non-trivial move constructor, etc.)

Also, changing the address space mapping is a pretty expensive operation, and like copying the buffer, it’s not something you want to do in withUnsafeBufferPointer which should be basically free.

vanvoorden · April 2, 2025, 7:33pm

Hmm… I do like the idea of a common protocol that types in the standard library can conform to… but if we add that protocol and those conformances to the standard library would that not lead to an ABI breaking change?

Ahh… it looks like since that protocol would be introduced in the same release it might be ok:

github.com/swiftlang/swift

docs/LibraryEvolution.rst?plain=1

f61a3c941


      
          - Adding a conformance to an ABI-public protocol *that was introduced in the
            same release* (see below).

Slava_Pestov · April 2, 2025, 8:25pm

This part of the document is out of date. We support availability checking for protocol conformances as of Swift 5.5. You can declare the conformance in an extension and then you can give it a narrower availability range than the conforming type or protocol itself. And if the protocol is also new there is no problem in any case as you said.

vanvoorden · April 2, 2025, 8:48pm

Hmm… I'm also wondering then what might be possible on an existing protocol:

github.com/swiftlang/swift

docs/LibraryEvolution.rst?plain=1

41f7ac3e7


      
          - A new non-type requirement may be added (with the appropriate availability),
            as long as it has an unconstrained default implementation. If the requirement
            uses ``Self`` and the protocol has no other requirements using ``Self`` and
            no associated types, this is a `binary-compatible source-breaking change` due
            to restrictions on protocol value types.

Would this imply that we think code like this would not be ABI breaking?

protocol Equatable {
  // existing stuff
  ...
  TODO: add availability checking
  func maybeDifferent(to: Self) -> Bool
}

extension Equatable {
  // default implementation
  func maybeDifferent(to: Self) -> Bool { true }
}

The library evolution doc says that a requirement that uses Self is source breaking if the protocol does not have other requirements using Self… but I'm not sure that implies that a requirement that uses Self is not source breaking if the protocol does have other requirements using Self (and Equatable already depends on Self for the == operator).