I'm looking for some public API on standard library collections that might deliver some concept of identity equality or "value might have changed" in constant time.
Somewhat inspired by the recent changes to Observation in 6.2 to condition observation events on a value equality check for certain observable values,[1] I was trying to then think about alternatives to value equality that might still optimize performance from clients of this observable type.
Suppose we test two Array values for equality. These array values are copy-on-write data structures. An important optimization is to block our linear-time value equality operation with a constant time identity equality operation.[2] If the identity of the underlying buffer reference is equal these two Array instances must be equal by value. We return true in constant time: no need to iterate through all N elements.
The underlying buffer is a private property… but I'm wondering if there are any other creative ways I might leverage the identity of the buffer to perform a "identity equality" check on two Array instances.
What is the use case? If an observable type is publishing notifications on an Array property and I set a new value I can choose to try and block the observation event on a constant time check instead of linear. If these two values are not equal by identity I could fire an observation event. At that point the client of my observable (which might be SwiftUI) might then choose to perform a linear time value equality check on these two values. But I have optimized what would have been two linear time equality checks (one from the Observation infra and one from the SwiftUI infra) down to one constant time operation and one linear time operation. This might have some big perf wins at scale.
This could benefit not just Observable Array properties… Dictionary and Set and any additional copy-on-write data structures that offered some public ability to check for identity equality could deliver similar performance optimizations.
Does this sound familiar to any previous discussions that might have taken place around these collection types from standard library? Could there be any public APIs (like withUnsafeBufferPointer or withUnsafeBytes) that might provide some stable and reliable ability to check for equality by identity?
I have wanted something like this already a couple of times for similar purposes. I think it would make sense to have some kind of protocol or similar abstraction that would allow to check for this. It would potentially make sense to reuse the trip equal === operator for that which today only works for AnyObject
The only stored member of Array is its buffer, so you should be able to unsafeBitCast to Int and compare those:
1> MemoryLayout<Array<Any>>.size
$R0: Int = 8
2> let a = [1]
a: [Int] = 1 value {
[0] = 1
}
3> let b = a
b: [Int] = 1 value {
[0] = 1
}
4> unsafeBitCast(a, to: Int.self) == unsafeBitCast(b, to: Int.self)
$R1: Bool = true
5> let c = [1]
c: [Int] = 1 value {
[0] = 1
}
6> unsafeBitCast(a, to: Int.self) == unsafeBitCast(c, to: Int.self)
$R2: Bool = false
After playing around a bit, I discovered that the pointed-to buffer contains, in order:
Two words that are inscrutable when viewed as Ints, probably the class header containing refcount and such
Right… but I want that value. The Array instances are performing a check for identity equality as an optimization to return early from their check for value equality. I want to know if identity equality is false.
As an academic exercise it's an interesting question, but if I were guessing I'd bet that the use case you outlined above would actually result in a performance degradation in common scenarios given Arrays existing internal fast paths.
The only case that's likely to perform a non-trivial linear time check is:
two long arrays of the same length (first fast path exits if lhs.count != rhs.count)
with different underlying buffers
where the first difference in their content is far enough into the array that looping over the matching prefixes takes relevant time
In the use case you described above you could skip that third bullet point, but at the cost of potentially causing the downstream consumer to do extra work. Let's say the downstream consumer of the updates performs a network request when there's an updated value. If you send a change event that's not actually a change the consumer either a) has to always recheck equality itself before sending the request because it might get false updates, or b) assume that the observation stream only emits when values actually change, which is the default behavior of the API, and fire off the network request for every update even if they're not actual changes because the linear time check was skipped.
If you're making an update in your @Observable model that you know, or strongly suspect, is going to cause a change you can explictly skip the check in that case and fire the update event yourself.
@Observable
class TestModel {
var array = [1, 2, 3]
func changeArray() {
// Skip @Observable's equality check and directly trigger the mutation ourself
withMutation(keyPath: \.array) {
_array = [2, 3, 4]
}
}
}
Can I recap the requirement again to make sure I follow:
You want a "maybe different" comparison, where "definitely equal" is a fast path, and "maybe different" will fire sometimes even when nothing differs.
The motivation being, an observer might be about to do some linear operation anyway when values have changed, and that linear operation is fairly similar in effort to an equality check in cases of false change notifications, so they might as well do it even when nothing has changed rather than pay the price of first checking for definitive inequality, then performing that work.
I thought of that too, but I’m still confused about what you do with the answer. It’s not exactly equivalent to “when did copy-on-write happen” because arrays need to get reallocated for other reasons sometimes (though maybe you know you’re not doing those, it won’t happen capriciously), and if you were “just” tracking buffer addresses to give an Array buffer identity, (1) please don’t, and (2) you wouldn’t phrase that in terms of a comparison.
EDIT: I understand now, I did not read carefully enough!
It's only correct until a future Swift version changes the behavior to give a different virtual address every time an unsafe pointer is created (probably only in debug mode) to avoid escaping those pointers from their intended scope. In which case, you could get a different virtual address inside the 2 callbacks even when the real address is the same. (If there is no special handling of this case in the compiler)
The real world use case for such a change would be to catch possible errors related to calling into C code where the implicitly created pointer you pass in gets saved and used later. In this case you need to explicitly create an UnsafeMutablePointer and initialize it instead of implicitly creating a new one every time you pass a pointer to a struct into the C function.
Sure… but SwiftUI is (we think) eagerly performing the value equality check itself through its diffing reconciliation process. That comes from SwiftUI infra… it's not something product engineers always have an easy and natural ability to opt-out or control.
Correct. And this is something I might also like to generalize across arbitrary copy-on-write data structures. Some ability to trade accuracy for speed and deliver a "maybe different" that might return true even if both instances are equal by value.
While this is technically possible, it seems that this would make Array.withUnsafeBufferPointer into an O(n) operation, which isn’t likely to happen, even in debug builds.
EDIT: and for a non-Copyable type, this would be completely impossible, because you can’t make a copy of the buffer.
This seems like a pretty reasonable, if very niche, thing to want, and is certainly quite implementable. My main concern is how to name it in a way that doesn't lead to people who aren't looking for it accidentally stumbling over it and getting something they didn't expect.
Let's say the array's buffer is at the address range of 0x20-0x40. We tell the OS to create a new virtual memory address range for us that points to this address range, it gives us 0x2020-0x2040. The developer reads/writes to 0x2020-0x2040 inside the withUnsafeBufferPointer block (which the OS translates into writes to the same physical address that the 0x20-0x40 virtual address range points to). Once the block ends, we tell the OS to invalidate the 0x2020-0x2040 address range and crash our application if we try to access it again.
There would be no copying, so it would work with ~Copyable just as well.
This would only be valid if the buffer’s element type is bitwise movable. Otherwise, if the interpretation of a value depends on its address, you can’t do this anymore. (That rules out weak references, mutexes, imported C++ types that contain internal pointers and have a non-trivial move constructor, etc.)
Also, changing the address space mapping is a pretty expensive operation, and like copying the buffer, it’s not something you want to do in withUnsafeBufferPointer which should be basically free.
Hmm… I do like the idea of a common protocol that types in the standard library can conform to… but if we add that protocol and those conformances to the standard library would that not lead to an ABI breaking change?
Ahh… it looks like since that protocol would be introduced in the same release it might be ok:
This part of the document is out of date. We support availability checking for protocol conformances as of Swift 5.5. You can declare the conformance in an extension and then you can give it a narrower availability range than the conforming type or protocol itself. And if the protocol is also new there is no problem in any case as you said.
The library evolution doc says that a requirement that uses Self is source breaking if the protocol does not have other requirements using Self… but I'm not sure that implies that a requirement that uses Selfis not source breaking if the protocol does have other requirements using Self (and Equatable already depends on Self for the == operator).