Hi! Here is a pitch derived from this discussion:
Distinguishable
protocol and isIdentical
Function for Quick Comparisons
Introduction
I propose a new Distinguishable
protocol and isIdentical
function for determining if two instances are distinguishable in constant-time.
Motivation
Suppose we have some code that listens to values from an AsyncSequence
. Every value received from the AsyncSequence
is then used to perform some work that scales linearly with the size of the Element
:
func doLinearOperation<T>(with: T) {
// perform some operation
// scales linearly with T
}
func f1<S>(sequence: S) async throws where S: AsyncSequence {
for try await element in sequence {
doLinearOperation(with: element)
}
}
Here we perform a linear operation on every value received from sequence
. How much performance does this use?
Suppose we know that the work performed in doLinearOperation
is only necessary when element
is not equal (here we define "equal" to be "value equality"). The first call to doLinearOperation
is important, and the next calls to doLinearOperation
are only important if element
is not equal-by-value to the last element
that was used to perform doLinearOperation
.
When we know that Element
conforms to Equatable
we can choose to "memoize" our values before we perform doLinearOperation
:
func f2<S>(sequence: S) async throws where S: AsyncSequence, S.Element: Equatable {
var oldElement: S.Element?
for try await element in sequence {
if oldElement != element {
oldElement = element
doLinearOperation(with: element)
}
}
}
When our sequence
produces many elements that are equal-by-value, "eagerly" passing that element to doLinearOperation
performs more work than necessary. Performing a check for value-equality before we pass that element to doLinearOperation
saves us the work from performing more doLinearOperation
than necessary, but we have now traded performance in a different direction. Because we know that the work performed in doLinearOperation
scales linearly with the size of the Element
, and we know that the ==
operator also scales linearly with the size of the Element
, we now perform two linear operations whenever our sequence
delivers a new value that is not equal-by-value to the previous input to doLinearOperation
.
At this point our product engineer has to make a tradeoff: do we "eagerly" perform the call to doLinearOperation
without a preflight check for value equality on the expectation that sequence
will produce many non-equal values, or do we perform the call to doLinearOperation
with a preflight check for value equality on the expectation that sequence
will produce many equal values?
There is a third path forward… a "quick" check against elements that returns in constant-time and guarantees these types must be equal by value.
Prior Art
Swift.String
already ships a public-but-underscored API that returns in constant time:[1]
extension String {
/// Returns a boolean value indicating whether this string is identical to
/// `other`.
///
/// Two string values are identical if there is no way to distinguish between
/// them.
///
/// Comparing strings this way includes comparing (normally) hidden
/// implementation details such as the memory location of any underlying
/// string storage object. Therefore, identical strings are guaranteed to
/// compare equal with `==`, but not all equal strings are considered
/// identical.
///
/// - Performance: O(1)
@_alwaysEmitIntoClient
public func _isIdentical(to other: Self) -> Bool {
self._guts.rawBits == other._guts.rawBits
}
}
We don't see this API currently being used in standard library, but it's possible this API is already being used to optimize performance in private frameworks from Apple.
Proposed Solution
Many types in Swift and Foundation are "copy-on-write" data structures. These types present as value types, but can leverage a reference to some shared state to optimize for performance. When we copy this value we copy a reference to shared storage. If we perform a mutation on a copy we can preserve value semantics by copying the storage reference to a unique value before we write our mutation: we "copy" on "write".
This means that many types in Swift and Foundation already have some private reference that can be checked in constant-time to determine if two values are identical. Because these types copy before writing, if two values are identical by their shared storage must be equal by value.
Suppose our Element
conformed to a Distinguishable
protocol. This Element
now adopts a function that can return in constant time if two values are identical and must be equal by-value
We can now refactor our operation on AsyncSequence
to take advantage of this:
func f3<S>(sequence: S) async throws where S: AsyncSequence, S.Element: Distinguishable {
var oldElement: S.Element?
for try await element in sequence {
if oldElement.isIdentical(to: element) == false {
oldElement = element
doLinearOperation(with: element)
}
}
}
What has this done for our performance? We know that doLinearOperation
performs a linear operation over element
. We also know that isIdentical
returns in constant-time. If isIdentical
returns true
we skip performing doLinearOperation
. If isIdentical
returns false
we perform doLinearOperation
, but this is now one linear operation. We will potentially perform this linear operation even if the element
returned is equal by-value, but since the preflight check to confirm value equality was itself a linear operation, we now perform one linear operation instead of two.
Detailed Design
Here is a new protocol defined in Standard Library:
@available(SwiftStdlib 6.3, *)
protocol Distinguishable {
/// - Performance: O(1)
func isIdentical(to other: Self) -> Bool
}
Because Distinguishable
is similar but orthogonal to value equality, we give infra engineers the ability to define what "identity equality" means for them and the types that adopt Distinguishable
.
The most common adoptions would be on Standard Library and Foundation types that are copy-on-write values that also conform to Equatable
. Examples include Array
and Dictionary
. If two Array
instances return true
for isIdentical
, then these two Array
instances must be equal by-value.
Source Compatibility
This code is additive. The protocol definition is guarded by available
. The adoptions on Standard Library and Foundation types are guarded by available
.
Impact on ABI
Introducing a new protocol and adopting that protocol on existing types does not have to be ABI breaking. The implication is that we want to "get it right" and settle on all the types that should adopt Distinguishable
because if we ship Distinguishable
in 6.3 and then decide that an existing type should adopt Distinguishable
in 6.4 that would break ABI.
Determining exactly what types from Standard Library and Foundation should adopt Distinguishable
should block the landing of Distinguishable
, but does not necessarily need to block the design review of Distinguishable
itself. We can follow up on a design review of Distinguishable
with an audit of Standard Library and Foundation to agree of what types should adopt Distinguishable
.
Future Directions
Engineers outside of Swift and Foundation also ship copy-on-write data structures that conform to Equatable
. An example is TreeDictionary
from swift-collections
. We would like engineers to be able to easily adopt Distinguishable
on their own types.
Alternatives Considered
Could we "overload" the ===
operator from AnyObject
? Because Distinguishable
makes no promises or assumptions about the nature of the type itself, this proposal recommends keeping the ===
operator only on AnyObject
types.
Instead of publishing an isIdentical
function which implies two types must be equal, could we think of things from the opposite direction? Could we publish a maybeDifferent
function which implies two types might not be equal? This then introduces some potential ambiguity for product engineers: to what extent does maybeDifferent
imply "probably different"? This ambiguity could be settled with extra documentation on the protocol, but isIdentical
solves that ambiguity up-front.
Without breaking ABI, we could add the isIdentical
function directly to Equatable
. Because we expect most types that would adopt Distinguishable
would return true
to indicate two instances must be equal by-value, there are arguments that this should itself belong in Equatable
. At this point we have now coupled these two concepts and reduced the flexibility available to product engineers. A "pure" value type might conform to Equatable
without any underlying reference that can return true
in constant time to define identity equality in constant time. The "default" behavior on these types would be to always return false
for isIdentical
: we can't make any constant-time judgement about any potential value-equality.
Acknowledgments
Thanks @dnadoba for suggesting this isIdentical
function should exist on a new protocol.
Thanks @ben_cohen for helping to think through and generalize the original use-case and problem-statement.
Thanks @Slava_Pestov for helping to investigate ABI stability implications of a new protocol.
Thanks @lorentey for shipping the optimization on Swift.String
that shows us a precedent for this kind of operation.[1:1]