It’s only a source break with the legacy “pre-any” P
existential type syntax. Once any P
was introduced you could form an existential type for any protocol (get it?) regardless of whether it has associated types or Self requirements, so it’s not source breaking to change a protocol in this way as long as any
usage is enforced.
I can see how that makes sense for the case where a new requirement would be the first use of Self
. I think my question was then more specifically if:
- a new requirement uses
Self
- an existing requirement already uses
Self
That's what wasn't clear for me reading the evolution doc. If those two conditions hold would that then not be a source breaking change… including for product engineers building on the legacy existential syntax:
struct S: Equatable { }
let eq: Equatable = S()
We never accepted bare Equatable
as a type so that’s fine too.
Hmm… this code builds for me with warnings instead of errors from 6.1 and Swift 6 language mode:
func f(_: Equatable) { }
`- warning: use of protocol 'Equatable' as a type must be written 'any Equatable'; this will be an error in a future Swift language mode
func main() {
struct S: Equatable { }
let eq: Equatable = S()
`- warning: use of protocol 'Equatable' as a type must be written 'any Equatable'; this will be an error in a future Swift language mode
f(eq)
}
Is the implication that this code should not break if the Equatable
protocol was changed with that new requirement?
I see, it was downgraded to a warning. Yes, adding new requirements to Equatable will not change existing code. However, I don’t think a “probably equal” requirement belongs on Equatable
. There are many possible ways to define “probable equality” and picking one “official” choice of axioms is difficult.
Maybe worth making it a base of Equatable? Being a separate protocol it could then be used without (and unrelated to) Equatable.
// bike shed name
protocol ProbablyEqual {
func probablyEqual(_ other: Self) -> Bool
}
protocol Equatable: ProbablyEqual { ... }
Unfortunately inserting a new base protocol is an ABI break.
BTW, "probably equal" name doesn't convey the right message. What is the "not" of it? "definitely equal"? "definitely different?". With "definitely equal" there's no confusion.
Example of using DefinitelyEqual to speed Set.contains operation up(pseudo code):
func contains(_ element: Element) -> Bool {
if count < 100 { // threshold
for v in all set elements {
if element.definitelyEqual(v) { return true }
}
}
// traditional path, calculate hash, etc
....
}
Hmm… I was kind of leaning in the direction of maybeDifferent
or mightHaveChanged
:
false
implies these two instances must be "equal" (in quotes)true
does not imply anything about "equality"
And then the "equality" concept means something more general than value-equality if this lives outside of Equatable
. My use-cases so far are all for putting this on types that already adopt Equatable
but also have some kind of implementation detail (like CoW storage) that make this kind of function a O(1)
operation.
Similarly for Array's contains (pseudocode):
func contains(_ element: Element) -> Bool {
for v in all array elements {
if element.definitelyEqual(v) { return true }
}
// traditional path:
for v in all array elements {
if element == v { return true }
}
}
The initial quick path that does a shallow check won't bring element contents into cache, could be a win in some situations.
watch the hands:
- "maybe different" ~> "maybe not equal" (obviously)
- "maybe not equal" ~> "maybe equal" (right?)
- "maybe equal" ~> "probably equal" (semantics)
and then we are back to what would it mean when something is "not ("probably equal")"
I think if you’re optimizing a contains() check at this level you really want to build a specialized data structure for your application instead of relying on magical behaviors of Array.
Always performing this definitelyEqual check for an Element type of Int for example is just pure overhead.
unless there's some (very quick) check along the lines of:
// the above shallow loop
// then:
if element type is purely bitwise comparable {
return false // we are done, nothing else left to check
}
// traditional path:
...
OTOH the "not" of "definitely equal" could be also confusing...
What is the opposite of "we know it's true"...
formal logic suggests "we don't know whether it's true or lie" but it could (and would) be confused with "we know it's a lie".
A hypothetical fast path for bitwise comparable types could avoid protocol witness calls inside the loop altogether, and would directly compare bytes in a vectorized loop. So it’s certainly plausible to have an implementation of contains() or similar that would check the condition first, and then execute one of two loops depending on the outcome.
Quick and dirty test suggests huge wins are possible in certain scenarios.
struct S: Equatable {
var x: Int
var y: Int
}
extension Array where Element == String {
func contains2(_ element: Element) -> Bool {
let a = unsafeBitCast(element, to: S.self)
for v in self {
let b = unsafeBitCast(v, to: S.self)
if a == b {
return true
}
}
for v in self {
if element == v { return true }
}
return false
}
}
let strings = (0 ..< 10_000).map { i in
var a = String(repeating: "a", count: 10_000)
a.append("\(i)")
return a
}
var a = strings[9000]
var start = Date()
let x = strings.contains(a)
let elapsed1 = Date().timeIntervalSince(start)
start = Date()
let y = strings.contains2(a)
let elapsed2 = Date().timeIntervalSince(start)
start = Date()
print(elapsed1 / elapsed2) // 1623
The other direction I was thinking was instead of figuring out how to express some concept of "might be changed" or "most certainly did not change" we could express "is identical".
For a copy-on-write data structure like Array
this naturally can map to both values wrapping "the same" storage reference. At this point… we are kind of "leaking" an implementation detail. Is it right for a product engineer to know this "value" type is actually kinda sorta also a reference behind the scenes? There's pros and cons… but it does help clean up the ambiguity about "might be changed".
If the API exposed isIdentical
it should hopefully be clear what true
and false
are trying to express. The product engineer could then use that true
or false
to express their "might be changed" or "definitely equal" semantics in their most preferred style.
There is also some public-but-underscored precedent in standard library. This function on String
seems to work similar to what we are talking about here:
AFAIK we are also planning to ship a isIdentical
function on Span
and RawSpan
from SE-0447.[1]
I think all structs that use the Copy on Write mechanism should conform to a protocol that provides the identity of the internal memory in which the data is stored.
Something like this:
protocol StorageIdentifiable<StorageID> {
associatedtype StorageID : Hashable
var storageID: Self.StorageID { get }
}
extension Array:StorageIdentifiable {
var storageID: ObjectIdentifier {
withUnsafeBytes { unsafeBitCast( $0.baseAddress, to: ObjectIdentifier.self) }
}
}
/*
extension Set:StorageIdentifiable {
var storageID: /* not possible today */
}
extension Dictionary:StorageIdentifiable {
var storageID: /* not possible today */
}
*/
let a = [1,2,3]
let b = a
print("Same storage = \( a.storageID == b.storageID )")
// print "Same storage = true"
In this way, not only does it become easy to perform comparisons such as the one required not only for arrays, but also for sets, dictionaries, etc., but, above all, to implement file persistence without saving the contents of container structures that refer to the same data multiple times.
It's an interesting idea… but also keep in mind that CoW is not exactly a contract that needs to be visible externally… it's an implementation choice on the part of the infra engineer. And some types that do CoW on "large" values might choose to save "small" values on the stack as a performance optimization. So a type like Data
might not always have the ability to return a storageID
.
This is great! My immediate and practical use-case example for wanting this API were for product engineers to use this check as an optimization going into SwiftUI… but if there are side-quests that can refactor internal standard library code that SGTM.
Why is it not possible today for sets / dictionaries?
Looks doable. Example with Set and String, Dictionary would be similar to Set:
protocol Identifiable<Identity> {
associatedtype Identity : Hashable
var identity: Self.Identity { get }
}
extension Set: Identifiable {
var identity: ObjectIdentifier {
unsafeBitCast(self, to: ObjectIdentifier.self)
}
}
extension String: Identifiable {
struct Identity: Hashable {
var a, b: UInt64
}
var identity: Identity {
unsafeBitCast(self, to: Identity.self)
}
}
let a: Set = [1,2,3]
let b = a
let c: Set = [1,2,3]
precondition(a.identity == b.identity)
precondition(a.identity != c.identity)
let d = String(cString: "short str")
let e = String(cString: "short str")
let f = e
let g = String(cString: "long long string string")
let h = String(cString: "long long string string")
let i = h
precondition(d.identity == e.identity) // same short strings are identical
precondition(f.identity == e.identity)
precondition(g.identity != h.identity) // same long strings are not identical
precondition(i.identity == h.identity)
The problem is that once you form the Identifier there is nothing keeping the original buffer alive, so it might be deallocated underneath you, in which case a subsequent allocation might reuse the same address and result in a false positive.