To be clear, though, the semantic guarantees of standard library APIs go through Swift Evolution. For example, documenting that Swift's sorting algorithm is stable (which it already was) is the subject of SE-0372. If Array is to explicitly document additional semantics, that would go through Evolution review.
Fair enough. If a library maintainer working on Array creates a new evolution proposal for stronger semantic guarantees then they can choose to cross that bridge when they come to it. I'm not authoring this proposal with the intention of blocking those stronger guarantees (whether or not they go through evolution).
But I would probably push back on library maintainers working on Array choosing weaker semantic guarantees. And if those library maintainers submitted those weaker guarantees as an evolution proposal I would probably vote "no" based on the information I have today.
The proposal basically (sorta?) has this already, since each doc comment has a note about the method comparing hidden details ”such as the underlying array storage object”.
/// Comparing arrays this way includes comparing (normally) hidden
/// implementation details such as the memory location of any underlying
/// array storage object. Therefore, identical arrays are guaranteed to
/// compare equal with `==`, but not all equal arrays are considered
/// identical.
I just think it would be good to lift that part, with a more pedagogical phrasing, into the summary line for the documentation and be more explicit about it.
”Two arrays/dictionaries/sets/etc are trivially identical when they share the same storage” is much easier to understand both in terms of what it does, but also in terms of when I might want to use it.
EDIT: In fact, to me as someone who has wanted this API for a long time, ”Two arrays are trivially identical if they share the same storage” is a much more useful guarantee than ”Two arrays are trivially identical if there is no way to distinguish between them”, which is the current phrasing.
Why do these all need to share a single name if it's not a protocol? Name them according to their implementation details, e.g. Array can have sharesSameStorageAs(). This solves all the issues with naming in this thread.
That's a fair question. I'd give you two reasons why we wouldn't want to do that:
As a general rule, an API consumer should be concerned with what an API does rather than how an API is implemented. So we would also want to avoid a name that bakes in the specifics of the current implementation. Let's say in the future we figure out a way to improve the algorithm such that we can efficiently check for identicality (is that a word?) even when the backing storage differs. We'd then either:
Be stuck with sharesSameStorageAs() despite it not being an accurate name, or
Need to deprecate the existing name and come up with a new name appropriate for the new algorithm.
Having different names for these operations depending on which type you're using increases the mental load imposed on developers using them. "I need the is-there-a-flower-yet function, what's it called on AppleTree?" vs. "I need the is-there-a-flower-yet function, it's called isBlooming()".
A "Future Direction" from SE-0453 was a potential SmallArray built on a potential FixedCapacityArray and a potential HypoArray.
Once we have FixedCapacityArray and some hypothetical noncopyable heap allocated
array type (which SE-0437 Noncopyable Standard Library Primitives
dons as HypoArray as a placeholder), it should be very trivial to define a SmallArray type similar to the one found in LLVM APIs llvm::SmallVector.
public enum SmallArray<let Capacity: Int, Element: ~Copyable>: ~Copyable {
case small(FixedCapacityArray<Capacity, Element>)
case large(HypoArray<Element>)
}
which would act as an inline allocated array until one out grew the inline
capacity and would fall back to a dynamic heap allocation.
That's a lot of "potentials"… but I guess it is theoretically possible that at some point in the future Swift.Array could maybe implement its storage with inline values for "smol" arrays and then heap buffers for "large" arrays. Which would then imply that is[Trivially]Identical(to:) performs an O(1) comparison of a buffer pointer for large arrays and an O(1) bitwise comparison for smol arrays — which is O(1) assuming of course the "k" value threshold when an array is no longer smol is fixed and defined by the library maintainer.
I'm also of the opinion that "identical" is confusing language, and not well defined in this proposal.
The proposal seems to be motivated by performance concerns, but isn't this what Hashable is for?
I would like to note these are not technically "equality" at a program-wide level, or at best, they are "equality" only from a particular perspective at a high level of abstraction; at lower levels of abstraction these are kinds of equivalence. That is, they are situations where the value space has deliberately been reduced, by the programming language, so that there are no distinguishing characteristics that a well-behaved application ought not to be distinguishing (but might still be appropriate for other parties to distinguish between).
-0.0 only compares equal to 0.0 if you use a specific floating point comparison instruction (e.g. FCOM on x86). If you use a general comparison operation, these will compare unequal.
"\u{E9}" is equivalent to "e\u{301}", and therefore normalizes the same, but is not equal, and they will sort differently unless normalized (hence why NFC is so common). They compare differently in many languages like ECMAScript; but at a higher abstraction level like Swift's Character, they are "equal" because they're not supposed to be distinguishable to an application—there may be performance reasons to generate one or the other arbitrarily—and defining Character 's == as the "Unicode equivalence" operation accessible promotes interoperability.
Two sets with identical elements are not guaranteed to have any iteration order; it could hypothetically change every time you iterate a given set. Sets by definition are only distinguished by their elements, so a == operator that only compares the elements again promotes interoperability.
"two proofs of the same theorem are considered the same if
one can be transformed into the other through a series of
logical, or trivial, operations"
"something is not just similar, but exactly alike"
Exact sameness: It implies a level of certainty and precision
that goes beyond simple similarity. The phrase emphasizes that
no differences can be found, even if very close inspection is
performed.
No, because hashing a string or an array requires you to traverse its entire contents, which is specifically what this check is designed to avoid. Plus, you might have an array of things that aren't Hashable and still want to perform this check.
Nonetheless, Hashable is still more performant than Equals in many situations. Otherwise why would it exist? The proposal should note Hashable as a "considered alternative".
This sort of proves my point that "identical" should be well defined or should be changed for a better term. In particular, "equivalent by definition" broader than "equal".
I think the layman uses "identical" to mean "instances may not be distinguishable by their immutable characteristics" e.g. "those two cars are identical" since one could change their mutable characteristics, perhaps swap their locations, and not be able to determine this change was ever made.
I have to say… I'm not sure I understand the feedback suggesting Hashable as an alternative to what's being proposed.
What might help here is to please reference the concrete examples presented in the "Motivation" and "Proposed Solution" of the proposal. How would knowing that this Array adopts Hashable help improve performance of the real-world examples we presented in the proposal?
The main purpose of Hashable is in the first line of its documentation:
You can use any type that conforms to the Hashable protocol in a set or as a dictionary key.
While it's true that you want Hashable to be performant to guarantee efficient lookups in those collections, that's not a suitable way to performantly compare two things for identicality. Notably, two Arrays with independent backing stores but equal elements would return true if their hash values were compared (it must, since they are ==), and that is explicitly not the proposed semantics of is[Trivially]Identical(to:).
To answer this directly—Hashable allows you to look up a single element within a container with a time complexity that depends ~only on the element itself and not the size of the collection. But "the element itself" in the case of, say, a Set<[Int]> encompasses all the elements of the [Int]. OTOH the point of isTriviallyIdentical(to:) is to give a positve answer in the "identical" case without needing to process all the elements of an array.
So I would not, in the general case, expect lhs.hashValue == rhs.hashValue to be dramatically more performant than lhs == rhs. (Indeed, the docs for Hashable explicitly request that you feed exactly the same data into Hashable which is incorporated into the Equatable implementation.) Rather, Hashable is meant to make the set.contains(val) operation substantially more performant than array.contains(val).
Right, and to clarify what I meant by "performant", I really just mean "do not do more work than necessary", where "necessary" is load-bearing. If you're hashing a collection, "necessary" means "consider every element" or you're likely to cause hash collisions.
My memory is failing me, but I want to say that Java (?) famously had an issue a long time ago with their String hash where they only considered some small N-length prefix of the string, which made it very easy for an attacker to harm a service by feeding it strings that would intentionally collide.
I'm not suggesting it as an alternative; I'm suggesting it ought to be listed as an "alternative considered" and remark on why it's not the same thing.
In my personal opinion from a policy perspective, I wouldn't consider that to be an "alternative considered" since it's simply not suitable to solve the problem. I would define "alternatives" to mean "different but viable approaches".
I mean… where do we draw the line there? What about Codable? Does Codable deserve its own space as a legit "Alternative Considered"?
I'm blocked on presenting Hashable as an alternative because I'm blocked on understanding howHashable is an alternative.
I hate to sound like a broken record… but what might help:
Would you like to please address my question that was not answered?
While we are outside the "official" review cycle — which ended on Monday — I do appreciate the feedback and I do assume you left this feedback in good faith. I see no reason to update our proposal draft at this time but LSG can make the final call along with any necessary modifications if they decide to accept SE-0494. Thanks!