In evaluating whether “representation” is a good term for what this operation compares, my example is still relevant, as it explicitly intends for two instances with completely different values (and therefore different representations in memory) to compare “identically.”
We already have the axiom that a copy is a representation that is identical. But I do not believe we need the axiom that a representation that is identical is a copy. A "copy" seems to be a much stronger guarantee than we need for library maintainers to ship isIdentical(to:) as an impactful API:
If there does exist legit documentation in Standard Library or evolution where we formally define that identical representations are also copies, then please show me where to find that. I searched around everywhere I could think of and the only clues I could find implied the other direction: that copies are identical.
Anyway… the point of this was that I would see nothing wrong with a library maintainer choosing to implement isIdentical(to:) in a way that does not always compare as "same bitwise representation". As long as that library maintainer can adopt and satisfy the semantic guarantees then they should be free to have that extra flexibility here.
I personally would choose the name isKnownIdentical or something similar, because whether it returns true or false depends on implementation details of the type, and it can easily produce unreliable or unexpected results. For instance, because of the inline small-string representation, this produces two identical strings:
```swift
let x = "a" + "b"
let y = "a" + "b"
x._isIdentical(to: y) // true
```
whereas the same operation with slightly longer strings may or may not produce identical strings:
```swift
let x = "aaaaaaaa" + "bbbbbbbb"
let y = "aaaaaaaa" + "bbbbbbbb"
x._isIdentical(to: y) // true or false, depending on optimization level
```
There’s no documented logic for this difference: whether the two strings are identical or not depends on internal implementation details of the type and compiler optimizations. Perhaps a future version of the standard library will produce different results, or a different version of the compiler, different constraints on the optimizer, and probably it’s already different for other architectures. If this identical check cannot have predictable semantics, this uncertainty should be reflected in the name. Using isKnown* helps portray the uncertainty, just like isKnownUniquelyReferenced which is influenced by similar factors.
Personally I’d tend to implement this with a standalone function doing a dumb bitwise comparison of the two values’ representations and tolerate the occasional false negatives when there’s a difference in padding bits. No need to add functions to every type, or for a protocol, and it works with every type out of the box. It keeps things simple… but I’ll admit I haven’t checked how often that would yield false negatives.
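To make that idea concrete, here is a rough sketch of such a standalone function. The name `isBitwiseIdentical` is made up for illustration and is not part of the standard library or the proposal; it only relies on `withUnsafeBytes(of:_:)` and `elementsEqual`, and it compares padding bytes too, so false negatives remain possible.

```swift
// Hypothetical free function (an assumption for illustration, not a
// standard library API): compares the raw in-memory bytes of two values.
func isBitwiseIdentical<T>(_ lhs: T, _ rhs: T) -> Bool {
    return withUnsafeBytes(of: lhs) { lhsBytes in
        return withUnsafeBytes(of: rhs) { rhsBytes in
            // Both buffers cover MemoryLayout<T>.size bytes, padding included,
            // so differing padding bytes can yield a false negative.
            return lhsBytes.elementsEqual(rhsBytes)
        }
    }
}

let numbers = [1, 2, 3]
let sameNumbers = numbers                          // the copy shares the storage buffer
print(isBitwiseIdentical(numbers, sameNumbers))    // true
print(isBitwiseIdentical(42, 42))                  // true
```

A true result here only says the two representations match byte for byte; a false result says nothing about whether == would return true.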
I find the name isKnownIdentical pretty confusing.
a.isKnownIdentical (to: b)
It seems to be saying that a is identical to b because we have done the check before and stored the result, and we made the decision based on that result?
As the title of the proposal strongly emphasizes the speed of the check, why not make this explicit in the function's name:
a.isFastIdentical (to: b)
a.isQuicklyIdentical (to: b)
a.isSwiftlyIdentical (to: b)
Not having this named by a protocol is a mistake in my opinion. That is almost always true. There are just too many cases where you want to express some requirement or provide an extension where being able to name the constraint makes life much easier. It also lets types implement conditional conformance when they can and want to pass-through this support to their contents without unconditionally having to make a promise they can’t really keep.
The protocol also serves a documentary function: It can be the canonical place we hang all the exposition about what this really means, how it should be used, and what the axioms are - rather than smearing or duplicating that across a bunch of disparate concrete method signatures on specific types.
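As a sketch of what naming the constraint could buy us: the protocol name `KnownIdenticalComparable`, the `Document` type, and the `mayHaveChanged` helper below are all hypothetical, invented for this example, and not something the proposal defines.

```swift
// Hypothetical protocol; name and requirement are assumptions for illustration.
protocol KnownIdenticalComparable {
    /// Returns true only when the two values are cheaply known to be identical.
    /// A false result means "unknown", not "different".
    func isKnownIdentical(to other: Self) -> Bool
}

// A toy value type that keeps its contents in a class instance.
struct Document: KnownIdenticalComparable {
    final class Storage {
        var text: String
        init(text: String) { self.text = text }
    }
    var storage: Storage
    init(_ text: String) { storage = Storage(text: text) }

    func isKnownIdentical(to other: Document) -> Bool {
        return storage === other.storage
    }
}

// Conditional conformance: Optional passes the check through to its contents,
// but only when the wrapped type can make the promise itself.
extension Optional: KnownIdenticalComparable where Wrapped: KnownIdenticalComparable {
    func isKnownIdentical(to other: Wrapped?) -> Bool {
        switch (self, other) {
        case let (lhs?, rhs?): return lhs.isKnownIdentical(to: rhs)
        case (nil, nil): return true
        default: return false
        }
    }
}

// A generic helper that can only be written because the constraint has a name.
func mayHaveChanged<T: KnownIdenticalComparable>(from old: T, to new: T) -> Bool {
    return !old.isKnownIdentical(to: new)
}
```

The doc comment on the requirement is where the axioms would live, once, instead of being repeated on every concrete method.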
I’d also suggest some editing of the exposition text. A lot of words are written that repeat the same thing slightly differently. I don’t blame the proposal author - good documentation is very difficult. I would say the protocol should explain:
What it means to adopt the protocol and provide this method.
The axioms that must be preserved to adopt it.
WHY a type should adopt it (or not)
WHY a client would want to use this feature (or not)
Don’t bury the lede. Instead of
Returns a boolean value indicating whether this array is identical to `other`.
say
Returns a boolean value indicating whether this array is identical to `other` without paying the full cost of actually comparing `self` with `other`. This is primarily useful for optimized algorithms that rely on equality checking to speed up an operation, where the equality check itself can negate that performance improvement. Most normal types should not need to adopt this and should instead rely on Equatable.
As an example, types that use a uniquely referenced buffer to store their contents could check whether the buffers point at the same memory address (see isKnownUniquelyReferenced).
There in the first paragraph we now give all the important details:
What this does
Why someone would want to do this
But you probably don’t want to do this.
The second paragraph gives us a very concrete idea of how you might do this and having a very concrete context really helps humans understand abstract concepts. When all you have is the abstract it is much more difficult to grasp the concept in a useful way.
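For what it's worth, that concrete context could look something like the sketch below. `Samples` is a made-up copy-on-write type for illustration, not any standard library type, and `isIdentical(to:)` here is just one way a type author might implement the check.

```swift
// A hypothetical copy-on-write type, assumed only for this example.
struct Samples {
    private final class Buffer {
        var values: [Double]
        init(_ values: [Double]) { self.values = values }
    }

    private var buffer: Buffer

    init(_ values: [Double]) { buffer = Buffer(values) }

    mutating func append(_ value: Double) {
        // Copy the buffer first if another value still shares it.
        if !isKnownUniquelyReferenced(&buffer) {
            buffer = Buffer(buffer.values)
        }
        buffer.values.append(value)
    }

    // Fast check: do both values point at the same buffer object?
    func isIdentical(to other: Samples) -> Bool {
        return buffer === other.buffer
    }
}

let original = Samples([1, 2, 3])
var copy = original
print(original.isIdentical(to: copy))   // true: the copy shares the buffer
copy.append(4)                          // triggers the copy-on-write path
print(original.isIdentical(to: copy))   // false: the buffers are now distinct
```

The check is a single pointer comparison no matter how many elements the buffer holds, which is the whole point.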
"I know these two are identical" does indeed mean the same thing as "these two are identical". However, the reverse isn't true!
"I don't know if these two are identical" does not imply "these two are not identical".
This is crucial: if isKnownIdentical returns true, then you can rest assured that == also returns true. However, if isKnownIdentical returns false, you need to investigate.
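In code, the asymmetry looks roughly like this. `Blob` and the `isEqual` helper are hypothetical names made up for the example; reference identity (`===`) stands in for whatever fast check a type actually provides.

```swift
// A hypothetical reference type whose == does a full comparison of the contents.
final class Blob: Equatable {
    let bytes: [UInt8]
    init(_ bytes: [UInt8]) { self.bytes = bytes }
    static func == (lhs: Blob, rhs: Blob) -> Bool {
        return lhs.bytes == rhs.bytes
    }
}

// Hypothetical helper: try the fast check first, fall back to == otherwise.
func isEqual<T: Equatable>(_ lhs: T, _ rhs: T, isKnownIdentical: (T, T) -> Bool) -> Bool {
    if isKnownIdentical(lhs, rhs) {
        return true        // "known identical" guarantees == is also true
    }
    return lhs == rhs      // "not known identical" means we still have to look
}

let first = Blob([1, 2, 3])
let second = Blob([1, 2, 3])
print(isEqual(first, first) { $0 === $1 })    // true, via the fast path
print(isEqual(first, second) { $0 === $1 })   // true, via the == fallback
```

Treating the false branch as "not equal" would be the bug: `first` and `second` are not identical, yet they are equal.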
It strikes me that isKnown is not the right construct here, since isKnownIdentical == false could also be read as “is known as not identical.” hasKnown seems to convey the underlying meaning better, including the reverse. Maybe a different word altogether from “known” is more appropriate.
Naming has been a big debate, but do we even need a single name shared between all these instance methods, given that their implementations will rely on different “shortcuts”?
The proposal states it’s not going to be a protocol, so why try to lump all these disparate implementations under one single name to begin with. Just name them as their implementation conveys for that particular type.
I think everybody is right. That is, I think the logic of people who’ve argued in favor is convincing, and I think the unease of people who’ve argued against is reasonable. I can’t see any consensus coming out of this, and I think that’s a reason not to enshrine the feature in the language.
The proposal itself is technically good — complete and correct. I do have a small doubt about the need for a requirement that isIdentical should not always return false. To me, this suggests a vague conceptual difficulty (something like: “Why is the whole idea so fragile that it’s broken by a particular pattern of return values?”), and it seems wrong to legislate it away instead of identifying and resolving it.
The conditions for a true result are bipartite: (#1) some kind of fast path to check for potential sameness, and (#2) a true result on that fast path. I think it’s important for the function name to be double-barreled. For example, isKnownIdentical is better than isIdentical, though people have raised issues with the semantics of “known”.
The semantics of a false result from isIdentical remain problematic. Logically, “~((#1) & (#2))” is straightforward. Linguistically, “not (#1) and (#2)” can be a bit of a tangle that's difficult to unravel.
We have an implicit requirement on functions, at least those proposed via SE, that the semantics of the function name should match the behavior of the function. We would not, for example, accept a proposal for a plus function operating on two integers that returned the product of the values. (A more real-world example is the recurring objection to using the + operator for string concatenation, which is raised basically for this reason.)
In the case of this proposal, “identical” is a pretty generic English word that is essentially co-opted here into a technical meaning that’s defined by the behavior of the isIdentical function itself. The semantics of “identical” are still floating free.
The proposal could get around this semantic vagueness by building on a base function returning a concrete value (some people think of this as the “representation”) that can be compared quickly. The `isIdentical` function could then be described as equality of these “identicality” values. However, the proposal has explicitly chosen not to do this. Well, OK, but that’s where it loses my support.
If we decide to go ahead with this anyway, I’d suggest that the function name at least be changed. Of names already mentioned, I like “isKnownIdentical” best, although I agree that this mysterious knowingness is a bit of an obstacle.
As an alternative, I’d like to suggest isTriviallyIdentical(to:):
This is suitably bipartite.
“Trivially” suggests that this is intended to be low-cost.
The English semantics of “not trivially identical to” are understandable enough to avoid any likely confusion. I can’t imagine that anyone would (for example) casually assume that a so-named function is somehow the lower-level underpinning of the == operator.
That’s not my concern when fileprivate exists if someone is not wanting to expose truly private state even to their internal module.
We’re talking about the primary author and provider of these functions being the Standard Library, at least unless and until a protocol is created to formalize this concept.
This whole idea evolved over time. The original pitch thread was a new protocol. The second pitch thread was attaching a new requirement on Equatable. The proposal review we have here before us today is only for concrete types. There are some compelling reasons to ship a new protocol… but it did not look like a very impactful change at this point in time. We want to ship the concrete types first and if the community incubates a protocol in the wild we might then choose to come back and codify this with a new protocol in standard library.
There's pros and cons there. Yes it can be desirable to formally codify axioms generally across all concrete types that might adopt this protocol… but that's an extra problem that this proposal is adopting explicitly "weak" opinions on. Our strong opinions are on concrete types. We do present some weak opinions on what an "abstract" requirement might look like… but the impact of shipping a legit formal abstract requirement through a protocol does not look great enough for us to put that on our calendar at this specific point in time.
So I appreciate you taking a look at the proposed header doc comments for us. In general proposal reviews do not stress out too much about header doc comments. I added the header doc comments based on some feedback during our pitch thread that asked us to formally document the potential axioms and guarantees that concrete types will follow. I do want for engineers here in proposal review to highlight any problems with those axioms and guarantees… but as for the actual copy of the header doc comments I believe the library maintainers can and should have the ability to edit and improve those for their own library types during implementation diff review before landing on main. If you are available to also volunteer and help follow along with those implementation diffs — assuming this proposal is approved — I would appreciate the extra help there if you want to help us review.
You're not wrong… these are all good ideas and I have no problem improving the header doc comments with these ideas if we get to the implementation diff review stage. Thanks!
This is one of the reasons I proposed isKnownSubstitutable and isKnownInterchangeable as alternatives:
The advantage of isKnownSubstitutable over isKnownIdentical is that we improve the accuracy of the name: the false value is "more correct". The disadvantage is we now return less information: the true value is a "weaker" semantic guarantee than before.
I'll let LSG make the final decision here… but IMO the overwhelming majority of engineers that post here with suggestions for different names want the proposal to ship. They just want the proposal to ship with a different name. Which is different than saying engineers are arguing against the proposal.
If any engineers here are opposed to this proposal shipping under any name then I encourage you to share that feedback with LSG going into their decision next week.
The requirement that isIdentical should not always return false is not a "primitive" or "primary" axiom created for this proposal. The definition of Copyable — which all concrete types in this proposal adopt — tells us that copies are representations that are identical. This is a well-established definition that has shipped for years. Our requirement that isIdentical should not always return false is just an implication and derivation of the established requirement that a copy must be identical.
Respectfully: I believe you might be making a statement that is not accurate. Your number one requirement seems to imply that a condition of returning true is the "existence" of a fast path. With the implication being that the existence of this fast path might "go away" or "leave" based on runtime conditions. All the concrete types proposed here do have a fast path. The existence of that fast path is already there: we don't have to think through the semantic implications of "what do we return when our fast path goes away".
If this specific feedback is directed to the "general" guidelines and suggestions we make for library maintainers bringing this method to future types then I would recommend keeping the primary goal of this proposal review focused on the concrete types being presented. One of our reasons for not proposing a new protocol is that we don't want to prioritize an extended debate about abstract requirements at this time. It would be "nice to have" some abstract requirements in place, and a discussion of some "weak" opinions is useful enough that it was added to the proposal, but we should probably not be focusing too much of our energy there at the expense of our focus on the concrete types.
Respectfully: the idea that a representation can be "identical" — and that this has a legit technical meaning independent of a "generic" dictionary definition of "equal" — is a term of art that has shipped in Standard Library for ten plus years.
Now… you could very well make the argument that this proposal is bringing a term of art that was historically applied to reference types to now apply to value types. That would be an accurate observation. But that is different than implying that identical is a generic word and bringing that generic word to a programming context is now resulting in a "circular" definition that does not have the ability to reference back to an existing term of art.
This was discussed in our "Alternatives Considered". Bringing an "escapable identity" to value containers like Array is a very different proposal than what we have here. We are not considering that at this time.
Yes, I do understand what you mean and I don't disagree. I think I can recast my point like this:
In the context of a representation, "identical" is a meaningful term.
However, when a reader simply encounters the word "identical" (outside the context of SE-094, which wouldn't be a typical point of reference once this function is in use after SE approval), it's not a priori obvious that a representation is meant.
The proposal has a paragraph titled “Exposing Identity”, but I think it’s more vague than it could be.
I have not checked all types which are mentioned in the text, but I guess most identity checks are performed by comparing pointers, right?
While “isIdentical” (or even more verbose variants) will always have the potential for misinterpretation, I think a property named “address” is quite safe in this respect, and we could even choose a name like “identityWitness” in case there are some types that use a different technique for the check.
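A minimal sketch of what I have in mind, assuming a type backed by a single class instance. `Palette` and the `identityWitness` property are made up for illustration; nothing like this is in the proposal, which explicitly does not expose identity.

```swift
// A hypothetical value type backed by one class instance.
struct Palette {
    private final class Storage {
        var colors: [String]
        init(_ colors: [String]) { self.colors = colors }
    }

    private var storage: Storage

    init(_ colors: [String]) { storage = Storage(colors) }

    // Hypothetical property: a cheap, comparable stand-in for "the address".
    var identityWitness: ObjectIdentifier {
        return ObjectIdentifier(storage)
    }

    // The identity check is then just equality of the witnesses.
    func isIdentical(to other: Palette) -> Bool {
        return identityWitness == other.identityWitness
    }
}

let warm = Palette(["red", "orange"])
let alsoWarm = warm
print(warm.isIdentical(to: alsoWarm))                        // true
print(warm.isIdentical(to: Palette(["red", "orange"])))      // false
```

One caveat I am aware of: an ObjectIdentifier is only meaningful while the object it was created from is still alive, so a witness that outlives its value would need more care than this sketch takes.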