On the elementsEqual problem, or “[Pitch] Set and Dictionary should not be Sequences”

dabrahams · April 28, 2018, 6:41pm

I have come to a new point of clarity on the topic that may move the discussion forward, so I thought I'd share it here. Some of what I'm about to say has already been covered in this old thread, which I don't want to revive, but am happy to reference. I think the fundamental problem is that:

elementsEqual is intended to reflect a salient aspect of the protocol on which it is defined
Set and Dictionary conform to that protocol
For a and b of Set or Dictionary type, both these conditions can hold:
- a == b
- a.elementsEqual(x) != b.elementsEqual(x)

I think the above is what people are really getting at when they say things like “sets are inherently unordered collections.” That statement is meaningless if you read “collection” as Collection, since Collection inherently has an ordering, but I take it to mean, “the ordering of a Set is not a salient aspect of its value.” With that, I wholeheartedly agree: ordering can't be a salient aspect of the value, since it is not relevant to equality!

Therefore, the cure is to remove Sequence (and therefore Collection) conformance from Set and Dictionary. I propose that instead the collection of elements be accessible through a (non-salient) property called members. We should not do anything special to support direct iteration until some of the open questions about how iteration should work in the long run are resolved. Uses of these types in Sequence context would be migrated to operate on their members property.

Documentation note (attn @nnnnnnnn): I propose that all non-salient aspects of a value be documented by noting that they do not contribute to determining equality.

gwendal.roue · April 28, 2018, 7:51pm

I don't agree at all of the removal of Colection from Set and Diictionary. This would only bring developers (I mean, real ones, not lawyers) to just cast their sets and dictionaries to Arrays in order to fulfill their actual needs. Just as NSSet has an allObjects NSArray method. Removing this conformance from Set and Dictionary is blind, and to says the least, completely stupid. I'm sorry.

hartbit · April 28, 2018, 7:54pm

Do you mean that, if this were implemented:

for (key, value) in dictionary {
    // do something
}

would be invalid and force users to do iterations through members?

for (key, value) in dictionary.members {
    // do something
}

If that's the case, I'm fairly negative about this proposal. We'd be going for more correctness at the expense of convenience/ergonomics, the same way String used to vend its collection on characters, which was later reverted. People just won't understand the change.

adamkemp · April 28, 2018, 10:15pm

Onc again, this is absolutely unnecessary. Stop trying to make Set and Dictionary harder to use just because of some philosophical hangup. Iterating over the elements of a set or dictionary is useful, and we should not break source compatibility and make that operation harder just because it makes you uncomfortable that you can call sequenceEqual on them as well.

I remain entirely unconvinced that this is a real problem that needs to be solved, especially when the solution requires breaking source compatibility.

Chris_Lattner3 · April 28, 2018, 10:23pm

I don't think this will be acceptable given our high bar for source compatibility.

-Chris

DeFrenZ · April 28, 2018, 11:34pm

While I agree that the current state of collection protocols is flawed and it led to all the recent discussions on elementsEqual and Set, and I'd like to see a protocol hierarchy that can be said "correct", I'm starting to see this issue under a different light:
String is not a Collection per-se, String.characters is. String conforms to Collection though because that's it's most common and expected usage. I think the same applies to the usual unordered collections in the stdlib. While not "correct", unordered collections are expected to be iterated upon and used in all sorts of methods that expect a sequence with an order. So this can be seen as Set actually having a members property, but that got promoted to the type itself for ease of usage.
The only relevant issue I see in this is that this fact is not clear enough and I don't know how to improve the awareness of it (apart from more explicitness in the docs maybe?).

I'd probably prefer an opposite solution of basically breaking everything up and making it all "correct" but nice to use at the same time, but that is quite unacceptable at the language level by now

taylorswift · April 29, 2018, 12:02am

This is exactly what Python does,, you iterate over kv pairs by calling .items()

Jens · April 29, 2018, 5:03am

This is not only a problem for pedantic language lawyers, but also for people new to Swift, and even for seasoned Swift developers aiming to add new functionality to the standard library:

Similarly to the case of elementsEqual, it probably seemed straight-forward to add a method that checks whether a collection contains another collection:

contains<T: Collection>(_ other: T) -> Bool where T.Element == Element

But as soon as we realize that Set and String will also get this method, and that it will sit side by side with Set's isSuperset(of: ) and NSStrings contains(other: StringProtocol), things get more complicated, and we try to resolve it by changing the name to sequentuallyContains or perhaps rather containsCollection or something.

gwendal.roue · April 29, 2018, 5:13am

I have been blunt in my first post, and I apologize to @dabrahams and other readers. And you raise a valid concern, @Jens.

dabrahams · April 29, 2018, 5:29am

Yes, in the short term at least. There is strong evidence that during the next year we will need to completely rethink the fundamental mechanism that supports for...in loops, since the whole Iterator model is incompatible with move-only element types. I am not at all opposed to bringing back support for direct for...in iteration but IMO it would be a terrible mistake to do anything special for Set and Dictionary at this point, with that known redesign coming down the pike.

This is a very different situation. Collection conformance was originally removed to avoid exposing a semantic quirk the behavior of concatenation, namely, that some characters will combine (e.g. "e" + "\u{301}" == "é", with count == 1). It was a mistake because:

Concatenation isn't even a feature of Collection: we should have only removed RangeReplaceableCollection conformance if we were going to go down this route.
The quirk wouldn't even be noticeable for most strings
The quirk has almost no knock-on effects: there are no known algorithms that depend on it not happening.
The quirk could easily be handled by documenting String's odd notion of concatenation

What we have here is a clash in what it means for Set and Dictionary to be Equatable (which fundamentally defines a type's value) with what it means for them to be Sequences. As @Jens has pointed out, the knock-on effects of this clash are serious and insidious. I fully expect that if we don't fix this, new ones will continue to be found.

People just won’t understand the change.

I believe they will understand it if it is explained the right way, but IMO that is not nearly as crucial as getting the language right. Many more people will use Swift in the future than will endure (and possibly misunderstand) any given change.

gwendal.roue · April 29, 2018, 5:31am

What I really like about the current conformance of Set and Dictionary to Collection/Sequence is the ability to write a method that accepts a container-whose-ordering-does-not-matter as an argument, which the user of that method can feed with sets, or arrays, or any other collection, without even thinking about it. The old way of doing it, feeding [set allObjects] instead of the set itself, looks so "pre-POP" in comparison.

Tino · April 29, 2018, 5:35am

This change seems to have much stronger opposition than the "deprecate Sequence" part...
But when people can accept to remove Sequence, they would probably also accept to slim it down drastically:
When we move everything except makeIterator to Collection, we could keep for (key, value) in dictionary, and only remove the problematic behavior.

Keeping Set conforming to Sequence (but not Collection) would definitely not be my favorite choice, because is think it's as natural to call a set a collection as it's wrong to call it a sequence -- but the names of protocols have only little impact on real-word coding, compared to the method list shown in autocompletion.

I see it as proven that the current design has fundamental flaws, and we better hurry to fix them (just not to much hurry, please ;-)

dabrahams · April 29, 2018, 5:47am

For that to make sense, though, you need a method-to-which-ordering-does-not-matter. Many such methods (e.g. firstIndex(of:), contains(_) are not Collection requirements, but have specialized implementations for Set and Dictionary such that calling them “without even thinking” turns an operation that should be O(1) into an O(N) one. Others work sometimes; for example, s.reduce(0, +) only makes sense because addition is commutative. Replace it with subtraction and we're back in semantic-clash-land.

hartbit · April 29, 2018, 5:59am

Perhaps we should define an Iterable protocol (basically Sequence but without the protocol extensions functions) and make Dictionary and Set conform to it but not Collection. We would keep the convenience of iterating over them without the dangers of having them as collections. And, we could define custom map functions for them that return the same type

Tino · April 29, 2018, 6:03am

As long as it would still be possible to iterate over an Array, and have Array.map return an array, I'm fine with that ;-)

dabrahams · April 29, 2018, 6:07am

With a necessary redesign of the fundamental mechanism for for...in coming down the pike, IMO introducing a new protocol to serve that purpose at this point would be a mistake. Also, if it's "Basically Sequence," it has all the problems of Sequence, regardless of whether the standard library happens to add protocol extensions to it.

Tino · April 29, 2018, 6:19am

I don't think "full protection" is feasible:
As long as Set and Dictionary provide a way to iterate through their elements (just to avoid questions: I'm convinced they should), you can build start(with:), reverse and other operations that depend on order for them.

We can't stop users building footguns, but we can stop delivering them.

gwendal.roue · April 29, 2018, 7:19am

dabrahams:

What I really like about the current conformance of Set and Dictionary to Collection/Sequence is the ability to write a method that accepts a container-whose-ordering-does-not-matter as an argument, which the user of that method can feed with sets, or arrays, or any other collection, without even thinking about it.

For that to make sense, though, you need a method-to-which-ordering-does-not-matter. Many such methods (e.g. firstIndex(of:), contains(_) are not Collection requirements, but have specialized implementations for Set and Dictionary such that calling them “without even thinking” turns an operation that should be O(1) into an O(N) one. Others work sometimes; for example, s.reduce(0, +) only makes sense because addition is commutative. Replace it with subtraction and we’re back in semantic-clash-land.

"For that to make sense": you go too far... People have been writing generic code on Collection for years now. Valid code that does not break when given set or dictionaries, because Set and Dictionary have flawless implementations of the Collection protocol.

And as soon as you make Set a painful container to deal with, people will just dump their contents in arrays, superbly ignore your semantic angst, and write buggy or correct code, as usual.

Soon enough, a proposal for a Set.elements property returning a collection would come. That's unavoidable: people will not find it very funny to copy an Array(set) initializer in all their projects, and others will complain about the memory overhead created by this "convenience" array.

And people would start using the Set.elements collection in ways that make you feel in a bad horror movie again. Your concern can't be satisfied.

I find @Jens' questions much more to the point: the fact that Set adopts Collection makes it painful to extend Collection with useful methods, because they may clash with the semantics of sets.

But there are many ways to answer this potential issue.

For example, the answer to elementsEqual was conditional conformance.

A possible answer to collection.contains(otherCollection) for sets is to make sure that the semantics of this method allow it to be implementable by Set through Set.isSuperSet(of:). All we'd then need would be to make Collection.contains(_:) a customization point. And maybe deprecate Set.isSuperSet(of:) eventually. End of the drama.

There are many answers. I wish they were examined with more scrutiny before we throw away years of work.

Edit: sorry about the redundant Set.elements paragraph above: you did introduce Set.members in your original post.

Panajev · April 29, 2018, 8:38am

Do you and @dabrahams see a bigger problem for people using the language to write applications/programs or for people trying to modify/add new features to Swift itself?

Can we prioritise both or do we need to choose in this case?

Jens · April 29, 2018, 8:54am

I'd say it's demonstrably an issue.

The answer was not conditional conformance, the answer was to decline that particular proposal. The problem that the proposal tried to solve is still unsolved.

Reading the post you link to, we see that "there was enough consensus on the review thread that the current name of the API can be misinterpreted", but the core team decided, after weighing a couple of factors, to keep the current name and thus the current situation, which is said to be "mitigated somewhat by the ability to consistently use == now that conditional conformances have been adopted (where appropriate) for Standard Library types".

So, the elementsEqual-issue, and the situation described in the original post of this thread:

has not been addressed.

This attempted solution is just further underlining the problem. Note the family of methods that isSuperset(of: )is part of:

isSubset(of: )
isStrictSubset(of: )
isSuperset(of: )
isStrictSuperset(of: )
...

So changing the name of one of those would just be adding more drama.

And what about String's contains(other: StringProtocol)?

I think the issue affects both activities (using the language to write apps and extending the language, or rather the standard library, with new functionality). I'm not sure that such a categorization/division of activities can/should be made. I'm sorry if I misunderstand your question.