On the elementsEqual problem, or “[Pitch] Set and Dictionary should not be Sequences”

I'd say it's demonstrably an issue.


The answer was not conditional conformance, the answer was to decline that particular proposal. The problem that the proposal tried to solve is still unsolved.

Reading the post you link to, we see that "there was enough consensus on the review thread that the current name of the API can be misinterpreted", but the core team decided, after weighing a couple of factors, to keep the current name and thus the current situation, which is said to be "mitigated somewhat by the ability to consistently use == now that conditional conformances have been adopted (where appropriate) for Standard Library types".

So, the elementsEqual-issue, and the situation described in the original post of this thread:

has not been addressed.


This attempted solution is just further underlining the problem. Note the family of methods that isSuperset(of: )is part of:

isSubset(of: )
isStrictSubset(of: )
isSuperset(of: )
isStrictSuperset(of: )
...

So changing the name of one of those would just be adding more drama.

And what about String's contains(other: StringProtocol)?


I think the issue affects both activities (using the language to write apps and extending the language, or rather the standard library, with new functionality). I'm not sure that such a categorization/division of activities can/should be made. I'm sorry if I misunderstand your question.

1 Like

I think there is a fundamental difference between arbitrary people writing horrible code versus having such code in a prominent place of Swift itself.

We still have access to pointers, but because of the concerns regarding raw memory access, those abilities are not exposed directly, similar to the Collection-methods of Set in this proposal.

As far as I can see, most people follow the guides and stay away from problematic techniques.

2 Likes

I think both concerns are valid (app writers and language writers) and can be and sometimes are at odds, similarly to how you then can see library writers vs library users. IMHO, it is quite naive to assume that you do not have a maker needs vs consumer needs divide and that you do not need to prioritise for one or the other at any time. You got my question right, but I was looking for some more data on how not making this change would hurt app developers (at Apple and outside of Apple) which are the real consumer and driver on the ecosystem.

All right, Jens, there are deeply rooted problems, since you want it. But I can't help thinking that many of the problems you see exist only because of the way you look at certain questions. A change of perspective could address most of them, or make them much less important than you think.

For example, you claim that the elementsEqual problem has been unsolved. I simply answer "who cares?", because the harm has disappeared since the introduction of conditional conformance.

This is maybe another way to understand @Panajev's question: in my last paragraph, two different points of view lead to different conclusions about the same topic.

OK, you're both right. : )

I think my suggestion from the remove Sequence thread is a possible solution here.

Basically, we need to rethink the hierarchy underlying Collection. I think I agree with removing sequence, but at the same time we also need protocols which deal with the aspects of collection of a container and collection as an ordering separately. If we do this, I think with a bit of work, we can unify a bunch of different protocols which are currently separate into a single hierarchy.

Let's think through our options carefully and make a single set of changes!

I really think we need a protocol with this concept of a container where the order of iteration doesn't matter (or isn't defined). Sequence definitely wasn't that (nor would the name imply it to be), but this protocol would be the proper place for several of the methods placed on Sequence (e.g. contains(where:), map, etc...).

1 Like

Imho there has been a big focus on the consumer side in this thread — but I think the problem for framework authors is even bigger:
When using a Set, you can learn easily to ignore the majority of methods offered by this type.

But as a framework writer, you cannot limit your algorithms to structures like arrays and lists without defining special protocols. So just like elementsEqual, we end up with functions that make no sense for Set and Dictionary, and all we can do about it is pointing that out in documentation.

1 Like

Here's a recent relevant thread that covers this issue. I still hold the following opinions, expressed in much greater detail in that thread:

  • The fact that you can iterate Set and Dictionary in a fixed order is inherent to these types being useful (see the implementation of the various Set methods, for example), not an accident.
  • The main problem with elementsEqual was the name, but it was decided that the name wouldn't be changed. If the name of this method made it clear that compared in sequence order then it would be somewhat useless for Set but not harmful.
  • Equatable and elementsEqual are entirely unrelated, so nobody should expect that a == b implies a.elementsEqual(b) OR vice versa. e.g. You can reasonably create a collection type with additional properties that should be compared in an Equatable conformance, so even if elementsEqual is true, == may not be.
  • There are many possibly useful methods you can call on a Set or Dictionary because of Collection conformance, and a small number of possibly useless methods.
  • As I said in that other thread, I wouldn't mind collection conformance being moved to a property that provides a Collection conforming view, which is the topic of this thread. But, as @DeFrenZ mentioned above, the opposite decision was made for String, so I would have to understand what is different about this case. I don't understand the reasoning that the situation is different because of the clash between Equatable and elementsEqual, because that clash is inherent as mentioned above (and we failed to make the change to the name required to make the difference clear).

Again, the majority of methods on Set provided by Collection conformance are not demonstrably useless, see the above link. If you shuffle an Array the number of useful Collection methods on that Array is similar to that on Set.

3 Likes

Well done, @jawbroken: I think you put the finger on the exact point that creates conceptual angst in many members of the forum. It may be worth exploring a protocol split right there.

+1 Could we come up with a different hierarchy where users who want to implement unordered collections have a protocol to conform to?

3 Likes

It's flawless when considered completely separately from their (also flawless) implementations of Equatable. The problem is what happens when these two aspects get together. In the elementsEqual case, some want to leave the name alone because it makes perfect sense for Sequence, where ordering is salient, and others need to change it because it makes no sense whatever for Set, where ordering isn't salient. But it's easy to find problems beyond sensibility for the reader.

This works fine for all Collections:

extension Collection where Element : Comparable {
    func isSorted() -> Bool { 
        return !zip(self, self.dropFirst()).contains { $0 > $1 }
    }
}

But memoize it, and it stops working for Set.

var memo: [AnyHashable : Bool] = [:]
extension Collection where Self : Hashable, Element : Comparable {
    func isSorted() -> Bool {
        if let r = memo[self] { return r }
        let r = !zip(self, self.dropFirst()).contains { $0 > $1 }
        memo[self] = r
        return r
    }
}

Now, you might say, “serves you right for asking whether a Set is sorted,” but of course nobody asks that question directly. It will happen as part of an order-independent algorithm, to which passing a Set is perfectly sensible:

// Avoid building a new array if x is already sorted.
return x.isSorted() ? something(x) : something(x.sorted())

Yes, this example is somewhat contrived, but it makes my point: the clash of the fundamental notion of equality with the properties of Sequence has effects that you can't reason about locally. Which programmer is to blame for the bug I just constructed? I can't blame the person who memoized isSorted; that's supposed to be a semantics-preserving transformation for all functions without side-effects. When you can't find a programmer to blame for a bug it's a good sign that there's something broken in the system they've been given to work with.

FWIW, I have been scrutinizing approaches to (and definitions of) this problem for years. Until now I didn't have the insight needed to have confidence in a solution.

4 Likes

I don't see any daylight between them. The problems faced by the language are ultimately the same as those faced by framework and application developers. People often try to argue for an artificial schism where none actually exists, because they look at the code of a trivial application and it doesn't feel like the code in the standard library. But you can't write an application of any size without treating its parts like library components, and as you scale an application up, all the same concerns arise.

Sorry, which of Set's methods depend on iteration order?

  • The main problem with elementsEqual was the name, but it was decided that the name wouldn’t be changed. If the name of this method made it clear that compared in sequence order then it would be somewhat useless for Set but not harmful.
  • Equatable and elementsEqual are entirely unrelated, so nobody should expect that a == b implies a.elementsEqual(b) OR vice versa. e.g. You can reasonably create a collection type with additional properties that should be compared in an Equatable conformance, so even if elementsEqual is true, == may not be.

I disagree strongly with both those bullets. In fact I opposed the name change to elementsEqual because it didn't address the fundamental problem. It is true that f(a) == f(b) does not imply a == b, but your conclusion about what that means is wrong because you looked at the implication backwards. Because a == b implies that a is substitutable for b, Equatable is entirely related to everything (except non-salient attributes like the capacity of an Array).

  • As I said in that other thread, I wouldn’t mind collection conformance being moved to a property that provides a Collection conforming view, which is the topic of this thread. But, as @DeFrenZ mentioned above, the opposite decision was made for String, so I would have to understand what is different about this case. I don’t understand the reasoning that the situation is different because of the clash between Equatable and elementsEqual, because that clash is inherent as mentioned above (and we failed to make the change to the name required to make the difference clear).

It's not a clash between Equatable and elementsEqual. It's a clash between

  • Set's conformance to Equatable, and
  • Set's conformance to Sequence

Let me simplify it further. With Set we have a == b && a.first != b.first. Equal things should be substitutable, but these are not. The fact that elementsEqual is a name on Sequence that implicitly considers ordering to be significant is just a symptom of the underlying issue.

7 Likes

I didn't make assertions about the usefulness of Set-methods, but I admit that in this context, the statement can easily be interpreted that way. What I meant is that it's normal to only use a small subset of the methods a "large" type offers:
Set has roundabout 100 methods, but you can get along fine with nothing but insert, contains and makeIterator. Your milage may vary, but I bet many developers never utilize most of the more advanced functionality, so even if we remove Sequence conformance from Set, they'll never use the majority of its methods (nonetheless, I think shrinking the surface of a type by nearly 40 questionable methods and properties is a good idea -- see https://developer.apple.com/documentation/swift/set/order_dependent_operations_on_set).

But that's again consumer perspective, and absolutely not the point of my last post:
The status quo makes it impossible for framework authors to really express their intentions.
elementsEqual is not a singular problem that can be fixed by a better name -- the name is just fine, and hardly anyone would be confused by array.elementsEqual(list).

So we have a problem, and one solution would be prefixing some methods with a variant of "be careful with Set and Dictionary", the other is harnessing the type system.
This is one of the rare occasions where I can wholeheartedly say that the second option feels much more swifty.

1 Like

Oh, I wasn't clear. I didn't mean that they depend on iteration order, I meant that they depend on being able to iterate them (and therefore there is a fixed order for any reasonable implementation), and that a hypothetical Set type without the ability to iterate through all the contained elements isn't very useful. In other words, access to the proposed members view will be very common in the standard library, though perhaps outside of it people tend to call SetAlgebra methods more than Collection ones.

Okay, that makes more sense, thanks for elaborating. Well, I'm convinced that moving Collection conformance to a view would probably be beneficial. It doesn't exactly solve the substitutability issue but the extra indirection is at least a reminder.

2 Likes

When evaluating substitutability we only consider the attributes of a type that are salient to its instances' values. For example, if you have an Equatable class instance x, you need to treat ObjectIdentity(x) as non-salient to its value (unless all instances are being “uniqued” somehow).

If we declare that members is not salient to the value of a Set, the substitutability problem is solved. Given what you point out about member iteration being important to the implementation of many Set operations, it may be difficult to see members as non-salient, but that's because it carries both extremely salient information (which elements are in the set) and non-salient information (the order in which they are traversed). There's just no way to get the salient information out of the set without exposing the rest of it.

1 Like

Yes, this is a great summary of the situation and really helped to clarify my thinking here. I support moving Collection conformance to a view on Set and Dictionary, but accept that source compatibility concerns are going to make this a difficult proposal process. I don't think that crippling the members view by making it conform to some new protocol like UnorderedCollection that removes a bunch of the API, as some have suggested, is necessary or beneficial here.

The Java equivalent of Set and Dictionary can be iterated but they don't have any 'ordered' methods; you need a SortedSet or SortedDictionary for those. In Java people write non-buggy generic code because they are used to assuming that iteration order is in general non-deterministic; the protocol structure and documentation reinforce this point. I don't see whey it would be any different in Swift if the protocol structure and documentation continuously emphasised that iteration order is non-deterministic.

Note: You could write buggy code in Java because iteration is available, but people don't do so often in practice.

6 Likes

Agreed. Equality for the resulting collection would be determined in the usual way (using the elementsEqual method), so there's no reason to reclassify it.

Maybe it wouldn't be different if we were to ever declare iteration order to be non-determinstic, but IMO that is never going to happen. Determinism and repeatability of iteration is an inherent part of the Collection protocol and the correctness of many algorithms depends on it.

I understood him to mean non-deterministic in a less formal sense: that it may be different for two collections of the same type with the same values. I might have misunderstood him though.

1 Like

I can agree with the app example you make because for reasons I won’t go own now our family of new apps is built on several private cocoapods (very thin main app layer and the rest of our code is spread in each of the actual libraries... dev pods locally for the development of the app and distributed as actual pods for others to integrate).

In a lot of cases, even in non trivial applications, there could be different choices because you are relying on the STD Library and Cocoa... and I suspect a lot of apps on iOS have that kind of complexity and not the responsibility the STD Library has, but fair point here... we will see the wider discussion when the proposal comes out. It may very well be an overreaction to fears the language is going to trade usability for formal correctness/trying to protect people for themselves and I think it is fair to ask if there are diminishing returns there.