On the elementsEqual problem, or “[Pitch] Set and Dictionary should not be Sequences”

+1 Could we come up with a different hierarchy where users who want to implement unordered collections have a protocol to conform to?

3 Likes

It's flawless when considered completely separately from their (also flawless) implementations of Equatable. The problem is what happens when these two aspects get together. In the elementsEqual case, some want to leave the name alone because it makes perfect sense for Sequence, where ordering is salient, and others need to change it because it makes no sense whatever for Set, where ordering isn't salient. But it's easy to find problems beyond sensibility for the reader.

This works fine for all Collections:

extension Collection where Element : Comparable {
    func isSorted() -> Bool { 
        return !zip(self, self.dropFirst()).contains { $0 > $1 }
    }
}

But memoize it, and it stops working for Set.

var memo: [AnyHashable : Bool] = [:]
extension Collection where Self : Hashable, Element : Comparable {
    func isSorted() -> Bool {
        if let r = memo[self] { return r }
        let r = !zip(self, self.dropFirst()).contains { $0 > $1 }
        memo[self] = r
        return r
    }
}

Now, you might say, “serves you right for asking whether a Set is sorted,” but of course nobody asks that question directly. It will happen as part of an order-independent algorithm, to which passing a Set is perfectly sensible:

// Avoid building a new array if x is already sorted.
return x.isSorted() ? something(x) : something(x.sorted())

Yes, this example is somewhat contrived, but it makes my point: the clash of the fundamental notion of equality with the properties of Sequence has effects that you can't reason about locally. Which programmer is to blame for the bug I just constructed? I can't blame the person who memoized isSorted; that's supposed to be a semantics-preserving transformation for all functions without side-effects. When you can't find a programmer to blame for a bug it's a good sign that there's something broken in the system they've been given to work with.

FWIW, I have been scrutinizing approaches to (and definitions of) this problem for years. Until now I didn't have the insight needed to have confidence in a solution.

4 Likes

I don't see any daylight between them. The problems faced by the language are ultimately the same as those faced by framework and application developers. People often try to argue for an artificial schism where none actually exists, because they look at the code of a trivial application and it doesn't feel like the code in the standard library. But you can't write an application of any size without treating its parts like library components, and as you scale an application up, all the same concerns arise.

Sorry, which of Set's methods depend on iteration order?

  • The main problem with elementsEqual was the name, but it was decided that the name wouldn’t be changed. If the name of this method made it clear that compared in sequence order then it would be somewhat useless for Set but not harmful.
  • Equatable and elementsEqual are entirely unrelated, so nobody should expect that a == b implies a.elementsEqual(b) OR vice versa. e.g. You can reasonably create a collection type with additional properties that should be compared in an Equatable conformance, so even if elementsEqual is true, == may not be.

I disagree strongly with both those bullets. In fact I opposed the name change to elementsEqual because it didn't address the fundamental problem. It is true that f(a) == f(b) does not imply a == b, but your conclusion about what that means is wrong because you looked at the implication backwards. Because a == b implies that a is substitutable for b, Equatable is entirely related to everything (except non-salient attributes like the capacity of an Array).

  • As I said in that other thread, I wouldn’t mind collection conformance being moved to a property that provides a Collection conforming view, which is the topic of this thread. But, as @DeFrenZ mentioned above, the opposite decision was made for String, so I would have to understand what is different about this case. I don’t understand the reasoning that the situation is different because of the clash between Equatable and elementsEqual, because that clash is inherent as mentioned above (and we failed to make the change to the name required to make the difference clear).

It's not a clash between Equatable and elementsEqual. It's a clash between

  • Set's conformance to Equatable, and
  • Set's conformance to Sequence

Let me simplify it further. With Set we have a == b && a.first != b.first. Equal things should be substitutable, but these are not. The fact that elementsEqual is a name on Sequence that implicitly considers ordering to be significant is just a symptom of the underlying issue.

7 Likes

I didn't make assertions about the usefulness of Set-methods, but I admit that in this context, the statement can easily be interpreted that way. What I meant is that it's normal to only use a small subset of the methods a "large" type offers:
Set has roundabout 100 methods, but you can get along fine with nothing but insert, contains and makeIterator. Your milage may vary, but I bet many developers never utilize most of the more advanced functionality, so even if we remove Sequence conformance from Set, they'll never use the majority of its methods (nonetheless, I think shrinking the surface of a type by nearly 40 questionable methods and properties is a good idea -- see https://developer.apple.com/documentation/swift/set/order_dependent_operations_on_set).

But that's again consumer perspective, and absolutely not the point of my last post:
The status quo makes it impossible for framework authors to really express their intentions.
elementsEqual is not a singular problem that can be fixed by a better name -- the name is just fine, and hardly anyone would be confused by array.elementsEqual(list).

So we have a problem, and one solution would be prefixing some methods with a variant of "be careful with Set and Dictionary", the other is harnessing the type system.
This is one of the rare occasions where I can wholeheartedly say that the second option feels much more swifty.

1 Like

Oh, I wasn't clear. I didn't mean that they depend on iteration order, I meant that they depend on being able to iterate them (and therefore there is a fixed order for any reasonable implementation), and that a hypothetical Set type without the ability to iterate through all the contained elements isn't very useful. In other words, access to the proposed members view will be very common in the standard library, though perhaps outside of it people tend to call SetAlgebra methods more than Collection ones.

Okay, that makes more sense, thanks for elaborating. Well, I'm convinced that moving Collection conformance to a view would probably be beneficial. It doesn't exactly solve the substitutability issue but the extra indirection is at least a reminder.

2 Likes

When evaluating substitutability we only consider the attributes of a type that are salient to its instances' values. For example, if you have an Equatable class instance x, you need to treat ObjectIdentity(x) as non-salient to its value (unless all instances are being “uniqued” somehow).

If we declare that members is not salient to the value of a Set, the substitutability problem is solved. Given what you point out about member iteration being important to the implementation of many Set operations, it may be difficult to see members as non-salient, but that's because it carries both extremely salient information (which elements are in the set) and non-salient information (the order in which they are traversed). There's just no way to get the salient information out of the set without exposing the rest of it.

1 Like

Yes, this is a great summary of the situation and really helped to clarify my thinking here. I support moving Collection conformance to a view on Set and Dictionary, but accept that source compatibility concerns are going to make this a difficult proposal process. I don't think that crippling the members view by making it conform to some new protocol like UnorderedCollection that removes a bunch of the API, as some have suggested, is necessary or beneficial here.

The Java equivalent of Set and Dictionary can be iterated but they don't have any 'ordered' methods; you need a SortedSet or SortedDictionary for those. In Java people write non-buggy generic code because they are used to assuming that iteration order is in general non-deterministic; the protocol structure and documentation reinforce this point. I don't see whey it would be any different in Swift if the protocol structure and documentation continuously emphasised that iteration order is non-deterministic.

Note: You could write buggy code in Java because iteration is available, but people don't do so often in practice.

6 Likes

Agreed. Equality for the resulting collection would be determined in the usual way (using the elementsEqual method), so there's no reason to reclassify it.

Maybe it wouldn't be different if we were to ever declare iteration order to be non-determinstic, but IMO that is never going to happen. Determinism and repeatability of iteration is an inherent part of the Collection protocol and the correctness of many algorithms depends on it.

I understood him to mean non-deterministic in a less formal sense: that it may be different for two collections of the same type with the same values. I might have misunderstood him though.

1 Like

I can agree with the app example you make because for reasons I won’t go own now our family of new apps is built on several private cocoapods (very thin main app layer and the rest of our code is spread in each of the actual libraries... dev pods locally for the development of the app and distributed as actual pods for others to integrate).

In a lot of cases, even in non trivial applications, there could be different choices because you are relying on the STD Library and Cocoa... and I suspect a lot of apps on iOS have that kind of complexity and not the responsibility the STD Library has, but fair point here... we will see the wider discussion when the proposal comes out. It may very well be an overreaction to fears the language is going to trade usability for formal correctness/trying to protect people for themselves and I think it is fair to ask if there are diminishing returns there.

I think this is a misunderstanding of what has been proposed. I don't think anyone is proposing "crippling" Sets or Dictionary. Instead we are proposing to provide a protocol (I call it "CountableContainer") which allows iteration over the members in an undefined order (and all the API which can be built on that). Collection would then inherit from that protocol and add guarantees of ordering. Collection would have all the API it has now. That would make it so that generic code can target whether it just needs to iterate over the members or whether it needs a defined order to work correctly. For...in would work with both protocols.

This is different from having a view on Set which is a collection. That worked on String because the elements of a String have a well defined order. Unless the view is sorting the elements, it is just shifting where the problem occurs. You would still have equivalent sets with unequal views and all the issues which come with that. All we have done is make for...in harder to use with Sets and break a bunch of code.

Okay, this is what I was colloquially referring to as “crippling”. From the survey I did of Sequence methods, linked in my post above, I concluded that most of these methods could be reasonably used on a Set in some circumstances, so I don't personally find splitting the protocol just to remove a small number of probably useless methods that compelling.

Thanks for pointing this out!

Will your envisioned set.members honor this rule? For example:

let a: Set = [1, 2, 3]
let b: Set = [1, 2, 3]
assert(a == b)
assert(a.members.elementsEqual(b.members))

If not, what protocols would Set.member adopt?

I agree with this, but how can these be declared? A primary example of this would be the random shuffling. In that case, even passing the same object multiple times would return different results...

Yes, because it will be declared non-salient to the set’s value. The code you wrote would be allowed to assert.

@dabrahams I'm getting convinced about the motivations for this proposal, but I'm not yet sure its feasible. Can we discuss details:

Migration

I'm worried about the churn it will put Swift users through. Can you elaborate more on the migration plan? I see two possible plans, both pretty bad on Swift users:

Plan 1: Smooth

  • Swift 5: Dictionary and Set remain Collections to ease migration but users start getting a deprecation warning that they should instead use the .items property to iterate and do collection operations on those types.
  • Swift X: Dictionary and Set stop being Collections.
  • Swift Y: We get a new iteration protocol that Dictionary and Set can conform to.

Problem: Wouldn't this break the Standard Library ABI post-Swift 5? Is this even possible?

Plan 2: Hard

  • Swift 5: Dictionary and Set stop being Collections. Users get an error to use the .items property to do collection operations on those types.
  • Swift Y: We get a new iteration protocol that Dictionary and Set can conform to.

Problem: This is a much harder transition to put users through, without any prior warning.

Conclusion

I feel pretty certain both plans will receive a lot of negative feedback from the greater Swift community because it will be a fairly large breaking change. In both plans, we also need to migrate users to now stop using the .items when iterating them in the Swift Y version. This might be confusing to users that are first asked to use .items, even when iterating (in Swift 5), only to be told not to use it when iterating in a later version of Swift.

Alternatives?

Have you pondered alternative solutions to this problem? Could we move all functions from Collection which make sense for unordered collections (including iteration) and move it to a new protocol that Collection would conform to?

protocol UnorderedCollection {
    associatedtype Element
    associatedtype OrderedView: Collection where Collection.Element == Element

    var orderedView: OrderedView { get }
    var count: Int { get }
    var isEmpty: Bool { get }
    var lazy: LazyUnorderedCollection<Self> { get }

    func contains() -> Element?
    func min() -> Element?
    func max() -> Element?
}

extension UnorderedCollection {
    var count: Int {
        return orderedView.count
    }

    var isEmpty: Int {
        return orderedView.isEmpty
    }

    var lazy: LazyUnorderedCollection<Self> {
        return LazyUnorderedCollectionWrapper(orderedView)
    }

    func contains() -> Element? {
        return orderedView.contains()
    }

    func min() -> Element? {
        return orderedView.min()
    }

    func max() -> Element? {
        return orderedView.max()
    }
}

We just can't get iteration to work out of the box without introducing functions on Sequence, which don't make sense for unordered collections (like reduce). Of course, if Sequence is typealiased to Collection as the other thread suggests, then we're stuck: we can't make any progress without a new iteration protocol.

2 Likes

Perhaps I missed something, but where did you read that the proposal was for a new iteration protocol to be introduced? As I understood it, Collection conformance would just be moved to a view-providing property (called .members in the original post). Then all iteration and API built on iteration would be done via this view instead of the original. So the migration plan is just to either deprecate or error on direct use of Collection/Sequence API on Set and Dictionary and move all these uses over to the .members view. There's no further planned step where .members is deprecated and removed, and no addition to the protocol hierarchy of dubious benefit.

It seems like iteration needs to be rethought to support the future Ownership language features:

But once those are resolved, it would make sense that a new iteration protocol would be introduced, seing how Sequence is probably disappearing to be type-aliased to Collection.

I find the ergonomics of having to iterate using that member fairly bad, and that's why once that new iteration protocol is introduced, it would make sense to me for Dictionary and Set to conform to it.

PS: If anybody is interested in discussing names, I like .items much better :slight_smile:

2 Likes

That is indeed what I meant. Should have expressed myself better!