On the elementsEqual problem, or “[Pitch] Set and Dictionary should not be Sequences”

Okay, this is what I was colloquially referring to as “crippling”. From the survey I did of Sequence methods, linked in my post above, I concluded that most of these methods could be reasonably used on a Set in some circumstances, so I don't personally find splitting the protocol just to remove a small number of probably useless methods that compelling.

Thanks for pointing this out!

Will your envisioned set.members honor this rule? For example:

let a: Set = [1, 2, 3]
let b: Set = [1, 2, 3]
assert(a == b)
assert(a.members.elementsEqual(b.members))

If not, what protocols would Set.member adopt?

I agree with this, but how can these be declared? A primary example of this would be the random shuffling. In that case, even passing the same object multiple times would return different results...

Yes, because it will be declared non-salient to the set’s value. The code you wrote would be allowed to assert.

@dabrahams I'm getting convinced about the motivations for this proposal, but I'm not yet sure its feasible. Can we discuss details:

Migration

I'm worried about the churn it will put Swift users through. Can you elaborate more on the migration plan? I see two possible plans, both pretty bad on Swift users:

Plan 1: Smooth

  • Swift 5: Dictionary and Set remain Collections to ease migration but users start getting a deprecation warning that they should instead use the .items property to iterate and do collection operations on those types.
  • Swift X: Dictionary and Set stop being Collections.
  • Swift Y: We get a new iteration protocol that Dictionary and Set can conform to.

Problem: Wouldn't this break the Standard Library ABI post-Swift 5? Is this even possible?

Plan 2: Hard

  • Swift 5: Dictionary and Set stop being Collections. Users get an error to use the .items property to do collection operations on those types.
  • Swift Y: We get a new iteration protocol that Dictionary and Set can conform to.

Problem: This is a much harder transition to put users through, without any prior warning.

Conclusion

I feel pretty certain both plans will receive a lot of negative feedback from the greater Swift community because it will be a fairly large breaking change. In both plans, we also need to migrate users to now stop using the .items when iterating them in the Swift Y version. This might be confusing to users that are first asked to use .items, even when iterating (in Swift 5), only to be told not to use it when iterating in a later version of Swift.

Alternatives?

Have you pondered alternative solutions to this problem? Could we move all functions from Collection which make sense for unordered collections (including iteration) and move it to a new protocol that Collection would conform to?

protocol UnorderedCollection {
    associatedtype Element
    associatedtype OrderedView: Collection where Collection.Element == Element

    var orderedView: OrderedView { get }
    var count: Int { get }
    var isEmpty: Bool { get }
    var lazy: LazyUnorderedCollection<Self> { get }

    func contains() -> Element?
    func min() -> Element?
    func max() -> Element?
}

extension UnorderedCollection {
    var count: Int {
        return orderedView.count
    }

    var isEmpty: Int {
        return orderedView.isEmpty
    }

    var lazy: LazyUnorderedCollection<Self> {
        return LazyUnorderedCollectionWrapper(orderedView)
    }

    func contains() -> Element? {
        return orderedView.contains()
    }

    func min() -> Element? {
        return orderedView.min()
    }

    func max() -> Element? {
        return orderedView.max()
    }
}

We just can't get iteration to work out of the box without introducing functions on Sequence, which don't make sense for unordered collections (like reduce). Of course, if Sequence is typealiased to Collection as the other thread suggests, then we're stuck: we can't make any progress without a new iteration protocol.

2 Likes

Perhaps I missed something, but where did you read that the proposal was for a new iteration protocol to be introduced? As I understood it, Collection conformance would just be moved to a view-providing property (called .members in the original post). Then all iteration and API built on iteration would be done via this view instead of the original. So the migration plan is just to either deprecate or error on direct use of Collection/Sequence API on Set and Dictionary and move all these uses over to the .members view. There's no further planned step where .members is deprecated and removed, and no addition to the protocol hierarchy of dubious benefit.

It seems like iteration needs to be rethought to support the future Ownership language features:

But once those are resolved, it would make sense that a new iteration protocol would be introduced, seing how Sequence is probably disappearing to be type-aliased to Collection.

I find the ergonomics of having to iterate using that member fairly bad, and that's why once that new iteration protocol is introduced, it would make sense to me for Dictionary and Set to conform to it.

PS: If anybody is interested in discussing names, I like .items much better :slight_smile:

2 Likes

That is indeed what I meant. Should have expressed myself better!

I guess I don't understand how a new iteration protocol based around future ownership features wouldn't expose all of the exact same issues that this thread is trying to address, so I don't understand why Set and Dictionary would be made to conform directly to it, rather than implementing it on the view proposed in this thread. What do you imagine would be different?

I'm not sure how useful discussing the name is at this point, but .items seems pretty unprecedented. .elements would probably make more sense given the extensive use of this term in the documentation and the Element associated type.

It would be different if that protocol's sole purpose is iteration and we don't define functions like first(where:), prefix(), etc... on it.

+1

1 Like

That implies an opinion that I've read a few times from different people, that iteration is somehow innocuous while generic methods defined in terms of iteration are not, and I can't say I understand it. Why do the exact same concerns not apply to the code you write in the loop body of your for-in loop? Why shouldn't I use prefix(_:), etc on my Set (or its Collection conforming view proposed in this thread) to e.g. split it into chunks that can be processed efficiently? There must be a reason why multiple people are convinced that for-in loops and iteration are fine but methods that you can easily write using them are dangerous, but I haven't been able to work it out.

From the other thread:

This would imply for me that you can't easily define first, last, indexing. etc for a random collection. Is this a misinterpretation or your true opinion? (there's no merit in writing a long statement if the fixed order isn't actually important for your opinion ;-)

I can't say I exactly understand the question or how it's related to this thread. What are you calling a “random collection”? Do you mean a single-pass random number generator?

No, I meant whatever satisfies your definition of a collection whose order isn't fixed.
The quote gave me the impression that this actually is a case where you agree that the methods in question are bad -- and that could be a big help for mutual understanding:

Imho your exact definition of a fixed order, and the difference compared to an order that isn't fixed is quite interesting for the debate, as it seem to mark the point where opinions diverge.

I'm still confused. What is a collection with an order that isn't fixed? Does fixed mean (handwaving some fine details) that you get the same sequence of elements every time you iterate through it without mutating it? Or does it mean that the order is either controllable (e.g. Array) or specified (e.g. SortedCollection)? As I understand it, the semantics of Collection require a type to adhere to the “same sequence in same order” definition.

I think I've been clear in the other thread that I agree it makes sense (source compatibility, etc aside) to relegate Sequence to an easy way to define a Collection, which takes care of the issues with single pass Sequences. And in this thread I've said that @dabrahams is correct about the issue with Equatable and salient/non-salient aspects (e.g. a == b but a.first != b.first), and I agree that the neatest way to solve this would be to move Collection conformance to a view on Set and Dictionary that was declared non-salient. Then Equatable on the view itself can be defined as elementsEqual which solves several issues. This also leads me to conclude that being able to directly iterate through elements of a Set has the same issues as being able to call first, prefix, whatever on it, so this ability is also non-salient and should also be confined to the same view and not implemented directly on `Set. Does that make sense?

1 Like

Perhaps we should expand our perspective by considering collections that aren’t currently in the standard library: trees. You can define multiple sensible iteration orders (i.e. linearizations into Sequence) for various tree traversals: pre-order, in-order, post-order and so on... These should be all exposed as IteratorProtocol/Sequence properties of the tree, because there is no single canonical Sequence conformance to be defined. By analogy, the accidental ordering of Set and Dictionary elements should also be exposed via .elements property. Expanding the naming pattern to trees would yield: .elementsPreOrder, etc. For me that definitely answers the question from the other thread: Set is not a Sequence. Nor is Dictionary or Tree... They can all provide sequential views, they have a Sequence of elements, but are not Sequences themselves.

(By this logic, SortedSet (TreeSet) is a Sequence because of the clear semantics of explicitly defined canonical order. Personally, exposing this fact via .elements property wouldn’t bother me, because I consider the SetAlgebra to be primary conformance.)

Regarding why accidentally leaking the internal ordering structure of Set/Dictionary through for..in iteration is considered to be less of an issue than providing a collage of methods polluting the type’s namespace: one is low level tool which allows you to do anything, the other should be focused palette of high-level operations for frequent use. In other words: nothing will ever prevent you from building bugs with hand crafted iteration, but the API surface shouldn’t be unnecessarily polluted with tangential operations, especially when they are tempting misuse. This is where composition clearly wins over inheritance.

6 Likes

That only makes the distinction even more confusing to me. A “low level tool which allows you to do anything” seems more prone to be misused not less, as you note. Why not confine iteration solely to the view along with all the higher-level operations that are defined using iteration? What do you gain by leaving it directly on a Set?

In other words, I understand either “leave everything on Set directly and let developers consider whether they’re using things in an order-dependent way” (source compatible, concise) or “move everything order-dependent into a view” (harder to misuse, better compatibility with Equatable semantics), but I don’t understand “move some of the order-dependent things to a view and leave some of them directly on Set”. That seems like a confusing model and I don’t understand the benefit of or principle behind it.

Sorry if I wasn’t clear enough. My point was precisely that Set isn’t a Sequence, which implies you can not for..in loop over it. Only explicitly through the .elements property.

I was trying to address this concern of yours:

I’m of the opinion, that we cannot prevent abuse of manual iteration, but that should not stop us from cleaning-up the API directly available on Set/Dictionary by tucking away the Sequence conformance on the .elements property: fully preserving all existing functionality, while grouping it semantically using composition.

Oh, okay. To clarify, in that post I was addressing several people in the thread who were suggesting that Collection conformance be moved to .elements but that a (possible future) iteration protocol that would allow use in for-in loops would be implemented directly on Set and Dictionary. So if you're happy to write for element in set.elements forever then we agree here.

1 Like

I’m fully content writing dictionary.elements, for the same reason I accept dictionary.keys and dictionary.values :stuck_out_tongue_winking_eye:

2 Likes