On the elementsEqual problem, or “[Pitch] Set and Dictionary should not be Sequences”

From the other thread:

This would imply for me that you can't easily define first, last, indexing. etc for a random collection. Is this a misinterpretation or your true opinion? (there's no merit in writing a long statement if the fixed order isn't actually important for your opinion ;-)

I can't say I exactly understand the question or how it's related to this thread. What are you calling a “random collection”? Do you mean a single-pass random number generator?

No, I meant whatever satisfies your definition of a collection whose order isn't fixed.
The quote gave me the impression that this actually is a case where you agree that the methods in question are bad -- and that could be a big help for mutual understanding:

Imho your exact definition of a fixed order, and the difference compared to an order that isn't fixed is quite interesting for the debate, as it seem to mark the point where opinions diverge.

I'm still confused. What is a collection with an order that isn't fixed? Does fixed mean (handwaving some fine details) that you get the same sequence of elements every time you iterate through it without mutating it? Or does it mean that the order is either controllable (e.g. Array) or specified (e.g. SortedCollection)? As I understand it, the semantics of Collection require a type to adhere to the “same sequence in same order” definition.

I think I've been clear in the other thread that I agree it makes sense (source compatibility, etc aside) to relegate Sequence to an easy way to define a Collection, which takes care of the issues with single pass Sequences. And in this thread I've said that @dabrahams is correct about the issue with Equatable and salient/non-salient aspects (e.g. a == b but a.first != b.first), and I agree that the neatest way to solve this would be to move Collection conformance to a view on Set and Dictionary that was declared non-salient. Then Equatable on the view itself can be defined as elementsEqual which solves several issues. This also leads me to conclude that being able to directly iterate through elements of a Set has the same issues as being able to call first, prefix, whatever on it, so this ability is also non-salient and should also be confined to the same view and not implemented directly on `Set. Does that make sense?

1 Like

Perhaps we should expand our perspective by considering collections that aren’t currently in the standard library: trees. You can define multiple sensible iteration orders (i.e. linearizations into Sequence) for various tree traversals: pre-order, in-order, post-order and so on... These should be all exposed as IteratorProtocol/Sequence properties of the tree, because there is no single canonical Sequence conformance to be defined. By analogy, the accidental ordering of Set and Dictionary elements should also be exposed via .elements property. Expanding the naming pattern to trees would yield: .elementsPreOrder, etc. For me that definitely answers the question from the other thread: Set is not a Sequence. Nor is Dictionary or Tree... They can all provide sequential views, they have a Sequence of elements, but are not Sequences themselves.

(By this logic, SortedSet (TreeSet) is a Sequence because of the clear semantics of explicitly defined canonical order. Personally, exposing this fact via .elements property wouldn’t bother me, because I consider the SetAlgebra to be primary conformance.)

Regarding why accidentally leaking the internal ordering structure of Set/Dictionary through for..in iteration is considered to be less of an issue than providing a collage of methods polluting the type’s namespace: one is low level tool which allows you to do anything, the other should be focused palette of high-level operations for frequent use. In other words: nothing will ever prevent you from building bugs with hand crafted iteration, but the API surface shouldn’t be unnecessarily polluted with tangential operations, especially when they are tempting misuse. This is where composition clearly wins over inheritance.

6 Likes

That only makes the distinction even more confusing to me. A “low level tool which allows you to do anything” seems more prone to be misused not less, as you note. Why not confine iteration solely to the view along with all the higher-level operations that are defined using iteration? What do you gain by leaving it directly on a Set?

In other words, I understand either “leave everything on Set directly and let developers consider whether they’re using things in an order-dependent way” (source compatible, concise) or “move everything order-dependent into a view” (harder to misuse, better compatibility with Equatable semantics), but I don’t understand “move some of the order-dependent things to a view and leave some of them directly on Set”. That seems like a confusing model and I don’t understand the benefit of or principle behind it.

Sorry if I wasn’t clear enough. My point was precisely that Set isn’t a Sequence, which implies you can not for..in loop over it. Only explicitly through the .elements property.

I was trying to address this concern of yours:

I’m of the opinion, that we cannot prevent abuse of manual iteration, but that should not stop us from cleaning-up the API directly available on Set/Dictionary by tucking away the Sequence conformance on the .elements property: fully preserving all existing functionality, while grouping it semantically using composition.

Oh, okay. To clarify, in that post I was addressing several people in the thread who were suggesting that Collection conformance be moved to .elements but that a (possible future) iteration protocol that would allow use in for-in loops would be implemented directly on Set and Dictionary. So if you're happy to write for element in set.elements forever then we agree here.

1 Like

I’m fully content writing dictionary.elements, for the same reason I accept dictionary.keys and dictionary.values :stuck_out_tongue_winking_eye:

2 Likes

I don't think the String precedent is relevant here — and even if it is: Just because a concept was discarded for one specific case, that concept still can be valid for other situations.
I don't know the motivation for the original design (and was irritated when I had to use ".characters" as many others), but it's probably once again the question wether a string is made up of bytes or some more complicated unicode entity.
However, there is and was little doubt that a string is a collection (and a sequence), and that all of the order-dependent operations make sense for strings: You just could run into situations where you either have to deal with "incomplete" characters or buffer sizes which are to small because count handles some combinations of bytes as single elements.

So whereas for String, we just had to agree what type Element actually is, Set does not have such kind of ambiguity, and it is the type itself which causes contradiction.

Afaics, there is little resistance to remove the order-dependend methods from Set and Dictionary (nearly no one uses them -- and when they are used, it's probably an error), but it's not as clear if iteration should stay possible without indirection.
Is that true, or is there anyone who absolutely wants to keep Dictionary.starts(with:) and similar methods?

It depends how you think about it, I guess. e.g. some (many?) reasonable generic algorithms for Collection can be severely broken when used with certain Strings now, while the conformance of Set is semantically fine for such methods, though perhaps the results will not be useful.

When you say “remove” do you actually mean “move to a Collection conforming view property”, which is the proposal being discussed in this thread, or do you mean removed? I don't think any of these methods should be removed, but I would be happy to see all iteration and iteration-derived API moved to a view.

Not that. I explained it all here I think. Was it unclear?

Not at all, but you could have a problem with some hypothetical RangeReplaceableCollection algorithms. Nobody’s come up with a realistic example, though.

Moved to a view, as proposed.

I might have been unclear -- with design, I was only referring to String.character. The quirks you mentioned are because of UTF, aren't they? (I can't see any serious problems with concatenation of an array of bytes... except maybe a C style string, where the null that signals its end is part of the collection).

I guess I was unclear. The quirks are not because of encodings (what I presume you mean by UTF), but because of the way some characters combine when you put them next to one another.

assert(("e" + "\u{301}").count == 2) // fails

Looks like we actually meant the same phenomenon: afaik, for ASCII and many other "old" encoding schemes, the number of characters always matches the number of bytes (but I wouldn't be surprised if there are exceptions -- problems with encodings have a long tradition ;-)

print("\r\n".count) // 1
4 Likes

I have also been wondering about this approach. This protocol could also support indices that can be discovered using index(where:) and index(of:) as well as sorted and maybe even reduce where combine is commutative. If Swift gets support for HKT in the future it could also support map, compactMap and filter.

I don't think this should be called UnorderedCollection though as I think Collection (or MaterializableCollection) should refine it if it gets added. If we were starting from scratch I might consider calling this Collection but I don't think that's a viable today. The name Container was mentioned in one of the recent threads related to this topic. That's the best available option I've been able to think of.

1 Like

We could call it Bag.

I don't think that would be a good choice as multisets are sometimes called bags: Set (abstract data type) - Wikipedia.

3 Likes

See also: Apple Developer Documentation

1 Like