Is a set a sequence?

Tino · April 9, 2018, 12:16pm

It is harmful because in many cases, you don't actually see or even know that you are calling the problematic methods of Set:
Swift can infer types in many situations, so when you work with the result of a method call, you may expect an array, and call methods that make a lot of sense for Array on it - without even realizing that it's actually a Set or Dictionary!

It's even worse when you try to write algorithms for sequences:

extension Sequence where Element: Equatable {
	static func ==(lhs: Self, rhs: Self) -> Bool {
		var left = lhs.makeIterator()
		var right = rhs.makeIterator()
		while let l = left.next(), let r = right.next() {
			if l != r {
				return false
			}
		}
		return left.next() == right.next()
	}
}

This (or "this" without the bugs I may have included, and some other improvements ;-) is a perfect implementation of an equality check for every true sequence, and it gives a right result even when there are specific solutions that are faster.
It just isn't compatible with types like Set which always needs special treatment.
Of course, I don't have to write Set.== -- but we face the same problem of special treatment whenever we build on top of Sequence, just because it got thrown into the same bucket as Set.

anthonylatsis · April 9, 2018, 1:36pm

Unlike with your position that a Swift Set isn't a Swift Sequence, I completely agree here. And I'm not trying to say Set has to cease being a Sequence – rather, the problem of lacking granularity of the current hierarchy, which will aggravate as Swift evolves, is something undoubtedly if not important, then at least worth attention and to be addressed sooner or later. And definitely not a topic that must be banned in the community by updating the commonly rejected proposals list.

Tino · April 9, 2018, 1:45pm

I hope I never said so, as it's definitely not my position: Set: Sequence is a fact. I try hard never to deny facts (but I sometimes question their reasonableness ;-).
I wouldn't even mind an update of the commonly rejected proposals list... as long as this includes a rationale that really illustrates the reason for rejection.

anthonylatsis · April 9, 2018, 1:58pm

You indeed haven't said so directly, at least to me. Is "Swift's Set has to drop conformance to Sequence" the right way to put it?

Apparently that's a bit of an ordeal right now.

Tino · April 9, 2018, 2:12pm

I'm rarely truly serious when I'm nitpicking -- that would be to much stress ;-)
But if I was asked for a summary... "I asked for a justification of the status quo, with a somewhat muddy vision that a slimmed down Iteratable would be superior to Sequence"

I had to look up "ordeal" -- and depending on which translation I pick, that might apply to all software development ;-)

jawbroken · April 10, 2018, 1:22am

If you expect some random Array from a method call that you seemingly have no idea about the semantics of (e.g. is it ordered in any reasonable way?) then I don't see why these order-dependent methods are any more helpful for that essentially unordered Array than they are for a Set.

Your definition of == here is a perfectly sensible operation for a generic context where you only know you have two Sequences. It is, however, poorly named (just like elementsEqual) because == is tied so closely to the Equatable protocol that it is likely to be confusing. It isn't just incompatible with our understanding of equality for “unordered” collections like Set and Dictionary, it's also incompatible for any type that conforms to Sequence, has a clear notion of order, but also has a notion of equality that isn't solely related to the iterated Sequence (e.g. it has other pertinent properties that also need to be compared). Perhaps you could separate all such types by introducing a EqualInSequenceOrderImpliesEqualInstancesSequence protocol…

Tino · April 10, 2018, 7:40am

It's like the case of the for loop:
If you can do something useful with it, imho it's fine to have it. I just see little value in things that can only be used for mischief.
Asking an array if it starts with a prefix may not always make sense -- but doing so with a Dictionary most likely never makes sense.

I say both method names are just fine, and that instead the requirements for Sequence are poorly defined (so far, every definition of sequence has a big emphasize on order).
It's straightforward to make arrays, vectors, lists and other variants conform to Equatable with a single default implementation (if their elements are equatable as well), and there are many other algorithms that can be implemented in the exactly same way for all those types.

I dislike when people claim that you should always favor composition over inheritance -- but when you have a sequence with additional properties, I'd probably say that this type shouldn't be a Sequence, but rather a separate type with a property that is a Sequence.

MutatingFunk · April 10, 2018, 9:30pm

(Replying to the original post, I've not been following the discussion very closely)

Does the suggestion (aside from adding Iterable), involve making Sequence require an ordering? Because as far as I can tell, none of the existing collection protocols have this requirement, making this a breaking change. Has adding a new protocol been considered?

It seems all the order-dependant functions on Sequence and inheriting protocols are there simply because no existing protocols offer insight into whether the conforming type maintains an order. Today's Sequence would be better described as Iterable, as the original post implies, and Collection is just a ‘bag of things’, with only RangeReplaceableCollection (one of the five Collection protocols) having ordering requirements via append.

The proper course of action, then, would be to introduce OrderedSequence as an ordered form of Sequence. Existing order-dependent extensions on Sequence would be moved to OrderedSequence, with the originals deprecated, and order-dependant extensions on Collection protocols would have an added where Self: OrderedSequence requirement. RangeReplaceableCollection would be inherit from OrderedSequence, and because OrderedSequence has no additional requirements, this wouldn't be a breaking change.

This way, all existing RangeReplaceableCollection constraints and types get OrderedSequence for free. Deprecation warnings would only appear when a user is using order-dependent functions on a type which doesn't provide an ordering guarantee, which seems almost beneficial.

Whether it's worth adding this level of detail is up for debate, but from what I can tell, the migration path is pretty straightforward. I regret not having the time to forward a proposal right now.

Tino · April 11, 2018, 6:23am

Yes, that is imho one big problem of the current system.
As I understand protocols, they are meant to be an abstraction -- and Sequence fails disastrous in this aspect:
Only a handful of its many methods can be used without risk, and there is no way to write algorithms that operate on ordered collections, so that they can be used with Array as well as with a list...

I don't think this would be sufficient, because of the many methods of Sequence that rely on order -- and it would be kind of a tautology (they often can even be used synonymous)

So far, nobody brought up any fundamental issues with that approach (except that it's work to do ;-) -- and there has been lots of time to come up reasons why it should not be feasible to change some protocols.
What I have seen in other threads is claims that the option already has been discussed and discarded... but I'd prefer Evolution to be evidence-based rather than claim-based.

gwendal.roue · April 11, 2018, 7:02am

I did the research work for you, since I know exactly what I was looking for.

You may not know it, but this thread is far from being the first discussion on the Swift Evolution mailing list/forums about sequences. There used to be many more. On many many aspects.

This quote from @xwu in this former version of the elementsEqual renaming proposal sums them up quite well:

A third solution is to dramatically overhaul the protocol hierarchy for Swift sequences and collections so that unordered collections no longer have members such as first and elementsEqual(_:). However, this would be a colossal and source-breaking undertaking, and it is unlikely to be satisfactory in addressing all the axes of differences among sequence and collection types:

Finite versus infinite

Single-pass versus multi-pass

Ordered versus unordered

Lazy versus eager

Forward/bidirectional/random-access

I know you're very focused on a specific problem. But please let this paragraph sink into you. Do make this effort. You may then understand this comment.

Tino · April 11, 2018, 7:27am

Then please, by all means share it! ;-)

I can't look into other peoples head, so if someone has a compelling argument that the protocols have to be as they are now, he has to write it out and post a link to it -- everything else is only hearsay.

Claiming that something is a colossal and source breaking undertaking doesn't convince me, especially as my prediction that renaming elementsEqual may break more code hasn't been challenged yet.

The other argument is even less compelling:
"Because there are many problems, we don't start solving them"
Is this a attitude that can bring Swift forwards?

gwendal.roue · April 11, 2018, 8:19am

The more I reply in this thread, the more I realize that I'm not really replying to you, but rather gather pieces of information that may help future readers of this thread understand the context and history in which a question like "Is a set a sequence?" has to be considered.

I'm happy I can do this for sets and sequences - I'm not sure I could do it for more involved abstractions like RangeReplaceableCollection. Let's keep humble: we're in an elementary POP classroom, here.

I'm sorry that you are not convinced by anything. But what matters to me is that the way you frame the topic is enlarged, so that people have the opportunity to build nuanced opinions, have a nice journey in the stdlib protocol hierarchy, and spend their energy wisely when they build educated and pragmatic enhancement proposals to the language and its stdlib.

Tino · April 11, 2018, 11:05am

So far, barely anyone did, and that's imho the main problem with this long debate that spans several threads...
That is especially pity as there are some questions that should be rather simple to answer, and that could make me quit discussion instantly:

This all started when I asked why an obvious solution to the problem of Set.elementsEqual isn't feasible, and when I did so, I honestly expected a link where someone demonstrates that this would have terrible consequences, like spawning a myriad of new protocols (still waiting for that response).

I also asked for use cases of methods like Set.elementsEqual that justify their existence (until now, not a single example was given, nor it was acknowledged that this method is indeed useless).

The fact that something as been discussed doesn't mean that it was resolved. I found similar discussions, and they all refer to a mythical thread where everything was explained -- just like this one.

Saying that there is a good explanation instead of simply showing it seems not very scientific to me.
If my standpoint is wrong, it is very easy to disprove it -- but it is impossible to prove the nonexistence of the illuminating thread that people use to back their justification of the status quo.

I guess this thread already got to tiring for many people, and I wouldn't be surprised if exactly that happened in the threads that happened before:
Those who wanted to talk about the issue have been bugged with sidetracking until they gave up silently.
Unfortunately, such an outcome is unavoidable for some discussions -- basically everything that boils down to a question of personal taste.
But this case is something where consensus is still absolutely possible: All that's required for this is fixing the debate to be productive.

jawbroken · April 11, 2018, 11:53am

“Only a handful” seems like a gross exaggeration, though perhaps you have large hands. If we return to your proposed criteria, being whether someone can think of a reasonable use for an API, then here are the parts of Sequence that I had no trouble thinking of a use for during a quick scan of the documentation:

// Creating an Iterator
func makeIterator() // and use in for-in loops

// Finding Elements
func contains(Self.Element)
func contains(where: (Self.Element) -> Bool)
func first(where: (Self.Element) -> Bool)
func min()
func min(by: (Self.Element, Self.Element) -> Bool)
func max()
func max(by: (Self.Element, Self.Element) -> Bool)

// Selecting Elements
func filter((Self.Element) -> Bool)
func prefix(Int)
func suffix(Int)

// Excluding Elements
func dropFirst()
func dropFirst(Int)
func dropLast()
func dropLast(Int)

// Transforming a Sequence
func map<T>((Self.Element) -> T)
func compactMap<ElementOfResult>((Self.Element) -> ElementOfResult?)
func flatMap<SegmentOfResult>((Self.Element) -> SegmentOfResult)
func reduce<Result>(Result, (Result, Self.Element) -> Result)
func reduce<Result>(into: Result, (inout Result, Self.Element) -> ())
var lazy: LazySequence<Self>

// Iterating Over a Sequence's Elements
func forEach((Self.Element) -> Void)
var underestimatedCount: Int

// Sorting Elements
func sorted()
func sorted(by: (Self.Element, Self.Element) -> Bool)

// Splitting and Joining Elements
func joined()
func joined(separator: String)
func joined<Separator>(separator: Separator)

and here are the ones I couldn't immediately think of a use for (though perhaps someone else could):

func prefix(while: (Self.Element) -> Bool)
func drop(while: (Self.Element) -> Bool)
func enumerated()
func reversed()
func split(separator: Self.Element, maxSplits: Int, omittingEmptySubsequences: Bool)
func split(maxSplits: Int, omittingEmptySubsequences: Bool, whereSeparator: (Self.Element) -> Bool)
func elementsEqual<OtherSequence>(OtherSequence)
func elementsEqual<OtherSequence>(OtherSequence, by: (Self.Element, OtherSequence.Element) -> Bool)
func starts<PossiblePrefix>(with: PossiblePrefix)
func starts<PossiblePrefix>(with: PossiblePrefix, by: (Self.Element, Self.Element) -> Bool)
func lexicographicallyPrecedes<OtherSequence>(OtherSequence)
func lexicographicallyPrecedes<OtherSequence>(OtherSequence, by: (Self.Element, Self.Element) -> Bool)

of which elementsEqual is the only one that seems obviously harmful to me, and the only one that I've seen evidence of harm.

I've posted many reasons why it may not be desirable, useful or worthwhile, and I'm starting to feel like they've been mostly ignored. Primarily, I've said I'm sympathetic to the view that the Sequence and Collection conformances should have been tucked away in a view on Set and Dictionary, but I don't really understand the desire to split the protocol hierarchy, leaving much of the order-dependent functionality (e.g. for-in loops) available on Set and Dictionary, while removing others. If someone thinks of a type with a different useful/useless divide to the one that I've posted above, should Sequence be split into 4 different protocols?

The rich history of String-as-Collection in Swift shows that even the violation of semantics/reasonable expectations may not be sufficient to split the protocol hierarchy. String even had a quarantined Collection of characters view that has parallels to a possible solution here. Pragmatism won out in that case, because it was considered too onerous to write .characters everywhere, even though generic algorithms can now break badly on some Strings. The situation here seems a lot less serious than that to me, since I don't believe Set or Dictionary violate any semantics/reasonable expectations of Sequence or Collection, and I can't think of any generic algorithm that would break badly when given one (though everyone accepts that you may write an algorithm that is not particular useful).

MutatingFunk · April 11, 2018, 12:05pm

jawbroken:

The rich history of String-as-Collection in Swift shows that even the violation of semantics/reasonable expectations may not be sufficient to split the protocol hierarchy. String even had a quarantined Collection of characters view that has parallels to a possible solution here. Pragmatism won out in that case, because it was considered too onerous to write .characters everywhere, even though generic algorithms can now break badly on some Strings. The situation here seems a lot less serious than that to me, since I don’t believe Set or Dictionary violate any semantics/reasonable expectations of Sequence or Collection, and I can’t think of any generic algorithm that would break badly when given one (though everyone accepts that you may write an algorithm that is not particular useful).

I think pragmatism is the key point - we could complicate the protocol heirachy with OrderedSequence, but instead ordered algorithms are currently defined as methods on Sequence. The user isn't going to call order-dependent algorithms on clearly unordered sequences, since the results aren't useful, so there is very little harm as long as order-dependent functions are clearly named.

A breaking change requires a correspondingly code-breaking level of harm to justify it, and I don't think that's the case here.

MutatingFunk · April 11, 2018, 12:14pm

Obviously it isn't very useful, but it is useful for enough for ordered sequences that it's been included it the standard library, even if there isn't a more specific protocol to prevent it turning up in places it doesn't make sense.

Tino · April 11, 2018, 12:37pm

Might be ;-) - but Sequence has risks besides those discussed here:
It can not only be ordered and "accidentally ordered", but also finite and infinite -- and for an infinite sequence, there's not much functionality left that's save to use.

I know what you are talking about ;-) -- but at least we can agree that moving the conformance to a property would solve the issue, but would probably cost to much.

I wouldn't call that ability to be dependent on order -- actually, for the types in question, this is the step that creates an order.

What I've been thinking of so far is just one additional protocol (Iterable), and I think that's not much:
Swift already has protocols for single and double linked lists (which don't even have an implementation in the stdlib), and imho an array has much more in common with a list than with a set.

I think it's reasonable for string to be a Sequence... but CharacterSet is a different beast ;-)

For me, it's a reasonable expectation that something that has a prefix (or is called sequence) also has a meaningful order, and that you can rely on this.
The generic algorithms in question themselves don't break when you feed them a Set -- but their output will be garbage.
Currently, I can define algorithms that require something like a double-linked list -- but there's nothing like Collection & Ordered, which imho is an abstraction that's much more useful.

Tino · April 11, 2018, 12:51pm

We are discussing wether something is useful enough for the stdlib -- under the premise that Swift is perfect, the whole concept of Evolution hardly makes sense ;-)

Jens · April 11, 2018, 1:05pm

I agree, and I think most people that are new to Swift, or new to the specifics of Swift’s Sequence, would expect that something that has prefix, first, last, reverse etc, also has a meaningful order that can be relied upon.

How can this not be a problem?

Erica_Sadun · April 11, 2018, 4:49pm

As others have described in more eloquent terms up-thread, I don't think that there should exist any contract or guarantee that Set iterates in the same order every time, even though it does in practice. It's an implementation detail at odds with Set and Dictionary semantics. Set equality makes sense to me. Set iteration order equality does not.

The changes suggested in-thread are potentially turbulent. What can be done for an admittedly flawed design with the least cost to the user base? (For example, making elementsEqual private, assuming it's required for the internal implementation of other methods that support collections or removing it should it not be.) I doubt any "do it over, do it right" redesign, breaking Sequence from Iterable, will gain much traction.

I applaud the design intent. I am discouraged by the hills to surmount. Have we seen any indication from core team members that major language changes, even those sourced from real world experience and principled design goals, would be welcomed?