Is a set a sequence?

Karl · April 7, 2018, 12:56pm

It’s a little unclear, but if you are suggesting some equivalent of “allObjects” which returns a Sequence (not necessarily an array), I’d +1 that.

A set’s contents can be viewed as a sequence, but that’s different from the set itself being a sequence.

Tino · April 7, 2018, 1:37pm

So we can do stupid things with Set.reversed ;-).
Most programming languages and especially Swift try to protect us from our own stupidity, and as elementsEqual demonstrated, that didn't work that well in this case.

To not only blame Set all the time I have another example:
Someone wrote a program for a sports and included a small new routine to print out the winners.

struct Marathon {
	var succesfulContestants: Array<(String, TimeInterval)>
 }
    // real Marathon type is bigger and defined elsewhere

func printPodium(of marathon: Marathon) {
	marathon.succesfulContestants.prefix(3).forEach {
		print("Contestant: \($0.0) (\($0.1 / 3600))")
	}
}

Thanks to autocompletion, it's very easy to write this down, and maybe there's even a unit test that said everything is ok.
Now, some time later, someone else has to change the software, and he has to add a feature so that he can easily look up the time for each runner.
So, he makes a little change:

var succesfulContestants: Dictionary<String, TimeInterval>

The compiler doesn't complain, the unit tests are green, but the developer silently broke an important functionality.

So, the question shouldn't be "is it harmful that Dictionary conforms to Sequence", but rather "is there any downside when I can only iterate through Dictionary's contents, and not call methods that depend on a meaningful order?"

Tino · April 7, 2018, 6:06pm

I guess I didn't explain enough what I'm thinking of:
Just because of POP, it would be rather stupid to dump a Set into an array.

extension Set: Sequence {}

would be everything that's required for tasks like asking a set for a prefix (I'm still looking for an explanation why somebody should want to do so).
Imho "we do because we can" is a poor justification, and removing those methods isn't patronizing - it just makes Sequence a more powerful abstraction.

A single "real use" would be a big step forward for the whole discussion ;-)

saagarjha · April 7, 2018, 10:12pm

Multi-pass iteration is a requirement that Collection imposes, and by conforming, Set must provide this guarantee as well.

Then this "Set" cannot conform to Collection, unless it clearly documents the departure (personally, I'd say that even documenting this doesn't make it right, but I digress)

As I mentioned, this guarantee only holds if you do not mutate the Set. Once you do that, all bets are off.

gwendal.roue · April 8, 2018, 6:11am

The answer to this good question:

(see the rest of that topic for more interesting guarantees)

cherrywoods · April 8, 2018, 6:23am

Ok, I didn‘t knew this requirement. Thanks for pointing it out.

CTMacUser · April 8, 2018, 6:55am

So, ...we would have

A new protocol Iterable, with just Element, Iterator, and makeIterator(). Swift will use this to base for-in loops on (instead of Sequence).
A refined protocol Sequence with the remaining interface of the old Sequence.
A now-directly refined protocol Collection, with the Sequence interface ripped out.
- Index wouldn't necessarily be Comparable. This could be helpful for multi-dimensional containers. (This could possibly include fixed-size arrays if we decide that structural types can follow protocols.)
- Maybe there would be a SubCollection, an analog of the no longer present-in-Collection Subsequence.
- Should Indices only conform to Iterable? Or should they also always conform to Collection too? I feel that Indices should definitely conform to Sequence whenever the owning collection type also does.
The "unordered" containers Set and Dictionary would conform to just Collection.
But general linear containers, like Array, could conform to both Collection and Sequence.
- SubSequence and SubCollection would have to be the same type.
- Index now would have to conform to Comparable.
I think that MutableCollection could refine just Collection....
...But BidirectionalCollection, RandomAccessCollection, and RangeReplaceableCollection would refine Sequence too.

Strangely, as someone who hasn't liked the recent talk about crippling interfaces so "not really sequences" Set and Dictionary would be denied, I like this idea.

saagarjha · April 8, 2018, 7:05am

I'd like for this Sequence to provide a multi-pass guarantee. See my comments on [Pitch] Remove the single-pass requirement on Sequence.

CTMacUser · April 8, 2018, 8:08am

Oh, Index for non-Sequence Collections still has to be at least Equatable.

We should do some name shuffling to retain some backwards compatibility.

protocol Iterable { /*...*/ }
protocol Sequence: Iterable { /*...*/ }
protocol UnorderedCollection: Iterable { /*...*/ }
protocol UnorderedMutableCollection: UnorderedCollection { /*...*/ }
typealias Collection = UnorderedCollection & Sequence where SubSequence == SubCollection, Index: Comparable, Indices: Sequence
typealias MutableCollection = UnorderedMutableCollection & Collection
protocol BidirectionalCollection: Collection { /*...*/ }
protocol RandomAccessCollection: BidirectionalCollection { /*...*/ }
protocol RangeReplaceableCollection: Collection { /*...*/ }

CTMacUser · April 8, 2018, 8:58am

And through that link, it seems that someone already came up with a similar idea. That link has at least one other attempted pass.

jawbroken · April 8, 2018, 1:48pm

If we're going to be bold about it, then let's just solve this once and for all by removing all order dependent operations from Set and Dictionary, including Collection conformance, use in for-in loops, etc. I still don't see why these are any less problematic than the other operations you've identified. This will leave only the ability to ask if they contain a given element, and remove all abilities to access all the contained elements as such access would inevitably reveal an order. These would then be truly unordered collections.

Unfortunately, that makes them essentially useless to anyone, because many algorithms are now either unimplementable or too computationally expensive to use. So instead I just accept that a Swift Set is also a Swift Sequence. This is inherent, and required to make them useful, not accidental, even if I can't think of a great use for a few Sequence methods. I also can't think of a great use for calling map on an infinite Sequence, which is probably more harmful, and if I dig around the whole Swift protocol hierarchy I'm sure I can find a lot of similar examples that, nonetheless, probably don't justify a more complex protocol hierarchy.

Now, the question remains about how exactly the fundamental Sequence nature should be exposed here. As I said in the other thread, perhaps Collection and Sequence conformance should have been provided as a view on Dictionary and Set instead of conforming them directly, but it's not clear to me that complexity would be worthwhile if we were starting from scratch, and I'm certainly not convinced it's worth breaking source compatibility now. I also definitely don't understand any solution here that retains a random assortment of functionality that reveals the ordering (e.g. for-in loops, Collection functionality), while removing or quarantining another random assortment (e.g. reversed, prefix). What would justify making that arbitrary distinction?

Tino · April 8, 2018, 6:27pm

I'm not in the position to complain about satire - but I'd prefer it to be funny ;-)
The confusion with elementsEqual is a real world problem, and there are several other examples that illustrate that the status quo is far from perfect.
On the other hand, I'm still waiting for a single example that shows that Set.elementsEqual can be useful, or that removing this method would cause any harm.

The for loop is a vehicle to process the contents of a collection, and you can use it for tasks that depend on a certain order, as well as to perform operations where you just need to do something with each element.
There is even the possibility to use loop-syntax to perform those operations in parallel. (just emphasized this, because I think that really hasn't come up yet - and this is a case where it makes a big difference if things don't have to be computed sequentially...)

The methods in question however just make no sense (or at least nobody could present it yet) for Set and Dictionary.
We can't prevent users from shooting into their own feet, but we don't have to equip them with a footgun.

jawbroken · April 8, 2018, 11:46pm

Kind of rude, but it's not satirical, it's just an outline of the thought process that led to my conclusions. elementsEqual is indeed a problem, mostly because it takes two Sequences and was an attractive nuisance before things were made Equatable, but that is already being discussed in its own thread so I presume this thread is about the remaining APIs that expose order. The other handful of methods are perhaps not particularly useful in a lot of situations, but I am not convinced they are particularly harmful.

Sure, but equally something like prefix could be used to get the first n elements of a collection that I know is in a given order, as in your marathon example, or to get an arbitrary n elements, perhaps because you have some efficient method of processing data in fixed sized chunks.

Your marathon example could have been reasonably written as

func printPodium(of marathon: Marathon) {
  var place = 1

  for contestant in marathon.succesfulContestants {
    guard place <= 3 else { break }
    print("Contestant: \(contestant.0) (\(contestant.1 / 3600))")
    place += 1
  }
}

which exhibits exactly the same issue as your prefix code. So why is prefix a problem but not for-in? You seem to have a strong opinion about which operations that expose an order are harmful and which ones aren't, but I don't understand the basis for those opinions.

Tino · April 9, 2018, 6:51am

The problematic methods are those which are there to reason about the order in Sequence, and they are problematic because order isn't meaningful for Set.

It is very straightforward to write extension Sequence where Element: Equatable, and that extension is perfect for most true sequences, and even when there are better specific implementations, it will always return the right answer.

Equality for Set and Dictionary needs a special implementation, because despite the common definition of sequence, order doesn't matter for those.
This is just a tiny example, but the general problem is that we can't write generic algorithms that work on sequences, because those can't exclude Set and Dictionary

This is definitely bad, but every language is full of tradeoffs, and there are situations where you have to choose the lesser evil.
But I wonder when this is such a situation, why can nobody tell us what's the "bigger evil" of changing the protocol hierarchy is?
I've read many posts about this issue, and the "best" argument for keeping the status quo that I've heard is "it doesn't bug me, and change means work" (and that's the path to mediocrity).

gwendal.roue · April 9, 2018, 7:14am

Somebody did:

What's the intended behavior of this program that uses a Set as the Sequence it currently is? - #6 by Ben_Cohen
What's the intended behavior of this program that uses a Set as the Sequence it currently is? - #8 by Ben_Cohen

Tino · April 9, 2018, 7:34am

As you see, even someone from Core can't give a single example... without details, this is just an opinion as well.
Even if it's the opinion of someone who probably has more insight about Swift than we all together, it's no real explanation that shows the benefits of the current design.
The downsides, on the other hand, can be demonstrated easily.

CTMacUser · April 9, 2018, 8:14am

I guess we could have had the relationship between Collection and Sequence be like Sequence and IteratorProtocol. Collection would return a sequence-accessor from a makeSequence() method. The core language would have to be slightly changed to recognize both Sequence instances to use makeIterator() and Collection instances to use makeSequence().makeIterator(). Sequence and UnorderedCollection would be completely distinct protocols without needing to create a common Iterable base.

Just like a sequence type can conform to both Sequence and IteratorProtocol and have instances be their own iterators, we could have the ordered collections conform to both Sequence and UnorderedCollection and let collections be their own sequences. We would need to enforce having SubSequence and SubCollection being equal and Index being Comparable (instead of just being Equatable).

(Hmm, a collection that conforms to both UnorderedCollection and Sequence couldn't really also triple-conform to IteratorProtocol, because that implies being mutli-pass and suicidally single-pass at the same time. But is there any way to prevent that? Technically, we could do that now, but what does that mean in practice?)

jawbroken · April 9, 2018, 10:26am

for-in, Collection conformance, etc also expose an order, and I showed that your example of why prefix could be harmful applies equally to for-in. I keep asking about how you are separating harmful API from benign here, but I haven't worked it out yet. I would understand if the argument was that all such order-exposing API be quarantined on a Collection and Sequence conforming view, and I'm not unsympathetic to that view, but I'm struggling to understand why some seemingly random assortment need to be removed. And which assortment should it be? I've given an example of how prefix can be perfectly useful on a Set, for example.

Tino · April 9, 2018, 10:43am

There is API that has no valid use case, and can only be used in a harmful way - and if it's possible to get rid of such functionality that does nothing but cause confusion, that's imho a strong motivation to remove it.

Note that I expose my position quite a lot with the first statement: A single example that shows the merit of Set.elementsEqual or Dictionary.starts(with:) would force me to back down at least a bit... but that didn't happen yet.

for-in has useful applications for Set, so I don't think it makes any sense to forbid it:
If someone iterates over a Set he probably knows what he is doing.
But Sequence does not only say "this can be looped over" - it also builds abstractions on top of this requirement, which don't work for all (current) implementers.

Would you say it's a good idea to "enrich" an array of bytes with string-methods, instead of having a dedicated String type?

jawbroken · April 9, 2018, 11:28am

Everyone has already agreed that elementsEqual is, at the very least, poorly named in a way that makes it harmful, sure. As you know, it's the subject of a proposal that is currently under review and I hope it is addressed in some way. I'm having trouble understanding how starts(with:) is confusing or harmful though, because the name makes it clear to me that it must apply to the iteration/Sequence order. Usefulness is another question, but I haven't thought about it much. So your criteria for creating a separate protocol hierarchy is “can I find some conforming type where I can't think of a use for one or more methods”?

Strings are a great example for you to raise here, because they were recently made to conform to Collection for pragmatic reasons despite violating some of the semantics/expectations of that protocol. Collection conformance used to be available on a .characters view instead, but this wasn't considered ergonomic. I suppose an alternative would have been to create an NotQuiteACollection protocol that somehow addressed the strange aspects of String as a Collection (appending two Strings may not result in a collection with a length that is the sum of the two, etc). Generic algorithms written for Collections can fail unexpectedly on certain input Strings. There are some good parallels here, I think.