Is a set a sequence?

To suggest radical overhaul of the established protocol hierarchy, it's not good enough to just say "this must be a problem". There needs to be clear evidence that it is a problem. We need to see that it's a likely that someone might, say, write a generally useful algorithm on Sequence using a method such as starts(with:), then have a reasonable expectation that algorithm would work with a Set yet get surprising results that could be considered "incorrect" given Set is unordered.

So far the only compelling example that has been found is the one of Set.elementsEqual – hence the proposal to rename it. Other than that single case of possible misnaming, I've seen no such evidence that there is an active problem being caused for Swift users. Pointing to conflicting definitions in other languages or fields is not enough. It might have been enough when the protocols were being designed and named originally, or even shortly after they were widely adopted, but not in Swift's fourth (nearly fifth) year. Even then, the alternative would have to be better: and a more technically correct but also more complex solution would not be. We learned this lesson with String's Collection non-conformance.

Note that the properties first and last are on Collection not Sequence. They are defined as self[startIndex] and self[index(before: endIndex)] respectively (note the latter needs the collection to be bidirectional, which Set is not). There are many useful algorithms that can be built on top of the ability to pull elements off a set in the same order over multiple passes, as you can with any collection.

Extracting these properties out and putting them on a different protocol, as Brent pointed out at the start, implies not just a separation of Sequence and some other protocol – but a creation of an entire parallel set of protocols alongside the existing collection hierarchy. This dramatic increase in complexity would need to be justified – which again leads us back to the need for specific, concrete, compelling examples to justify that complexity. And when I say compelling, they need to be very compelling to justify extensive, source-breaking overhaul this would require (far more than the simple single-method rename proposed by SE-0203).

This is not to say that the protocols are perfect, or not in need of further debate. The single/multi-pass nature of Sequence, and the way that SubSequence works, is still a topic worthy of exploration and potential change (so long as that change can be made in a source-(and soon ABI)-stable way). But revisiting naming based on theoretical concerns, or totally overhauling the hierarchy in a way that would increase complexity with little upside, is not useful and at this point should be considered a settled discussion.

11 Likes

The idea how to actually change the protocol hierarchy by introducing Iterable has not even been really considered by its opponents, so the concept may have flaws that might easily render it unfeasible - but I think in its current incarnation, it's not radical at all:

  • Collection wouldn't conform to Sequence anymore (but to Iterable instead), thus loose some methods
  • Set and Dictionary wouldn't conform to Sequence anymore (but to Iterable)

On the other hand, many things could be left untouched:

  1. There would be absolutely no change for Array (not even a single method rename)
  2. There would be absolutely no change for Sequence itself
  3. There is no need to change any of the collection-variants which aren't adopted by Set or Dictionary -- afaics, this includes BidirectionalCollection, MutableCollection, RangeReplaceableCollection and RandomAccessCollection
  1. This wouldn't change either. Afaics, there have been no serious proposals to remove that ability from Collection.

Considering that Array is our "bread and butter" collection, I don't think there's much Swift code out there in the wild that would break -- except, of course, the use of some Sequence-methods on Set and Dictionary (and as far as I can see, breaking those is most likely a good thing that will reveal broken designs).
Note that proper use of elementsEqual wouldn't be punished at all, so I expect that the idea to rename that method will cause much more churn, without actually revealing cases of misuse.

I can't tell what you consider compelling, but Set.elementsEqual is definitely not a singular issue, but a symptom of a deeper problem.
Because we have only a single bucket for two very different concepts, you can easily change code that depends on "real" order to operate on a Collection that can only yield nonsensical results (see Is a set a sequence? - #22 by Tino).

I recently discussed with a former NeXT-developer who I consider to be a competent Swift-user as well.
When I told him about Set.starts(with:), I had to do so three times:
First, he said it's fine, because when Element: Comparable, the order is obvious.
When I said there's no such requirement, he assumed it to be a bug in the compiler, because the result of the call can be considered to be random.
Then I showed him the output of

print(Set(arrayLiteral: 1, 2, 3, 4).reversed())

... and I had to assure him once again that this is accepted behavior.
He isn't a mathematician, and I've seen more evidence that the existence of Sequence methods for collections that are generally considered to be "unordered" is highly confusing.

Even if there's only one compelling example to illustrate the harm done by conforming Set to Sequence:
There hasn't been a single example that speaks against removing that conformance.

See above -- I have no indication that such duplication is required (Brent didn't respond yet ;-).

4 Likes

What are the proposed semantics for your protocol named Iterable? It would seem to be identical to what is called Sequence today.

I think it's problematic enough if it's likely that someone might write (what they believe to be) a generally useful algorithm on Sequence while forgetting that a Sequence in Swift doesn't have to have a meaningful order.

There is no straight forward way of finding such examples in the wild, but I got one on my first attempt:

I just searched for "accumulated sum swift", and the following implementation turned up (it's the second answer to this Stack Overflow question).

protocol Accumulatable {
    static func +(lhs: Self, rhs: Self) -> Self
}
extension Int : Accumulatable {}

struct AccumulateSequence<T: Sequence>: Sequence, IteratorProtocol
where T.Element: Accumulatable {
    var iterator: T.Iterator
    var accumulatedValue: T.Element?
    
    init(_ sequence: T) {
        self.iterator = sequence.makeIterator()
    }
    
    mutating func next() -> T.Element? {
        if let val = iterator.next() {
            if accumulatedValue == nil {
                accumulatedValue = val
            }
            else { defer { accumulatedValue = accumulatedValue! + val } }
            return accumulatedValue
        }
        return nil
    }
}

Note that neither the author nor the comments mention anything about the fact that this will produce nonsense results for non-sequential Sequence types, ie:

let mySet: Set<Int> = [1, 2, 3]
let myAccSeq = AccumulateSequence(mySet)
for e in myAccSeq { print(e) }

may print 3 4 6 or 2 5 6 or for that matter 1 3 6, and so on.

If I understand you correctly, this is not an example of a problem since we cannot "have a reasonable expectation that [this particular] algorithm would work with a Set".

I do think this is an example of a problem; it shows how people probably forgets that Swift's Sequences can be non-sequential, and that an AccumulatedSequence-type won't make sense for any Sequence, despite its name.

My guess is that there are many more examples like this out there. And at the very least, the documentation has to be improved in order to make sure that everybody understands that a Sequence (in Swift) can be non-sequential, which is not only confusing in and by itself, but it also means that eg otherwise good names like AccumulatedSequence (and sequentiallyEqual) are actually problematic.

But improving the documentation cannot do anything about the following problem:

There are many generic algorithms that, like accumulated sum or elementsEqual, only makes sense for sequential Sequence types, but in Swift, we have no choice but to let them be available for all non-sequential Sequence types too, even though they only add confusing noise to these types. This problem goes both ways of course, algorithms that makes sense only for non-sequential Sequence types pollutes the sequential Sequence types.

It would be very interesting to see some serious discussion about how this situation can be improved. It would be nice if Sequence meant "sequential/ordered sequence of things". The elements of non-sequential/unordered collections/bags of things could still be iterable, but unordered bags shouldn't include properties and methods that only make sense for ordered sequences and vice versa.

3 Likes

This idea is very intriguing. Thanks for pushing this discussion forward, Tino.

2 Likes

1

As you say, all these methods we are talking about don't break or produce harmful code on unordered sequences. Conversely, they run as usual and produce results that can vary across calls.
Just as a subscript behaves on Set. In other words, they have the right to exist and they naturally work on every sequence. They simply aren't useful for us in certain cases.

This is an example of misusing Set. If someone does misuse Set in a similar manner, it is safe to assume the person hasn't bothered reading the doc and doesn't know what he's dealing with. In particular, that Set has the semantics of an unordered collection. Is it the Standard Library's fault for providing a working method that isn't both useful or compulsory, or the consumer's fault for not reading what Set is?


2

The Insufficiently detailed (to get rid of these cases) taxonomy of the Sequence hierarchy requires a solution that implies a large breaking change to the whole taxon.

@xwu If you may,

That said, it is clear this issue is out of priority for at least Swift 5. @Ben_Cohen says it doesn't matter that some methods aren't useful – and he is right as long as their quantity is constant compared to the number of other API's, as it is at present. However, in the near future, considering recent proposals, this can and likely will change – hence the topic of complementing the Sequence clade might become relevant.

Let's discuss the changes needed.
If we add an Iterable protocol, we can't conform Set and Dictionary directly to it, since they are both Collections (no doubt about that). Sequence is now strictly ordered. But Collections can be either ordered or unordered. We will need an OrderedCollection protocol, make it inherit Collection and Sequence, while Collection with it's hierarchy moves on to Iterable. This is neither easy, neither convenient nor worth the gains, at least now. Let's see what happens when we have ~50 useless methods on Set.

Just a quick illustration:

Before

After

Not too much of a mess, yet. But how about further granulation? Infinite/finite, emphasized single-pass/multi-pass – what will this turn into? Not even close to a trivial design question.

4 Likes

When I was thinking about the oddities of Sequence during the switch to the forums,
I realized that an unordered collection, as a protocol, should have no distinguished iteration order, and Collection does define a distinguished iteration order, specifically (for c: Collection) c[c.startIndex], c[c.index(after: c.startIndex] etc. a an unordered collection therefore mustn't have startIndex, endIndex, index(after:) or have Index: Comparable.

From that, the current Collection would need to remain the ordered collection protocol, and most source and semantic compatibility issues are avoided.

The only protocol that should then inherit from UnorderedCollection instead of ordered Collection is MutableCollection, but a similar split could be made for it as well, and MutableCollection could even be a simple typealias for UnorderedMutableCollection & Collection`.

The inheritance chains would be: (no fancy drawings :disappointed:)
MutableCollection: UnorderedMutableCollection & Collection
RandomAccessCollection: BidirectionalCollection
BidirectionalCollection: Collection
RangeReplacableCollection: Collection
Collection: Sequence & UnorderedCollection,
UnorderedMutableCollection: UnorderedCollection
Sequence: Iterable,
UnorderedCollection: Iterable

In can be done both ways. Collection too can preserve order-insensitive functionality, while OrderedCollection introduces order-sensitive refinements. The downside of having UnorderedCollection is the naming semantics (Collection should be more abstract than [...]Collection). The downside of having OrderedCollection – as you said, the need to reconsider Collection.

Considering that Array is our “bread and butter” collection, I don’t think there’s much Swift code out there in the wild that would break

This statement is without basis in fact. There are a significant number of user-implemented sequences and collections. There are a significant number of extensions on Sequence, and extensions on Collection that assume Sequence conformance. This change would break code where an extension on Sequence was called from an extension on Collection.

I recommend rather than making generalizations about how source-breaking something is, that you try implementing your proposal in the Swift standard library, and then assess the impact of it using the source compatibility test suite. And even then, breaking the compat suite is proof of incompatibility, not proof of compatibility.

First, he said it’s fine, because when Element: Comparable, the order is obvious.

When someone mistakes an unordered set type for an ordered set, they are, in general, going to have a bad time. They are going to write for ... in code that makes invalid assumptions. The presence of starts(with:) on that type is going to be the least of their worries.

The full solution to this problem is to not make Set iterable at all. This is the approach python takes. This would be extremely source breaking, removing as it would all the benefits of the useful property of iteration that large quantities of code currently relies on (and thus has a much higher bar to clear, source-compatibility-wise). And even then, as @gwendal.roue points out, a really determined programmer can write bad code against Set.pop() or whatever one-by-one removal method you inevitably will need to vend them.

2 Likes

I disagree that this algorithm doesn't work on Set. It works perfectly well – as well as map does, since accumulate is basically a cross between map and reduce where it shows the intermediate reductions. But someone calling it on a Set should realize that, since the elements are in arbitrary order, so will the result be.

This version is specifically written to add the values up, but a more general version would just take an (T,Element)->T argument just like reduce does. I would love to add this to the standard library in fact. And it would apple to every Iterable type just like map and reduce would, not on this hypothetical OrderedIterable protocol.

3 Likes

I already showed that this exact example is just as problematic written using a for-in loop instead of prefix, so how does Iterable solve this at all? This is what I've been trying to work out throughout this whole thread, but I don't recall anyone directly addressing it. This is also perfectly illustrated by Jens example which doesn't even use any of the methods people are saying are problematic, it solely uses the iterator! Then the very same post ends with

which doesn't appear to help in the given example at all. This is why I believe the only solution that makes sense, if you think this problem needs a solution at all, would be to move Sequence conformance to a view/property on Set and Dictionary, which would possibly remind someone that they don't control the iteration order.

It's not my protocol -- but I think of Iterable as a very lean way to allow a type to be used in a for-loop, without the baggage that Sequence has.
It wouldn't burden implementing types with a bunch of methods that are neither needed nor credible for them, and at the same time, it would allow Sequence to gain at least a little bit of meaning.

I think it would even be ok to have no protocol requirements for loop iteration, and accept any type that offers an iterator -- but that would be rather uncommon for Swift, and I don't consider the ability to build on top of Iterable to be dangerous.

Thanks for the drawings -- maybe illustrations really help to understand the situation.

I myself often missed a holistic approach to Swift evolution -- but history tells us that it is practically impossible to get a "big" idea through the discussion phase (especially when you aren't a member of the core team).
So I won't lay out how I think the other open question could be tackled, as this would guarantee that there would never be consensus on how to handle the single issue of ordered vs unordered.

But I want to point out that I didn't say there has to be a separate protocol to express the concept of OrderedCollection...

I don't think Sequence has any semantic requirements that exceed just that. What "baggage" are you referring to that requires anything other than those semantics?

As an aside, I stumbled across the only (?) reasonable use of elementsEqual(_:) (hopefully soon to be renamed) on a Set or Dictionary in this commit while reading the Hashable proposal.

2 Likes

It's a general shortcoming of Swift evolution that many decisions have to happen without facts...

... and imho this isn't backed by facts either.
I don't know how many extensions for Sequence has been created, but I don't think that it's a good idea to extend Sequence as it's defined now at all:
Such code can be confronted with broad variety of Sequences with different characteristics, and even in the 44 methods on Sequence I counted in the stdlib, I found only five (so it is actually a handful ;-) that can be used without the danger of triggering an infinite loop or producing meaningless results.

I see this as a good argument to add a way to express that difference.

I'm not proposing a solution that offers full protection (imho there's no full protection whatsoever), and I absolutely don't think Set should forbid iteration.
It just should stop pretending to have a meaningful order.

When you say "A is a Sequence", you also say that A has prefixes of arbitrary size, that you can map over it, reverse it... even if 90% of those operations make absolutely no sense for you type.

Just to derail the conversation a bit further, I'd like to suggest that Collection is a bad name anyway, that it should have been called Indexable all along, and that Collection should be a protocol that has a contains(_ element: Element) function.

This, plus Iterable and OrderedIterable (modern day Sequence), gives us modern day Collection via Indexable & OrderedIterable and lets us add a pythonic if x in xs {} to the language, even if xs is not iterable, which is what we all really want.

2 Likes

I already considered our discussion to be finished, simply because of different opinions -- but maybe this specific example gave me the missing insight to understand your position:
In this case, it is not about what you can do, but rather what you should (or shouldn't) do.

Technically, all methods of Sequence need nothing besides an iterator. But conceptionally, things like AccumulateSequence depend a consciousness order.
This is a requirement that currently can't be expressed -- and even if it will always be easy to turn a "unordered" collection into a sequence, it's imho still a valuable distinction.

For comparison, no one stops me from shifting the bits of any byte in memory (except, of course, it's read only ;-). It doesn't matter if it's a pointer, a Float or a String: Technically, I can shift as much as I like.
But conceptionally, it's wiser to renounce this power, because for most types, shifting bits is as stupid as asking a set for its prefix (so I think it's good that most types don't expose that ability).

1 Like

Good catch!
I think it's an important addition that this example belongs to the test suite for Set, which isn't actually representative for production code ;-)