Is a set a sequence?


(Tino) #1

I’m disillusioned enough to not tag this thread as a pitch ;-), but as the surprisingly derailed discussion in SE-0203 — Rename Sequence.elementsEqual didn’t reveal a justification for the hierarchy of collection protocols, I hope a explicit headline might help to evoke some answers.

What do the elders say?

Luckily, sequence is a well established and old concept, so wikipedia has a good article for it - and the important parts are directly in the preview:

What do we do?

Of course, Set is not identical with the concept of a set in math - but it’s clearly modeled after it, so it’s surprising that it violates its key features (Dictionary has similar issues, but I’ll focus on Set).
This is not only unfortunate for mathematicians, but also the reason that Set has some methods that even the official documentation is so ashamed of that it puts them into a separate document ;-) (https://developer.apple.com/documentation/swift/set/order_dependent_operations_on_set)

What are others doing?

Java, for example, does not make Set conform to sequence, but to (Collection and) Iterable, which, afaics, is basically our Sequence without the problematic methods that imply a consciously applied order.

What if we do it differently?

If we would add Iterable and removed Sequence conformance from problematic datatypes, there wouldn’t be the need for complicated names like Sequence.elementsEqualInIterationOrder, which is a source breaking change.
Actually, I think breakage is good, because it would make people aware of cases where they use Set.elementsEqual, which is most likely never what they actually want.
But we would also break all code that uses elementsEqual in the right way.

How?

  1. Remove the problematic methods from Sequence
  2. Rename Sequence to Iterable
  3. Add a new Sequence protocol with the methods that have been removed in step 1
  4. Declare Sequence conformance for Array and other “truly” ordered collections

(feasibility isn’t confirmed ;-) - but I guess it’s harder to find flaws in concept when you don’t know what exactly the concept is)

What wouldn’t break?

Changing the protocol hierarchy is a much more fundamental change than tweaking method names, but that doesn’t necessarily mean that the consequences for existing Swift code are bigger.
Sequence would imply Iterable, so any abstractions build on top of Sequence would still work.
Common use of Set and Dictionary wouldn’t be affected either, because iteration is still possible, and testing a Set for a prefix or reversing a Dictionary hardly makes sense anyway.

So: What are the downsides of Iterable that outweighs the confusion the current hierarchy causes?


On the elementsEqual problem, or “[Pitch] Set and Dictionary should not be Sequences”
(Howard Lovatt) #2

Great pitch! +1


(Brent Royal-Gordon) #3

For one thing, Set conforms to Collection. Do you think we should make a parallel protocol to Collection, too? What about its sub-protocols?


(Tino) #4

[It’s not even a pitch ;-) - but maybe we can at least do this exercise until we reach the point of “that’s to much work”]

So, I think (or better to say: My intuition says ;-) Set: Collection is fine, but Collection: Sequence isn’t.
Most implementers of Collection would also be Sequences, though.
Some aspects of Collection might not be a good fit for Set, but the big contradiction is the emphasis of order implied by Sequence.

So, next step in this chain of thoughts:
What breaks when Collection does not imply Sequence?
(maybe I should repeat that my initial assumption was that all this already has been discussed to death - but I couldn’t find a thread with a rationale)

Update, because I didn’t take notifications into account :slight_smile:
Big parts of this are a reply for @brentdax


(Anthony Latsis) #5

IMO, it is a bit exaggerated to talk about sequences in a mathematical sense regarding Sequence. At least due to the fact that mathematical sequences are infinite. Sequence is differently defined in the documentation – A type that provides sequential, iterated access to its elements – and together with the whole hierarchy is quite consistent with the given definition. Sequences in mathematical analysis are primarily an introduction to the concept of limits, which are further ubiquitous and have a huge impact on the development of the theory. Having, mildly said, few things in common, altering conformances of existing types referring to mathematical sequences is rather irrelevant.


(Tino) #6

That is wrong - it’s even mentioned in the summary from Wikipedia.
Also, Sequence does not have to be finite.

There’s no contradiction for Sequence, but when I say “type a is defined in a wrong way”, arguing against this statement by citing the documentation of type a is circular reasoning ;-)


(Anthony Latsis) #7

That is something different. Not the classic mathematical definition of a sequence. There isn’t even a proper definition in the article.
A sequence in mathematical analysis is defined if for each n ∈ ℕ there exists a real number a(n). If a sequence is finite, the concept of it’s limit becomes senseless – and therefore, mathematics in general.

I’m not sure I follow here. If type A is defined thus, it can’t be an incorrect definition if the definition itself doesn’t involve contradictions.


(Howard Lovatt) #8

It is unfortunate that Sequence has the concept of order in it since that makes it incompatible with Set. The suggested fix of putting all the non-ordered stuff in Iterable and having Sequence extend Iterable and add back in the ordered stuff is good from a migration point of view. Sequence would still however be an unfortunate name.

Another alternative would be to remove all the ordered stuff out of Sequence and create a new protocol, Ordered, with it that is used as a mixin when order is known. This is a tougher migration but ultimately a better solution.

Which do people prefer?


(Tino) #9

I didn’t write a puplic response, but I don’t think you’ll find any mathematican who would deny that a sequence is ordered, just like Wikipedia says.
Swifts Set contradicts the mathematical definition, but Sequence is in harmony with it.


(Félix Fischer) #10

I agree with you in that Set shouldn’t imply an order. Unless I’m missing something in the Swift implementation, the traditional Set is basically a data structure optimized for a belongsTo S operation. Most of its operations are O(inverseAckermann(n)), which is O(1) in practice. That makes it great for algorithms such as Kruskal’s.

Having it be ordered breaks some of those optimizations. Unless there’s a strong use case for Set being ordered, I’d leave it unordered.


(Vladimir) #11

Probably I’m totally confused, then sorry. But Swift’s ‘Sequence’ now means(read ‘implemented as’) “ordered or unordered” bag of things, no constraint, so it is not in harmony with mathematical definition. But ‘Set’ currently is unordered bag of things.

As I understand the core team’s (and other’s) replies on that subject [please correct me if I’m wrong] - the already existed generic code written over Sequence protocol should still be allowed to accept Sets and Dictionaries.

I, as a developer, given current meaning of Sequence, and understanding that this most likely will not be changed[ABI stabilization, breaking changes etc], want at least be able to limit MY generic code to allow only “ordered” Sequences, if this is important for that particular code.

So, isn’t the solution(well, workaround) here is to be able to write something like:
func foo(…) where T: Sequence&Ordered {…}
, and don’t conform Set and Dictionary to that Ordered protocol, but other sequences should be conformed to it.
I.e. nothing will be changed internally, just new empty(?) protocol Ordered be introduced and each built-in “ordered” sequence will conform to it additionally. Probably even typealias OrderedSequence = Sequence&Ordered could be introduced.

But IIUC the rule we have regarding protocols is that they should not be just “markers” but should be useful. And such Ordered protocol is not useful by its own. Probably there could be an exception in that rule?
Dumb idea? :-)


(Tino) #12

Well, the documentation can be really confusing (see https://developer.apple.com/documentation/swift/sequence/2947204-fill as example ;-)
But I couldn’t find the word “unordered” in it, and as Sequence is just a protocol, there is no implementation of the core functionality.

Sequence can’t have any requirements how the order in a type implementing it is defined, but many operations just only make sense when the order is imposed by the creator of the Sequence:
How do you reverse a bag of unordered things, and why should you want to do so?

The motivation of this thread stems from the discussion about one particular method (elementsEqual), which has a confusing effect for Sets:
You can apply it on two Sets that have exactly the same content — and get false as a result!
This happens if the two Sets have a different ordering, and the order inside a Set is something you can only predict when you know the internals of that type.

So the only contradiction with Sequence is that it is implemented by types like Set and Dictionary - and the idea is to simply change that.


#13

Mimicking mathematical sets or sequences is not the job of a Swift set. Its job is to provide a set of operations, with guaranteed algorithmic complexity, that cover a set of related use cases, and that all together form a “data structure” which happens to be named “set” because that data structure looks like a mathematical set:

  • uniqued elements
  • containment test
  • element insertion
  • element removal
  • iteration

An interesting divergence with mathematical sets is that a Swift set requires its elements to conform to Hashable, when a mathematical set would have been OK with Equatable (I mean a way to distinguish different elements). That’s because Swift set wants its methods to have a low complexity. This divergence is interesting because it’s a reminder that we’re not talking mathematics, here.

An interesting convergence with mathematical sets and sequences is that a Swift set can be iterated. In maths, you can write “let F = { x in E | x * 2 }”, or “let V(n) = U(n) + 1”. OK, that’s not really the same iteration as the iteration of a Swift set. But none of those objects really want to hide their contents (at least when they’re countable).

When you iterate a Swift set, you materialize an ordering. Let’s call it the iteration order of the set.

But can I? Does this “iteration order” even exist? Is it a stable property of a set? I mean, can I write the assertions below?

let s = Set(...)
assert(s.elementsEqual(s))
assert(Array(s) == Array(s))
if let i = s.index(where: predicate) {
    assert(predicate(s[i]))
}

The shy answer is: WHY NOT, as long as the Swift set keeps on fulfilling its job at being an implementation of the data structure that is called “set”.

The pragmatic answer is: YES OF COURSE, because preventing users from relying on those assertions would 1. require to increase the complexity of the Swift Set implementation, and 2. make the Swift Set a painful data structure to work with.

The consequence is that a Swift set is an ordered data structure. You can’t control this order. But you can’t say it does not exist. And it’s a hard fact.

The sentence “sets are unordered” is a teacher’s sentence. It’s a way to tell that one can’t control the set order, and should not, generally, rely on it. You don’t know in which order elements will be outputted in for x in set { print(x) }. It may change on the next program execution. It may be different for two sets that contain the same elements. I find it very funny that the word “unordered” is used to describe… an ordering. It should not be taken too seriously.


(Tino) #14

As I wrote in the other thread: I don’t think that we can build any data structure that is truly unordered, and it would be futile to try to remove the ordering from Set.
What can be done is making a split between types that have their order because its user wants it that way, and those which “accidentally” become ordered, because there is a way to extract elements one after another.

Exactly. But many operations of sequence are there to query for properties of the order - which is always possible when there’s iteration, but diminishes the semantic of Sequence and degrades it to a bunch of methods.

Using the same logic that justifies Set having methods like reversed, every single type in Swift should conform to Error.


(cherrywoods) #15

To me, an ordering is something different. Just because I can iterate the elements and I do get these elements one after another, there is not necessarily ordering in the data structure itself.

Asserting that a Set itself has some kind of ordering isn’t a good idea, because to make some sense of the iteration order you get, you need to rely on the implementation details of Set that creates the iteration order.

We seem to have a different understanding of “order”, because “unordered” implies to me that there is no rule under which elements have a certain appearance in a row (in the current context), but not that I can’t iterate them. Certainly there may be some sort of rule in the implementation, but this should be an implementation detail in case of Set and I would therefor assert this order of appearance to be arbitrary.

In general I do support creating a separate protocol for Iteratable types, because to me iteration is clearly something different then ordering.

As long as addressing inside a computer is based on natural numbers, there is always some ordering in sense of this elements address is before that other elements address, but this is not a sencefull ordering in most use cases.


#16

The problem is that since iteration implies ordering, even if it is an implementation-driven ordering, separating those protocols will just have people dump their so-called unordered types into an array, and proceed with their tasks.

Just like they used to do, in the past:

Look at Objective-C NSSet. Its “unordered” smell is strong: it has no firstObject property, like NSArray, but instead anyObject. Much unordered! Of course, the illusion breaks as soon as you see that NSSet conforms to NSFastEnumeration (meaning it has its iteration order just like Swift sets). And, of course, it has the allObjects method which returns the set elements in an NSArray, because, come on, we sometimes want to leave the “unordered” game and do some actual work.

Enters Swift, a protocol-oriented language.

Protocol-oriented programming destroys the little game that NSSet was playing, the “unordered myth”. You can no longer pretend that you are unordered when you are iterable. As soon as you have makeIterator(), you get all the methods that you can build on top of the iteration order. That’s why Swift sets have prefix(while:), despite all teachers claiming that a Set is “unordered”.

Oh, and we no longer have to consume memory by creating an ad-hoc array, like -[NSSet allObjects] when we want to break the unordered illusion, and actually use the iteration order of the set. That’s pretty cool, IMHO.

The unordered set myth has been destroyed by POP.


#17

You can write a (degenerated) version of Gauss’ technique to find the sum of integers with reversed, and it works with sets :-)

func gaussSum<S: Sequence>(_ s: S) -> Int where S.Element == Int {
    return zip(s, s.reversed()).reduce(0) { $0 + $1.0 + $1.1 } / 2
}

gaussSum([1, 2, 3, 4, 5]) // 15
gaussSum(Set([1, 2, 3, 4, 5])) // 15
gaussSum(1..<6) // 15

(I’m kidding, this is not a “real use” of Set.reversed)


(cherrywoods) #18

As I tried to point out, Iteration does not really require (persistent) ordering. If you need to do order dependant operations on a set, in my opinion you should itterate and copy it to an array, even if it is slow, because you can (should) not assume for a Set what is assumable for an ordered collection.

The for loop style iteration needs some kind of ordering in the moment the iteration runs, in terms that you wont get one element twice. Using forEach, which is also iterating for me, does (in it‘s syntax) not require ordering.

It seems as if Sequence destroyed what you call unordered mythos and not protocol orientated programming in general, because Iteratable would just be a protocol too with the operations required for iteration, but without the order dependant ones.

The question is what ordering has a Set?
Array‘s ordering is defined by it‘s indices, such that the first element is before the second, etc. When using an array you may assert some kind of additional ordering implied by your use, e.g. as a Stack where in this example elements with smaller indices are older than those with higher indices.

Now a Set is something diffrent then an Array with unique elements and has no clear ordering.

In my understanding of ordering any arbitrary ordering implies no use cases since to me ordering implies a rule of order. Please correct me if there are use cases for such an (un)ordering, that does not abstract from a specific order but requires specifically an arbitrary ordering.

To summand this: My point of view is: A temporal ordering is no real ordering.

I don‘t know the current implementation of Set, but in a HashSet like java‘s the iteration order could be the order of the hash values, which is an ordering, but no real ordering of the elements and if you are required to use this ordering, you should not use HashSet because this isn‘t a feature of HashSet. This would be like using a TreeSet to implement in-order traversation because TreeSet randomly implements iteration this way.


(Saagar Jha) #19

If you don’t mutate a Set, I believe that it multiple iterations over it are guaranteed to be the same. There’s no need to copy to an Array in this case.


(cherrywoods) #20

I guess this is true for Swifts Set implementation, but speaking about all implementations of Sets, it is not required.

Assume a optimized Set implementation that decides to use another iteration strategy if memory is short or a device runs in low power mode. This implementation will not guarantee the same iteration order even if the Set isn‘t mutated.
The point is that this feature is not one of Sets features but follows from implementation details.

Note that the iteration order remains the same, (and also the implicit implementation order), but how can iteration order be kept if an element is inserted? Where should it be placed? At the end? Why? There is just no background rule in iteration ordering which makes it in my opinion no real ordering.