[Draft] Rename Sequence.elementsEqual

xwu · October 16, 2017, 10:52pm

Sets are values. If you add, remove, or mutate any elements you have a
different Set and thus a potentially different ordering of elements.

From the “value semantics” PoV, yes. But from the “unordered collection
of values” PoV, Sets/Dictionaries, being unordered, are semantically free
to rearrange the in-memory ordering of their elements *without* user
intervention.

No, they are not semantically free to do so. The semantics of Collection
forbid it, because the iteration order must be multi-pass. As long as the
value is unchanged, the iteration order is unchanged. That is a documented,
public guarantee of the API.

Even if a Set value has a fixed order, a copy of that value may have a
*different* order. How many generic algorithm implementations are going to
be confused by that?

Is that so? That would be, on reflection, either a hole in the semantic
guarantees of Collection itself or a problematic conformance of Set to
Collection.

Some months ago, Dave A. and others sent out a draft about some refinements
to `Equatable`. There, we enunciated the notion that `==` should compare
equivalence in all "salient" respects, but not necessarily all publicly
observable behaviors. This is why, for instance, it is semantically
acceptable that `+0.0 == -0.0`; we are declaring that the sign of zero is
not a "salient" respect for the purposes of equivalence of floating point
types. However, this relaxed notion of equivalence/substitutability
*cannot* be acceptable for the purposes of copying a value. Take, for
instance:

func f<T>(x: T) {
  print(x)
}

print(-0.0) // It is *not OK* if this prints "+0.0".

To my simplistic thinking, a copy of an instance of a value type must be
indistinguishable from the original value with respect to all public APIs
(with the only and obvious exception of APIs that inquire into memory
location/layout). When Collection guarantees that conforming types are
multi-pass sequences, it *ought* to guarantee (if it does not do so already
on a proper interpretation of the semantic requirements) that copies are
indistinguishable in iteration order. If it did not do so, then so many
non-trivial algorithms that already operate on generic Collections are
assuming semantics that aren't guaranteed and would in fact pervasively
blow up when given a specially crafted but conformant Collection. This
would apply whether or not Sequence and Collection are modified to exclude
Set and Dictionary, as an "intrinsically ordered" collection type may not
have an intrinsic *linear* order over its elements, and there is no
semantic requirement that the *iteration order* (necessarily linear)
corresponds in any particular way to the "intrinsic order."

Now, if we agree that Collection does in fact (or ought to) guarantee the
stability of iteration order over copies, then the question is, does
today's `Swift.Set` properly meet those guarantees? If indeed, as you say,
copies of an instance of `Set` may internally rearrange its iteration order
on copy, then we do have a problem with the conformance of `Set` to
`Collection`.

···

On Mon, Oct 16, 2017 at 15:31 Greg Parker <gparker@apple.com> wrote:

On Oct 16, 2017, at 1:08 PM, Xiaodi Wu via swift-evolution < > swift-evolution@swift.org> wrote:
On Mon, Oct 16, 2017 at 13:15 David Sweeris <davesweeris@mac.com> wrote:

On Oct 16, 2017, at 09:21, Michael Ilseman <milseman@apple.com> wrote:

--
Greg Parker gparker@apple.com Runtime Wrangler

xwu · October 17, 2017, 1:19am

Modeling is, by definition, imperfect. The question is, what imperfect
model is most useful _to Swift_. The idea is that conforming Set and
Dictionary to Collection is on balance more useful than not doing so; that
having two protocols, Sequence and Collection, is on balance more useful
than having one or three, and that the set of semantic guarantees of
Collection are on balance more useful than other possible sets of semantic
guarantees.

···

On Mon, Oct 16, 2017 at 8:03 PM, David Sweeris <davesweeris@mac.com> wrote:

On Oct 16, 2017, at 1:07 PM, Xiaodi Wu <xiaodi.wu@gmail.com> wrote:

On Mon, Oct 16, 2017 at 13:15 David Sweeris <davesweeris@mac.com> wrote:

On Oct 16, 2017, at 09:21, Michael Ilseman <milseman@apple.com> wrote:

On Oct 16, 2017, at 8:46 AM, David Sweeris via swift-evolution < >> swift-evolution@swift.org> wrote:

On Oct 16, 2017, at 07:20, Xiaodi Wu via swift-evolution < >> swift-evolution@swift.org> wrote:

On Mon, Oct 16, 2017 at 05:48 Jonathan Hull <jhull@gbis.com> wrote:

On Oct 15, 2017, at 9:58 PM, Xiaodi Wu <xiaodi.wu@gmail.com> wrote:

On Sun, Oct 15, 2017 at 8:51 PM, Jonathan Hull <jhull@gbis.com> wrote:

On Oct 14, 2017, at 10:48 PM, Xiaodi Wu <xiaodi.wu@gmail.com> wrote:

That ordering can be arbitrary, but it shouldn’t leak internal

representation such that the method used to create identical things affects
the outcome of generic methods because of differences in internal
representation.

It would be better to say that the iteration order is well-defined.
That will almost always mean documented, and usually predictable though
obviously e.g. RNGs and iterating in random order will not be predictable
by design.

That's actually more semantically constrained than what Swift calls
a `Collection` (which requires conforming types to be multi-pass and(?)
finite). By contrast, Swift's `SpongeBob` protocol explicitly permits
conforming single-pass, infinite, and/or unordered types.

I think you’re talking about Sequence here, I’ve lost track of your
nonsense by now. Yes, the current Swift protocol named Sequence allows
unordered types. You seem to keep asserting that but not actually
addressing my argument, which is *that allowing Sequences to be
unordered with the current API is undesired and actively harmful, and
should* *therefore** be changed*.

What is harmful about it?

After thinking about it, I think the harmful bit is that unordered
sequences are leaking internal representation (In your example, this is
causing people to be surprised when two sets with identical elements are
generating different sequences/orderings based on how they were created).
You are correct when you say that this problem is even true for for-in.

I would not say it is a problem. Rather, by definition, iteration
involves retrieving one element after another; if you're allowed to do that
with Set, then the elements of a Set are observably ordered in some way.
Since it's not an OrderedSet--i.e., order doesn't matter--then the only
sensible conclusion is that the order of elements obtained in a for...in
loop must be arbitrary. If you think this is harmful, then you must believe
that one should be prohibited from iterating over an instance of Set.
Otherwise, Set is inescapably a Sequence by the Swift definition of
Sequence. All extension methods on Sequence like drop(while:) are really
just conveniences for common things that you can do with iterated access;
to my mind, they're essentially just alternative ways of spelling various
for...in loops.

I think an argument could be made that you shouldn’t be able to
iterate over a set without first defining an ordering on it (even if that
ordering is somewhat arbitrary). Maybe we have something like a
“Sequenc(e)able” protocol which defines things which can be turned into a
sequence when combined with some sort of ordering. One possible ordering
could be the internal representation (At least in that case we are calling
it out specifically). If I had to say “setA.arbitraryOrder.
elementsEqual(setB.arbitraryOrder)” I would definitely be less
surprised when it returns false even though setA == setB.

Well, that's a totally different direction, then; you're arguing that
`Set` and `Dictionary` should not conform to `Sequence` altogether. That's
fine (it's also a direction that some of us explored off-list a while ago),
but at this point in Swift's evolution, realistically, it's not within the
realm of possible changes.

I am actually suggesting something slightly different. Basically, Set
and Dictionary’s conformance to Collection would have a different
implementation. They would conform to another protocol declaring that they
are unordered. That protocol would fill in part of the conformance to
sequence/collection using a default ordering, which is mostly arbitrary,
but guaranteed to produce the same ordering for the same list of elements
(even across collection types). This would be safer, but a tiny bit slower
than what we have now (We could also potentially develop a way for
collections like set to amortize the cost). For those who need to recover
speed, the new protocol would also define a property which quickly returns
a sequence/iterator using the internal ordering (I arbitrarily called it
.arbitraryOrder).

I believe it would not be source breaking.

That is indeed something slightly different.

In an ideal world--and my initial understanding of what you were
suggesting--Set and Dictionary would each have a member like `collection`,
which would expose the underlying data as a `SetCollection` or
`DictionaryCollection` that in turn would conform to `Collection`;
meanwhile, Set and Dictionary themselves would not offer methods such as
`prefix`, or indexing by subscript, which are not compatible with being
unordered. For those who want a particular ordering, there'd be something
like `collection(ordered areInIncreasingOrder: (T, T) -> Bool) ->
{Set|Dictionary}Collection`.

What you suggest here instead would be minimally source-breaking.
However, I'm unsure of where these guarantees provide benefit to justify
the performance cost. Certainly not for `first` or `dropFirst(_:)`, which
still yields an arbitrary result which doesn't make sense for something
_unordered_. We *could* have an underscored customization point named
something like `_customOrderingPass` that is only invoked from
`elementsEqual` or other such methods to pre-rearrange the internal
ordering of unordered collections in some deterministic way before
comparison. Is that what you have in mind?

Something like that. Whatever we do, there will be a tradeoff between
speed, correctness, and ergonomics.

My suggestion trades speed for correctness, and provides a way to
recover speed through additional typing (which is slightly less ergonomic).

You haven't convinced me that this is at all improved in "correctness."
It trades one arbitrary iteration order for another on a type that tries to
model an unordered collection.

We could do something like you suggest. I don’t think the method would
need to be underscored… the ordering pass could just be a method on the
protocol which defines it as unordered. Then we could provide a special
conformance for things where order really matters based on adherence to
that protocol. That might be an acceptable tradeoff. It would give us
speed at the cost of having the correct implementation being less ergonomic
and more error prone (you have to remember to check that it is unordered
and call the ordering method when it mattered).

I’d still be a bit worried that people would make incorrect generic
algorithms based on expecting an order from unordered things, but at least
it would be possible for them check and handle it correctly. I think I
could get behind that tradeoff/compromise, given where we are in the swift
process and Swift's obsession with speed (though I still slightly prefer
the safer default). At least the standard library would handle all the
things correctly, and that is what will affect the majority of programmers.

What is an example of such an "incorrect" generic algorithm that would
be made correct by such a scheme?

To start with, the one you gave as an example at the beginning of this
discussion: Two sets with identical elements which have different internal
storage and thus give different orderings as sequences. You yourself have
argued that the confusion around this is enough of a problem that we need
to make a source-breaking change (renaming it) to warn people that the
results of the ‘elementsEqual’ algorithm are undefined for sets and
dictionaries.

No, I am arguing that the confusion about ‘elementsEqual’ is foremost a
problem with its name; the result of this operation is not at all undefined
for two sets but actually clearly defined: it returns true if two sets have
the same elements in the same iteration order, which is a publicly
observable behavior of sets (likewise dictionaries).

How is the iteration order of an unordered set or dictionary “publicly
observable”? If either is implemented such that it can asynchronously
optimize its storage (maybe by rebalancing a tree or merging two
non-contiguous array segments or something), its iteration order could
change *without* changing what values it contains. Seems like
consecutive calls to “elementsEquals” (or whatever we’re calling it) should
return the same answer, if we don’t add, remove, or mutate elements.

Sets are values. If you add, remove, or mutate any elements you have a
different Set and thus a potentially different ordering of elements.

From the “value semantics” PoV, yes. But from the “unordered collection
of values” PoV, Sets/Dictionaries, being unordered, are semantically free
to rearrange the in-memory ordering of their elements *without* user
intervention.

No, they are not semantically free to do so. The semantics of Collection
forbid it, because the iteration order must be multi-pass. As long as the
value is unchanged, the iteration order is unchanged. That is a documented,
public guarantee of the API.

If the semantics of unordered collections (with a lowercase c) and of
`Collection` (with an uppercase C) differ on such a basic level, then why
are we trying to force them together? I mean, I understand that
source-compatibility is important, but so is correctly modeling that which
we claim to model.

Adam_Kemp1 · October 17, 2017, 6:25pm

Even in the simple case this leaks an implementation detail (that we’re using map) into the signature.

···

On Oct 17, 2017, at 11:20 AM, Kevin Nattinger <swift@nattinger.net> wrote:

Hmm, I'm not sure that would work with the covariant requirement. The associated type one could:
func firstNames<U: Unordered>(ofPeople people: U<MapResultType: Person>) -> U.MapResultType<Element: String> {
return people.map { $0.firstName }
}

If the sequence you put in maps to an ordered sequence, you get the ordered sequence out.
That said, I can see how the generics there could get out-of-hand as you add more operations… -> U.MapResultType.FilterResultType.MapResultType…<Element: String>

xwu · October 17, 2017, 12:46pm

To start with, the one you gave as an example at the beginning of this

discussion: Two sets with identical elements which have different internal
storage and thus give different orderings as sequences. You yourself have
argued that the confusion around this is enough of a problem that we need
to make a source-breaking change (renaming it) to warn people that the
results of the ‘elementsEqual’ algorithm are undefined for sets and
dictionaries.

No, I am arguing that the confusion about ‘elementsEqual’ is foremost a
problem with its name; the result of this operation is not at all undefined
for two sets but actually clearly defined: it returns true if two sets have
the same elements in the same iteration order, which is a publicly
observable behavior of sets (likewise dictionaries).

But that iteration order is undefined and could easily change due to
changes in the private/internal structure of sets/dictionaries. Algorithms
that rely on that “publicly observable behavior” (i.e. leaking of
internals) will suddenly break.

And an algorithm in which such “sudden breakage” would occur is…?

Here are a few off the top of my head:

func hasPrefix(Sequence)->Bool
func hasSuffix(Sequence)->Bool
func containsSubsequence(Sequence)->Bool

What do these methods mean with regards to Set’s “publicly observable
behavior”?

In what way do these algorithms break? They would continue to
determine--correctly--whether an instance of Set, when iterated, begins
with, ends with, or contains (respectively) a subsequence that matches the
argument.

Why do you not answe the question, what these methods *mean* for a Set?
Still waiting for a use case.

The method means exactly what I just said: the iteration order of one set
matches the iteration order of another sequence. I’ve given you one use
case and others have given more.

···

On Tue, Oct 17, 2017 at 01:03 Thorsten Seitz <tseitz42@icloud.com> wrote:

Am 17.10.2017 um 01:43 schrieb Xiaodi Wu via swift-evolution < > swift-evolution@swift.org>:
On Mon, Oct 16, 2017 at 6:10 PM, Jonathan Hull <jhull@gbis.com> wrote:

On Oct 16, 2017, at 1:05 PM, Xiaodi Wu <xiaodi.wu@gmail.com> wrote:
On Mon, Oct 16, 2017 at 10:49 Jonathan Hull <jhull@gbis.com> wrote:

On Oct 16, 2017, at 7:20 AM, Xiaodi Wu <xiaodi.wu@gmail.com> wrote:

-Thorsten

_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org
https://lists.swift.org/mailman/listinfo/swift-evolution

xwu · October 17, 2017, 4:34pm

That ordering can be arbitrary, but it shouldn’t leak internal

representation such that the method used to create identical things affects
the outcome of generic methods because of differences in internal
representation.

It would be better to say that the iteration order is
well-defined. That will almost always mean documented, and usually
predictable though obviously e.g. RNGs and iterating in random order will
not be predictable by design.

That's actually more semantically constrained than what Swift
calls a `Collection` (which requires conforming types to be multi-pass
and(?) finite). By contrast, Swift's `SpongeBob` protocol explicitly
permits conforming single-pass, infinite, and/or unordered types.

I think you’re talking about Sequence here, I’ve lost track of
your nonsense by now. Yes, the current Swift protocol named Sequence allows
unordered types. You seem to keep asserting that but not actually
addressing my argument, which is *that allowing Sequences to be
unordered with the current API is undesired and actively harmful, and
should* *therefore** be changed*.

What is harmful about it?

After thinking about it, I think the harmful bit is that unordered
sequences are leaking internal representation (In your example, this is
causing people to be surprised when two sets with identical elements are
generating different sequences/orderings based on how they were created).
You are correct when you say that this problem is even true for for-in.

I would not say it is a problem. Rather, by definition, iteration
involves retrieving one element after another; if you're allowed to do that
with Set, then the elements of a Set are observably ordered in some way.
Since it's not an OrderedSet--i.e., order doesn't matter--then the only
sensible conclusion is that the order of elements obtained in a for...in
loop must be arbitrary. If you think this is harmful, then you must believe
that one should be prohibited from iterating over an instance of Set.
Otherwise, Set is inescapably a Sequence by the Swift definition of
Sequence. All extension methods on Sequence like drop(while:) are really
just conveniences for common things that you can do with iterated access;
to my mind, they're essentially just alternative ways of spelling various
for...in loops.

I think an argument could be made that you shouldn’t be able to
iterate over a set without first defining an ordering on it (even if that
ordering is somewhat arbitrary). Maybe we have something like a
“Sequenc(e)able” protocol which defines things which can be turned into a
sequence when combined with some sort of ordering. One possible ordering
could be the internal representation (At least in that case we are calling
it out specifically). If I had to say
“setA.arbitraryOrder.elementsEqual(setB.arbitraryOrder)” I would definitely
be less surprised when it returns false even though setA == setB.

Well, that's a totally different direction, then; you're arguing that
`Set` and `Dictionary` should not conform to `Sequence` altogether. That's
fine (it's also a direction that some of us explored off-list a while ago),
but at this point in Swift's evolution, realistically, it's not within the
realm of possible changes.

I am actually suggesting something slightly different. Basically,
Set and Dictionary’s conformance to Collection would have a different
implementation. They would conform to another protocol declaring that they
are unordered. That protocol would fill in part of the conformance to
sequence/collection using a default ordering, which is mostly arbitrary,
but guaranteed to produce the same ordering for the same list of elements
(even across collection types). This would be safer, but a tiny bit slower
than what we have now (We could also potentially develop a way for
collections like set to amortize the cost). For those who need to recover
speed, the new protocol would also define a property which quickly returns
a sequence/iterator using the internal ordering (I arbitrarily called it
.arbitraryOrder).

I believe it would not be source breaking.

That is indeed something slightly different.

In an ideal world--and my initial understanding of what you were
suggesting--Set and Dictionary would each have a member like `collection`,
which would expose the underlying data as a `SetCollection` or
`DictionaryCollection` that in turn would conform to `Collection`;
meanwhile, Set and Dictionary themselves would not offer methods such as
`prefix`, or indexing by subscript, which are not compatible with being
unordered. For those who want a particular ordering, there'd be something
like `collection(ordered areInIncreasingOrder: (T, T) -> Bool) ->
{Set|Dictionary}Collection`.

What you suggest here instead would be minimally source-breaking.
However, I'm unsure of where these guarantees provide benefit to justify
the performance cost. Certainly not for `first` or `dropFirst(_:)`, which
still yields an arbitrary result which doesn't make sense for something
_unordered_. We *could* have an underscored customization point named
something like `_customOrderingPass` that is only invoked from
`elementsEqual` or other such methods to pre-rearrange the internal
ordering of unordered collections in some deterministic way before
comparison. Is that what you have in mind?

Something like that. Whatever we do, there will be a tradeoff between
speed, correctness, and ergonomics.

My suggestion trades speed for correctness, and provides a way to
recover speed through additional typing (which is slightly less ergonomic).

You haven't convinced me that this is at all improved in "correctness."
It trades one arbitrary iteration order for another on a type that tries to
model an unordered collection.

We could do something like you suggest. I don’t think the method would
need to be underscored… the ordering pass could just be a method on the
protocol which defines it as unordered. Then we could provide a special
conformance for things where order really matters based on adherence to
that protocol. That might be an acceptable tradeoff. It would give us
speed at the cost of having the correct implementation being less ergonomic
and more error prone (you have to remember to check that it is unordered
and call the ordering method when it mattered).

I’d still be a bit worried that people would make incorrect generic
algorithms based on expecting an order from unordered things, but at least
it would be possible for them check and handle it correctly. I think I
could get behind that tradeoff/compromise, given where we are in the swift
process and Swift's obsession with speed (though I still slightly prefer
the safer default). At least the standard library would handle all the
things correctly, and that is what will affect the majority of programmers.

What is an example of such an "incorrect" generic algorithm that would
be made correct by such a scheme?

To start with, the one you gave as an example at the beginning of this
discussion: Two sets with identical elements which have different internal
storage and thus give different orderings as sequences. You yourself have
argued that the confusion around this is enough of a problem that we need
to make a source-breaking change (renaming it) to warn people that the
results of the ‘elementsEqual’ algorithm are undefined for sets and
dictionaries.

No, I am arguing that the confusion about ‘elementsEqual’ is foremost a
problem with its name; the result of this operation is not at all undefined
for two sets but actually clearly defined: it returns true if two sets have
the same elements in the same iteration order, which is a publicly
observable behavior of sets (likewise dictionaries).

But it is a behavior which has absolutely no meaning at all because the
order does not depend on the elements of the set but on the history of how
the set has been reached its current state.
So why should I ever use this method on a set?
What is the use case?

One example: you can use it to check an instance of Set<Float> to
determine if it has a NaN value. (The “obvious” way of doing it is not
guaranteed to work since NaN != NaN.)

How would I do that? I'd rather expect to use a property isNaN on Float
to do that.

set.elementsEqual(set)

I see why that would work (thanks to Set being a collection vs a
sequence), but it still feels like a hack. I definitely wouldn’t want to
be maintaining code with that in it. Especially when compared to something
like:

set.contains(where: {$0.isNaN})

The purpose of protocols is to enable useful generic code. You cannot use
isNaN for code that works on generic Collection, or for that matter, even
code that works on Set where Element : Numeric.

Much generic code (for example, generic sorting) easily blows up when
encountering NaN. If you want an algorithm that is robust enough to handle
(or at least reject) that scenario, then you will need to check for it. It
is not a “hack” but a very common and accepted way of determining the
presence of NaN.

I don’t see why a non-source-breaking change is suddenly off-limits.

But more than that, any generic algorithm which is assuming that the
sequence is coming from an ordered source (i.e. many things using
first/last). Some uses of first are ok because the programmer actually
means ‘any’, but anywhere where they actually mean first/last may be
problematic.

Such as...?

Currently, there is no way to test for ordered-ness, so there is no way

for even a careful programmer to mitigate this problem. By adding a
protocol which states that something is unordered, we can either branch on
it, or create a separate version of an algorithm for things which conform.

It is clearly the case that Swift’s protocol hierarchy fits sets and
collections imperfectly; however, it is in the nature of modeling that
imperfections are present. The question is not whether it is possible to
incur performance, API surface area, and other trade-offs to make the model
more faithful, but rather whether this usefully solves any problem. What is
the problem being mitigated? As I write above, Swift’s Set and Dictionary
types meet the semantic requirements for Collection and moonlight as
ordered collections. What is a generic algorithm on an ordered collection
that is “not OK” for Set and Dictionary? (“elementsEqual”, as I’ve said,
is not such an example.)

On the contrary, `elementsEqual` is exactly such an example, because it
makes no sense to use it on a Set.

let s1 = Set([1,2,3,4,5,6])
let s2 = Set([6,5,4,3,2,1])

Both sets have different iteration orders. Comparing those sets with
some other collection using `elementsEqual` will give no meaningful result
because the order - and therefore the result of `elementsEqual` - is in
effect random.

No, it is not such an example; it’s misleadingly named but works
correctly—that is, its behavior matches exactly the documented behavior,
which relies on only the semantic guarantees of Sequence, which Set
correctly fulfills.

Fulfills to the letter. Again, what can you do with it if the result is
random??

The result is not random.

It is undefined though. As you said earlier, by the guarantees we have
been given, it may shift over different builds/runs of a program. Thus in
one run, it might return true and then false in another (without changing
our code). As Greg pointed out, it is also possible with the guarantees we
are given, for set/dict to have different orderings with copies of
themselves. (It will happen for sure when deep copying a dictionary with
reference-type keys).

As I wrote to Greg, if that is a possibility for Set, then the
implementation is problematic for conformance to Collection and needs to be
re-examined.

I’m not sure why you claim the order of a dictionary will definitiely
change on deep copying. But in any case, we are not talking about deep
copying here, or at least, I’m not; we’re talking about the notional
copying involved in the semantics of passing a set as an argument.

As I keep saying, relying on the behavior of a leaking internal

implementation is a bad plan.

Again, it is not a leaking internal implementation. It is a public API
guarantee.

We should add an additional guarantee to set/dict that the order returned

will be the same for the same contents regardless of history (but can be
otherwise arbitrary). That will fix the behavior for algorithms like
elementsEqual (i.e. They will return the same result across builds/runs).
It will also implicitly provide as a result, the constraint you were
arguing is needed across copies of a collection type. I agree that that is
an important guarantee. Why not fix both issues with a single
non-source-breaking change?

As I wrote, the behavior of elementsEqual is not broken and does not
require fixing. What is the benefit of the guarantee you propose that
justifies the performance cost?

Why is the source-breaking change of renaming things better?

Because, in my analysis, the problem is that the method is incorrectly
named. The problem affects all types that conform to Sequence and not just
Set and Dictionary; elementsEqual is a distinct function from ==, and it
must either continue to be distinct or cease to exist, but its name does
nothing to clarify any distinction.

Thanks,

···

On Tue, Oct 17, 2017 at 09:42 Jonathan Hull <jhull@gbis.com> wrote:

On Oct 17, 2017, at 5:44 AM, Xiaodi Wu <xiaodi.wu@gmail.com> wrote:
On Tue, Oct 17, 2017 at 00:56 Thorsten Seitz <tseitz42@icloud.com> wrote:

Am 17.10.2017 um 00:13 schrieb Xiaodi Wu <xiaodi.wu@gmail.com>:
On Mon, Oct 16, 2017 at 14:21 Thorsten Seitz <tseitz42@icloud.com> wrote:

Am 16.10.2017 um 16:20 schrieb Xiaodi Wu via swift-evolution < >>> swift-evolution@swift.org>:
On Mon, Oct 16, 2017 at 05:48 Jonathan Hull <jhull@gbis.com> wrote:

On Oct 15, 2017, at 9:58 PM, Xiaodi Wu <xiaodi.wu@gmail.com> wrote:
On Sun, Oct 15, 2017 at 8:51 PM, Jonathan Hull <jhull@gbis.com> wrote:

On Oct 14, 2017, at 10:48 PM, Xiaodi Wu <xiaodi.wu@gmail.com> wrote:

Jon

trs · October 17, 2017, 6:24am

That ordering can be arbitrary, but it shouldn’t leak internal representation such that the method used to create identical things affects the outcome of generic methods because of differences in internal representation.

It would be better to say that the iteration order is well-defined. That will almost always mean documented, and usually predictable though obviously e.g. RNGs and iterating in random order will not be predictable by design.

That's actually more semantically constrained than what Swift calls a `Collection` (which requires conforming types to be multi-pass and(?) finite). By contrast, Swift's `SpongeBob` protocol explicitly permits conforming single-pass, infinite, and/or unordered types.

I think you’re talking about Sequence here, I’ve lost track of your nonsense by now. Yes, the current Swift protocol named Sequence allows unordered types. You seem to keep asserting that but not actually addressing my argument, which is that allowing Sequences to be unordered with the current API is undesired and actively harmful, and should therefore be changed.

What is harmful about it?

After thinking about it, I think the harmful bit is that unordered sequences are leaking internal representation (In your example, this is causing people to be surprised when two sets with identical elements are generating different sequences/orderings based on how they were created). You are correct when you say that this problem is even true for for-in.

I would not say it is a problem. Rather, by definition, iteration involves retrieving one element after another; if you're allowed to do that with Set, then the elements of a Set are observably ordered in some way. Since it's not an OrderedSet--i.e., order doesn't matter--then the only sensible conclusion is that the order of elements obtained in a for...in loop must be arbitrary. If you think this is harmful, then you must believe that one should be prohibited from iterating over an instance of Set. Otherwise, Set is inescapably a Sequence by the Swift definition of Sequence. All extension methods on Sequence like drop(while:) are really just conveniences for common things that you can do with iterated access; to my mind, they're essentially just alternative ways of spelling various for...in loops.

I think an argument could be made that you shouldn’t be able to iterate over a set without first defining an ordering on it (even if that ordering is somewhat arbitrary). Maybe we have something like a “Sequenc(e)able” protocol which defines things which can be turned into a sequence when combined with some sort of ordering. One possible ordering could be the internal representation (At least in that case we are calling it out specifically). If I had to say “setA.arbitraryOrder.elementsEqual(setB.arbitraryOrder)” I would definitely be less surprised when it returns false even though setA == setB.

Well, that's a totally different direction, then; you're arguing that `Set` and `Dictionary` should not conform to `Sequence` altogether. That's fine (it's also a direction that some of us explored off-list a while ago), but at this point in Swift's evolution, realistically, it's not within the realm of possible changes.

I am actually suggesting something slightly different. Basically, Set and Dictionary’s conformance to Collection would have a different implementation. They would conform to another protocol declaring that they are unordered. That protocol would fill in part of the conformance to sequence/collection using a default ordering, which is mostly arbitrary, but guaranteed to produce the same ordering for the same list of elements (even across collection types). This would be safer, but a tiny bit slower than what we have now (We could also potentially develop a way for collections like set to amortize the cost). For those who need to recover speed, the new protocol would also define a property which quickly returns a sequence/iterator using the internal ordering (I arbitrarily called it .arbitraryOrder).

I believe it would not be source breaking.

That is indeed something slightly different.

In an ideal world--and my initial understanding of what you were suggesting--Set and Dictionary would each have a member like `collection`, which would expose the underlying data as a `SetCollection` or `DictionaryCollection` that in turn would conform to `Collection`; meanwhile, Set and Dictionary themselves would not offer methods such as `prefix`, or indexing by subscript, which are not compatible with being unordered. For those who want a particular ordering, there'd be something like `collection(ordered areInIncreasingOrder: (T, T) -> Bool) -> {Set|Dictionary}Collection`.

What you suggest here instead would be minimally source-breaking. However, I'm unsure of where these guarantees provide benefit to justify the performance cost. Certainly not for `first` or `dropFirst(_:)`, which still yields an arbitrary result which doesn't make sense for something _unordered_. We *could* have an underscored customization point named something like `_customOrderingPass` that is only invoked from `elementsEqual` or other such methods to pre-rearrange the internal ordering of unordered collections in some deterministic way before comparison. Is that what you have in mind?

Something like that. Whatever we do, there will be a tradeoff between speed, correctness, and ergonomics.

My suggestion trades speed for correctness, and provides a way to recover speed through additional typing (which is slightly less ergonomic).

You haven't convinced me that this is at all improved in "correctness." It trades one arbitrary iteration order for another on a type that tries to model an unordered collection.

We could do something like you suggest. I don’t think the method would need to be underscored… the ordering pass could just be a method on the protocol which defines it as unordered. Then we could provide a special conformance for things where order really matters based on adherence to that protocol. That might be an acceptable tradeoff. It would give us speed at the cost of having the correct implementation being less ergonomic and more error prone (you have to remember to check that it is unordered and call the ordering method when it mattered).

I’d still be a bit worried that people would make incorrect generic algorithms based on expecting an order from unordered things, but at least it would be possible for them check and handle it correctly. I think I could get behind that tradeoff/compromise, given where we are in the swift process and Swift's obsession with speed (though I still slightly prefer the safer default). At least the standard library would handle all the things correctly, and that is what will affect the majority of programmers.

What is an example of such an "incorrect" generic algorithm that would be made correct by such a scheme?

To start with, the one you gave as an example at the beginning of this discussion: Two sets with identical elements which have different internal storage and thus give different orderings as sequences. You yourself have argued that the confusion around this is enough of a problem that we need to make a source-breaking change (renaming it) to warn people that the results of the ‘elementsEqual’ algorithm are undefined for sets and dictionaries.

No, I am arguing that the confusion about ‘elementsEqual’ is foremost a problem with its name; the result of this operation is not at all undefined for two sets but actually clearly defined: it returns true if two sets have the same elements in the same iteration order, which is a publicly observable behavior of sets (likewise dictionaries).

How is the iteration order of an unordered set or dictionary “publicly observable”? If either is implemented such that it can asynchronously optimize its storage (maybe by rebalancing a tree or merging two non-contiguous array segments or something), its iteration order could change without changing what values it contains. Seems like consecutive calls to “elementsEquals” (or whatever we’re calling it) should return the same answer, if we don’t add, remove, or mutate elements.

Sets are values. If you add, remove, or mutate any elements you have a different Set and thus a potentially different ordering of elements.

From the “value semantics” PoV, yes. But from the “unordered collection of values” PoV, Sets/Dictionaries, being unordered, are semantically free to rearrange the in-memory ordering of their elements without user intervention.

No, they are not semantically free to do so. The semantics of Collection forbid it, because the iteration order must be multi-pass. As long as the value is unchanged, the iteration order is unchanged. That is a documented, public guarantee of the API.

If the semantics of unordered collections (with a lowercase c) and of `Collection` (with an uppercase C) differ on such a basic level, then why are we trying to force them together? I mean, I understand that source-compatibility is important, but so is correctly modeling that which we claim to model.

Modeling is, by definition, imperfect. The question is, what imperfect model is most useful _to Swift_. The idea is that conforming Set and Dictionary to Collection is on balance more useful than not doing so; that having two protocols, Sequence and Collection, is on balance more useful than having one or three, and that the set of semantic guarantees of Collection are on balance more useful than other possible sets of semantic guarantees.

That is your idea which is disputed and underlined with arguments whereas you keep repeating that Set behaves as dictated by its conformance without giving use cases why this should be useful.

-Thorsten

···

Am 17.10.2017 um 03:20 schrieb Xiaodi Wu via swift-evolution <swift-evolution@swift.org>:

On Mon, Oct 16, 2017 at 8:03 PM, David Sweeris <davesweeris@mac.com> wrote:

On Oct 16, 2017, at 1:07 PM, Xiaodi Wu <xiaodi.wu@gmail.com> wrote:

On Mon, Oct 16, 2017 at 13:15 David Sweeris <davesweeris@mac.com> wrote:

On Oct 16, 2017, at 09:21, Michael Ilseman <milseman@apple.com> wrote:

On Oct 16, 2017, at 8:46 AM, David Sweeris via swift-evolution <swift-evolution@swift.org> wrote:
On Oct 16, 2017, at 07:20, Xiaodi Wu via swift-evolution <swift-evolution@swift.org> wrote:

On Mon, Oct 16, 2017 at 05:48 Jonathan Hull <jhull@gbis.com> wrote:

On Oct 15, 2017, at 9:58 PM, Xiaodi Wu <xiaodi.wu@gmail.com> wrote:

On Sun, Oct 15, 2017 at 8:51 PM, Jonathan Hull <jhull@gbis.com> wrote:

On Oct 14, 2017, at 10:48 PM, Xiaodi Wu <xiaodi.wu@gmail.com> wrote:

_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org
https://lists.swift.org/mailman/listinfo/swift-evolution

knatt · October 17, 2017, 4:48pm

Because, in my analysis, the problem is that the method is incorrectly named. The problem affects all types that conform to Sequence and not just Set and Dictionary; elementsEqual is a distinct function from ==, and it must either continue to be distinct or cease to exist, but its name does nothing to clarify any distinction.

In my analysis, the problem is the method's implementation. As I see it, the only use for `elementsEqual` is as a replacement for `==` when two objects are different types (or not known to be the same)—equal elements, and IF the sequences have an order, in the same order. Could you provide an example where `elementsEqual` randomly returning either true or false depending on internal state alone is a legitimate and desirable result?

···

Thanks,
Jon
_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org <mailto:swift-evolution@swift.org>
https://lists.swift.org/mailman/listinfo/swift-evolution

David_Sweeris · October 17, 2017, 5:34pm

That's easy to fix:

public protocol Numeric {
  // whatever's there now, plus
  var isNaN: Bool {get}
}
public extension Numeric where Self: BinaryInteger {
  var isNaN: Bool { return false }
}

- Dave Sweeris

···

On Oct 17, 2017, at 9:34 AM, Xiaodi Wu via swift-evolution <swift-evolution@swift.org> wrote:

On Tue, Oct 17, 2017 at 09:42 Jonathan Hull <jhull@gbis.com <mailto:jhull@gbis.com>> wrote:

On Oct 17, 2017, at 5:44 AM, Xiaodi Wu <xiaodi.wu@gmail.com <mailto:xiaodi.wu@gmail.com>> wrote:

On Tue, Oct 17, 2017 at 00:56 Thorsten Seitz <tseitz42@icloud.com <mailto:tseitz42@icloud.com>> wrote:

Am 17.10.2017 um 00:13 schrieb Xiaodi Wu <xiaodi.wu@gmail.com <mailto:xiaodi.wu@gmail.com>>:

On Mon, Oct 16, 2017 at 14:21 Thorsten Seitz <tseitz42@icloud.com <mailto:tseitz42@icloud.com>> wrote:

Am 16.10.2017 um 16:20 schrieb Xiaodi Wu via swift-evolution <swift-evolution@swift.org <mailto:swift-evolution@swift.org>>:

On Mon, Oct 16, 2017 at 05:48 Jonathan Hull <jhull@gbis.com <mailto:jhull@gbis.com>> wrote:

On Oct 15, 2017, at 9:58 PM, Xiaodi Wu <xiaodi.wu@gmail.com <mailto:xiaodi.wu@gmail.com>> wrote:

On Sun, Oct 15, 2017 at 8:51 PM, Jonathan Hull <jhull@gbis.com <mailto:jhull@gbis.com>> wrote:

On Oct 14, 2017, at 10:48 PM, Xiaodi Wu <xiaodi.wu@gmail.com <mailto:xiaodi.wu@gmail.com>> wrote:

That ordering can be arbitrary, but it shouldn’t leak internal representation such that the method used to create identical things affects the outcome of generic methods because of differences in internal representation.

It would be better to say that the iteration order is well-defined. That will almost always mean documented, and usually predictable though obviously e.g. RNGs and iterating in random order will not be predictable by design.

That's actually more semantically constrained than what Swift calls a `Collection` (which requires conforming types to be multi-pass and(?) finite). By contrast, Swift's `SpongeBob` protocol explicitly permits conforming single-pass, infinite, and/or unordered types.

I think you’re talking about Sequence here, I’ve lost track of your nonsense by now. Yes, the current Swift protocol named Sequence allows unordered types. You seem to keep asserting that but not actually addressing my argument, which is that allowing Sequences to be unordered with the current API is undesired and actively harmful, and should therefore be changed.

What is harmful about it?

After thinking about it, I think the harmful bit is that unordered sequences are leaking internal representation (In your example, this is causing people to be surprised when two sets with identical elements are generating different sequences/orderings based on how they were created). You are correct when you say that this problem is even true for for-in.

I would not say it is a problem. Rather, by definition, iteration involves retrieving one element after another; if you're allowed to do that with Set, then the elements of a Set are observably ordered in some way. Since it's not an OrderedSet--i.e., order doesn't matter--then the only sensible conclusion is that the order of elements obtained in a for...in loop must be arbitrary. If you think this is harmful, then you must believe that one should be prohibited from iterating over an instance of Set. Otherwise, Set is inescapably a Sequence by the Swift definition of Sequence. All extension methods on Sequence like drop(while:) are really just conveniences for common things that you can do with iterated access; to my mind, they're essentially just alternative ways of spelling various for...in loops.

I think an argument could be made that you shouldn’t be able to iterate over a set without first defining an ordering on it (even if that ordering is somewhat arbitrary). Maybe we have something like a “Sequenc(e)able” protocol which defines things which can be turned into a sequence when combined with some sort of ordering. One possible ordering could be the internal representation (At least in that case we are calling it out specifically). If I had to say “setA.arbitraryOrder.elementsEqual(setB.arbitraryOrder)” I would definitely be less surprised when it returns false even though setA == setB.

Well, that's a totally different direction, then; you're arguing that `Set` and `Dictionary` should not conform to `Sequence` altogether. That's fine (it's also a direction that some of us explored off-list a while ago), but at this point in Swift's evolution, realistically, it's not within the realm of possible changes.

I am actually suggesting something slightly different. Basically, Set and Dictionary’s conformance to Collection would have a different implementation. They would conform to another protocol declaring that they are unordered. That protocol would fill in part of the conformance to sequence/collection using a default ordering, which is mostly arbitrary, but guaranteed to produce the same ordering for the same list of elements (even across collection types). This would be safer, but a tiny bit slower than what we have now (We could also potentially develop a way for collections like set to amortize the cost). For those who need to recover speed, the new protocol would also define a property which quickly returns a sequence/iterator using the internal ordering (I arbitrarily called it .arbitraryOrder).

I believe it would not be source breaking.

That is indeed something slightly different.

In an ideal world--and my initial understanding of what you were suggesting--Set and Dictionary would each have a member like `collection`, which would expose the underlying data as a `SetCollection` or `DictionaryCollection` that in turn would conform to `Collection`; meanwhile, Set and Dictionary themselves would not offer methods such as `prefix`, or indexing by subscript, which are not compatible with being unordered. For those who want a particular ordering, there'd be something like `collection(ordered areInIncreasingOrder: (T, T) -> Bool) -> {Set|Dictionary}Collection`.

What you suggest here instead would be minimally source-breaking. However, I'm unsure of where these guarantees provide benefit to justify the performance cost. Certainly not for `first` or `dropFirst(_:)`, which still yields an arbitrary result which doesn't make sense for something _unordered_. We *could* have an underscored customization point named something like `_customOrderingPass` that is only invoked from `elementsEqual` or other such methods to pre-rearrange the internal ordering of unordered collections in some deterministic way before comparison. Is that what you have in mind?

Something like that. Whatever we do, there will be a tradeoff between speed, correctness, and ergonomics.

My suggestion trades speed for correctness, and provides a way to recover speed through additional typing (which is slightly less ergonomic).

You haven't convinced me that this is at all improved in "correctness." It trades one arbitrary iteration order for another on a type that tries to model an unordered collection.

We could do something like you suggest. I don’t think the method would need to be underscored… the ordering pass could just be a method on the protocol which defines it as unordered. Then we could provide a special conformance for things where order really matters based on adherence to that protocol. That might be an acceptable tradeoff. It would give us speed at the cost of having the correct implementation being less ergonomic and more error prone (you have to remember to check that it is unordered and call the ordering method when it mattered).

I’d still be a bit worried that people would make incorrect generic algorithms based on expecting an order from unordered things, but at least it would be possible for them check and handle it correctly. I think I could get behind that tradeoff/compromise, given where we are in the swift process and Swift's obsession with speed (though I still slightly prefer the safer default). At least the standard library would handle all the things correctly, and that is what will affect the majority of programmers.

What is an example of such an "incorrect" generic algorithm that would be made correct by such a scheme?

To start with, the one you gave as an example at the beginning of this discussion: Two sets with identical elements which have different internal storage and thus give different orderings as sequences. You yourself have argued that the confusion around this is enough of a problem that we need to make a source-breaking change (renaming it) to warn people that the results of the ‘elementsEqual’ algorithm are undefined for sets and dictionaries.

No, I am arguing that the confusion about ‘elementsEqual’ is foremost a problem with its name; the result of this operation is not at all undefined for two sets but actually clearly defined: it returns true if two sets have the same elements in the same iteration order, which is a publicly observable behavior of sets (likewise dictionaries).

But it is a behavior which has absolutely no meaning at all because the order does not depend on the elements of the set but on the history of how the set has been reached its current state.
So why should I ever use this method on a set?
What is the use case?

One example: you can use it to check an instance of Set<Float> to determine if it has a NaN value. (The “obvious” way of doing it is not guaranteed to work since NaN != NaN.)

How would I do that? I'd rather expect to use a property isNaN on Float to do that.

set.elementsEqual(set)

I see why that would work (thanks to Set being a collection vs a sequence), but it still feels like a hack. I definitely wouldn’t want to be maintaining code with that in it. Especially when compared to something like:

set.contains(where: {$0.isNaN})

The purpose of protocols is to enable useful generic code. You cannot use isNaN for code that works on generic Collection, or for that matter, even code that works on Set where Element : Numeric.

Much generic code (for example, generic sorting) easily blows up when encountering NaN. If you want an algorithm that is robust enough to handle (or at least reject) that scenario, then you will need to check for it. It is not a “hack” but a very common and accepted way of determining the presence of NaN.

Jon_Hull · October 17, 2017, 5:54pm

That ordering can be arbitrary, but it shouldn’t leak internal representation such that the method used to create identical things affects the outcome of generic methods because of differences in internal representation.

It would be better to say that the iteration order is well-defined. That will almost always mean documented, and usually predictable though obviously e.g. RNGs and iterating in random order will not be predictable by design.

That's actually more semantically constrained than what Swift calls a `Collection` (which requires conforming types to be multi-pass and(?) finite). By contrast, Swift's `SpongeBob` protocol explicitly permits conforming single-pass, infinite, and/or unordered types.

I think you’re talking about Sequence here, I’ve lost track of your nonsense by now. Yes, the current Swift protocol named Sequence allows unordered types. You seem to keep asserting that but not actually addressing my argument, which is that allowing Sequences to be unordered with the current API is undesired and actively harmful, and should therefore be changed.

What is harmful about it?

After thinking about it, I think the harmful bit is that unordered sequences are leaking internal representation (In your example, this is causing people to be surprised when two sets with identical elements are generating different sequences/orderings based on how they were created). You are correct when you say that this problem is even true for for-in.

I would not say it is a problem. Rather, by definition, iteration involves retrieving one element after another; if you're allowed to do that with Set, then the elements of a Set are observably ordered in some way. Since it's not an OrderedSet--i.e., order doesn't matter--then the only sensible conclusion is that the order of elements obtained in a for...in loop must be arbitrary. If you think this is harmful, then you must believe that one should be prohibited from iterating over an instance of Set. Otherwise, Set is inescapably a Sequence by the Swift definition of Sequence. All extension methods on Sequence like drop(while:) are really just conveniences for common things that you can do with iterated access; to my mind, they're essentially just alternative ways of spelling various for...in loops.

I think an argument could be made that you shouldn’t be able to iterate over a set without first defining an ordering on it (even if that ordering is somewhat arbitrary). Maybe we have something like a “Sequenc(e)able” protocol which defines things which can be turned into a sequence when combined with some sort of ordering. One possible ordering could be the internal representation (At least in that case we are calling it out specifically). If I had to say “setA.arbitraryOrder.elementsEqual(setB.arbitraryOrder)” I would definitely be less surprised when it returns false even though setA == setB.

Well, that's a totally different direction, then; you're arguing that `Set` and `Dictionary` should not conform to `Sequence` altogether. That's fine (it's also a direction that some of us explored off-list a while ago), but at this point in Swift's evolution, realistically, it's not within the realm of possible changes.

I am actually suggesting something slightly different. Basically, Set and Dictionary’s conformance to Collection would have a different implementation. They would conform to another protocol declaring that they are unordered. That protocol would fill in part of the conformance to sequence/collection using a default ordering, which is mostly arbitrary, but guaranteed to produce the same ordering for the same list of elements (even across collection types). This would be safer, but a tiny bit slower than what we have now (We could also potentially develop a way for collections like set to amortize the cost). For those who need to recover speed, the new protocol would also define a property which quickly returns a sequence/iterator using the internal ordering (I arbitrarily called it .arbitraryOrder).

I believe it would not be source breaking.

That is indeed something slightly different.

In an ideal world--and my initial understanding of what you were suggesting--Set and Dictionary would each have a member like `collection`, which would expose the underlying data as a `SetCollection` or `DictionaryCollection` that in turn would conform to `Collection`; meanwhile, Set and Dictionary themselves would not offer methods such as `prefix`, or indexing by subscript, which are not compatible with being unordered. For those who want a particular ordering, there'd be something like `collection(ordered areInIncreasingOrder: (T, T) -> Bool) -> {Set|Dictionary}Collection`.

What you suggest here instead would be minimally source-breaking. However, I'm unsure of where these guarantees provide benefit to justify the performance cost. Certainly not for `first` or `dropFirst(_:)`, which still yields an arbitrary result which doesn't make sense for something _unordered_. We *could* have an underscored customization point named something like `_customOrderingPass` that is only invoked from `elementsEqual` or other such methods to pre-rearrange the internal ordering of unordered collections in some deterministic way before comparison. Is that what you have in mind?

Something like that. Whatever we do, there will be a tradeoff between speed, correctness, and ergonomics.

My suggestion trades speed for correctness, and provides a way to recover speed through additional typing (which is slightly less ergonomic).

You haven't convinced me that this is at all improved in "correctness." It trades one arbitrary iteration order for another on a type that tries to model an unordered collection.

We could do something like you suggest. I don’t think the method would need to be underscored… the ordering pass could just be a method on the protocol which defines it as unordered. Then we could provide a special conformance for things where order really matters based on adherence to that protocol. That might be an acceptable tradeoff. It would give us speed at the cost of having the correct implementation being less ergonomic and more error prone (you have to remember to check that it is unordered and call the ordering method when it mattered).

I’d still be a bit worried that people would make incorrect generic algorithms based on expecting an order from unordered things, but at least it would be possible for them check and handle it correctly. I think I could get behind that tradeoff/compromise, given where we are in the swift process and Swift's obsession with speed (though I still slightly prefer the safer default). At least the standard library would handle all the things correctly, and that is what will affect the majority of programmers.

What is an example of such an "incorrect" generic algorithm that would be made correct by such a scheme?

To start with, the one you gave as an example at the beginning of this discussion: Two sets with identical elements which have different internal storage and thus give different orderings as sequences. You yourself have argued that the confusion around this is enough of a problem that we need to make a source-breaking change (renaming it) to warn people that the results of the ‘elementsEqual’ algorithm are undefined for sets and dictionaries.

No, I am arguing that the confusion about ‘elementsEqual’ is foremost a problem with its name; the result of this operation is not at all undefined for two sets but actually clearly defined: it returns true if two sets have the same elements in the same iteration order, which is a publicly observable behavior of sets (likewise dictionaries).

But it is a behavior which has absolutely no meaning at all because the order does not depend on the elements of the set but on the history of how the set has been reached its current state.
So why should I ever use this method on a set?
What is the use case?

One example: you can use it to check an instance of Set<Float> to determine if it has a NaN value. (The “obvious” way of doing it is not guaranteed to work since NaN != NaN.)

How would I do that? I'd rather expect to use a property isNaN on Float to do that.

set.elementsEqual(set)

I see why that would work (thanks to Set being a collection vs a sequence), but it still feels like a hack. I definitely wouldn’t want to be maintaining code with that in it. Especially when compared to something like:

set.contains(where: {$0.isNaN})

The purpose of protocols is to enable useful generic code. You cannot use isNaN for code that works on generic Collection, or for that matter, even code that works on Set where Element : Numeric.

Much generic code (for example, generic sorting) easily blows up when encountering NaN. If you want an algorithm that is robust enough to handle (or at least reject) that scenario, then you will need to check for it. It is not a “hack” but a very common and accepted way of determining the presence of NaN.

Just because a hack is commonly accepted or common doesn’t mean it isn’t a hack. Using elementsEqual in this way (in a generic context which may or may not involve floats) is definitely a hack, and is likely to bite you at some point.

I don’t see why a non-source-breaking change is suddenly off-limits.

But more than that, any generic algorithm which is assuming that the sequence is coming from an ordered source (i.e. many things using first/last). Some uses of first are ok because the programmer actually means ‘any’, but anywhere where they actually mean first/last may be problematic.

Such as...?

Currently, there is no way to test for ordered-ness, so there is no way for even a careful programmer to mitigate this problem. By adding a protocol which states that something is unordered, we can either branch on it, or create a separate version of an algorithm for things which conform.

It is clearly the case that Swift’s protocol hierarchy fits sets and collections imperfectly; however, it is in the nature of modeling that imperfections are present. The question is not whether it is possible to incur performance, API surface area, and other trade-offs to make the model more faithful, but rather whether this usefully solves any problem. What is the problem being mitigated? As I write above, Swift’s Set and Dictionary types meet the semantic requirements for Collection and moonlight as ordered collections. What is a generic algorithm on an ordered collection that is “not OK” for Set and Dictionary? (“elementsEqual”, as I’ve said, is not such an example.)

On the contrary, `elementsEqual` is exactly such an example, because it makes no sense to use it on a Set.

let s1 = Set([1,2,3,4,5,6])
let s2 = Set([6,5,4,3,2,1])

Both sets have different iteration orders. Comparing those sets with some other collection using `elementsEqual` will give no meaningful result because the order - and therefore the result of `elementsEqual` - is in effect random.

No, it is not such an example; it’s misleadingly named but works correctly—that is, its behavior matches exactly the documented behavior, which relies on only the semantic guarantees of Sequence, which Set correctly fulfills.

Fulfills to the letter. Again, what can you do with it if the result is random??

The result is not random.

It is undefined though. As you said earlier, by the guarantees we have been given, it may shift over different builds/runs of a program. Thus in one run, it might return true and then false in another (without changing our code). As Greg pointed out, it is also possible with the guarantees we are given, for set/dict to have different orderings with copies of themselves. (It will happen for sure when deep copying a dictionary with reference-type keys).

As I wrote to Greg, if that is a possibility for Set, then the implementation is problematic for conformance to Collection and needs to be re-examined.

So why not fix both things at once with a single change?

I’m not sure why you claim the order of a dictionary will definitiely change on deep copying. But in any case, we are not talking about deep copying here, or at least, I’m not; we’re talking about the notional copying involved in the semantics of passing a set as an argument.

Because the ordering of some dictionaries depends on the memory address of the key. There is no way to copy the key in those cases without changing the ordering. You can watch their ordering shift in the debugger from run to run.

As I keep saying, relying on the behavior of a leaking internal implementation is a bad plan.

Again, it is not a leaking internal implementation. It is a public API guarantee.

The public API guarantee is that there will be some ordering that is stable for a particular instance on a particular run (as long as it is not mutated/copied). The actual ordering is undefined.

Any generic code which depends on a particular ordering will have output which is undefined. Just because a function returns a seemingly predictable value doesn’t mean it is entirely deterministic. There are lots of vulnerabilities in C/C++ due to reliance on undefined behavior.

We should add an additional guarantee to set/dict that the order returned will be the same for the same contents regardless of history (but can be otherwise arbitrary). That will fix the behavior for algorithms like elementsEqual (i.e. They will return the same result across builds/runs). It will also implicitly provide as a result, the constraint you were arguing is needed across copies of a collection type. I agree that that is an important guarantee. Why not fix both issues with a single non-source-breaking change?

As I wrote, the behavior of elementsEqual is not broken and does not require fixing.

Your argument for that is tautological though. You could argue for keeping any bug in the codebase using the same reasoning.

Why was elementsEqual created? Isn’t it meant to check equality of two sequences in a generic context where == isn’t available? Isn’t it problematic that two equivalent sets may not be equivalent in a generic context?

What is the benefit of the guarantee you propose that justifies the performance cost?

The benefit is that generic algorithms which rely on ordering of unordered things would be much more robust. It would solve your copy issue as well.

You are acting like there is a huge performance cost. There isn’t. For Set, it would still have the traditional performance characteristics for Sets. Any slowdown (by un-cutting corners) would be amortized over insertions. (note: I am talking option #3 from my summary here, not #5, which was the one with an actual performance cost)

Why is the source-breaking change of renaming things better?

Because, in my analysis, the problem is that the method is incorrectly named. The problem affects all types that conform to Sequence and not just Set and Dictionary; elementsEqual is a distinct function from ==, and it must either continue to be distinct or cease to exist, but its name does nothing to clarify any distinction.

It also won’t fix the original problem you mention as motivation. People will still be surprised at equivalent sets having different iteration orders regardless of what the function is named.

Thanks,
Jon

···

On Oct 17, 2017, at 9:34 AM, Xiaodi Wu <xiaodi.wu@gmail.com> wrote:
On Tue, Oct 17, 2017 at 09:42 Jonathan Hull <jhull@gbis.com <mailto:jhull@gbis.com>> wrote:

On Oct 17, 2017, at 5:44 AM, Xiaodi Wu <xiaodi.wu@gmail.com <mailto:xiaodi.wu@gmail.com>> wrote:
On Tue, Oct 17, 2017 at 00:56 Thorsten Seitz <tseitz42@icloud.com <mailto:tseitz42@icloud.com>> wrote:
Am 17.10.2017 um 00:13 schrieb Xiaodi Wu <xiaodi.wu@gmail.com <mailto:xiaodi.wu@gmail.com>>:

On Mon, Oct 16, 2017 at 14:21 Thorsten Seitz <tseitz42@icloud.com <mailto:tseitz42@icloud.com>> wrote:

Am 16.10.2017 um 16:20 schrieb Xiaodi Wu via swift-evolution <swift-evolution@swift.org <mailto:swift-evolution@swift.org>>:
On Mon, Oct 16, 2017 at 05:48 Jonathan Hull <jhull@gbis.com <mailto:jhull@gbis.com>> wrote:

On Oct 15, 2017, at 9:58 PM, Xiaodi Wu <xiaodi.wu@gmail.com <mailto:xiaodi.wu@gmail.com>> wrote:
On Sun, Oct 15, 2017 at 8:51 PM, Jonathan Hull <jhull@gbis.com <mailto:jhull@gbis.com>> wrote:

On Oct 14, 2017, at 10:48 PM, Xiaodi Wu <xiaodi.wu@gmail.com <mailto:xiaodi.wu@gmail.com>> wrote:

trs · October 17, 2017, 4:01pm

To start with, the one you gave as an example at the beginning of this discussion: Two sets with identical elements which have different internal storage and thus give different orderings as sequences. You yourself have argued that the confusion around this is enough of a problem that we need to make a source-breaking change (renaming it) to warn people that the results of the ‘elementsEqual’ algorithm are undefined for sets and dictionaries.

No, I am arguing that the confusion about ‘elementsEqual’ is foremost a problem with its name; the result of this operation is not at all undefined for two sets but actually clearly defined: it returns true if two sets have the same elements in the same iteration order, which is a publicly observable behavior of sets (likewise dictionaries).

But that iteration order is undefined and could easily change due to changes in the private/internal structure of sets/dictionaries. Algorithms that rely on that “publicly observable behavior” (i.e. leaking of internals) will suddenly break.

And an algorithm in which such “sudden breakage” would occur is…?

Here are a few off the top of my head:

func hasPrefix(Sequence)->Bool
func hasSuffix(Sequence)->Bool
func containsSubsequence(Sequence)->Bool

What do these methods mean with regards to Set’s “publicly observable behavior”?

In what way do these algorithms break? They would continue to determine--correctly--whether an instance of Set, when iterated, begins with, ends with, or contains (respectively) a subsequence that matches the argument.

Why do you not answe the question, what these methods *mean* for a Set?
Still waiting for a use case.

The method means exactly what I just said: the iteration order of one set matches the iteration order of another sequence. I’ve given you one use case and others have given more.

Sorry, the use case you gave was just a clever trick instead of using `contains` and Float.isNaN. No one else has provided a use case for `elementsEqual` yet.

-Thorsten

···

Am 17.10.2017 um 14:46 schrieb Xiaodi Wu <xiaodi.wu@gmail.com>:

On Tue, Oct 17, 2017 at 01:03 Thorsten Seitz <tseitz42@icloud.com> wrote:

Am 17.10.2017 um 01:43 schrieb Xiaodi Wu via swift-evolution <swift-evolution@swift.org>:

On Mon, Oct 16, 2017 at 6:10 PM, Jonathan Hull <jhull@gbis.com> wrote:

On Oct 16, 2017, at 1:05 PM, Xiaodi Wu <xiaodi.wu@gmail.com> wrote:

On Mon, Oct 16, 2017 at 10:49 Jonathan Hull <jhull@gbis.com> wrote:

On Oct 16, 2017, at 7:20 AM, Xiaodi Wu <xiaodi.wu@gmail.com> wrote:

-Thorsten

_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org
https://lists.swift.org/mailman/listinfo/swift-evolution

gwendal.roue · October 17, 2017, 7:03am

Modeling is, by definition, imperfect. The question is, what imperfect model is most useful _to Swift_. The idea is that conforming Set and Dictionary to Collection is on balance more useful than not doing so; that having two protocols, Sequence and Collection, is on balance more useful than having one or three, and that the set of semantic guarantees of Collection are on balance more useful than other possible sets of semantic guarantees.

That is your idea which is disputed and underlined with arguments whereas you keep repeating that Set behaves as dictated by its conformance without giving use cases why this should be useful.

Hello,

You can't *control* the ordering of a set or a dictionary, but you can still *rely* on it.

For example, to find a key in a dictionary that is associated a given value, you can rely on the fact that a dictionary's order is guaranteed to be stable, and that on top of that its indexes can address the dictionary itself, but also its keys and values sequences. The code below has no bug;

let dict = ["a": "foo", "b": "bar", "c": "needle"]

// Find a key associated with "needle"
if let index = dict.values.index(of: "needle") {
let key = dict.keys[index]
print(key) // prints "c"
}

It's more difficult to find a use case for set's ordering and indexes. But since you ask, here is an example. The goal is to find any element which is not equal to another value, in any collection:

extension Collection where Element: Equatable {
    /// Returns any element which is not equal to the given element
    func anyElement(notEqualTo v: Element) -> Element? {
        if let i = index(of: v) {
            if let alt = index(i, offsetBy: 1, limitedBy: endIndex), alt != endIndex {
                return self[alt]
            }
            if i == startIndex {
                return nil
            }
            return first
        }
        return first
    }
}

Set([1, 2, 3]).anyElement(notEqualTo: 1) // 2 or 3
Set([1, 2]).anyElement(notEqualTo: 1) // 2
Set([1]).anyElement(notEqualTo: 1) // nil
Set([2]).anyElement(notEqualTo: 1) // 2
Set().anyElement(notEqualTo: 1) // nil

That *can* be useful, isn't it?

Gwendal Roué

Michael_Ilseman · October 17, 2017, 6:47pm

Because, in my analysis, the problem is that the method is incorrectly named. The problem affects all types that conform to Sequence and not just Set and Dictionary; elementsEqual is a distinct function from ==, and it must either continue to be distinct or cease to exist, but its name does nothing to clarify any distinction.

In my analysis, the problem is the method's implementation. As I see it, the only use for `elementsEqual` is as a replacement for `==` when two objects are different types (or not known to be the same)—equal elements, and IF the sequences have an order, in the same order. Could you provide an example where `elementsEqual` randomly returning either true or false depending on internal state alone is a legitimate and desirable result?

It doesn’t randomly return true or false, it consistently returns true or false for the *same* pair of Sequences. What *same* means, of course, is complicated and exists at two levels (as we have two ways of talking about *same*).

I apologize for not reading every email in depth in this thread (they are coming in faster than I can parse them), but let me try to present motivation for this and hopefully provide more shared understanding.

We have two forms of equality we’re talking about: equality of Sequence and equality of the elements of Sequences in their respective ordering. `==` covers the former, and I’ll use the existing (harmful) name of `elementsEqual` for the latter.

`==` conveys substitutability of the two Sequences. This does not necessarily entail anything about their elements, how those elements are ordered, etc., it just means two Sequences are substitutable. `elementsEqual` means that the two Sequences produce substitutable elements. These are different concepts and both are independently useful.

Cases:

1. Two Sequences are substitutable and produce substitutable elements when iterated. `==` and `elementsEqual` both return true.

Example: Two arrays with the same elements in the same order.

2. Two Sequences are substitutable, but do not produce substitutable elements when iterated. `==` returns true, while `elementsEqual` returns false.

Example: Two Sets that contain the same elements but in a different order.

Contrived Example: Two Lorem Ipsum generators are the same generator (referentially equal, substitutable for the purposes of my library), but they sample the user’s current battery level (global state) each time they produce text to decide how fancy to make the faux Latin. They’re substitutable, but don’t generate the same sequence.

3. Two Sequences are not substitutable, but produce substitutable elements when iterated. `==` returns false, while `elementsEqual` returns true.

Example: Consider two sequences that have differing identity. `==` operates on an identity level, `elementsEqual` operates at an element level.

Contrived Example: InfiniteMonkeys and Shakespeare both produce the same sonnet, but they’re not substitutable for my library’s purposes.

4. Two Sequences are not substitutable and don’t produce substitutable elements when iterated. `==` and `elementsEqual` both return false.

Example: `[1,2,3]` compared to `[4,5,6]`

It is true that situations #2 and #3 are a little harder to grok, but they are what illustrate the subtle difference at hand. I think situation #2 is the most confusing, and has been the primary focus of this thread as Set exists and exhibits it.

Now, onto naming. `elementsEqual` is a very poor choice of name for the concept of equality of elements in their respective orderings, as it doesn’t highlight the “in their respective orderings” part. `lexicographicallyEqual` highlights the ordering much better, as “abc” is not lexicographically equal to “cba” despite having equal elements. I think it is clearly an improvement over the status quo. I like something a little more explicit (e.g. `elementsOrderedEqual`), personally, but I don’t care that strongly. I’m just glad to see `elementsEqual` getting some clarification.

···

On Oct 17, 2017, at 10:15 AM, Kevin Nattinger via swift-evolution <swift-evolution@swift.org> wrote:

Thanks,
Jon
_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org <mailto:swift-evolution@swift.org>
https://lists.swift.org/mailman/listinfo/swift-evolution

_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org
https://lists.swift.org/mailman/listinfo/swift-evolution

xwu · October 17, 2017, 12:52pm

That ordering can be arbitrary, but it shouldn’t leak internal

representation such that the method used to create identical things affects
the outcome of generic methods because of differences in internal
representation.

It would be better to say that the iteration order is
well-defined. That will almost always mean documented, and usually
predictable though obviously e.g. RNGs and iterating in random order will
not be predictable by design.

That's actually more semantically constrained than what Swift
calls a `Collection` (which requires conforming types to be multi-pass
and(?) finite). By contrast, Swift's `SpongeBob` protocol explicitly
permits conforming single-pass, infinite, and/or unordered types.

I think you’re talking about Sequence here, I’ve lost track of
your nonsense by now. Yes, the current Swift protocol named Sequence allows
unordered types. You seem to keep asserting that but not actually
addressing my argument, which is *that allowing Sequences to be
unordered with the current API is undesired and actively harmful, and
should* *therefore** be changed*.

What is harmful about it?

After thinking about it, I think the harmful bit is that unordered
sequences are leaking internal representation (In your example, this is
causing people to be surprised when two sets with identical elements are
generating different sequences/orderings based on how they were created).
You are correct when you say that this problem is even true for for-in.

I would not say it is a problem. Rather, by definition, iteration
involves retrieving one element after another; if you're allowed to do that
with Set, then the elements of a Set are observably ordered in some way.
Since it's not an OrderedSet--i.e., order doesn't matter--then the only
sensible conclusion is that the order of elements obtained in a for...in
loop must be arbitrary. If you think this is harmful, then you must believe
that one should be prohibited from iterating over an instance of Set.
Otherwise, Set is inescapably a Sequence by the Swift definition of
Sequence. All extension methods on Sequence like drop(while:) are really
just conveniences for common things that you can do with iterated access;
to my mind, they're essentially just alternative ways of spelling various
for...in loops.

I think an argument could be made that you shouldn’t be able to
iterate over a set without first defining an ordering on it (even if that
ordering is somewhat arbitrary). Maybe we have something like a
“Sequenc(e)able” protocol which defines things which can be turned into a
sequence when combined with some sort of ordering. One possible ordering
could be the internal representation (At least in that case we are calling
it out specifically). If I had to say
“setA.arbitraryOrder.elementsEqual(setB.arbitraryOrder)” I would definitely
be less surprised when it returns false even though setA == setB.

Well, that's a totally different direction, then; you're arguing that
`Set` and `Dictionary` should not conform to `Sequence` altogether. That's
fine (it's also a direction that some of us explored off-list a while ago),
but at this point in Swift's evolution, realistically, it's not within the
realm of possible changes.

I am actually suggesting something slightly different. Basically,
Set and Dictionary’s conformance to Collection would have a different
implementation. They would conform to another protocol declaring that they
are unordered. That protocol would fill in part of the conformance to
sequence/collection using a default ordering, which is mostly arbitrary,
but guaranteed to produce the same ordering for the same list of elements
(even across collection types). This would be safer, but a tiny bit slower
than what we have now (We could also potentially develop a way for
collections like set to amortize the cost). For those who need to recover
speed, the new protocol would also define a property which quickly returns
a sequence/iterator using the internal ordering (I arbitrarily called it
.arbitraryOrder).

I believe it would not be source breaking.

That is indeed something slightly different.

In an ideal world--and my initial understanding of what you were
suggesting--Set and Dictionary would each have a member like `collection`,
which would expose the underlying data as a `SetCollection` or
`DictionaryCollection` that in turn would conform to `Collection`;
meanwhile, Set and Dictionary themselves would not offer methods such as
`prefix`, or indexing by subscript, which are not compatible with being
unordered. For those who want a particular ordering, there'd be something
like `collection(ordered areInIncreasingOrder: (T, T) -> Bool) ->
{Set|Dictionary}Collection`.

What you suggest here instead would be minimally source-breaking.
However, I'm unsure of where these guarantees provide benefit to justify
the performance cost. Certainly not for `first` or `dropFirst(_:)`, which
still yields an arbitrary result which doesn't make sense for something
_unordered_. We *could* have an underscored customization point named
something like `_customOrderingPass` that is only invoked from
`elementsEqual` or other such methods to pre-rearrange the internal
ordering of unordered collections in some deterministic way before
comparison. Is that what you have in mind?

Something like that. Whatever we do, there will be a tradeoff between
speed, correctness, and ergonomics.

My suggestion trades speed for correctness, and provides a way to
recover speed through additional typing (which is slightly less ergonomic).

You haven't convinced me that this is at all improved in "correctness."
It trades one arbitrary iteration order for another on a type that tries to
model an unordered collection.

We could do something like you suggest. I don’t think the method would
need to be underscored… the ordering pass could just be a method on the
protocol which defines it as unordered. Then we could provide a special
conformance for things where order really matters based on adherence to
that protocol. That might be an acceptable tradeoff. It would give us
speed at the cost of having the correct implementation being less ergonomic
and more error prone (you have to remember to check that it is unordered
and call the ordering method when it mattered).

I’d still be a bit worried that people would make incorrect generic
algorithms based on expecting an order from unordered things, but at least
it would be possible for them check and handle it correctly. I think I
could get behind that tradeoff/compromise, given where we are in the swift
process and Swift's obsession with speed (though I still slightly prefer
the safer default). At least the standard library would handle all the
things correctly, and that is what will affect the majority of programmers.

What is an example of such an "incorrect" generic algorithm that would
be made correct by such a scheme?

To start with, the one you gave as an example at the beginning of this
discussion: Two sets with identical elements which have different internal
storage and thus give different orderings as sequences. You yourself have
argued that the confusion around this is enough of a problem that we need
to make a source-breaking change (renaming it) to warn people that the
results of the ‘elementsEqual’ algorithm are undefined for sets and
dictionaries.

No, I am arguing that the confusion about ‘elementsEqual’ is foremost a
problem with its name; the result of this operation is not at all undefined
for two sets but actually clearly defined: it returns true if two sets have
the same elements in the same iteration order, which is a publicly
observable behavior of sets (likewise dictionaries).

How is the iteration order of an unordered set or dictionary “publicly
observable”? If either is implemented such that it can asynchronously
optimize its storage (maybe by rebalancing a tree or merging two
non-contiguous array segments or something), its iteration order could
change *without* changing what values it contains. Seems like
consecutive calls to “elementsEquals” (or whatever we’re calling it) should
return the same answer, if we don’t add, remove, or mutate elements.

Sets are values. If you add, remove, or mutate any elements you have a
different Set and thus a potentially different ordering of elements.

From the “value semantics” PoV, yes. But from the “unordered collection
of values” PoV, Sets/Dictionaries, being unordered, are semantically free
to rearrange the in-memory ordering of their elements *without* user
intervention.

No, they are not semantically free to do so. The semantics of Collection
forbid it, because the iteration order must be multi-pass. As long as the
value is unchanged, the iteration order is unchanged. That is a documented,
public guarantee of the API.

If the semantics of unordered collections (with a lowercase c) and of
`Collection` (with an uppercase C) differ on such a basic level, then why
are we trying to force them together? I mean, I understand that
source-compatibility is important, but so is correctly modeling that which
we claim to model.

Modeling is, by definition, imperfect. The question is, what imperfect
model is most useful _to Swift_. The idea is that conforming Set and
Dictionary to Collection is on balance more useful than not doing so; that
having two protocols, Sequence and Collection, is on balance more useful
than having one or three, and that the set of semantic guarantees of
Collection are on balance more useful than other possible sets of semantic
guarantees.

That is your idea

It most certainly is not my idea.

which is disputed

Which part?

and underlined with arguments whereas you keep repeating that Set behaves

as dictated by its conformance without giving use cases why this should be
useful.

Where what is useful? Set’s conformance to Collection? Collection’s
semantic guarantees? Having two protocols named Sequence and Collection?

Set having elementsEqual is an emergent property of these three factors. It
doesn’t have to be independently useful (though it isn’t useless). The
overall design which entails its existence merely has to be superior to the
alternative design that doesn’t include it.

···

On Tue, Oct 17, 2017 at 01:24 Thorsten Seitz <tseitz42@icloud.com> wrote:

Am 17.10.2017 um 03:20 schrieb Xiaodi Wu via swift-evolution < > swift-evolution@swift.org>:
On Mon, Oct 16, 2017 at 8:03 PM, David Sweeris <davesweeris@mac.com> > wrote:

On Oct 16, 2017, at 1:07 PM, Xiaodi Wu <xiaodi.wu@gmail.com> wrote:
On Mon, Oct 16, 2017 at 13:15 David Sweeris <davesweeris@mac.com> wrote:

On Oct 16, 2017, at 09:21, Michael Ilseman <milseman@apple.com> wrote:
On Oct 16, 2017, at 8:46 AM, David Sweeris via swift-evolution < >>> swift-evolution@swift.org> wrote:
On Oct 16, 2017, at 07:20, Xiaodi Wu via swift-evolution < >>> swift-evolution@swift.org> wrote:
On Mon, Oct 16, 2017 at 05:48 Jonathan Hull <jhull@gbis.com> wrote:

On Oct 15, 2017, at 9:58 PM, Xiaodi Wu <xiaodi.wu@gmail.com> wrote:
On Sun, Oct 15, 2017 at 8:51 PM, Jonathan Hull <jhull@gbis.com> wrote:

On Oct 14, 2017, at 10:48 PM, Xiaodi Wu <xiaodi.wu@gmail.com> wrote:

-Thorsten

_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org
https://lists.swift.org/mailman/listinfo/swift-evolution

xwu · October 17, 2017, 4:36pm

To start with, the one you gave as an example at the beginning of this

discussion: Two sets with identical elements which have different internal
storage and thus give different orderings as sequences. You yourself have
argued that the confusion around this is enough of a problem that we need
to make a source-breaking change (renaming it) to warn people that the
results of the ‘elementsEqual’ algorithm are undefined for sets and
dictionaries.

No, I am arguing that the confusion about ‘elementsEqual’ is foremost a
problem with its name; the result of this operation is not at all undefined
for two sets but actually clearly defined: it returns true if two sets have
the same elements in the same iteration order, which is a publicly
observable behavior of sets (likewise dictionaries).

But that iteration order is undefined and could easily change due to
changes in the private/internal structure of sets/dictionaries. Algorithms
that rely on that “publicly observable behavior” (i.e. leaking of
internals) will suddenly break.

And an algorithm in which such “sudden breakage” would occur is…?

Here are a few off the top of my head:

func hasPrefix(Sequence)->Bool
func hasSuffix(Sequence)->Bool
func containsSubsequence(Sequence)->Bool

What do these methods mean with regards to Set’s “publicly observable
behavior”?

In what way do these algorithms break? They would continue to
determine--correctly--whether an instance of Set, when iterated, begins
with, ends with, or contains (respectively) a subsequence that matches the
argument.

Why do you not answe the question, what these methods *mean* for a Set?
Still waiting for a use case.

The method means exactly what I just said: the iteration order of one set
matches the iteration order of another sequence. I’ve given you one use
case and others have given more.

Sorry, the use case you gave was just a clever trick instead of using
`contains` and Float.isNaN. No one else has provided a use case for
`elementsEqual` yet.

The purpose of protocols is to enable useful generic code. You cannot use
isNaN in generic code that operates on Collection (or even Set where
Element : Numeric) to prevent NaN from blowing up your sort. Comparing a
value to itself is not a “clever trick” but a common and ordinary method of
checking for NaN.

···

On Tue, Oct 17, 2017 at 11:01 Thorsten Seitz <tseitz42@icloud.com> wrote:

Am 17.10.2017 um 14:46 schrieb Xiaodi Wu <xiaodi.wu@gmail.com>:
On Tue, Oct 17, 2017 at 01:03 Thorsten Seitz <tseitz42@icloud.com> wrote:

Am 17.10.2017 um 01:43 schrieb Xiaodi Wu via swift-evolution < >> swift-evolution@swift.org>:
On Mon, Oct 16, 2017 at 6:10 PM, Jonathan Hull <jhull@gbis.com> wrote:

On Oct 16, 2017, at 1:05 PM, Xiaodi Wu <xiaodi.wu@gmail.com> wrote:
On Mon, Oct 16, 2017 at 10:49 Jonathan Hull <jhull@gbis.com> wrote:

On Oct 16, 2017, at 7:20 AM, Xiaodi Wu <xiaodi.wu@gmail.com> wrote:

-Thorsten

-Thorsten

_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org
https://lists.swift.org/mailman/listinfo/swift-evolution

xwu · October 17, 2017, 6:46pm

That ordering can be arbitrary, but it shouldn’t leak internal

representation such that the method used to create identical things affects
the outcome of generic methods because of differences in internal
representation.

It would be better to say that the iteration order is
well-defined. That will almost always mean documented, and usually
predictable though obviously e.g. RNGs and iterating in random order will
not be predictable by design.

That's actually more semantically constrained than what Swift
calls a `Collection` (which requires conforming types to be multi-pass
and(?) finite). By contrast, Swift's `SpongeBob` protocol explicitly
permits conforming single-pass, infinite, and/or unordered types.

I think you’re talking about Sequence here, I’ve lost track of
your nonsense by now. Yes, the current Swift protocol named Sequence allows
unordered types. You seem to keep asserting that but not actually
addressing my argument, which is *that allowing Sequences to be
unordered with the current API is undesired and actively harmful, and
should* *therefore** be changed*.

What is harmful about it?

After thinking about it, I think the harmful bit is that unordered
sequences are leaking internal representation (In your example, this is
causing people to be surprised when two sets with identical elements are
generating different sequences/orderings based on how they were created).
You are correct when you say that this problem is even true for for-in.

I would not say it is a problem. Rather, by definition, iteration
involves retrieving one element after another; if you're allowed to do that
with Set, then the elements of a Set are observably ordered in some way.
Since it's not an OrderedSet--i.e., order doesn't matter--then the only
sensible conclusion is that the order of elements obtained in a for...in
loop must be arbitrary. If you think this is harmful, then you must believe
that one should be prohibited from iterating over an instance of Set.
Otherwise, Set is inescapably a Sequence by the Swift definition of
Sequence. All extension methods on Sequence like drop(while:) are really
just conveniences for common things that you can do with iterated access;
to my mind, they're essentially just alternative ways of spelling various
for...in loops.

I think an argument could be made that you shouldn’t be able to
iterate over a set without first defining an ordering on it (even if that
ordering is somewhat arbitrary). Maybe we have something like a
“Sequenc(e)able” protocol which defines things which can be turned into a
sequence when combined with some sort of ordering. One possible ordering
could be the internal representation (At least in that case we are calling
it out specifically). If I had to say
“setA.arbitraryOrder.elementsEqual(setB.arbitraryOrder)” I would definitely
be less surprised when it returns false even though setA == setB.

Well, that's a totally different direction, then; you're arguing
that `Set` and `Dictionary` should not conform to `Sequence` altogether.
That's fine (it's also a direction that some of us explored off-list a
while ago), but at this point in Swift's evolution, realistically, it's not
within the realm of possible changes.

I am actually suggesting something slightly different. Basically,
Set and Dictionary’s conformance to Collection would have a different
implementation. They would conform to another protocol declaring that they
are unordered. That protocol would fill in part of the conformance to
sequence/collection using a default ordering, which is mostly arbitrary,
but guaranteed to produce the same ordering for the same list of elements
(even across collection types). This would be safer, but a tiny bit slower
than what we have now (We could also potentially develop a way for
collections like set to amortize the cost). For those who need to recover
speed, the new protocol would also define a property which quickly returns
a sequence/iterator using the internal ordering (I arbitrarily called it
.arbitraryOrder).

I believe it would not be source breaking.

That is indeed something slightly different.

In an ideal world--and my initial understanding of what you were
suggesting--Set and Dictionary would each have a member like `collection`,
which would expose the underlying data as a `SetCollection` or
`DictionaryCollection` that in turn would conform to `Collection`;
meanwhile, Set and Dictionary themselves would not offer methods such as
`prefix`, or indexing by subscript, which are not compatible with being
unordered. For those who want a particular ordering, there'd be something
like `collection(ordered areInIncreasingOrder: (T, T) -> Bool) ->
{Set|Dictionary}Collection`.

What you suggest here instead would be minimally source-breaking.
However, I'm unsure of where these guarantees provide benefit to justify
the performance cost. Certainly not for `first` or `dropFirst(_:)`, which
still yields an arbitrary result which doesn't make sense for something
_unordered_. We *could* have an underscored customization point named
something like `_customOrderingPass` that is only invoked from
`elementsEqual` or other such methods to pre-rearrange the internal
ordering of unordered collections in some deterministic way before
comparison. Is that what you have in mind?

Something like that. Whatever we do, there will be a tradeoff
between speed, correctness, and ergonomics.

My suggestion trades speed for correctness, and provides a way to
recover speed through additional typing (which is slightly less ergonomic).

You haven't convinced me that this is at all improved in
"correctness." It trades one arbitrary iteration order for another on a
type that tries to model an unordered collection.

We could do something like you suggest. I don’t think the method
would need to be underscored… the ordering pass could just be a method on
the protocol which defines it as unordered. Then we could provide a
special conformance for things where order really matters based on
adherence to that protocol. That might be an acceptable tradeoff. It
would give us speed at the cost of having the correct implementation being
less ergonomic and more error prone (you have to remember to check that it
is unordered and call the ordering method when it mattered).

I’d still be a bit worried that people would make incorrect generic
algorithms based on expecting an order from unordered things, but at least
it would be possible for them check and handle it correctly. I think I
could get behind that tradeoff/compromise, given where we are in the swift
process and Swift's obsession with speed (though I still slightly prefer
the safer default). At least the standard library would handle all the
things correctly, and that is what will affect the majority of programmers.

What is an example of such an "incorrect" generic algorithm that would
be made correct by such a scheme?

To start with, the one you gave as an example at the beginning of this
discussion: Two sets with identical elements which have different internal
storage and thus give different orderings as sequences. You yourself have
argued that the confusion around this is enough of a problem that we need
to make a source-breaking change (renaming it) to warn people that the
results of the ‘elementsEqual’ algorithm are undefined for sets and
dictionaries.

No, I am arguing that the confusion about ‘elementsEqual’ is foremost a
problem with its name; the result of this operation is not at all undefined
for two sets but actually clearly defined: it returns true if two sets have
the same elements in the same iteration order, which is a publicly
observable behavior of sets (likewise dictionaries).

But it is a behavior which has absolutely no meaning at all because the
order does not depend on the elements of the set but on the history of how
the set has been reached its current state.
So why should I ever use this method on a set?
What is the use case?

One example: you can use it to check an instance of Set<Float> to
determine if it has a NaN value. (The “obvious” way of doing it is not
guaranteed to work since NaN != NaN.)

How would I do that? I'd rather expect to use a property isNaN on Float
to do that.

set.elementsEqual(set)

I see why that would work (thanks to Set being a collection vs a
sequence), but it still feels like a hack. I definitely wouldn’t want to
be maintaining code with that in it. Especially when compared to something
like:

set.contains(where: {$0.isNaN})

The purpose of protocols is to enable useful generic code. You cannot use
isNaN for code that works on generic Collection, or for that matter, even
code that works on Set where Element : Numeric.

Much generic code (for example, generic sorting) easily blows up when
encountering NaN. If you want an algorithm that is robust enough to handle
(or at least reject) that scenario, then you will need to check for it. It
is not a “hack” but a very common and accepted way of determining the
presence of NaN.

Just because a hack is commonly accepted or common doesn’t mean it isn’t a
hack.

It’s not a “hack.” NaN is required by IEEE fiat not to be equivalent to
itself. Asking whether a value is reflexively equivalent to itself is
guaranteed to be sufficient to detect the presence of NaN in all
IEEE-compliant contexts, which is to say all past, present, and future
versions of Swift.

Using elementsEqual in this way (in a generic context which may or may not

involve floats) is definitely a hack, and is likely to bite you at some
point.

How is it definitely a hack when it is blessed by IEEE, and how can it bite
me?

I don’t see why a non-source-breaking change is suddenly off-limits.

But more than that, any generic algorithm which is assuming that the
sequence is coming from an ordered source (i.e. many things using
first/last). Some uses of first are ok because the programmer actually
means ‘any’, but anywhere where they actually mean first/last may be
problematic.

Such as...?

Currently, there is no way to test for ordered-ness, so there is no way

for even a careful programmer to mitigate this problem. By adding a
protocol which states that something is unordered, we can either branch on
it, or create a separate version of an algorithm for things which conform.

It is clearly the case that Swift’s protocol hierarchy fits sets and
collections imperfectly; however, it is in the nature of modeling that
imperfections are present. The question is not whether it is possible to
incur performance, API surface area, and other trade-offs to make the model
more faithful, but rather whether this usefully solves any problem. What is
the problem being mitigated? As I write above, Swift’s Set and Dictionary
types meet the semantic requirements for Collection and moonlight as
ordered collections. What is a generic algorithm on an ordered collection
that is “not OK” for Set and Dictionary? (“elementsEqual”, as I’ve said,
is not such an example.)

On the contrary, `elementsEqual` is exactly such an example, because it
makes no sense to use it on a Set.

let s1 = Set([1,2,3,4,5,6])
let s2 = Set([6,5,4,3,2,1])

Both sets have different iteration orders. Comparing those sets with
some other collection using `elementsEqual` will give no meaningful result
because the order - and therefore the result of `elementsEqual` - is in
effect random.

No, it is not such an example; it’s misleadingly named but works
correctly—that is, its behavior matches exactly the documented behavior,
which relies on only the semantic guarantees of Sequence, which Set
correctly fulfills.

Fulfills to the letter. Again, what can you do with it if the result is
random??

The result is not random.

It is undefined though. As you said earlier, by the guarantees we have
been given, it may shift over different builds/runs of a program. Thus in
one run, it might return true and then false in another (without changing
our code). As Greg pointed out, it is also possible with the guarantees we
are given, for set/dict to have different orderings with copies of
themselves. (It will happen for sure when deep copying a dictionary with
reference-type keys).

As I wrote to Greg, if that is a possibility for Set, then the
implementation is problematic for conformance to Collection and needs to be
re-examined.

So why not fix both things at once with a single change?

I’m not sure why you claim the order of a dictionary will definitiely
change on deep copying. But in any case, we are not talking about deep
copying here, or at least, I’m not; we’re talking about the notional
copying involved in the semantics of passing a set as an argument.

Because the ordering of some dictionaries depends on the memory address of
the key. There is no way to copy the key in those cases without changing
the ordering. You can watch their ordering shift in the debugger from run
to run.

I’m fine with that. Instances of reference types have identity. A deep copy
of a dictionary *has different keys* from the original. This may not be so
from the perspective of a custom == operator, but it most certainly is so
from the perspective of reference type semantics. So to me it is fine
(correct, even?) that a deep copy of a dictionary has a different iteration
order—it has different keys.

Again, this is something entirely different from notional copies made in
the course of passing values as arguments.

As I keep saying, relying on the behavior of a leaking internal

implementation is a bad plan.

Again, it is not a leaking internal implementation. It is a public API
guarantee.

The public API guarantee is that there will be some ordering that is
stable for a particular instance on a particular run (as long as it is not
mutated/copied). The actual ordering is undefined.

Any generic code which depends on a particular ordering will have output
which is undefined.

Generic code is incorrect that “depends on a particular ordering” because
it assumes semantics that are not guaranteed.

Just because a function returns a seemingly predictable value doesn’t mean

it is entirely deterministic. There are lots of vulnerabilities in C/C++
due to reliance on undefined behavior.

We should add an additional guarantee to set/dict that the order returned

will be the same for the same contents regardless of history (but can be
otherwise arbitrary). That will fix the behavior for algorithms like
elementsEqual (i.e. They will return the same result across builds/runs).
It will also implicitly provide as a result, the constraint you were
arguing is needed across copies of a collection type. I agree that that is
an important guarantee. Why not fix both issues with a single
non-source-breaking change?

As I wrote, the behavior of elementsEqual is not broken and does not
require fixing.

Your argument for that is tautological though. You could argue for
keeping any bug in the codebase using the same reasoning.

Why was elementsEqual created? Isn’t it meant to check equality of two
sequences in a generic context where == isn’t available?

No no no no no no no no. That’s precisely why the name is misleading.
elementsEqual is *not* simply a mixed-type version of ==. Remember that ==
as implemented on a concrete sequence type *has no obligation* to use
elementwise comparison using the element’s implementation of ==. This is
not merely theoretical: [Float].== does *not* do an elementwise comparison
using Float.==. By contrast, you are guaranteed a true elementwise
comparison with elementsEqual regardless of how equivalence is defined for
the sequence.

Isn’t it problematic that two equivalent sets may not be equivalent in a

generic context?

Again, this shows why the name is unfortunate. elementsEqual is not
supposed to be a generic version of ==, and its name is misleading. Repeat:
elementsEqual is *not* an alternative for == in either the generic or
concrete context. Neither one implies the other.

What is the benefit of the guarantee you propose that justifies the

performance cost?

The benefit is that generic algorithms which rely on ordering of unordered
things would be much more robust. It would solve your copy issue as well.

You are acting like there is a huge performance cost. There isn’t. For
Set, it would still have the traditional performance characteristics for
Sets. Any slowdown (by un-cutting corners) would be amortized over
insertions. (note: I am talking option #3 from my summary here, not #5,
which was the one with an actual performance cost)

Why is the source-breaking change of renaming things better?

Because, in my analysis, the problem is that the method is incorrectly
named. The problem affects all types that conform to Sequence and not just
Set and Dictionary; elementsEqual is a distinct function from ==, and it
must either continue to be distinct or cease to exist, but its name does
nothing to clarify any distinction.

It also won’t fix the original problem you mention as motivation. People
will still be surprised at equivalent sets having different iteration
orders regardless of what the function is named.

That is not the motivating problem. The motivating problem is that people
think that elementsEqual is semantically equivalent to ==. It is a distinct
method that *does not test* equivalence of two sequences.

Thanks,

···

On Tue, Oct 17, 2017 at 12:54 Jonathan Hull <jhull@gbis.com> wrote:

On Oct 17, 2017, at 9:34 AM, Xiaodi Wu <xiaodi.wu@gmail.com> wrote:
On Tue, Oct 17, 2017 at 09:42 Jonathan Hull <jhull@gbis.com> wrote:

On Oct 17, 2017, at 5:44 AM, Xiaodi Wu <xiaodi.wu@gmail.com> wrote:
On Tue, Oct 17, 2017 at 00:56 Thorsten Seitz <tseitz42@icloud.com> wrote:

Am 17.10.2017 um 00:13 schrieb Xiaodi Wu <xiaodi.wu@gmail.com>:
On Mon, Oct 16, 2017 at 14:21 Thorsten Seitz <tseitz42@icloud.com> >>> wrote:

Am 16.10.2017 um 16:20 schrieb Xiaodi Wu via swift-evolution < >>>> swift-evolution@swift.org>:
On Mon, Oct 16, 2017 at 05:48 Jonathan Hull <jhull@gbis.com> wrote:

On Oct 15, 2017, at 9:58 PM, Xiaodi Wu <xiaodi.wu@gmail.com> wrote:
On Sun, Oct 15, 2017 at 8:51 PM, Jonathan Hull <jhull@gbis.com> wrote:

On Oct 14, 2017, at 10:48 PM, Xiaodi Wu <xiaodi.wu@gmail.com> wrote:

Jon

Jon_Hull · October 17, 2017, 7:48am

Modeling is, by definition, imperfect. The question is, what imperfect model is most useful _to Swift_. The idea is that conforming Set and Dictionary to Collection is on balance more useful than not doing so; that having two protocols, Sequence and Collection, is on balance more useful than having one or three, and that the set of semantic guarantees of Collection are on balance more useful than other possible sets of semantic guarantees.

That is your idea which is disputed and underlined with arguments whereas you keep repeating that Set behaves as dictated by its conformance without giving use cases why this should be useful.

Hello,

You can't *control* the ordering of a set or a dictionary, but you can still *rely* on it.

For example, to find a key in a dictionary that is associated a given value, you can rely on the fact that a dictionary's order is guaranteed to be stable, and that on top of that its indexes can address the dictionary itself, but also its keys and values sequences. The code below has no bug;

let dict = ["a": "foo", "b": "bar", "c": "needle"]

// Find a key associated with "needle"
if let index = dict.values.index(of: "needle") {
let key = dict.keys[index]
print(key) // prints "c"
}

You are using the index from one collection to index into another collection. Isn’t that something the documentation explicitly tells us not to do? Or is there a special (documented) guarantee that these collections will always sync indices?

I guess you could do:

  if let key = dict.first(where {$1 == “needle}).0 {
    print(key)
  }

…although note that which element is “first” if there are multiple keys with needle as the value may shift around on you as you mess with the dictionary (and also on separate runs even if you don’t mess with the dictionary).

It's more difficult to find a use case for set's ordering and indexes. But since you ask, here is an example. The goal is to find any element which is not equal to another value, in any collection:

extension Collection where Element: Equatable {
    /// Returns any element which is not equal to the given element
    func anyElement(notEqualTo v: Element) -> Element? {
        if let i = index(of: v) {
            if let alt = index(i, offsetBy: 1, limitedBy: endIndex), alt != endIndex {
                return self[alt]
            }
            if i == startIndex {
                return nil
            }
            return first
        }
        return first
    }
}

Set([1, 2, 3]).anyElement(notEqualTo: 1) // 2 or 3
Set([1, 2]).anyElement(notEqualTo: 1) // 2
Set([1]).anyElement(notEqualTo: 1) // nil
Set([2]).anyElement(notEqualTo: 1) // 2
Set().anyElement(notEqualTo: 1) // nil

That *can* be useful, isn't it?

Yes, although this could easily be provided using iteration without indexing.

Note that my current favorite solution is to simply provide an additional guarantee on set/dictionary that the iteration order will always be the same for the same contents (regardless of history), but the order would still be otherwise arbitrary. That is a non-source-breaking change (renaming IS source breaking). It won’t fix everything, but it will fix some of the biggest gotchas. As a bonus, elementsEqual will 'just work'™ for sets/dictionaries.

Thanks,
Jon

···

On Oct 17, 2017, at 12:04 AM, Gwendal Roué via swift-evolution <swift-evolution@swift.org> wrote:

gwendal.roue · October 17, 2017, 7:07am

Oops, I apologize for the buggy implementation of the Collection.anyElement(notEqualTo:) method I did provide.

My point was just to show that there are useful generic algorithms that can use the observable ordering of sets and dictionaries, despite the fact that those ordering can not be controlled by the programmer.

Gwendal

···

Le 17 oct. 2017 à 09:03, Gwendal Roué via swift-evolution <swift-evolution@swift.org> a écrit :

Modeling is, by definition, imperfect. The question is, what imperfect model is most useful _to Swift_. The idea is that conforming Set and Dictionary to Collection is on balance more useful than not doing so; that having two protocols, Sequence and Collection, is on balance more useful than having one or three, and that the set of semantic guarantees of Collection are on balance more useful than other possible sets of semantic guarantees.

That is your idea which is disputed and underlined with arguments whereas you keep repeating that Set behaves as dictated by its conformance without giving use cases why this should be useful.

Hello,

You can't *control* the ordering of a set or a dictionary, but you can still *rely* on it.

For example, to find a key in a dictionary that is associated a given value, you can rely on the fact that a dictionary's order is guaranteed to be stable, and that on top of that its indexes can address the dictionary itself, but also its keys and values sequences. The code below has no bug;

let dict = ["a": "foo", "b": "bar", "c": "needle"]

// Find a key associated with "needle"
if let index = dict.values.index(of: "needle") {
    let key = dict.keys[index]
    print(key) // prints "c"
}

It's more difficult to find a use case for set's ordering and indexes. But since you ask, here is an example. The goal is to find any element which is not equal to another value, in any collection:

extension Collection where Element: Equatable {
    /// Returns any element which is not equal to the given element
    func anyElement(notEqualTo v: Element) -> Element? {
        if let i = index(of: v) {
            if let alt = index(i, offsetBy: 1, limitedBy: endIndex), alt != endIndex {
                return self[alt]
            }
            if i == startIndex {
                return nil
            }
            return first
        }
        return first
    }
}

Set([1, 2, 3]).anyElement(notEqualTo: 1) // 2 or 3
Set([1, 2]).anyElement(notEqualTo: 1) // 2
Set([1]).anyElement(notEqualTo: 1) // nil
Set([2]).anyElement(notEqualTo: 1) // 2
Set().anyElement(notEqualTo: 1) // nil

That *can* be useful, isn't it?

Gwendal Roué

_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org
https://lists.swift.org/mailman/listinfo/swift-evolution

Jon_Hull · October 17, 2017, 7:00pm

I agree that ‘==‘ conveys substitutability. Here is the issue:

let a = Set([1,2,3,4,5])
let b = Set([5,4,3,2,1])

a == b //True, they are substitutable

[1,2,3,4,5].elementsEqual(a) //True
[1,2,3,4,5].elementsEqual(b) //False… I guess they weren’t actually substitutable after all

Thanks,
Jon

···

On Oct 17, 2017, at 11:47 AM, Michael Ilseman via swift-evolution <swift-evolution@swift.org> wrote:

`==` conveys substitutability of the two Sequences. This does not necessarily entail anything about their elements, how those elements are ordered, etc., it just means two Sequences are substitutable. `elementsEqual` means that the two Sequences produce substitutable elements. These are different concepts and both are independently useful.

Jon_Hull · October 17, 2017, 7:41pm

That ordering can be arbitrary, but it shouldn’t leak internal representation such that the method used to create identical things affects the outcome of generic methods because of differences in internal representation.

It would be better to say that the iteration order is well-defined. That will almost always mean documented, and usually predictable though obviously e.g. RNGs and iterating in random order will not be predictable by design.

That's actually more semantically constrained than what Swift calls a `Collection` (which requires conforming types to be multi-pass and(?) finite). By contrast, Swift's `SpongeBob` protocol explicitly permits conforming single-pass, infinite, and/or unordered types.

I think you’re talking about Sequence here, I’ve lost track of your nonsense by now. Yes, the current Swift protocol named Sequence allows unordered types. You seem to keep asserting that but not actually addressing my argument, which is that allowing Sequences to be unordered with the current API is undesired and actively harmful, and should therefore be changed.

What is harmful about it?

After thinking about it, I think the harmful bit is that unordered sequences are leaking internal representation (In your example, this is causing people to be surprised when two sets with identical elements are generating different sequences/orderings based on how they were created). You are correct when you say that this problem is even true for for-in.

I would not say it is a problem. Rather, by definition, iteration involves retrieving one element after another; if you're allowed to do that with Set, then the elements of a Set are observably ordered in some way. Since it's not an OrderedSet--i.e., order doesn't matter--then the only sensible conclusion is that the order of elements obtained in a for...in loop must be arbitrary. If you think this is harmful, then you must believe that one should be prohibited from iterating over an instance of Set. Otherwise, Set is inescapably a Sequence by the Swift definition of Sequence. All extension methods on Sequence like drop(while:) are really just conveniences for common things that you can do with iterated access; to my mind, they're essentially just alternative ways of spelling various for...in loops.

I think an argument could be made that you shouldn’t be able to iterate over a set without first defining an ordering on it (even if that ordering is somewhat arbitrary). Maybe we have something like a “Sequenc(e)able” protocol which defines things which can be turned into a sequence when combined with some sort of ordering. One possible ordering could be the internal representation (At least in that case we are calling it out specifically). If I had to say “setA.arbitraryOrder.elementsEqual(setB.arbitraryOrder)” I would definitely be less surprised when it returns false even though setA == setB.

Well, that's a totally different direction, then; you're arguing that `Set` and `Dictionary` should not conform to `Sequence` altogether. That's fine (it's also a direction that some of us explored off-list a while ago), but at this point in Swift's evolution, realistically, it's not within the realm of possible changes.

I am actually suggesting something slightly different. Basically, Set and Dictionary’s conformance to Collection would have a different implementation. They would conform to another protocol declaring that they are unordered. That protocol would fill in part of the conformance to sequence/collection using a default ordering, which is mostly arbitrary, but guaranteed to produce the same ordering for the same list of elements (even across collection types). This would be safer, but a tiny bit slower than what we have now (We could also potentially develop a way for collections like set to amortize the cost). For those who need to recover speed, the new protocol would also define a property which quickly returns a sequence/iterator using the internal ordering (I arbitrarily called it .arbitraryOrder).

I believe it would not be source breaking.

That is indeed something slightly different.

In an ideal world--and my initial understanding of what you were suggesting--Set and Dictionary would each have a member like `collection`, which would expose the underlying data as a `SetCollection` or `DictionaryCollection` that in turn would conform to `Collection`; meanwhile, Set and Dictionary themselves would not offer methods such as `prefix`, or indexing by subscript, which are not compatible with being unordered. For those who want a particular ordering, there'd be something like `collection(ordered areInIncreasingOrder: (T, T) -> Bool) -> {Set|Dictionary}Collection`.

What you suggest here instead would be minimally source-breaking. However, I'm unsure of where these guarantees provide benefit to justify the performance cost. Certainly not for `first` or `dropFirst(_:)`, which still yields an arbitrary result which doesn't make sense for something _unordered_. We *could* have an underscored customization point named something like `_customOrderingPass` that is only invoked from `elementsEqual` or other such methods to pre-rearrange the internal ordering of unordered collections in some deterministic way before comparison. Is that what you have in mind?

Something like that. Whatever we do, there will be a tradeoff between speed, correctness, and ergonomics.

My suggestion trades speed for correctness, and provides a way to recover speed through additional typing (which is slightly less ergonomic).

You haven't convinced me that this is at all improved in "correctness." It trades one arbitrary iteration order for another on a type that tries to model an unordered collection.

We could do something like you suggest. I don’t think the method would need to be underscored… the ordering pass could just be a method on the protocol which defines it as unordered. Then we could provide a special conformance for things where order really matters based on adherence to that protocol. That might be an acceptable tradeoff. It would give us speed at the cost of having the correct implementation being less ergonomic and more error prone (you have to remember to check that it is unordered and call the ordering method when it mattered).

I’d still be a bit worried that people would make incorrect generic algorithms based on expecting an order from unordered things, but at least it would be possible for them check and handle it correctly. I think I could get behind that tradeoff/compromise, given where we are in the swift process and Swift's obsession with speed (though I still slightly prefer the safer default). At least the standard library would handle all the things correctly, and that is what will affect the majority of programmers.

What is an example of such an "incorrect" generic algorithm that would be made correct by such a scheme?

To start with, the one you gave as an example at the beginning of this discussion: Two sets with identical elements which have different internal storage and thus give different orderings as sequences. You yourself have argued that the confusion around this is enough of a problem that we need to make a source-breaking change (renaming it) to warn people that the results of the ‘elementsEqual’ algorithm are undefined for sets and dictionaries.

No, I am arguing that the confusion about ‘elementsEqual’ is foremost a problem with its name; the result of this operation is not at all undefined for two sets but actually clearly defined: it returns true if two sets have the same elements in the same iteration order, which is a publicly observable behavior of sets (likewise dictionaries).

But it is a behavior which has absolutely no meaning at all because the order does not depend on the elements of the set but on the history of how the set has been reached its current state.
So why should I ever use this method on a set?
What is the use case?

One example: you can use it to check an instance of Set<Float> to determine if it has a NaN value. (The “obvious” way of doing it is not guaranteed to work since NaN != NaN.)

How would I do that? I'd rather expect to use a property isNaN on Float to do that.

set.elementsEqual(set)

I see why that would work (thanks to Set being a collection vs a sequence), but it still feels like a hack. I definitely wouldn’t want to be maintaining code with that in it. Especially when compared to something like:

set.contains(where: {$0.isNaN})

The purpose of protocols is to enable useful generic code. You cannot use isNaN for code that works on generic Collection, or for that matter, even code that works on Set where Element : Numeric.

Much generic code (for example, generic sorting) easily blows up when encountering NaN. If you want an algorithm that is robust enough to handle (or at least reject) that scenario, then you will need to check for it. It is not a “hack” but a very common and accepted way of determining the presence of NaN.

Just because a hack is commonly accepted or common doesn’t mean it isn’t a hack.

It’s not a “hack.” NaN is required by IEEE fiat not to be equivalent to itself. Asking whether a value is reflexively equivalent to itself is guaranteed to be sufficient to detect the presence of NaN in all IEEE-compliant contexts, which is to say all past, present, and future versions of Swift.

The problem isn’t Nan != NaN, but the fact that you are comparing T != T (where T may not even be a floating point thing) and assuming it means NaN.

Using elementsEqual in this way (in a generic context which may or may not involve floats) is definitely a hack, and is likely to bite you at some point.

How is it definitely a hack when it is blessed by IEEE, and how can it bite me?

Because it may return false for reasons other than NaN.

I don’t see why a non-source-breaking change is suddenly off-limits.

But more than that, any generic algorithm which is assuming that the sequence is coming from an ordered source (i.e. many things using first/last). Some uses of first are ok because the programmer actually means ‘any’, but anywhere where they actually mean first/last may be problematic.

Such as...?

Currently, there is no way to test for ordered-ness, so there is no way for even a careful programmer to mitigate this problem. By adding a protocol which states that something is unordered, we can either branch on it, or create a separate version of an algorithm for things which conform.

It is clearly the case that Swift’s protocol hierarchy fits sets and collections imperfectly; however, it is in the nature of modeling that imperfections are present. The question is not whether it is possible to incur performance, API surface area, and other trade-offs to make the model more faithful, but rather whether this usefully solves any problem. What is the problem being mitigated? As I write above, Swift’s Set and Dictionary types meet the semantic requirements for Collection and moonlight as ordered collections. What is a generic algorithm on an ordered collection that is “not OK” for Set and Dictionary? (“elementsEqual”, as I’ve said, is not such an example.)

On the contrary, `elementsEqual` is exactly such an example, because it makes no sense to use it on a Set.

let s1 = Set([1,2,3,4,5,6])
let s2 = Set([6,5,4,3,2,1])

Both sets have different iteration orders. Comparing those sets with some other collection using `elementsEqual` will give no meaningful result because the order - and therefore the result of `elementsEqual` - is in effect random.

No, it is not such an example; it’s misleadingly named but works correctly—that is, its behavior matches exactly the documented behavior, which relies on only the semantic guarantees of Sequence, which Set correctly fulfills.

Fulfills to the letter. Again, what can you do with it if the result is random??

The result is not random.

It is undefined though. As you said earlier, by the guarantees we have been given, it may shift over different builds/runs of a program. Thus in one run, it might return true and then false in another (without changing our code). As Greg pointed out, it is also possible with the guarantees we are given, for set/dict to have different orderings with copies of themselves. (It will happen for sure when deep copying a dictionary with reference-type keys).

As I wrote to Greg, if that is a possibility for Set, then the implementation is problematic for conformance to Collection and needs to be re-examined.

So why not fix both things at once with a single change?

I’m not sure why you claim the order of a dictionary will definitiely change on deep copying. But in any case, we are not talking about deep copying here, or at least, I’m not; we’re talking about the notional copying involved in the semantics of passing a set as an argument.

Because the ordering of some dictionaries depends on the memory address of the key. There is no way to copy the key in those cases without changing the ordering. You can watch their ordering shift in the debugger from run to run.

I’m fine with that. Instances of reference types have identity. A deep copy of a dictionary *has different keys* from the original. This may not be so from the perspective of a custom == operator, but it most certainly is so from the perspective of reference type semantics. So to me it is fine (correct, even?) that a deep copy of a dictionary has a different iteration order—it has different keys.

Again, this is something entirely different from notional copies made in the course of passing values as arguments.

As I keep saying, relying on the behavior of a leaking internal implementation is a bad plan.

Again, it is not a leaking internal implementation. It is a public API guarantee.

The public API guarantee is that there will be some ordering that is stable for a particular instance on a particular run (as long as it is not mutated/copied). The actual ordering is undefined.

Any generic code which depends on a particular ordering will have output which is undefined.

Generic code is incorrect that “depends on a particular ordering” because it assumes semantics that are not guaranteed.

Just because a function returns a seemingly predictable value doesn’t mean it is entirely deterministic. There are lots of vulnerabilities in C/C++ due to reliance on undefined behavior.

We should add an additional guarantee to set/dict that the order returned will be the same for the same contents regardless of history (but can be otherwise arbitrary). That will fix the behavior for algorithms like elementsEqual (i.e. They will return the same result across builds/runs). It will also implicitly provide as a result, the constraint you were arguing is needed across copies of a collection type. I agree that that is an important guarantee. Why not fix both issues with a single non-source-breaking change?

As I wrote, the behavior of elementsEqual is not broken and does not require fixing.

Your argument for that is tautological though. You could argue for keeping any bug in the codebase using the same reasoning.

Why was elementsEqual created? Isn’t it meant to check equality of two sequences in a generic context where == isn’t available?

No no no no no no no no. That’s precisely why the name is misleading. elementsEqual is *not* simply a mixed-type version of ==. Remember that == as implemented on a concrete sequence type *has no obligation* to use elementwise comparison using the element’s implementation of ==. This is not merely theoretical: [Float].== does *not* do an elementwise comparison using Float.==. By contrast, you are guaranteed a true elementwise comparison with elementsEqual regardless of how equivalence is defined for the sequence.

elementsEqual shouldn’t fully imply == of the original type (only that the elements are “equivalent in the same order” which is what it means for generic sequences to be equivalent).

But ‘==‘ does NEED to imply elementsEqual. Otherwise you break the meaning of ‘=='

Isn’t it problematic that two equivalent sets may not be equivalent in a generic context?

Again, this shows why the name is unfortunate. elementsEqual is not supposed to be a generic version of ==, and its name is misleading. Repeat: elementsEqual is *not* an alternative for == in either the generic or concrete context. Neither one implies the other.

What are you basing this very strong belief on? Has the core team stated that elementsEqual should not be used to check the equality of sequences?

elementsEqual can only provide a notion of equivalence within the generic context, of course (and not equivalence of the original types overall), but it is still a notion of equivalence within the context. As I said above (and in another email), ‘==‘ has to imply elementsEqual. Otherwise you break substitutability.

What is the benefit of the guarantee you propose that justifies the performance cost?

The benefit is that generic algorithms which rely on ordering of unordered things would be much more robust. It would solve your copy issue as well.

You are acting like there is a huge performance cost. There isn’t. For Set, it would still have the traditional performance characteristics for Sets. Any slowdown (by un-cutting corners) would be amortized over insertions. (note: I am talking option #3 from my summary here, not #5, which was the one with an actual performance cost)

I see you skipped responding to this.

Why is the source-breaking change of renaming things better?

Because, in my analysis, the problem is that the method is incorrectly named. The problem affects all types that conform to Sequence and not just Set and Dictionary; elementsEqual is a distinct function from ==, and it must either continue to be distinct or cease to exist, but its name does nothing to clarify any distinction.

It also won’t fix the original problem you mention as motivation. People will still be surprised at equivalent sets having different iteration orders regardless of what the function is named.

That is not the motivating problem. The motivating problem is that people think that elementsEqual is semantically equivalent to ==. It is a distinct method that *does not test* equivalence of two sequences.

The motivating problem you originally gave was that people are surprised when Set([1,2,3]).elementsEqual(Set[3,2,1]) returns false. You believe it is because they don’t get that elementsEqual works sequentially (on Sequence), and I say it is because people expect equivalent sets to produce equivalent element orders.

Thanks,
Jon

···

On Oct 17, 2017, at 11:46 AM, Xiaodi Wu <xiaodi.wu@gmail.com> wrote:
On Tue, Oct 17, 2017 at 12:54 Jonathan Hull <jhull@gbis.com <mailto:jhull@gbis.com>> wrote:

On Oct 17, 2017, at 9:34 AM, Xiaodi Wu <xiaodi.wu@gmail.com <mailto:xiaodi.wu@gmail.com>> wrote:
On Tue, Oct 17, 2017 at 09:42 Jonathan Hull <jhull@gbis.com <mailto:jhull@gbis.com>> wrote:

On Oct 17, 2017, at 5:44 AM, Xiaodi Wu <xiaodi.wu@gmail.com <mailto:xiaodi.wu@gmail.com>> wrote:
On Tue, Oct 17, 2017 at 00:56 Thorsten Seitz <tseitz42@icloud.com <mailto:tseitz42@icloud.com>> wrote:
Am 17.10.2017 um 00:13 schrieb Xiaodi Wu <xiaodi.wu@gmail.com <mailto:xiaodi.wu@gmail.com>>:

On Mon, Oct 16, 2017 at 14:21 Thorsten Seitz <tseitz42@icloud.com <mailto:tseitz42@icloud.com>> wrote:

Am 16.10.2017 um 16:20 schrieb Xiaodi Wu via swift-evolution <swift-evolution@swift.org <mailto:swift-evolution@swift.org>>:
On Mon, Oct 16, 2017 at 05:48 Jonathan Hull <jhull@gbis.com <mailto:jhull@gbis.com>> wrote:

On Oct 15, 2017, at 9:58 PM, Xiaodi Wu <xiaodi.wu@gmail.com <mailto:xiaodi.wu@gmail.com>> wrote:
On Sun, Oct 15, 2017 at 8:51 PM, Jonathan Hull <jhull@gbis.com <mailto:jhull@gbis.com>> wrote:

On Oct 14, 2017, at 10:48 PM, Xiaodi Wu <xiaodi.wu@gmail.com <mailto:xiaodi.wu@gmail.com>> wrote:

knatt · October 17, 2017, 8:18pm

set.elementsEqual(set)

I see why that would work (thanks to Set being a collection vs a sequence), but it still feels like a hack. I definitely wouldn’t want to be maintaining code with that in it. Especially when compared to something like:

set.contains(where: {$0.isNaN})

The purpose of protocols is to enable useful generic code. You cannot use isNaN for code that works on generic Collection, or for that matter, even code that works on Set where Element : Numeric.

Much generic code (for example, generic sorting) easily blows up when encountering NaN. If you want an algorithm that is robust enough to handle (or at least reject) that scenario, then you will need to check for it. It is not a “hack” but a very common and accepted way of determining the presence of NaN.

Just because a hack is commonly accepted or common doesn’t mean it isn’t a hack.

It’s not a “hack.” NaN is required by IEEE fiat not to be equivalent to itself. Asking whether a value is reflexively equivalent to itself is guaranteed to be sufficient to detect the presence of NaN in all IEEE-compliant contexts, which is to say all past, present, and future versions of Swift.

- while `a.elementsEqual(a)` returning true precludes the presence of a nan, returning false is not a guarantee there is a NaN—at least the way you have insist this method should work. The isNaN version is correct in all cases.
- It's a great deal less readable and obvious than the `where { $0.isNaN }` version, so while the term "hack" is arguably not correct (assuming you only need to disprove a NaN and not prove one exists), it is certainly not the most readable way.
- IEEE says Float.nan == Float.nan must be false; I'm pretty sure the IEEE spec doesn't say anything about how Swift Sequences must implement `elementsEqual`.