That ordering can be arbitrary, but it shouldn’t leak internal
representation such that the method used to create identical things affects
the outcome of generic methods because of differences in internal
representation.
It would be better to say that the iteration order is
well-defined. That will almost always mean documented, and usually
predictable though obviously e.g. RNGs and iterating in random order will
not be predictable by design.
That's actually more semantically constrained than what Swift
calls a `Collection` (which requires conforming types to be multi-pass
and(?) finite). By contrast, Swift's `SpongeBob` protocol explicitly
permits conforming single-pass, infinite, and/or unordered types.
I think you’re talking about Sequence here, I’ve lost track of
your nonsense by now. Yes, the current Swift protocol named Sequence allows
unordered types. You seem to keep asserting that but not actually
addressing my argument, which is *that allowing Sequences to be
unordered with the current API is undesired and actively harmful, and
should* *therefore** be changed*.
What is harmful about it?
After thinking about it, I think the harmful bit is that unordered
sequences are leaking internal representation (In your example, this is
causing people to be surprised when two sets with identical elements are
generating different sequences/orderings based on how they were created).
You are correct when you say that this problem is even true for for-in.
I would not say it is a problem. Rather, by definition, iteration
involves retrieving one element after another; if you're allowed to do that
with Set, then the elements of a Set are observably ordered in some way.
Since it's not an OrderedSet--i.e., order doesn't matter--then the only
sensible conclusion is that the order of elements obtained in a for...in
loop must be arbitrary. If you think this is harmful, then you must believe
that one should be prohibited from iterating over an instance of Set.
Otherwise, Set is inescapably a Sequence by the Swift definition of
Sequence. All extension methods on Sequence like drop(while:) are really
just conveniences for common things that you can do with iterated access;
to my mind, they're essentially just alternative ways of spelling various
for...in loops.
I think an argument could be made that you shouldn’t be able to
iterate over a set without first defining an ordering on it (even if that
ordering is somewhat arbitrary). Maybe we have something like a
“Sequenc(e)able” protocol which defines things which can be turned into a
sequence when combined with some sort of ordering. One possible ordering
could be the internal representation (At least in that case we are calling
it out specifically). If I had to say
“setA.arbitraryOrder.elementsEqual(setB.arbitraryOrder)” I would definitely
be less surprised when it returns false even though setA == setB.
Well, that's a totally different direction, then; you're arguing
that `Set` and `Dictionary` should not conform to `Sequence` altogether.
That's fine (it's also a direction that some of us explored off-list a
while ago), but at this point in Swift's evolution, realistically, it's not
within the realm of possible changes.
I am actually suggesting something slightly different. Basically,
Set and Dictionary’s conformance to Collection would have a different
implementation. They would conform to another protocol declaring that they
are unordered. That protocol would fill in part of the conformance to
sequence/collection using a default ordering, which is mostly arbitrary,
but guaranteed to produce the same ordering for the same list of elements
(even across collection types). This would be safer, but a tiny bit slower
than what we have now (We could also potentially develop a way for
collections like set to amortize the cost). For those who need to recover
speed, the new protocol would also define a property which quickly returns
a sequence/iterator using the internal ordering (I arbitrarily called it
.arbitraryOrder).
I believe it would not be source breaking.
That is indeed something slightly different.
In an ideal world--and my initial understanding of what you were
suggesting--Set and Dictionary would each have a member like `collection`,
which would expose the underlying data as a `SetCollection` or
`DictionaryCollection` that in turn would conform to `Collection`;
meanwhile, Set and Dictionary themselves would not offer methods such as
`prefix`, or indexing by subscript, which are not compatible with being
unordered. For those who want a particular ordering, there'd be something
like `collection(ordered areInIncreasingOrder: (T, T) -> Bool) ->
{Set|Dictionary}Collection`.
What you suggest here instead would be minimally source-breaking.
However, I'm unsure of where these guarantees provide benefit to justify
the performance cost. Certainly not for `first` or `dropFirst(_:)`, which
still yields an arbitrary result which doesn't make sense for something
_unordered_. We *could* have an underscored customization point named
something like `_customOrderingPass` that is only invoked from
`elementsEqual` or other such methods to pre-rearrange the internal
ordering of unordered collections in some deterministic way before
comparison. Is that what you have in mind?
Something like that. Whatever we do, there will be a tradeoff
between speed, correctness, and ergonomics.
My suggestion trades speed for correctness, and provides a way to
recover speed through additional typing (which is slightly less ergonomic).
You haven't convinced me that this is at all improved in
"correctness." It trades one arbitrary iteration order for another on a
type that tries to model an unordered collection.
We could do something like you suggest. I don’t think the method
would need to be underscored… the ordering pass could just be a method on
the protocol which defines it as unordered. Then we could provide a
special conformance for things where order really matters based on
adherence to that protocol. That might be an acceptable tradeoff. It
would give us speed at the cost of having the correct implementation being
less ergonomic and more error prone (you have to remember to check that it
is unordered and call the ordering method when it mattered).
I’d still be a bit worried that people would make incorrect generic
algorithms based on expecting an order from unordered things, but at least
it would be possible for them check and handle it correctly. I think I
could get behind that tradeoff/compromise, given where we are in the swift
process and Swift's obsession with speed (though I still slightly prefer
the safer default). At least the standard library would handle all the
things correctly, and that is what will affect the majority of programmers.
What is an example of such an "incorrect" generic algorithm that would
be made correct by such a scheme?
To start with, the one you gave as an example at the beginning of this
discussion: Two sets with identical elements which have different internal
storage and thus give different orderings as sequences. You yourself have
argued that the confusion around this is enough of a problem that we need
to make a source-breaking change (renaming it) to warn people that the
results of the ‘elementsEqual’ algorithm are undefined for sets and
dictionaries.
No, I am arguing that the confusion about ‘elementsEqual’ is foremost a
problem with its name; the result of this operation is not at all undefined
for two sets but actually clearly defined: it returns true if two sets have
the same elements in the same iteration order, which is a publicly
observable behavior of sets (likewise dictionaries).
But it is a behavior which has absolutely no meaning at all because the
order does not depend on the elements of the set but on the history of how
the set has been reached its current state.
So why should I ever use this method on a set?
What is the use case?
One example: you can use it to check an instance of Set<Float> to
determine if it has a NaN value. (The “obvious” way of doing it is not
guaranteed to work since NaN != NaN.)
How would I do that? I'd rather expect to use a property isNaN on Float
to do that.
set.elementsEqual(set)
I see why that would work (thanks to Set being a collection vs a
sequence), but it still feels like a hack. I definitely wouldn’t want to
be maintaining code with that in it. Especially when compared to something
like:
set.contains(where: {$0.isNaN})
The purpose of protocols is to enable useful generic code. You cannot use
isNaN for code that works on generic Collection, or for that matter, even
code that works on Set where Element : Numeric.
Much generic code (for example, generic sorting) easily blows up when
encountering NaN. If you want an algorithm that is robust enough to handle
(or at least reject) that scenario, then you will need to check for it. It
is not a “hack” but a very common and accepted way of determining the
presence of NaN.
Just because a hack is commonly accepted or common doesn’t mean it isn’t a
hack.
It’s not a “hack.” NaN is required by IEEE fiat not to be equivalent to
itself. Asking whether a value is reflexively equivalent to itself is
guaranteed to be sufficient to detect the presence of NaN in all
IEEE-compliant contexts, which is to say all past, present, and future
versions of Swift.
Using elementsEqual in this way (in a generic context which may or may not
involve floats) is definitely a hack, and is likely to bite you at some
point.
How is it definitely a hack when it is blessed by IEEE, and how can it bite
me?
I don’t see why a non-source-breaking change is suddenly off-limits.
But more than that, any generic algorithm which is assuming that the
sequence is coming from an ordered source (i.e. many things using
first/last). Some uses of first are ok because the programmer actually
means ‘any’, but anywhere where they actually mean first/last may be
problematic.
Such as...?
Currently, there is no way to test for ordered-ness, so there is no way
for even a careful programmer to mitigate this problem. By adding a
protocol which states that something is unordered, we can either branch on
it, or create a separate version of an algorithm for things which conform.
It is clearly the case that Swift’s protocol hierarchy fits sets and
collections imperfectly; however, it is in the nature of modeling that
imperfections are present. The question is not whether it is possible to
incur performance, API surface area, and other trade-offs to make the model
more faithful, but rather whether this usefully solves any problem. What is
the problem being mitigated? As I write above, Swift’s Set and Dictionary
types meet the semantic requirements for Collection and moonlight as
ordered collections. What is a generic algorithm on an ordered collection
that is “not OK” for Set and Dictionary? (“elementsEqual”, as I’ve said,
is not such an example.)
On the contrary, `elementsEqual` is exactly such an example, because it
makes no sense to use it on a Set.
let s1 = Set([1,2,3,4,5,6])
let s2 = Set([6,5,4,3,2,1])
Both sets have different iteration orders. Comparing those sets with
some other collection using `elementsEqual` will give no meaningful result
because the order - and therefore the result of `elementsEqual` - is in
effect random.
No, it is not such an example; it’s misleadingly named but works
correctly—that is, its behavior matches exactly the documented behavior,
which relies on only the semantic guarantees of Sequence, which Set
correctly fulfills.
Fulfills to the letter. Again, what can you do with it if the result is
random??
The result is not random.
It is undefined though. As you said earlier, by the guarantees we have
been given, it may shift over different builds/runs of a program. Thus in
one run, it might return true and then false in another (without changing
our code). As Greg pointed out, it is also possible with the guarantees we
are given, for set/dict to have different orderings with copies of
themselves. (It will happen for sure when deep copying a dictionary with
reference-type keys).
As I wrote to Greg, if that is a possibility for Set, then the
implementation is problematic for conformance to Collection and needs to be
re-examined.
So why not fix both things at once with a single change?
I’m not sure why you claim the order of a dictionary will definitiely
change on deep copying. But in any case, we are not talking about deep
copying here, or at least, I’m not; we’re talking about the notional
copying involved in the semantics of passing a set as an argument.
Because the ordering of some dictionaries depends on the memory address of
the key. There is no way to copy the key in those cases without changing
the ordering. You can watch their ordering shift in the debugger from run
to run.
I’m fine with that. Instances of reference types have identity. A deep copy
of a dictionary *has different keys* from the original. This may not be so
from the perspective of a custom == operator, but it most certainly is so
from the perspective of reference type semantics. So to me it is fine
(correct, even?) that a deep copy of a dictionary has a different iteration
order—it has different keys.
Again, this is something entirely different from notional copies made in
the course of passing values as arguments.
As I keep saying, relying on the behavior of a leaking internal
implementation is a bad plan.
Again, it is not a leaking internal implementation. It is a public API
guarantee.
The public API guarantee is that there will be some ordering that is
stable for a particular instance on a particular run (as long as it is not
mutated/copied). The actual ordering is undefined.
Any generic code which depends on a particular ordering will have output
which is undefined.
Generic code is incorrect that “depends on a particular ordering” because
it assumes semantics that are not guaranteed.
Just because a function returns a seemingly predictable value doesn’t mean
it is entirely deterministic. There are lots of vulnerabilities in C/C++
due to reliance on undefined behavior.
We should add an additional guarantee to set/dict that the order returned
will be the same for the same contents regardless of history (but can be
otherwise arbitrary). That will fix the behavior for algorithms like
elementsEqual (i.e. They will return the same result across builds/runs).
It will also implicitly provide as a result, the constraint you were
arguing is needed across copies of a collection type. I agree that that is
an important guarantee. Why not fix both issues with a single
non-source-breaking change?
As I wrote, the behavior of elementsEqual is not broken and does not
require fixing.
Your argument for that is tautological though. You could argue for
keeping any bug in the codebase using the same reasoning.
Why was elementsEqual created? Isn’t it meant to check equality of two
sequences in a generic context where == isn’t available?
No no no no no no no no. That’s precisely why the name is misleading.
elementsEqual is *not* simply a mixed-type version of ==. Remember that ==
as implemented on a concrete sequence type *has no obligation* to use
elementwise comparison using the element’s implementation of ==. This is
not merely theoretical: [Float].== does *not* do an elementwise comparison
using Float.==. By contrast, you are guaranteed a true elementwise
comparison with elementsEqual regardless of how equivalence is defined for
the sequence.
Isn’t it problematic that two equivalent sets may not be equivalent in a
generic context?
Again, this shows why the name is unfortunate. elementsEqual is not
supposed to be a generic version of ==, and its name is misleading. Repeat:
elementsEqual is *not* an alternative for == in either the generic or
concrete context. Neither one implies the other.
What is the benefit of the guarantee you propose that justifies the
performance cost?
The benefit is that generic algorithms which rely on ordering of unordered
things would be much more robust. It would solve your copy issue as well.
You are acting like there is a huge performance cost. There isn’t. For
Set, it would still have the traditional performance characteristics for
Sets. Any slowdown (by un-cutting corners) would be amortized over
insertions. (note: I am talking option #3 from my summary here, not #5,
which was the one with an actual performance cost)
Why is the source-breaking change of renaming things better?
Because, in my analysis, the problem is that the method is incorrectly
named. The problem affects all types that conform to Sequence and not just
Set and Dictionary; elementsEqual is a distinct function from ==, and it
must either continue to be distinct or cease to exist, but its name does
nothing to clarify any distinction.
It also won’t fix the original problem you mention as motivation. People
will still be surprised at equivalent sets having different iteration
orders regardless of what the function is named.
That is not the motivating problem. The motivating problem is that people
think that elementsEqual is semantically equivalent to ==. It is a distinct
method that *does not test* equivalence of two sequences.
Thanks,
···
On Tue, Oct 17, 2017 at 12:54 Jonathan Hull <jhull@gbis.com> wrote:
On Oct 17, 2017, at 9:34 AM, Xiaodi Wu <xiaodi.wu@gmail.com> wrote:
On Tue, Oct 17, 2017 at 09:42 Jonathan Hull <jhull@gbis.com> wrote:
On Oct 17, 2017, at 5:44 AM, Xiaodi Wu <xiaodi.wu@gmail.com> wrote:
On Tue, Oct 17, 2017 at 00:56 Thorsten Seitz <tseitz42@icloud.com> wrote:
Am 17.10.2017 um 00:13 schrieb Xiaodi Wu <xiaodi.wu@gmail.com>:
On Mon, Oct 16, 2017 at 14:21 Thorsten Seitz <tseitz42@icloud.com> >>> wrote:
Am 16.10.2017 um 16:20 schrieb Xiaodi Wu via swift-evolution < >>>> swift-evolution@swift.org>:
On Mon, Oct 16, 2017 at 05:48 Jonathan Hull <jhull@gbis.com> wrote:
On Oct 15, 2017, at 9:58 PM, Xiaodi Wu <xiaodi.wu@gmail.com> wrote:
On Sun, Oct 15, 2017 at 8:51 PM, Jonathan Hull <jhull@gbis.com> wrote:
On Oct 14, 2017, at 10:48 PM, Xiaodi Wu <xiaodi.wu@gmail.com> wrote:
Jon