Regression in Set with Swift 5.5

Karoy, very enlightening analysis as usual, thank you. Topics around equivalence are mind-blowing: we may have values which are equal but not identical, or identical but not equal, or values which are not equal to themselves (nan != nan) or otherwise violating math axioms (rock < paper < scissors < rock).

whoever did the fix clearly haven't thought about the reference types case. if you go back in time, what would you do here, leave existing "intersection" API as it was and introduce a new one like so?

struct IntersectionOptions: OptionSet {
    init(rawValue: UInt)
    static let optimized = IntersectionOptions(rawValue: 1)
}

func intersection(_ other: Set<Element>, options: IntersectionOptions) -> Set<Element>

is this path still an option now or is it too late?

ps. if to start completely from scratch (alas something we can't do), the name intersected(with:) would be appropriate for the old unoptimised intersection behaviour and the name intersection is good for the new optimised behaviour, perhaps even in a static func form, or as a * operator.

Oh, I don’t think these operations deserve that much API. We don’t need to overthink this; the existing operations are fine, we just need to make sure that for items that are in both inputs, high-level Set operations always take their results from the first set, i.e., self.

(We should also explicitly document this, although that might require going through Swift Evolution, and it’s difficult to justify the expense of doing that. Related note: we still haven’t updated Sequence.sorted()‘s docs to guarantee stable ordering.)

To turn this into something positive, this would be an ideal excuse for landing the optimizations in PR #21300.

1 Like

yep, after i wrote my comment above i came up with the same idea (so it is both quick and preserving the old behaviour):

   @inlinable
   public __consuming func intersection(_ other: Set<Element>) -> Set<Element> {
	 var newSet = Set<Element>()
	 if count > other.count {
	   for member in other {
		 if let element = (first { member == $0 }) {
		   newSet.insert(element)
		 }
	   }
	 } else {
	   for member in self {
		 if other.contains(member) {
		   newSet.insert(member)
		 }
	   }
	}
	return newSet
  }

I'm surprised you think this is even something that needs fixing. I was under the impression as long as Set's behaviour respected the Hashable guarantees of its inputs, then all was dandy — and all of these operations do that. Is guaranteeing where the elements originate from really something that needs to be done?

(No criticism in tone intended, I'm just genuinely surprised by this.)

8 Likes

I'm not sure that this is worth doing--if it had been caught during beta, it might have been, but even if we "fix" this now, there will be lots of devices with the 5.5 behavior in the wild, so developers still won't be able to depend on the old behavior for the next few years.

We could, I suppose, add an alwaysEmitIntoClient version with the old behavior, but we'd have to maintain the old non-aEIC entry point as well, and I'm not sure this is all really worth the hassle (especially given that there's a "trivial" workaround for anyone bitten by this).

That said, I think people are increasingly aware that a lot of the keyed-collection API has some rough edges around equatable, and it's worth keeping track of these issues for a more comprehensive reworking down the line?

4 Likes

if i want an optimised version of:

func ordered_intersection(_ other: Set<Element>) -> Set<Element> {
	var newSet = Set<Element>()
	for member in self {
		if other.contains(member) {
			newSet.insert(member)
		}
	}
	return newSet
}

func quick_and_ordered_intersection(...) -> ... {
	// same as the above but iterates over a smaller set
	// still maintains the fact that elements are taken from "self"
}

that maintains its observable behaviour (takes elements from "self") but cleverly iterates over a smaller set to do this quickly, shall i implement such a method myself? if Set's intersect is not such a thing (now or ever) can this new method be added to standard library in addition? would be +1 from me in any form (whether this is a new method or a new documented behaviour of the old intercept - no particular preference).

1 Like

I would not necessarily consider this source compat issue a bug if the documentation explicitly called out that the selection is subject to change. (But even then, it would be counter-productive to deliberately change things only to mess with code that didn't read the docs.)

That we shipped this change is not a drop-everything-the-sky-is-falling issue (it isn't even necessarily something we must fix), but it's still an issue.

Looking past this immediate problem, I think Set.union and Set.intersection are underspecified. Given that Set's API is actively encouraging people to care about equal-but-not-identical elements, it would very much make sense for these operations to specify which input set the members of the result come from. (Just like OrderedSet's equivalent methods specify the ordering of the result.)

I see no practical reason these operations cannot always take dupes from their first input, so we should make sure they do that.

Specialization takes care of this objection reasonably well, I think. (As demonstrated by us not realizing this was an issue until long after 5.5 shipped.) This issue is an esoteric detail of a relatively rarely used API; we don't need to overthink the fix.

If we wanted to change the docs (which I don't think we should right now), then we could simply state that the behavior is only guaranteed from Swift 5.6 on (or whatever). The same goes for Sequence.sorted() and similar cases.

I agree, this clearly wouldn't be worth the hassle. We can't fix past releases, but we can at least make sure to do the right thing in future ones.

(People can always write extensions that polyfill the right behavior on older releases.)

Hm, I'm not entirely sure I know what these rough edges are -- the core Set and Dictionary APIs are very deliberate about preserving/maintaining key identity, and they have been for years.

(Besides us spuriously changing Set.intersect's behavior in 5.5, the one minor issue I'm aware of is that Dictionary doesn't provide efficient API that corresponds to Set.update(with:) -- dictionary keys are immutable. However, one can simply remove the key and re-add its value with the updated key. We also have some API wrinkles around in-place mutations of Dictionary values, but that's completely unrelated to Equatable.)

Equatable has some fundamental issues, of course -- we have the mostly nonsensical substitutability "requirement", we have floating point types violating reflexivity, and we transparently bridge ObjC/Swift types that have incompatible definitions for equality (NSString/String being the major offender). But how can we fix these by tweaking hashing collection APIs?

1 Like

I submitted PR #40012, which reimplements the dramatic Set improvements from my old PR #21300.

This also restores the original, proper Set.intersect behavior without losing the optimization we shipped in 5.5.

12 Likes

This is news to me. Does this mean that we do ensure a stable ordering?

1 Like

The stdlib’s sort algorithm has been stable since Swift 4.2 or Swift 5 or so.

We technically do not ensure a stable ordering, because the documentation does not guarantee this.

2 Likes

That this still hasn’t been documented is one of my biggest gripes with the std lib. It leaves users with two equally bad options:

a) implement their own stable sort (even though the std lib one is actually stable!), or
b) assume that the std lib one is going to stay stable even though it isn’t guaranteed to.

4 Likes

Are there more such documentation changes that would require a proposal? If so, we could collect them all and at least get it over with with one review :thinking:

5 Likes

That would be an excellent idea! Are you volunteering to write it up?

So right now we have:

  • Sequence.sorted(), .sorted(by:), MutableCollection.sort(), .sort(by:) should document that the sort algorithm is stable. (Starting from Swift x.y)
  • Set.union(_:), Set.intersection(_:), and Set.formIntersection(_:) (all overloads) should document that members of the result that appear in both inputs are taken from self. (Interestingly, formUnion already has this guarantee.) (Starting from Swift 5.6)

Anything else?

6 Likes

Well, I've never done a proposal or even a pitch before, but I suppose here I could pull of the implementation if it's just comments :grinning:

3 Likes

@nnnnnnnn brought up that Sequence.min/max implementations in the stdlib do not specify which element they return if there are duplicates. I think they should!

This would also be a chance to make this consistent. Currently, the top-level max functions return the last dupe, while Sequence.max returns the first. We should choose one behavior and consistently stick to it. (Given that the behavior of the top-level max is documented, Sequence.max should probably adopt it, too.)

9 Likes

So the Swift 5.6 release process was just published. Do you think there is any way this would make it into that release?

What would be the next step? A pitch thread to try to gather more changes? At first I thought this would just be changes in documentation, but if the idea is to also change the behavior at the same time, some actual implementation will be required :thinking:

The implementation changes for Set are done.

1 Like

True, but changes to Sequence.min/max are probably not.

I don't believe the documentation updates have been done yet, however.

I don't see why a narrow and relatively uncontroversial proposal that ties down behavior through doc updates couldn't fit into 5.6! I don't think changing the behavior of Sequence.max would complicate things, either.

S-E proposal scheduling bottlenecks could happen, of course -- it wouldn't be the end of the world if the updates get delayed by one release, as long as they eventually happen.

1 Like
Terms of Service

Privacy Policy

Cookie Policy