Pitch: Reform SetAlgebra

Hi everyone,

A proposal with similar motivations to removing first and last, to update the SetAlgebra protocol.

Implementation still in the works but any early feedback would be welcome.

GitHub PR here.

Reform SetAlgebra

Introduction

This proposal makes two changes to the SetAlgebra protocol, as well as corresponding changes to conforming types in the Standard Library (and Foundation):

  • amend insert to return the index to the current element in the set, not the element itself; and
  • split SetAlgebra into two protocols, for intensional and extensional sets.

These are source breaking changes. The primary motivation for breaking source now is ensuring that the ABI and API allow for future types and language features – specifically move-only types.

Motivation

Preparation for move-only types

As described in SE-0232, requirements on a protocol where an element is returned, but not removed, from a collection will be a problem for collections that want to be able to hold move-only types (because returning the element means it would have to be moved out of the collection, so it can't remain in the collection).
There is one such method on the SetAlgebra protocol, insert:

/// Inserts the given element in the set if it is not already present.
///
/// If an element equal to `newMember` is already contained in the set, this
/// method has no effect.
///
/// - Returns: `(true, newMember)` if `newMember` was not contained in the
///   set. If an element equal to `newMember` was already contained in the
///   set, the method returns `(false, oldMember)`, where `oldMember` is the
///   element that was equal to `newMember`. In some cases, `oldMember` may
///   be distinguishable from `newMember` by identity comparison or some
///   other means.
@discardableResult
mutating func insert(_ newMember: Element) -> (inserted: Bool, memberAfterInsert: Element)

The operation returns the new value if an equal value wasn't already present; and discards the new value and instead returns the existing old value if one was.

This will be a problem for a set of move-only types. If the inserted type is move-only, it could not possibly be both inserted into the set and returned.

Contrast this with Dictionary's updateValue(_:forKey:), which replaces the existing value if there is one and returns that – something that would work fine for move-only types:

/// Updates the value stored in the dictionary for the given key, or adds a
/// new key-value pair if the key does not exist.
///
/// - Returns: The value that was replaced, or `nil` if a new key-value pair
///   was added.
@discardableResult
public mutating func updateValue(_ value: Value, forKey key: Key) -> Value?

Set.insert returns the new value for good reason. Imagine a use case where you want to intern strings – that is, make sure all occurrences of a string use the same buffer, to save memory and speed up equality comparison:

// start with some constant strings
var internedStrings: Set<String> = ["foo","bar"]

func intern(_ s: String) -> String {
  let (_,interned) = internedStrings.insert(s)
  return interned
}

// create a new string dynamically,
// with its own buffer
let s = String("oof".reversed())
// get back a constant string, freeing memory
// and enabling fast-path bitwise comparison
let i = intern(s)

The usual solution to avoid the move-only type problem is to return an index instead:

/// - Returns: `(true, newMember)` if `newMember` was not contained in the
///   set. If an element equal to `newMember` was already contained in the
///   set, the method returns `(false, oldMember)`, where `oldMember` is the
///   element that was equal to `newMember`. In some cases, `oldMember` may
///   be distinguishable from `newMember` by identity comparison or some
///   other means.
@discardableResult
mutating func insert(_ newMember: Element) -> (inserted: Bool, location: Index)

Move-only types can still be accessed via a borrow using the subscript operator. And the above interning code could be rewritten:

func intern(_ s: String) -> String {
  let (_,idx) = internedStrings.insert(s)
  // strings are copyable so this is still ok
  return internedStrings[idx]
}

This would be an appropriate solution, but for two problems:

  1. It is source-breaking.
  2. SetAlgebra doesn't conform to Collection

Which leads to...

Countable vs Uncountable Sets

SetAlgebra deliberately does not conform to Collection. This is to allow for conformance by types that can provide all the functions of a set (union, intersection,constant-time contains) but that don't provide the ability to iterate over their elements.

CharacterSet is an example. You can check if a CharacterSet contains an entry, but it doesn't allow you to enumerate its contents because sometimes it defines membership by rules instead of by a list of elements. While it's technically possible to enumerate all the members of CharacterSet.alphanumerics, it doesn't make much sense to, so this feature isn't made available.

Unfortunately while this approach is ok for CharacterSet, this noble goal breaks down with other use cases, because some requirements still force the ability to enumerate.

For example, suppose you wanted to create a PredicateSet, a set that defined membership via a closure:

struct PredicateSet<Element> {
  let _predicate: (Element)->Bool

  init(_ predicate: @escaping (Element)->Bool) {
    _predicate = predicate
  }

  func contains(_ member: Element) -> Bool {
    return _predicate(member)
  }
}

let evenNumbers = PredicateSet { $0%2 == 0 }
evenNumbers.contains(42) // true
evenNumbers.contains(13) // false

You can conform this type to many of the requirements of SetAlgebra:

extension PredicateSet {
  // empty set contains nothing
  init() { self = PredicateSet { _ in false } }
  
  // union means both predicates return true
  func union(_ other: PredicateSet) -> PredicateSet {
    return PredicateSet {
      self._predicate($0) || other._predicate($0)
    }
  }
  
  // In-place operations can just be implemented
  // with self-assignment
}

But some of them are impossible to implement.

Some requirements effectively require the ability to enumerate the elements. isEmpty requires you to know contains will always return false. This is impossible for an opaque closure. Equatable conformance has the same problem.

init<S : Sequence>(_ sequence: S) where S.Element == Element requires the ability to initialize from a sequence. At a minimum, this would require that Element be Equatable (and even then, testing would take O(n)Comparable or Hashable is probably the minimum). In most cases, this is taken care of by the confoming type (e.g. Set requires Element: Hashable). But a predicate set might not need this requirement – the predicate the user supplied could use some other means. Ideally, SetAlgebra wouldn't impose any requirement on Element, but also wouldn't require an unconstrained init from a sequence. ExpressibleByArrayLiteral has the same problem.

isSubset(of:), isSuperset(of:) and isDisjoint(with:) are kind of a combination of both problems.

Proposed solution

Split SetAlgebra into two protocols:

  • IntensionalSet will capture the minimal requirements of any type that
    can perform:
    • contains in O(log N) (see below)
    • union, intersection, and symmetricDifference
    • the in-place variants of these operations
    • no conformances
  • ExtensionalSet will capture the rest:
    • and add Collection conformance
    • it will also add an index(of: Element) customization point
  • insert will only be present on ExtensionalSet, and will return an Index
    not an Element

SetAlgebra does not currently require a complexity guarantee for contains, but the lack of one is problematic for many generic algorithms built on top of it if they are to give their own guarantees.

Detailed design

TBD

Source compatibility

This is a source-breaking change.

The main driver is to ensure that SetAlgeba is ready for move-only types, which requires fixing insert. This will unavoidably break conformances. Once that possibility is admitted, it is not significantly worse to introduce a second break for implementors.

Some problems can be mitigated by keeping SetAlgebra as a typealias for ExtensionalSet. This will cover any extensions on SetAlgebra.

Callers of insert that make use of the Element return type will have to update their code to add the subscript call.

There are no instances of either of these in the compatability suite currently.

Effect on ABI stability

This is an ABI-breaking change. The motivation is to get the ABI right for sets of move-only types, which cannot implement insert currently.

Effect on API resilience

None

Alternatives considered

It is reasonable to argue that the generic SetAlgebra is not pulling its weight – that it isn't common to want to write generic algorithms using SetAlgebra. The lack of use in the compatability suite supports this.

Given this, it would be reasonable to leave it in its current state. But this still leaves the problem that conformance is impossible for move-only types, and we want Set to be able to contain move-only types.

Another alternative is to leave SetAlgebra as-is, and only conform Set to it when it contains copyable types. This could become a pain point if Swift increases it's set-like types to include ordered and sorted sets. Leaving the current protocol as-is would rule out move-only versions conforming to it forever following ABI stability.

Naming

Instead of IntensionalSet/ExtensionalSet, we could go with SetAlgebra and CountableSetAlgebra. The main problem here is this exacerbates the source compatibility issues.

UncountableSet isn't a good alternative for IntensionalSet because it implies it is disjoint from CountableSet, rather than being a superset of countable sets.

21 Likes

This looks good to me, except that there will be no way to construct an IntensionalSet from elements in a generic way. Would it be too restrictive to add an init(element: Element) requirement for creating single element sets? (And then larger sets can be built up via unions.)

That would require Equatable on the Element type in order for it to be implementable by a theoretical PredicateSet or types like it, but I don't see how you can really have set semantics at all without some notion of equatability (semantically, if not protocol conformance-ly).

Even if you had Equatable, if you did hundreds of inserts in a row, your PredicateSet would end up degenerating into a super-inefficient linked list in the form of single-element closures. You'd need to add a side-table to the PredicateSet with individual elements to check as well. And you really need Comparable or Hashable at a minimum to make searching that table efficient. Which I guess is OK but doesn't feel right. The beauty of the predicate set is the user supplies the predicates.

2 Likes

Even if there are not many third-party conformances, SetAlgebra very much pulls its weight by being the entry point for learning Set-alike types, particularly the massively improved interface OptionSet provides over plain bit twiddling. Set logic is taught in primary school, “intensional” and “extensional” are not. I know that sounds purely like surface-level bikeshedding, but I think “let’s go ahead and make it more complicated while we’re at it” is not an appropriate pairing for what should be a very concrete discussion around preparing sets for move-only types.

6 Likes

+1

I’ve been using a similar group of stand‐ins for years because SetAlgebra was inoperable with intensional sets like Range, CharacterSet, pure predicates, etc.

I like how the move-only type problem comes down to increasing the flexibility of the hierarchy. A natural and intuitive evolutionary step, where an upcoming language feature merely acts as a means of further strengthening the motivation and justifying the source compatibility issue – an artificial factor – in a positive way, of course.

It's not easy to part with SetAlgebra and to some degree I prefer a simpler naming variant. On the other hand, IntensionalSet and ExtensionalSet are sensible namings that emphasize the absence or presence of init<S : Sequence>(_ sequence: S), isEmpty and other collection requirements. There is no hierarchy contradiction either: any extensional set can be intensionally defined, while the opposite is clearly false.

A previous discussion

@Ben_Cohen, relative to Dictionary, Set probably has more important use cases when returning the new value is convenient, but I wonder if the proposal takes into account index validity over time. Will this be addressed by the index(of: Element) customization point?

1 Like

I don’t have any opinion on the design, but I think the proposal would benefit from introducing and explaining the terms intensional and extensional.

5 Likes

surely we can come up with less abstruse names for these protocols? other than that, excellent proposal

3 Likes

From Wikipedia’s article about “intensional” and “extensional” definitions:

[A]n intensional definition gives the meaning of a term by specifying necessary and sufficient conditions for when the term should be used.

An extensional definition of a concept or term formulates its meaning by specifying its extension, that is, every object that falls under the definition of the concept or term in question.

2 Likes

This pitch makes a lot of sense. But I also think the naming needs more work.

I'm just going to throw a suggestion: SetFunctional & SetCollection. (It's a pity we can't reuse SetAlgebra for the minimal protocol, I really liked that name.)

I think the IntensionalSet and ExtensionalSet spellings are very good—they capture the proposed splitting-up of SetAlgebra remarkably well. It's actually kind of mind-blowing to me that these distinctions align so naturally with the design constraints imposed by move-only types.

Here's a philosophy of language perspective on intensional and extensional sets, in case anyone finds it helpful for contextualizing the proposal. I think these points are applicable here more or less unchanged, although they might be interpreted differently for programming contexts depending on how / how much one reads into them.

  • An intensional set is one where every member can be determined implicitly. In other words, the set can necessarily be defined by some membership rule, and its members aren't necessarily countable.
  • An extensional set is one where every member can be determined explicitly. In other words, the set can't necessarily be defined by some membership rule, and its members are necessarily countable.

Intensional set and extensional set are established terms in a variety of contexts, so I think the proposed spellings are ultimately clearer than, for example, ImplicitSet and ExplicitSet.

8 Likes

This looks like a good design. I just have one question.

This seems to imply that ExtensionalSet refines IntensionalSet but does not explicitly state that. I think that refinement makes sense in terms of refinement, but it feels awkward to have types that are actually extensional in structure conform to IntensionalSet. I don’t have any specific suggestions, but I’m wondering if you have considered this.

I agree, as well as intensional/extensional being a bit more obscure as names, you have the same problem as uncountable/countable where the names imply that these are opposites rather than one extending the other. I think that SetAlgebra and CountableSetAlgebra are the way to go.

We're already breaking source compatibility, and I'm not sure how keeping the old name makes things any worse. If anything it seems like users of types that are SetAlgebra but not CountableSetAlgebra (e.g. CharacterSet) will have an easy time with:

extension SetAlgebra {
  @available(*, unavailable, message: "this method is now part of CountableSetAlgebra")
  func insert() {}
}

Could you go into more detail about what you see as the exacerbated source compatibility issues?

5 Likes

Admittedly this may be pretty subtle, but I’d argue intensional and extensional are opposites in the same way implicit and explicit are, which does not imply mutual exclusivity.

5 Likes

I think I’m most of the way convinced this is the right approach. It broadens the source breakage, but based on the compat suite, that is probably not significantly worse.

5 Likes

I think that we need to give developers more credit than this. Using the terms that represent what we are actually trying to represent is useful. Countable isn't so much more approachable that it is a clear win. I really think that Intensional and Extensional should win out against SetAlgebra and CountableSetAlgebra.

2 Likes

why can’t we just call them what they are and use the names PredicateSet and CollectionSet?

3 Likes

Those sound like concrete types to me. In particular, PredicateSet is exactly what I'd call a set type based on a predicate.

at my uni (where i’ve taken all the theory classes i’m required to take) they teach “countable” but they don’t teach “intensional” and “extensional”.

1 Like

You are able to learn these terms, though, is my point. In the same way that you refined your understanding of “countable” in school, you can add new terms. I wasn’t familiar with extensional and intensional until recently

1 Like
Terms of Service

Privacy Policy

Cookie Policy