Pitch: Reform SetAlgebra

David_Sweeris · October 21, 2018, 10:24pm

Do we have concurrency yet? If so couldn’t PredicateSet be implemented by a linked list of buffers, and we’d just spawn a background thread that merges everything into the first buffer whenever an element gets inserted?

gregtitus · October 22, 2018, 2:48am

The core of what we are trying to represent here, though, is not two different definitions of sets, per se, but that there is a minimal set of requirements/accessors suitable for some algorithms and a more complete set of requirements/accessors that depends upon the enumerability (the countability) of the set contents.

I’d have no problem with SetAlgebra and ExtensionalSetAlgebra. The benefit being aimed for is not that Countable is a more understandable word, the benefit is that the more specific protocol has the more specific name.

anthonylatsis · October 22, 2018, 3:51am

Definitions may vary from place to place, but if it's this countable you are talking about, it's slightly different. The set of natural numbers is countable, but isn't finite and hence cannot possibly be extensionally defined. So the question whether either of those spellings make sense roughly comes down to "what are we trying to express with the refined protocol?"

I believe the ability to simply enumerate the elements is sufficient (I don't want to even start talking about actual countable sets and infinite sequences in this context). In this case, both spellings are OK, except that it would be somewhat misleading to those that refer to the mathematical definition if we were to use countable. That said, I second Greg's idea of having SetAlgebra and ExtensionalSetAlgebra, but not particularly against countable.

taylorswift · October 22, 2018, 9:50pm

i’d be wary of anything containing the word Extensional considering how important and unrelated a concept extensions are in Swift

jrose · October 23, 2018, 11:02pm

I really, really don't like the source break. SetAlgebra is a parent protocol of OptionSet today, and I can imagine people are using the return value of insert with option sets. If you can figure out how to keep that in some form (as an extension method, not a requirement) then I'd be a lot happier.

(It's possible that SetAlgebra.insert and IterableSetAlgebraOrWhatever.insert can coexist just fine because the latter has a "more constrained Self type".)

Ben_Cohen · October 23, 2018, 11:33pm

Removing it from SetAlgebra doesn't mean OptionSet has to remove insert. It can still implement it, even with the current Element return type (though for option sets that is pretty pointless, so I'd be amazed if anyone is ever not discarding it). It just won't be available in a generic context when extending SetAlgebra, in order that Set can conform and still be move-only in the future.

Implementing the element-returning version via a generic extension built on top of the index-returning version would be fine, except then we'd need to rename the index-returning insert, because they can't differ only by return type as that will cause ambiguity (especially because the return type is discarded nearly every time, so type context won't even save you).

edit: removing "concretely" on suggestion OptionSet implement insert, because I forgot it was also a protocol not a type.

gregtitus · October 24, 2018, 7:05pm

Somewhat off-topic: we could do this if we taught the type checker to prefer Void returning overloads in cases where the return type is ignored. (You could conceivably also add a function attribute for either use-this-if-result-ignored or don't-care-which-overload-if-result-ignored instead, but preferring a Void return type seems like it fits the language better.) Then we can just add a convenience extension to CountableSetAlgebra (or whatever the name ends up being):

    mutating func insert(_ newMember: Element) -> Void {
        let (_, _): (Bool, Index) = insert(newMember)
    }

... and that would resolve the ambiguity in the vast majority of cases where the return type of insert gets discarded, and in the remaining cases, where the coder is actually using the return value, the error is likely to be a lot better than just ambiguous use of 'insert'.

This is a proof-of-concept of what it would take to add prefer-Void-when-ignored to the type checker, if you want to try it out:

Chris_Lattner3 · October 25, 2018, 4:50am

The best things that I can say about these names is that they are technically correct and googlable. I don't think a reasonable person would say they are communicative or evocative of their meaning and behavior (to most swift devs).

IMO, even though it is a pain to get right, naming really does matter. Please include a list of alternative's considered in terms of naming in the proposal.

Specific question: why not go with a much longer/descriptive name? Is there any reason these protocols have to have a terse name?

-Chris

griotspeak · October 25, 2018, 4:34pm

No amount of added "description" will be more correct than the term that already exists, is technically correct, and is googlable.

Every "longer and more descriptive" name will be less precise and, in the longer term, will allow us to convince ourselves that the 'technically correct' term is not, in fact, what we are representing.

Chris_Lattner3 · October 25, 2018, 5:07pm

I don't want to get into a big argument about this, but IMO that isn't true (synonyms exist!). We also have a track record of picking names that are longer and more relatable even if short and obscure would also work. For example, we use "value of protocol type" instead of "existential" in diagnostics.

-Chris

griotspeak · October 25, 2018, 5:35pm

How about a little bit of discourse, then?

We're not really talking about synonyms. Phrases that approach the technical meaning are not terms that are interchangeable. Using "value of protocol type" in diagnostics is different because we are already in prose and you have already chosen not to use the term 'existential' elsewhere.

Another benefit of using established terms is that people who have knowledge of the term from elsewhere can leverage that knowledge and people moving from our community to other fields will have some familiarity to bring with them.

EDIT: Additionally, using the technically correct term opens up the possibility for people to be exposed to the wider understanding and literature around an abstraction. So long as we provide a good, well scoped, contextual explanation in our literature, this doesn't have to be overwhelming. It can be connective.

AlexanderM · October 25, 2018, 5:40pm

Can we get clever about this, conditionally extending ExtensionalSet where Element: NonMoveType with the current implementation of insert (i.e. returning memberAfterInsert: Element, instead of location: Index)?

I think it's admirable goal to try to hide indices from the every-day use of the API, because most people who aren't in the know wouldn't expect a Set to have indices.

Nevin · October 25, 2018, 6:37pm

Anecdotal data point: I have a degree in mathematics, and I have never heard the terms “intensional set” and “extensional set” before this thread.

If I were explaining the concept to a friend, I would probably say something like “A set defined by its indicator function” and “A set defined by enumerating its elements”.

As a programmer (and API naming expert :-), I recognize that “enumerating” (or “enumerable”) is not acceptable here because it sounds too much like enum. Similarly, “extensional” sounds too much like extension.

We have precedent for using Countable to distinguish between things that can and cannot be iterated (CountableRange before conditional conformances). So I would lean toward SetAlgebra and CountableSetAlgebra.

anandabits · October 25, 2018, 7:36pm

FWIW, the distinction we're considering in this thread is related to the notion of materializability of collections I pitched in this thread: Introduce a `MaterializableCollection` protocol. With that in mind, another option would be MaterializableSet: SetAlgebra, Collection.

This name makes some sense: a user of these sets isn't really concerned with how they are defined (that's an implementation detail). Instead they are concerned with what they can do with the set. The capability introduced by the proposed distinction is precisely the ability to materialize the elements of the set rather than simply check for membership.

Aside from naming, I think there are additional details that need to be considered while we're reforming SetAlgebra. I would expect to see these covered in detail in a draft proposal. For example, Equatable and ExpressibleByArrayLiteral refinement should probably not live on the base SetAlgebra protocol. PredicateSet cannot conform to Equatable. In theory it could conform to ExpressibleByArrayLiteral, but should it? This requirement may be more problematic for other kinds of intensional sets.

davedelong · October 25, 2018, 10:46pm

One small wish for a reformed SetAlgebra protocol would be the addition of an intersects(_:) → Bool method.

yxckjhasdkjh · October 26, 2018, 9:50am

Another anecdotal data point: I have a degree in linguistics, and the terms "intensional" and "extensional" (although not in relationship to mathematical sets) come up in semantics. I suspect that philosophy students would also encounter such words, maybe also mathematicians studying logic.

So probably the terms are opaque to most people (even mathematicians) at first, but they're also definitely not ad-hoc and could be looked up.

Not that I think this implies "intensional" and "extensional" should be used (maybe there's terms that capture the semantics better), but I think the argument "the terms have some tradition and can be looked up" is valid.

subjectOriented · October 26, 2018, 11:18am

I think this argument would be pretty strong if we intended to use the terms elsewhere in the standard library, since consistency could aid reasoning about unfamiliar types by analogy with familiar ones. However, this thread makes me think that as a one-off occurrence, we'd be imposing enough overhead on users that it'd outweigh these benefits.

algal · October 27, 2018, 3:18am

Obviously the big question is whether it's wise to add these protocols.

But I got curious and did a little research just on the naming question (sorry!), in order to see if "intensional set" and "extensional set" are well-established mathematical terms for different kinds of sets.

It seems like they're probably not.

I checked two reference works -- The Princeton Companion to Mathematics and the Encyclopedic Dictionary of Mathematics (2nd edition). I looked in their tables of contents, their indexes, and in their articles discussing set theory, and there is no mention of the idea of an "intentional set" or an "extensional set". Also, anecdotally, I did some math in grad school and never ran into the terms.

The terms "extensional" and "intensional" originate in philosophy of language and logic, where they describe different kinds of definitions, not of mathematical sets, but of any kind of object. (This Stanford Encyclopedia of Philosophy article on intensional logic was a fun primer.) The idea of an "extensional set" therefor seems to require us to treat how a set was defined, or might have been defined, as a property of the set itself. This has odd consequences. For instance, here is an extensional definition of the set of the counting numbers from 1 to 3 inclusive: X = {1,2,3}. And here is an intensional definition of exactly the same set: X = { x | x in N, x < 4 }. Is the set X extensional, intentional, or both? Does the answer change if I erase one of the sentences above? If I can’t think of one of the definitions?

But the bigger issue, I think, is that the names are misleading. The protocol names IntensionalSet and ExtensionalSet describe ways one might define a mathematical set. But the actual protocols themselves specify different operations required by the protocols. The problem is that the sets of operations are only very loosely related to how a set might be defined.

Basically, the functions required for IntensionalSet represent a plain old mathematical set plus a computational complexity guarantee, and have nothing to do with intensionality per se.

And the functions required for ExtensionalSet represent a lot of structure beyond a mathematical set: an implicit mapping between indexes and elements, an implicit ordering of the the indexes and therefor of the elements, and insertability. (Even insertability does not guarantee extensionality, because you could still define a type which allowed inserts but was partially defined by a predicate rule, and so was not a completely extensionally-defined set.)

Since “intensional” vs “extensional” do not cleanly map to what the protocols do, I don't think these terms should figure in the protocol names at all. I think the protocol names should be chosen to describe what the protocols do. For the first protocol, I would suggest a name like MathematicalSet.

For the second protocol, I'd suggest something that alludes to the additional operations which have nothing to do with a mathematical set, something like CollectionSet.

Of course, all this begs the question of whether they're worth introducing, whatever their names.

Tino · October 27, 2018, 6:40am

I can't say much about weather the change itself is a good move*, but I think the names would be fine:
As a mathematician, you have a hard time with the Swift collections anyways ("=" doesn't check for equality, and "+" performs concatenation? Sets have order?), so I don't see an issue with choosing terms that aren't established in math.

Actually, I think it's better to choose names that will make people reach for their dictionary instead of picking well known terms whose interpretation may lead people to wrong assumptions.

* This whole memory ownership story is a huge one, and I just hope that somewhere there is already a comprehensive plan for it.

griotspeak · October 27, 2018, 7:24am

To clarify my positions…

SetAlgebra and ExtensionalSet seem fine to me. I 'liked' the suggestion when it was proposed.

The whole "Don't' avoid technically correct terms" thing is a theme that I've seen going through many Swift-Evo conversations and am fully against.

How does 'mathematical' describe what the first one 'does', here? All sets that we're dealing with are 'mathematical' in this context and the performance guarantee is the defining characteristic in your description. 'MathematicalSet' seems even stranger to me when I consider/compare it to Set in the standard library.