SE-0270: Add Collection Operations on Noncontiguous Elements

Karl · November 13, 2019, 4:15pm

I could also imagine this as some kind of Bound -> Bool collection, where Bound is the index and the subscript gets/sets whether or not the item is included. We already have operators for ranges of indexes (i.e. Range<Bound>), and we could add some more for setting them all to the same value (in this case, true/false to add/subtract them from the RangeSet).
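
A rough sketch of that Bound -> Bool idea, to make it concrete. Everything here is hypothetical (the `Membership` type, its `ranges` storage, and the `set(_:to:)` method are invented for illustration and are not part of the proposal). Setting a single value would need some notion of a successor for `Bound`, so the sketch only reads single values and writes whole ranges.

```swift
// Hypothetical sketch of a "Bound -> Bool" view; not the proposed RangeSet API.
struct Membership<Bound: Comparable> {
    // Invariant: sorted, non-overlapping, non-adjacent, non-empty ranges.
    private(set) var ranges: [Range<Bound>] = []

    // Reads whether a single value is currently included.
    subscript(value: Bound) -> Bool {
        ranges.contains(where: { $0.contains(value) })
    }

    // Sets every value in `range` to included (true) or excluded (false).
    mutating func set(_ range: Range<Bound>, to included: Bool) {
        guard !range.isEmpty else { return }
        var result: [Range<Bound>] = []
        var merged = range
        for r in ranges {
            if r.upperBound < range.lowerBound || r.lowerBound > range.upperBound {
                // Neither overlapping nor touching: keep unchanged.
                result.append(r)
            } else if included {
                // Overlapping or adjacent: fold into the range being added.
                merged = min(merged.lowerBound, r.lowerBound)..<max(merged.upperBound, r.upperBound)
            } else {
                // Keep only the parts of `r` that lie outside `range`.
                if r.lowerBound < range.lowerBound { result.append(r.lowerBound..<range.lowerBound) }
                if r.upperBound > range.upperBound { result.append(range.upperBound..<r.upperBound) }
            }
        }
        if included { result.append(merged) }
        ranges = result.sorted { $0.lowerBound < $1.lowerBound }
    }
}

var seats = Membership<Int>()
seats.set(0..<10, to: true)   // include 0 through 9
seats.set(3..<5, to: false)   // exclude 3 and 4 again
let seatTwoIncluded = seats[2]
let seatThreeIncluded = seats[3]
```
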

xwu · November 13, 2019, 4:17pm

The reason would be that there is no preview package: that proposal has been returned for revision.

Karl · November 13, 2019, 4:19pm

yeah, but we know it's coming very soon. Or to get technical:

"The core team is, by and large, persuaded that this approach has many advantages"

xwu · November 13, 2019, 4:31pm

It’s not coming very soon at all, as far as I can tell. The core team was persuaded that a design other than that which was proposed “has many advantages,” and for that reason sent the proposal back for revision.

ben-cohen · November 13, 2019, 5:45pm

I really like this description, which I think is a succinct way of explaining the characteristics like no overlaps or empty ranges that are causing confusion when people instead come at it from the angle of an empty collection into which ranges are inserted.

That said, I don't think any name we can come up with will serve to do that same job as that sentence. I feel like this is an anti-pattern into which this community has fallen: trying desperately to replace documentation with one or two words that will spark an understanding of exactly what this thing is. My fear is that this is leading to some really bad outcomes, where one way that a beginner might characterize Swift is "oh yeah, that's the language where they have weird names for stuff". formIndex, allSatisfy, DiscontiguousRange.insert(allIn:). These are just strange names with an unpleasant mouth-feel that were settled on because of this quixotic desire to solve problems with naming that cannot be solved that simply. At the end of the day, people need to read docs and write tests, not just randomly guess what APIs do.

Having said that, I'd suggest one more simple name for this thing: Ranges. It has the benefit of brevity, and avoids the potentially misleading use of Set, without introducing some other obscure term that will only mean something to someone who already knows what the type does.

(to preempt the most likely objection, which is that it's 1 letter away from Range: I suspect the type system will immediately make it clear that you made the error within seconds of you trying to use the incorrect type, as they are very different)

ben-cohen · November 13, 2019, 5:50pm

I think this is a misinterpretation of the decision text. This was expected to be a quick revision not a punt. The difference the core team envisioned is not major: to instead create a package that aggregates multiple packages, one per SE, which can be semantically versioned, rather than just a single package that can't. I believe @nnnnnnnn has already revised the proposal to accommodate this so the re-review should be scheduled soon.

However, I'd suggest taking this discussion to a different thread, as it isn't directly relevant to the proposal in question.

xwu · November 13, 2019, 6:22pm

I think I like this less than RangeSet. It suggests that we have some specialized form of Array<Range>, which seems to move us away from rather than towards what's being described.

What do other languages call analogous types?

Edit:

Other thoughts on naming--

Would it be accurate to consider this type to represent a union of ranges? Hence, RangeUnion?
Would it be useful to consider this type to represent a selection, to which--like selections of text--ranges can be added or removed? Hence, Selection?

Karl · November 13, 2019, 6:35pm

I like the name Selection. It’s easy for people to understand and remember.

Oh, but it might be confusing, like if you’re doing UI programming.

nnnnnnnn · November 13, 2019, 6:38pm

Thanks for the continued feedback! In regards to naming, a couple quick notes:

We have a name for a set of ranges: Set<Range<T>>. Because the semantics of that aren't the semantics of this proposed RangeSet<T>, using one of those is confusing and not terribly useful.
The name RangeSet isn't completely novel — it's been in the Guava library since 2013 with essentially the same semantics, and you can find third-party implementations under the same name for a variety of languages.
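
To illustrate the difference in semantics, here is a small sketch. `coalesced` is a hypothetical helper written for this example, standing in for the merging behavior described for RangeSet. It is not the proposal's implementation.

```swift
// Illustration only: `coalesced` is a hypothetical helper, not part of the proposal.

// A Set<Range<Int>> stores each range as its own element, overlaps and all.
let setOfRanges: Set<Range<Int>> = [0..<5, 3..<8]
let storedCount = setOfRanges.count              // two distinct elements
let hasWholeSpan = setOfRanges.contains(0..<8)   // asks about a range, not a value

// Merges overlapping or adjacent ranges and drops empty ones.
func coalesced(_ ranges: [Range<Int>]) -> [Range<Int>] {
    var result: [Range<Int>] = []
    for r in ranges.sorted(by: { $0.lowerBound < $1.lowerBound }) where !r.isEmpty {
        if let last = result.last, r.lowerBound <= last.upperBound {
            result[result.count - 1] = last.lowerBound..<max(last.upperBound, r.upperBound)
        } else {
            result.append(r)
        }
    }
    return result
}

// The coalesced form answers questions about values, like a Range does.
let merged = coalesced([0..<5, 3..<8])
let containsFour = merged.contains(where: { $0.contains(4) })
```
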

A Range always contains the values within its boundaries, it just doesn't store them individually. Every range has the contains(_:) method, not just those that have Collection conformance. RangeSet uses these same terms and semantics.

This example shows why the inverted method takes a collection — you always need the actual range of seats passed in from the outside. Inverting the set of occupied seats in the span of representable Ints would always give you way too many seats available!
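
A minimal sketch of that point, using a plain array of ranges rather than the proposed type. The `gaps(between:within:)` function is hypothetical and assumes its input is already sorted and non-overlapping. The bounds have to come from the caller, because the occupied ranges alone say nothing about where the row ends.

```swift
// Hypothetical helper; assumes `occupied` is sorted and non-overlapping.
func gaps(between occupied: [Range<Int>], within bounds: Range<Int>) -> [Range<Int>] {
    var result: [Range<Int>] = []
    var cursor = bounds.lowerBound
    for r in occupied {
        // Clip each occupied range to the bounds supplied by the caller.
        let lower = max(r.lowerBound, bounds.lowerBound)
        let upper = min(r.upperBound, bounds.upperBound)
        if lower >= upper { continue }   // entirely outside the bounds
        if cursor < lower { result.append(cursor..<lower) }
        cursor = max(cursor, upper)
    }
    if cursor < bounds.upperBound { result.append(cursor..<bounds.upperBound) }
    return result
}

let row = 0..<10                       // seats in one row
let occupiedSeats = [2..<4, 7..<9]
let free = gaps(between: occupiedSeats, within: row)
```
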

glessard · November 13, 2019, 6:40pm

Countability or enumerability are not required characteristics of a set. Swift.Set needs to be countable and enumerable, but it's not a general set. This new type is more general in some ways (compact representations for some very large sets of Double), while being less in others.

Karl · November 13, 2019, 6:55pm

Yes but I think it should return a wrapper view rather than copying a new rangeset. In a simple case like that, you just want to query each row from the perspective of the holes. That should be a really simple operation.

Perhaps “inverted” could return a wrapper, and we could add an initialiser on RangeSet if you need an independent copy.
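
Something like the following is what I have in mind, as a sketch only. `InvertedView` and its members are hypothetical names, and the base is a plain array of ranges rather than the proposed type. The wrapper holds the original ranges and answers queries from the perspective of the holes, without building a second set of ranges.

```swift
// Hypothetical wrapper: answers membership queries without copying or inverting storage.
struct InvertedView {
    let base: [Range<Int>]    // the ranges being inverted
    let bounds: Range<Int>    // the span the inversion is taken within

    func contains(_ value: Int) -> Bool {
        // Inside the bounds, and not covered by any base range.
        bounds.contains(value) && !base.contains(where: { $0.contains(value) })
    }
}

let holes = InvertedView(base: [2..<4, 7..<9], bounds: 0..<10)
let fiveIsHole = holes.contains(5)
let threeIsHole = holes.contains(3)
```
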

This is why I’d like it to go in the preview package.

hooman · November 13, 2019, 7:31pm

Just my quick 2cents on naming:

An old-fashioned "Multi-" prefix sometimes works when extending an existing type to act like multiple instances of itself. How about MultiRange?

ben-cohen · November 13, 2019, 9:10pm

To reemphasize my earlier point: these names aren't better, just weirder. That is to say, IMO they do nothing for someone not already familiar with the type other than make them think "huh, that's a weird name." I consider that first impression harmful and think we should be more mindful of this harm.

RangeSet has prior art in other libraries, and echoes IndexSet, with which Cocoa programmers are already familiar. These are strong recommendations as they allow porting of previous experience. Allusions to mathematical terms or specific use cases don't bring similar benefits, no matter how satisfyingly apt they may seem in an evolution thread.

Nevin · November 13, 2019, 9:21pm

I agree that RangeSet is an excellent name for this feature.

I also agree that it would be nice to have an InvertedRangeSet type, much as we have ReversedCollection.

lorentey · November 13, 2019, 9:40pm

This would mislead me -- I expect the Multi prefix to appear on constructs that support duplicate values. (Which definitely have a role in/near the standard library.)

I like both RangeSet and Ranges.

I am not convinced that the complement of a RangeSet (or indeed, a Range) is something we need (or want) to model with a dedicated type. AFAICS, the relative complement implemented by subtract (and the convenience shorthand inverted(within:)) covers most (all?) practical use cases; dealing with a separate type seems to introduce more problems than it answers.

xwu · November 13, 2019, 9:54pm

To be clear, I think RangeSet is fine as a name.

If we are to explore other options, however, I think Ranges is unsatisfying for the reasons I mention—and because it is weird (not least because it is a plural, of an existing type no less).

zwaldowski · November 14, 2019, 12:55am

Overall, I really like the proposal. I had the need for it just the other day and went to borrow some code from @nnnnnnnn's proposal (namely, the extremely good documentation), so that definitely speaks to its quality. I agree, though, that I'd like to see this in, e.g., a preview package and exercise it myself more.

The model proposed composes nicely with the collection system. I'm excited about how removeAll(at:) and DiscontiguousSlice might make the best-performing code also be the most readable, just as two examples.

within: in the init arguments to RangeSet is a little awkward when parts of the stdlib already make use of in:.

I'm perfectly happy with the spelling RangeSet, with Ranges a close second.

I like that we have a bundle of types that have Set in the name with reasonably distinct uses. I prefer this approach to, say, ending up like Java where you get cargo-culting that everyone on a project must use AtomicDoubleEdgedSwordHoldTheMustardLinkedList<T> because the senior dev says so.

Building off of that, though, I'm bummed like some others are to not have SetAlgebra conformance on a type named Set. Seems a little bit like the whole range expression/offset/offset expression build-up where we end up splintering protocol requirements. In particular, I think it likely that a project will end up needing to write a protocol SetAlgebraButActuallyJustTheSubsetThatRangeSetImplements { } and use that as a bound in a generic algorithm.

dabrahams · November 14, 2019, 1:03am

This would be a good time to remind people of how we arrived at the Swift API Guidelines, particularly the first naming guideline, “Include all the words needed to avoid ambiguity for a person reading code where the name is used.” Following that principle leads directly to many of the naming choices in the standard library, some of which Ben is complaining about here.

The naming guidelines are a product of work from people from across the spectrum of subjective viewpoints, from those inclined toward minimal names like those in C++, to those steeped in the rather verbose tradition of Cocoa. Given that diversity, it's remarkable that we eventually achieved a consensus that everybody involved could buy into.

At the time, the tradition of Objective-C naming patterns was so heavily ingrained in the culture of the participants, that had we not de-emphasized subjective criteria like “mouth feel,” we would have ended up with standard library APIs in Swift that most of us would regard today as wildly inappropriate for Swift. Code that read simply as a.contains(b) could elicit a strong “yuck” reaction from an Objective-C programmer who expected each parameter to be introduced with a noun describing its type. Furthermore, emphasizing “mouth feel” over disambiguation can be harmful: we found examples of several Objective-C APIs that would “ring nicely in the ear” for someone steeped in that tradition, but in practice resulted in misleading code.

It is perfectly appropriate to spend a few words, and even risk disrupting a reader's sense of pronounceable code, to avoid an easily made misinterpretation at the use site of an API. That has always been a principle of naming in Swift's standard library, and discarding it now would not only create inconsistency, it would undermine one of the library's core design principles.

I can hardly think of a case that warrants this sort of treatment more than one where a reader will see a Range being inserted into an instance of RangeSet, which is not actually modeling a set of ranges.

ben-cohen · November 14, 2019, 1:04am

I'm bummed like some others are to not have SetAlgebra conformance on a type named Set.

Unfortunately SetAlgebra seems irretrievably flawed. There are other set types that also cannot conform to it (such as a predicate set or an inverted wrapper set) and it also cannot support move-only types when we have them.

I would love to see a pitch for a new set protocol that was limited to the core capabilities of a set type (I think this may just be an element associated type and a contains algorithm) that could replace it.
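
As a rough sketch of what such a protocol might look like (the names `MembershipTesting`, `PredicateSet`, `Inverted`, and `countMatches` are all hypothetical, not a pitch for specific API):

```swift
// Hypothetical minimal set protocol: an element type plus a membership test.
protocol MembershipTesting {
    associatedtype Element
    func contains(_ element: Element) -> Bool
}

// A set defined by a predicate; it cannot be enumerated or counted.
struct PredicateSet<Element>: MembershipTesting {
    let predicate: (Element) -> Bool
    func contains(_ element: Element) -> Bool { predicate(element) }
}

// A wrapper that inverts membership of any conforming base.
struct Inverted<Base: MembershipTesting>: MembershipTesting {
    let base: Base
    func contains(_ element: Base.Element) -> Bool { !base.contains(element) }
}

// The existing Set already has a matching contains(_:) method.
extension Set: MembershipTesting {}

// A generic algorithm that needs nothing beyond membership testing.
func countMatches<S: MembershipTesting>(_ values: [S.Element], in set: S) -> Int {
    values.filter { set.contains($0) }.count
}

let evens = PredicateSet<Int>(predicate: { $0 % 2 == 0 })
let odds = Inverted(base: evens)
let oddCount = countMatches([1, 2, 3, 4, 5], in: odds)
let setCount = countMatches([1, 2, 3], in: Set([2, 3, 9]))
```

Neither `PredicateSet` nor `Inverted` could implement requirements such as insertion or an empty initializer in any meaningful way, which is the gap a smaller protocol like this would be meant to fill.
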

dabrahams · November 14, 2019, 6:16am

Try not to be bummed. It's important to realize that SetAlgebra was never supposed to support all set types. When we designed it, we did a fairly complete exploration of the different kinds of possible sets, and worked out a complete hierarchy of protocols to accommodate them. But it's also a principle that you don't introduce a protocol unless you have a family of actual model types that can conform to that protocol, and by “have” I mean more than just “have imagined:” I mean that you have implementations and evidence that they're useful in real code. At the time, we had Set and OptionSet, so we could only really justify introducing the most refined protocol that covered those examples. That was SetAlgebra.

The intention was always that, as more set types were actually implemented, we'd add less-refined protocols to the hierarchy to account for them. So this isn't a bad thing at all; the discovery of a new, useful set type that didn't fit SetAlgebra was always expected.

I'm not sure whether we can figure out how to inject less-refined protocols into SetAlgebra's hierarchy now that we have ABI stability. If we can't, finding an alternative coding pattern that works will be an interesting problem, because the later discovery of these protocols is going to be a pattern in generic library evolution.

Hope this helps,
Dave