SE-0270 (Review #2): Add Collection Operations on Noncontiguous Elements

rxwei · December 18, 2019, 7:19pm

With respect to side effects, I believe gather(_:at:) is a reasonable name because it reads like an imperative phrase, which conforms to the Swift API design guideline for mutating methods.

As for the "gather" operation in the ML domain, for example, one such Swift library names this operation gathering(atIndices:alongAxis:), which is in line with API design guidelines.

That said, I do think gather(_:at:) is very unclear at call sites. The first argument is expected to be indices, but it does not have an argument label. This makes call sites read like it's gathering a bunch of integers, not elements at those positions.

// Am I gathering 1 and 3, or elements at 1 and 3?
array.gather([1, 3], at: 2)

I think moveElements as a base name, as Chris suggested, is better, because it clearly indicates element movement. To indicate where elements are moved to and from, the full name can be moveElements(at:to:).

dabrahams · December 18, 2019, 10:58pm

Chris' point about this name colliding with the numerics domain is well-taken. Scatter/gather isn't just an ML thing; if it were that much of a niche operation, I'd be less concerned.

Chris_Lattner3 · December 18, 2019, 11:15pm

Right, I'm not specifically concerned about ML. This operation is rearranging elements within a collection - using a verb like "move" or "rearrange" seems more likely to form a method family.

Using 'move' in particular is attractive to me, because when move semantics appear in the future, I suspect that movement of things will grow in importance.

dabrahams · December 18, 2019, 11:16pm

Drop `ExpressibleByArrayLiteral` Conformance

My second issue with the proposal is nicely illustrated by this example:

There are two problems here. The first is that the proposal doesn't make that code legal. You'd have to write:

array.gather([1..<2, 3..<4], at: 2)

Usefulness

But even with that change, I think the example is unintentionally misleading, because in real use cases we almost never have a set that can be expressed as a literal array of ranges. The one exception I can think of occurs when we have a single index (which is surely in some named variable):

array.gather([source..<a.index(after: source)], at: destination)

But that operation should be done with a different, O(N) algorithm, called rotate, so I'm not at all inclined to optimize syntactically for doing it by gathering a RangeSet.

A Real Source of Confusion

I also think it's a problem that when [1..<2, 3..<4] is interpreted as an Array or Set, its element type is Range<Int>, but if interpreted as a RangeSet, its element type is Int. As I claim above, RangeSets expressed as literals will be rare, but that doesn't necessarily make the confusion less impactful: if they were common, people would know to watch out for the possible misinterpretation.

Conclusion

Because literal RangeSets are almost never useful, and can be misleading, ExpressibleByArrayLiteral conformance should be dropped from the proposal. It adds complexity to the library, potentially harms readability, and stresses the type checker. If I'm wrong, we can always add it later.

Ilias_Karim · December 19, 2019, 3:52pm

(having only) a predicate-based method is inherently flawed. It must test all elements, even when only a relatively few will be targeted. Imagine touching only 10 elements out of 1000. An sibling function that can specify its targets by exact indices stored in ranges, (sorted) sets, etc. is still needed.

This logic does not make sense to me. Having one does not seem to indicate a need for the other.

Dave Abrahams raises a legitimate point about RangeSet literal syntax becoming a point of confusion. If the naming conflict with .regions does not pose type checker issues, as @benrimmington highlights, which based on past evidence history would suggest that it would, then perhaps ExpressibleByArrayLiteral conformance on RangeSet may.

Tino · December 19, 2019, 6:06pm

This might not be a huge proposal, but imho the topic itself is - I think even two reviews aren't enough to discuss all implications. I'm not saying we need more review cycles, though: Imho this is a perfect candidate for SE-0264 (review #2) — Standard Library Preview Package...

I agree with @bzamayo that it's odd to talk about ranges all the time, while the real code is actually focused on single indices. Take Collection.range(at:) as an example - this could as well have an additional length parameter, which imho would also make it significantly more useful in other contexts besides this proposal.

Another interesting example is public func ranges(of element: Element) -> RangeSet<Index>.
The natural result of this operation is a set of indexes, and using ranges here looks like an (possible) optimization which leaks out.
On the other hand, there is a very similar functionality which naturally produces ranges, and which I consider to be a big gap in Swift:
We have Apple Developer Documentation, but that's only part of NSString, not String.
Therefor, I'm puzzled that something like public func ranges<C: Collection>(of subsequence: C) -> RangeSet<Index> isn't even mentioned as a possible next step.

In the long run, I even see some overlap with [Prototype] Protocol-powered generic trimming, searching, splitting,, but I no idea how well the two concepts could work together.

My bottom line:
Improving the ease of use of collections is a crucial topic, and I really doubt that old procedure with small, simple steps will produce great results. Instead, I'd prefer a preview package with a complete vision of how to work with Collection - or even several competing approaches, so that we can compare the pros and cons.

CTMacUser · December 19, 2019, 6:56pm

If you only want one, then the index sequence version I proposed would be the one to keep, since you can simulate the predicate version with a prior index-testing pass. But that would be an extra explicit step.

dabrahams · December 20, 2019, 12:10am

I'm not sure the preview package addresses this problem. IIUC, it's only for things the standard library team thinks are ready to ship, not just for getting something out there to experiment with and seeing if it flies.

dabrahams · December 20, 2019, 12:25am

Is `RangeSet` Really an OK Name?

The argument in favor, IIUC, is that the generic argument to RangeSet tells you what it's a set of, so you'll see RangeSet<Int> and read it as “(range set) of Int.” I was prepared to be convinced by that argument, but today I started thinking about a similar, totally reasonable data structure: a set of ranges with generic bound type. What would we call that? “Probably RangeSet, I thought.” In fact, here's a typealias that does it:

typealias RangeSet<T> = Set<Range<T>>

Then a set of ranges of Int would be spelled RangeSet<Int>

The fact that the name is equally (or maybe more!) appropriate for another useful type troubles me. When I get to this point, I look at how we'd disambiguate it, and evaluate the costs. Since RangeBasedSet is clearer and not unwieldy, I'm suggesting it as a replacement. I'm interested to hear the community's thoughts.

anandabits · December 20, 2019, 1:18am

dabrahams:

typealias RangeSet<T> = Set<Range<T>>
Then a set of ranges of Int would be spelled RangeSet<Int>

The fact that the name is equally (or maybe more!) appropriate for another useful type troubles me.

I agree. The name hasn’t felt right to me for this reason as well. I wasn’t able to think of anything better though.

This is the first name I’ve seen that is both completely clear and relatively concise. I like it.

dabrahams · December 21, 2019, 7:25pm

`RangeSet` treats `Range`s as sets; should we name operations accordingly?

[Almost done with these posts; I have just one more on the stack—so far!]

Welcome back, @maven! Actually, this is logically a “set of indices”, or rather, a “set of anything comparable”. I know it can be confusing: here's the place where I finally realized it myself.

I had planned to let someone else answer this post, but I was just re-reading it and after refactoring your statement above I noticed you were saying Range<Bound> is a bit like Set<Bound>, which is an excellent point: Ranges are treated by RangeSet as contiguous sets of the Bound type. Once you realize that, a number of the APIs bear re-examination:

1.

    /// Creates a range set containing the values in the given range.
    public init(_ range: Range<Bound>)

RangeSet(0..<10)

LGTM; it's an information- and semantics- preserving construction of one set from another.

2.

    /// Creates a range set containing the values in the given ranges.
    public init<S: Sequence>(_ ranges: S) where S.Element == Range<Bound>

Hmm,

RangeSet([0..<10, 10..<20])

Doesn't look symmetric with the previous one, and seems to imply we're building a set of ranges. Should it be:

RangeSet(union: [0..<10, 10..<20])

? That doesn't read beautifully to me, but it's at least trying to address the issue. Open to other ideas.

3.

    /// Adds the values represented by the given range to the range set.
    public mutating func insert(contentsOf range: Range<Bound>)
        
    /// Removes the given range of values from the range set.
    public mutating func remove(contentsOf range: Range<Bound>)

Now that we have the contentsOf: label, these are less suggestive than they used to be of the idea that RangeSet is a set of ranges, but the use sites are now longer, and these APIs fail to acknowledge the set-like role played by parameter. The obvious alternatives are s.formUnion(someRange) and s.subtract(someRange), which are both consistent with the set role of the parameter and more compact at the use site. I used to be rather ambivalent about suggesting this change, but seeing the usage written out, it now looks like a no-brainer to me.

Full disclosure: changing these names as I've suggested implies the possiblility of a whole pile of related API, with usage like

let s1 = s.union(someRange)        // non-mutating
let s2 = otherRange.subtracting(s) // non-mutating, reversing order

etc. However, I don't think that implication obliges us to add any of those other APIs today. We should start conservatively and only expand when we have real use cases demonstrating a benefit.

4.

    public struct Ranges: RandomAccessCollection {
        public var startIndex: Int { get }
        public var endIndex: Int { get }
        public subscript(i: Int) -> Range<Bound>
    }
    
    /// A collection of the ranges that make up the range set.
    public var ranges: Ranges { get }

Seems to me contiguousSubsets is a better name for this, despite its length. This is not a name that will be uttered often, and where it is useful, IMO, there's usually a better way. For example, we should discourage

for r in s.ranges {
    for i in r {
        ... // something with i
    }
}

in favor of

for i in c.indices[s] {
    ... // something with i

}

because the latter is simpler, but also because the former is wrong with an arbitrary collection (the inner loop needs to be “for i in a.indices[r]”).

[Also, the property should be documented as being sorted by lowerBound and having non-overlapping elements.]

Conclusion

The proposed naming of these APIs may be a vestige of our initial failure to clearly answer the “what the heck is it?” question, and we should consider going all the way to reflecting a consistent view of the roles of RangeSet and Range. If we decide not to make the changes, we should at least understand why not.

Tino · December 23, 2019, 4:50pm

Afaics the preview-proposal isn't even accepted yet - and I think we (well, not actually we in the sense that I'm included ;-) are free to change processes so that they serve us better.
I'm convinced that the current process makes it extremely hard to add big changes to Swift, and that it's time to have an alternative.
Btw, what about Function builders - #374 by minuscorp or _modify? Looking at those two, I think we just have to acknowledge that Evolution doesn't work as it used to do...

Avi · December 23, 2019, 5:37pm

Has it ever really been the case that every change introduced since Apple started the evolution process has gone through a proposal? I see the process as providing 3rd-party input on proposed features, and a way to propose new features. Has it ever been the only way for Swift to receive new features?

David_Sweeris · December 24, 2019, 6:46am

Off the top of my head, I can’t think of anything that didn’t. Function builders and the other features introduced to support SwiftUI went up for review the day SwiftUI was announced, IIRC.

Avi · December 24, 2019, 6:51am

Those features are going to remain, and no review is going to change that. All we can do is shape the syntax and propose improvements. We can't provide any feedback that would cause Apple to remove them completely.

curt · December 26, 2019, 7:53pm

This leaning into the idea of RangeSet<Bound> being like a Set<Bound> is an interesting direction.

RangeSet(union: [0..<10, 10..<20]) seems like an improvement on the version with an unlabeled parameter. I think RangeSet(unionOf: [0..<10, 10..<20]) reads even better. Though I also wonder if this overload is needed at all. Apart from the new RangeSet.ranges, are there any other cases in the standard library where a sequence of Ranges is returned?

Similarly, given the type conversion enabled by public init(_ range: Range<Bound>) and the presence of the set operator methods, I wonder if insert and remove are necessary. That is, instead of:

s.insert(contentsOf: someRange)

it is equivalent, and perhaps clearer, to write:

s.formUnion(RangeSet(someRange))

At the least, this line of thinking makes clear that insert and remove are convenience API and must be justified on ergonomic grounds.

curt · December 26, 2019, 7:53pm

Agreed.

dabrahams · December 27, 2019, 3:13am

Works for me…

These are good questions. There is an efficiency argument for the constructor, which can pre-allocate required space (though it may over-allocate depending on the input sequence). The methods combining RangeSet with plain Ranges could avoid allocations altogether if RangeSet didn't have inline storage for a single Range, but IIUC the inline storage is planned,

One of my general concerns with the proposal has been that it has always been a bit too liberal about committing to new API. Because the proposal is bigger than it needs to be, it's easy to miss things that warrant scrutiny. The more instances we find of API that isn't strictly needed, the more inclined I am to ask that the proposal be pruned back to the absolute minimum set of basis operations.

dabrahams · January 6, 2020, 6:02pm

Miscellaneous Naming Issues

[This is the last of my separate threads].

a.range(at: i) is misleading; it strongly implies a is a collection of Ranges. The name I originally suggested to Nate was a.rangeOfElement(at: i), but I actually think a.regionOfElement(at: i) is more appropriate (see last bullet). This is an example where spending a few characters to avoid a potential misreading is totally worth it. Clarity is more important than brevity.
I've never been pleased with removeAll(where: p), because the commonality of the base name with removeAll(keepingCapacity:) strongly implies that these methods are a basic family. As such, it can easily be misread to mean “if p is satisfied, remove all elements.” The proposed method really means “remove (all elements where p is satisfied)” and is thus much more related to remove(at:) and Set's remove(value) method—which could be extended to RangeReplaceableCollection—so would be better named remove(where:) or even remove(allWhere:). We already have this API in RangeReplaceableCollection, but I believe the change could be made gradually without substantially disturbing the validity of extant code.
I'll repeat Chris' concern about gather conflicting with the naming of common scatter/gather operations from numerics. Aside from moveElements we could consider collect as a name but I'm just throwing it out there; not convinced it's better. FWIW, the argument against moveElements that goes “we avoid using ‘Elements’ in collection method names because it's redundant” is specious. We only avoid it when it's redundant, and a.move(…) would strongly imply that a is being moved, so it's not redundant here.
I'll repeat my concern about the use of the word “range” in many of the APIs here, because it drops semantic information: the relationship between the range and the collection. Without it, we could easily be speaking of ranges of element values. We're describing contiguous regions of the collection, or even of set element values.

Thanks for your patience with my many posts in this review, everybody.

dabrahams · January 14, 2020, 6:02pm

Since I can't tell what's happening with this review, I guess I should be explicit and answer all the questions here:

I think it's solving an important problem, but I think the proposal suffers from its evolution from something that was too large and not fully considered. It should not be accepted without at least having some robust discussion of the many issues I've raised in this thread.

Is the problem being addressed significant enough to warrant a change to Swift?

Yes.

Does this proposal fit well with the feel and direction of Swift?

Mostly.

If you have used other languages or libraries with a similar feature, how do you feel that this proposal compares to those?

My previous experience is with the STL; it only overlaps a little with what's in this proposal.

How much effort did you put into your review? A glance, a quick reading, or an in-depth study?

An in-depth study; I hope that's obvious from my other posts.

SE-0270 (Review #2): Add Collection Operations on Noncontiguous Elements

Drop ExpressibleByArrayLiteral Conformance

Usefulness

A Real Source of Confusion

Conclusion

Is RangeSet Really an OK Name?

RangeSet treats Ranges as sets; should we name operations accordingly?

1.

2.

3.

4.

Conclusion

Miscellaneous Naming Issues

Drop `ExpressibleByArrayLiteral` Conformance

Is `RangeSet` Really an OK Name?

`RangeSet` treats `Range`s as sets; should we name operations accordingly?