Pitch: Add RangeSet and Related Collection Operations

My initial reactions were a strong "-1", but I have since been forced to change my mind.

At first glance (and I would say that the pitch could use some refinement in this regard), there's not much difference between the proposed RangeSet<T> and a Set<T> that has a backing storage of discontiguous ranges. For all the simple cases (and these are the only cases described in the pitch), this is accurate: A RangeSet<Int> is functionally identical to a Set<Int>.

However, after further examination, there is one major flaw with using a Set<T> instead of a RangeSet<T>, and that is with values that are not countable. A trivial example of this is Double; you cannot effectively count the number of double values between (for example) 1.0 and 1.1. Since you cannot count them, you cannot form a Set<T> of them, because Set has the requirement of providing a count.

Thus, after a lot of thought, I am reluctantly in favor of this proposal.

I assumed that the indexes here were guaranteed to be presented in sorted order, but only on reading your comment, Dave, do I realized that that's not specified! @nnnnnnnn, was that the intention?

(That still makes it equivalent to a SortedSet<T: Comparable>, though.)

If this method is added, we should transition it to being a customization point for RangeReplacableCollection (in a separate proposal), since I can imagine collections whose internals allow discontiguous removal, and we need to allow said collections a hook to save time.

(Previous discussions on a rotate method also desired a customization point on MutableCollection.)

Yes, that's definitely the intent! I'll make sure it gets into the proposal.

As an example. our String type. It's probably similar to Array and uses something like a MutableBufferPointer to store its data. A string is a vector of Unicode characters, which are vectors of code points, which are vectors of code units, but is stored as code units, using conventions to determine the higher-level boundaries. This bans String from conforming to RandomAccessCollection or MutableCollection. The internals of String most likely can do the RRC & MC optimized algorithm for removing elements, but we can't actually do it because String can't conform to MC. So making an opportunity for customizing the algorithm is needed.

1 Like

For those following the discussion, I've updated the pitch / proposal, including a changelog entry with what's different about this version:

  • Renamed the move(...) methods to gather(...) for multiple-range moves and shift(from:toJustBefore:) for single-range and single-element moves to better reflect their behavior.
  • Removed the move(from:to:) method.
  • Added elements(within:) method for getting the individual index values in a RangeSet when the Bound type isn't integer-strideable.
3 Likes

To add on to the current index searching methods proposed as extensions on Collection, namely indices(where:) and indices(of:), I think extending this functionality to subsequences of the collection might also be useful. I think something like this would fit the bill nicely:

extension Collection where Element: Equatable {
    public func indices<OtherCollection>(of subsequence: OtherCollection) -> RangeSet where OtherCollection: Collection, OtherCollection.Element == Element
}

There was a separate “matchers” pitch recently which covered this use case and more.

The pitch that Brent mentioned that would cover that kind of searching (looks fantastic and) is here: [Prototype] Protocol-powered generic trimming, searching, splitting,

3 Likes

I've been thinking about this, which seems to be the majority view, and why I react against it, which I think comes down to something the current design doesn't account for: empty ranges have meaning, and thinking of the range aspect as "just an optimization" excludes this possibility.

Consider the implementation of a new feature on RangeReplaceableCollection: a plural version of replaceSubrange. Something like...

var a = Array(0..<9)
let ranges = RangeSet([1..<6, 8..<9])
a.replaceSubranges(ranges, with: [99])
// a now [0,99,6,7,99] (I think :)

(avoiding designing the full feature now, which perhaps would support either a single sequence or a sequence of sequences on the rhs or both)

Now replaceSubrange singular works with empty ranges. That's what makes it the multitool that can implement most of the other requirements of RangeReplaceableCollection:

a.replaceSubrange(a.startIndex..<a.startIndex, with: 99)
// is equivalent to
a.insert(99, at: a.startIndex)

That is, empty ranges can be used to represent points in the collection. And in theory that ought to be possible with RangeSet too:

var a = Array(0..<9)
let ranges = RangeSet([0..<0],[3..<9])
a.replaceSubranges(ranges, with: [99])
// a now [99,0,1,99]

Now, unfortunately it's not so simple to add support for empty ranges to range set. It opens up a whole can of worms about the coalescing behavior, because it also implies that adjacent ranges should still have independent meaning and shouldn't be coalesced:

var a = Array(0..<9)
let ranges = RangeSet([0..<5],[5..<9])
a.replaceSubranges(ranges, with: [99])
// a now [99,99]

...and that really replaceSubranges should instead take a sequence or collection of ranges. But I think it's worth considering as part of this pitch, to ensure all the angles are covered.

1 Like

One other thing to factor in when considering the range-versus-individual-indices: conversion from ranges to indices can be dangerous, and generally to be discouraged in any design.

let c = (0..<10).lazy.filter { $0%2 == 0 }

// LazyFilterSequence<(Range<Int>)>.Index is Range<Int>.Index, 
// which is Int, which is strideable, so this compiles but doesn't
// do what you might hope...

for i in c.startIndex..<c.endIndex {
  print(c[i])
}

The pitch doesn't propose it, but this is a key reason to never make RangeSet conditionally a collection of the countable elements in the ranges. It may also be a reason not to expose the elements convenience property and to encourage people to always supply the collection used to advance the index.

1 Like

My primary concern with "this is a collection of ranges" is that it defeats canonicalization, as you mention: [0..<1, 1..<5] is no longer the same as [0..<5]. That makes insertion and removal much more complicated: what happens with overlapping ranges? Splitting ranges? I think ultimately the design of such a type would come out very different from this RangeSet: you'd be probably be construct-only or add-only, with a precondition of no overlaps. I don't think that type would be as generally useful.

4 Likes

This argument and example have convinced me of the danger of relying on Strideable conformance for these behaviors, so I've removed SetAlgebra conformance and the Elements view from the proposal. I'm also deferring the rotation and partition methods to a future pitch, so that they can be considered separately.

I've updated the proposal and linked package with these changes:

  • Removed SetAlgebra conformance and the Elements collection view, as RangeSet can't guarantee correct semantics for individual index operations without the parent collection.
  • Renamed elements(within:) to individualIndices(within:).
  • Deferred the rotating and partitioning methods to a future pitch.
2 Likes

Regarding the question about representing empty ranges, I think that's a use case that will be better handled by a different type (even just a sequence of ranges, as you mention). RangeSet is specifically intended to represent noncontiguous ranges of elements in a collection, not just a group of ranges, so including empty ranges would make the semantics of different operations very difficult to reason about (as @jrose pointed out, as well).

1 Like

individualIndices(within:) seems a little verbose. I think indices(within:) is probably just as clear. It's an overload of the indices property, so raises the specter of the count(where:) type checker perf issue, but in this case since RangeSet won't be a collection, that's probably not a concern.

It may be worth also considering a method that you give a stride to as an argument, that returns a sequence that will stride within all the ranges.

1 Like

Do we need this method? rangeSet.individualIndices(within: collection) is exactly equivalent to collection.indices[rangeSet]; can we teach people to do that instead?

2 Likes