As we’ve been building out the Algorithms package, and looking toward future additions for searching and matching within strings and collections, we’ve identified some more kinds of collections of ranges, each with their own semantics and twists on their particular details.
The RangeSet
type, added via SE-0270
, represents a set of indices in a collection by coalescing them into non-overlapping, non-adjoining ranges. This makes it a great fit for storing the positions, of, say, all the alphanumeric characters in a string. However, it can’t represent all the occurrences of a substring, which might adjoin, or even overlap, and need to be treated as separate entities:
let matches = "abababa".overlappingRanges(of: "aba")
// equivalent to [0..<3, 2..<5, 4..<7]
We’d like to have a consistent set of APIs to use with all different kinds of collections of ranges, so that you can call methods like removeSubranges(_:)
without thinking about the specific semantics of a particular collection.
First, these are the types of range collections that I’ve identified so far:
- The
RangeSet
type represents all the parts of a collection that match some criteria. While it isn’t a collection itself, it presents a collection of the ranges it encompasses through itsranges
property — the ranges in this collection are always sorted in ascending order, and never overlap or even adjoin (e.g.0..<5
and5..<10
are coalesced into0..<10
). - The
SubSequenceFinder
(actual name tbd) type in progress in this PR is the result of a call toallRanges(of:)
. It is a collection of ranges that match the given subsequence, and its ranges are always in ascending order and never overlap. However, because each range represents a separate found instance, they may adjoin ("abcabc".allRanges(of: "abc")
yields the equivalent of[0..<3, 3..<6]
). Notably, this collection of ranges is only forward-traversable — reverse traversal can yield different subranges due to overlapping matches, so it is prohibited. -
FromEndSubSequenceFinder
is a separate type that yields ranges starting at the end of a collection rather than the beginning. These ranges do not overlap and are always in descending order, unlikeSubSequenceFinder
. - As noted in the introduction, the
OverlappingSubSequenceFinder
contains ranges that can overlap, and are always in ascending order. This collection yields the same ranges in either forward or backward order, so it can be bidirectional. - Any of the above collections can be transformed through collection operations, like
prefix()
,dropFirst()
,lazy.filter { ... }
, or evenreversed()
. - Finally, someone can manually generate an arbitrary collection of arbitrary ranges, with no guarantees about the relationship between ranges in terms of ordering or anything else.
Here’s a graphic depicting the different types described above:
So, we’d like to be able to use any of these types when calling operations that work with multiple ranges at once. The methods added as part of SE-0270 only take a RangeSet
as a parameter — what’s the best way to broaden the types that we can pass?
The primary challenge is that, in order for these operations to be efficient, the ranges that we pass must satisfy at least some of RangeSet
’s preconditions — being in ascending order and not overlapping — that aren’t reasonable to expect for the caller. That is, you should be able to write something like this without having to think about the order of the ranges in matches
:
let matches = myString.allRangesFromEnd(of: "needle")
myString.removeSubranges(matches.prefix(2))
To resolve this, I’d like to propose that we add overloads that are generic over collections of ranges. These overloads would convert the passed collection into a RangeSet
, and then call the RangeSet
-based method directly — the remove/move algorithms could continue to rely on the most efficient representations of the ranges that they’re working on. This approach is similar to our design for converting RangeExpression
types into concrete ranges, just without an explicit protocol.
extension RangeReplaceableCollection {
// Existing method
public mutating func removeSubranges(_ subranges: RangeSet<Index>)
// Additional method
public mutating func removeSubranges<S: Sequence>(_ subranges: S) where S.Element == Range<Index>
}
This approach would work for all of the types described above, but would include the overhead of generating the new RangeSet
instance, even when the passed collection already has ranges that meet the method’s requirements (like SubSequenceFinder
). Additionally, we would need two of each of these methods that use a RangeSet
as the parameter for their work.
Alternatively, we could introduce a protocol for collections of ranges. Types that conform to a hypothetical RangeProvidingCollection
protocol would implement a method that returns a collection of ranges in ascending order:
protocol RangeProvidingCollection: Collection where
Element == Range<Bound>
{
associatedtype Bound: Comparable
associatedtype AscendingRanges: Collection where
AscendingRanges.Element == Element
/// Returns non-overlapping ranges represented by this
/// collection, in ascending order.
func ascendingRanges() -> AscendingRanges
}
Of the types above, only SubSequenceFinder
could trivially conform to RangesProvider
, though RangeSet
could as well if we move the collection view from the ranges
property to the type itself. The remainder of the types would need to perform the same kind of sorting/coalescing as described above. In addition, RangeProvidingCollection
would be another, narrowly-scoped protocol on top of the already complex collection protocol hierarchy. Given that existing complexity, I’m reticent to add another with such a restricted purview.
What do people think between these two solutions? Or is this problem even worth solving, given that we could require that people manually coalesce ranges into a RangeSet
before calling these methods?