Pitch: Add RangeSet and Related Collection Operations

Hi folks!

Here's a pitch to fill a missing chunk of functionality in the standard library around noncontiguous ranges of elements. Please take a look at the draft proposal below, and/or try out this new functionality by using the SwiftPM package here: https://github.com/natecook1000/swift-evolution/tree/rangeset_and_friends

I would love to hear your feedback! Does this look like the right group of APIs? Are there any use cases that aren't covered here, or that you can see a better way to handle?


Add RangeSet and Related Collection Operations

Changelog

  • October 24, 2019: Renamed the move(...) methods to gather(...) for multiple-range moves and shift(from:toJustBefore:) for single-range and single-element moves to better reflect their behavior, and removed the move(from:to:) method. Added elements(within:) method for getting the individual index values in a RangeSet when the Bound type isn't integer-strideable.
  • October 31, 2019: Removed SetAlgebra conformance and the Elements collection view, as RangeSet can't guarantee correct semantics for individual index operations without the parent collection. Renamed elements(within:) to individualIndices(within:). Deferred the rotating and partitioning methods to a future pitch.

Introduction

We can use a range to address a single range of consecutive elements in a collection, but the standard library doesn't currently provide a way to access discontiguous elements. This proposes the addition of a RangeSet type that can store the location of any number of collections indices, along with collection operations that let us use a range set to access or modify the collection.

Motivation

There are many uses for tracking multiple elements in a collection, such as maintaining the selection in a list of items, or refining a filter or search result set after getting more input.

The Foundation data type most suited for this purpose, IndexSet, uses integers only, which limits its usefulness to arrays and other random-access collection types. The standard library is missing a collection that can efficiently store any number of indices, and is missing the operations that you might want to perform with such a collection of indices. These operations themselves can be challenging to implement correctly, and have performance traps as well β€” see last year's Embracing Algorithms WWDC talk for a demonstration.

Proposed solution

This proposal adds a RangeSet type for representing multiple, noncontiguous ranges, as well as a variety of collection operations for creating and working with range sets.

var numbers = Array(1...15)

// Find the indices of all the multiples of three
let indicesOfThree = numbers.indices(where: { $0.isMultiple(of: 3) })

// Perform an operation with just those multiples
let sumOfThrees = numbers[indicesOfThree].reduce(0, +)
// sumOfThrees == 45

// You can gather the multiples of 3 at the beginning
let rangeOfThree = numbers.gather(indicesOfThree, justBefore: 0)
// numbers[rangeOfThree] == [3, 6, 9, 12, 15]
// numbers == [3, 6, 9, 12, 15, 1, 2, 4, 5, 7, 8, 10, 11, 13, 14]

// Reset `numbers`
numbers = Array(1...15)

// You can also build range sets by hand using array literals...
let myRangeSet: RangeSet = [0..<5, 10..<15]
print(Array(numbers[myRangeSet]))
// Prints [1, 2, 3, 4, 5, 11, 12, 13, 14, 15]

// ...or by using set operations
let evenThrees = indicesOfThree.intersection(
    numbers.indices(where: { $0.isMultiple(of: 2) }))
print(Array(numbers[evenThrees]))
// Prints [6, 12]

The remainder of the RangeSet and collection operations, like inverting a range set or inserting and removing range expressions, are covered in the next section.

Detailed design

The RangeSet type is generic over any Comparable type, and supports fast containment checks for individual values, as well as adding and removing ranges of that type.

/// A set of ranges of any comparable value.
public struct RangeSet<Bound: Comparable> {
    /// Creates an empty range set.
    public init() {}

    /// Creates a range set with the given range.
    ///
    /// - Parameter range: The range to use for the new range set.
    public init(_ range: Range<Bound>)
    
    /// Creates a range set with the given ranges.
    ///
    /// - Parameter ranges: The ranges to use for the new range set.
    public init<S: Sequence>(_ ranges: S) where S.Element == Range<Bound>
    
    /// A Boolean value indicating whether the range set is empty.
    public var isEmpty: Bool { get }
    
    /// Returns a Boolean value indicating whether the given element is
    /// contained in the range set.
    ///
    /// - Parameter element: The element to look for in the range set.
    /// - Return: `true` if `element` is contained in the range set; otherwise,
    ///   `false`.
    ///
    /// - Complexity: O(log *n*), where *n* is the number of ranges in the
    ///   range set.
    public func contains(_ element: Bound) -> Bool
        
    /// Inserts the given range into the range set.
    ///
    /// - Parameter range: The range to insert into the set.
    public mutating func insert(_ range: Range<Bound>)
        
    /// Removes the given range from the range set.
    ///
    /// - Parameter range: The range to remove from the set.
    public mutating func remove(_ range: Range<Bound>)
}

RangeSet conforms to Equatable, CustomStringConvertible, and Hashable when its Bound type conforms to Hashable. RangeSet also has ExpressibleByArrayLiteral conformance, using arrays of ranges as its literal type.

SetAlgebra-like methods

RangeSet implements the methods of the SetAlgebra protocol that don't traffic in individual indices. Without access to a collection that can provide the index after an individual value, RangeSet can't reliably maintain the semantic guarantees of working with collection indices.

extension RangeSet {
    public func union(_ other: RangeSet<Bound>) -> RangeSet<Bound>
    public mutating func formUnion(_ other: RangeSet<Bound>)

    public func intersection(_ other: RangeSet<Bound>) -> RangeSet<Bound>
    public mutating func formIntersection(_ other: RangeSet<Bound>)

    public func symmetricDifference(_ other: RangeSet<Bound>) -> RangeSet<Bound>
    public mutating func formSymmetricDifference(_ other: RangeSet<Bound>)
    
    public func subtracting(_ other: RangeSet<Bound>) -> RangeSet<Bound>
    public mutating func subtract(_ other: RangeSet<Bound>)
    
    public func isSubset(of other: RangeSet<Bound>) -> Bool
    public func isSuperset(of other: RangeSet<Bound>) -> Bool
    public func isStrictSubset(of other: RangeSet<Bound>) -> Bool
    public func isStrictSuperset(of other: RangeSet<Bound>) -> Bool
}

Collection APIs

Adding or removing individual index values or range expressions requires passing the relevant collection, as well, for context. RangeSet includes initializers and insertion and removal methods for working with these kinds of values, as well as a way to "invert" a range set with respect to a collection.

extension RangeSet {
    /// Creates a new range set with the specified index in the given 
    /// collection.
    ///
    /// - Parameters:
    ///   - index: The index to include in the range set. `index` must be a
    ///     valid index of `collection` that isn't the collection's `endIndex`.
    ///   - collection: The collection that contains `index`.
    public init<C>(_ index: Bound, within collection: C)
        where C: Collection, C.Index == Bound
    
    /// Creates a new range set with the specified indices in the given
    /// collection.
    ///
    /// - Parameters:
    ///   - index: The index to include in the range set. `index` must be a
    ///     valid index of `collection` that isn't the collection's `endIndex`.
    ///   - collection: The collection that contains `index`.
    public init<S, C>(_ indices: S, within collection: C)
        where S: Sequence, C: Collection, S.Element == C.Index, C.Index == Bound
    
    /// Creates a new range set with the specified range expression.
    ///
    /// - Parameters:
    ///   - range: The range expression to use as the set's initial range.
    ///   - collection: The collection that `range` is relative to.
    public init<R, C>(_ range: R, within collection: C)
        where C: Collection, C.Index == Bound, R: RangeExpression, R.Bound == Bound
    
    /// Inserts the specified index into the range set.
    ///
    /// - Parameters:
    ///   - index: The index to insert into the range set. `index` must be a
    ///     valid index of `collection` that isn't the collection's `endIndex`.
    ///   - collection: The collection that contains `index`.
    public mutating func insert<C>(_ index: Bound, within collection: C)
        where C: Collection, C.Index == Bound
    
    /// Inserts the specified range expression into the range set.
    ///
    /// - Parameters:
    ///   - range: The range expression to insert into the range set.
    ///   - collection: The collection that `range` is relative to.
    public mutating func insert<R, C>(_ range: R, within collection: C)
        where C: Collection, C.Index == Bound, R: RangeExpression, R.Bound == Bound
    
    /// Removes the specified index from the range set.
    ///
    /// - Parameters:
    ///   - index: The index to remove from the range set. `index` must be a
    ///     valid index of `collection` that isn't the collection's `endIndex`.
    ///   - collection: The collection that contains `index`.
    public mutating func remove<C>(_ index: Bound, within collection: C)
        where C: Collection, C.Index == Bound
    
    /// Removes the specified range expression from the range set.
    ///
    /// - Parameters:
    ///   - range: The range expression to remove from the range set.
    ///   - collection: The collection that `range` is relative to.
    public mutating func remove<R, C>(_ range: R, within collection: C)
        where C: Collection, C.Index == Bound, R: RangeExpression, R.Bound == Bound

    /// Returns a range set that represents all the elements in the given
    /// collection that aren't represented by this range set.
    ///
    /// The following example finds the indices of the vowels in a string, and
    /// then inverts the range set to find the non-vowels parts of the string.
    ///
    ///     let str = "The rain in Spain stays mainly in the plain."
    ///     let vowels = "aeiou"
    ///     let vowelIndices = str.indices(where: { vowels.contains($0) })
    ///     print(String(str[vowelIndices]))
    ///     // Prints "eaiiaiaaiieai"
    ///
    ///     let nonVowelIndices = vowelIndices.inverted(within: str)
    ///     print(String(str[nonVowelIndices]))
    ///     // Prints "Th rn n Spn stys mnly n th pln."
    ///
    /// - Parameter collection: The collection that the range set is relative
    ///   to.
    /// - Returns: A new range set that represents the elements in `collection`
    ///   that aren't represented by this range set.
    public func inverted<C>(within collection: C) -> RangeSet
        where C: Collection, C.Index == Bound
}

Accessing Ranges and Individual Indices

RangeSet provides access to its ranges as a random-access collection via the ranges property. The individual indices in a collection that are represented by a range set are available as a bidirectional collection through the individualIndices(within:) method (see below for more about this method's return type).

extension RangeSet {
    public struct Ranges: RandomAccessCollection {
        public var startIndex: Int { get }
        public var endIndex: Int { get }
        public subscript(i: Int) -> Range<Bound>
    }
    
    /// A collection of the ranges that make up the range set.
    ///
    /// The ranges in this collection never overlap or adjoin, are never empty,
    /// and are always in ascending order.
    public var ranges: Ranges { get }
}

extension RangeSet {
    /// Returns a collection of all the indices represented by this range set
    /// within the given collection.
    ///
    /// The indices in the returned collection are unique and are stored in 
    /// ascending order. 
    ///
    /// - Parameter collection: The collection that the range set is relative
    ///   to.
    /// - Returns: A collection of the indices within `collection` that are
    ///   represented by this range set.
    public func individualIndices<C>(within collection: C) -> IndexingCollection<C.Indices>
        where C: Collection, C.Index == Bound
}

Storage

RangeSet will store its ranges in a custom type that will optimize the storage for known, simple Bound types. This custom type will avoid allocating extra storage for common cases of empty or single-range range sets.

New Collection APIs

Finding multiple elements

Akin to the firstIndex(...) and lastIndex(...) methods, this proposal introduces indices(where:) and indices(of:) methods that return a range set with the indices of all matching elements in a collection.

extension Collection {
    /// Returns the indices of all the elements that match the given predicate.
    ///
    /// For example, you can use this method to find all the places that a
    /// vowel occurs in a string.
    ///
    ///     let str = "Fresh cheese in a breeze"
    ///     let vowels: Set<Character> = ["a", "e", "i", "o", "u"]
    ///     let allTheVowels = str.indices(where: { vowels.contains($0) })
    ///     // str[allTheVowels].count == 9
    ///
    /// - Parameter predicate: A closure that takes an element as its argument
    ///   and returns a Boolean value that indicates whether the passed element
    ///   represents a match.
    /// - Returns: A set of the indices of the elements for which `predicate`
    ///   returns `true`.
    ///
    /// - Complexity: O(*n*), where *n* is the length of the collection.
    public func indices(where predicate: (Element) throws -> Bool) rethrows
        -> RangeSet<Index>
}

extension Collection where Element: Equatable {
    /// Returns the indices of all the elements that are equal to the given
    /// element.
    ///
    /// For example, you can use this method to find all the places that a
    /// particular letter occurs in a string.
    ///
    ///     let str = "Fresh cheese in a breeze"
    ///     let allTheEs = str.indices(of: "e")
    ///     // str[allTheEs].count == 7
    ///
    /// - Parameter element: An element to look for in the collection.
    /// - Returns: A set of the indices of the elements that are equal to 
    ///   `element`.
    ///
    /// - Complexity: O(*n*), where *n* is the length of the collection.
    public func indices(of element: Element) -> RangeSet<Index>
}

Accessing elements via RangeSet

When you have a RangeSet describing a group of indices for a collection, you can access those elements via a new subscript. This new subscript returns a new IndexingCollection type, which couples the collection and range set to provide access.

extension Collection {
    /// Accesses a view of this collection with the elements at the given
    /// indices.
    ///
    /// - Parameter indices: The indices of the elements to retrieve from this
    ///   collection.
    /// - Returns: A collection of the elements at the positions in `indices`.
    ///
    /// - Complexity: O(1)
    public subscript(indices: RangeSet<Index>) -> IndexingCollection<Self> { get }
}

extension MutableCollection {
    /// Accesses a mutable view of this collection with the elements at the
    /// given indices.
    ///
    /// - Parameter indices: The indices of the elements to retrieve from this
    ///   collection.
    /// - Returns: A collection of the elements at the positions in `indices`.
    ///
    /// - Complexity: O(1) to access the elements, O(*m*) to mutate the
    ///   elements at the positions in `indices`, where *m* is the number of
    ///   elements indicated by `indices`.
    public subscript(indices: RangeSet<Index>) -> IndexingCollection<Self> { get set }
}

/// A collection wrapper that provides access to the elements of a collection,
/// indexed by a set of indices.
public struct IndexingCollection<Base: Collection>: Collection {
    /// The collection that the indexed collection wraps.
    public var base: Base { get set }

    /// The set of index ranges that are available through this indexing
    /// collection.
    public var ranges: RangeSet<Base.Index> { get set }
    
    /// A position in an `IndexingCollection`.
    struct Index: Comparable {
        // ...
    }
    
    public var startIndex: Index { get }
    public var endIndex: Index { set }
    public subscript(i: Index) -> Base.Element { get }
}

IndexingCollection will conform to Collection, and conditionally conform to BidirectionalCollection and MutableCollection if its base collection conforms.

Moving elements

Within a mutable collection, you can gather the elements represented by a range set, inserting them before a specific index. This proposal also adds a method for gathering all the elements matched by a predicate, and methods for shifting a single range, a range expression, or a single element to a specific insertion point.

Whether you're working with a range set, a single range, or a single index, moving elements around in a collection can shift the relative position of the expected destination point. For that reason, these gathering and shifting methods return the new range or index of the elements that have been moved. To visualize this operation, divide the collection into two parts, before and after the insertion point. The elements to move are collected in order in that gap, and the resulting range (or index, for single element moves) is returned.

As an example, consider a shift from a range at the beginning of an array of letters to a position further along in the array:

var array = ["a", "b", "c", "d", "e", "f", "g"]
let newSubrange = array.shift(from: 0..<3, toJustBefore: 5)
// array == ["d", "e", "a", "b", "c", "f", "g"]
// newSubrange = 2..<5

//     β”Œβ”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”
//     β”‚  a  β”‚  b  β”‚  c  β”‚  d  β”‚  e  β”‚  f  β”‚  g  β”‚
//     β””β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”˜
//      ^^^^^^^^^^^^^^^^^               ^
//            source                insertion
//                                    point
//
//    β”Œβ”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”β”Œβ”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”β”Œβ”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”
//    β”‚  a  β”‚  b  β”‚  c  β”‚β”‚  d  β”‚  e  β”‚β”‚  f  β”‚  g  β”‚
//    β””β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”˜β””β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”˜β””β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”˜
//     ^^^^^^^^^^^^^^^^^              ^
//           source               insertion
//                                  point
//
//    β”Œβ”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”β”Œβ”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”β”Œβ”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”
//    β”‚  d  β”‚  e  β”‚β”‚  a  β”‚  b  β”‚  c  β”‚β”‚  f  β”‚  g  β”‚
//    β””β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”˜β””β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”˜β””β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”˜
//                  ^^^^^^^^^^^^^^^^^
//                 newSubrange == 2..<5

When shifting a single element, this can mean that the element ends up at the insertion point (when moving backward), or ends up at a position one before the insertion point (when moving forward, because the elements in between move forward to make room).

var array = ["a", "b", "c", "d", "e", "f", "g"]
let newPosition = array.shift(from: 1, toJustBefore: 5)
// array == ["a", "c", "d", "e", "b", "f", "g"]
// newPosition == 4

//    β”Œβ”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”β”Œβ”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”
//    β”‚  a  β”‚  b  β”‚  c  β”‚  d  β”‚  e  β”‚β”‚  f  β”‚  g  β”‚
//    β””β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”˜β””β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”˜
//             ^                     ^
//           source              insertion
//                                 point
//
//    β”Œβ”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”β”Œβ”€β”€β”€β”€β”€β”β”Œβ”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”
//    β”‚  a  β”‚  c  β”‚  d  β”‚  e  β”‚β”‚  b  β”‚β”‚  f  β”‚  g  β”‚
//    β””β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”˜β””β”€β”€β”€β”€β”€β”˜β””β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”˜
//                                ^
//                        newPosition == 4

The new gathering and shifting methods are listed below.

extension MutableCollection {
    /// Collects the elements at the given indices just before the specified
    /// index.
    ///
    /// This example finds all the uppercase letters in the array and gathers
    /// them between `"i"` and `"j"`.
    ///
    ///     var letters = Array("ABCdeFGhijkLMNOp")
    ///     let uppercase = letters.indices(where: { $0.isUppercase })
    ///     let rangeOfUppercase = letters.gather(uppercase, justBefore: 10)
    ///     // String(letters) == "dehiABCFGLMNOjkp"
    ///     // rangeOfUppercase == 4..<13
    ///
    /// - Parameters:
    ///   - indices: The indices of the elements to move.
    ///   - insertionPoint: The index to use as the destination of the elements.
    /// - Returns: The new bounds of the moved elements.
    ///
    /// - Complexity: O(*n* log *n*) where *n* is the length of the collection.
    @discardableResult
    public mutating func gather(
        _ indices: RangeSet<Index>, justBefore insertionPoint: Index
    ) -> Range<Index>

    /// Collects the elements that satisfy the given predicate just before the
    /// specified index.
    ///
    /// This example gathers all the uppercase letters in the array between
    /// `"i"` and `"j"`.
    ///
    ///     var letters = Array("ABCdeFGhijkLMNOp")
    ///     let rangeOfUppercase = letters.gather(justBefore: 10) { $0.isUppercase }
    ///     // String(letters) == "dehiABCFGLMNOjkp"
    ///     // rangeOfUppercase == 4..<13
    ///
    /// - Parameters:
    ///   - predicate: A closure that returns `true` for elements that should
    ///     move to `destination`.
    ///   - insertionPoint: The index to use as the destination of the elements.
    /// - Returns: The new bounds of the moved elements.
    ///
    /// - Complexity: O(*n* log *n*) where *n* is the length of the collection.
    @discardableResult
    public mutating func gather(
        justBefore: Index, where predicate: (Element) -> Bool
    ) -> Range<Index>
    
    /// Shifts the elements in the given range to the specified insertion point.
    ///
    /// - Parameters:
    ///   - range: The range of the elements to move.
    ///   - insertionPoint: The index to use as the insertion point for the 
    ///     elements. `insertionPoint` must be a valid index of the collection.
    /// - Returns: The new bounds of the moved elements.
    ///
    /// - Returns: The new bounds of the moved elements.
    /// - Complexity: O(*n*) where *n* is the length of the collection.
    @discardableResult
    public mutating func shift(
        from range: Range<Index>, toJustBefore insertionPoint: Index
    ) -> Range<Index>

    /// Shifts the elements in the given range expression to the specified
    /// insertion point.
    ///
    /// - Parameters:
    ///   - range: The range of the elements to move.
    ///   - insertionPoint: The index to use as the insertion point for the 
    ///     elements. `insertionPoint` must be a valid index of the collection.
    /// - Returns: The new bounds of the moved elements.
    ///
    /// - Complexity: O(*n*) where *n* is the length of the collection.
    @discardableResult
    public mutating func shift<R : RangeExpression>(
        from range: R, toJustBefore insertionPoint: Index
    ) -> Range<Index> where R.Bound == Index

    /// Moves the element at the given index to just before the specified
    /// insertion point.
    ///
    /// This method moves the element at position `i` to immediately before
    /// `insertionPoint`. This example shows moving elements forward and
    /// backward in an array of integers.
    ///
    ///     var numbers = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
    ///     let newIndexOfNine = numbers.shift(from: 9, toJustBefore: 7)
    ///     // numbers == [0, 1, 2, 3, 4, 5, 6, 9, 7, 8, 10]
    ///     // newIndexOfNine == 7
    ///
    ///     let newIndexOfOne = numbers.shift(from: 1, toJustBefore: 4)
    ///     // numbers == [0, 2, 3, 1, 4, 5, 6, 9, 7, 8, 10]
    ///     // newIndexOfOne == 3
    ///
    /// To move an element to the end of a collection, pass the collection's
    /// `endIndex` as `insertionPoint`.
    ///
    ///     numbers.shift(from: 0, toJustBefore: numbers.endIndex)
    ///     // numbers == [2, 3, 1, 4, 5, 6, 9, 7, 8, 10, 0]
    ///
    /// - Parameters:
    ///   - source: The index of the element to move. `source` must be a valid
    ///     index of the collection that isn't `endIndex`.
    ///   - insertionPoint: The index to use as the destination of the element.
    ///     `insertionPoint` must be a valid index of the collection.
    /// - Returns: The resulting index of the element that began at `source`.
    ///
    /// - Complexity: O(*n*) where *n* is the length of the collection.
    @discardableResult
    public mutating func shift(
        from source: Index, toJustBefore insertionPoint: Index
    ) -> Index
}

Removing elements

Within a range-replaceable collection, you can remove the elements represented by a range set. The implementation provides an additional, in-place overload for range-replaceable collections that also conform to MutableCollection.

extension RangeReplaceableCollection {
    /// Removes the elements at the given indices.
    ///
    /// For example, this code sample finds the indices of all the vowel
    /// characters in the string, and then removes those characters.
    ///
    ///     var str = "The rain in Spain stays mainly in the plain."
    ///     let vowels: Set<Character> = ["a", "e", "i", "o", "u"]
    ///     let vowelIndices = str.indices(where: { vowels.contains($0) })
    ///
    ///     str.removeAll(at: vowelIndices)
    ///     // str == "Th rn n Spn stys mnly n th pln."
    ///
    /// - Parameter indices: The indices of the elements to remove.
    ///
    /// - Complexity: O(*n*), where *n* is the length of the collection.
    public mutating func removeAll(at indices: RangeSet<Index>)
}

extension Collection {
    /// Returns a collection of the elements in this collection that are not
    /// represented by the given range set.
    ///
    /// For example, this code sample finds the indices of all the vowel
    /// characters in the string, and then retrieves a collection that omits
    /// those characters.
    ///
    ///     let str = "The rain in Spain stays mainly in the plain."
    ///     let vowels: Set<Character> = ["a", "e", "i", "o", "u"]
    ///     let vowelIndices = str.indices(where: { vowels.contains($0) })
    ///
    ///     let disemvoweled = str.removingAll(at: vowelIndices)
    ///     print(String(disemvoweled))
    ///     // Prints "Th rn n Spn stys mnly n th pln."
    ///
    /// - Parameter indices: A range set representing the elements to remove.
    /// - Returns: A collection of the elements that are not in `indices`.
    public func removingAll(at indices: RangeSet<Index>) -> IndexingCollection<Self>
}
24 Likes

To start off, great job on this detailed pitch, the large amount of effort put into it really shows!

I've had the chance to read through the full proposal on the GitHub link you provided and even played around with it a bit in Xcode. The concept of a single type to store discontiguous indices is awesome. This is something I have definitely found myself reaching for in the past and, due to the complexity of properly implementing such a type, being forced to unfortunately abandon. I think it provides fantastic and simple ways of performing previously cumbersome tasks. I am a massive +1 on this proposal, I think it definitely fills a gap in the Collection API.

That being said, I have a few questions/suggestions/corrections after having thoroughly read through the proposal.


Questions

1. ArrayLiteralType of RangeSet

I was wondering if you think there is any benefit of using Range<Bound> as ArrayLiteralType versus using ClosedRange<Bound>? At least for the the case where Bound == Int, it seems more straightforward to me that ArrayLiteralType be ClosedRange<Bound> (but that may just be me).

2. Non-mutating variants

In this proposal you introduce a lot of mutating functions on collection. Do you think there is any benefit in also introducing non-mutating counterparts for them?

3. Naming of move methods

In the proposal itself you do a great job illustrating how the move function works with excellent diagrams. That being said, I think that its naming in it of itself is a bit confusing. You give the following example in your proposal.

var array = ["a", "b", "c", "d", "e", "f", "g"]
let newSubrange = array.move(from: 0..<3, insertingAt: 5)
// array == ["d", "e", "a", "b", "c", "f", "g"]
// newSubrange = 2..<5

At first glance, I'd say that the result seems unintuitive because though I am "insertingAt" 5, none of the elements in the specified range are actually at index 5. I would have initially expected the following result.

// array == ["d", "e", "f", "a", "b", "c", "g"]
// newSubrange = 3...5 (3..<6)

I would also say that the method name aside from that is a bit unclear. The first half of the method name, move(from:, does not really express the from are indices of the collection and not something else like the elements themselves for example. I think some of the following names would make the operation a lot clearer to the reader:

  • moveElements(at:beforeIndex:)
  • move(at:beforeIndex:)

Also, I am curious as to your rationale for why move should be inserting elements before the specified index rather than just such index. I know you do have one example of this, move(from:to:) in the proposal. If other move methods can be replaced or added to exhibit the the latter behaviour (inclusive insertion at the specified index), method names such as the following I think would really improve clarity:

  • moveElements(at:to:)
  • move(at:to:)

And for the single element moving methods, maybe moveElement(at:beforeIndex:) and moveElement(at:to:) would be more clear opposed to move(from:insertedAt:) and move(from:to:) respectively.

4. Naming of rotate

I think that an alternative name such as "cycle" may impove clarity. Though its argument label, shiftingToStart explain its functionality well, when I see a method called "rotate" on an array, my mind usually jumps to rotation of matrices rather than cycling through/shifting the elements of an array. Maybe using a different name would help disambiguate between the two operations.


Documentation

1.

Seems like you meant to right one of those sentences rather than both.

2.

I think you probably meant to write "in" rather than "into".

3.

I think you may have meant to say "A set of the indices" rather than "A set of ranges". This change would also make the wording more consistent with that of the Return documentation for indices(where:).

4.

You mistakingly wrote numbers.halfStablePartition in the code example you provide in the documentation of stablePartition(by:).

5.

I think you copy and pasted the Return documentation section of the documentation from on of the other partitioning methods. The Return documentation of stablyPartitioned(by:) does not describe the value being returned from that function.


Redundancies

1.

Though this is not wrong per say, you only need to state conformance of Elements to BidirectionalCollection as BidirectionalCollection already inherits from both Sequence and Collection.


Overall, the intense care and effort that went into this proposal is quite apparent. This fills important gaps in the Collection API and I hope to see this make it in to Swift some time soon :crossed_fingers:.

5 Likes

Thanks so much for your detailed feedback, @Wildchild9! I've addressed the issues you found with the documentation and redundancies in the package and in the initial post, and have some answers for your questions below.


We can only choose one type to use as the array literal element type, and for non-integer indices, there's no way to convert a ClosedRange to a Range without access to the collection that produced the index. You can definitely still add ClosedRanges to a range set by calling the RangeExpression-based insert(_:within:) , you just need to also pass the collection.

The proposal adds non-mutating versions of stable partition and the removalAll(at:) method, but doesn't provide them for rotation or for moving elements. In the moving elements case, I think the mutating operation more often what you want, but it may make sense to including a non-mutating version.

Can you think of uses for non-mutating versions of moving elements, or for rotation?

About the behavior β€” when moving elements within a collection, what you're trying to do is reposition your selected elements (the range or range set you're passing in) with respect to the other elements in the collection. These are all insertion operations so that they can share the behavior of the existing insert(_:at:) method, which allows you to insert (or move) elements anywhere within the collection. If you pass startIndex you insert before all existing elements, and if you pass endIndex then you insert before the "past-the-end" index, or after all the existing elements.

var array = ["a", "b", "c", "d", "e", "f", "g"]
array.insert("H", at: 5)
array.insert("J", at: 0)
array.insert("K", at: array.endIndex)
// array == ["J", "a", "b", "c", "d", "e", "H", "f", "g", "K"]

The issue of where the elements end up is clouded in my examples by the fact that the elements that are being moved are getting out of the way as well. I should perhaps start by moving elements from the end toward the front. Is this example any more clear?

var array = ["a", "b", "c", "d", "e", "f", "g"]
let newSubrange = array.move(from: 4..<7, insertingAt: 1)
// newSubrange == 1..<4
// array == ["a", "e", "f", "g", "b", "c", "d"]
//
//     β”Œβ”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”
//     β”‚  a  β”‚  b  β”‚  c  β”‚  d  β”‚  e  β”‚  f  β”‚  g  β”‚
//     β””β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”˜
//              ^               ^^^^^^^^^^^^^^^^^
//          insertion                 source
//            point
//
//    β”Œβ”€β”€β”€β”€β”€β”β”Œβ”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”β”Œβ”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”
//    β”‚  a  β”‚β”‚  b  β”‚  c  β”‚  d  β”‚β”‚  e  β”‚  f  β”‚  g  β”‚
//    β””β”€β”€β”€β”€β”€β”˜β””β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”˜β””β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”˜
//           ^                   ^^^^^^^^^^^^^^^^^
//       insertion                     source
//         point
//
//    β”Œβ”€β”€β”€β”€β”€β”β”Œβ”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”β”Œβ”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”
//    β”‚  a  β”‚β”‚  e  β”‚  f  β”‚  g  β”‚β”‚  b  β”‚  c  β”‚  d  β”‚
//    β””β”€β”€β”€β”€β”€β”˜β””β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”˜β””β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”˜
//            ^^^^^^^^^^^^^^^^^
//           newSubrange == 1..<4
//                  range

It can also be instructive to look at how row rearrangement works in a spreadsheet app, such as Numbers. Here's a recording of me dragging rows around to mimic my example here and in the original post.

ScreenRecording2019-07-01at12293

Your points about the names are very good! I especially like moveElements, and think that may be an improvement. I'll think some more about these too.

You aren't the first person to bring up matrix rotation! We do have some proposals for cycle, which would turn a finite sequence or a collection into a repeating cycle of those elements, and I think fits better with that name. Also of note is that rotate is used by other languages with this feature (C++, Ruby, and Rust, to name a few), so it may be worth sticking with the term.


Thanks again for all the valuable feedback!

4 Likes

Ideally, we would use a type that can represent any of the range variants, so the array literal could have heterogenous elements. In particular, if we could use a PAT with all associated types fully constrained, then β€œRangeExpression where Bound == Self.Bound” would fit the bill perfectly.

That would allow a mixture of closed, half-open, and one-sided ranges within the array literal, and this seems like a good motivating example for that feature.

2 Likes

I'd still prefer one element type unless we can do heterogenous elements without any performance tradeoff. Since RangeSet works mostly with collection, it'd make sense to prefer Range over ClosedRange.

1 Like

It's more complicated than this. If you're suggesting permitting one-sided ranges too, then that cannot be converted to an internal representation that uses two-sided half-open ranges. You also cannot convert closed ranges to half-open ranges when dealing with unstridable bounds. So a design for this support could end up infecting the storage or even collection element type of the range set, which is probably too great a price to pay for what is otherwise somewhat of a nice-to-have feature.

I don’t understand the concern.

If we had the ability to store a value of type β€œRangeExpression where Bound == Self.Bound” then that would be the obvious Element type for whatever collection the RangeSet uses internally.

Why would we want to shoehorn everything into half-open ranges, if we could instead use a perfectly good type that’s able to represent all ranges?

1 Like

RangeExpression itself does not have an upper or lower bound, because it provides the ability to express ranges unbounded on either side. So if it did provide them (and this might make for a reasonable addition in future) they would be optional (or some similar enum) to express that they may not be present. Consider both the usability and performance consequences for this element type. A type-erased Element type like this would be disastrous for both performance and usability of the collection.

Looks like a good proposal on a skim. A couple comments:

  1. Why doesn’t RangeSet conform to RandomAccessCollection Itself instead of having a ranges view? I suppose users might assume they can loop over it to get elements and be a bit surprised that they get ranges instead, but the type system will tell them what’s happening.

  2. The doc comments for rotate should probably point out that you can rotate a portion of a collection by slicing it. Developers are still learning this Swift pattern and rotate needs it more than most Collection methods.

1 Like

I also feel like this type should have Range as it's Element, with projections for the individual elements within the ranges. Or rather two: one for strideable ranges that takes no argument; and another that takes a collection to advance the index for non-strideable ranges or ranges within things like lazily filtered collections.

The problem here is that SetAlgebra and the collection protocols both define an Element associated type, which needs to be the bound index type in SetAlgebra's case and Range when treated as a collection.

So right now we have the first elements projection, and you can get the second by using a range set on the collection's indices:

let str = "nOn-StRiDeAbLe CoLlEcTiOnS"
let ucIndices = str.indices(where: { $0.isUppercase })
let collectionOfIndices = str.indices[ucIndices]
for i in collectionOfIndices {
    print(str[i])
}

Would it make sense to have myRangeSet.elements(within: str) to capture that pattern?

This is a good call β€” I'll add some examples there.

Sounds good to me.

The pitch introduces stablePartition (mutating) and stablyPartitioned (non-mutating); perhaps it would be a good idea to unify these (stable/stably) terms somehow.

Otherwise, this an excellent proposal and I cannot wait to use these APIs.

Can you mutate the original collection through an IndexingCollection, like we can through a SubSequence?

What happens if the destination index is part of the moved set? For the single-element and contiguous-range versions, I'm guessing a no-op. But what happens for discontinuous move sets?

These methods don't use RangeSet (externally); are they still related to RangeSet, or did you add them just for the heck of it? If the latter, maybe they could go into a separate pitch.

(Rotation has been brought up at least twice before.)

For the second and third methods, "partition" is used as a verb, not a noun. So the modifier should be "stably." I'm still trying to figure out how to incorporate the "half" in the second method. Maybe use "partitionWithStablePrefix"?


Those are the same thing. If the moved elements go on top of the specified destination index, the target element would move forward; that's equivalent to them going before the target element.

Sometimes while implementing a type, you have to include all of the base protocols; protocol base implication does not work. Maybe that happened to the author.


"stably" is the adverb form of "stable," so the method pair's names are already unified. The term change in the modifier is to match the one in the main word ("partitioned"/"partition").

Ah, that's a good reason. An alternative would be to keep ranges and eliminate elements:

extension RangeSet: SetAlgebra, BidirectionalCollection where Bound: Strideable, Bound.Stride: SignedInteger {
    public init<S: Sequence>(_ sequence: S) where S.Element == Bound

    @discardableResult
    public mutating func insert(_ newMember: Bound)
        -> (inserted: Bool, memberAfterInsert: Bound)

    @discardableResult
    public mutating func remove(_ member: Bound) -> Bound?

    public mutating func update(with newMember: Bound) -> Bound?

    public typealias Index = FlattenSequence<Ranges>.Index
    
    public var startIndex: Index { get }
    public var endIndex: Index { get }                
    public subscript(i: Index) -> Bound { get }
}

Yep! IndexingCollection is mutable when its base collection is, and mutable collections have a { get set } subscript.

The elements in the range set that are below and above the insertion point are gathered into a single range, which is the return value for that call to move. The elements in the collection that aren't part of the range set are moved downward and upward, respectively, to make room.

var array = Array("ABCdeFGhijkLMNOp")
let set = array.indices(where: { $0.isUppercase })
let resultingRange = array.move(from: set, insertingAt: 13) // the index of 'N'
// String(array) == "dehijkABCFGLMNOp"
// String(array[resultingRange]) == "ABCFGLMNO"

They aren’t explicitly used by the range set implementations, but they still fall in the same bucket of related algorithms, especially with moves and removals. Good point about β€œstably” being usable as an adverb for the mutating method, too. Thinking about it that way, we could use either β€œstably partition this collection” or β€œperform a stable partition on this collection” as the root of the phrase that we convert into the method names. To me, β€œstably” is just an awkward word, so I’d rather settle on β€œstable” across the board if we can.

It also occurs to me that if we’re only having one nonmutating partition, we could just call it partitioned(by:) instead of extending the name with the precise semantics.

1 Like

I've thought quite a bit about this type, having used it in the form of Foundation.IndexSet, RangeSet in the static analyzer, and an interview question (where I've usually called it IntervalSet). As such, I have Opinions, specifically wondering about the choice of open ranges vs. closed ones. Open ranges are definitely more natural for working with collections in many ways, but one thing that they're not good at doing is working with single indexes. If the RangeSet's natural implementation used closed ranges,

  • you wouldn't need Strideable to implement SetAlgebra
  • as part of that, you don't need a Collection to insert a single element
  • you'd be able to include Int.max in the set
  • you'd have a canonical "empty" representation (no ranges)

On the other hand, though,

  • you start needing a Collection to add (half-open) Ranges
  • you'd have to go through collections to turn ClosedRanges back into Ranges.
  • you still need the existing Collection to do removals
  • you now need a Collection to ensure a canonical representation after insertion (joining adjacent ranges)

So it's a tradeoff, and that last one might be a deal-breaker. In the cases of Foundation.IndexSet and the analyzer's RangeSet, the tradeoffs are a little clearer because the element type is Strideable anyway (Int). But Strideable isn't even 100% correct for a Collection-Index-based RangeSet, because of things like "this Collection's indexes are all even integers" and 2 and 4 are considered "adjacent". (It's probably close enough, though.)


All of that said, I'm glad we're getting a version of this in the standard library, and I look forward to being able to use these APIs! The moves in particular are finicky business that I know every Cocoa developer implementing drag-and-drop in a table view added in a category to NSMutableArray, so they definitely satisfy the stdlib criteria of "useful" and "easy to get wrong".

2 Likes

I'm late to the discussion, but I'm very much against making the Range an Element of RangeSet, even without the collision with SetAlgebra and even if it's hinted at in the name of the type. The ranges are an implementation detail of the setβ€”an important one, for performance, but nonetheless. Iterating over the ranges is an optimization for bulk operations, not the primary purpose of the API.

3 Likes

Thanks, Jordan β€” I had fallen naturally onto Range as the underlying type, and hadn't fully explored using closed ranges instead. When I look at switching to that, the lack of being able to do removals without the index-providing collection seems like a big stumbling block, since that similarly rules out SetAlgebra conformance (which requires subtract and remove(_:)) for non-strideable bounds. I think that issue, coupled with the added burden of converting and canonizing the ranges before performing collection operations, keeps the tradeoff on the side of half-open ranges.

Agreed on this! I think most usage will essentially treat RangeSets as opaque, and just pass them in and out of the different collection methods.

2 Likes

I feel silly now. Yes, that lessens that side of the balancesheet considerably again. Oh well!

1 Like