[Pitch Revision #4] Add Collection Operations on Noncontiguous Elements

Hi all, a few years ago @nnnnnnnn proposed new additions to the standard library to describe collection operations over noncontiguous elements with a new type named RangeSet. While the final revision of the API design was approved, its implementation was reverted before release. I'd like to re-pitch this idea for inclusion in the standard library with a few modifications to the API surface to tweak a few issues and modernize the API. In particular, I'd like to highlight the following changes from the previously approved proposal:

  1. The subranges(where:)/subranges(of:) APIs were renamed to indices(where:)/indices(of:) to avoid conflicting with recently introduced APIs related to the string processing module
  2. I've added some missing conformances to Equatable/Hashable/Sendable where appropriate
  3. I've removed DiscontiguousSlice's MutableCollection conformance since this API is slightly misleading as it did not guarantee an in-place mutation and it is also hard to use and implement with predictable behavior. Rather than leading developers down the wrong path, I've chosen to remove the conformance.

Take a look below and let me know if you have any comments on the revived proposal!


Add Collection Operations on Noncontiguous Elements

Introduction

We can use a Range<Index> to refer to a group of consecutive positions in a collection, but the standard library doesn't currently provide a way to refer to discontiguous positions in an arbitrary collection. I propose the addition of a RangeSet type that can represent any number of positions, along with collection algorithms that operate on those positions.

Motivation

There are varied uses for tracking multiple elements in a collection, such as maintaining the selection in a list of items, or refining a filter or search result set after getting more input from a user.

The Foundation data type most suited for this purpose, IndexSet, uses integers only, which limits its usefulness to arrays and other random-access collection types. The standard library is missing a collection that can efficiently store ranges of indices, and is missing the operations that you might want to perform with such a collection. These operations themselves can be challenging to implement correctly, and have performance traps as well — see WWDC 2018's Embracing Algorithms talk for a demonstration.

Proposed solution

This proposal adds a RangeSet type for representing multiple, noncontiguous ranges, as well as a variety of collection operations for creating and working with range sets.

var numbers = Array(1...15)

// Find the indices of all the even numbers
let indicesOfEvens = numbers.indices(where: { $0.isMultiple(of: 2) })

// Perform an operation with just the even numbers
let sumOfEvens = numbers[indicesOfEvens].reduce(0, +)
// sumOfEvens == 56

// You can gather the even numbers at the beginning
let rangeOfEvens = numbers.moveSubranges(indicesOfEvens, to: numbers.startIndex)
// numbers == [2, 4, 6, 8, 10, 12, 14, 1, 3, 5, 7, 9, 11, 13, 15]
// numbers[rangeOfEvens] == [2, 4, 6, 8, 10, 12, 14]

I've posted the full proposal including the detailed design and additional sections as a gist here.

14 Likes

I've updated the gist to have a link to the current proposed implementation of this pitch - I'll update that PR as I finalize the optimizations for the implementations while I complete benchmarking of the implementation.

1 Like

If RangeSet is not a set of ranges, it's misnamed. Don't describe its implementation detail; describe the abstraction it presents to the user.

5 Likes

The range-iness of RangeSet isn’t just an implementation detail, it’s a core part of the API and the semantics of the type. When building up a range set, you add and subtract ranges of values rather than individual elements, and the way those ranges are merged and maintained is visible through the type’s interface.

In the time since this proposal was accepted for previewing, members of the Swift community have had a chance to use RangeSet for a variety of purposes, such as:

  • non-contiguous selections in a collection (as prominently covered in the proposal), representing UI selections and other uses
  • ranges of initialized memory within a larger block, when implementing a simple slab allocator
  • ranges of times that a resource is available

Each of those uses is distinct from what you would use a regular Set for, and relies on the particular range-based behavior of RangeSet.

7 Likes

Sounds like you have the right name then.

4 Likes

Thanks for providing this missing feature.

I suppose that RangeSet should provide documentation with meaningful examples and area of usage. Even in this pitch it is compared to IndexSet.
The IndexSet can be represented by OrderedSet / SortedSet for representing a Set of individual elements. So it should be clear what is the difference between any kind of _Set<Index> and RangeSet.