Offset Indexing and Slicing

v2 changes

  • Removed .start and .end, now all OffsetBounds are built from .first and .last.
    • Simplifies usage and learnability.
    • Delineates OffsetBound abstraction and terminology from indices.
  • Added RangeReplaceableCollection convenience overloads, subscript setters.
    • .insert(at:), .remove(at:), etc.,
    • Makes OffsetBound broadly useful

Offset-Based Access to Indices, Elements, and Slices

Introduction

This proposal introduces OffsetBound, which can represent a position in a collection specified as an offset from either the beginning or end of the collection (i.e. the collection’s “bounds”). Corresponding APIs provide a more convenient abstraction over indices. The goal is to alleviate an expressivity gap in collection APIs by providing easy and safe means to access elements, indices, and slices from such offsets.

If you would like to try it out, you can just copy-paste from this gist, which includes the functionality as well as test cases and examples. This work is the culmination of prior discussion from an earlier thread, the thread before that, and @Letan ’s original thread. The latest pitch thread can be found here.

Motivation

Easily and Safely Getting Indices, Elements, and Slices from Offsets

Collection’s current index manipulation methods are meant to represent the lowest level programming interface, and as such they impose requirements on their use which are important for performance. Violations of these requirements are treated as irrecoverable logic errors, trapping whenever they lead to a potential memory safety issue. But, Collection lacks higher-level APIs that allow the programmer to treat such violations as recoverable domain errors. This proposal addresses the gap.

Extracting elements and slices from offsets into a collection is an onerous task, requiring manually advancing indices. This proposal offers an ergonomic approach to offsetting from the start or end of collections, including their slices.

This commonly comes up with casual String usage, and aligns with the String Essentials effort.

For a simple example taken from Advent of Code 2018 Day 7 , we want a function taking a string where each line is of the form Step C must be finished before step A can begin. , and returns an array representing the requirement (finish: "C", before: "A").

func parseRequirements(_ s: String) -> [(finish: Character, before: Character)] {
  s.split(separator: "\n").map { line in
    let finishIdx = line.index(line.startIndex, offsetBy: 5) // 5 after first
    let beforeIdx = line.index(line.endIndex, offsetBy: -12) // 11 before last
    return (line[finishIdx], line[beforeIdx])
  }
}

Advancing indices by hand through line.index(line.startIndex, offsetBy: 5) is fairly obnoxious and distracts from the intent of the code.

Alternatively, we could take a detour through forming SubSequences:

func parseRequirements(_ s: String) -> [(finish: Character, before: Character)] {
  s.split(separator: "\n").map { line in
    (line.dropFirst(5).first!, line.dropLast(11).last!)
  }
}

This results in less boilerplate code, but the detour through slicing APIs increases the cognitive load. Anyone reading the code has to jump through mental hoops, and the code author has more details to reason through (when we first wrote this example, we had an off-by-one error).

Instead, this proposal provides a way to directly extract elements from known offsets.

Common Bugs and Assumptions in Int-Indexed Collections

When a collection’s index type happens to be Int, it’s a common mistake to assume that such indices start from zero. For an example from this forum post, an Int range’s indices are the very elements themselves; they don’t start at zero.

print((3..<10)[5...]) // 5..<10

Slices share the same indices with the base collection. Assuming indices start from zero can be especially pernicious in generic code and when working with self-sliced types such as Data.

func fifth<C: Collection>(_ c: C) -> C.Element? where C.Index == Int {
  return c.count >= 5 ?? c[4] : nil
}

let array = [1,2,3,4,5,6,7,8,9]
print(fifth(array)!) // 5
print(fifth(array[2...])!) // still `5`, but `7` would be the real fifth item

func fifth(_ data: Data) -> UInt8? {
  return data.count >= 5 ?? data[4] : nil
}

var data = Data([1, 2, 3, 4, 5, 6, 7, 8, 9])
print(fifth(data)!) // 5

data = data.dropFirst(2)
print(fifth(data)!) // still `5`, but `7` is the real fifth item

Common advice when working with Data is to index by adding to the start index, as in data[data.startIndex + 4]. However, even this approach is not valid in a generic context (even for random access collections). Fetching an index and then performing integer arithmetic is different than advancing a position:

struct EveryOther<C: RandomAccessCollection>: RandomAccessCollection {
  internal var storage: C

  var startIndex: C.Index { storage.startIndex }
  var endIndex: C.Index {
    if storage.count % 2 == 0 { return storage.endIndex }
    return storage.index(before: storage.endIndex)
  }

  subscript(position: C.Index) -> C.Element { storage[position] }

  func index(before i: C.Index) -> C.Index { storage.index(i, offsetBy: -2) }
  func index(after i: C.Index) -> C.Index { storage.index(i, offsetBy: 2) }
  // ... and override `distance`, `index(_:offsetBy:)` for performance ...
}

let everyOther = EveryOther(storage: [1,2,3,4,5,6,7,8])
print(everyOther.contains(2)) // false
print(everyOther.contains(3)) // true

let startIdx = everyOther.startIndex
print(everyOther[startIdx + 1]) // 2, but everyOther doesn't even contain 2!
print(everyOther[everyOther.index(after: startIdx)]) // 3

This proposal provides a way to have offset-based element access for such collections, with similar expressivity but with explicit bounds for clarity.

Proposed solution

We propose convenient subscripts for slicing, single-element retrieval, and fetching an index from an offset:

let str = "abcdefghijklmnopqrstuvwxyz"
print(str[.first + 3 ..< .first + 6]) // "def"
print(str[.first + 3 ..< .last - 2]) // "defghijklmnopqrstuvw"
print(str[.first + 3 ..< .last - 22]) // "",
print(str[.last]) // Optional("z")
print(str[.last - 1]) // Optional("y")
print(str[.first + 26]) // nil
print(str[(.last - 3)...]) // "wxyz"

print(str.index(at: .last - 1)) // Optional(... index of "y")
print(str.index(at: .end - 26)) // Optional(... index of "a")
print(str.index(at: .end - 27)) // nil

The parseRequirements example from above can be written as:

func parseRequirements(_ s: String) -> [(finish: Character, before: Character)] {
  s.split(separator: "\n").map { line in
    (line[.first + 5]!, line[.last - 11]!)
  }
}

These APIs are available on all Collections, allowing a more general solution. The Advent of Code exercise only requires the extraction and comparison of the elements at the corresponding positions, so we can generalize to:

func parseRequirements<C: Collection>(
  _ c: C, lineSeparator: C.Element
) -> [(finish: C.Element, before: C.Element)] where C.Element: Comparable {
  c.split(separator: lineSeparator).map { line in
    (line[.first + 5]!, line[.last - 11]!)
  }
}

Here, the line[.last - 11] will run in constant-time if c conforms to BidirectionalCollection, or in linear-time if it does not (since we have to count from the front). This algorithmic guarantee is added as part of this proposal, without which this generalization cannot be done.

These also address the expressivity issues and assumptions with Int-indexed Collections above:

print((3..<10)[(.first + 5)...]) // 8..<10

func fifth<C: Collection>(_ c: C) -> C.Element? { return c[.first + 4] }

let array = [1,2,3,4,5,6,7,8,9]
print(fifth(array)!) // 5
print(fifth(array[2...])!) // 7

var data = Data([1, 2, 3, 4, 5, 6, 7, 8, 9])
print(fifth(data)!) // 5
data = data.dropFirst(2)
print(fifth(data)!) // 7

let everyOther = EveryOther(storage: [1,2,3,4,5,6,7,8])
print(everyOther[.first + 1]!) // 3

Detailed design

This proposal adds an OffsetBound struct, representing a position at an offset from the start or end of a collection.

/// A position in a collection specified as an offset from either the first
/// or last element of the collection (i.e. the collection's bounds).
///
/// You can use an `OffsetBound` to access an index or element of a collection
/// as well as extract a slice. For example:
///
///       let str = "abcdefghijklmnopqrstuvwxyz"
///       print(str[.last]) // Optional("z")
///       print(str[.last - 2]) // Optional("x")
///       print(str[.first + 26]) // nil
///       print(str[.first + 3 ..< .first + 6]) // "def"
///       print(str[.first + 3 ..< .last - 2]) // "defghijklmnopqrstuvw"
///
/// `OffsetBound`s also provide a convenient way of working with slice types
/// over collections whose index type is `Int`. Slice types share indices with
/// their base collection, so `0` doesn't always mean the first element. For
/// example:
///
///     let array = [1,2,3,4,5,6]
///     print(array[2...][3) // 4
///     print(array[2...][.first + 3]!) // 6
///
public struct OffsetBound {
  /* internally stores an enum, not ABI/API */

  /// The position of the first element of a nonempty collection, corresponding
  /// to `startIndex`.
  public static var first: OffsetBound

  /// The position of the last element of a nonempty collection, corresponding
  /// to `index(before: endIndex)`.
  public static var last: OffsetBound

  /// Returns a bound that offsets the given bound by the specified distance.
  ///
  /// For example:
  ///
  ///     .first + 2  // The position of the 3rd element
  ///     .last + 1   // One past the last element, corresponding to `endIndex`
  ///
  public static func +(_ lhs: OffsetBound, _ rhs: Int) -> OffsetBound

  /// Returns a bound that offsets the given bound by the specified distance
  /// backwards.
  ///
  /// For example:
  ///
  ///     .last - 2 // Two positions before the last element's position
  ///
  public static func -(_ lhs: OffsetBound, _ rhs: Int) -> OffsetBound
}

OffsetBound is Comparable, and as such can be used as a bound type for RangeExpressions.

extension OffsetBound: Comparable {
  /// Compare the positions represented by two `OffsetBound`s.
  ///
  /// Offsets relative to `.first` are always less than those relative to
  /// `.last`, as there are arbitrarily many offsets between the two
  /// extremities. Offsets from the same bound are ordered by their
  /// corresponding positions. For example:
  ///
  ///     .first + n < .last - m    // true for all values of n and m
  ///     .first + n < .first + m  // equivalent to n < m
  ///     .last - n < .last - m      // equivalent to n > m
  ///
  public static func < (_ lhs: OffsetBound, _ rhs: OffsetBound) -> Bool

  /// Compare two `OffsetBound`s to see if they represent equivalent positions.
  ///
  /// This is only true if both offset the same bound by the same amount. For
  /// example:
  ///
  ///     .first + n == .last - m    // false for all values of n and m
  ///     .first + n == .first + m  // equivalent to n == m
  ///
  public static func == (_ lhs: OffsetBound, _ rhs: OffsetBound) -> Bool
}

Collection gets an API to retrieve an index from an OffsetBound if it exists, a subscript to retrieve an element from an OffsetBound if it exists, and a slicing subscript to extract a range.

extension Collection {
  /// Returns the corresponding index for the provided offset, if it exists,
  /// else returns nil.
  ///
  /// - Complexity:
  ///   - O(1) if the collection conforms to `RandomAccessCollection`.
  ///   - O(*k*) where *k* is equal to the offset if the collection conforms to
  ///     `BidirectionalCollection`.
  ///   - O(*k*) if `position` is `.first + n` for any n, or `.last + 1`.
  ///   - Otherwise, O(*n*) where *n* is the length of the collection.
  public func index(at position: OffsetBound) -> Index?

  /// Returns the corresponding element for the provided offset, if it exists,
  /// else returns nil.
  ///
  /// Example:
  ///
  ///       let abcs = "abcdefg"
  ///       print(abcs[.last]) // Optional("g")
  ///       print(abcs[.last - 2]) // Optional("e")
  ///       print(abcs[.first + 8]) // nil
  ///
  /// - Complexity:
  ///   - O(1) if the collection conforms to `RandomAccessCollection`.
  ///   - O(*k*) where *k* is equal to the offset if the collection conforms to
  ///     `BidirectionalCollection`.
  ///   - O(*k*) if `position` is `.first + n` for any n, or `.last + 1`.
  ///   - Otherwise, O(*n*) where *n* is the length of the collection.
  public subscript(position: OffsetBound) -> Element?

  /// Returns the contiguous subrange of elements corresponding to the provided
  /// offsets.
  ///
  /// Example:
  ///
  ///       let abcs = "abcdefg"
  ///       print(abcs[.first + 1 ..< .first + 6]) // "bcdef"
  ///       print(abcs[.first + 1 ..< .last - 1]) // "bcde"
  ///
  /// - Complexity:
  ///   - O(1) if the collection conforms to `RandomAccessCollection`.
  ///   - O(*k*) where *k* is equal to the larger offset if the collection
  ///     conforms to `BidirectionalCollection`.
  ///   - O(*k*) if the offsets are `.first + n` for any n or `.last + 1`.
  ///   - Otherwise, O(*n*) where *n* is the length of the collection.
  public subscript<ORE: RangeExpression>(
    range: ORE
  ) -> SubSequence where ORE.Bound == OffsetBound
}

RangeReplaceableCollection gets corresponding APIs in terms of OffsetBound, as well as subscript setters.

extension RangeReplaceableCollection {
  /// Replaces the specified subrange of elements with the given collection.
  ///
  /// This method has the effect of removing the specified range of elements
  /// from the collection and inserting the new elements at the same location.
  /// The number of new elements need not match the number of elements being
  /// removed.
  ///
  /// In this example, two characters in the middle of a string are
  /// replaced by the three elements of a `Repeated<Character>` instance.
  ///
  ///      var animals = "🐕🐈🐱🐩"
  ///      let dogFaces = repeatElement("🐶" as Character, count: 3)
  ///      animals.replaceSubrange(.first + 1 ... .last - 1, with: dogFaces)
  ///      print(animals)
  ///      // Prints "🐕🐶🐶🐶🐩"
  ///
  /// If you pass a zero-length range as the `subrange` parameter, this method
  /// inserts the elements of `newElements` at `subrange.startIndex`. Calling
  /// the `insert(contentsOf:at:)` method instead is preferred.
  ///
  /// Likewise, if you pass a zero-length collection as the `newElements`
  /// parameter, this method removes the elements in the given subrange
  /// without replacement. Calling the `removeSubrange(_:)` method instead is
  /// preferred.
  ///
  /// Calling this method may invalidate any existing indices for use with this
  /// collection.
  ///
  /// - Parameters:
  ///   - subrange: The subrange of the collection to replace, specified as
  ///   offsets from the collection's bounds.
  ///   - newElements: The new elements to add to the collection.
  ///
  /// - Complexity: O(*n* + *m*), where *n* is length of this collection and
  ///   *m* is the length of `newElements`. If the call to this method simply
  ///   appends the contents of `newElements` to the collection, the complexity
  ///   is O(*m*).
  public mutating func replaceSubrange<C: Collection, R: RangeExpression>(
    _ subrange: R, with newElements: __owned C
  ) where C.Element == Element, R.Bound == OffsetBound

  /// Inserts a new element into the collection at the specified position.
  ///
  /// The new element is inserted before the element currently at the specified
  /// offset. If you pass `.last + 1` as the `position` parameter, corresponding
  /// to the collection's `endIndex`, the new element is appended to the
  /// collection.
  ///
  ///     var numbers = "12345"
  ///     numbers.insert("Ⅸ", at: .first + 1)
  ///     numbers.insert("𐄕", at: .last + 1)
  ///
  ///     print(numbers)
  ///     // Prints "1Ⅸ2345𐄕"
  ///
  /// Calling this method may invalidate any existing indices for use with this
  /// collection.
  ///
  /// - Parameter newElement: The new element to insert into the collection.
  /// - Parameter `position`: The position at which to insert the new element,
  ///   specified as offsets from the collection's bounds
  ///
  /// - Complexity: O(*n*), where *n* is the length of the collection. If
  ///   `position == .last + 1`, this method is equivalent to `append(_:)`.
  public mutating func insert(
    _ newElement: __owned Element, at position: OffsetBound
  )

  /// Inserts the elements of a sequence into the collection at the specified
  /// position.
  ///
  /// The new elements are inserted before the element currently at the
  /// specified offset. If you pass `.last + 1` as the `position` parameter,
  /// corresponding to the collection's `endIndex`, the new elements are
  /// appended to the collection.
  ///
  /// Here's an example of inserting vulgar fractions in a string of numbers.
  ///
  ///     var numbers = "12345"
  ///     numbers.insert(contentsOf: "↉⅖⅑", at: .first + 2)
  ///     print(numbers)
  ///     // Prints "12↉⅖⅑345"
  ///
  /// Calling this method may invalidate any existing indices for use with this
  /// collection.
  ///
  /// - Parameter newElements: The new elements to insert into the collection.
  /// - Parameter `position`: The position at which to insert the new elements,
  ///   specified as offsets from the collection's bounds
  ///
  /// - Complexity: O(*n* + *m*), where *n* is length of this collection and
  ///   *m* is the length of `newElements`. If `position == .last + 1`, this
  ///   method is equivalent to `append(contentsOf:)`.
  public mutating func insert<S: Collection>(
    contentsOf newElements: __owned S, at position: OffsetBound
  ) where S.Element == Element

  /// Removes and returns the element at the specified position, if it exists,
  /// else returns nil.
  ///
  /// All the elements following the specified position are moved to close the
  /// gap.
  ///
  /// Example:
  ///     var measurements = [1.2, 1.5, 2.9, 1.2, 1.6]
  ///     let removed = measurements.remove(at: .last - 2)
  ///     print(measurements)
  ///     // Prints "[1.2, 1.5, 1.2, 1.6]"
  ///     print(measurements.remove(at: .first + 4))
  ///     // Prints nil
  ///
  /// Calling this method may invalidate any existing indices for use with this
  /// collection.
  ///
  /// - Parameter position: The position of the element to remove, specified as
  ///   an offset from the collection's bounds.
  /// - Returns: The removed element if it exists, else nil
  ///
  /// - Complexity: O(*n*), where *n* is the length of the collection.
  mutating func remove(at position: OffsetBound) -> Element?

  /// Removes the elements in the specified subrange from the collection.
  ///
  /// All the elements following the specified position are moved to close the
  /// gap. This example removes two elements from the middle of a string of
  /// rulers.
  ///
  ///     var rulers = "📏🤴👑📐"
  ///     rulers.removeSubrange(.first + 1 ... .last - 1)
  ///     print(rulers)
  ///     // Prints "📏📐"
  ///
  /// Calling this method may invalidate any existing indices for use with this
  /// collection.
  ///
  /// - Parameter range: The range of the collection to be removed, specified
  ///   as offsets from the collection's bounds.
  ///
  /// - Complexity: O(*n*), where *n* is the length of the collection.
  public mutating func removeSubrange<R: RangeExpression>(
    _ range: R
  ) where R.Bound == OffsetBound

  /// Accesses the element corresponding to the provided offset. If no element
  /// exists, `nil` is returned from the getter. Similarly, setting an element
  /// to `nil` will remove the element at that offset.
  ///
  /// Example:
  ///
  ///       let abcs = "abcdefg"
  ///       print(abcs[.last]) // Optional("g")
  ///       print(abcs[.last - 2]) // Optional("e")
  ///       print(abcs[.first + 8]) // nil
  ///       abcs[.first + 2] = "©"
  ///       print(abcs) // "ab©defg"
  ///       abcs[.last - 1] = nil
  ///       print(abcs) // "ab©deg"
  ///
  /// - Complexity (get):
  ///   - O(1) if the collection conforms to `RandomAccessCollection`.
  ///   - O(*k*) where *k* is equal to the offset if the collection conforms to
  ///     `BidirectionalCollection`.
  ///   - O(*k*) if `position` is `.first + n` for any n, or `.last + 1`.
  ///   - Otherwise, O(*n*) where *n* is the length of the collection.
  ///
  /// - Complexity (set):
  ///   - O(*n*) where *n* is the length of the collection.
  public subscript(position: OffsetBound) -> Element? { get set }

  /// Accesses the contiguous subrange of elements corresponding to the provided
  /// offsets.
  ///
  /// Example:
  ///
  ///       var abcs = "abcdefg"
  ///       print(abcs[.first + 1 ..< .first + 6]) // "bcdef"
  ///       print(abcs[.first + 1 ..< .last - 1]) // "bcde"
  ///       abcs[.first ... .first + 3] = "🔡"
  ///       print(abcs) // "🔡efg"
  ///
  /// - Complexity (get):
  ///   - O(1) if the collection conforms to `RandomAccessCollection`.
  ///   - O(*k*) where *k* is equal to the larger offset if the collection
  ///     conforms to `BidirectionalCollection`.
  ///   - O(*k*) if the offsets are `.first + n` for any n or `.last + 1`.
  ///   - Otherwise, O(*n*) where *n* is the length of the collection.
  ///
  /// - Complexity (set):
  ///   - O(*n*) where *n* is the length of the collection.
  public subscript<ORE: RangeExpression>(
    range: ORE
  ) -> SubSequence where ORE.Bound == OffsetBound { get set }

This proposal adds a new “internal” (i.e. underscored) customization hook to Collection to apply a reverse offset to a given index. Unlike index(_:offsetBy:limitedBy:) which will trap if the collection is not bidirectional, this will instead advance startIndex.

public protocol Collection: Sequence {
  /// Returns an index `distance` positions prior to `i` if it exists.
  ///
  /// Other methods such as `index(_:offetBy:)` must not be passed a negative
  /// offset if the collection is bidirectional. This method will perform a
  /// negative offset even if the collection is not bidirectional, by using a
  /// less efficient means. `BidirectionalCollection` customizes this with a
  /// more efficient implementation.
  ///
  /// - Parameters
  ///   - i: a valid index of the collection.
  ///   - distance: The distance to offset `i` backwards. `distance` must be
  ///     positive or zero.
  /// - Returns: The index `distance` positions prior to `i` if in bounds, else
  ///   `nil`.
  ///
  /// - Complexity:
  ///   - O(1) if the collection conforms to `RandomAccessCollection`.
  ///   - O(*k*), where *k* is equal to `distance` if the collection conforms
  ///     to `BidirectionalCollection`.
  ///   - Otherwise, O(*n*), where *n* is the length of the collection.
  func _reverseOffsetIndex(_ i: Index, by distance: Int) -> Index?
}

extension Collection {
  // Scans from the start
  public func _reverseOffsetIndex(_ i: Index, by distance: Int) -> Index?
}
extension BidirectionalCollection {
  // Reverse offsets
  public func _reverseOffsetIndex(_ i: Index, by distance: Int) -> Index?
}

Change Collection.suffix() and Collection.dropLast() to use this hook, which will improve algorithmic complexity in generic code when the collection happens to be bidirectional.

extension Collection {
   /// Returns a subsequence containing all but the specified number of final
   /// elements.
   ///
   /// If the number of elements to drop exceeds the number of elements in the
   /// collection, the result is an empty subsequence.
   ///
   ///     let numbers = [1, 2, 3, 4, 5]
   ///     print(numbers.dropLast(2))
   ///     // Prints "[1, 2, 3]"
   ///     print(numbers.dropLast(10))
   ///     // Prints "[]"
   ///
   /// - Parameter k: The number of elements to drop off the end of the
   ///   collection. `k` must be greater than or equal to zero.
   /// - Returns: A subsequence that leaves off the specified number of elements
   ///   at the end.
   ///
-  /// - Complexity: O(1) if the collection conforms to
-  ///   `RandomAccessCollection`; otherwise, O(*n*), where *n* is the length of
-  ///   the collection.
+  /// - Complexity:
+  ///   - O(1) if the collection conforms to `RandomAccessCollection`.
+  ///   - O(*k*), where *k* is equal to `distance` if the collection conforms
+  ///     to `BidirectionalCollection`.
+  ///   - Otherwise, O(*n*), where *n* is the length of the collection.
   @inlinable
   public __consuming func dropLast(_ k: Int = 1) -> SubSequence

   /// Returns a subsequence, up to the given maximum length, containing the
   /// final elements of the collection.
   ///
   /// If the maximum length exceeds the number of elements in the collection,
   /// the result contains all the elements in the collection.
   ///
   ///     let numbers = [1, 2, 3, 4, 5]
   ///     print(numbers.suffix(2))
   ///     // Prints "[4, 5]"
   ///     print(numbers.suffix(10))
   ///     // Prints "[1, 2, 3, 4, 5]"
   ///
   /// - Parameter maxLength: The maximum number of elements to return. The
   ///   value of `maxLength` must be greater than or equal to zero.
   /// - Returns: A subsequence terminating at the end of the collection with at
   ///   most `maxLength` elements.
   ///
-  /// - Complexity: O(1) if the collection conforms to
-  ///   `RandomAccessCollection`; otherwise, O(*n*), where *n* is the length of
-  ///   the collection.
+  /// - Complexity:
+  ///   - O(1) if the collection conforms to `RandomAccessCollection`.
+  ///   - O(*k*), where *k* is equal to `maxLength` if the collection conforms
+  ///     to `BidirectionalCollection`.
+  ///   - Otherwise, O(*n*), where *n* is the length of the collection.
   @inlinable
   public __consuming func suffix(_ maxLength: Int) -> SubSequence
}

Finally, BidirectionalCollection’s overloads are obsoleted in new versions as they are fully redundant with Collection’s.

Source compatibility

This change preserves source compatibility.

Effect on ABI stability

This does not change any existing ABI.

Effect on API resilience

All additions are versioned. The _reverseOffsetIndex customization hook is @inlinable, while all other added ABI are fully resilient.

Alternatives considered

Add a .start and .end

(Over Discourse character limit, refer to the gist)

Don’t add the customization hook

(Over Discourse character limit, refer to the gist)

Offset Arbitrary Indices

(Over Discourse character limit, refer to the gist)

Other Syntax

(Over Discourse character limit, refer to the gist)

Don’t Make it Easy

(Over Discourse character limit, refer to the gist)

Don’t Return Optional Index or Element

The optionality of index(at:) and single-element subscript’s return values cover an important expressivity gap in Collection and follows the standard library’s API design philosophy.

Trapping APIs

Index manipulation through methods such as index(after:), index(_:offsetBy:), and subscripting with an index represent the lowest level programming interface for accessing elements and advancing positions in a collection. As such, they impose requirements on their use which are important for performance. E.g.:

  • index(after:) must not be called on endIndex.
  • index(_:offsetBy:)’s offset must not advance an index past the collection’s bounds and must not be negative unless the collection is also bidirectional.
  • subscript(_:Index) -> Element must be passed a valid index less than endIndex.

A violation of the above requirements leading to a potential memory safety issue causes a trap. I.e. such violations are irrecoverable logic errors and the process is safely taken down.

Additionally, such low-level index-manipulation-heavy code is most often written within a context where indices are known to be valid, e.g. because the indices were vended by the collection itself. Subscript taking an index is similarly non-optional, as an index is an assumed-valid “key” to a specific element. Swift, being a memory-safe language, will still bounds-check the access, but this is not surfaced to the programmer’s code and bounds checks can be eliminated by the optimizer if safe to do so. Again, invalid indices for these operations represent irrecoverable logic errors, so a trap is issued.

Optional Returning APIs

In contrast, higher-level operations are often written in a context where the existence of a desired element is not known. Whether there is or is not such an element represents cases that should be handled explicitly by code, most often through the use of an optional result. That is, non-existence is not a logic error but a simple domain error, for which Optional is the best tool for the job.

For example, Dictionary has the regular subscript(_:Index) -> Element trapping subscript for indices derived from the dictionary itself, but additionally provides a subscript(_:Key) -> Value?, which returns nil if the key is not present in the dictionary. A missing key is not necessarily a logic error, just another case to handle in code. Similarly, first, last, .randomElement(), min(), etc., all return optionals, where emptiness is not necessarily a logic error, but a case to be handled in code.

Optional handling is a strength of Swift and known-valid access is an acceptable use of !, which has the effect of converting a simple domain error into an irrecoverable logic error.

Single element offset-based subscripting follows this pattern by returning nil if the offset is not valid for the collection. While this does introduce optional-handling code at the use site, handling such access would be necessary anyways outside of a context where the offset is known to be less than the count. Otherwise, in these contexts, the programmer would have to remember to guard a trapping subscript’s use site by an explicit check against count. Offset ranges are clamped, resulting in empty slice return values, just like other partial ranges.

This means that collection.last matches collection.index(at: .last) and collection[.last] in optionality, and the latter two can be offset.

Example of Known-Valid Offset Context

(Over Discourse character limit, refer to the gist)

57 Likes

I think the fixed endpoint constants are an excellent part of this design. They both provide a way for readers of the code to see what's going on & look up the types (unlike standalone negative integers) AND avoid my least favorite part about negative offsets elsewhere, which is that they're 1-based instead of zero-based from the end.

# python
"abcde"[1]      # 'b'
"abcde"[-1]     # 'e' :/ 
// swift + this proposal
"abcde"[.first + 1]   // "b"
"abcde"[.last - 1]    // "d" :)
24 Likes

This is fantastic. On a quick first read, I think I agree with all of the rationales and find the design pitched here to be well motivated as compared to the thoroughly surveyed alternatives.

I agree with @nnnnnnnn that writing out start and the others greatly improves clarity; and it obviates the need for an explicit label to distinguish offset subscripts from traditional subscripts.

I also fully agree with the decision to have these APIs return optional elements instead of trapping, as they are higher level interfaces intended to be more lenient.

9 Likes

Good documentation and the optionality of collection[.start + i] should help with that. The for ... in ... structure and higher order methods like map, filter and such should also help with that.

Warning about it for String and SubString is very important though since they're the collections people complains the most about not having Int indices.

1 Like

How do you think we can best surface this to users? In the documentation for subscript? String is definitely a big existing use case for this, but formally any Collection that is not RandomAccess has O(n) index offsetting behavior. In fact, this distinction is the fundamental distinction of RandomAccessCollection. From the docs:

/// A collection that supports efficient random-access index traversal.
///
/// Random-access collections can move indices any distance and 
/// measure the distance between indices in O(1) time. Therefore, the
/// fundamental difference between random-access and bidirectional collections
/// is that operations that depend on index movement or distance measurement
/// offer significantly improved efficiency. For example, a random-access
/// collection's `count` property is calculated in O(1) instead of requiring
/// iteration of an entire collection.

Eventually, I could see the Swift project having optional perf-warnings/hints. Issuing a perf-warning for an OffsetBound constructed from a loop-variant variable should be very straightforward in any intermediate representation. As would index(_:offsetBy:), etc.

2 Likes

Hm...this reminds me of DispatchTime's API.

I've been frustrated with Swift's Index ergonomics enough to be in favor of just supporting integers and dealing with the problems it creates.

This solution has me rethinking that stance. The API is concise, clear, feels natural in Swift, and greatly improves one of the clunkiest APIs in Swift!

+100!

2 Likes

How does this work with code completion?

If someone starts typing “str[.s” will their IDE be able to suggest “.start”?

2 Likes

Wow, this has really shaped up nicely. Great work! I really like how the .start, .first, .last and .end “anchors” make the meaning very clear at the usage site while still being relatively concise. This design feels like it makes all the right tradeoffs and will be a great addition to Swift.

5 Likes

I like this a lot, too. A couple of small things:

  • It's not absolutely source compatible, since the symbol forms .start, .end, .first and .last may be present in existing source code.

  • I can't help wishing that there was a trapping version of the subscript — as well as the optional-returning one, if that's the general preference for the "default" behavior.

  • The pitch doesn't show any examples with unary range operators on the bounds. Since these tend to be ugly (because of whitespace rules), it'd be better to be upfront about this.

2 Likes

Between the integer offset operators and Comparable conformance, I was going to suggest adding Strideable conformance. But the offsets in the start class and in the end class are unreachable from each other, so that's out.

Yes, .start is the first code completion result Xcode will recommend. You can play around with it by copy-pasting the whole implementation from this gist

4 Likes

Could you clarify? Does this have more potential than any other stdlib addition? (Also noting that local shadows take precedence over the stdlib in general).

That would have to be balanced against the weight of an API just for that purpose vs the user adding a ! at the call site. In other places of the stdlib, we only expose both trapping and optional APIs when there is a salient performance difference between the two.

There's one hidden in there, but I could add one more prominently. Good catch.

print((3..<10)[(.start + 5)...]) // 8..<10
3 Likes

No, not really. If that's the definition of source compatible, your proposal is compatible. However, if someone has defined, for example, an extension to Int that defines a static member start, that could silently change the behavior of, say, an existing array subscript expression that uses .start. (I think. Correct me if my reasoning is wrong.)

Sure, and my comment is no stronger than wishing. The problem is that there's a pretty big community bias against ! even in situations where it's absolutely the right thing to use. Other unwrapping strategies (guard, if let, etc) are likely to be too heavyweight for the cases where code "simply" wants to get the element at a known-safe offset.

I do really like your proposal. It seems to walk the fine line between the competing interests in earlier iterations.

What version of Swift does this require?

I am using Xcode 10.3, and when I paste that code into a playground it fails to compile with the error:

Ambiguous reference to member 'split(separator:maxSplits:omittingEmptySubsequences:)'

Amazing API addition, and I‘m glad it‘s not .start() like in DispatchTime. Every time I use that API it feels wrong (why do we need () there?).

If this proposal gets accepted and if the SwiftPreview proposal also gets accepted but after the current one, can we still move that API into the module? cc @Ben_Cohen

Looks much nicer than the previously pitched ++ and --

Do we know how slow is type-checking of these? I'm kinda afraid of The compiler is unable to type-check this expression in reasonable time when seeing that many operators and literals in one line (even though I haven't seen that error in a long time)

2 Likes

Looks good to me. Any thoughts around providing a nicer way to get a slice of a particular length using these offset indices? Looking at

print(str[.start + 3 ..< .start + 6]) // "def"

makes me wonder if there should be a

print(str[.start + 3, length: 3]) // "def"

or similar (and perhaps an equivalent subscript for real indices). The alternatives are slicing and then using a prefix, or doing the arithmetic on the offset from start (e.g. .start + start + length).

6 Likes

Isn‘t it strange if you have a length parameter but the returned sequence is not of the given length, which would also imply that the returned type must always be Optional!?

I prefer the range because it does not necessarily imply any number of elements.

Not really, no, because the other offset index slicing method is also lenient in a similar way. i.e. the potential exists to be similarly confused when str[.start + 3 ..< .start + 6] isn't of length 3. If this is really a general concern then you could change the label to maxLength, or similar.

Edit: And I'll note that methods like prefix(_:) don't return an optional or trap when the result isn't of the length specified.

2 Likes

I also think that a length parameter could be handy and I’m not concerned about the size of data returned being smaller.

This is also common in I/O read operations. You may request a number of bytes but there is no guarantee that will be the actual number of bytes read and returned.