Shorthand for Offsetting startIndex and endIndex

Nevin · February 2, 2018, 8:11pm

Letanyan_Arumugam:

As an example, to get a slice of a String, not anchored at the start or end of the collection, one would use the following subscript method:
let s = "Hello, Swift!"
let subject = s[s.index(s.startIndex, offsetBy: 7)...s.index(s.startIndex, offsetBy: 11)]

If one cares about performance, one would currently write something more like this:

let s = "Hello, Swift!"
let idx1 = s.index(s.startIndex, offsetBy: 7)
let idx2 = s.index(idx1, offsetBy: 4)
let subject = s[idx1...idx2]

When making just one slice, the proposed offset subscript is quite elegant. However, when making multiple slices, the offset subscript would have to re-iterate from the beginning every time.
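To make the cost concrete, here is a minimal sketch of taking several slices from one string using only the existing index(_:offsetBy:) API. The variable names are illustrative. Each index is computed relative to the previous one, so the string is walked once in total, whereas an offset subscript that always measures from startIndex would repeat the walk for every slice.

```swift
let s = "Hello, Swift!"

// Walk forward once, reusing each index as the base for the next one.
let commaIndex = s.index(s.startIndex, offsetBy: 5)  // 5 steps from the start
let wordStart = s.index(commaIndex, offsetBy: 2)     // 2 more steps, not 7 from the start
let wordEnd = s.index(wordStart, offsetBy: 5)        // 5 more steps, not 12 from the start

let greeting = s[..<commaIndex]     // "Hello"
let name = s[wordStart..<wordEnd]   // "Swift"
let tail = s[wordEnd...]            // "!"
```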

Letan · February 2, 2018, 8:26pm

Do you have any suggestions?

Thanks, this really helped. Got a working implementation :)

Letan · February 2, 2018, 8:34pm

I have, and would have liked some way to do this; however, I too cannot really think of any good solutions.

On a side note, with the offset subscript one can write list[offset: 3...list.count - 4], which isn't great, but I think it's a little better than before.
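For comparison, and assuming the proposed subscript treats 3...list.count - 4 as a closed range of offsets from startIndex, the same elements can already be reached with existing standard library methods. The sketch below uses an Array, where integer offsets and indices coincide, so the plain range subscript can stand in for the proposed one.

```swift
let list = Array(1...10)

// Stand-in for the proposed list[offset: 3...list.count - 4]; this works here
// only because Array indices are zero-based integers.
let byRange = list[3...list.count - 4]

// Existing API: drop three elements from each end.
let byDropping = list.dropFirst(3).dropLast(3)
```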

Michael_Ilseman · February 2, 2018, 9:23pm

"offset" as a required label suffices.

Ben_Cohen · February 2, 2018, 9:24pm

I have some horrible ideas related to overloading the ...- and ...+ operators to allow for things like ...-4. But probably better I keep those thoughts to myself...

xwu · February 3, 2018, 4:53am

Out of curiosity, since Perl seems to have an operator for every occasion, does it have an operator for this that we could "borrow"?

Letan · February 3, 2018, 2:18pm

Here's a revised proposal.

Is there an etiquette for documents? Should proposals be inline or should a link be provided instead?

Offsetting Range Subscript

Proposal: SE-NNNN
Authors: Letanyan Arumugam
Review Manager: TBD
Status: Implementation Ready
Implementation: apple/swift#14389

Introduction

A collection whose Index type cannot be offset independently of its
collection can lead to overly verbose code that obscures one's intent. To
improve this, we propose adding a subscript(offset:) method to Collection and
MutableCollection that would accept an offsetting range.

Swift-evolution thread: Discussion thread topic for that proposal

Motivation

Working with an index that cannot be offset independently, without its
corresponding collection, causes the intent of code to get lost in an overly
verbose call site.

Currently, to get a slice of a String that is not anchored at the start or end
of the collection, one might use the following subscript method:

let s = "Hello, Swift!"
let subject = s[s.index(s.startIndex, offsetBy: 7)...s.index(s.startIndex, offsetBy: 11)]

This approach, unfortunately, suffers from redundancy and is in general unwieldy
to handle.

A shorter approach that is also available is to use combinations of prefix,
suffix and the drop variants. A solution using these would look like this:

let subject = s.suffix(6).prefix(5)

While this is much shorter, it suffers from multiple drawbacks. It is not as
natural as using a range because it uses a 'sliding' coordinate system, which
increases the cognitive load for a user. This solution also suffers from
API discoverability issues, since a user must learn multiple methods and figure
out that they can be composed in this way.

Proposed solution

We propose extending Collection and MutableCollection with a subscript method
that takes a range of integer offsets, measured from the starting index of the
collection.

Using the above example, along with our solution, we will be able to write the
following:

let subject = s[offset: 7...11]

Future Directions

It would be nice to have the ability to offset using the endIndex as a base;
however, no design has yet emerged that will allow us to do this expressively.

Detailed design

Subscript method protocol requirements should be added to Collection and
MutableCollection.

protocol Collection {
  ...

  /// Accesses a contiguous subrange of the collection's elements with an
  /// offsetting range.
  ///
  /// The accessed slice uses the same indices for the same elements as the
  /// original collection uses. Always use the slice's `startIndex` property
  /// instead of assuming that its indices start at a particular value.
  ///
  /// - Parameter offset: A range of values that will offset the collection's
  ///   starting index to form a new range of indices relative to the
  ///   collection.
  subscript(offset offset: Range<Int>) -> SubSequence { get }
}

protocol MutableCollection {
  ...

  /// Accesses a contiguous subrange of the collection's elements with an
  /// offsetting range.
  ///
  /// The accessed slice uses the same indices for the same elements as the
  /// original collection uses. Always use the slice's `startIndex` property
  /// instead of assuming that its indices start at a particular value.
  ///
  /// - Parameter offset: A range of values that will offset the collection's
  ///   starting index to form a new range of indices relative to the
  ///   collection.
  subscript(offset offset: Range<Int>) -> SubSequence { get set }
}

Default implementations should be provided for the methods in Collection and
MutableCollection.

extension Collection {
  subscript<R: RangeExpression>(offset offset: R) -> SubSequence 
  where R.Bound == Int {
    ...
  }
}

extension MutableCollection {
  subscript<R: RangeExpression>(offset offset: R) -> SubSequence 
  where R.Bound == Int {
    get { ... }
    set { ... }
  }
}
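The bodies above are elided in the proposal. As a rough illustration only, and not the code from the linked pull request, the getter could be sketched as follows. It assumes that offsets are non-negative and measured from startIndex, and it resolves partial ranges with the standard library's relative(to:) against 0..<count.

```swift
extension Collection {
  /// Hypothetical sketch of the proposed subscript getter.
  subscript<R: RangeExpression>(offset offset: R) -> SubSequence
  where R.Bound == Int {
    // Resolve forms such as `7...` or `..<5` against `0..<count`.
    // Note that `count` is O(n) for collections that are not random-access.
    let offsets = offset.relative(to: 0..<count)
    // Walk to the lower bound once, then continue from there to the upper
    // bound instead of starting over from `startIndex`.
    let lower = index(startIndex, offsetBy: offsets.lowerBound)
    let upper = index(lower, offsetBy: offsets.count)
    return self[lower..<upper]
  }
}

let s = "Hello, Swift!"
let subject = s[offset: 7...11]  // "Swift"
let tail = s[offset: 7...]       // "Swift!"
let head = s[offset: ..<5]       // "Hello"
```

Computing the upper index relative to the lower one keeps a single slice to one pass over the collection, although separate calls would still each start from startIndex.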

Source compatibility

None

Effect on ABI stability

N/A

Effect on API resilience

N/A

Alternatives considered

Add methods to offset startIndex and/or endIndex

Adding convenience methods to offset startIndex and endIndex would help make
intent more obvious; however, it is still not ideal. The following is an
illustration of what this might look like:

let subject = s[s.startIndex(offsetBy: 7)...s.endIndex(offsetBy: -2)]

Only add a method to offset startIndex

If we were to include only a startIndex(offsetBy:), we might want to consider
renaming it. One suggested name was index(atOffset:).

Use a KeyPath

Add an index(_:offsetBy:) method that would take a KeyPath as its first
argument. This will give us the following usage.

let subject = s[s.index(\.startIndex, offsetBy: 7)..<s.index(\.endIndex, offsetBy: -1)]

While this will shorten code when the collection instance name is long, it is
still redundant and relatively verbose.

davedelong · February 3, 2018, 5:11pm

Can we talk about the support (or lack thereof) for negative indices in the range?

Personally I would love to be able to do let last3 = aString[offset: -3...]

Michael_Ilseman · February 3, 2018, 5:15pm

Yes, and that's what @Ben_Cohen was getting at. As for your specific example, I think this is more clear: let last3 = aString.suffix(3), but in general I would really like some story for negative-offset-from-the-end.

davedelong · February 3, 2018, 5:16pm

Right; I was bringing them up again in the context of @Letanyan_Arumugam's revised proposal, since it doesn't mention them.

Ben_Cohen · February 3, 2018, 5:20pm

I'm trying to think of a gotcha for why not to support aString[offset: ...-3] and aString[offset: -3...] (or aString[offset: -5..<-3] for that matter), and I can't think of any.

I think the sticking point before was trying to make it work as a DSL for offsets combined with indices in general i.e. aString[i..<-3] or anArray[2..<-3], which doesn't work without shenanigans. But I think that a subscript(offset: Range<Int>) version doesn't have those problems.

So +1 from my perspective, until someone uncovers the horrible flaw I'm missing.

Letan · February 3, 2018, 5:32pm

Would it not be surprising if the subscript range offset worked like this, but index(_:offsetBy:) didn’t?

Letan · February 3, 2018, 5:34pm

I had mentioned it under Proposed solution > Future Directions; I didn’t have a proposed syntax other than Ben’s operators. I thought he wasn’t actually suggesting it. If it was a serious recommendation then I’ll certainly add it in.

nick.keets · February 3, 2018, 6:34pm

What about 2..<-2 ?

Letan · February 3, 2018, 6:53pm

This is already not allowed as a range on its own.

I do have other concerns about ranges that look fine but make it really hard to tell whether they would cause a trap. As an example:

let x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
let i = x[offset: -2..<5]

I don't think it's obvious that this will cause an upperBound < lowerBound. However, I'm fine with including it and seeing what is said in the review.
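To spell out the arithmetic behind this concern, here is a small sketch. It assumes the semantics under discussion, where a negative offset is measured from endIndex and a non-negative one from startIndex; this is not an existing API, and the helper name resolve is hypothetical.

```swift
/// Hypothetical helper: maps an offset to a position in a collection of
/// `count` elements, counting negative offsets back from the end.
func resolve(_ offset: Int, count: Int) -> Int {
  return offset < 0 ? count + offset : offset
}

let x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
let offsets = -2..<5  // a perfectly valid Range<Int>, since -2 < 5

let lower = resolve(offsets.lowerBound, count: x.count)  // 10 - 2 = 8
let upper = resolve(offsets.upperBound, count: x.count)  // 5

// 8..<5 cannot be formed, so slicing with these bounds would trap.
let isValid = lower <= upper  // false
```

With a 6-element collection the same range would resolve to 4..<5 and succeed, which is why the trap is hard to predict from the range alone.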

Letan · February 3, 2018, 6:53pm

By include, I mean as the default implementation, not alternative.

Ben_Cohen · February 3, 2018, 6:59pm

Ah right, yeah this was the reason for a ...- overload.

xwu · February 3, 2018, 7:57pm

I'd like to propose going in a different direction than coming up with an operator that fakes a prefix -. This syntax fools the eye but doesn't really express the desired semantics.

In the context of offsetting indices, 2..<-1 doesn't actually mean a negative range, because in this context we actually expect endIndex - 1 to come after (i.e., to be greater than) startIndex + 2. I'd rather we made it possible to actually express the desired semantics:

@_fixed_layout
public enum IndexOffset : Equatable {
  case start(Int)
  case end(Int)
}

extension IndexOffset {
  public init(_ source: Int) {
    self = source < 0 ? .end(source) : .start(source)
  }
}

extension IndexOffset : ExpressibleByIntegerLiteral {
  public init(integerLiteral value: Int) {
    self.init(value)
  }
}

extension IndexOffset : Comparable {
  public static func < (lhs: IndexOffset, rhs: IndexOffset) -> Bool {
    // Note how this comparison reflects our intended semantics.
    switch (lhs, rhs) {
    case (.start, .end): return true
    case let (.start(a), .start(b)): return a < b
    case (.end, .start): return false
    case let (.end(a), .end(b)): return a < b
    }
  }
}

extension Collection {
  internal func _index(_ offset: IndexOffset) -> Index {
    switch offset {
    case let .start(distance):
      return index(startIndex, offsetBy: distance)
    case let .end(distance):
      return index(endIndex, offsetBy: distance)
    }
  }

  public subscript(offset range: Range<IndexOffset>) -> SubSequence {
    return self[_index(range.lowerBound)..<_index(range.upperBound)]
  }
}

let x = [1, 2, 3, 4, 5]
x[offset: 1 ..< -1] // [2, 3, 4]

[Edited per Nevin's suggestion.]
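As a small illustration of what this ordering implies, the sketch below restates the enum and its < operator from the code above in compact form (attributes and access modifiers omitted) and checks a few comparisons. Under this ordering, whether a range is well formed is decided when the range is created, independently of any collection's length.

```swift
// Compact restatement of IndexOffset and its ordering from the post above.
enum IndexOffset: Equatable, Comparable, ExpressibleByIntegerLiteral {
  case start(Int)
  case end(Int)

  init(integerLiteral value: Int) {
    self = value < 0 ? .end(value) : .start(value)
  }

  static func < (lhs: IndexOffset, rhs: IndexOffset) -> Bool {
    switch (lhs, rhs) {
    case (.start, .end): return true
    case let (.start(a), .start(b)): return a < b
    case (.end, .start): return false
    case let (.end(a), .end(b)): return a < b
    }
  }
}

let fromStart: IndexOffset = 2
let fromEnd: IndexOffset = -1
let ordered = fromStart < fromEnd   // true: any start offset precedes any end offset
let reversed = fromEnd < fromStart  // false

// Both bounds are literals, so this forms a Range<IndexOffset>.
let valid: Range<IndexOffset> = 2 ..< -1
// By the same ordering, `-2 ..< 5` would fail Range's lowerBound <= upperBound
// precondition at creation, before any collection is involved.
```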

Nevin · February 3, 2018, 8:06pm

Good idea Xiaodi.

(Though I might suggest leaving the associated value for the .end case alone, i.e. negative, rather than negating it twice.)

QuinceyMorris · February 3, 2018, 8:44pm

To me, the keyword “offset” is confusing. Yes, reading this entire thread, it’s clear how we got there, but it seems likely to confuse anyone who sees only the end-product. There's no visible offsetting going on: we’re just getting the n'th through m'th elements.

I’d suggest using the keyword “ordinal” instead, since that’s the mathematical concept being used here. I’d also suggest including a non-range variant for consistency, so we have:

let slice = s[ordinal: 7...11]
let element = s[ordinal: 7]

The other aspect of this proposal that bothers me is a point that @Ben_Cohen kinda brought up. By providing subscript syntax, we are encouraging lazy developers to think of this as O(1) syntax, not [potentially] O(n) or worse. (Go to forums.developer.apple.com if you want to see what code lazy developers write.) I’d be happier if this was a regular method rather than a subscript:

let slice = s.slice(ordinal: 7...11)

So my question is: how important is it that this syntax provide an l-value for mutability?
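For context on the l-value question, existing range subscripts on a MutableCollection already act as l-values, which is what allows in-place mutation of a slice. The sketch below uses only current standard library API on an Array. A method that returns a slice, such as the suggested slice(ordinal:), would yield a temporary value and could not be used this way.

```swift
var x = [1, 2, 3, 4, 5]

// The subscript is an l-value, so a mutating method can be applied to the
// slice in place and the change is written back to `x`.
x[1..<4].reverse()  // x is now [1, 4, 3, 2, 5]

// Assignment through the subscript also works.
x[0..<2] = [9, 8]   // x is now [9, 8, 3, 2, 5]

// By contrast, the result of a method call is an immutable temporary, so the
// following line does not compile:
// x.prefix(2).reverse()
```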