Shorthand for Offsetting startIndex and endIndex

Introducing a new labeled subscript would have the unfortunate effect that the notation we introduce here would only work in the context of that new subscript. We already have the RangeExpression protocol as a generic way of supporting range expressions that require context from the collection they get applied to in order to be resolved, which is how the open-ended ranges work today. I wonder whether we can extend that concept to allow relative offsets to be included in addition to open-ended ranges. I like the idea from upthread of using keypaths, since a KeyPath<C, C.Index> for any C: Collection is a nice generic representation of an index expression waiting to be resolved. If we allowed offset arithmetic on top of that, you could say \.startIndex+3... or ...\.endIndex-2. Being able to encode all of these as RangeExpression s would mean they work not only for subscripting but for other APIs that accept RangeExpressions for slicing or operating on parts of a collection.

Edit: If + and - on keypaths is distasteful, then putting subscript(offset: Int) -> KeyPath<C, C.Index> operations on Index types might be an interesting alternative too, e.g. \.startIndex[+3] or \.endIndex[-4].

6 Likes

The argument against using keypath offset was that it wasn't an improvement over suffix/prefix and the such.

I'm not sure how you would make such a type Comparable for it to be used with RangeExpression.

I didn't see (or don't recall) this suggestion upthread, but it may actually be the most elegant design so far.

In terms of spelling, it also has the advantage of working within existing Swift syntax rules that would interpret virtually every other sigil (including leading dot) as part of the existing range operator, necessitating parentheses or spaces to separate them (for example, 3...-3 is parsed differently from 3 ... -3 or 3...(-3)), or else new operators to be defined that simulate two operators.

RangeExpression doesn't require Comparable, AFAICT. These expressions would need to create new kinds of type that conform to RangeExpression. For example:

struct KeyPathBoundedClosedRange<C: Collection>: RangeExpression {
  var start: KeyPath<C, C.Index>, end: KeyPath<C, C.Index>

  func contains(_: C.Index) -> Bool { return false }
  func relative<D: Collection>(to: D) -> Range<C.Index> where D.Index == C.Index {
    // Return an empty range if the collection type doesn't match
    guard let c = to as? C else { return to.startIndex..<to.startIndex }
    return c[keyPath: start]...c[keyPath: end]
  }
}

func ...<C>(left: KeyPath<C, C.Index>, right: KeyPath<C, C.Index>) -> KeyPathBoundedClosedRange<C> {
  return KeyPathBoundedClosedRange(start: left, end: right)
}

It's possible that the design of RangeExpression might need refinement to more neatly handle these forms. The contains operation doesn't really make sense for this example, and it can only really answer relative queries based on collections of its own base type, rather than any collection with the same index type.

It does:

/// A type that can be used to slice a collection.
///
/// A type that conforms to `RangeExpression` can convert itself to a
/// `Range<Bound>` of indices within a given collection.
public protocol RangeExpression {

    /// The type for which the expression describes a range.
    associatedtype Bound : Comparable

Not only that, there is an upperBound >= lowerBound check when forming a RangeExpression, which crashes if you try to cheat on Comparable.

It's unclear whether it's desirable to use offset-based ranges in a regular (no-keyword) subscript, because that might be taken as a performance guarantee, matching that of Collection.Index subscripts or ranges.

It's not likely to be confusing when ranges are written literally, but could be misleading if the range is stored in a variable. In that case, it may look like the stored range is already resolved to indices, but it won't be. (Don't all other resolutions of keyPaths as subscripts require a keyPath keyword?)

Bound in my example is C.Index, which is required to be Comparable by the Collection protocol. RangeExpression is a protocol; it doesn't have any behavior on its own. Its role is to become a concrete Range once applied relative to a concrete collection. It's a judgment call whether this is an improvement over prefix/suffix or whether it suggests unsatisfiable performance guarantees. Re: whether it's better, I can think of a couple benefits: It's more general, since it'd work with any index key path expression, and it's more uniform, since if you're already used to slicing with ranges, then this is an extension of that idea, and doesn't require learning different APIs. IMO, I think the use of keypaths makes it clearer that this isn't a simple range expression, but I could see how other people might be misled. The fact that the expression produces a different concrete type, like other context-sensitive RangeExpressions do, should mean that it's obvious in short order you aren't working with a simple Range in most situations, though.

1 Like

I'm not too used to working with keyPath's, but there seem's to be a bit of friction using it with operators and subscripts.

let x = [1, 2, 3, 4, 5, 6, 7]

//error: ambiguous without more context
let y = x[\.startIndex ... \.endIndex]

//error: expected ',' seperator
let y = x[\Array<Int>.startIndex ... \Array<Int>.endIndex] 

//compiles fine
let y = x[(\Array<Int>.startIndex) ... (\Array<Int>.endIndex)]

Is there something I'm missing or is this how it's used?

Those are both bugs; all three ought to compile. I’ll file bug reports.

Edit: [SR-7091] Can't parse a binary operator after a key path expression · Issue #49639 · apple/swift · GitHub

I missed that you resolved the range bounds directly to Collection.Index. In that case, not only don't you need a subscript keyword, it would be positively redundant and/or confusing. So I've got no complaints about this syntax.

I still don't see, though, how expressions like \.startIndex+3 can be made to produce a KeyPath<C, C.Index>, since indices can't offset themselves any more. What key-path would the compiler emit in this case?

We would need yet another different kind of RangeExpression that represents an endpoint-with-offset. I agree that \.startIndex+n may not be the best notation for this, since it strongly suggests that collection.startIndex + n would also work.

I should emphasize that I'm not terribly attached to the keypath idea; it seems superficially appealing, but I can see that it has shortcomings too. My main concern is that this seems like something that ought to fit into the existing RangeExpression protocol infrastructure instead of becoming yet another concept. At a high level, we're trying to achieve the same thing RangeExpression is, allowing you to specify a range expression at a high level in a way where the specifics of how it applies to a collection are deferred until you apply it to a specific collection.

1 Like

RangeExpression.Bound does have to Comparable, but there's no check like this as part of the protocol. (The protocol doesn't even require any concrete bounds.) Range and ClosedRange are the ones that check order, so we'd need a new type if we wanted anything Python-style.

5 Likes

I'd love to see some shorthand methods added to the standard library to access to offset the startIndex and endIndex. Right now the API is very verbose and repetitive in my opinion.
Some time ago I published a micro-library on GitHub similar to what the standard library should include.

I experimented a little bit with making a range expression for offsets, but it's hampered by both the Bound associated type requirement and the contains(_:) method in RangeExpression. I ended up with just an extra subscript on Collection — it looks handy enough, but not as succinct as Python by any means:

The proposal does something similar except it also introduces a new protocol that other concrete range types conform to. The design of the solution is very similar to the original proposal except it introduces the new operators that were suggested to handle the lowerBound > upperBound issues. Some example usage can be found in the implementation PR test cases.

Actually, it occurred to me that this bounds issue can be solved, by tagging the offset as "lower" vs. "upper" as well as "start-relative" vs. "end-relative". In terms of your code, this:

public struct RelativeOffsetRange : RelativeOffsetRangeExpression {
  /// The lower base of the range
  public let lowerBound: Int
  /// The upper base of the range. If `nil` upperBound represents the endIndex.
  public let upperBound: Int?

becomes something like this:

fileprivate enum RelativeOffset: Comparable {
  case lower (Int)
  case upper (Int)
  static func < (lhs: RelativeOffset, rhs: RelativeOffset) {
    switch (lhs, rhs) {
    case (.lower, .upper): return true
    case (.upper, lower): return false
    case (_, _) let (l, r):
      if l >= 0 && r < 0 { return true }
      if l < 0 && r >= 0 { return false }
      return l < r
    }
  }
}
public struct RelativeOffsetRange : RelativeOffsetRangeExpression {
  /// The lower base of the range
  public let lowerBound: RelativeOffset
  /// The upper base of the range. If `nil` upperBound represents the endIndex.
  public let upperBound: RelativeOffset?

That means the range bounds are Comparable, so the range can always be formed from reasonable input (3 ... 2 will fail immediately, as it should). A range may still fail when applied to an actual collection, which seems like a reasonable concession.

Edit: I meant, lowerBound would always be case .lower, and upperBound would always be case .upper, enforced by the RelativeOffsetRange initializer.

... except that:

  • people also want to use range operators directly inside the subscript expression, and

  • people want an explicit sigil for "start" and "end", because not everyone is happy with negative numbers implying end-relative offsets.

The first of these seems doable, but there's no consensus on that last point. Every suggestion has been shot down.

You should be able to parameterize your Offsets struct on the kind of index the offsets will be applied to as a phantom type, which would let you satisfy the constraints on RangeExpression.

edit: I do think RangeExpression's requirements are a bit stricter than what you want for a general index range expression, since it's also trying to provide enough support for pattern matching range expressions against values (e.g. case 100...: should match anything greater than 100), which is not something that makes sense in this more general case. It may be that a generalization of RangeExpression would be the "right thing" to do.

1 Like

Exactly -- I tried it with a phantom type, but conforming to RangeExpression ended up being more trouble than it's worth. It couldn't infer the phantom type when in the RangeExpression subscript, so I needed to provide a concrete implementation anyway, and as you say, the idea of an Offsets instance "containing" a particular index doesn't make sense.

Oh, sorry @Letan! I don't remember if I looked at that already or if I just ended up reimplementing some of it out of chance. Love it.