Pitch: Offsetting Indices and Relative Ranges

Marxon13 · May 2, 2019, 6:31pm

A do agree with a lot of others, I don't think the ++ and -- syntax is a good fit. ++ or -- along with the range syntax (... or ..<) and the square brackets for collections is a lot of non-alphanumeric characters to look at and I find it hard to read.

I like where @gregtitus went with the NSLayoutConstraint suggestion, but I think we could get it down to one character.

If we take the pipe character as the edge to offset from (a collection bound or index) we can create we end up with the patterns: |# and #| where "#" is an index. |# would mean to offset by "#" to the right, and #| means offset # to the left. If we want to offset from an index like String.Index, we can add the variable containing the index before or after the pipe. So index|# would mean to offset by "#" to the right, and #|index would mean by offset by "#" to the left.

A basic example using only numbers would look like:

let str = "abcdefghijklmnopqrstuvwxyz"
print(str[14|]) // m
print(str[|5]) // f

If have an index we want to offset from:

let str = "abcdefghijklmnopqrstuvwxyz"
let idx = str.firstIndex { $0 == "n" }!
print(str[1|idx]) // m
print(str[idx|1]) // o

If we want to grab a range of elements:

let str = "abcdefghijklmnopqrstuvwxyz"
let idx = str.firstIndex { $0 == "n" }!
print(str[|20..<2|]) // uvwx
print(str[1|idx...idx|1]) // mno

Using @Michael_Ilseman 's examples:

let str = "abcdefghijklmnopqrstuvwxyz"
let idx = str.firstIndex { $0 == "n" }!

// -- single element subscript --
print(str[14|]) // m
print(str[100|idx]) // a
print(str[idx|1]) // o
print(str[10|(idx|1)]) // e

// -- relative range --
print(str[|1 ..< 2|]) // bcdefghijklmnopqrstuvwx
print(str[2|idx ..< 2|]) // lmnopqrstuvwx
print(str[idx ..< 2|]) // nopqrstuvwx
print(str[2|idx..<idx]) // lm
print(str[2|idx..<idx|3]) // lmnop
print(str[4| ..< 2|]) // wx

// -- relative range through --
print(str[2|idx ... 2|]) // lmnopqrstuvwxy
print(str[idx ... 2|]) // nopqrstuvwxy
print(str[2|idx...idx]) // lmn
print(str[2|idx...idx|3]) // lmnopq
print(str[4| ... 2|]) // wxy

// -- partial relative range up to --
print(str[..<idx|2]) // abcdefghijklmno
print(str[..<2|idx]) // abcdefghijk
print(str[..<|20]) // abcdefghijklmnopqrst
print(str[..<20|]) // abcdef

// -- partial relative range through --
print(str[...idx|2]) // abcdefghijklmnop
print(str[...2|idx]) // abcdefghijkl
print(str[...|20]) // abcdefghijklmnopqrstu
print(str[...20|]) // abcdefg

// -- partial relative range from --
print(str[idx|2...]) // pqrstuvwxyz
print(str[2|idx...]) // lmnopqrstuvwxyz
print(str[|20...]) // uvwxyz
print(str[20|...]) // ghijklmnopqrstuvwxyz

The only issue that I can find with this is that when we have an collection with an integer based index, and we want to offset from said index, the index and direction of offset will be ambiguous.

What does this mean?

let arr = ["a", "b", "c", "d", "e", "f", "g"]
let idx = arr.firstIndex(of: "d")
print(arr[idx|3]) // Is this "g" or "a"?

The only way this could be solved is by either:

Have every collection index work the same way String.Index does. That way there is always one Int and one Index type.
Provide an additional alternate syntax for collections with integer indices where the relative bound is ambiguous. Perhaps using the NSLayoutConstraint syntax: |- and -| where the "-" is on the side of the offset, or |: and :|.

I would like to see this proposal as a part of Swift, I just think that ++ and -- aren't the best option and would like to see a single character option for most use cases.

hartbit · May 2, 2019, 9:46pm

That sounds like the natural solution: a protocol that doesn't have all the requirements that RangeExpression has that is hurting us right now.

As a user, I can't remember needing to offset from an index as much as offset from start and end. And when I do, I find @nnnnnnnn's solution seem quite sufficient:

array[i...][++2 ..< --4]

Can you elaborate? Can you give a very simple example?

Michael_Ilseman · May 2, 2019, 10:58pm

Update: this non-phantom-typed variant is not workable, because it would cause 1..<5 to default to OffsetRange and not Range<Int>. But, it's still useful for illustration purposes.

New variant prototype up here.

This goes all-in on @xwu's OffsetRange, which is not phantom-typed and thus does not conform to RangeExpression or any such protocol. It also does not use any new operators such as ++ or --, preferring an explicit offset: label in a subscript overload.

The main downside of this approach is that as formulated it's hard to avoid introducing ambiguity in 1..<3 between Range<Int> and OffsetRange. I'll have to talk with @xedin about how to feasibly resolve this.

The next downside is that it doesn't support offsetting an existing index directly. @nnnnnnnn's suggestion allows some of the examples to be migrated to slicing (though I frequently had bugs and off-by-ones for things like ... when I did so), but not all.

I chose not to overload + and - on OffsetBound, which has the effect of discouraging naive use in e.g. a loop and avoid yet more type checker overloads to resolve. I speculatively added an advanced(by:) to further offset, but we cannot be RawRepresentable or serializable as an offset because there is no -0 representation (e.g. the RHS of ...-1).

We could remove some of the extra parens on the negative postfix range examples through additional overloads as well.

  let str = "abcdefghijklmnopqrstuvwxyz"
  let idx = str.firstIndex { $0 == "n" }!

  print("-- single element subscript --")
  print(str[offset: -4]) // w
  print(str[..<idx][offset: -100]) // a
  print(str[idx...][offset: 1]) // o
  // No equivalent: print(str[(idx++1)--10]) // e

  print("-- relative range --")
  print(str[offset: 1 ..< -2]) // bcdefghijklmnopqrstuvwx
  // No equivalent: print(str[..<idx--2 ..< --2]) // lmnopqrstuvwx
  print(str[idx...][offset: ..<(-2)]) // nopqrstuvwx
  print(str[..<idx][offset: (-2)...]) // lm
  // No equivalent: print(str[idx--2..<idx++3]) // lmnop
  print(str[offset: -4 ..< -2]) // wx

  print("-- relative range through --")
  // No equivalent: print(str[idx--2 ... --2]) // lmnopqrstuvwxy
  print(str[idx...][offset: ...(-2)]) // nopqrstuvwxy
  print(str[...idx][offset: (-3)...]) // lmn
  // No equivalent: print(str[idx--2...idx++3]) // lmnopq
  print(str[offset: -4 ... -2]) // wxy

  print("-- partial relative range up to --")
  // No equivalent: print(str[..<idx++2]) // abcdefghijklmno
  print(str[..<idx][offset: ..<(-2)]) // abcdefghijk
  print(str[offset: ..<20]) // abcdefghijklmnopqrst
  print(str[offset: ..<(-20)]) // abcdef

  print("-- partial relative range through --")
  // No equivalent: print(str[...idx++2]) // abcdefghijklmnop
  print(str[...idx][offset: ...(-3)]) // abcdefghijkl
  print(str[offset: ...20]) // abcdefghijklmnopqrstu
  print(str[offset: ...(-20)]) // abcdefg
  print(str[offset: ...(-20)][offset: ...(-3)]) // abcde
  // No equivalent: print(str[...((--20)++2)]) // abcdefghi

  print("-- partial relative range from --")
  print(str[idx...][offset: 2...]) // pqrstuvwxyz
  // No equivalent: print(str[idx--2...]) // lmnopqrstuvwxyz
  print(str[offset: 20...]) // uvwxyz
  print(str[offset: (-20)...]) // ghijklmnopqrstuvwxyz

  func splitAndTruncate<T: BinaryFloatingPoint>(
    _ value: T, precision: Int = 3
  ) -> (whole: Substring, fraction: Substring) {
    let str = String(describing: value)
    guard let dotIdx = str.firstIndex(of: ".") else { return (str[...], "") }
    return (str[..<dotIdx], str[dotIdx...][offset: 1..<OffsetBound(1+precision)])
  }

  print(splitAndTruncate(1.0)) // (whole: "1", fraction: "0")
  print(splitAndTruncate(1.25)) // (whole: "1", fraction: "25")
  print(splitAndTruncate(1.1000000000000001)) // (whole: "1", fraction: "1")
  print(splitAndTruncate(1.3333333)) // (whole: "1", fraction: "333")
  print(splitAndTruncate(200)) // (whole: "200", fraction: "0")

  func getFifth<C: RandomAccessCollection>(
    _ c: C
  ) -> (absolute: C.Element, relative: C.Element) where C.Index == Int {
    return (c[5], c[offset: 5])
  }

  let array = [0, 1,2,3,4,5,6,7,8,9]
  print(getFifth(array)) // (absolute: 5, relative: 5)
  print(getFifth(array[2...])) // (absolute: 5, relative: 7)

  let range: OffsetRange = 1...(-1)
  let x = " Hello, World! "
  let y = "abcdefg"
  let z = "  "

  print(x[offset: range]) // "Hello, World!"
  print(y[offset: range]) // "bcdef"
  print(z[offset: range]) // ""

  print("abc"[offset: 1...]) // bc
  print("abc"[offset: 1..<(-1)]) // b
  print("abc"[offset: 1..<(-2)]) // ""
  print("abc"[offset: 1..<(-3)]) // ""

Michael_Ilseman · May 2, 2019, 11:36pm

Here is similar to the above, but phantom-typed and adds the IndexRangeExpression protocol. Because the type checker will prefer a concrete overload of ..< over one generic over Comparable, having a concrete OffsetBound will break existing source code by preferring that over Range for something like 1..<5. Phantom typing solves this problem.

Usage examples look the same, except that a type is needed for a raw offset range in the wild:

let range: OffsetRange<String.Index> = 1...(-1)
let x = " Hello, World! "
let y = "abcdefg"
let z = "  "

print(x[offset: range]) // "Hello, World!"
print(y[offset: range]) // "bcdef"
print(z[offset: range]) // ""

lorentey · May 2, 2019, 11:41pm

I deeply dislike the idea of using the sign of the offset to implicitly specify the anchor.

extension BidirectionalCollection {
  func element(beforeOffset offset: Int) -> Element {
    return self[offset: offset - 1]
  }
}

let array = ["a", "b", "c", "d"]
print(array.element(beforeOffset: 0)) // ⟹ "d" ?!

Michael_Ilseman · May 2, 2019, 11:57pm

Currently, if you say a..<b it will invoke an overload of the operator ..<. There is one generic over Comparable which produces a Range<T>. There's a bunch of ~~hacks~~ overloads that mimic omitted arguments, though they lose some precedence information.

But, what if you wanted to have your own kind of range type that could be produced by ..<. You might consider adding an overload of ..<, but that could clash and become ambiguous with Range's, as is what happened here. You run this risk if there's any overlap between your bounds types and Comparable. Also, for completeness, you'd add an overload of infix ..., prefix ..., postfix ..., prefix ..<, and no-fix .... Also, you'd probably want to add more overloads as in here because prefix and postfix don't have the right precedence.

This is faking language syntax with operators. You're also replacing something pretty fast (parsing) with something slow (overload resolution in the presence of Swift's rich type system). If you wanted to extend this fake syntax further, you'd use more operators, which restricts you to the operator name space (e.g. -> and : are not available). If instead we had a range-expr, such as (total strawman and completely off-the-cuff):

range-expr -> expression? ... expression?
range-expr -> expression? ..< expression

This would express syntactically what we're getting at with operators. Then, the semantics could be that this is similar to other kinds of library-directed expressions, such as literals and string interpolations, so that some conformance to ExpressibleByRangeExpression is invoked with the kind of expression that was written and the left-hand side and right-hand side. The default would be the existing Range structs.

Then, if we wanted to extend this with say -> and <- to apply a relative offset, we could do something like that (hand-wavy):

index-expression -> expression -> integer-expression
index-expression -> expression <- integer-expression

And get the precedence we want out of it. Again, this is super hand-wavy and we'd want to tighten it down some and make sure we don't accidentally lose the ability for e.g. generalized destructuring pattern matching with <-.

Michael_Ilseman · May 3, 2019, 12:28am

That doesn't compile, intentionally.

If you operate on OffsetBound, that's also not the behavior, intentionally. Offsets clamp, they do not wrap around, and there is no way to get an integral representation of an offset because there is no negative zero. (I also fixed a bug in the gist while trying the below where we trapped rather than clamped).

I also agree that it's a little dubious to have the init(_:Int) which has the negative-means-from-end convention, so I dropped that one and made the existing init(fromStart:) and init(fromEnd:) public. I don't know if we can completely prevent users from getting at the convention if ExpressibleByIntegerLiteral has it, but at least the integerLiteral label discourages this (or hints that there is a convention involved). Inside the stdlib, I haven't testing if only conforming to _ExpressibleByBuiltinIntegerLiteral would give us this, but it's possible we can make that work.

Using the phantom typed variant:

error: cannot convert value of type 'Int' to expected argument type 'OffsetBound<_>'

Let's make it compile. We have two ways to do so:

extension BidirectionalCollection {
  func element(beforeOffset offset: OffsetBound<Index>) -> Element {
    return self[offset: offset.advanced(by: -1)]
  }

  func element(beforeOffset offset: Int) -> Element {
    return self[offset: OffsetBound(fromStart: offset - 1)]
  }
}

let array = ["a", "b", "c", "d"]
print(array.element(beforeOffset: OffsetBound(0))) // ⟹ "a"
print(array.element(beforeOffset: 0)) // ⟹ "a"

michelf · May 3, 2019, 12:50am

Couldn't we avoid inventing a new OffsetRange type by using some clever implementation of Comparable for OffsetBound?

struct OffsetBound: ExpressibleByIntegerLiteral, Comparable {
	let value: Int
	init(integerLiteral value: Int) {
		self.value = value
	}
	static func < (lhs: OffsetBound, rhs: OffsetBound) -> Bool {
		return UInt(bitPattern: lhs.value) < UInt(bitPattern: rhs.value)
	}
}

This ensures the "negative" from-end offsets compare greater than the "positive" from-start ones. And thus the Range<OffsetBound> becomes valid in all these cases:

let a = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
a[offset: 0 ..< 2]
a[offset: 0 ..< -2]
a[offset: -4 ..< -2]

Note the Range<OffsetBound> is invalid in this last one... even though the resulting indexes would be valid:

a[offset: -8 ..< 8]

Perhaps this last one should work, but I'm not sure.

lorentey · May 3, 2019, 1:27am

Ah, of course! I wasn't paying enough attention, sorry.

I agree the sign works great as an anchor sigil for integer literals.

Wait, why would (implicit) clamping be desirable here? It leads to the same safety issues as automatic wraparound.

foo[offset: OffsetBound(fromStart: -1)] should simply trap, shouldn't it?

jawbroken · May 3, 2019, 5:14am

Starting with some disclaimers: I'm having a lot of trouble keeping track of all the variants, so apologies if this has already been considered. I also don't have time to do a full prototype, only implementing single element access, so this might all fall apart when range operators are added because of syntax ambiguity, etc.

Here's a quick sketch of a theoretical alternative where:

Negative offsets don't automatically index from the end, you need to explicitly refer to .end (more verbose but stops off-by-one errors from silently wrapping, has precedent in Julia, Matlab and Perl 6).
Existing indices can be used/offset from using .index(i).

enum Offset<Index>: ExpressibleByIntegerLiteral {
  case fromStart(Int)
  case fromEnd(Int)
  case fromIndex(Index, Int)
  
  init(integerLiteral value: Int) {
    self = .fromStart(value)
  }
  
  static var end: Offset { return .fromEnd(0) }
  static func index(_ i: Index) -> Offset { return .fromIndex(i, 0) }
  
  static func + (left: Offset, right: Int) -> Offset { switch left {
    case let .fromStart(o): return .fromStart(o + right)
    case let .fromEnd(o): return .fromEnd(o + right)
    case let .fromIndex(i, o): return .fromIndex(i, o + right)
  }}
  static func - (left: Offset, right: Int) -> Offset { switch left {
    case let .fromStart(o): return .fromStart(o - right)
    case let .fromEnd(o): return .fromEnd(o - right)
    case let .fromIndex(i, o): return .fromIndex(i, o - right)
  }}
  
  public func indexIn<C: Collection>(_ c: C) -> C.Index where C.Index == Index { switch self {
    case let .fromStart(o):
      if o < 0 { return c.startIndex }
      return c.index(c.startIndex, offsetBy: o, limitedBy: c.endIndex) ?? c.endIndex
    case let .fromEnd(o):
      if o > 0 { return c.endIndex }
      return c.index(c.endIndex, offsetBy: o, limitedBy: c.startIndex) ?? c.startIndex
    case let .fromIndex(i, o):
      if o >= 0 { return c.index(i, offsetBy: o, limitedBy: c.endIndex)   ?? c.endIndex }
      else      { return c.index(i, offsetBy: o, limitedBy: c.startIndex) ?? c.startIndex }
  }}
}

extension Collection {
  subscript(offset o: Offset<Index>) -> Element { return self[o.indexIn(self)] }
}

Example usage:

let s: String = "abcdefg"

s[offset: 1]        // b
s[offset: .end - 1] // g

let dIndex = s.firstIndex(of: "d")!
s[offset: .index(dIndex) - 1] // "c"
s[offset: .index(dIndex) + 1] // "e

This is the equivalent of your internal Kind enum. Then you would form OffsetBounds or similar using range operators on Offsets, something like:

s[offset: 1 ..< .end - 2]
s[offset: .index(i) - 2 ... .index(i) + 2]

With names and out-of-bounds behaviour up for discussion.

hartbit · May 3, 2019, 5:38am

This seems like the ideal solution! How much compiler work does it entail?

Michael_Ilseman:

Then, if we wanted to extend this with say -> and <- to apply a relative offset, we could do something like that (hand-wavy):
index-expression -> expression -> integer-expression
index-expression -> expression <- integer-expression
And get the precedence we want out of it. Again, this is super hand-wavy and we'd want to tighten it down some and make sure we don't accidentally lose the ability for e.g. generalized destructuring pattern matching with <- .

That would only be necessary if we wanted to increment/decrement indices, no?

If we had a compiler range-expression, we could have a custom RelativeRange and a single subscript(offset offset: RelativeRange) so that:

let a = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
collection[4...8] // [4, 5, 6, 7, 8]
collection[offset: 4...-2] // [4, 5, 6, 7, 8]

And no need for the weird ++ or other operators. But I admit this may be a bit confusing: very similar range expressions would have very different meaning regarding the upperBound: in the first case, 8 relates to the 8th index from the start, in the second case, -2 is relative to the end index.

QuinceyMorris · May 3, 2019, 6:34am

And we did. Pretty much everything in this thread is a rehash of the earlier 200-message thread and its predecessors, including (pretty much) the proposed implementations, although @Michael_Ilseman has very nicely expressed both the proposed solutions and the problems inherent in the proposed solutions.

This paragraph struck me as the heart of the matter:

I agree, syntax is a huge problem, for which we have no good proposed solutions. (We have partial solutions. We have ugly solutions. Just no really good ones.)

The other side of the problem is that, given how Collection is designed, there's no satisfactory way to distinguish Int-the-index from Int-the-offset without extreme measures.

I'm coming down on the side of waiting for years, if that's what it takes.

Michael_Ilseman · May 3, 2019, 9:28pm

I had tried this approach with the original pitch and several of the variants, but it always hit a snag due to the constraints on RangeExpression or the phantom type. However, for the approach that adds the explicitly-labeled offset: subscript, only represents pure offsets, and is not phantom typed, I was able to make it work here.

Note that OffsetBound can't just store an Int because we need to differentiate between zero-from-the-start and zero-from-the-end, and we also don't want wrapping.

Michael_Ilseman · May 3, 2019, 9:36pm

The compiler work itself is not significant, it's the language/design work necessary to even know what we're building.

Right, that's an example of how it could be extended for more functionality.

hartbit:

If we had a compiler range-expression, we could have a custom RelativeRange and a single subscript(offset offset: RelativeRange) so that:
let a = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
collection[4...8] // [4, 5, 6, 7, 8]
collection[offset: 4...-2] // [4, 5, 6, 7, 8]
And no need for the weird ++ or other operators.

This is already workable using the existing hackery, as in the prototype for @michelf's approach, (though we'd sometimes need parens). This is why, while I think long-term the language support would be great to have, it's not necessary to address the majority of people's pain points today.

edit: fixed attribution

Michael_Ilseman · May 3, 2019, 9:38pm

No worries, I'll make an update highlighting the different approaches and comparative advantages / disadvantages.

I can try to prototype this up. My concern is the impact on type checking for introducing a new ExpressibleByIntegerLiteral type and having overloaded + on it. @xedin, how strongly are we trying to avoid doing this?

Michael_Ilseman · May 3, 2019, 9:45pm

There's two sides to clamp, one I would strongly argue is beneficial and the other I'm undecided (and you're opposed to).

It's very beneficial to clamp in the unknown direction. E.g. .fromStart(10) should clamp to endIndex for small collections, similarly .fromEnd(-10) clamping to start. Forming the range .fromStart(5)..<.fromEnd(-5) would produce the empty range (happening to coincide with .fromStart(5), but that's not an important detail), that is, the range would clamp in the unknown direction as well rather than form a negative range.

Then there is whether we should clamp or trap in the known direction, e.g. .fromStart(-1) traps on access. I agree wrapping is undesirable and trapping is consistent with index(_offsetBy:limitedBy:) when the limit is in the other direction. Could you elaborate more on the safety issues?

pvieito · May 3, 2019, 10:03pm

I really like the explicit offset Python-style subscript proposed by @hartbit. It seems clear and easy to understand:

let string = "ABCD"
string[offset: 0] // "A"
string[offset: -1] // "D"
string[offset: 1...-2] // "BC"
string[offset: ...1] // "AB"
string[offset: ...-2] // "ABC"
string[offset: 1...] // "BCD"

pvieito · May 3, 2019, 10:10pm

I am not sure if it is intended or a typo, but I would like start indexes to be zero-based, and end index -1 based, so we can get non-range indexing:

let collection = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
collection[3...7] // [4, 5, 6, 7, 8]
collection[offset: 3...-3] // [4, 5, 6, 7, 8]
collection[offset: 0] // 1
collection[offset: 3] // 4
collection[offset: -3] // 8
collection[offset: -1] // 10

michelf · May 3, 2019, 11:06pm

Wrapping isn't great, I'll agree to that. But do we really need a way to express zero-from-the-end? Zero-from-the-end is not an offset you can use except as the upper bound of an open range. There's still two other ways to express a range that goes to the end:

a[offset: 1...-1] // closed range with last index
a[offset: 1...] // open-ended range

So do we really need this additional range to work too?

a[offset: 1 ..< -0]

Sure it'd be nice if this was available, but I don't think it's a great loss given how unnatural it looks and how easy it is to work around.

xedin · May 3, 2019, 11:09pm

Type-checker is pretty good at handling concrete overloads at the moment so adding +/- with concrete parameters and result type would be fine, everything else would be problematic. Versions mentioned in the previous reply (in Offset) might not be as bad as fully generic ones, but it would still cause a slowdown because solver optimizations wouldn't be able to skip it as "unrelated" even when it is.