Pitch: Offsetting Indices and Relative Ranges

Michael_Ilseman · May 3, 2019, 11:12pm

This is a case that needs 0-from-the-end, as ClosedRange's relative(to:) needs to convert into a Range by advancing the upper bound.

Morten_Bek_Ditlevsen · May 4, 2019, 7:10am

I’ve been playing with the labelled offset subscript operator and I really, really like it!
The composition with indexes by adding an extra subscript (myArray[idx...][offset: 4]) is very elegant in my opinion.
I very much hope that this is the direction that the pitch is taking.

Michael_Ilseman · May 6, 2019, 7:27pm

Here's a prototype.

For practical reasons due to Swift's type checker, this approach needs to be constrained to purely concrete types. Since we cannot parameterize over a type, we can only represent abstract offsets relative to the start/end of some collection. This covers the majority of usage, but is not as general as @jawbroken's example.

This makes it effectively the same as the concrete no-new-range prototype replacing the the convention that a negative literals means "from end" with a use of a .end static member and - overload. Negative literals mean "from start", and will only be valid if they are later offset by a value >= to the literal before use.

  let str = "abcdefghijklmnopqrstuvwxyz"
  let idx = str.firstIndex { $0 == "n" }!

  print("-- single element subscript --")
  print(str[offset: .end-4]) // w
  print(str[..<idx][offset: .end-100]) // a
  print(str[idx...][offset: 1]) // o
  // No equivalent: print(str[(idx++1)--10]) // e

  print("-- relative range --")
  print(str[offset: 1 ..< .end-2]) // bcdefghijklmnopqrstuvwx
  // No equivalent: print(str[..<idx--2 ..< --2]) // lmnopqrstuvwx
  print(str[idx...][offset: ..<(.end-2)]) // nopqrstuvwx
  print(str[..<idx][offset: (.end-2)...]) // lm
  // No equivalent: print(str[idx--2..<idx++3]) // lmnop
  print(str[offset: .end-4 ..< .end-2]) // wx

  print("-- relative range through --")
  // No equivalent: print(str[idx--2 ... --2]) // lmnopqrstuvwxy
  print(str[idx...][offset: ...(.end-2)]) // nopqrstuvwxy
  print(str[...idx][offset: (.end-3)...]) // lmn
  // No equivalent: print(str[idx--2...idx++3]) // lmnopq
  print(str[offset: .end-4 ... .end-2]) // wxy

  print("-- partial relative range up to --")
  // No equivalent: print(str[..<idx++2]) // abcdefghijklmno
  print(str[..<idx][offset: ..<(.end-2)]) // abcdefghijk
  print(str[offset: ..<20]) // abcdefghijklmnopqrst
  print(str[offset: ..<(.end-20)]) // abcdef

  print("-- partial relative range through --")
  // No equivalent: print(str[...idx++2]) // abcdefghijklmnop
  print(str[...idx][offset: ...(.end-3)]) // abcdefghijkl
  print(str[offset: ...20]) // abcdefghijklmnopqrstu
  print(str[offset: ...(.end-20)]) // abcdefg
  print(str[offset: ...(.end-20)][offset: ...(.end-3)]) // abcde
  // No equivalent: print(str[...((--20)++2)]) // abcdefghi

  print("-- partial relative range from --")
  print(str[idx...][offset: 2...]) // pqrstuvwxyz
  // No equivalent: print(str[idx--2...]) // lmnopqrstuvwxyz
  print(str[offset: 20...]) // uvwxyz
  print(str[offset: (.end-20)...]) // ghijklmnopqrstuvwxyz

Tino · May 6, 2019, 7:32pm

?
What can it do that the current prefix/suffix can‘t?

pvieito · May 6, 2019, 7:36pm

Why adding .end when negative indexes already clearly define the offset point? Is there any advantage?

Michael_Ilseman · May 6, 2019, 8:20pm

This is a variant with a slightly different set of trade-offs. I haven't formed any strong opinions yet, but here's a potential pros/cons argument for why it could be "worse" for code authors and slightly "better" for code readers.

For the writer of code, after learning that offsetting from the end is possible, there's the cognitive load of remembering what that mechanism is. Remembering a magical convention of negative offset means from end seems easier than remembering a magical .end static member. Code completion will at least kick in after typing ..

For the reader of code: it trades the cognitive load of inferring that a negative literal must mean from the end for a more explicit operation. Seeing something like collection[offset: .end-5 ..< .end-2] gives a stronger hint as to what result to expect than collection[offset: -5 ..< -2], at least until your eyes are trained. For me, this is a straight-forward inference, but others have voiced that it is less so for them.

For both the reader and writer of code: it alleviates some of the potential confusion that the first element is at offset 0 and the last at offset -1. That is collection[offset: .end - 1] seems to slightly better communicate that we're talking about collection.last()! than collection[offset: -1].

Michael_Ilseman · May 6, 2019, 8:35pm

(I know you realize much of this, just establishing some framing)

Everything in this thread can already be accomplished through combinations of slicing, dropFirst/dropLast, and prefix/suffix operations. As you mentioned, this is in need of some polish.

Restricting offsets to only be from the beginning or end is sacrificing some use cases for simplicity. That simplicity may allow us to explore more directions. If this restriction gives us the vast majority of benefit and allows us to have a better expression, then it could be worth it.

For example, adding a generic overload of + and - is problematic for Swift's type checker and could kill this effort.

Main concern: would this be a slow down only to use sites of Offset, or could this happen to slow down other uses of +/-? If other uses of +/- could be slower, then adding a generic overload for offsets is a non-starter.

edit: accidentally hit "save" early.

xedin · May 6, 2019, 8:54pm

would this be a slow down only to use sites of Offset , or could this happen to slow down other uses of + / - ? If other uses of + / - could be slower, then adding a generic overload for offsets is a non-starter.

Sorry that I didn't clarify this originally, adding generic versions of +/- would affect all uses of such operators and not just use sites of Offset.

Michael_Ilseman · May 6, 2019, 9:09pm

And here's a variant that drops the offset: label in exchange for explicit .start + n, cosmetically similar to @Tino's syntax, but restricted to simple from-end / from-start offsets due to the infeasibility of overloading generic +/-.

(To those casually following along, I'll summarize the variants and provide more rationale after I feel this space has been adequately explored. There are a few not-quite-orthogonal axis here, and I'll try to pick the best representative of each not-quote-orthant and show minor variations of those).

// Examples
func printStrings() {
  let str = "abcdefghijklmnopqrstuvwxyz"
  let idx = str.firstIndex { $0 == "n" }!

  print("-- single element subscript --")
  print(str[.end-4]) // w
  print(str[..<idx][.end-100]) // a
  print(str[idx...][.start+1]) // o
  // No equivalent: print(str[(idx++1)--10]) // e

  print("-- relative range --")
  print(str[.start+1 ..< .end-2]) // bcdefghijklmnopqrstuvwx
  // No equivalent: print(str[..<idx--2 ..< --2]) // lmnopqrstuvwx
  print(str[idx...][..<(.end-2)]) // nopqrstuvwx
  print(str[..<idx][(.end-2)...]) // lm
  // No equivalent: print(str[idx--2..<idx++3]) // lmnop
  print(str[.end-4 ..< .end-2]) // wx

  print("-- relative range through --")
  // No equivalent: print(str[idx--2 ... --2]) // lmnopqrstuvwxy
  print(str[idx...][...(.end-2)]) // nopqrstuvwxy
  print(str[...idx][(.end-3)...]) // lmn
  // No equivalent: print(str[idx--2...idx++3]) // lmnopq
  print(str[.end-4 ... .end-2]) // wxy

  print("-- partial relative range up to --")
  // No equivalent: print(str[..<idx++2]) // abcdefghijklmno
  print(str[..<idx][..<(.end-2)]) // abcdefghijk
  print(str[..<(.start+20)]) // abcdefghijklmnopqrst
  print(str[..<(.end-20)]) // abcdef

  print("-- partial relative range through --")
  // No equivalent: print(str[...idx++2]) // abcdefghijklmnop
  print(str[...idx][...(.end-3)]) // abcdefghijkl
  print(str[...(.start+20)]) // abcdefghijklmnopqrstu
  print(str[...(.end-20)]) // abcdefg
  print(str[...(.end-20)][...(.end-3)]) // abcde
  // No equivalent: print(str[...((--20)++2)]) // abcdefghi

  print("-- partial relative range from --")
  print(str[idx...][(.start+2)...]) // pqrstuvwxyz
  // No equivalent: print(str[idx--2...]) // lmnopqrstuvwxyz
  print(str[(.start+20)...]) // uvwxyz
  print(str[(.end-20)...]) // ghijklmnopqrstuvwxyz
}

func printSplitFloats() {
  func splitAndTruncate<T: BinaryFloatingPoint>(
    _ value: T, precision: Int = 3
  ) -> (whole: Substring, fraction: Substring) {
    let str = String(describing: value)
    guard let dotIdx = str.firstIndex(of: ".") else { return (str[...], "") }
    return (str[..<dotIdx], str[dotIdx...][.start+1 ..< .start+(1+precision)])
  }

  print(splitAndTruncate(1.0)) // (whole: "1", fraction: "0")
  print(splitAndTruncate(1.25)) // (whole: "1", fraction: "25")
  print(splitAndTruncate(1.1000000000000001)) // (whole: "1", fraction: "1")
  print(splitAndTruncate(1.3333333)) // (whole: "1", fraction: "333")
  print(splitAndTruncate(200)) // (whole: "200", fraction: "0")
}

func printRanges() {
  let r: Range<Int> = 3..<10 // Explicit type only needed for this gist, if part of stdlib, it won't be preferred...
  print((absolute: r[5...], relative: r[(.start+5)...]))
  // (absolute: Range(5..<10), relative: Range(8..<10))
}

func printFifths() {
  func getFifth<C: RandomAccessCollection>(
    _ c: C
  ) -> (absolute: C.Element, relative: C.Element) where C.Index == Int {
    return (c[5], c[.start+5])
  }

  let array = [0, 1,2,3,4,5,6,7,8,9]
  print(getFifth(array)) // (absolute: 5, relative: 5)
  print(getFifth(array[(.start+2)...])) // (absolute: 5, relative: 7)
}

import Foundation
func printDataFifths() {
  func getFifth(_ data: Data) -> (absolute: UInt8, relative: UInt8) {
    return (data[5], data[.start+5])
  }

  var data = Data([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
  print(getFifth(data)) // (absolute: 5, relative: 5)

  data = data.dropFirst()
  print(getFifth(data)) // (absolute: 5, relative: 6)
}

func printRequirements() {
  func parseRequirement(
    _ str: Substring
  ) -> (predecessor: Unicode.Scalar, successor: Unicode.Scalar) {
    return (str.unicodeScalars[.start+5], str.unicodeScalars[.start+36])
  }

  """
  Step C must be finished before step A can begin.
  Step C must be finished before step F can begin.
  Step A must be finished before step B can begin.
  Step A must be finished before step D can begin.
  Step B must be finished before step E can begin.
  Step D must be finished before step E can begin.
  Step F must be finished before step E can begin.
  """.split(separator: "\n").forEach { print(parseRequirement($0)) }
  // (predecessor: "C", successor: "A")
  // (predecessor: "C", successor: "F")
  // (predecessor: "A", successor: "B")
  // (predecessor: "A", successor: "D")
  // (predecessor: "B", successor: "E")
  // (predecessor: "D", successor: "E")
  // (predecessor: "F", successor: "E")
}

func runAll() {
  printStrings()
  printSplitFloats()
  printRanges()
  printFifths()
  printDataFifths()
  printRequirements()
}

runAll()

let intRange = 1..<5
dump(intRange) // Range<Int>


let range: ClosedRange<OffsetBound> = .start+1 ... .end-1
let x = " Hello, World! "
let y = "abcdefg"
let z = "  "

print(x[range]) // "Hello, World!"
print(y[range]) // "bcdef"
print(z[range]) // ""

print("abc"[(.start+1)...]) // bc
print("abc"[.start+1 ..< .end-1]) // b
print("abc"[.start+1 ..< .end-2]) // ""
print("abc"[.start+1 ..< .end-3]) // ""

We can drop some of the extra parenthesis required in this approach with some more overloads, this is just demonstrating the simplest form.

pvieito · May 6, 2019, 9:42pm

For me at least, is the other way around, the negative index option feels more clear:

collection[offset: -1]: it must be the last element, since collection[offset: 0] already means the first element.
collection[offset: .end - 1]: seems to mean one before the end one (what does collection[offset: .end] mean?)

Michael_Ilseman · May 6, 2019, 9:50pm

end is an open bound and not a valid offset for subscripting, the way endIndex is an open bound and not a valid index for subscripting. collection[offset: .end] means the same thing as collection[collection.endIndex].

Note that the negative is just a convention on the interpretation of the literal. It wouldn't behave that way if a from-start offset were to be manipulated to have a negative offset. We're not actually implementing wrap-around semantics.

pvieito · May 6, 2019, 10:01pm

Yes, I meant it as an example of what a code reader could think.

Yep, that seems like a good trade-off between easy-of-use and correctness.

Torust · May 6, 2019, 10:21pm

Yet another alternative here (apologies if it's already been mentioned) is to forgo the explicit .start and .end computed properties in @jawbroken's example and have the enum cases be used explicitly.

Adapted example:

   print("-- single element subscript --")
    print(str[.fromEnd(4)]) // w
    print(str[.fromIndex(idx, -100)]) // a
    print(str[.fromIndex(idx, 1)]) // o
    // No equivalent: print(str[(idx++1)--10]) // e
    
    print("-- relative range --")
    print(str[.fromStart(1)..<.fromEnd(2)]) // bcdefghijklmnopqrstuvwx
    // No equivalent: print(str[..<idx--2 ..< --2]) // lmnopqrstuvwx
    print(str[idx..<.fromEnd(2)]) // nopqrstuvwx
    print(str[.fromIndex(idx, -2)..<idx]) // lm
    // No equivalent: print(str[idx--2..<idx++3]) // lmnop
    print(str[.fromEnd(4)..<.fromEnd(2)]) // wx

That reads a little better to me than other options for two reasons: the arithmetic is more clearly delineated from the ranges, and it feels less magical and more consistent with the rest of Swift.

jawbroken · May 7, 2019, 4:13am

One disadvantage of negative integers indexing from the end is that any mistake with calculated offsets will silently wrap around to the end. I don't know what mix of computed vs literal offsets are expected in practice though.

Thanks for this. I wonder if it would be possible/reasonable to use global computed variables here to remove the need for the dot prefix, which has unfortunate ambiguity issues and mixes with range operators in an unpleasant way. They would have to be named carefully in order to avoid confusion/clashing. The lack of generic computed variables is going to make this difficult, though it can probably be solved with more (problematic) overloading.

pvieito · May 7, 2019, 4:52pm

Negative indexes will only work for literals, not for computed offsets:

Michael_Ilseman · May 7, 2019, 4:55pm

As @pvieito pointed out, we don't implement wrap around. That's why all prototypes are internally represented as an enum of "fromStart" and "fromEnd" cases, and not a single stored Int. We need to either clamp or trap (still debating, but leaning towards trap), but not wrap.

jawbroken · May 7, 2019, 10:48pm

Thanks, I didn't appreciate that this is true of all the prototypes, even the ones with just integer ranges. What do you do if you've computed an offset from the end of a collection in the “negative literals index from the end” cases? Just recalculate it to be relative to the start?

Michael_Ilseman · May 8, 2019, 1:08am

I'm not sure I follow. Could you give an example of the scenario?

jawbroken · May 8, 2019, 1:36am

I was confused by the usage examples and didn't understand that e.g. this prototype only worked with literals full stop, and the limitation wasn't unique to offsets from the end. If you want to use a non-literal integer (e.g. function parameter or computed offset) from either the start or the end then you would need to go through the initialiser or similar, e.g. OffsetBound(fromEnd: maxLength).

QuinceyMorris · May 8, 2019, 5:13am

The conceptual problem with needing to interpret negative numbers as "from the end" can be solved, if you're prepared to be a bit flexible about the approach. There are many specific ways to do this, so I'll just pick one. I'm going to break the problem down into cases:

Pure relative to start. Use a labeled subscript and a range:

 c[fromStart: 0...1] // the first 2 elements
 c[fromStart: a...b] // where a >= 0 and b >= a

Pure relative to end. Again, a labeled subscript and a range:
```
 c[fromEnd: 0...1] // the last 2 elements
 c[fromEnd: a...b] // where a >= 0 and b >= a
```
This one reads "take the elements starting a elements from the end, and proceeding towards the left, ending b elements from the end", counting backwards.

If you've got used to the negative offsets, you'll hate this, but it's simpler to grasp for someone who has no preconceptions. The nice thing is that it's symmetrical with the fromStart: variant — [fromStart: 0...1] vs. [fromEnd: 0...1] — and there are no awkward -1-relative calculations.
Mixed relative to start and end. In this case, there's no range, but a 2-argument subscript, and the numbers are not offsets, but insets:
```
 c[inset: 1, 1] // all except the first and last element
 c[inset: c, d] // all except the first c and last d elements (where c >= 0 and d >= 0)
```
Note that the subscript isn't expressed as a range, because it isn't a range until after it's resolved against the collection. This also avoids the problems of the Comparable hole in the middle of a pseudo-range.

All of this would be implemented in terms of a single Collection.Offsets type (internally using a pair of enum cases to keep track of start and end offsets, say), no matter which of the three variants was originally used.

The type would have 3 corresponding initializer variants, so that there would be a way of keeping the various constructs in a variable, to meet @xwu's requirement that an offsetting expression can be calculated, stored and reused:

 let o = Collection.Offsets(fromStart: a...b); … c[o]
 let o = Collection.Offsets(fromEnd: a...b); … c[o]
 let o = Collection.Offsets(inset: c, d); … c[o]

There can also be:

 c[fromStart: 1] // the second element from the start
 c[fromStart: a] // the a'th element from the start, where a >= 0
 c[fromEnd: 1] // the second element from the end
 c[fromEnd: a] // the a'th element from the end, where a >= 0

These would be constructed in terms of a Collection.Offset type (internally using an enum case to distinguish start and end, say), again with variant initializers so that the offsets could be stored in a separate variable. Again, this is symmetrical at both ends of the collection.

In terms of the earlier part of this thread, I suspect that everyone will hate that the insets don't use Range syntax, but — if you think about it — pretending that this is a range is a big part of why earlier solutions are objectionable and/or complicated.