Pitch: Offsetting Indices and Relative Ranges

jrose · May 8, 2019, 4:17pm

I mostly like this a lot. The fromEnd range being reversed is weird, though. Is that mostly just so you can use an existing range type, or are there actually cases where we'd expect someone to pass an existing range into fromEnd:?

Jens · May 8, 2019, 5:52pm

Given the existing meaning of endIndex, e.g.:

let str = "abcd"
print(str[str.endIndex]) // Runtime error: String index is out of bounds.

I would expect this:

c[fromEnd: 1]

to mean the last element, i.e. the first (1) rather than the second (2) from the end (endIndex).

And I would expect this:

c[fromEnd: 0]

to trap (0 from end is out of bounds).

Even if it was spelled c[fromLast: 0...1] (to avoid the end/endIndex issue), it would still be a bit awkward imo, since it reads like it would return the last two elements reversed.

The existing way to write it is clearer:

c.suffix(2) // simple
c[fromEnd: 0...1] // complicated (end is endIndex or endIndex-1? result is reversed or not?)

And I think you made a mistake (related to assigning multiple meanings to "end") here:

QuinceyMorris:

c[fromStart: 1] // the second element from the start
c[fromStart: a] // the a'th element from the start, where a >= 0
c[fromEnd: 1] // the second element from the end
c[fromEnd: a] // the a'th element from the end, where a >= 0

That would be "// the a+1'th element from the start" and "// the a+1'th element from the end" unless I'm mistaken.

What would the result of the following be:

let str = "abcd"
let result = str[inset: 3, 3]

?

jrose · May 8, 2019, 5:54pm

I think the fromEnd behavior is that it gives the same elements as c.reversed()[fromStart: theRange].reversed(). So c[fromEnd: 0] would correctly be the last element, not the past-the-end trap.
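A rough sketch of that reading, for illustration only. The `fromEnd:` subscripts below are hypothetical stand-ins written for this example, not the pitch's actual implementation, and they copy into arrays purely to keep the code short:

```swift
extension BidirectionalCollection {
    // Hypothetical: offset 0 is the last element, 1 the one before it, and so on.
    subscript(fromEnd offset: Int) -> Element {
        return Array(self.reversed())[offset]
    }

    // Hypothetical: take the offsets from the reversed collection,
    // then reverse the result back into the original order.
    subscript(fromEnd offsets: ClosedRange<Int>) -> [Element] {
        let backwards = Array(self.reversed())
        return Array(backwards[offsets].reversed())
    }
}

let c = Array("abcd")
print(c[fromEnd: 0])              // d
print(String(c[fromEnd: 0...1]))  // cd
```

Under this reading the result keeps the original element order ("cd", not "dc"), even though the offsets count backwards.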

Jens · May 8, 2019, 6:09pm

I do understand what the intended behavior is. I'm just saying that having c[fromEnd: 0] mean the element at endIndex-1 (i.e. c.last) is complicated, since it requires users to keep in mind that "end" is not endIndex but the index of the last element.

Michael_Ilseman · May 8, 2019, 6:09pm

Then it should be named appropriately, e.g. fromLast.

Jens · May 8, 2019, 6:10pm

Exactly, but:

c[fromLast: 0...1]

To me, it reads like this: starting at the last element, return the elements at offsets 0 and 1 (backwards from last), i.e. "dc" from "abcd".

Michael_Ilseman · May 8, 2019, 6:13pm

I also think that any fromLast or fromEnd offsets should be negative-or-zero so that ordering follows naturally without having to mentally reverse everything. Offsets are signed.

jrose · May 8, 2019, 6:29pm

Oh, that's interesting. c[fromEnd: -5..<0] is the last five elements. That's not bad.
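For concreteness, here is a minimal sketch of how negative-or-zero offsets from endIndex could behave. The `fromEnd:` subscript is a hypothetical name and signature assumed for this example, not something the pitch or the standard library defines:

```swift
extension BidirectionalCollection {
    // Hypothetical: offsets are negative-or-zero distances measured from endIndex,
    // so the range is already in the collection's natural order.
    subscript(fromEnd offsets: Range<Int>) -> SubSequence {
        let lower = index(endIndex, offsetBy: offsets.lowerBound)
        let upper = index(endIndex, offsetBy: offsets.upperBound)
        return self[lower..<upper]
    }
}

let c = Array(1...8)
print(Array(c[fromEnd: -5..<0]))  // [4, 5, 6, 7, 8]
print(Array(c[fromEnd: -2..<0]))  // [7, 8]
```

No mental reversal is needed here: an upper bound of 0 simply means endIndex.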

Les_Pruszynski · May 8, 2019, 8:05pm

I thought that the point of this exercise was to make the expressions inside the square brackets as short as possible. Now we are trying to put some descriptive words inside them. Comments are for that. I wish we had just adopted Ruby's patterns and been done with it.

David_Smith · May 8, 2019, 9:18pm

Generally speaking, the goal is to strike a useful balance between readability, writability, and terseness.

Personally, as an old ObjC programmer, I put very little weight on terseness, but it can be pretty nice sometimes.

Les_Pruszynski · May 9, 2019, 12:27am

And this is the place to keep things short and tidy. I dread to think what kind of monstrosity we will come up with when it comes to Regular Expressions. I think we would do well here to adopt a subset of REs for Relative Ranges. Besides, there is prior art here and we don't have to reinvent this wheel.

QuinceyMorris · May 9, 2019, 4:42am

Yes, I can see that it might be an issue for the offset to go one way, and the elements the other. It's not — I think — conceptually difficult, but it certainly might be unexpected to someone who didn't already know the behavior.

This has some attractions as a way out:

c[fromEnd: -2..<0] // last two elements

except that it loses the symmetry with fromStart: because you can't write c[fromEnd: -1...0] to mean two elements. Another way out would be something like:

c[fromFirst: 0...1] // first two
c[fromLast: -1...0] // last two

which looks pretty symmetrical, but now this isn't very symmetrical:

c[fromFirst: 0..<1] // the first element
c[fromLast: -1..<0] // the second-last element (!)

This general approach also loses the beauty of not having any minus signs, giving way to this kind of mess:

c[fromEnd: -2..<(-1)]

Overall, my personal preference would be the offsetting style I originally suggested, but with this (hopefully) slightly preferable labelling:

c[fromFirst: 0...1] // first two
c[fromLast: 0...1] // last two
c[fromFirst: 0..<1] // first one
c[fromLast: 0..<1] // last one

It's mostly to avoid minus signs.
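A sketch of how that labelling could behave, assuming non-negative offsets in both directions and results in the collection's original order. The `fromFirst:` and `fromLast:` subscripts are hypothetical and written only for this example:

```swift
extension BidirectionalCollection {
    // Hypothetical: offsets count forward from the first element.
    subscript<R: RangeExpression>(fromFirst offsets: R) -> SubSequence where R.Bound == Int {
        let r = offsets.relative(to: 0..<count)  // normalise to a half-open range
        let lower = index(startIndex, offsetBy: r.lowerBound)
        let upper = index(startIndex, offsetBy: r.upperBound)
        return self[lower..<upper]
    }

    // Hypothetical: offsets count backward from the last element,
    // but the result keeps the original element order.
    subscript<R: RangeExpression>(fromLast offsets: R) -> SubSequence where R.Bound == Int {
        let r = offsets.relative(to: 0..<count)
        let lower = index(endIndex, offsetBy: -r.upperBound)
        let upper = index(endIndex, offsetBy: -r.lowerBound)
        return self[lower..<upper]
    }
}

let c = Array("abcdef")
print(String(c[fromFirst: 0...1]))  // ab
print(String(c[fromLast: 0...1]))   // ef
print(String(c[fromFirst: 0..<1]))  // a
print(String(c[fromLast: 0..<1]))   // f
```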

QuinceyMorris · May 9, 2019, 4:56am

It's definitely been a design goal (in endless previous threads on this subject) to keep the expressions simple, and it's more than reasonable for you to wish for something extremely concise.

The essential problem, though, is that there's an extremely high risk of misinterpretation, when the collection's indexes are Int as well as the offsetting expressions. For example, for a String, the expression s[0...1] "clearly" means the first two characters, but, for an array slice, a[0...1] "clearly" means the elements at indexes 0 and 1, which aren't necessarily first.

[Edit: Oops. Since array indexes are non-negative, elements at indexes 0 and 1 are necessarily first, if they exist in the array slice at all. But you know what I mean, I hope.]

Really good signposting is needed to prevent this being a trap for the unwary. That's one reason why you'll see some redundancy in the suggestions for offsetting.
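To make the index-versus-offset trap concrete, here is a small example using only existing standard library behavior (the values are made up for illustration):

```swift
let a = [10, 20, 30, 40, 50]
let slice = a[2...]            // an ArraySlice shares its indices with the base array

print(slice.startIndex)        // 2
print(slice[2])                // 30 (index 2 is the slice's first element)
print(slice[3])                // 40 (index 3 is the second element, not the fourth)
print(Array(slice.prefix(2)))  // [30, 40] (offset-based: the first two elements)
// slice[0] would trap: index 0 is not part of this slice.
```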

The other significant problem is that we want to avoid giving any impression that offsetting is an O(1) operation, when it's frequently O(n). That's one reason why simple subscripting syntax like c[a,b] — where a and b can be negative for end-relative offsetting — has never gained any traction in these forums. It invites abuse, from a performance standpoint.
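As a small illustration of the cost difference, using only existing standard library calls (the sample string is arbitrary):

```swift
let s = "héllo, wörld"

// String is not random-access: offsetting an index walks the characters
// one by one, so this is O(n) in the offset.
let i = s.index(s.startIndex, offsetBy: 7)
print(s[i])  // w

// Array is random-access, so subscripting by position is O(1),
// but building the array from the string is itself O(n).
let chars = Array(s)
print(chars[7])  // w
```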

jawbroken · May 9, 2019, 12:07pm

I would add that the current way to do most of these things is very verbose and hard to interpret, so anything that makes it less verbose (but not necessarily maximally so) and easier to read will be welcome.
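For reference, this is roughly what "the current way" looks like for dropping one element from the front and two from the back of a string. It uses only existing standard library API, and the sample string is arbitrary:

```swift
let str = "abcdef"

// Spelled out with explicit index arithmetic.
let lower = str.index(str.startIndex, offsetBy: 1)
let upper = str.index(str.endIndex, offsetBy: -2)
print(str[lower..<upper])                // bcd

// The same elements via dropFirst/dropLast.
print(str.dropFirst(1).dropLast(2))      // bcd
```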

Michael_Ilseman · May 9, 2019, 9:30pm

I don't think there's a problem with a pure offset bound being Comparable. The issue arises when the bound could have both an index and an offset. This is the case for other partial ranges:

(1...).contains(100) // true
[1,2,3][1...] // [2, 3], i.e. open ranges are clamped and it won't subscript with 100
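
Spelled out as a runnable snippet using only existing standard library behavior (the variable names are just for illustration), the same one-sided range answers differently depending on whether it is asked about values or resolved against a collection:

```swift
let numbers = [1, 2, 3]
let fromOne = 1...  // PartialRangeFrom<Int>

// As a pure range of values, there is no upper bound.
print(fromOne.contains(100))            // true

// As a subscript argument, the range is resolved against the collection's bounds.
print(numbers[fromOne])                 // [2, 3]
print(fromOne.relative(to: numbers))    // 1..<3
```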