Pitch: Offsetting Indices and Relative Ranges

Jon_Hull · April 29, 2019, 8:10am

Crazy idea: What about reusing $0 to mean the index?

print(string[$0 + 1])

or if that would conflict inside of closures, $i, $idx, or $index:

print(string[$index + 1])

gwendal.roue · April 29, 2019, 8:12am

This is consistent with other languages that provide support for from-the-end negative offsets (which start at -1, since 0 is the first offset from the beginning).

For example, Python:

>>> string = "12345"
>>> string[3]
'4'
>>> string[-3]
'3'

Ruby:

irb(main):001:0> string = "12345"
=> "12345"
irb(main):002:0> string[3]
=> "4"
irb(main):003:0> string[-3]
=> "3"

This is not a strong argument for importing this convention into Swift, I know. But at least, according to prior art, it makes those ++ and -- operators behave the way their appearance suggests. The story could be different with >> and <<, I guess.
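For comparison, here is roughly how the same from-the-end convention (1 meaning the last element, like Python's and Ruby's -1) can be expressed in Swift today with only the standard `index(_:offsetBy:)`. The `element(fromEnd:)` helper is a hypothetical name used for illustration; it is not part of the pitch:

```swift
extension BidirectionalCollection {
    /// Hypothetical helper: `n` counts back from `endIndex`, so 1 is the
    /// last element, matching `s[-1]` in Python or Ruby.
    /// An out-of-range `n` is a programming error (it traps for String and Array).
    func element(fromEnd n: Int) -> Element {
        return self[index(endIndex, offsetBy: -n)]
    }
}

let string = "12345"
print(string[string.index(string.startIndex, offsetBy: 3)]) // 4
print(string.element(fromEnd: 3))                            // 3
```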

Avi · April 29, 2019, 10:10am

It's also consistent with the difference between startIndex (0) and endIndex (count), but it might be unexpected when providing just offsets, with no mention of the starting index at the call site.

Having a positive offset behave as if it started at -1 would make it clear that it's not the same as the index. Whether this is desirable is something else.

johannesweiss · April 29, 2019, 3:26pm

I agree we should have this in the standard library. I really dislike the proposed syntax but I don't really have a good suggestion either...

gregtitus · April 29, 2019, 3:29pm

I share Quincey's concern here. I don't especially like ++ and -- for infix, but I realize that's probably C heritage getting in my way and I can imagine growing okay with them.

The prefix versions with range operations, though, feel particularly noisy and hard to read, especially when mixed with the infix versions. As a possible alternative, I'd like to suggest (yet more operators!) prefix |- and postfix -|, borrowed from NSLayoutConstraint's visual format language. |-1 becomes an expression meaning one index from the start, and 1-| is one index from the end.

Some samples from the pitch with these:

  let str = "abcdefghijklmnopqrstuvwxyz"
  let idx = str.firstIndex { $0 == "n" }!
  // -- relative range --
  print(str[|-1 ..< 2-|]) // bcdefghijklmnopqrstuvwx
  print(str[idx--2 ..< 2-|]) // lmnopqrstuvwx
  print(str[idx ..< 2-|]) // nopqrstuvwx
  print(str[4-| ..< 2-|]) // wx
  // -- relative range through --
  print(str[idx--2 ... 2-|]) // lmnopqrstuvwxy
  print(str[idx ... 2-|]) // nopqrstuvwxy
  print(str[4-| ... 2-|]) // wxy
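The samples above leave the operator declarations implicit. A rough sketch of how |- and -| might be declared, assuming they only ever offset from the start or the end (the idx--2 forms are left out). The `EdgeOffset` types and the `resolvedIndex(_:)` helper are hypothetical names, not pitch API:

```swift
prefix operator |-
postfix operator -|

/// A position described only by an offset from one edge of a collection.
enum EdgeOffset {
    case fromStart(Int)
    case fromEnd(Int)
}

prefix func |- (offset: Int) -> EdgeOffset { return .fromStart(offset) }
postfix func -| (offset: Int) -> EdgeOffset { return .fromEnd(offset) }

/// Half-open and closed ranges whose ends are both edge offsets.
struct EdgeOffsetRange { var lower: EdgeOffset; var upper: EdgeOffset }
struct EdgeOffsetClosedRange { var lower: EdgeOffset; var upper: EdgeOffset }

func ..< (lhs: EdgeOffset, rhs: EdgeOffset) -> EdgeOffsetRange {
    return EdgeOffsetRange(lower: lhs, upper: rhs)
}
func ... (lhs: EdgeOffset, rhs: EdgeOffset) -> EdgeOffsetClosedRange {
    return EdgeOffsetClosedRange(lower: lhs, upper: rhs)
}

extension BidirectionalCollection {
    /// Turns an edge offset into a real index; linear in the offset
    /// unless the collection is random-access.
    func resolvedIndex(_ bound: EdgeOffset) -> Index {
        switch bound {
        case .fromStart(let n): return index(startIndex, offsetBy: n)
        case .fromEnd(let n): return index(endIndex, offsetBy: -n)
        }
    }

    subscript(r: EdgeOffsetRange) -> SubSequence {
        return self[resolvedIndex(r.lower) ..< resolvedIndex(r.upper)]
    }

    subscript(r: EdgeOffsetClosedRange) -> SubSequence {
        return self[resolvedIndex(r.lower) ... resolvedIndex(r.upper)]
    }
}

let str = "abcdefghijklmnopqrstuvwxyz"
print(str[|-1 ..< 2-|]) // bcdefghijklmnopqrstuvwx
print(str[4-| ..< 2-|]) // wx
print(str[4-| ... 2-|]) // wxy
```

Because prefix and postfix operators bind tighter than `..<` and `...`, no parentheses are needed around the bounds.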

DeFrenZ · April 29, 2019, 3:35pm

I'm generally happy with the functionality and its internal structure, but not as much with the call-site experience. The essence of what I don't like is probably --2: to me there's no indication that this is an index offset from endIndex, even if I knew what anIndex -- 2 meant. The explanation is "if the value is missing, the edge is inserted in the hole", which works, but why should I assume endIndex and not startIndex there? For a looping collection it would be equivalent, but graphically I don't think it's intuitive. There's also the additional issue of a range from the start to that relative index... which is written as ...(--2)? I find that confusing because it puts all the weight on the left, leaving no indication that there is a gap on the right.

If we start the design by solving this, we could simply make unary -- a postfix func instead, which to me reads better as ...2--; but that breaks the "fill the hole" logic, since written in full it's still endIndex -- 2. The only way I see to keep both is a symmetrical design, with an operator pair like the ones suggested in the thread, e.g. >> and <<. The important detail is that each operator can be used only one way: index >> offset and offset << index, but not vice versa.

Given this design, it would read as:

  startIndex >> 1, >>1
  2 << endIndex, 2<<
  >>1..., ...2<<, >>1 ... 2<<

I believe these examples represent the actual range better graphically, while being just as correct. It also avoids the issue of people coming from other languages and thinking that ++/-- are for increment/decrement.
There is the issue that having both << and ..< might be a bit confusing, so maybe <</>> isn't the best choice, but I strongly favour a symmetric choice here.

Michael_Ilseman · April 29, 2019, 8:46pm

Sure, in fact, maybe we can just have RelativeRange and be done with it. I’ll prototype that up.

It shouldn’t have, nor should it have a same-type constraint on Bound and Collection.Index. ABI stability means we can’t change any of this. But I don’t think it’s a significant problem.

Not quite, RangeExpression needs an associated Bound type. For a relative range that only has offsets, it needs a phantom Bound type. So its applicability to some collection is limited by that collection’s index type.

let range: RelativeClosedRange<String.Index> = ++1 ... --1

Although as I mentioned to @nnnnnnnn, maybe we can drop all the variants.

Ah, this is an interesting perspective. We have two kinds of things here:

A phantom-typed “offset” range, which can be applied to any collection satisfying the same type requirement.
A “relative” range with an index in there somewhere, which should only be applied to the collection that index was derived from.

And we have two questions, which have no particular order as the answer to one influences the other:

Is this an important enough distinction to make?
How do we make it?

Just to explore this out a little bit, let’s answer the second one by saying we have OffsetRange<Bound> and RelativeRange<Bound>. I haven’t prototyped this out, but let’s assume there’s no further friction that comes up from this distinction (as there sometimes is).
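To make the two kinds concrete, here is a rough sketch. The names OffsetRange and RelativeRange follow this post, but the fields and subscripts below are assumptions for illustration, not a prototype of the pitch:

```swift
/// Offsets only. `Bound` is a phantom type: no value of it is stored;
/// it only restricts which collections the range can be applied to.
struct OffsetRange<Bound: Comparable> {
    var dropFromStart: Int
    var dropFromEnd: Int
}

/// Anchored at a concrete index, so it is only meaningful for the
/// collection that index was derived from (or slices sharing its indices).
struct RelativeRange<Bound: Comparable> {
    var anchor: Bound
    var lowerOffset: Int
    var upperOffset: Int
}

extension BidirectionalCollection {
    subscript(r: OffsetRange<Index>) -> SubSequence {
        let lower = index(startIndex, offsetBy: r.dropFromStart)
        let upper = index(endIndex, offsetBy: -r.dropFromEnd)
        return self[lower ..< upper]
    }

    subscript(r: RelativeRange<Index>) -> SubSequence {
        let lower = index(r.anchor, offsetBy: r.lowerOffset)
        let upper = index(r.anchor, offsetBy: r.upperOffset)
        return self[lower ..< upper]
    }
}

// "All but the first two and last three", reusable across Strings...
let trim = OffsetRange<String.Index>(dropFromStart: 2, dropFromEnd: 3)
let word = "abcdefgh"
let digits = "0123456789"
print(word[trim])   // cde
print(digits[trim]) // 23456
// ...but `[1, 2, 3][trim]` does not type-check: Array's Index is Int.

let alphabet = "abcdefghijklmnopqrstuvwxyz"
let n = alphabet.firstIndex(of: "n")!
print(alphabet[RelativeRange(anchor: n, lowerOffset: -2, upperOffset: 3)]) // lmnop
```

The phantom parameter is what lets an OffsetRange travel as a value (for example, as a function result) while still being checked against the index type of whatever it is eventually applied to.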

The cost is two types, whose particulars have to be communicated via documentation, though hopefully users are guided to the right type in the right context. The benefit is allowing OffsetRange to be used in more generally applicable situations. If these are commonly confined to a local scope where there is only one collection to use them against (such as within a subscript), then they don’t deliver that benefit. But if OffsetRange is used as some kind of currency type in API, e.g. returned from a function, then there is real benefit there.

How likely or important is the currency case? I don’t have a strong argument here. My feeling was that such a value may be so general as to lose its usefulness. How useful is it to pass around a value that represents “the range of all but the first two and last three indices”? On the other hand, parseRequirement in the examples could be restructured to return such a thing.

In the pitch:

For something like an OffsetRange, which is a currency type, it could make sense to have it conform to Sequence, Codable, etc., if Bound does. Its bounds could also be public members. This wouldn't need to be the case for RelativeRange.

I’ll try to prototype this two-type approach and see how that looks.

That might be possible; what do you think the benefits would be? As currently implemented, RelativeRange's stored upperBound and lowerBound properties aren’t public, as I don’t think they make sense to poke at beyond using them on a collection. However, this wouldn’t necessarily apply to an OffsetRange like the one mentioned above.

Michael_Ilseman · April 29, 2019, 8:47pm

I’ll clarify the wording; it’s worth calling out. This pitch is presenting the syntax of omitted operands, but it has to do so through the mechanism of extra overloads and prefix operators. If/when Swift gets omitted operands for infix functions, we can drop that cruft and bask in the joy of precedence groups.


Thanks, the proposal will be more formal and have terminology defined up front.

Postfix doesn’t have as many issues, so we’ll explore that some.

Not forgotten, but it’s worth a sentence in the detailed design. The actual documentation will mention it, and it will be enforced at run time, same as index(_:offsetBy:) etc.
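For reference, the standard library already exposes both behaviors for plain index offsetting: `index(_:offsetBy:)` requires the result to stay within bounds (violating that is a run-time error), while `index(_:offsetBy:limitedBy:)` returns `nil` instead of passing the limit. A small illustration:

```swift
let str = "abcdef"

// In bounds: the plain variant is fine (linear time for String).
let third = str.index(str.startIndex, offsetBy: 2)
print(str[third]) // c

// Out of bounds: index(_:offsetBy:) would be a run-time error here, so use
// the limitedBy: variant, which returns nil rather than passing the limit.
let tooFar = str.index(str.startIndex, offsetBy: 10, limitedBy: str.endIndex)
print(tooFar == nil) // true

// Moving backwards from endIndex, the limit is startIndex.
let back = str.index(str.endIndex, offsetBy: -2, limitedBy: str.startIndex)
if let back = back {
    print(str[back]) // e
}
```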

Michael_Ilseman · April 29, 2019, 8:49pm

Operators in the standard library are currently constrained to ASCII. If this constraint is lifted, then all kinds of useful alternatives are available.

There are issues with reusing any operator that’s also defined on Int. Bound’s only constraint is Comparable, so we can’t compete with any more specific operator. I don’t think <</>> could be used as infix operators in this case.

This has the same issues as using >>/<<, but for the prefix version. E.g. how does it work for the printRanges, printFifths, or printDataFifth examples?

Also, with the exception of + for append, they strongly imply constant-time operations, while these offset operators take time linear in the length of the offset. This could be debated or reconsidered, but let’s also consider these viewpoints:

and others that were strongly voiced in prior threads. Visual distinction of run-time complexity can be important.

Michael_Ilseman · April 29, 2019, 8:49pm

Totally understandable, let’s explore some alternatives without C baggage. (No one ever complains about Haskell baggage)

Using symmetrical operators where the “backwards” offset is on the left-hand-side is appealing. It swaps the syntactic issue of backwards offsets at the end of a range with a syntactic issue of backwards offsets at the beginning of a range. This is a better tradeoff, at the cost of needing to come up with the symmetric pair. I’ll try to prototype something.

idrougge · April 29, 2019, 10:21pm

It's not as though the ease of typing characters has bothered Swift designers before, what with its reliance upon characters such as [ ] { | }.

Both « and » are valid Swift operator characters, though maybe not in the standard library.

Michael_Ilseman · April 29, 2019, 11:58pm

@nnnnnnnn, gist updated. It works out pretty well!

zwaldowski · April 30, 2019, 3:36am

Yeah! I don’t think the benefit is necessarily intrinsic. The only way I can think to phrase the benefit would be tautological: hiding the type names would remove (what I hazard would be) a large part of the proposal discussion centered around the exact spelling of new types, which we ultimately don’t care much about naming.

@nnnnnnnn has already phrased all my thinking way better w.r.t. a single RelativeRange type, which I’d be happy with. The spellings of the handful of relative range types read a little bit like the Range types of yore, the combinatorial explosion of which I know annoyed people.

Morten_Bek_Ditlevsen · April 30, 2019, 8:50am

I have followed this discussion along with many previous discussions about offset based indexing - specifically into Strings.

I am not a teacher of Swift, but my feeling about this is basically that it might be hard for newcomers to the language to understand the necessity for the ++ operator (or what it may end up being called).

I feel that basically we are trying to say:

We do not allow integer-based offsets for non-RandomAccessCollections since there is an implicit expectation that subscript lookups are constant in time.
But if you add this ++ operator in front of your integer-based index, then it's ok because then you are made aware by the new annotation that something fishy is going on.

There are a few things about this that worry me.
First of all it feels like the wrapper is a bit artificial. To me it's kind of like saying: "You should not think of an integer as being a relative offset, but instead you can convert it to one". In my mind an integer is perfectly capable of representing a relative offset, so the wrapping feels a bit strange.
Secondly, if you have a RelativeBound and use it as a subscript into a String, why is the expectation that the lookup should be constant suddenly gone? Because now it's not an integer, but something else that nobody has any reason to expect constant lookup for?

Even though the previously discussed version with an explicitly named subscript parameter (myString[offset: 10]) is perhaps not as flexible as what the new pitch proposes, I have a feeling that it is much easier to grasp.
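One possible shape for the labeled subscript mentioned here, as a sketch (this exact declaration is an assumption, not an agreed design). The label makes the linear-time offset visible at the call site:

```swift
extension Collection {
    /// Hypothetical: the element `n` positions after `startIndex`.
    /// O(n) in the offset unless the collection is random-access.
    subscript(offset n: Int) -> Element {
        return self[index(startIndex, offsetBy: n)]
    }
}

let myString = "Hello, Swift"
print(myString[offset: 7]) // S

// Offsets are relative to startIndex, so they also behave as expected
// on slices, whose Int indices don't start at 0.
let numbers = [10, 20, 30, 40]
let slice = numbers[2...]
print(slice[offset: 0]) // 30
```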

After reading this pitch and trying to imagine what using this API might feel like for a newcomer to Swift, I have started to wonder whether it's really worth being so strict about "constant-time subscript lookup", given the extra overhead for newcomers who have to understand something like the outcome of this pitch or the explicitly labeled offset subscript. If the problem of using integer indexes into Strings really is so rare, couldn't we just give in and allow it, even though someone might implement a slow algorithm with it? And I just want to say that I have been arguing for the current verbose String indexing forever; it's only now that the workarounds are being discussed that I'm starting to feel like something is a bit 'off'.

DeFrenZ · April 30, 2019, 12:29pm

Unfortunately, as highlighted in the pitch, the issue is broader than that. Sub-collections are a big issue as well.

I kinda feel that if this works well we should almost deprecate Array subscripting with Int (I do realise that would be extremely source-breaking in every possible way).

lukasa · April 30, 2019, 4:11pm

I definitely feel like the fact that some collections have elected to try to be zero-indexed has led to issues with understanding the indexing system in Swift. In SwiftNIO we've moved to opaquify all of the indices into our custom collections, partly to enable safe self-slicing behaviour, but mostly to discourage users from relying on the idea that integer indexing might be sensible in this context.
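To illustrate both points with a toy example (not SwiftNIO's actual types): array slices keep their base array's indices, so they are not zero-based, and a collection can rule out integer subscripting by wrapping its offsets in an opaque Index type. `ByteView` is a hypothetical name:

```swift
// Slices share their base array's indices, so they are not zero-based.
let numbers = [10, 20, 30, 40]
let tail = numbers[2...]
print(tail.startIndex) // 2
print(tail.first!)     // 30
// tail[0] would trap: 0 is not a valid index of this slice.

/// Hypothetical collection whose Index wraps an Int that code outside
/// this file cannot construct, so integer subscripting doesn't compile.
struct ByteView: Collection {
    struct Index: Comparable {
        fileprivate var offset: Int
        static func < (lhs: Index, rhs: Index) -> Bool { lhs.offset < rhs.offset }
    }

    private var storage: [UInt8]
    init(_ bytes: [UInt8]) { storage = bytes }

    var startIndex: Index { Index(offset: 0) }
    var endIndex: Index { Index(offset: storage.count) }
    func index(after i: Index) -> Index { Index(offset: i.offset + 1) }
    subscript(position: Index) -> UInt8 { storage[position.offset] }
}

let bytes = ByteView([0xCA, 0xFE, 0xBA, 0xBE])
print(bytes[bytes.index(after: bytes.startIndex)]) // 254
print(Array(bytes.dropFirst(2)))                   // [186, 190]
```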

However, I don't believe this particular genie can be put back in the bottle.

jrose · April 30, 2019, 4:35pm

I'm with @Morten_Bek_Ditlevsen on this. I thought the labeled subscripts provided a nice balance of convenience and callouts, and that if you need something complicated you can do it manually (usually using dropFirst(_:) and dropLast(_:)). @xwu's example of a general "one off the beginning, one off the end" is clever and even useful sometimes, but I'm not sure it's worth the additional complexity and confusion around indexes.

The remaining important case that the labeled subscripts don't handle is indexes offset from endIndex. That's not hard for elements if we come up with an appropriate name—foo[reverseOffsetFromEnd: 3]—but it's trickier for ranges, especially if we want something like "N from the beginning ..< M from the end". So I see how this solution manages to encompass all the use cases.
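A sketch of the from-the-end labeled subscript (hypothetical, assuming 1 means the last element), next to today's dropFirst(_:)/dropLast(_:) spelling of "N from the beginning ..< M from the end":

```swift
extension BidirectionalCollection {
    /// Hypothetical spelling: 1 is the last element, 2 the one before it.
    subscript(reverseOffsetFromEnd n: Int) -> Element {
        return self[index(endIndex, offsetBy: -n)]
    }
}

let str = "abcdefghijklmnopqrstuvwxyz"
print(str[reverseOffsetFromEnd: 3]) // x

// "1 from the beginning ..< 2 from the end" with existing API.
// The result is a Substring that shares str's indices.
let middle = str.dropFirst(1).dropLast(2)
print(middle) // bcdefghijklmnopqrstuvwx
```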

The case I've actually needed most is what this proposal spells idx++1, for when you want to start a search after a known index instead of before it, but even that has a potential spelling already given our other range operators: idx<...
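For comparison, the manual version of starting a search just after a known index uses only standard APIs. Swift has no range operator that excludes its lower bound today, so it goes through `index(after:)`:

```swift
let text = "banana"
let firstA = text.firstIndex(of: "a")!

// Start the search one position after a known index.
let secondA = text[text.index(after: firstA)...].firstIndex(of: "a")!

// A slice shares its base's indices, so secondA is valid on text directly.
print(text.distance(from: text.startIndex, to: secondA)) // 3
print(text[secondA...])                                  // ana
```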

Do you have use cases on hand where indexes from the end are interesting, and the improvement over dropLast(_:) using offset syntax? Bonus points if they're not just for Strings.

Michael_Ilseman · April 30, 2019, 8:46pm

Here is a gist using --> and <--. The single-dash -> is reserved and can't be used as an operator.

I noticed when updating the use cases that without extra spaces, a lot of usage seems visually ambiguous with the range operators due to the > and < characters. I added some explicit spacing. I can also post examples with and without visual spacing:

  let str = "abcdefghijklmnopqrstuvwxyz"
  let idx = str.firstIndex { $0 == "n" }!
  print("-- single element subscript --")
  print(str[14<--]) // m
  print(str[100<--idx]) // a
  print(str[idx-->1]) // o
  print(str[10<--(idx-->1)]) // e
  print("-- relative range --")
  print(str[-->1 ..< 2<--]) // bcdefghijklmnopqrstuvwx
  print(str[2<--idx ..< 2<--]) // lmnopqrstuvwx
  print(str[idx ..< 2<--]) // nopqrstuvwx
  print(str[2<--idx ..< idx]) // lm
  print(str[2<--idx ..< idx-->3]) // lmnop
  print(str[4<-- ..< 2<--]) // wx
  print("-- relative range through --")
  print(str[2<--idx ... 2<--]) // lmnopqrstuvwxy
  print(str[idx ... 2<--]) // nopqrstuvwxy
  print(str[2<--idx ... idx]) // lmn
  print(str[2<--idx ... idx-->3]) // lmnopq
  print(str[4<-- ... 2<--]) // wxy
  print("-- partial relative range up to --")
  print(str[..<idx-->2]) // abcdefghijklmno
  print(str[..<2<--idx]) // abcdefghijk
  print(str[..<(-->20)]) // abcdefghijklmnopqrst
  print(str[..<20<--]) // abcdef
  print("-- partial relative range through --")
  print(str[...idx-->2]) // abcdefghijklmnop
  print(str[...2<--idx]) // abcdefghijkl
  print(str[...(-->20)]) // abcdefghijklmnopqrstu
  print(str[...20<--]) // abcdefg
  print(str[...(2<--20<--)]) // abcde
  print(str[...((20<--)-->2)]) // abcdefghi
  print("-- partial relative range from --")
  print(str[idx-->2...]) // pqrstuvwxyz
  print(str[2<--idx...]) // lmnopqrstuvwxyz
  print(str[-->20...]) // uvwxyz
  print(str[20<--...]) // ghijklmnopqrstuvwxyz
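For anyone who wants to try this spelling locally, here is a minimal, self-contained sketch of how --> and <-- might be declared. It is an approximation under stated assumptions, not the gist's implementation: it only covers single elements and half-open ranges whose two ends are both relative bounds (mixing a plain index, as in idx ..< 2<--, is not handled), and it does not clamp, so an offset that runs past either end is a run-time error rather than returning "a" as 100<--idx does above. `RelativeBound`, `RelativeBoundRange`, and `resolve(_:)` are hypothetical names:

```swift
precedencegroup RelativeOffsetPrecedence {
    higherThan: RangeFormationPrecedence
    associativity: left
}
infix operator --> : RelativeOffsetPrecedence
infix operator <-- : RelativeOffsetPrecedence
prefix operator -->
postfix operator <--

/// A position: an anchor (start, end, or a concrete index) plus an offset.
struct RelativeBound<Bound: Comparable> {
    enum Anchor { case start, end, index(Bound) }
    var anchor: Anchor
    var offset: Int
}

struct RelativeBoundRange<Bound: Comparable> {
    var lower: RelativeBound<Bound>
    var upper: RelativeBound<Bound>
}

prefix func --> <B: Comparable>(n: Int) -> RelativeBound<B> {  // -->n: n after start
    return RelativeBound(anchor: .start, offset: n)
}
postfix func <-- <B: Comparable>(n: Int) -> RelativeBound<B> { // n<--: n before end
    return RelativeBound(anchor: .end, offset: -n)
}
func --> <B: Comparable>(i: B, n: Int) -> RelativeBound<B> {   // i-->n: n after i
    return RelativeBound(anchor: .index(i), offset: n)
}
func <-- <B: Comparable>(n: Int, i: B) -> RelativeBound<B> {   // n<--i: n before i
    return RelativeBound(anchor: .index(i), offset: -n)
}
func ..< <B: Comparable>(lhs: RelativeBound<B>, rhs: RelativeBound<B>) -> RelativeBoundRange<B> {
    return RelativeBoundRange(lower: lhs, upper: rhs)
}

extension BidirectionalCollection {
    /// Resolves the anchor, then walks the offset (no clamping).
    func resolve(_ bound: RelativeBound<Index>) -> Index {
        let base: Index
        switch bound.anchor {
        case .start: base = startIndex
        case .end: base = endIndex
        case .index(let i): base = i
        }
        return index(base, offsetBy: bound.offset)
    }

    subscript(bound: RelativeBound<Index>) -> Element {
        return self[resolve(bound)]
    }

    subscript(range: RelativeBoundRange<Index>) -> SubSequence {
        return self[resolve(range.lower) ..< resolve(range.upper)]
    }
}

let str = "abcdefghijklmnopqrstuvwxyz"
let idx = str.firstIndex(of: "n")!
print(str[14<--])               // m
print(str[idx-->1])             // o
print(str[-->1 ..< 2<--])       // bcdefghijklmnopqrstuvwx
print(str[2<--idx ..< idx-->3]) // lmnop
```

Giving the infix forms a precedence above RangeFormationPrecedence is what lets 2<--idx ..< idx-->3 parse as two bounds joined by ..< without parentheses.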

jrose · April 30, 2019, 9:10pm

Side note: I'd also request using something other than "14 from the end" as an example, since it's easy for someone to miscount and therefore misunderstand what it's doing!

Michael_Ilseman · April 30, 2019, 9:43pm

Good point, I adjusted the pitch and will do that going forwards