Offset Indexing and Slicing

The array is highlighting the asymmetry, but there are plenty of times where you might know that an index always exists at offset five or offset ten or whatever. I can very reasonably imagine an API that is generic over a Collection but has a precondition to have at least n elements present. Or in scripting contexts, you are being rough and ready and you just know you are only getting strings that have a three-character prefix or whatever. I think these kind of cases are reasonably frequent.

I don't understand the relevancy of this. We don't 'know' the indices when we use bare subscript (index: Index), and that does not return an optional. The implementation can certainly support returning a non-optional result.

1 Like

“Rough and ready” situations are why the ! operator exists. They shouldn’t dictate the design of general API, IMO.

3 Likes

Yes we do know. Besides iterating over indices, using startIndex, manipulating startIndex and endIndex with the various index(...) API, and using the firstIndex(...) and lastIndex(...) API give us known starting indexes and ones based off known indexes. Some Index types don't even have public initializers; you have to use the index manipulation API above instead of creating a value on your own.

And in a generic context it doesn't matter if the Index has a public initializer, since you can't use it anyway.

I'm inclined that we don't include .first and .last.

Many precedent languages for location before the end already use -1 for the last element, so people would be familiar with it. Code snippet would also already incorporate that fact.

It feels harmful to me to have choices between very similar concepts, especially when neither case fits perfectly for what I want.

7 Likes

From the description of how overloads are structured I don't see any problems with this approach from type-checker perspective. Type-checker is pretty good at dealing with concrete or partially concrete overloads, so performance shouldn't be a problem.

I don't think so, but although that seems to be an uncommon opinion here, I doubt that the audience in this forum is representative for the majority of actual users of Swift.

However, I'm not questioning that we have a performance-centric API; that's great, and it definitely shouldn't be replaced.
Instead, I'd prefer adding an alternative mechanism that fully focuses on ease of use and flexibility.
In this regard, the pitched solution imho doesn't perform that well compared to possible alternatives.

Correct me if I'm wrong, but the actual problem to solve is that we have to repeat ourselves way too often for accessing elements in a collection.
Obviously, we need to specify the collection itself, but we have to repeat this to get any index for it, and we have to repeat it even more often when we want offset such an index.
This is fine when those positions are stored anyways, but there are many situations where this isn't the case, and all the repetition is just a burden:
It's clear that the index has to come from the base collection, and the pitch addresses the shortcomings of the general case.

However, there is variety of situations where the elements of the collection give us much more options than asking for the first or last element. As soon as we deal with Comparable or Equatable, there can be other interesting indexes in a collection.
Especially for Strings, there are many applications, and with flexible indexing, it would be possible to tackle many problems where people normally reach out for regular expressions with all their downsides.

In the most naive implementation, that "flexible" indexing would utilize closure which take a Collection, and return an index (or range) from that collection.
This could not only be the first or last element, but also the first element bigger than 42 (for a collection of Int), the last occurrence of a newline in a String or something completely different.

Nice! I updated the gist with this simplification.

I originally avoided that to skip a bunch of introspection-branching at the top for RACs. Since OffsetBound as written is fairly resilient, we wouldn’t be able to leverage any static analysis from usage of .start or .end at the call site even if this were inlinable. But it might be worth it anyways. I tried to think of a more generally useful customization hook that could drive this, but couldn’t.

We can certainly bit-pack the anchor into a single word along with the offset. Swift’s address space is effectively capped at 60-bits (which itself is cautiously high). We wouldn’t want this approach, though:

These are the two most important reasons. Also, we only have one zero representation, so we couldn’t have zero mean both .start and .end, and the latter is useful for half-open ranges.

In practice, unless this type is going to commonly appear in large collections, there is not a huge difference between one word and two words (bucket sizes, stack alignment, etc). It is also resilient, so it will be handled indirectly anyways.

Not everyone is as big a fan of hand-packing bits as I tend to be, so unless there’s a strong need for it, I went with a simpler and more understandable approach :grinning:

1 Like

I was planning to add a setter for the slicing subscript for RangeReplaceableCollection, but I was surprised to the find the protocol doesn’t actually require one!

For the single-element one, a setter on MutableCollection would not be able to support nil doing removal, it would have to no-op.

I chose to leave these off. If we wanted to add setters to RRC, then it might make sense to add them there. I’ll update the “Alternatives Considered” with this rationale.

1 Like

This is awesome. Especially for programs that are generic over Collection, this would make indexing operations a lot more readable. Right now, indexing String or a generic Collection whose index type is not Int is extremely cumbersome and this would really help to fix this. Big +1 for me on this proposal. I really hope this makes it into Swift.

I just posted a new version, with these changes:

Elaboration:

I went with dropping .start and .end, since I found .first and .last to be so much clearer in pretty much every example I wrote. It also communicates the level of abstraction better, it’s more like Collection.first than Collection.startIndex. These offsets can be used across collections of different sizes and shapes, so long as they’re shaped enough for the element at that offset to be present. In contrast, indices can only be used on the collection from which they were derived.

Along these lines, I added RRC methods like insert and replace, as well as the subscript setter, which fleshes out this abstraction.

I looked a little deeper into this. There are two limitations that stop us: we can't provide just a setter implementation, nor can we refer to an inherited implementation/requirement. Because Collection’s subscript getter is a requirement of the protocol, RRC cannot provide a default implementation of a setter.

But, since we're defining the getter for the OffsetBound API, we can add a setter! Updated with a setter.

A funny piece of trivia: the value passed in to the subscript setter has to be of type Collection.SubSequence, which limits its convenience in generic code. But, ArraySlice conforms to ExpressibleByArrayLiteral and Substring conforms to ExpressibleByStringLiteral, so concrete use sites with literals work.

6 Likes

A bit of a tangent, but this is a good motivation for why set-only subscripts are useful. It’s the reason I’ve wanted them for a long time, so that one can define an API where the subscript getter returns a concrete type, while the subscript setter can accept any type conforming to certain requirements.

4 Likes

I'm a little bit sad about this.

The justifications you gave are perfectly valid, but they lead to this:

let str = "abcdefghijklmnopqrstuvwxyz"
print(str[.first + 3 ..< .last - 2]) // "defghijklmnopqrstuvw"

which I think is very, very unfortunate from the point of newbie usability. It seems like an active bug magnet for newbies and old fogies alike, in contrast to this:

print(str[.start + 3 ..< .end - 3]) // "defghijklmnopqrstuvw"

Edit: Specifically, the .last version invites off-by-1 errors, and it's unsymmetrical with offset calculations relative to .start/.first. Perfect humans don't make off-by-1 errors, but I'm not none one of them.

5 Likes

Can you explain the problem? It makes sense to me?

Edited above to clarify the problem.

Wouldn't the change mitigate that by separating the naming so there's no longer any confusion. Previously, with start / end, the similarity to startIndex / endIndex is what would likely lead to errors. Frankly, this formulation is what I think most people would expect, with start and end, with the indexes being eternally confusing, at least for me.

Honestly, I'm fine either way, whether to use start/end or first/last. The API is high level enough that we can safely depart from the existing one. What's important is which one would be more ergonomic for the targeted use case.

I can see we'll spend a good amount of time debating and/or comparing between the two.

print(str[(.first + 3) ... (.last - 3)])

makes more sense to me and is symmetrical. I'm not sure that it inherently invites off-by-1 errors more than any indexing scheme, though it is less precedented in other languages.

6 Likes

This proposal is much better than what we have now, but it's been bothering me that it's so fiddly.

Then I realized we could use a custom set of operators instead of .start + 1 and .last - 1.

|3 to represent .start + 3
3| to represent .last - 2

The pipe looks kind of like the edge, get it?

Here's the before / after with the examples given in the original post:

str[.first + 3 ..< .first + 6])
str[|3 ..< |6]

str[.first + 3 ..< .last - 2]
str[|3 ..< 3|]

str[.first + 3 ..< .last - 22]
str[|3 ..< 23|]

str[.last]
str[1|]

str[.last - 1]
str[2|]

str[.first + 26]
str[|26]

str[(.last - 3)...]
str[(4|)...]

And happily the prefix and postfix pipe operator is available.

2 Likes

Custom operators are somewhat covered in the Other Syntax section of the proposal. I can't recall if anyone has proposed the pipe operator before, but I don't immediately find it to be that readable in the examples (e.g. str[|3 ..< |3] vs str[|3 ..< 3|] vs str[3| ..< |3] vs str[3| ..< 3|]), though perhaps I'm just not used to it. I also think defining 3| as .last - 2 could be fairly confusing, and isn't symmetric (e.g. in terms of trimming n characters from the start of end). Why did you choose that interpretation?

1 Like