SE-0265: Offset-Based Access to Indices, Elements, and Slices

That would be very convenient, but actually this proposal really does not pair well with such a hypothetical future:
Your feature can't be built on top of SE-0265, but would rather be a second way to achieve the same (and more). Both could possibly exist at the same time, but that redundancy wouldn't be nice.

How would you resolve the ambiguity of myArray[.startIndex] when there are two possible interpretations?

Okay, people keep bringing this up, but I don't understand it. Why can't a search be built on top of SE-0265? OffsetBounds get converted to indexes before any access happens, and that conversion can basically do whatever the heck it wants. And OffsetBound is an opaque struct, so its representation can change from release to release.

…okay, the place where it falls down is Comparable, but still.

When you say that Comparable is the place where it falls down, do you mean that syntax like the following wouldn't be possible?

var students = ["Ben", "Ivy", "Jordell", "Maxime"]

students[.find("Ivy") ..< .find("Maxime")] // ["Ivy", "Jordell"]

I mostly meant there's no way to implement Comparable for a find case. Even if you arbitrarily stuck it between the first and last cases, you wouldn't have a way to order two find cases if the element of the collection isn't itself Comparable.

…and I realized that this may answer my question of why OffsetBound can't be extended in this way: it's not parameterized on the collection element type.
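
A minimal sketch of that point, assuming a hypothetical stand-in called SketchBound (this is not the proposal's actual OffsetBound implementation, and SearchingBound and Opaque are likewise made up for illustration). A bound that stores only an anchor and an offset can be Comparable without knowing anything about elements, while a find case has to carry an element and so needs a type parameter:

```swift
// Hypothetical stand-in for the proposal's opaque OffsetBound; names are illustrative only.
struct SketchBound: Comparable {
    enum Anchor: Int { case start = 0, end = 1 }
    var anchor: Anchor
    var offset: Int

    // "Hole in the middle" ordering: every start-anchored bound sorts before
    // every end-anchored bound; within one anchor, compare offsets.
    static func < (lhs: SketchBound, rhs: SketchBound) -> Bool {
        if lhs.anchor != rhs.anchor {
            return lhs.anchor.rawValue < rhs.anchor.rawValue
        }
        return lhs.offset < rhs.offset
    }
}

// A bound that can also search must store the element, so it needs a type parameter.
enum SearchingBound<Element: Equatable> {
    case offset(SketchBound)
    case find(Element)
    // No general Comparable conformance here: how .find(x) orders against
    // .find(y) depends on the collection, not on x and y themselves.
}

struct Opaque: Equatable { var id: Int }   // Equatable but not Comparable

let a = SketchBound(anchor: .start, offset: 2)
let b = SketchBound(anchor: .end, offset: -1)
let searching: SearchingBound<Opaque> = .find(Opaque(id: 1))
```

The non-generic type orders fine on its own; the generic one has nothing to order by until it meets a collection.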


So the feature isn't possible as an extension of SE-0265, is it?
To be honest, I was thinking more in terms of implementation details when I wrote about incompatibility with future extensions, but you are probably right that this doesn't matter here: even if the implementation is replaced completely, this could be fine as long as the public interface stays the same... I'm really not sure whether there are any pitfalls besides Comparable (like, could OffsetIndex become a protocol or a class?)

As I understand it, the order does not depend on any type properties, but rather on the Collection the OffsetIndex is actually used with - so therefore, it cannot conform to Comparable:
In let a = [3, 2, 1], indexing with a[.find(3) ..< .find(1)] should be fine - but without a, there is no natural order which makes sense to me.
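
To make that concrete with APIs that exist today, here is a small sketch using firstIndex(of:) in place of the hypothetical .find. The helper name slice(of:from:upTo:) is made up for this example. The same pair of elements forms a valid range in one array and not in the other:

```swift
// Resolve "from the first `lower` up to the first `upper`" against a concrete
// collection. Returns nil when either element is missing or when the positions
// are in the wrong order for this particular collection.
func slice<C: Collection>(of c: C, from lower: C.Element, upTo upper: C.Element) -> C.SubSequence?
where C.Element: Equatable {
    guard let lo = c.firstIndex(of: lower),
          let hi = c.firstIndex(of: upper),
          lo <= hi else { return nil }
    return c[lo..<hi]
}

let descending = [3, 2, 1]
let ascending = [1, 2, 3]

let d = slice(of: descending, from: 3, upTo: 1)   // elements 3 and 2
let s = slice(of: ascending, from: 3, upTo: 1)    // nil: 3 comes after 1 here
```

Whether "3 before 1" holds is only known once the collection is in hand, which is why a standalone comparison has nothing to go on.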

Yes, it would. The problem we currently face with adding new operator overloads is the impact they have on other expressions that use the overloaded operator. SIMD operators do impact expressions that have nothing to do with SIMD beyond using operators like +. We are trying our best to come up with a solution that would allow most code to type-check in reasonable time, but in certain circumstances the type-checker would still end up attempting every single overload choice. I do agree with you that designing something around limitations of the type-checker is not going to yield the most desirable results, but in reality it might be the best we can do right now to balance benefit vs. impact.

Just to clarify, in the OffsetBound case the overloads added to + and other operators are "less generic" compared to the ones added by SIMD, so they would have a lower impact on type-checker performance.


Is there a use case for this behavior? It still seems to me this could easily be a source of bugs when the intuitive count of the range doesn't match the count of the extracted subsequence. This could easily lead to flawed logic.

let values = [0, 1, 2, 3], x = OffsetBound.start - 1

// Should have 2 elements, but has 1 instead
let subsequence = values[x ..< x + 2]
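
For anyone who wants to try this without the proposal's types, here is a rough sketch of the clamping as I understand it. clampedIndex(in:startOffset:) is a hypothetical helper, and a plain Int stands in for a start-anchored OffsetBound:

```swift
// Hypothetical helper, not part of any shipped API: resolve an offset from the
// start of the collection, clamping the result to startIndex...endIndex.
func clampedIndex<C: Collection>(in c: C, startOffset n: Int) -> C.Index {
    if n <= 0 { return c.startIndex }
    return c.index(c.startIndex, offsetBy: n, limitedBy: c.endIndex) ?? c.endIndex
}

let values = [0, 1, 2, 3]
let x = -1                      // stands in for OffsetBound.start - 1

let lower = clampedIndex(in: values, startOffset: x)       // clamps to index 0
let upper = clampedIndex(in: values, startOffset: x + 2)   // resolves to index 1
let subsequence = values[lower..<upper]                    // one element, not two
```

The lower bound gets clamped while the upper bound does not, so the "width" of 2 written in the source shrinks to 1 in the result.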

Another one is when insert just snaps to the nearest index when the offset is out of range. It's especially spooky when you can do this:

var values = [0, 1, 2]
values.insert(-1, at: .start - 3) // Snap to start
print(values) // [-1, 0, 1, 2]

Again, all the functions in RangeReplaceableCollection require the range expression to contain valid indices, so this behavior is unprecedented.


Quick Review:

Personally I don't find the current proposal very appealing syntax-wise, but I think the functionality is needed.

I fear a bit that overloading the subscript will be confusing because it'll look like you can mix offsets and indices, even though it'll fail miserably:

let a = [1, 2, 3]
let i = a.firstIndex(of: 2)! // unwrapped so that `i` is a plain Int index
a[i ... .last] // compile-time error

Now, since it is an Array you'll be able to write this and it'll work:

a[.start + i ... .last] // okay, but only if `a` is an Array

It's quite silly to write this in general, and it'll return wrong results when you try it with ArraySlice. It won't compile with any collection that doesn't use integer indices. Maybe I'm worrying for nothing and people will use a[i ... a.index(at: .last)] in a responsible manner. I guess we'll need to release the feature to see.
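
The ArraySlice pitfall can be shown with today's standard library alone. This sketch treats an integer index as if it were an offset from the start, which is what .start + i would amount to:

```swift
let a = [10, 20, 30, 40]
let tail = a[1...]                       // ArraySlice shares its indices with `a`

// Integer indices of a slice are not offsets from its start.
let i = tail.firstIndex(of: 30)!         // index 2
let offset = tail.distance(from: tail.startIndex, to: i)   // offset 1

// Treating the index as an offset lands on the wrong element.
let wrong = tail[tail.index(tail.startIndex, offsetBy: i)]       // 40
let right = tail[tail.index(tail.startIndex, offsetBy: offset)]  // 30
```

With a plain Array the index and the offset happen to coincide, which is exactly why the mistake would go unnoticed until a slice shows up.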

I think all this shows that what we really need is a range type that can accommodate boundaries of two different types (and not require them to be comparable). Unfortunately changing that would probably be ABI-breaking at many levels.


To me, this starts to look like a regular expression pattern. If we had some sort of "regular expressions subscript" syntax in the language, maybe we could do things like this:

let subrange = a[3(.*)1]

And then we could forget the idea of adding those things to OffsetBound.

Brainstorming about regular expression matching

First, I know this exact syntax using brackets won't really work because it's ambiguous, but I find it quite interesting to think about this in terms of a subscript.

// Regular expression cheat sheet:
// using . to match one element
// using * to indicate we can match the preceding thing zero or more times
// using {3} to indicate that the preceding thing must be matched 3 times
// using parens () to capture a subsequence to be returned

let a = [5, 4, 3, 2, 1]

let subrange = a[.{3}(.*).{3}]
// three elements, captures any number of elements, followed by three elements.
// returns captured subrange or nil if not matched

let subrange = a[3(.*)1]
// matches element 3, captures any number of elements, then matches element 1

let s = "54321"

let subrange = s["3"(.*)"1"]
// same as above, except we're matching characters here instead of integers
// (hence the quotes around each element)

We could allow many captures inside a subscript. We could also capture an index, an element, or a subrange based on type inference:

let a = "123456"
let (index, char, substring) = a["1"()"2"(.)"4"(.*)] as (String.Index, String.Element, String.SubSequence)?
assert(index == a.index(after: a.startIndex))
assert(char == "3")
assert(substring == "56")

While this is an interesting thought, developing it further is not really in scope of this proposal review.


Sorry for the delay.

The concern is around clamping behavior vs doing something else across many of these APIs. I propose clamping behavior as being more intuitive and versatile than other alternatives, but there are definitely tradeoffs.

You brought up insert(_:at:). What should the behavior be for something like myCollection.insert(5, at: .first + 10) when myCollection has fewer than 10 elements? Similarly, what should the behavior of myCollection.replaceSubrange(..<(.first+10), with: [1,2,3]) be?

  1. Operation is a nop
  2. Operation traps
  3. Operation clamps

Option #1 is very surprising. If the developer called insert, remove, etc., they expect something to happen. Silently nop-ing is probably the least intuitive alternative.

Option #2 is surprising given that OffsetBound is otherwise a higher level, non-trapping representation of abstract positions within a collection.

And that leaves Option #3, where the element(s) will be inserted at the end of the collection, even if the collection is short.
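
For comparison, the options can be sketched on Array with an integer offset from the start standing in for an OffsetBound. The OutOfRangePolicy enum and the insert(_:atStartOffset:policy:) method are hypothetical and exist only for this illustration:

```swift
// Hypothetical policy enum and helper, for illustration only.
enum OutOfRangePolicy { case ignore, clamp }

extension Array {
    // Insert `element` at `offset` positions from the start. Option 2 (trap) is
    // what the existing insert(_:at:) already does for an invalid index, so it
    // is not modeled here.
    mutating func insert(_ element: Element, atStartOffset offset: Int, policy: OutOfRangePolicy) {
        if (0...count).contains(offset) {
            insert(element, at: offset)
            return
        }
        switch policy {
        case .ignore:
            return                                   // option 1: silent no-op
        case .clamp:
            // option 3: snap to the nearest valid position
            insert(element, at: Swift.min(Swift.max(offset, 0), count))
        }
    }
}

var ignored = [1, 2, 3]
ignored.insert(5, atStartOffset: 10, policy: .ignore)   // stays [1, 2, 3]

var clamped = [1, 2, 3]
clamped.insert(5, atStartOffset: 10, policy: .clamp)    // becomes [1, 2, 3, 5]
```

Under the no-op policy the caller gets no sign that nothing happened, whereas clamping always produces a visible change.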

This rationale seems to intuitively extend to new collection APIs for the future, such as move(from:to:).

As for more dubious formulations such as a negative offset from the front of a collection, I think we should pursue some reasonable and consistent behavior for them. But, we shouldn’t significantly change our strategy to accommodate them at the cost of common, sane usage.

For ranges of OffsetBound, there is not an obvious length to the range if the lower bound is relative to the front of the collection and the upper bound relative to the back. It depends on the collection to which it is applied. Clamping in the middle is consistent with the “hole in the middle” interpretation (e.g. that comparison uses). Similarly, there is no run-time or API notion of the distance between two OffsetBounds (even if similarly anchored). As you point out, similarly-anchored offsets could visually imply such a length. However, consistent treatment with differently-anchored offsets means the length is contingent on the length of the collection to which it is applied.

Edit:

Thinking about this more, I think there is a strong argument that a fully-disjoint range or offset supplied to e.g. removeSubrange() should signal the error in some fashion. However, I feel that interior-clamping (that is, clamping in the middle for opposing anchors, or when only one bound dangles off the end) still makes more sense.

I considered an addition, but I’m not sure how independently useful it is. OffsetBound is fairly opaque, so you would use it as an abstract position in some collection size-permitting or the basis for further offsetting. Can you think of any use cases? It can of course be computed as .start + collection.distance(from: collection.startIndex, to: index), though that is obnoxious if this is a frequent need.
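
The distance part of that expression works with today's standard library. A short sketch, where the resulting integer is what would be added to .start:

```swift
let text = "offsets"
let index = text.firstIndex(of: "s")!

// The integer that would be added to `.start` in the expression above.
let offset = text.distance(from: text.startIndex, to: index)

// Round trip: the offset resolves back to the same index.
let roundTrip = text.index(text.startIndex, offsetBy: offset)
```

Note that distance(from:to:) is O(n) for collections like String that are not random-access, which adds to the cost if this conversion is needed often.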

This is intentional, see the alternative considered

Actually, I think keeping OffsetBound separate will help disambiguate this future scenario. If the collection instance can be inferred in a subscript, then you want to be using the index-receiving subscript instead of the OffsetBound receiving one. If OffsetBound didn't distinguish itself from indices, there would be ambiguities.


Except that the downsides today are sufficient to bar any improvements in this space.

So, I think an alternative that relies on speculative/significant changes in Swift’s type checking is untenable at this point. If we get those type checker improvements, as well as helpful mechanisms such as one to retroactively insert into a stable protocol hierarchy, then we can add a more expressive formulation. I really do want these improvements, but I’m being realistic.

At that point, OffsetBound may end up becoming a more niche type used to represent abstract positions across any collection type, and that’s ok.

There’s also the option for the operation to return a (discardable?) Bool indicating whether or not it succeeded (so it would be a no-op returning false). I do strongly feel that having an API try to recover from an unsound argument is too unsafe for Swift. Then again, it may be tolerable at this level of abstraction.

I’m thinking that an OffsetBound range is valid if the equivalent Index range is valid. So if the api falls back to invalid-argument behavior (whatever that may be), it can be interpreted as either the OffsetBound is invalid, or the collection is too small. This seems to be another behavior that is self-consistent.

I agree that there’s no notion of “length” for an offset range with different anchors, but I still see the same-anchor case as the more common one (for extracting a fixed number of elements, as in argument parsing).

I was suggesting this when people started discussing APIs like .find(_:), which would be more appropriate as an index-level API. So I thought having a way to cross between levels of abstraction easily would be nice.

The more I think about it, the more it makes sense for the crossing to go from the higher level of abstraction to the lower one, i.e. from OffsetBound to Index, which is already part of the proposal.

Can we use discardable Bool? I vaguely remember seeing it somewhere in the standard library.

Yes, and that would align well with Set.remove. I’ll do a pass over the RRC overloads and see if there’s a consistent story here. Replacement might be harder to express.
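
As a rough idea of what that could look like, here is a sketch using the @discardableResult attribute. The method insertIfValid(_:atStartOffset:) is a hypothetical name, and an integer offset from the start stands in for an OffsetBound:

```swift
extension Array {
    // Hypothetical API shape: report failure instead of trapping or clamping.
    @discardableResult
    mutating func insertIfValid(_ element: Element, atStartOffset offset: Int) -> Bool {
        guard (0...count).contains(offset) else { return false }
        insert(element, at: offset)
        return true
    }
}

var values = [0, 1, 2]
values.insertIfValid(9, atStartOffset: 1)                   // result ignored, no warning
let inserted = values.insertIfValid(-1, atStartOffset: 7)   // false, array unchanged
```

Callers who don't care can ignore the result, and callers who do can branch on it. Expressing the same thing for replacement is less obvious, as noted above.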
