SE-0265: Offset-Based Access to Indices, Elements, and Slices

It behaves as an empty range, positioned at lowerBound (which is relevant for e.g. replaceSubrange).

Where should this documentation be? On the OffsetBound type, on the definition of <, or on the APIs taking RangeExpression where Bound == OffsetBound?

I definitely think it's on the APIs; it might also make sense to put it on the OffsetBound type as a "recommended behavior" for people implementing their own APIs based on OffsetBound. Alternately, you could expose a new relative(to:) method on RangeExpression where Bound == OffsetBound, which would help ensure everyone did it correctly.

2 Likes

First of all, I'm concerned when we rule out powerful solutions just because of performance issues the type checker currently has. I really hope we'll see improvements here, and I see little reason to paint ourselves into a corner for features that can easily be implemented outside the stdlib.
I would be quite disappointed if we could never have proper support for vector math or multiplication of complex numbers because the compiler can't cope with an additional overload for *...

Secondly, I don't think the alternative you considered is the same I'm talking about:

offsetting from a given index. Relative indices, that is indices with offsets applied to them

Correct me if I'm wrong, but I understand that to mean that you can offset a known, fixed index - like in

var students = ["Ben", "Ivy", "Jordell", "Maxime"]
students[students.firstIndex(of: "Maxime")! - 1] = "Jordan"

While this is nice to have, the solution I've been talking about adds slightly more convenience, because it saves one needless repetition of the collection being used:

students[.find("Maxime") - 1] = "Jordan"

The first variant really bugs me, because it is absolutely clear that the first index has to come from students (yes, you can cheat because it's Array - but even then you can't determine the right index without consulting the collection).

Thanks for responding!

2 Likes

While the details of the linked alternative are not identical, it seems to me that your alternative must be (and, in the prototype, indeed is) designed in basically the same way, and therefore has the same downsides. I linked that section of the proposal because I think it would be helpful to get your response to those issues: whether you believe type-checking performance won't be a problem (either now or with possible future improvements), whether you have a clever new design that avoids the same pitfalls, and so on. For example, one possible alternative I can think of, which would achieve a design like yours at the cost of extra language complexity and special casing, would be a new subscript parsing mode that could somehow refer to instance methods of the collection without having to repeat the variable name.

Before I dive into my comments, some minor nits worth addressing:

The description of OffsetBound contains the following comment above the new Collection.index(at:) function:

  /// - Complexity:
  ///   - O(1) if the collection conforms to `RandomAccessCollection`.
  ///   - O(*k*) where *k* is equal to the offset if the collection conforms to
  ///     `BidirectionalCollection`.
  ///   - O(*k*) if `position` is `.first + n` for any n, or `.last + 1`.
  ///   - Otherwise, O(*n*) where *n* is the length of the collection.

The third entry in that list says the complexity is O(k) without defining k or saying when that condition holds. I assume it is supposed to read O(k) if position is .first + k for any k on a Collection, but I'm genuinely not sure. This also applies to the comment above Collection.subscript(position:), as well as the range subscript. I think all of those just need to be slightly clarified.

Additionally, you've defined subscript(position: OffsetBound) -> Element? { get set } and subscript<ORE: RangeExpression>(range:) -> SubSequence where ORE.Bound == OffsetBound { get set } on RangeReplaceableCollection. I think these should go on MutableCollection instead, as they seem to be the companion pieces to the two major customisation points on that protocol.

The comment for the new Collection._reverseOffsetIndex(_:by:) says "Other methods such as index(_:offsetBy:) must not be passed a negative offset if the collection is bidirectional.". I think that comment is logically inverted: don't you mean "must not be passed a negative offset unless the collection is bidirectional"?

Assuming the above comments are addressed, the below is my feedback.

+1. While I have reservations about the application of this offsetting index to Collection I feel somewhat mollified by the section addressing my concerns in the proposal. I definitely think that the usability guarantees here are important. In particular, they address the most common annoyance when defining Collection types with opaque indices, which is that they cannot be integer-indexed without lots of ceremony. This is a good forward step and I'm glad to see it.

Yes. Collection types are very important to Swift, and improving their ergonomics and encouraging good index design as this patch does is well worth the change.

I believe so.

Multiple re-reads of the proposal and participation in the original pitch thread.

RangeReplaceableCollection is correct for the range-based subscript; there's no way to check that the subscript range has the same number of elements as the new value. You're correct about the single-element subscript, though.
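
To illustrate, a minimal sketch of my understanding (assuming the setter behaves like replaceSubrange, as the proposal describes; the result shown is an assumption):

var numbers = [1, 2, 3, 4, 5]
// Replace the two elements at offsets 1..<3 with three new ones; the
// collection grows, which only RangeReplaceableCollection permits.
numbers[.first + 1 ..< .first + 3] = [10, 20, 30]
// numbers is now [1, 10, 20, 30, 4, 5]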

Does that not apply to this customisation point though?

Uh… it's concerning that the docs don't even mention the requirement for the slices to be the same size.

3 Likes

SR-10030 tracks the documentation issue.

2 Likes

Isn't that because you can use the single element subscript to remove an element by setting it to nil?
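
If I'm reading the proposal right, a small sketch of that behavior (the results are assumptions):

var letters = ["a", "b", "c"]
letters[.first + 1] = "B"   // the element subscript's setter takes an Optional; this replaces "b"
letters[.first + 1] = nil   // assigning nil removes the element; letters is now ["a", "c"]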

Type checker performance is a priority for the Swift project. I don't think we can justify a significant type checker performance regression for this feature. The type checking issue is not an attribute of the current implementation specifics (though those can be improved as well), but is inherent to the way Swift's type system works. @xedin can provide more details.

As far as "easily implemented outside the stdlib", customization hooks (which is part of this proposal) cannot be implemented outside the stdlib. Beyond technical limitations, standard library additions pick a specific API design and solution among a set of tradeoffs, which is valuable. The fact that the proposed addition went through years and multiple rounds of debate and prototyping demonstrates that this is not "easy".

I don't see how your point about other math operations is analogous.

It is the same concept: applying an offset to some position in a collection other than its bounds. The mechanisms through which this concept would be implemented seem to be identical to the mechanisms I identified as problematic. Do you have a different set of mechanisms or new findings as to why it would not be problematic?

If there were no issues, then of course I'd prefer a solution that could offset abstract positions other than its bounds. It would synergize nicely with CollectionSearcher (which I have an interest in :slight_smile:), as you brought up earlier.

I think it'd be useful to include

Collection.offsetFromFirst(of index: Index) -> OffsetBound
Collection.offsetFromLast(of index: Index) -> OffsetBound

These would be a good fit for working with a special offset like the .find("=") mentioned above, and would easily add expressivity to the current design.

Edit:

I'm still undecided whether we should have separate offsetFromFirst and offsetFromLast, but I'll settle for the simpler

Collection.offset(of index: Index) -> OffsetBound

for now.
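
For concreteness, a hypothetical sketch of how the suggested method might be defined on top of the proposal's OffsetBound (the name and behavior are this suggestion's, not the proposal's):

extension Collection {
    // Hypothetical helper: express an existing index as an offset from .first.
    func offset(of index: Index) -> OffsetBound {
        .first + distance(from: startIndex, to: index)
    }
}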

Ah, it was using the same k as the previous bullet point. @nnnnnnnn and I had debated how to present this, and we found a long run-on or complex condition was worse than a separate bullet point. I don't recall if there was precedent for "inheriting" the definition of a variable from a prior point elsewhere in documentation. We could say:

  /// - Complexity:
  ///   - O(1) if the collection conforms to `RandomAccessCollection`.
  ///   - O(*k*) where *k* is equal to the offset if the collection conforms to
  ///     `BidirectionalCollection`.
  ///   - O(*k*) where *k* is equal to the offset if `position` is `.first + n` for any n, or `.last + 1`.
  ///   - Otherwise, O(*n*) where *n* is the length of the collection.

or

  /// - Complexity:
  ///   - O(1) if the collection conforms to `RandomAccessCollection`.
  ///   - O(*k*) where *k* is equal to the offset if the offset is positive or if the collection conforms to `BidirectionalCollection`.
  ///   - Otherwise, O(*n*) where *n* is the length of the collection.

but that doesn't quite define what is meant by "positive", and whether k would be the magnitude of the offset, etc.

MutableCollection operations provide the lowest-level interface for element swap and element replacement, which enable efficient generic algorithm implementations.

I don't think OffsetBound provides a good programming model at this low level, as it is most intuitive at an optional-returning, clamping level. You don't want Optional wrapping of newValue in generic contexts at this level. Range-clamping does "what you want" for RangeReplaceableCollection but is less directly tied to the same-size invariant required by MutableCollection. Efficient generic algorithms will not target these interfaces in their implementations anyway. Finally, we need some behavior for nil assignment or for an empty range with a non-empty newValue; trapping is an odd semantic departure from the RangeReplaceableCollection and general OffsetBound behavior.

In lieu of all that, you can get an optional index from an OffsetBound, and if that doesn't produce nil go on to use it with MutableCollection as intended. And as @jrose mentioned we should add relative(to:) for ranges. I think this is the lesser of evils: treat OffsetBound as higher level than MutableCollection, which exists to enable generic in-place algorithms implemented in terms of swap or replacement.
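
Something like this sketch of that workflow, using the proposal's index(at:) (the concrete results are assumptions):

var numbers = [1, 2, 3, 4, 5]
// Resolve the offset to a real index up front; nil means it is out of bounds.
if let i = numbers.index(at: .last - 1) {
    numbers[i] = 40   // ordinary MutableCollection-style element mutation
}
// numbers is now [1, 2, 3, 40, 5]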

You are correct.

1 Like

Excerpt from the proposal, regarding subscripting over a range of OffsetBound:

Offset ranges are clamped, resulting in empty slice return values, just like other partial ranges.

Existing subscripts require the indices to be valid, while offset-based ones do not. This difference should be enough to warrant separate consideration.

Offset ranges operate on the assumption that the collection is sufficiently large (this is the basis of the < operator's algorithm). So I'd say that the subscript should return nil when the collection is not large enough.

The current design has a somewhat confusing scenario:

let data = ...
// This MAY return less than 3 elements if `data` is small.
let array = data[.start + 3 ..< .start + 6]

This ends up forcing the user to check count to make sure that the resulting array has enough elements, a check that is somewhat divorced from the returned value.

guard data.count >= 6 else {
  // Not enough data, do something
}
let array = data[.start + 3 ..< .start + 6]

VS

let array = data[.start + 3 ..< .start + 6]
guard array.count == 3 else {
  // Not enough data, do something
}

VS

guard let array = data[.start + 3 ..< .start + 6] else {
  // Not enough data, do something
}

The first two require the logic to be applied twice: once to check that array has everything that is needed, and again to retrieve array.


While it is implicit from the < operator, I think it's also worth mentioning in the proposal that the current design disallows putting .last offsets before .first offsets even if a collection is small enough to contain all offsets in the range:

// invalid even if data.count >= 1000
let array = data[.last - 100 ..< .first + 1000]
2 Likes

I personally prefer the first of the two alternatives you proposed, but you might also be able to define k and n separately (like mathematical equations usually do):

/// - Complexity:
///   - O(1) if the collection conforms to `RandomAccessCollection`
///   - O(*k*) if the collection conforms to `BidirectionalCollection`
///   - O(*k*) if `position` is `.first + n` for any n, or `.last + 1`
///   - Otherwise, O(*n*)
///   where:
///   - *k* is equal to the offset
///   - *n* is the length of the collection

The advantage of this approach is that it avoids repetition and makes the individual list items more concise; the disadvantage is that it separates the definitions from their usage.

It also might not hurt to choose a different letter than n in the following sentence to prevent the reader from accidentally conflating it with *n*:

///   - O(*k*) if `position` is `.first + n` for any n, or `.last + 1`
4 Likes

Yeah, that's a tricky one. I don't want to bike shed the code comment too much, so maybe we can leave it as it is and make a much smaller tweak. Right now it reads:

  ///   - O(*k*) if `position` is `.first + n` for any n, or `.last + 1`

Maybe it could read:

  ///   - O(*k*) if `position` is `.first + k` for any k.

This omits the confusing .last + 1 case (yes, it's O(k), but k == 1 there and the result is nil anyway). This change makes it clearer what the complexity actually is related to, at least in my view.

This feedback is basically of a piece with @kansaichris's feedback, both of which address the issue the same way.

I think this is the most compelling argument for placing this on RRC. The efficient generic algorithm implementations argument holds little water with me, because we don't care about it when working with Collection, but the fact that these subscripts all operate in a realm of optionality makes much more sense as an argument. So I think that's probably ok.

Emphatic +1.

Swift has taken the principled approach of making collections with non-integral indices not indexable by integers. This upsets many people. Meanwhile, the alternatives we tell them to reach for instead are either non-obvious (prefix(_:), et al.) or too verbose (index(_:offsetBy:)). To improve upon the status quo, we need something that composes well with every Collection. This proposal is exactly that.

This proposal does a good job of "bringing back" some of the desirable syntax for this behavior while performing as close as possible to the optimal code we currently tell people to write.
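
As a rough comparison of the two spellings under the proposed design (the offset subscript returns an Optional; the results shown are assumptions):

let str = "Hello, world!"
// Today: spell out the index arithmetic against the collection itself.
let today = str[str.index(str.startIndex, offsetBy: 7)]   // "w"
// With the proposal: offset from a bound directly inside the subscript.
let proposed = str[.first + 7]                            // Optional("w")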

I've worked a lot with Swift since SE-0172 and feel comfortable that the situation could be improved. I've used other languages with an emphasis on slicing, and enjoy how the proposed syntax compares with those; I also appreciate that this makes better safety tradeoffs than those.

I've followed all the iterations of the pitch.

2 Likes

One nice thing about the proposal is that you added code to try the feature without checking out a complete branch of Swift. As I read it, _reverseOffsetIndex is the only customization point - is that correct?
I don't think this hook is that important, but it could also be added in a separate proposal (especially as you point out that it is beneficial for other functionality as well).

This is just another situation where overloads of established operators are desirable - and you argue very strongly against doing that:

It is a total non-starter for operators such as + and - , which already have complex resolution spaces.

Something like Vector<Double> would have exactly the same drawbacks that you describe, wouldn't it?
So that leads to the last part of my answer:

The downsides might be the same, but if you take that as the only criterion, you rule out any new overloads that share this issue (including vector math etc.).
So I think you have to take into account not only the price you pay, but also the merit you get in return - and that is, imho, slightly higher when you allow context-sensitive anchors. Actually, I can't remember a single situation where SE-0265 would have been helpful for my work, but more situations than I can count where I had to work with substrings marked by certain delimiters...

In general, I think it's very subjective what amount of type checker slowdown is tolerable; it's also impossible to show definitive numbers, but can you at least show the benchmarks that led to your decision?

1 Like

I generally like this proposal, but one thing I don't like is that the special indices are named first and last. In the context of Collection, first and last already refer to elements, not indices. I don't think it's a good idea to overload those terms. Perhaps startIndex and endIndex would be better names, as they match the existing Collection.startIndex and .endIndex and thus add little cognitive burden.

This would also clear the way for some (magical?) functionality whereby all index-returning methods on Collection could be made available within subscript brackets, e.g., arr[.firstIndex("x")! + 2]
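
A small sketch of the naming collision being described (the commented-out line is the hypothetical alternative spelling, not part of the proposal):

let letters = ["a", "b", "c"]
let firstElement = letters.first    // existing API: the element "a"
let firstOffset = letters[.first]   // proposed API: Optional("a"), but here .first names a position
// letters[.startIndex + 1]         // hypothetical alternative spelling suggested above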

3 Likes

Seeing that the standard subscript for RangeReplaceableCollection is get only, I was about to rant that you made a mistake making the OffsetBound version get/set. That element mutation is the domain of MutableCollection, and the two mutation philosophies are kept distinct in protocols that do not refine each other. But someone beat me to the main point:

Then I discovered a good counter-point:

But I think both OffsetBound-based subscripts for RangeReplaceableCollection should explicitly mention that changes to the targeted subsequence of the collection during a set occur via element(-range) replacement, not element value mutation.