SE-0265: Offset-Based Access to Indices, Elements, and Slices

Tino · October 13, 2019, 1:15pm

That would be very convenient, but actually this proposal really does not pair well with such a hypothetical future:
Your feature can't be built on top of SE-0265, but would rather be a second way to achieve the same (and more). Both could possibly exist at the same time, but that redundancy wouldn't be nice.

How would you resolve the ambiguity of myArray[.startIndex] when there are two possible interpretations?

jrose · October 14, 2019, 5:52pm

Okay, people keep bringing this up, but I don't understand it. Why can't a search be built on top of SE-0265? OffsetBounds get converted to indexes before any access happens, and that conversion can basically do whatever the heck it wants. And OffsetBound is an opaque struct, so its representation can change from release to release.

…okay, the place where it falls down is Comparable, but still.

kansaichris · October 16, 2019, 1:09am

When you say that Comparable is the place where it falls down, do you mean that syntax like the following wouldn't be possible?

var students = ["Ben", "Ivy", "Jordell", "Maxime"]

students[.find("Ivy") ..< .find("Maxime")] // ["Ivy, "Jordell"]

jrose · October 16, 2019, 1:21am

I mostly meant there's no way to implement Comparable for a find case. Even if you arbitrarily stuck it between the first and last cases, you wouldn't have a way to order two find cases if the element of the collection isn't itself Comparable.

…and I realized that this may answer my question of why OffsetBound can't be extended in this way: it's not parameterized on the collection element type.

Tino · October 16, 2019, 10:34am

So the feature isn't possible as an extension of SE-0265, is it?
To be honest, I have more been thinking in terms of implementation details when I wrote about incompatibility with future extensions - but you are probably right that this doesn't matter here, and even if the implementation is replaced completely, this could be fine as long as the public interface stays the same... I'm really not sure if there aren't any pitfalls besides Comparable (like, could OffsetIndex become a protocol or a class?)

As I understand it, the order does not depend on any type properties, but rather on the Collection the OffsetIndex is actually used with - so therefore, it cannot conform to Comparable:
In let a = [3, 2, 1], indexing with a[.find(3) ..< .find(1)] should be fine - but without a, there is no natural order which makes sense to me.

xedin · October 16, 2019, 8:08pm

Yes, it would. The problem with adding new operator overloads we currently face is related to impact they'd make on other expressions using overloaded operator. SIMD operators do impact expressions which have nothing to do with SIMD rather than use of operators like +. We are trying our best to come up with a solution for this problem which would allow most of the code to type-check in reasonable type but in certain circumstances type-checker would still end up attempting every single overload choice. I do agree with you that designing something around limitations of the type-checker is not going to yield most desirable results, but in reality it might be the best we can right now to balance benefit vs. impact.

Just to clarify, in OffsetBound case overloads added to + and other operators are "less generic" comparing to the ones added by SIMD which would have lower impact on type-checker performance.

Lantua · October 16, 2019, 10:47pm

Is there a use case for this behavior? It still seems to me this could easily be the source of bugs when the intuitive count of range doesn't match the count of the extracted subsequence. This could easily lead to flawed logic.

let values = [0, 1, 2, 3], x = OffsetBound.start - 1

// Should have 2 elements, but has 1 instead
let subsequence = values[x ..< x + 2]

Another one is when insert just snap to the nearest index when offset is out of range. It's especially spooky when you can do this

var values = [0, 1, 2]
values.insert(-1, at: .start - 3) // Snap to start
print(values) // [-1, 0, 1, 2]

Again, all the functions in RangeReplaceableCollection requires the range expression to contain valid indices, so this behavior is unprecedented.

michelf · October 17, 2019, 12:13pm

Quick Review:

Personally I don't find the current proposal very appealing syntax-wise, but I think the functionality is needed.

I fear a bit overloading the subscript will be confusing because it'll look like you can mix offsets and indices, even though it'll fail miserably:

let a = [1, 2, 3]
let i = a.firstIndex(of: 2)
a[i ... .last] // compile-time error

Now, since it is an Array you'll be able to write this and it'll work:

a[.start + i ... .last] // okay, but only if `a` is an Array

It's quite silly to write this in general and it'll will return wrong results when you try it with ArraySlice. It'll not compile with any collection not using integer indices. Maybe I'm worrying for nothing and people will use a[i ... a.index(at: .last)] in a responsible manner. I guess we'll need to release the feature to see.

I think all this shows that what we really need is range type that can accommodate boundaries of two different types (and not require them to be comparable). Unfortunately changing that would probably be ABI-breaking at many levels.

michelf · October 17, 2019, 12:14pm

To me, this starts to look like a regular expression pattern. If we had some sort of "regular expressions subscript" syntax in the language, maybe we could do things like this:

let subrange = a[3(.*)1]

And then we could forget the idea of adding those things to OffsetBound.

Brainstorming about regular expression matching

First, I know this exact syntax using brackets won't really work because it's ambiguous, but I find it quite interesting to think about this in term of a subscript.

// Regular expression cheat sheet:
// using . to match one element
// using * to indicate we can match the preceding thing zero or more times
// using {3} to indicate the the preceding thing must be matched 3 times
// using parens () to capture a subsequence to be returned

let a = [5, 4, 3, 2, 1]

let subrange = a[.{3}(.*).{3}]
// three elements, captures any number of elements, followed by three elements.
// returns captured subrange or nil if not matched

let subrange = a[3(.*)1]
// matches element 3, captures any number of element, then matches element 1

let s = "54321"

let subrange = s["3"(.*)"1"]
// same as above, except we're matching characters here instead of integers
// (hence the quotes around each element)

We could allow many captures inside a subscript. We could also capture an index, an element, or a subrange based on type inference:

let a = "123456"
let (index, char, substring) = a["1"()"2"(.)"4"(.*)] as (String.Index, String.Element, String.SubSequence)?
assert(index == a.index(after: a.startIndex))
assert(char == "3")
assert(substring == "56")

While this is an interesting thought, developing it further is not really in scope of this proposal review.

Michael_Ilseman · October 18, 2019, 11:22pm

…

Sorry for the delay.

The concern is around clamping behavior vs doing something else across many of these APIs. I propose clamping behavior as being more intuitive and versatile than other alternatives, but there are definitely tradeoffs.

You brought up insert(_:at:). What should the behavior be for something like myCollection.insert(5, at: .first + 10) when myCollection has fewer than 10 elements? Similarly, what should the behavior of myCollection.replaceSubrange(..<(.first+10), with: [1,2,3]) be?

Operation is a nop
Operation traps
Operation clamps

Option #1 is very surprising. If the developer called insert, remove, etc., they expect something to happen. Silently nop-ing is probably the least intuitive alternative.

Option #2 is surprising given that OffsetBound is otherwise a higher level, non-trapping representation of abstract positions within a collection.

And that leave Option #3, where the element(s) will be inserted at the end of the collection, even if the collection is short.

This rationale seems to intuitively extend to new collection APIs for the future, such as move(from:to:).

As for more dubious formulations such as a negative offset from the front of a collection, I think we should pursue some reasonable and consistent behavior for them. But, we shouldn’t significantly change our strategy to accommodate them at the cost of common, sane usage.

For ranges of OffsetBound, there is not an obvious length to the range if the lower bound is relative to the front of the collection and the upper bound relative to the back. It depends on the collection to which it is applied. Clamping in the middle is consistent with the “hole in the middle” interpretation (e.g. that comparison uses). Similarly, there is no run-time or API notion of the distance between two OffsetBounds (even if similarly anchored). As you point out, similarly-anchored offsets could visually imply such a length. However, consistent treatment with differently-anchored offsets means the length is contingent on the length of the collection to which it is applied.

Edit:

Thinking about this more, I think there is a strong argument that a fully-disjoint range or offset supplied to e.g. removeSubrange() should signal the error in some fashion. However, I feel that interior-clamping, that is clamping in the middle for opposing anchors or when only one bound dangles off the end still makes more sense.

Michael_Ilseman · October 18, 2019, 11:23pm

I considered an addition, but I’m not sure how independently useful it is. OffsetBound is fairly opaque, so you would use it as an abstract position in some collection size-permitting or the basis for further offsetting. Can you think of any use cases? It can of course be computed as .start + collection.distance(from: collection.startIndex, to: index), though that is obnoxious if this is a frequent need.

Michael_Ilseman · October 18, 2019, 11:23pm

This is intentional, see the alternative considered

Actually, I think keeping OffsetBound separate will help disambiguate this future scenario. If the collection instance can be inferred in a subscript, then you want to be using the index-receiving subscript instead of the OffsetBound receiving one. If OffsetBound didn't distinguish itself from indices, there would be ambiguities.

Except that the downsides today are sufficient to bar any improvements in this space.

So, I think an alternative that relies on speculative/significant changes in Swift’s type checking is untenable at this point. If we get those type checker improvements, as well as helpful mechanisms such as one to retroactively insert into a stable protocol hierarchy, then we can add a more expressive formulation. I really do want these improvements, but I’m being realistic.

At that point, OffsetBound may end up becoming a more niche type used to represent abstract positions across any collection type, and that’s ok.

Lantua · October 19, 2019, 12:55am

These’s also an option for operation to return (discardable?) Bool indicating whether or not it succeed (and so it would be noop returning false). I do strongly feel that having an api trying to recover from unsound argument is too unsafe for Swift. Then again, it may be tolerable at this level of abstraction.

I’m thinking that an OffsetBound range is valid if the equivalent Index range is valid. So if the api falls back to invalid-argument behavior (whatever that may be), it can be interpreted as either the OffsetBound is invalid, or the collection is too small. This seems to be another behavior that is self-consistent.

I agree that there’s no notion of “length” for offset range with different anchors, but I still see that the same-anchor case to be more common (for extracting a fixed number of element, like argument parsing).

Michael_Ilseman:

I'm still undecided if we should separate offsetFromFirst vs offsetFromLast , but I'll settle with simpler
Collection.offset(of index: Index) -> OffsetBound
for now.
[...] Can you think of any use cases?

I was suggesting this when people started discussing about apis like .find(_:) which would be more appropriate to be an index-level api. So I thought having a way to cross the level of abstraction easily would be nice.

The more I think about it, the more it’d make sense to have the crossing to go from high level of abstraction to lower one, ie. from OffsetBound to Index, which is already part of the proposal.

Lantua · October 19, 2019, 1:02am

Can we use discardable Bool? I vaguely remember seeing it somewhere in the standard library.

Michael_Ilseman · October 19, 2019, 1:40am

Yes, and that would align well with Set.remove. I’ll do a pass over the RRC overloads and see if there’s a consistent story here. Replacement might be harder to express.

lorentey · October 19, 2019, 4:54am

So Michael was kind enough to explain this part of the discussion to me, and it turns out I have Opinions!

There is no single option that fits what I would expect to happen in all cases. It all depends on the exact operation and perhaps even the exact situation at hand. Let's consider some cases for a ten-element collection -- to be specific, let's use the string "collection" as the collection value. (For simplicity, I'm going to treat the string literal "collection" as mutable. This obviously wouldn't compile; but I think this illustrates the operations better.)

Inserting at a given offset from the first/last element

My observation here is that clamping leads to results that seem undesirable for the majority of problems. If I have a line of text and I want to insert the character "#" at position 78 from the start, then the result should either have a "#" at position 78, or I should get an obvious error. If the line I started with only had 34 characters, inserting a "#" at position 34 is not going to fly as a default -- it would almost always lead to corrupt results.

At first glance, this means that inserting outside the boundaries of the collection should trap:

"collection".insert("A", at: .first + 15)   // ⚠️
"collection".insert("B", at: .last + 5)     // ⚠️
"collection".insert("C", at: .first - 5)    // ⚠️
"collection".insert("D", at: .last - 15)    // ⚠️

This won't produce corrupt results, but I agree with Michael that it seems too harsh for an API that is supposed to provide an escape hatch from strict indexing rules.

However, there is an obvious way to fix this, and it's Option #4 -- explicitly supplying a padding element, and automatically growing the collection to achieve the desired size:

"collection".insert("A", at: .first + 15, padding: ".") 
    // ⟹ "collection.....A"
"collection".insert("B", at: .last + 6, padding: ".")
    // ⟹ "collection.....B"
"collection".insert("C", at: .first - 5, padding: ".")
    // ⟹ "C....collection"
"collection".insert("D", at: .last - 14, padding: ".")
    // ⟹ "D....collection"

The results are consistent, predictable and I believe they provide convenient behavior for what I expect to be the most common case with such insertions.

If we want to make it fancy, the padding argument could be an optional sequence -- with the empty sequence standing in for the clamping behavior, and nil for trapping:

"collection".insert("A", at: .first + 15, padding: ".") 
    // ⟹ "collection.....A"
"collection".insert("B", at: .first + 15, padding: ". ") 
    // ⟹ "collection. . .B"
"collection".insert("C", at: .first + 15, padding: "")
    // ⟹ "collectionC"
"collection".insert("D", at: .first + 15)
    // ⚠️ Trying to insert an element at an offset outside of the collection.
    //    Did you mean to specify a padding value?

Removing an element at a given offset from the first or last element

The trivial cases are, of course, trivial.

"collection".remove(at: .first + 0)   // ⟹ "c"   (remainder: "ollection")
"collection".remove(at: .first + 3)   // ⟹ "l"   (remainder: "colection")
"collection".remove(at: .first + 9)   // ⟹ "n"   (remainder: "collectio")
"collection".remove(at: .last - 0)    // ⟹ "n"   (remainder: "collectio")
"collection".remove(at: .last - 3)    // ⟹ "t"   (remainder: "collecion")
"collection".remove(at: .last - 9)    // ⟹ "c"   (remainder: "ollection")

But what if we go off the end of the collection?

"collection".remove(at: .first + 15) // nil    (with the collection unchanged)
"collection".remove(at: .last + 5)   // nil    (with the collection unchanged)

I think this is perfectly fine. Trying to remove an element that could exist but doesn't is not an error. This fits well within the mandate to provide Set/Dictionary-like failable operations on regular collections, and this behavior isn't surprising or confusing to me. Set.remove works the same way when given a valid element that's not in the set.

Things are arguably different when we go below the front, though:

"collection".remove(at: .first - 5) // ???
"collection".remove(at: .last - 15) // ???

Trying to remove an element from position -5 is somewhat like trying to remove the string "foo" from a Set of integers -- it's nonsense. As such, my instinct is that it should probably trap, because it indicates that the programmer forgot to consider an edge case. However, since this is supposed to be a somewhat looser API, I'm okay with a nil return value here, too.

However, Option #3 would clearly be unacceptable to me:

"collection".remove(at: .first - 5) // ⟹ "c" (remainder: "ollection") ?!?!?!
"collection".remove(at: .last + 1000) // ⟹ "n" (remainder: "collectio") ?!?!?!

I don't see any situation where I'd desire this. If I ever come across a problem that needs such results, I'd much rather having to spell this out explicitly, even if it requires slightly more typing.

Inserting a sequence of elements

This is analogous to inserting a single element, and it should work in a similar way. There should be an argument to specify padding.

"collection".insert("FOO", at: .first + 15, padding: ".") 
    // ⟹ "collection.....FOO"
"collection".insert("FOO", at: .last + 6, padding: ".") 
    // ⟹ "collection.....FOO"
"collection".insert("FOO", at: .first - 5, padding: ".") 
    // ⟹ "FOO..collection"

"collection".insert("FOOBAR", at: .first - 7, padding: ".") 
    // ⟹ "FOOBAR.collection"
"collection".insert("FOOBAR", at: .first - 5, padding: ".") 
    // ⟹ "FOOBARcollection"
"collection".insert("FOOBAR", at: .first - 3, padding: ".") 
    // ⟹ "FOOBARcollection"
"collection".insert("FOOBAR", at: .first - 1, padding: ".") 
    // ⟹ "FOOBARcollection"

Note the slight irregularity of the last group of examples. Inserting before the first element rebases the collection, so it behaves somewhat weirdly. However, the result is still intuitively useful, so I'm inclined to accept it.

Removing a range of elements

Again, this is similar to the individual case; we should allow the operation to remove fewer elements than specified. To allow detecting this case (in cases where it actually matters), the operation should return the number of elements that were actually removed from the collection.

"collection".removeRange(.first + 7 ..< .last + 10)
    // ⟹ 3 (remainder: "collect")
"collection".removeRange(.first - 3 ..< .first + 2)
    // ⟹ 2 (remainder: "llection")
"collection".removeRange(.first + 100 ..< .first + 200)
    // ⟹ 0 (remainder: "collection")

Note that the standard name removeSubrange would be slightly misleading here, although the chance of confusion seems minimal, so I wouldn't object to keeping the existing name.

The degenerate case of a negative range is weird, but it seems the least harmful thing is to keep the collection intact:

"collection".removeRange(.first + 7 ..< .last - 7)
    // ⟹ 0 (remainder: "collection")

Replacing a range of elements with a sequence

I think this should behave the same as a remove followed by an insertion, which means it should pad even if the replacement sequence is empty:

"collection".replaceRange(.first + 7 ..< .first + 23, with: "FOO", padding: ".")
   // ⟹ "collectFOO"
"collection".replaceRange(.first + 15 ..< .first + 23, with: "FOO", padding: ".")
   // ⟹ "collection.....FOO"
"collection".replaceRange(.first - 5 ..< .first + 2, with: "FOO", padding: ".")
   // ⟹ "FOOllection"
"collection".replaceRange(.first - 5 ..< .first - 2, with: "FOO", padding: ".")
   // ⟹ "FOO..collection"

"collection".replaceRange(.first + 15 ..< .first + 20, with: "", padding: ".")
   // ⟹ "collection....."  (!!)
"collection".replaceRange(.first - 5 ..< .first - 5, with: "", padding: ".")
   // ⟹ ".....collection"  (!!)

This does mean that replaceRange would not be a universal operation, unlike replaceSubrange in the current RangeReplaceableCollection. I don't think that's a problem.

The degenerate case of a negative range is weird; perhaps this is the one case where trapping would still be the sensible choice. Or if we truly hate the idea of trapping in these APIs, I am fine with doing something arbitrary as long as we're consistent and we properly document it.

"collection".replaceRange(.first + 7 ..< .last - 7, with: "FOO", padding: ".")
    // ⟹ "collectFOOion" ?
    // ⟹ "colleFOOction" ?
    // ⟹ "coFOOllection" ?

What do you think? Am I completely off the rails here?

xwu · October 19, 2019, 12:11pm

lorentey:

"collection".insert("FOOBAR", at: .first - 7, padding: ".")
  // ⟹ "FOOBAR.collection"
"collection".insert("FOOBAR", at: .first - 5, padding: ".")
  // ⟹ "FOOBARcollection"
"collection".insert("FOOBAR", at: .first - 3, padding: ".")
  // ⟹ "FOOBARcollection"
"collection".insert("FOOBAR", at: .first - 1, padding: ".")
  // ⟹ "FOOBARcollection"

I haven’t proceeded past this point, but I’d argue that the result should be different.

It seems more consistent if the result were to be as though there were a sufficient number of padding characters to begin with prior to insertion. In other words, .first - 7 should cause the correct number of padding characters to be inserted; then, the string is rebased with the additional characters (mirroring, if you will, the .last + n example).

Nevin · October 19, 2019, 4:33pm

I do not think we should introduce any new behaviors or collection APIs with this feature. Any code written with offsets should be treated as simply shorthand for the equivalent code that advances an index.

The out-of-bounds behavior should be identical to using indices, the insertion behavior should be identical, and so forth. If new behaviors are desired, then they should be proposed separately and work with indices as well.

Lantua · October 19, 2019, 4:44pm

That's the problem, out-of-bound indices are very hard to created, and is usually tied to logic error where trapping makes sense. Offset on the other hand is easier to be applied to inappropriate collections, so I don't think trapping is desirable.

michelf · October 19, 2019, 6:25pm

I actually agree somewhat with both of these seemingly contradictory statements.

I don't feel like using .first and .last is differentiated enough from the index-based API to justify a radically different behavior. It was said earlier that the offset-based accessors are a higher-level API, but you don't feel that at all when reading the code as they look exactly the same except for those mysterious constants .first and .last. Relying on type inference alone to distinguish between trapping and clamping behavior makes me uncomfortable.

If the API was named differently (for instance by adding the word "offset" in functions and subscripts), then it'd be distinctive and could adopt different behaviors without fear of being too confusing. Alternatively, if the constants were .firstOffset and .lastOffset that could probably work too.