Shorthand for Offsetting startIndex and endIndex

Hello, Swift-Evolution

Using collection indexes is a bit of a bother when you want to do simple slicing and the collection is not indexed by Int.

For a simple example take the code here:

let s = "Hello, Swift"
let m = s[...s.index(s.startIndex, offsetBy: 4)]

The intent of advancing startIndex gets a bit muddled.

So to ease this, I think we could add startIndex(offsetBy:) and endIndex(offsetBy:) to Collection.

Making the example above:

let s = "Hello, Swift"
let m = s[...s.startIndex(offsetBy: 4)]
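
A minimal sketch of how the two pitched methods might be written as extensions. Neither method exists in the standard library, and placing endIndex(offsetBy:) on BidirectionalCollection is an assumption of this sketch, since a plain Collection cannot walk an index backwards.

```swift
// Hypothetical sketch of the pitched methods; neither exists in the standard library.
extension Collection {
    /// Returns the index `distance` steps after `startIndex`.
    func startIndex(offsetBy distance: Int) -> Index {
        return index(startIndex, offsetBy: distance)
    }
}

// Assumption: offsets from the end are negative, which needs backwards traversal.
extension BidirectionalCollection {
    /// Returns the index `distance` steps from `endIndex` (pass a negative value).
    func endIndex(offsetBy distance: Int) -> Index {
        return index(endIndex, offsetBy: distance)
    }
}

let greeting = "Hello, Swift"
let hello = greeting[...greeting.startIndex(offsetBy: 4)]  // "Hello"
let swift = greeting[greeting.endIndex(offsetBy: -5)...]   // "Swift"
```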

Thoughts?


What about an index(atOffset:)?

let s = "Hello, Swift"
let m = s[...s.index(s.startIndex, offsetBy: 4)]

becomes

let s = "Hello, Swift"
let m = s[...s.index(atOffset: 4)]

Even better...

let s = "Hello, Swift"
let m = s[...4]

:wink:
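
For reference, index(atOffset:) could be sketched as a thin wrapper over index(_:offsetBy:). The name comes from the suggestion above; it is not an existing API, and the sketch assumes the offset is always measured from startIndex.

```swift
// Hypothetical sketch: `index(atOffset:)` is a suggested name, not an existing API.
extension Collection {
    /// Returns the index `offset` steps after `startIndex`.
    /// For collections without random access (such as String) this is a linear-time walk.
    func index(atOffset offset: Int) -> Index {
        return index(startIndex, offsetBy: offset)
    }
}

let phrase = "Hello, Swift"
let firstWord = phrase[...phrase.index(atOffset: 4)]  // "Hello"
```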


There is also a key-path-based alternative in "Make offset index available for String" (post #19, by Karl).

Only viable if there is some adornment. If subscripts could enforce labels, then the following would work:

let s = "Hello, Swift"
let m = s[offset: 0...4]

edit: they can! see below


Which index is it offsetting? It's not obvious that it's startIndex.

My example would become:

let m = s[...s.index(\.startIndex, offsetBy: 4)]

Which I don't think helps since this isn't necessarily about having less to type.

I think this suffers from the same problem as index(atOffset:). And the idea of adding a range to a scalar is a bit odd:

startIndex + a...b == (startIndex + a)...(startIndex + b)

BTW, just as a PSA, if you are always anchored at the start or end of a String, you can use prefix / suffix / dropFirst / dropLast:

let str = "abcdefghijkl"

str.prefix(4) // abcd
str.suffix(2) // kl
str.dropFirst(3) // defghijkl
str.dropLast(7) // abcde

I'm going to assume you want to address the more general problem of offset-based slices, which may be entirely in the middle of a String.
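
For a slice in the middle, the existing methods can still be composed, as in the short illustration below. The offsets have to be restated as "skip this many, then keep this many", which is part of why they do not fully cover the general case.

```swift
let message = "Hello, Swift!"

// Skip the first 7 characters, then keep the next 5.
let subject = message.dropFirst(7).prefix(5)       // "Swift"

// The same slice written with explicit indices, for comparison.
let lower = message.index(message.startIndex, offsetBy: 7)
let upper = message.index(message.startIndex, offsetBy: 11)
let sameSubject = message[lower...upper]           // "Swift"
```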


Yes, that's why I also want to add endIndex(offsetBy:). I should have probably used a better example.

index(atOffset:) is a method on String, not on String.Index. It's for fetching indices from a collection, not modifying an existing index. index(_:offsetBy:) and index(after/before:) also fetch indices, in this case relative to a given index.

In every sense I can think of, subscripts can enforce labels. What am I missing?


I may be holding it wrong, but here's what I get:

edit: Never mind, I needed to provide both the label and the local name. This works:

extension String {
    subscript(offset offset: Int) -> Character {
        return self[self.index(self.startIndex, offsetBy: offset)]
    }
}

let str = "abcdefghij"

str[offset: 2] // <--- returns "c"
// str[2] // <--- compilation error

Sure, or just skip the label:

extension String {
    // [i]
    subscript(i: Int) -> Character {
        let index = self.index(self.startIndex, offsetBy: i)
        return self[index]
    }

    // [i..<j]
    subscript(r: Range<Int>) -> String {
        let i = self.index(self.startIndex, offsetBy: r.lowerBound)
        let j = self.index(self.startIndex, offsetBy: r.upperBound)
        return String(self[i..<j])
    }

    // [i...j]
    subscript(r: ClosedRange<Int>) -> String {
        let i = self.index(self.startIndex, offsetBy: r.lowerBound)
        let j = self.index(self.startIndex, offsetBy: r.upperBound)
        return String(self[i...j])
    }

    // [..<i]
    subscript(r: PartialRangeUpTo<Int>) -> String {
        let i = self.index(self.startIndex, offsetBy: r.upperBound)
        return String(self[..<i])
    }

    // [...i]
    subscript(r: PartialRangeThrough<Int>) -> String {
        let i = self.index(self.startIndex, offsetBy: r.upperBound)
        return String(self[...i])
    }

    // [i...]
    subscript(r: PartialRangeFrom<Int>) -> String {
        let i = self.index(self.startIndex, offsetBy: r.lowerBound)
        return String(self[i...])
    }
}

let str = "abcdefghij"
str[2]
str[...4]
str[1..<4]
// ...

EDIT: or return String.SubSequence, I'm just being lazy.


The label is fulfilling a critical role here: it is indicating that this isn't just a range lookup, it's calculating offsets (possibly in linear time, but perhaps in constant time, e.g. in case of a non-zero-based array slice). People weren't proposing it as an implementation detail to make the subscript work.
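
The non-zero-based slice case can be illustrated with a small example. The labelled subscript below is hypothetical and defined only for this illustration; the behaviour of ArraySlice indices is standard.

```swift
// Hypothetical labelled subscript, defined only for this illustration.
extension Collection {
    subscript(offset offset: Int) -> Element {
        return self[index(startIndex, offsetBy: offset)]
    }
}

let numbers = [10, 20, 30, 40, 50]
let tail = numbers[2...]                  // ArraySlice that keeps the parent's indices

// tail.startIndex is 2, so tail[0] would trap: a plain index is not an offset.
let firstByIndex = tail[tail.startIndex]  // 30
let firstByOffset = tail[offset: 0]       // 30
let thirdByOffset = tail[offset: 2]       // 50
```

With the label, a reader can tell at the call site which of the two meanings is intended, even though both take an Int.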

Yes, I know the reasons, I just don't think they pull their weight. We are optimizing for the case where string subscripts are a performance bottleneck and the author has no idea about strings and Unicode.

In the meantime, people have to jump through an enormous number of hoops just to get s[1..<5].


I've written up a draft proposal for this below. The number of methods I've had to add is unfortunate (see 'Detailed design'). If anyone has ideas on how to improve this, please let me know. Another thought: since IndexDistance is going to be Int, should this only support Countable* ranges and PartialRange(UpTo|Through)?


Offset Range Subscript

Introduction

A collection that has an Index type that cannot be offset independently of its collection can cause overly verbose code that obfuscates one's intent. To help improve this we propose adding subscript(offset:) methods to Collection and MutableCollection that would accept an offsetting range.

Swift-evolution thread: Discussion thread topic for that proposal

Motivation

Working with an index that cannot be offset independently, without its corresponding collection, causes the intent of code to get lost in an overly verbose call site.

As an example, to get a slice of a String that is not anchored at the start or end of the collection, one would use the following subscript method:

let s = "Hello, Swift!"
let subject = s[s.index(s.startIndex, offsetBy: 7)...s.index(s.startIndex, offsetBy: 11)]

Proposed solution

A solution we propose to this problem is to extend Collection and MutableCollection with subscript methods that take ranges which would be used to offset the starting index of a collection.

Using the above example, along with our solution, we will be able to write the following.

let subject = s[offset: 7...11]

Detailed design

Extend Collection with implementations of the methods listed below, and add getter/setter variants to MutableCollection.

subscript(offset offset: ClosedRange<IndexDistance>) -> SubSequence {}
subscript(offset offset: Range<IndexDistance>) -> SubSequence {}
subscript(offset offset: PartialRangeFrom<IndexDistance>) -> SubSequence {}
subscript(offset offset: PartialRangeThrough<IndexDistance>) -> SubSequence {}
subscript(offset offset: PartialRangeUpTo<IndexDistance>) -> SubSequence {}
subscript(offset offset: CountablePartialRangeFrom<IndexDistance>) -> SubSequence {}
subscript(offset offset: CountableClosedRange<IndexDistance>) -> SubSequence {}
subscript(offset offset: CountableRange<IndexDistance>) -> SubSequence {}
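
As a rough illustration, two of the listed overloads might be implemented as follows. This is a sketch, not the proposal's implementation: it uses Int in place of IndexDistance, shows only the getters, and assumes the remaining overloads would follow the same pattern.

```swift
// Hypothetical sketch of two of the listed overloads, with Int standing in for IndexDistance.
extension Collection {
    subscript(offset offset: Range<Int>) -> SubSequence {
        let lower = index(startIndex, offsetBy: offset.lowerBound)
        // Continue from `lower` rather than walking from startIndex a second time.
        let upper = index(lower, offsetBy: offset.upperBound - offset.lowerBound)
        return self[lower..<upper]
    }

    subscript(offset offset: ClosedRange<Int>) -> SubSequence {
        let lower = index(startIndex, offsetBy: offset.lowerBound)
        let upper = index(lower, offsetBy: offset.upperBound - offset.lowerBound)
        return self[lower...upper]
    }
}

let sentence = "Hello, Swift!"
let closedSubject = sentence[offset: 7...11]    // "Swift"
let halfOpenSubject = sentence[offset: 7..<12]  // "Swift"
```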

Source compatibility

None

Effect on ABI stability

N/A

Effect on API resilience

N/A

Alternatives considered

Add methods to offset startIndex and/or endIndex

Adding convenience methods to offset startIndex and endIndex would help make intent more obvious. However, it is still not ideal.

let subject = s[s.startIndex(offsetBy: 7)...s.endIndex(offsetBy: -2)]

Only add a method to offset startIndex

If we were to include only a startIndex(offsetBy:) we might want to reconsider a rename. One suggested name was index(atOffset:).

Use a KeyPath

Add an index(_:offsetBy:) method that would take a KeyPath as its first argument. This will give us the following usage.

let subject = s[s.index(\.startIndex, offsetBy: 7)..<s.index(\.endIndex, offsetBy: -1)]

While this shortens the code when the collection instance name is long, it is still relatively verbose.
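
One way the key-path overload might be written is sketched below. The overload is hypothetical, and constraining it to BidirectionalCollection is an assumption made here so that negative offsets from endIndex are valid.

```swift
// Hypothetical overload of index(_:offsetBy:) taking a key path to the anchor index.
// Assumption: declared on BidirectionalCollection so negative offsets can be used.
extension BidirectionalCollection {
    func index(_ anchor: KeyPath<Self, Index>, offsetBy distance: Int) -> Index {
        return index(self[keyPath: anchor], offsetBy: distance)
    }
}

let line = "Hello, Swift!"
let word = line[line.index(\.startIndex, offsetBy: 7)..<line.index(\.endIndex, offsetBy: -1)]  // "Swift"
```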


It would be highly reckless IMO to introduce a subscript that is near-identical to subscripts that run in constant time, but that runs in linear time. This would actively encourage users into an accidentally-quadratic performance trap.
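
The trap can be sketched with a loop that looks innocent at the call site. The offset subscript here is hypothetical and defined only to illustrate the cost.

```swift
// Hypothetical offset subscript, defined here only to illustrate the cost.
extension String {
    subscript(offset offset: Int) -> Character {
        return self[index(startIndex, offsetBy: offset)]
    }
}

let text = "Hello, Swift"

// Each lookup walks from startIndex, so this loop does O(n^2) work in total.
var viaOffsets = ""
for i in 0..<text.count {
    viaOffsets.append(text[offset: i])
}

// Iterating the collection directly visits each character once: O(n).
var viaIteration = ""
for character in text {
    viaIteration.append(character)
}
```

Both loops build the same string; only the amount of index walking differs.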

It would also be inappropriate to make this extension String-specific. An offset-from-start range subscript would be good for all collections (especially things like array and unsafe buffer slices which aren't zero-based), but would need the subscript label because otherwise it would be problematic with integer-indexed collections.


Hi @Letanyan_Arumugam – I think this is a very promising idea to explore, so long as it can be shown that it's obvious at the call site that this subscript is different from regular range-based subscripts.

I'd suggest checking out recent changes to Swift that eliminated IndexDistance (it's now always an Int) and introduced RangeExpression, which ought to make it possible to avoid all the overloads (and the Countable variants ought to be gone too by the time a proposal like this would make it into the std lib).

Also worth checking out some of @beccadax's previous explorations of revising the prefix/suffix zoo, which is in a similar area.

First off, I'm hugely in favor of addressing this issue. Thank you for the pitch!

I would mention the prefix/suffix/drop variants explicitly, then point out why they are not great. They require discovery and/or cognitive load and an unnatural rearranging of the call. For example, something like:

This looks like a viable approach. Have you thought about the convenience of using negative offsets to specify offset-from-the-end? I don't really know how that would look in code, and perhaps it would require a language feature and/or syntactic support. Might be worth mentioning under Alternatives, perhaps as a future direction or open for more consideration.

Thank you for pushing this forward!

Sorry @Letanyan_Arumugam I totally missed the para you put at the top acknowledging the IndexDistance aspect.

Proposals really only need to reflect the very latest Swift on master, which already has eliminated IndexDistance. So for the purposes of a proposal, it is an Int now. And as of this morning, CountableRange is dead too :) so that addresses that.

But further overloading ought to be eliminatable via RangeExpression.
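
A sketch of how a single generic subscript could replace the overloads, assuming offsets are Int. The subscript itself is hypothetical; RangeExpression and relative(to:) are existing standard library API. Note that resolving partial ranges this way calls count, which is linear for a String.

```swift
// Hypothetical single subscript covering closed, half-open and partial offset ranges.
extension Collection {
    subscript<R: RangeExpression>(offset offsets: R) -> SubSequence where R.Bound == Int {
        // Resolve partial ranges such as `..<5` or `7...` against the offsets 0..<count.
        let resolved = offsets.relative(to: 0..<count)
        let lower = index(startIndex, offsetBy: resolved.lowerBound)
        let upper = index(lower, offsetBy: resolved.upperBound - resolved.lowerBound)
        return self[lower..<upper]
    }
}

let banner = "Hello, Swift!"
let closed = banner[offset: 7...11]  // "Swift"
let upTo = banner[offset: ..<5]      // "Hello"
let from = banner[offset: 7...]      // "Swift!"
```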


That would be nice, but also hobbled by how ranges work. In particular, a range requires lowerBound ≤ upperBound, so you can’t express “start at the fourth element and continue through the second-to-last element” as someCollection[offset: 3...-2].
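
For comparison, that particular slice can be written today with the drop methods, as in the short illustration below. A range such as 3...(-2) cannot be used for it, because creating a range whose lower bound exceeds its upper bound traps at runtime.

```swift
let letters = Array("abcdefgh")

// Fourth element (offset 3) through the second-to-last element.
let middle = letters.dropFirst(3).dropLast(1)  // d, e, f, g
```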
