Shorthand for Offsetting startIndex and endIndex

This was not the best example, but the point I’m trying to explain is still there. Which is that you can’t use non-literals without explicitly having to tell the compiler to make it an IndexOffset.

I like @xwu enum IndexOffset solution, as it's fully descriptive even though slightly verbose. I think having the "magical rule" for negative Ints mean "backwards from the end", while common, is not that good.

Though I also share the worry of @ben-cohen that a subscript shouldn't hide linear costs. I think that a label on it might not be enough to describe the difference... but at this point, if we require functions, then string[.start(3) ..< .end(2)] is not that different from string.dropFirst(3).dropLast(2), which doesn't have any of the issues listed above.

I like the idea of having a single range, but at this point I wonder if it's a good idea :thinking:

Whilst convenience range accessors to collections are clearly desirable, I struggle to see how adding a label sufficiently denotes the underlying time complexity, which is a key point of the existing collection indexing APIs. I think time might be better spent reviving this proposal to bring the naming of operations like dropLast(_:) into modern Swift style, which accommodates many of the use-cases described in this thread and has more urgency as far as ABI / source stability is concerned.

6 Likes

I understood exactly what you meant :)

I might go so far as to suggest that ExpressibleByIntegerLiteral conformance might not be a great idea, and that even literals might best be written .start(1) ... .end(2).

For Collection, time complexity isn't really reflected in the methods you call, but in the generic constraints you have.

For example, the exact same code that might have linear or quadratic complexity when operating on a String may run in constant time on an Array. The best way to process a Collection depends on how much you know about it. If you're doing lots of index-offsetting and you want stronger complexity guarantees, require that the operation takes a RandomAccessCollection.

But Collections always have constant complexity for subscript, because they use a fully-informed Index type for it. In the case of String, the cost is not in the subscripting but in the offsetting. By "hiding" the offsetting operation inside the subscript, you effectively hide the complexity of it.

The argument against the subscript is exactly this: I believe that in general, at least in the standard library, properties and subscripts are always O(1).

String is an important exception, because Unicode makes things complicated.

  • String.subscript(i: Index) is not guaranteed to be a constant-time operation, because it sometimes needs to find the end of a grapheme cluster.
  • Similarly, String.first/String.last are examples of O(n) properties.

This is a bit surprising to me, I believed that String.Index already had the necessary info for knowing where a grapheme cluster ended.
Also, I can see why .last has that issue (does it get cached though?), but why would .first have it?

String.Index often includes the end of the grapheme cluster, but not always: for example, the index may come from one of the string views.

String does not currently cache the size of its first character, so String.first has to measure it.

(Of course, it all depends on the unit on which we choose to base our performance guarantees.)

1 Like

But we need to be careful about what n is in the different cases.

In those cases Karoy describes, n is the length of the worst-case grapheme (graphemes can be arbitrarily long, but are almost always very short), rather than the length of the string. Whereas the guarantees about accessing an element are usually in terms of the number of elements in the collection.

So .first is linear in terms of the grapheme – it takes longer to get the first Unicode 13 extended-family emoji with grandparents pets and individual skin tones, than it does to get the first letter a. But getting the first(/last) character doesn't take longer on a 5-character string versus a 1 million-character string.

But regarding guarantees for subscripts, they are really just functions with additional capabilities. Putting a label on a subscript to differentiate it is sufficient to distinguish it from the O(1)-guaranteed directly-indexed element and slicing subscripts IMO.

Of course they are, like properties are really just functions with no arguments , or even better subscripts with no arguments (doesn't really work that well since there are multiple return types).
The difference I would like to emphasise is on the semantical level: subscripts/properties do quick and direct access to the type's data, while functions perform computations/cause side-effects.
This is the way I usually decide between the two, unless compiler limitations kick in (e.g. throwing subscripts), but of course I know it's in no way a guarantee of sorts. Though I've seen that the standard library more often than not follows this practice as well.

What about if you have a LazyFilterCollection? Almost every operation is O(n).
What about keypaths? A single subscript operation could traverse anywhere down the object-graph.
What about operators? What performance feelings do you get from them?

As you say, basically everything can be seen as a function of some sort, and we don't have guarantees about complexity based on how you call the function. We already have a way to encode that information: it's modelled by conformance to Collection, BidirectionalCollection, and RandomAccessCollection.

1 Like

Personally, I would prefer it not conforming.

Another concern I have, which may seem petty, but it's a niggle we would have to live with. Is that to use a prefix operator on this you have to have the type name xs[offset: ...IndexOffset.end(k)]. This sort of inconsistency is very jarring to me.

Nah, just use parentheses.

Since it seems like most people would appreciate a more explicit syntax, which would mean having labels, for startIndex and endIndex. Why don't we try to include other indices as well?

Heres an implementation of what this might look like (the IndexDistance constraints are there because this is in Swift 4.0):

enum IndexOffsetBase<C: Collection> where C.IndexDistance == Int {
  case keyPath(KeyPath<C, C.Index>)
  case index(C.Index)
}

struct IndexOffset<C: Collection> where C.IndexDistance == Int {
  var base: IndexOffsetBase<C>
  var offset: Int
  
  func next() -> IndexOffset {
    return IndexOffset(base: base, offset: offset + 1)
  }
}

func +<C: Collection>(lhs: C.Index, rhs: Int) -> IndexOffset<C> {
  return IndexOffset(base: .index(lhs), offset: rhs)
}
func -<C: Collection>(lhs: C.Index, rhs: Int) -> IndexOffset<C> {
  return IndexOffset(base: .index(lhs), offset: -rhs)
}

func +<C: Collection>(lhs: KeyPath<C, C.Index>, rhs: Int) -> IndexOffset<C> {
  return IndexOffset(base: .keyPath(lhs), offset: rhs)
}
func -<C: Collection>(lhs: KeyPath<C, C.Index>, rhs: Int) -> IndexOffset<C> {
  return IndexOffset(base: .keyPath(lhs), offset: -rhs)
}

extension Collection where IndexDistance == Int {
  func _index(_ io: IndexOffset<Self>) -> Index {
    switch io.base {
    case let .keyPath(path):
      return index(self[keyPath: path], offsetBy: io.offset)
    case let .index(i):
      return index(i, offsetBy: io.offset)
    }
  }
  
  func _indexRange(_ io: (IndexOffset<Self>?, IndexOffset<Self>?)) -> Range<Index> {
    let start = io.0 != nil ? _index(io.0!) : startIndex
    let end = io.1 != nil ? _index(io.1!) : endIndex
    
    return start..<end
  }
}

extension Collection where IndexDistance == Int {
  subscript(offset io: (IndexOffset<Self>?, IndexOffset<Self>?)) -> SubSequence {
    return self[_indexRange(io)]
  }
}

extension RangeReplaceableCollection 
where IndexDistance == Int, SubSequence: Collection {
  subscript(offset io: (IndexOffset<Self>?, IndexOffset<Self>?)) -> SubSequence {
    get {
      return self[_indexRange(io)]
    }
    set {
      replaceSubrange(_indexRange(io), with: newValue)
    }
  }
}

func ..< <C>(lhs: IndexOffset<C>, rhs: IndexOffset<C>) 
  -> (IndexOffset<C>?, IndexOffset<C>?) {
  return (lhs, rhs)
}

func ... <C>(lhs: IndexOffset<C>, rhs: IndexOffset<C>) 
  -> (IndexOffset<C>?, IndexOffset<C>?) {
  return (lhs, rhs.next())
}

prefix func ..< <C>(bound: IndexOffset<C>) -> (IndexOffset<C>?, IndexOffset<C>?) {
  return (nil, bound)
}

prefix func ... <C>(bound: IndexOffset<C>) -> (IndexOffset<C>?, IndexOffset<C>?) {
  return (nil, bound.next())
}

postfix func ... <C>(bound: IndexOffset<C>) -> (IndexOffset<C>?, IndexOffset<C>?) {
  return (bound, nil)
}

let s = "Hello, Swift!"

let i = s.index(of: "!")!

let y = s[offset: \.startIndex + 7 ... i - 1] // Swift

Yes, this seems similar to the IndexExpression demo I posted in the previous thread, with a range operator. I think that might be the best way to go.

It's a very minimal syntax, but after DynamicMemberLookup got accepted, the need for very explicit syntax has disappeared IMO. I don't believe it is possible to simultaneously say that it's too confusing to mix a minimal offset syntax with a subscript on Collection, but that I can call non-existent members with dot syntax on variables which happen to conform to DML.

Either these kind of syntax affordances are generally acceptable, and you should look to generic constraints to guide your understanding of what objects do, or they are not and everything in the language should be very explicit.

There is a + operator. That's all the indication you need that you're not accessing directly by index.

So sorry I completely forgot I saw this. I must of been subconsciously influenced. :worried:

Not at all; it's on a public forum for anybody to use. I was just commenting that I was looking at a similar design, but I was a bit worried it wouldn't get accepted because it might be too minimal. But these days I'm less worried about that.

The operators that take an Index and IndexDistance are problematic for non-RandomAccess collections where both of those types are, eg., Int.

This operator from your demo:

func + <C: Collection>(idx: C.Index, distance: C.IndexDistance) -> IndexExpression<C> {
  return IndexExpression(base: .index(idx), distance: distance)
}

And the nearly-identical operator from Letanyan’s post:

func +<C: Collection>(lhs: C.Index, rhs: Int) -> IndexOffset<C> {
  return IndexOffset(base: .index(lhs), offset: rhs)
}

Will not actually get called in an expression like c[c.startIndex + 4] if c uses Int indexes. That is totally fine if c is RandomAccess such as Array, but if not then it is a problem.

I never suggested removing the offset label in the subscript. I still thinks it’s important to have it.