Offset Indexing and Slicing

I like this. I just have one comment and some thoughts on future directions:

Slicing from the start/end is certainly the common case, but some kinds of exotic collections may expose other salient indexes as properties. In my demo gist from a while back, I added an operator which allows you to construct an OffsetBound by adding an integer offset to a KeyPath. Would something similar be possible with this?

Some thoughts on future directions:

  • Optimising offsets

    I think it would also be interesting to consider a protocol requirement on Collection which 'optimises' an OffsetBound - e.g. if the Collection knows its own count in constant-time, it might decide to transform an OffsetBound of the form .start + x to the form .end - (count - x) if x > (count/2). This would be useful for BidirectionalCollections which cache their count (e.g. String). When you're slicing Strings in LTR languages, it's just easier to write things in terms of offsets from the start (because that's how a human reads it), even if that isn't the most performant way of walking to the desired location.

  • String slicing performance

    I think it's fair to say that the biggest motivating use-case for this is slicing Strings (although things like correcting buggy code with integer-indexed collections are useful, too). I imagine lots of code will look like:

    let something       = myString[.start + 4..<.start + 10]
    let anotherThing    = myString[.start + 12..<.start + 20]
    let yetAnotherThing = myString[.start + 25..<.start + 30]
    

    ...which we can expect to have pretty poor performance because String is not a RandomAccessCollection. I think it's worth considering how we might improve index calculations in general (which particular focus on String). Taking inspiration from @dabrahams post from way back about "upgrading" all Sequences to be Collections (by adding a buffer), I think it would be cool to have a generic wrapper than can upgrade any Collection to be a RandomAccessCollection by adding an index cache (similar to what we do with String's UTF16 breadcrumbs).

    Then, if anybody needs to repeatedly slice a Collection which for some reason can't be a RAC, we can advise them to wrap the data:

    // Add this line to shadow 'myString' with a wrapped version.
    // Performs a single O(n) scan of the String and builds an index cache for later lookups in O(log n).
    let myString = myString.withRandomAccess()
    
    let something       = myString[.start + 4..<.start + 10]
    let anotherThing    = myString[.start + 12..<.start + 20]
    let yetAnotherThing = myString[.start + 25..<.start + 30]
    

    We would want this to also be a protocol requirement, so RACs can simply return themselves from withRandomAccess().

1 Like