[Rant] Indexing into ArraySlice

taylorswift · July 6, 2018, 6:53pm

A long time ago, we thought of taking advantage of this behavior to implement partial-buffer operations on UnsafeMutableBufferPointer and relatives. However, it turns out the slice types are actually heavier than the original, since they have to both retain a reference to the original (in the case of UnsafeMutableBufferPointer, the entire value) and keep track of two range pointers. So, four pointers need to be stored, even though only two pointers worth of information is there.

palimondo · July 6, 2018, 7:03pm

I’ll just drop a link to related discussion about relative indexing here for reference…

hlovatt · July 6, 2018, 10:47pm

I have found slicing sharing indices to be confusing because of two scenarios:

Passing to functions that use 0 instead of startIndex. Yes its a bug, but there is plenty of code out there that do this.
Porting code from other languages. The Swift choice is unusual and therefore easy to mis that the array is sliced and the index is now incorrect.

Was this a good design decision? How on earth can it be changed now? I think we just have to be aware of the differences and remember to add startIndex everywhere.

hlovatt · July 6, 2018, 10:53pm

I often want to slice the array by the number of processors, split into number-of-processors roughly equally sized chunks. Whilst you can do this with any indexing system, many languages start at 1 for example and Swift starts at startIndex, it is a lot simpler if they all start at 0.

taylorswift · July 7, 2018, 1:14am

The problem lies in that

let array1 = [0, 1, 2], 
    array2 = array1[1...], 
    array3 = array2.map{ $0 }

have different indices. There’s a lot of things in Swift that I don’t like but this one is one of the few I consider to be actively harmful. Also if you hop one thread over you’ll find a whole bunch of FP-fans who would very much like slices and maps to be mentally transparent. This destroys that.

hooman · July 7, 2018, 1:47am

Thank you bringing this up.

We need to find a nice and lightweight syntax for working with offsets (as opposed to indexes).

Offsets are always integer, and start at zero regardless of the actual index type or startIndex. We also don't need to involve the collection itself for offset calculation. It should be pretty cheap to use them with true random access collections. We should focus on solving this problem, instead of messing with slices. They already work the way they should.

I am very busy these days and don't have time to actively participate, but I will try to share some ideas about this later this weekend.

taylorswift · July 7, 2018, 2:01am

A lot to unpack here

Integer indexing into array-ish types is dangerous.
startIndex-based indexing is the “correct” way to index.
Since integer indexing is dangerous™, it should be deprecated and removed (just like we did with String).
Now, everyone is using [foo.index(foo.startIndex, offsetBy: i)] as they should
most of “[foo.index(foo.startIndex, offsetBy: i)]” is redundant and never changes
come to think of it, the integer i is the only thing we ever care about
we should shorthand/sugar this so that [i] is the only thing we should have to write

which is basically the same thing as making all slices index zero-based.

hooman · July 7, 2018, 2:14am

To better illustrate what I mean:

extension RandomAccessCollection {
    subscript (o offset: Int) -> Element {
        get { return self[self.index(self.startIndex, offsetBy: offset)] }
    }
}

extension RandomAccessCollection where Self: MutableCollection {
    subscript (o offset: Int) -> Element {
        get { return self[self.index(self.startIndex, offsetBy: offset)] }
        set { self[self.index(self.startIndex, offsetBy: offset)] = newValue }
    }
}

let a = ["a","b","c","d","e","f","g"]

let b = a[2...]

for i in 0..<b.count {
    print(b[o:i])
}

Here we don't need to care about the actual index type and where it starts.

timv · July 7, 2018, 5:04am

And now all your subscripts are O(n) (for non-RandomAccessCollections). That's just begging people to write code such as

for i in 0..<string.count {
    let c = string[i]
    // ...
}

hlovatt · July 7, 2018, 5:49am

Sure. Tons of stuff like reverse falls into this category also. If I slice an array I get the same indices, if I reverse it there is a new set. I would prefer slice etc to have there own index and for int based indexes that the indexing always starts at 0.

@hooman ‘s suggestion might be a way forward.

hlovatt · July 7, 2018, 5:59am

I think the main issue is that subscript should, in hindsight, only apply to random access collections. There is a tension between iterating and random access at the moment, it would be nice if we could split all the concepts into seperate protocols - but probably too late .

hooman · July 8, 2018, 1:13am

Here is a bit more of the design I have in mind:

private extension RandomAccessCollection {
    //This function is the core of the design I am proposing:
    func _index(_ offset: Int) -> Self.Index {
        // Rule: Positive offsets are always relative to `startIndex`,
        // while negative offsets are always relative to `endIndex`.
        let base = offset >= 0 ? startIndex : endIndex
        return index(base, offsetBy: offset)
    }
}

// We need our own range, because standard library ranges are not compatible with this design:
public struct OffsetRange {
    public let start: Int
    public let end: Int
}

// Only (partial) closed ranges make sense in this design:
public prefix  func ... (rhs: Int) -> OffsetRange           { return OffsetRange(start: 0, end: rhs) }
public postfix func ... (lhs: Int) -> OffsetRange           { return OffsetRange(start: lhs, end: -1) }
public         func ... (lhs: Int, rhs: Int) -> OffsetRange { return OffsetRange(start: lhs, end: rhs) }

public extension RandomAccessCollection {
    subscript (o offset: Int) -> Element {
        get { return self[_index(offset)] }
    }
    subscript(o range: OffsetRange) -> Self.SubSequence {
        get { return self[_index(range.start) ... _index(range.end)] }
    }
}

public extension RandomAccessCollection where Self: MutableCollection {
    subscript (o offset: Int) -> Element {
        get { return self[_index(offset)] }
        set { self[_index(offset)] = newValue }
    }
    subscript(o range: OffsetRange) -> Self.SubSequence {
        get { return self[_index(range.start) ... _index(range.end)] }
        set { self[_index(range.start) ... _index(range.end)] = newValue }
    }
}

let a = ["a","b","c","d","e","f","g"]
let b = a[2...]
print("b=\(b)")

// Forward offset:
for i in 0..<b.count { print(b[o: i], terminator: " ") }
print()

// Backward offset (from endIndex):
for i in -b.count..<0 { print(b[o: i], terminator: " ") }
print()

// Offset range:
print("b[o: 1...(-2)]=\(b[o: 1...(-2)])") // Note that -1 gives us the last element.
print("b[o: 1...3]=\(b[o: 1...3])")
print("b[o: -4...(-2)]=\(b[o: -4...(-2)])") // We can define `...-` to eliminate `()`.
print("b[o: -4...3]=\(b[o: -4...3])")
// Offset partial ranges:
print("b[o: 2...]=\(b[o: 2...])")
print("b[o: ...(-3)]=\(b[o: ...(-3)])")

pointum · July 8, 2018, 10:44am

Thank you for bringing this up. Swift got me used to using indices for Strings, but I’d assume Arrays and Data would be 0-based. In a short while after stumbling on this discussion I happened to deal with Data.subdata for processing raw file format. I’m glad I was ready for dealing with non-zero based indices, saved me a lot of head scratching.

Nevin · July 8, 2018, 6:22pm

I notice you must use parentheses here. Have you considered a second operator spelled ...- that would let you omit the parentheses?

• • •

Another possibility would be to emulate the Matlab-esque “end” by introducing simple types called Start and End. Then, by providing suitable operators, we could write things like:

someCollection[Start()+3 ... End()-2]

DevAndArtist · July 8, 2018, 6:33pm

I was wondering what was so special about this code snippet and tried it out in playground. I was surprised that playground emits different informations for the map operation depending on how you write that code block.

let array1 = [0, 1, 2],        //
    array2 = array1[1...],     //
    array3 = array2.map { $0 } // (2 times)
let array4 = array2.map { $0 } // (3 times)

paiv · July 8, 2018, 6:46pm

Try this:

let array4 = array2.map
    { $0 }  // (2 times)

DevAndArtist · July 8, 2018, 6:49pm

Seems like a weird playground bug, right?

hooman · July 8, 2018, 6:51pm

Thanks for your feedback.

I mentioned the possibility of adding ...- operator in a comment a couple of lines down.

Another possibility would be to emulate the Matlab-esque “end” by introducing simple types called Start and End . Then, by providing suitable operators, we could write things like:
someCollection[Start()+3 ... End()-2]

I brought this particular minimalist design up to avoid over-complicating it. As @palimondo mentioned up-thread, and If you remember, a through discussion of your suggested approach and multiple other possibilities is explored in this thread:

I brought this up to show that with random access collections like Array (or its slice) we don't really have to worry about the actual indexes, because offset always gives us a zero based integer "index" into the collection. If added to the standard library, this approach can be completely optimized away in cases like Array (and its slice) and produce identical performance characteristics as using the normal index APIs.

The designs explored in the above thread, drifted towards overly complex solutions, and I think that is why the thread died out without any tangible result and conclusion.

I am trying to revive that idea by showing how it can help here, and show that it can be pretty useful even with a minimalist design. (I don't like the fact that I had to make a custom type just to represent offset ranges)

My issue with this approach is that I don't like labeled subscripts. That is why I reduced the label to o: to make it less distracting. Maybe if we had true callable types we could use parenthesis for relative indexes (offsets). For example, instead of b[o: 2...] we could write b(2...). Don't yell at me for this outlandish suggestion, it does have precedent in programming languages.

taylorswift · July 8, 2018, 6:59pm

Isn’t all you’re trying to do to add the words “start”, “end”, and “offset” into the call site somewhere? I don’t think you can do that and achieve a concise syntax at the same time.

hooman · July 8, 2018, 7:08pm

No, please look at the comments around my _index() function above. I don't want to address offsets relative to any index other than startIndex and endIndex. I don't need to refer to them explicitly: Negative offsets only make sense for endIndex while offsets >=0 only make sense for startIndex. So I only need the label to distinguish it from the normal index API.