Adding 0-based integer signed offset and offset range subscripts to `RandomAccessCollection`

John_McCall · November 18, 2022, 9:38pm

Moderator note: by request, I have extracted this thread from posts in a different thread. The thread picks up from the following quote, which has been removed from its original post:

John_McCall · November 18, 2022, 9:42pm

A post was merged into an existing topic: [API Review for 1.1] Hash-Array Mapped Prefix Trees

Nobody1707 · November 18, 2022, 5:45pm

I don't agree that the standard library should have such subscripts, but one of the nice things about the existing designs is that you can add them yourself.

extension RandomAccessCollection {
    subscript(offset n: Int) -> Element {
        _read { // or get
            precondition(n >= 0 && n < count)
            let i = index(startIndex, offsetBy: n)
            yield self[i] // or return self[i]
        }
    }
}

extension RandomAccessCollection where Self: MutableCollection {
    subscript(offset n: Int) -> Element {
        _read { // or get
            precondition(n >= 0 && n < count)
            let i = index(startIndex, offsetBy: n)
            yield self[i] // or return self[i]
        }
        _modify { // or set
            precondition(n >= 0 && n < count)
            let i = index(startIndex, offsetBy: n)
            yield &self[i] // or self[i] = newValue
        }
    }
}

The biggest issue is that you have to define the getter again for the MutableCollection version. Set only subscripts would make this easier. I've left the range versions as an exercise for the reader.

hooman · November 18, 2022, 8:57pm

I mentioned signed offset; negative offsets would be relative to the end of the collection. Also, Range would not work for the signed interpretation of offsets (a[1..<-1] would be valid and drop the first and the last elements of a). We would need a new type more akin to partial range types, since its interpretation depends on the collection it is applied to.

An offset: label would make that subscript useless. Instead, all collection Index types should have been kept opaque and that (opaqueness) should have been part of the semantic requirements of the collection protocols (with the only "constructors" for them being Start/EndIndex plus index manipulation APIs). This means, Int subscript would only mean offset and nothing else.

Then having this offset subscript in the standard library would let people mostly use some RandomAccessCollection instead of Array while still being able to (almost) keep using integer subscripts as they use Int array indexes today.

Then we would be able to provide some factory and adapter APIs (something like lazy property) that would make it possible to have "arrays" with different performance characteristics (that are not stored as contagious sequence of elements) while keeping the same API as int-indexed simple arrays. We could also provide dynamically adaptive/self-optimizing variants for the cases when we can't determine in advance what specific backing data structure would be the most efficient, or the cases when different operations of the same logical collection would need different representations. Those dynamic variants would have higher baseline overhead, but far better worse case performance.

Note that where possible @lorentey have (appropriately) used integer indexes for array-like structures, but we don't have an umbrella protocol that unifies integer-indexed (mostly) random access collections, so we immediately loose those integer indexes when we step into generic code.

lorentey · November 18, 2022, 9:04pm

This isn't the appropriate topic for discussing extensions on RandomAccessCollection.

This topic is for discussing the proposed persistent hashed collection types, which do not even conform to that protocol.

John_McCall · November 18, 2022, 9:11pm

I agree, this seems completely off-topic. @hooman, is there an existing thread you’d like me to move these posts to, or should I start a new one?

hooman · November 18, 2022, 9:24pm

Apologies, I was supposed to open a new thread, but just replied to @Nobody1707 on this thread. My bad, please move it to a new thread. Thanks.

mickeyl · November 19, 2022, 6:37am

Negative subscription indices make my Python programs so much more clear, I‘d really love to see that in Swift as well.

jrose · November 19, 2022, 6:26pm

Y’all, this ship has sailed. ArraySlice and Data will always have integer indices that don’t start at 0. You may no longer think it’s the best design—I no longer think it’s the best design—but it is too late to change it. Any new subscript available on all RandomAccessCollections will need a label, or will have to use a different operator for the range.

hooman · November 19, 2022, 8:04pm

I know, it just accidentally came to foreground. It is part of my Swift regrets. I wish we could have a massively source breaking release one day to address everything we’ve learned along the way. Could we make an AI bot to automatically check and migrate code to Swift: TNG?

On the other hand, how about making an experimental/research fork of the language as a sandbox to tryout some of these to see if they are actually the answer to these grievances.

scanon · November 19, 2022, 9:09pm

This is obviously why we should abuse callAsFunction to provide integer indexing and finally eliminate the last vestiges of the hated square brackets.

tera · November 20, 2022, 1:33am

My only dislike – lack of symmetry between functions and subscripts: 1) no named subscripts 2) no setters in functions and 3) inconsistent parameter names' rules between the two:

func foo(extName intName: Int)  – subscript(extName intName: Int) ✅
func foo(_ intName: Int)  – subscript(_ intName: Int)             ✅
func foo(_ intName: Int)  – subscript(intName: Int)               🐞

Same as in C (actually even better) there's a nice symmetry in type declaration and usage:

[Int]          foo[0] = 1     [0, 1, 2]
[String: Int]  foo["0"] = 0   ["0" : 0]

We use () for array / dictionary subscripts – that symmetry is broken. Not the end of the world of course.

ksluder · November 20, 2022, 3:57am

I am strongly opposed to negative indices. A logic error that results in too large of a subtrahend should not silently operate on an unintended element. That sounds like the punchline of an information disclosure vulnerability report.

filip-sakel · November 20, 2022, 7:11pm

What would even be the benefit of opaque index types if we ultimately used Int indexing. Couldn’t we just drop the Index associated requirement and require an Int subscript?

bbrk24 · November 20, 2022, 8:35pm

Index is a requirement of Collection in general, so any changes to that would affect things like dictionaries and strings. This subscript would only be applied to RandomAccessCollection, which doesn’t include those types. To add the restriction that Index == Int to any collection protocol that doesn’t already have it would also be source-breaking.

filip-sakel · November 20, 2022, 11:48pm

Makes sense. And to be clear I wasn’t suggesting we fix the Index type on RandomAccessCollection, I was just pondering what an ideal API would look like.

lorentey · November 23, 2022, 11:13pm

I very strongly agree.

SE-0265, Offset-Based Access to Indices, Elements, and Slices made much work towards a nicer way of doing this that would apply to all Collection types. I find the proposal, the subsequent review discussion, and (especially) the Core Team's notes are all extremely insightful, and very relevant to this topic.

filip-sakel · November 24, 2022, 6:43am

My concern with that feature is compile-time performance. Type overloading is used both on the collection subscript and with the operators on the offset type. Also the ‘.first’ and ‘.last’ properties could be hard to resolve in some contexts, resulting in hard-to-understand errors. I agree that negative indices aren’t great but consider something like a subscript label to be a better solution, e.g. “hi”[offset: 1 … .negative(1)]

hooman · November 29, 2022, 3:35pm

Apologies for the late reply. I am really busy as always.

With what I have in mind, negative offsets won't pose a security risk. A negative offset is anchored to the end of the collection and a positive offset to the start of it. Meaning: They are distinct types. Offset is exactly like a discriminated union (enum) with discriminator being the sign. Only a limited subset of integer arithmetic makes sense on an offset and such operations can't flip its direction/anchor.

A signed Offset is expressible by an integer, but it is not Int. I think the best description would be some code. Here is my prototype Offset:

public struct Offset: ExpressibleByIntegerLiteral, 
                      LosslessStringConvertible, 
                      Hashable, Comparable, Strideable 
{
    static var first: Offset { return Offset(0) }
    static var last: Offset { return Offset(-1) }
    
    var isFromStart: Bool { value >= 0 }
    var isFromEnd: Bool { value < 0 }
    
    fileprivate init(_ value: Int) { self.value = value }
    fileprivate var value: Int
}

Along with these operators for convenience:

public func +(lhs: Offset, rhs: Int) -> Offset { lhs.advanced(by: rhs) }
public func -(lhs: Offset, rhs: Int) -> Offset { lhs.advanced(by: -rhs) }
public func -(lhs: Offset, rhs: Offset) -> Int { lhs.distance(to: rhs) }

I choose to only add it to RandomAccessCollection to prevent performance surprises for novice programmers. Here is the core API (using @scanon's trick):

public extension RandomAccessCollection {
    func baseIndex(of offset: Offset) -> Index { offset.isFromStart ? self.startIndex : self.endIndex }
    func index(of offset: Offset) -> Index { self.index(baseIndex(of: offset), offsetBy: offset.value) }
    func callAsFunction(_ offset: Offset) -> Element { return self[index(of: offset)] }
    func forwardOffset(of index: Index) -> Offset { Offset(distance(from: startIndex, to: index)) }
    func backwardOffset(of index: Index) -> Offset {
        precondition(index < endIndex, "Backward (negative) offset for `endIndex` is undefined.")
        return Offset(distance(from: endIndex, to: index))
    }
    func oppositeOffset(_ offset: Offset) -> Offset {
        precondition(offset.value >= -count && offset.value < count, "Offset out of range")
        return offset.isFromStart ? Offset(offset.value - count) : Offset(offset.value + count)
    }
}

Using the above API, we can also add some helpers to Offset itself:

public extension Offset {
    func reverse<C>(_ collection: C) -> Offset where C: RandomAccessCollection { collection.oppositeOffset(self) }
    func index<C>(_ collection: C) -> C.Index where C: RandomAccessCollection { collection.index(of: self) }
}

We can also go fancy and define BoundOffset<C> and BoundIndex<C> with corresponding collection.bind(offset) and collection.bind(index) to create them and hide the base collection in the above API and have offset.index and index.offset properties. There is quite a bit of design space to explore around these things.

Here is the above code put together

You can paste this in a playground and try it out.
Warning: Not tested beyond a couple of sample statements at the end. I just quickly typed it to capture the idea.

public struct Offset {
    static var first: Offset { return Offset(0) }
    static var last: Offset { return Offset(-1) }
    
    fileprivate var value: Int
    
    init(_ value: Int) { self.value = value }

    var isFromStart: Bool { value >= 0 }
    var isFromEnd: Bool { value < 0 }
}

extension Offset: ExpressibleByIntegerLiteral { public init(integerLiteral value: Int) { self.init(value) } }

extension Offset: LosslessStringConvertible {
    public init?(_ description: String) {
        guard let value = Int(description) else { return nil }
        self.init(value)
    }
    public var description: String { value.description }
}

extension Offset: Hashable {}
extension Offset: Comparable { public static func < (lhs: Offset, rhs: Offset) -> Bool {lhs.value < rhs.value} }

extension Offset: Strideable {
    public func advanced(by n: Int) -> Offset {
        let result = Offset(value + n)
        precondition(isFromEnd == result.isFromEnd, "Offset moved past limit.")
        return result
    }
    public func distance(to other: Offset) -> Int {
        precondition(isFromEnd == other.isFromEnd, "Distance is undefined for offsets on opposite ends.")
        return value - other.value
    }
}

public func +(lhs: Offset, rhs: Int) -> Offset { lhs.advanced(by: rhs) }
public func -(lhs: Offset, rhs: Int) -> Offset { lhs.advanced(by: -rhs) }
public func -(lhs: Offset, rhs: Offset) -> Int { lhs.distance(to: rhs) }

public extension RandomAccessCollection {
    func baseIndex(of offset: Offset) -> Index { offset.isFromStart ? self.startIndex : self.endIndex }
    func index(of offset: Offset) -> Index { self.index(baseIndex(of: offset), offsetBy: offset.value) }
    func callAsFunction(_ offset: Offset) -> Element { return self[index(of: offset)] }
    func forwardOffset(of index: Index) -> Offset { Offset(distance(from: startIndex, to: index)) }
    func backwardOffset(of index: Index) -> Offset {
        precondition(index < endIndex, "Backward (negative) offset for `endIndex` is undefined.")
        return Offset(distance(from: endIndex, to: index))
    }
    func oppositeOffset(_ offset: Offset) -> Offset {
        precondition(offset.value >= -count && offset.value < count, "Offset out of range")
        return offset.isFromStart ? Offset(offset.value - count) : Offset(offset.value + count)
    }
}

public extension Offset {
    func reverse<C>(_ collection: C) -> Offset where C: RandomAccessCollection { collection.oppositeOffset(self) }
    func index<C>(_ collection: C) -> C.Index where C: RandomAccessCollection { collection.index(of: self) }
}



public struct BoundIndex<C: Collection> {
    let parentCollection: C
    private var value: C.Index
    fileprivate init(index: C.Index, parent: C) {
        parentCollection = parent
        value = index
    }
}

public extension Collection {
    func bind(_ index: Index) -> BoundIndex<Self> { BoundIndex(index: index, parent: self) }
}


var a = ["a","b","c","d","e","f","g","h"]
print(a(.first), a(0), a(2), a(-1), a(.last))

let o = Offset.first + 5
print (o - .first)

// print(a(.first-2)) // Precondition failed: Offset moved past limit.
// print(a(.last+3)) // Precondition failed: Offset moved past limit.
// print(Offset.last-Offset.first) // Precondition failed: Distance is undefined for offsets on opposite ends.

The corresponding offset range types can also be smarter than Range<Int> about what they represent.

ksluder · November 29, 2022, 3:55pm

hooman:

Along with these operators for convenience:

public func +(lhs: Offset, rhs: Int) -> Offset { lhs.advanced(by: rhs) }
public func -(lhs: Offset, rhs: Int) -> Offset { lhs.advanced(by: -rhs) }
public func -(lhs: Offset, rhs: Offset) -> Int { lhs.distance(to: rhs) }

What happens if I do collection[-1 + myInt], when myInt > 0? Does offset.advanced(by:) trap?

Adding 0-based integer signed *offset* and *offset range* subscripts to `RandomAccessCollection`

Adding 0-based integer signed offset and offset range subscripts to `RandomAccessCollection`